Python: Removing NaT from a datetime dtype
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/25141789/
Remove dtype datetime NaT
Asked by user2643394
I am preparing a pandas df for output, and would like to remove the NaN and NaT in the table, and leave those table locations blank. An example would be:
mydataframesample

col1  col2  timestamp
a     b     2014-08-14
c     NaN   NaT
would become
col1  col2  timestamp
a     b     2014-08-14
c
Most of the columns have dtype object, with the timestamp column being datetime64[ns]. In order to fix this, I attempted to use pandas' mydataframesample.fillna(' ') to effectively leave a space in the location. However, this doesn't work with the datetime types. In order to get around this, I'm trying to convert the timestamp column back to object or string type.
Is it possible to remove the NaN/NaT without doing the type conversion? If not, how do I do the type conversion? I tried str() and astype(str), but ran into difficulty because the column starts out as datetime.
Accepted answer by unutbu
This won't win any speed awards, but if the DataFrame is not too long, reassignment using a list comprehension will do the job:
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]
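import numpy as np
import pandas as pd

Timestamp = pd.Timestamp
nan = np.nan
NaT = pd.NaT

# Sample frame with one missing string and one missing date
df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', nan],
    'date': (Timestamp('2014-08-14'), NaT)
})

# Object column: fillna('') is enough
df1['col2'] = df1['col2'].fillna('')
# Datetime column: format each value as a string, using '' for NaT
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]
print(df1)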
yields
  col1 col2        date
0    a    b  2014-08-14
1    c
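If the list comprehension becomes too slow on a long frame, a vectorized variant is possible. The following is only a sketch, assuming a reasonably recent pandas in which Series.dt.strftime returns a missing value for NaT entries; the names formatted and result are made up for illustration, and it builds a new frame instead of overwriting df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': [pd.Timestamp('2014-08-14'), pd.NaT],
})

# .dt.strftime formats the whole column at once; NaT entries come back as
# missing values, which fillna('') then replaces with empty strings.
formatted = df1['date'].dt.strftime('%Y-%m-%d').fillna('')

# Build the output frame without touching df1 itself.
result = df1.assign(col2=df1['col2'].fillna(''), date=formatted)
print(result)
```

Note that the date column of result holds strings, not datetimes, so this is best done as the last step before output.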
Answer by chrisb
@unutbu's answer will work fine, but if you don't want to modify the DataFrame, you could do something like this. to_html takes a parameter for how NaN is represented; to handle the NaT you need to pass a custom formatting function.
date_format = lambda d : pd.to_datetime(d).strftime('%Y-%m-%d') if not pd.isnull(d) else ''
df1.to_html(na_rep='', formatters={'date': date_format})
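For anyone who wants to try this without the rest of the thread, here is a self-contained version of the same idea. It is a sketch only: the sample frame is copied from the accepted answer, and the variable name html is my own.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': [pd.Timestamp('2014-08-14'), pd.NaT],
})

def date_format(d):
    # Blank out missing dates; format everything else as YYYY-MM-DD.
    return '' if pd.isnull(d) else pd.to_datetime(d).strftime('%Y-%m-%d')

# na_rep is passed for the NaN cells; the formatter takes care of the NaT.
html = df1.to_html(na_rep='', formatters={'date': date_format})
print(html)

# The DataFrame itself is untouched: the column is still a datetime dtype.
print(df1['date'].dtype)
```

The formatting happens only while rendering, so df1 can still be used for date arithmetic afterwards.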
Answer by Jeff
If all you want to do is convert to a string:
In [37]: df1.to_csv(None,sep=' ')
Out[37]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c \n'
To replace missing values with a string:
In [36]: df1.to_csv(None,sep=' ',na_rep='missing_value')
Out[36]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c missing_value missing_value\n'
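The outputs above still carry the 00:00:00 time part. to_csv also accepts a date_format argument, so the two options can be combined to get date-only strings and blank cells in one call. This is a sketch on the sample frame from the accepted answer; csv_text is just an illustrative name.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': [pd.Timestamp('2014-08-14'), pd.NaT],
})

# With no path given, to_csv returns the text instead of writing a file.
# na_rep controls how NaN/NaT are written; date_format controls datetimes.
csv_text = df1.to_csv(sep=' ', na_rep='', date_format='%Y-%m-%d')
print(csv_text)
```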
Answer by Alexander McFarlane
I had the same issue. This does it all in place using the pandas apply function, and should be the fastest method.
import pandas as pd
df['timestamp'] = df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else '')
If your timestamp field is not yet in datetime format, then:
import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp']).apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else '')
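One caveat when the column starts out as strings: pd.to_datetime raises on values it cannot parse. A possible workaround, sketched here on a made-up column, is to pass errors='coerce' so that unparseable entries become NaT and are then blanked out like any other missing date. Whether silently dropping bad values is acceptable depends on your data.

```python
import pandas as pd

# Hypothetical raw column: dates stored as strings, with one missing
# entry and one entry that is not a date at all.
df = pd.DataFrame({'timestamp': ['2014-08-14', None, 'not a date']})

# errors='coerce' turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(df['timestamp'], errors='coerce')

# Same apply pattern as above: format real dates, blank out NaT.
df['timestamp'] = parsed.apply(
    lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else ''
)
print(df)
```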

