Python: Remove dtype datetime NaT

Question

Asked by user2643394

I am preparing a pandas df for output, and would like to remove the NaN and NaT in the table, and leave those table locations blank. An example would be

mydataframesample 

col1    col2     timestamp
a       b        2014-08-14
c       NaN      NaT

would become

col1    col2     timestamp
a       b        2014-08-14
c

Most of the columns have dtype object, with the timestamp column being datetime64[ns]. In order to fix this, I attempted to use pandas' mydataframesample.fillna(' ') to effectively leave a space in the location. However, this doesn't work with the datetime types. In order to get around this, I'm trying to convert the timestamp column back to object or string type.

Is it possible to remove the NaN/NaT without doing the type conversion? If not, how do I do the type conversion? (I tried str() and astype(str), but had difficulty because the column's original format is datetime.)

Answer 1

Accepted answer by unutbu

This won't win any speed awards, but if the DataFrame is not too long, reassignment using a list comprehension will do the job:

df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]

import numpy as np
import pandas as pd
Timestamp = pd.Timestamp
nan = np.nan
NaT = pd.NaT
df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', nan],
    'date': (Timestamp('2014-08-14'), NaT)
    })

df1['col2'] = df1['col2'].fillna('')
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]

print(df1)

yields

  col1 col2        date
0    a    b  2014-08-14
1    c
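
If the Timestamp objects themselves should be kept rather than formatted as strings, one possible variation is to cast the column to object dtype first and then blank out the missing entries. This is a minimal sketch and not part of the original answer; the name `blanked` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the answer above.
df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': [pd.Timestamp('2014-08-14'), pd.NaT],
})

# Cast to object so the column can hold both Timestamps and plain strings,
# then replace every missing entry (NaT) with an empty string.
blanked = df1['date'].astype(object).where(df1['date'].notna(), '')

print(blanked.tolist())
```

The trade-off is that the column is no longer datetime64, so datetime-specific operations will not work on it afterwards.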

Answer 2

Answered by chrisb

@unutbu's answer will work fine, but if you don't want to modify the DataFrame, you could do something like this. to_html takes a parameter for how NaN is represented; to handle the NaT, you need to pass a custom formatting function.

date_format = lambda d : pd.to_datetime(d).strftime('%Y-%m-%d') if not pd.isnull(d) else ''

df1.to_html(na_rep='', formatters={'date': date_format})
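
A self-contained sketch of the approach above, reusing the sample frame from the accepted answer. It only checks that the rendered HTML contains no NaT and that the DataFrame itself is left untouched; the variable name `html` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': [pd.Timestamp('2014-08-14'), pd.NaT],
})

def date_format(d):
    """Render a date as YYYY-MM-DD, or as an empty string when it is missing."""
    return '' if pd.isnull(d) else pd.to_datetime(d).strftime('%Y-%m-%d')

# na_rep covers NaN in the object columns; the formatter covers NaT in 'date'.
html = df1.to_html(na_rep='', formatters={'date': date_format})

print('NaT' in html)       # whether any NaT leaked into the output
print(df1['date'].dtype)   # the column is still a datetime dtype
```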

Answer 3

Answered by Jeff

If all you want to do is convert to a string:

In [37]: df1.to_csv(None,sep=' ')
Out[37]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c  \n'

To replace missing values with a string:

In [36]: df1.to_csv(None,sep=' ',na_rep='missing_value')
Out[36]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c missing_value missing_value\n'
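
to_csv also accepts a date_format argument, so the time component shown in the output above can be dropped without touching the DataFrame. A short sketch, assuming the same sample frame as in the accepted answer; `csv_text` is an illustrative name.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': [pd.Timestamp('2014-08-14'), pd.NaT],
})

# Passing None as the path makes to_csv return the text instead of writing a file.
# na_rep blanks NaN/NaT; date_format controls how valid datetimes are written.
csv_text = df1.to_csv(None, sep=' ', na_rep='', date_format='%Y-%m-%d')

print(csv_text)
```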

Answer 4

Answered by Alexander McFarlane

I had the same issue. This handles it in one step with the pandas apply function, and should be reasonably fast.

import pandas as pd
df['timestamp'] = df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else '')

If your timestamp field is not yet in datetime format, then:

import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp']).apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else '')
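
As a possible alternative to apply, the .dt accessor can format the whole column at once. Series.dt.strftime leaves missing entries as NaN, so a following fillna('') blanks them. This is a sketch that assumes the column is already datetime64; it is not part of the original answer, and no timing comparison is claimed here.

```python
import pandas as pd

# Small illustrative frame: one valid date and one missing value (NaT).
df = pd.DataFrame({'timestamp': pd.to_datetime(['2014-08-14', None])})

# Format all valid dates in one vectorized call, then blank the missing ones.
df['timestamp'] = df['timestamp'].dt.strftime('%Y-%m-%d').fillna('')

print(df['timestamp'].tolist())
```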
