Python Pandas: convert date 'object' to int

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute the content to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50863691/

Time: 2020-08-19 19:37:18  Source: igfitidea

Pandas: convert date 'object' to int

Tags: python, pandas, type-conversion

Asked by jabba

I have a Pandas dataframe and I need to convert a column of dates to int, but unfortunately all the solutions I have found end with errors (shown below).

test_df.info()

<class 'pandas.core.frame.DataFrame'>
Data columns (total 4 columns):
Date        1505 non-null object
Avg         1505 non-null float64
TotalVol    1505 non-null float64
Ranked      1505 non-null int32
dtypes: float64(2), int32(1), object(1) 

Sample data:

    Date        Avg             TotalVol  Ranked
0   2014-03-29  4400.000000     0.011364    1
1   2014-03-30  1495.785714     4.309310    1
2   2014-03-31  1595.666667     0.298571    1
3   2014-04-01  1523.166667     0.270000    1
4   2014-04-02  1511.428571     0.523792    1

I think I've tried everything, but nothing works:

test_df['Date'].astype(int):

TypeError: int() argument must be a string, a bytes-like object or a number, not 'datetime.date'

test_df['Date']=pd.to_numeric(test_df['Date']):

TypeError: Invalid object type at position 0

test_df['Date'].astype(str).astype(int):

ValueError: invalid literal for int() with base 10: '2014-03-29'

test_df['Date'].apply(pd.to_numeric, errors='coerce'):

Converts the entire column to NaNs

Answered by Neroksi

The reason test_df['Date'].astype(int) gives you an error is that your dates still contain hyphens ("-"). First remove them with test_df['Date'].str.replace("-",""), then apply your first method to the resulting series. So the whole solution would be:

test_df['Date'].str.replace("-","").astype(int)

Note that this won't work if your "Date" column does not hold strings, typically when Pandas has already parsed your series as Timestamp. In this case you can use:

test_df['Date'].dt.strftime("%Y%m%d").astype(int)
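
The TypeError in the question mentions 'datetime.date', which suggests the column holds Python date objects rather than strings or Timestamps. In that situation neither .str nor .dt is likely to apply directly. A minimal sketch, assuming such a column (the frame below is made up for illustration and is not the asker's data), converts it with pd.to_datetime first:

```python
import datetime

import pandas as pd

# Hypothetical frame mimicking the question: an object column of datetime.date values.
test_df = pd.DataFrame(
    {"Date": [datetime.date(2014, 3, 29), datetime.date(2014, 3, 30)]}
)

# pd.to_datetime turns the object column into a datetime64 column,
# which is what the .dt accessor needs.
as_datetime = pd.to_datetime(test_df["Date"])

# Format as YYYYMMDD strings, then cast the digit-only strings to integers.
date_int = as_datetime.dt.strftime("%Y%m%d").astype(int)

print(date_int.tolist())  # [20140329, 20140330]
```

The same pd.to_datetime call also accepts 'YYYY-MM-DD' strings, so this route should not depend on which of the two cases the column falls into.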

Answered by Rakesh

Looks like you need pd.to_datetime().dt.strftime("%Y%m%d").

Demo:

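import pandas as pd

# Build a small frame with ISO-formatted date strings
df = pd.DataFrame({"Date": ["2014-03-29", "2014-03-30", "2014-03-31"]})
# Parse to datetime, then format as YYYYMMDD (the values are strings; append .astype(int) for integers)
df["Date"] = pd.to_datetime(df["Date"]).dt.strftime("%Y%m%d")
print(df)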

Output:

       Date
0  20140329
1  20140330
2  20140331
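
The printed values look numeric, but strftime returns strings, so one more cast is needed to get actual integers. The sketch below reuses the demo data to show this. It also includes an arithmetic variant built from the .dt.year, .dt.month and .dt.day attributes, which is not part of the answer above and is offered only as an alternative that skips the string round trip:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2014-03-29", "2014-03-30", "2014-03-31"]})
dates = pd.to_datetime(df["Date"])

formatted = dates.dt.strftime("%Y%m%d")
# Each element is still a Python str at this point.
print(type(formatted.iloc[0]))

# Cast the digit-only strings to integers.
as_int = formatted.astype(int)
print(as_int.tolist())  # [20140329, 20140330, 20140331]

# Alternative (not from the answer): build YYYYMMDD arithmetically.
arithmetic = dates.dt.year * 10000 + dates.dt.month * 100 + dates.dt.day
print(arithmetic.tolist())  # [20140329, 20140330, 20140331]
```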

Answered by msolomon87

This should work:

df['Date'] = pd.to_numeric(df.Date.str.replace('-',''))
print(df['Date'])

Output:

0    20140329
1    20140330
2    20140331
3    20140401
4    20140402
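
The .str accessor in the line above expects string values. If the column instead holds datetime.date objects, as the TypeError in the question suggests, one possible adjustment is to cast to str first. This is a sketch under that assumption, with made-up sample rows:

```python
import datetime

import pandas as pd

# Hypothetical object column of datetime.date values.
df = pd.DataFrame({"Date": [datetime.date(2014, 4, 1), datetime.date(2014, 4, 2)]})

# astype(str) renders each date as 'YYYY-MM-DD', so .str methods work afterwards.
as_text = df["Date"].astype(str)

# Remove the hyphens literally (regex=False) and parse the digits as numbers.
df["Date"] = pd.to_numeric(as_text.str.replace("-", "", regex=False))

print(df["Date"].tolist())  # [20140401, 20140402]
```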