Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26897098/

Time: 2020-09-13 22:39:36  Source: igfitidea

Pandas calculating age from a date

python, numpy, pandas

Asked by Dave

I really need help with this one. My previous post was very bad and unclear, and I'm sorry. I wish I could delete it, but hopefully this one is better.

I need to calculate an age from a date (see the ANALYZE and FINAL OUTCOME sections).

ORIGINAL DATA SET

"JOLIE", 09091959,02051983
"PORTMAN",02111979,01272002
"MOORE", 01281975,01182009
"BEST", 04081973,07022008
"MONROE", 04161957,11231979

LOAD DATA

import pandas as pd

columns = ['lname', 'dob', 'scd_csr_mdy']

# Read the three columns and ask pandas to parse the two date columns
raw_data = pd.read_csv(r'C:\Users\davidlopez\Desktop\Folders\Standard Reports\HR Reports\eeprofil\eeprofil.txt',
                       names=columns, parse_dates=['dob', 'scd_csr_mdy'])

df1 = raw_data

In [1]: df1
Out [1]:

         lname          dob          scd_csr_mdy
    0    JOLIE          09091959     02051983
    1    PORTMAN        02111979     01272002
    2    MOORE          01281975     01182009
    3    BEST           04081973     07022008
    4    MONROE         04161957     11231979
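A self-contained way to reproduce the load above is sketched below, assuming the sample rows are read from an in-memory string (io.StringIO stands in for the original file path). Reading with dtype=str keeps the leading zeros in values like 09091959, skipinitialspace=True drops the blank after some commas, and an explicit format of %m%d%Y tells pandas how to read the MMDDYYYY strings.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for the original file
SAMPLE = '''"JOLIE", 09091959,02051983
"PORTMAN",02111979,01272002
"MOORE", 01281975,01182009
"BEST", 04081973,07022008
"MONROE", 04161957,11231979
'''

columns = ['lname', 'dob', 'scd_csr_mdy']

# dtype=str keeps leading zeros such as "09091959";
# skipinitialspace=True strips the blank after some commas
df1 = pd.read_csv(io.StringIO(SAMPLE), names=columns, dtype=str,
                  skipinitialspace=True)

# Parse both MMDDYYYY columns with an explicit format
for col in ['dob', 'scd_csr_mdy']:
    df1[col] = pd.to_datetime(df1[col], format='%m%d%Y')

print(df1.dtypes)
```

With the format spelled out, pandas does not have to guess whether an 8-digit string is YYYYMMDD or MMDDYYYY.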

ANALYZE

I tried doing the following but received an error:

from datetime import datetime

now = datetime.now()
df1['age'] = now - df1['dob']

But I received the following error:

TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'
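The error message points at the column type: when the date parse does not succeed, dob still holds strings, and a datetime minus a string is undefined. A minimal sketch that reproduces the failure on a small hypothetical Series (standing in for df1['dob']) and then avoids it by converting to datetime64 first:

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for df1['dob'] as loaded: plain MMDDYYYY strings
dob = pd.Series(['09091959', '02111979'])
print(dob.dtype)  # not a datetime64 dtype: the values are strings

now = datetime.now()
try:
    now - dob
except TypeError as exc:
    print('TypeError:', exc)

# Converting to datetime64 first makes the subtraction work
age = now - pd.to_datetime(dob, format='%m%d%Y')
print(age.dtype)  # a timedelta64 dtype
```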

FINAL OUTCOME

     lname          dob          scd_csr_mdy    DOB_AGE     SCD_AGE
0    JOLIE          09091959     02051983       55          32
1    PORTMAN        02111979     01272002       36          13
2    MOORE          01281975     01182009       40          6
3    BEST           04081973     07022008       42          6
4    MONROE         04161957     11231979       58          35
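One way to get whole-year columns like DOB_AGE and SCD_AGE is sketched below. It assumes both date columns are already datetime64 and that "age" means completed years as of a reference date; the reference date here is a hypothetical fixed value so the output is reproducible (substitute pd.Timestamp.today() for the current date). The hypothetical helper whole_years subtracts the calendar years and then takes one off wherever the month/day anniversary has not yet been reached in the reference year.

```python
import pandas as pd


def whole_years(start: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Hypothetical helper: completed years from each date in `start` to `ref`."""
    years = ref.year - start.dt.year
    # Subtract one where the anniversary has not yet occurred in ref's year
    not_yet = (start.dt.month > ref.month) | (
        (start.dt.month == ref.month) & (start.dt.day > ref.day)
    )
    return years - not_yet.astype(int)


df1 = pd.DataFrame({
    'lname': ['JOLIE', 'PORTMAN', 'MOORE', 'BEST', 'MONROE'],
    'dob': pd.to_datetime(['09091959', '02111979', '01281975',
                           '04081973', '04161957'], format='%m%d%Y'),
    'scd_csr_mdy': pd.to_datetime(['02051983', '01272002', '01182009',
                                   '07022008', '11231979'], format='%m%d%Y'),
})

# Hypothetical reference date; use pd.Timestamp.today() for "now"
ref = pd.Timestamp('2014-11-12')
df1['DOB_AGE'] = whole_years(df1['dob'], ref)
df1['SCD_AGE'] = whole_years(df1['scd_csr_mdy'], ref)
print(df1)
```

Comparing month and day directly avoids the rounding drift of dividing a day count by an average year length.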

Any suggestions?

Answer by user308827

Convert the dob column from strings to datetime objects:

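from datetime import datetime
import pandas as pd
# MMDDYYYY strings need an explicit format to parse correctly
df1['dob'] = pd.to_datetime(df1['dob'], format='%m%d%Y')
now = datetime.now()
df1['age'] = now - df1['dob']  # Timedelta values, not years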
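The subtraction above yields Timedelta values (days plus time of day), not years. If an approximate whole-year figure is enough, one option, assuming an average year of 365.25 days, is to divide the day count and floor it. This shortcut can be off by one for dates close to a birthday, so a calendar-based month/day comparison is safer when exact ages matter. The fixed "now" below is a hypothetical value chosen for reproducibility.

```python
from datetime import datetime

import pandas as pd

# Stand-in for df1 after pd.to_datetime(..., format='%m%d%Y')
df1 = pd.DataFrame({'dob': pd.to_datetime(['09091959', '02111979'],
                                          format='%m%d%Y')})

now = datetime(2014, 11, 12)  # hypothetical fixed "now" for reproducibility
delta = now - df1['dob']      # timedelta64 Series

# Approximate: floor(days / 365.25), the average year length
df1['age'] = (delta.dt.days // 365.25).astype(int)
print(df1)
```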

Answer by Tomasz S?ota

Convert the strings to datetimes with an explicit format:

from datetime import datetime
df1['age'] = datetime.now() - df1['dob'].apply(lambda s: datetime.strptime(s, "%m%d%Y"))

Answer by acushner

If there aren't too many entries, you can do something like:

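import pandas as pd
# Reorder MMDDYYYY into YYYYMMDD, which pd.to_datetime parses directly
df1['dob'] = df1['dob'].apply(lambda d: pd.to_datetime(d[-4:] + d[:4]))
pd.Timestamp.now() - df1['dob']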
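For larger frames, the same reorder-then-parse idea can be done without a per-row apply: the .str accessor slices every string at once, and format='%Y%m%d' tells pd.to_datetime exactly what the reordered strings look like. A sketch on a few of the sample dates, assuming df1['dob'] still holds MMDDYYYY strings:

```python
import pandas as pd

df1 = pd.DataFrame({'dob': ['09091959', '02111979', '01281975']})

# Move the 4-digit year to the front (MMDDYYYY -> YYYYMMDD) for all rows at once
reordered = df1['dob'].str[-4:] + df1['dob'].str[:4]
df1['dob'] = pd.to_datetime(reordered, format='%Y%m%d')

age = pd.Timestamp.now() - df1['dob']  # timedelta64 Series
print(df1['dob'])
```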