Original URL: http://stackoverflow.com/questions/26788854/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Pandas get the age from a date (example: date of birth)
Question by Dave
How can I calculate the age of a person (based off the dob column) and add a column to the dataframe with the new value?
dataframe looks like the following:
lname fname dob
0 DOE LAURIE 03011979
1 BOURNE JASON 06111978
2 GRINCH XMAS 12131988
3 DOE JOHN 11121986
I tried doing the following:
now = datetime.now()
df1['age'] = now - df1['dob']
But I received the following error:
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'
Accepted answer by unutbu
import datetime as DT
import io
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = 'warn'
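# Fields are separated by two or more spaces, so titles such as 'BUDG ANAL'
# stay in one column; the header has one field fewer than the rows, so the
# first column becomes the index
content = '''ssno  lname  fname  pos_title  ser  gender  dob
0  23456789  PLILEY  JODY  BUDG ANAL  0560  F  031871
1  987654321  NOEL  HEATHER  PRTG SRVCS SPECLST  1654  F  120852
2  234567891  SONJU  LAURIE  SUPVY CONTR SPECLST  1102  F  010999
3  345678912  MANNING  CYNTHIA  SOC SCNTST  0101  F  081692
4  456789123  NAUERTZ  ELIZABETH  OFF AUTOMATION ASST  0326  F  031387'''
# A regex separator needs a raw string and the Python parsing engine
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')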
df['dob'] = df['dob'].apply('{:06}'.format)
now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y') # 1
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] - np.timedelta64(100, 'Y')) # 2
df['age'] = (now - df['dob']).astype('<m8[Y]') # 3
print(df)
yields
ssno lname fname pos_title ser gender \
0 23456789 PLILEY JODY BUDG ANAL 560 F
1 987654321 NOEL HEATHER PRTG SRVCS SPECLST 1654 F
2 234567891 SONJU LAURIE SUPVY CONTR SPECLST 1102 F
3 345678912 MANNING CYNTHIA SOC SCNTST 101 F
4 456789123 NAUERTZ ELIZABETH OFF AUTOMATION ASST 326 F
dob age
0 1971-03-18 00:00:00 43
1 1952-12-08 18:00:00 61
2 1999-01-09 00:00:00 15
3 1992-08-16 00:00:00 22
4 1987-03-13 00:00:00 27
- It looks like your dob column currently holds strings. First, convert them to Timestamps using pd.to_datetime.
- The format '%m%d%y' converts the last two digits to years, but unfortunately assumes 52 means 2052. Since that's probably not Heather Noel's birth year, let's subtract 100 years from dob whenever dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since it may be slightly more likely to have a 101-year-old worker than a 1-year-old worker...
- You can subtract dob from now to obtain a timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]').
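Steps # 2 and # 3 depend on how older pandas handled the 'Y' (year) unit; recent releases (pandas 2.x) may reject both np.timedelta64(100, 'Y') in this arithmetic and astype('<m8[Y]'). Below is a sketch of the same three steps that avoids year-unit timedeltas, assuming the dob values are six-digit MMDDYY strings as in the sample. pd.DateOffset(years=100) moves a date back by calendar years, and whole years are counted from the year, month and day components. The function name ages_from_mmddyy and the fixed reference date are illustrative choices, not part of the answer.

```python
import pandas as pd


def ages_from_mmddyy(dob_str: pd.Series, now: pd.Timestamp) -> pd.DataFrame:
    """Steps # 1 to # 3 above, rewritten without 'Y' timedelta units."""
    # 1: parse MMDDYY strings; %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
    dob = pd.to_datetime(dob_str, format='%m%d%y')
    # 2: a birth date in the future must belong to the previous century;
    #    DateOffset(years=100) shifts by calendar years, keeping month and day
    dob = dob.where(dob < now, dob - pd.DateOffset(years=100))
    # 3: completed years, minus one where this year's birthday is still ahead
    #    (month * 100 + day orders dates within a year, since day < 100)
    before_birthday = (dob.dt.month * 100 + dob.dt.day) > (now.month * 100 + now.day)
    age = now.year - dob.dt.year - before_birthday.astype(int)
    return pd.DataFrame({'dob': dob, 'age': age})


# The dob strings from the answer's sample, with a fixed, arbitrary "now"
sample = pd.Series(['031871', '120852', '010999', '081692', '031387'])
print(ages_from_mmddyy(sample, pd.Timestamp('2014-11-06'))['age'].tolist())
# [43, 61, 15, 22, 27]
```

With now fixed at 2014-11-06, this reproduces the ages in the printed output above, and Heather Noel's dob becomes 1952-12-08.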
Answer by Brandon Humpert
My first thought is that your years are two digits, which is not a great choice in this day and age. In any case, I'm going to assume that all years like 05 are actually 1905. This is probably not correct(!), but coming up with the right rule is going to depend a lot on your data.
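from datetime import date

def age(date1, date2):
    # Whole years from date1 (birth date) to date2 (reference date)
    naive_yrs = date2.year - date1.year
    # Subtract one if this year's birthday hasn't happened yet; comparing
    # (month, day) tuples avoids the ValueError that
    # date1.replace(year=...) raises for Feb 29 birthdays in non-leap years
    if (date2.month, date2.day) < (date1.month, date1.day):
        correction = -1
    else:
        correction = 0
    return naive_yrs + correction

# df1 is the question's dataframe; dob is assumed to hold MMDDYY strings
df1['age'] = df1['dob'].map(lambda x: age(date(int('19' + x[-2:]), int(x[:2]), int(x[2:-2])), date.today()))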
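As the answer says, the right century rule depends on the data. One possible alternative to always prepending '19' is a pivot rule: place the two-digit year in today's century unless that would put the birth in the future. The helper expand_two_digit_year below is hypothetical and only sketches that rule.

```python
from datetime import date


def expand_two_digit_year(yy: int, today: date) -> int:
    """Hypothetical pivot rule: use today's century for a two-digit year,
    unless that would place the birth in the future, in which case use
    the previous century."""
    year = today.year // 100 * 100 + yy
    return year if year <= today.year else year - 100


# An arbitrary fixed "today" so the result is reproducible
today = date(2014, 11, 6)
print(expand_two_digit_year(52, today))  # 1952
print(expand_two_digit_year(5, today))   # 2005
```

Under this rule 05 becomes 2005 rather than 1905, which illustrates why the choice depends on the data: neither rule can tell a 9-year-old from a 109-year-old.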
Answer by nnaqa
I found an easier solution:
import pandas as pd
from datetime import datetime
from datetime import date

d = {'col0': [1, 2, 6], 'col1': [3, 8, 3], 'col2': ['17.02.1979',
                                                    '11.11.1993',
                                                    '01.08.1961']}
df = pd.DataFrame(data=d)

def calculate_age(born):
    # Parse DD.MM.YYYY strings into date objects
    born = datetime.strptime(born, "%d.%m.%Y").date()
    today = date.today()
    # The tuple comparison is True (== 1) if this year's birthday is still ahead
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

df['age'] = df['col2'].apply(calculate_age)
print(df)
output:
col0 col1 col2 age
0 1 3 17.02.1979 39
1 2 8 11.11.1993 24
2 6 3 01.08.1961 57
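The ages printed above depend on the day the script ran, because calculate_age calls date.today(). A variant that takes the reference date as an argument makes the result reproducible and testable. The name calculate_age_on is hypothetical and not part of the original answer; Series.apply forwards the extra argument through args.

```python
from datetime import date, datetime

import pandas as pd


def calculate_age_on(born: str, on: date) -> int:
    """Age in whole years on the date `on`, for a DD.MM.YYYY string."""
    born_date = datetime.strptime(born, "%d.%m.%Y").date()
    # True counts as 1: subtract a year if the birthday is still ahead
    return on.year - born_date.year - ((on.month, on.day) < (born_date.month, born_date.day))


df = pd.DataFrame({'col2': ['17.02.1979', '11.11.1993', '01.08.1961']})
# Series.apply passes extra positional arguments through `args`
df['age'] = df['col2'].apply(calculate_age_on, args=(date(2018, 9, 1),))
print(df['age'].tolist())  # [39, 24, 57]
```

For example, a reference date of 2018-09-01 yields the same ages as the output shown above.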
Answer by cs95
# Data setup
df
lname fname dob
0 DOE LAURIE 1979-03-01
1 BOURNE JASON 1978-06-11
2 GRINCH XMAS 1988-12-13
3 DOE JOHN 1986-11-12
# Make sure to parse all datetime columns in advance
df['dob'] = pd.to_datetime(df['dob'], errors='coerce')
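With errors='coerce', anything pd.to_datetime cannot parse becomes NaT instead of raising an error, and NaT then propagates through the age arithmetic as a missing value. A small illustration with made-up values:

```python
import pandas as pd

raw = pd.Series(['1979-03-01', 'not a date', None])  # made-up sample values
dob = pd.to_datetime(raw, errors='coerce')
print(dob.isna().tolist())  # [False, True, True]

# NaT propagates: the age for those rows comes out missing instead of failing
now = pd.Timestamp('2019-04-14')
print((now - dob).isna().tolist())  # [False, True, True]
```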
If you want only the year portion of the age, use @unutbu's solution...
now = pd.to_datetime('now')
now
# Timestamp('2019-04-14 00:00:43.105892')
(now - df['dob']).astype('<m8[Y]')
0 40.0
1 40.0
2 30.0
3 32.0
Name: dob, dtype: float64
Another option is to subtract the year portion and then account for the month difference:
(now.year - df['dob'].dt.year) - ((now.month - df['dob'].dt.month) < 0)
0 40
1 40
2 30
3 32
Name: dob, dtype: int64
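The expression above compares months only, so someone whose birthday falls later in the current month is counted one year too old. For example, a made-up birth date of 1990-04-20 with now = 2019-04-14 gives 29 even though that person is still 28. Here is a sketch that also compares the day and is still fully vectorized, assuming dob has already been parsed to datetime64 as above. The function name whole_years is illustrative.

```python
import pandas as pd


def whole_years(dob: pd.Series, now: pd.Timestamp) -> pd.Series:
    """Completed years from each date in `dob` (datetime64 Series) up to `now`."""
    # True where this year's birthday is still ahead: a later month,
    # or the same month with a later day
    before_birthday = (dob.dt.month > now.month) | (
        (dob.dt.month == now.month) & (dob.dt.day > now.day))
    return now.year - dob.dt.year - before_birthday.astype(int)


now = pd.Timestamp('2019-04-14')
dob = pd.to_datetime(pd.Series(['1979-03-01', '1978-06-11', '1990-04-20']))
print(whole_years(dob, now).tolist())  # [40, 40, 28]
```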
If you want the (almost) precise age (including the fractional portion), query total_seconds and divide by the number of seconds in an average year of 365.25 days.
(now - df['dob']).dt.total_seconds() / (60*60*24*365.25)
0 40.120446
1 40.840501
2 30.332630
3 32.418872
Name: dob, dtype: float64
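Because 365.25 days is an average, truncating this fractional value to get whole years can be off by one around a birthday. In this made-up example, only 365 days pass from 2000-03-01 to 2001-03-01, which is slightly less than 365.25, so flooring gives 0 on the first birthday. For whole years, a calendar comparison of year, month and day is safer.

```python
import numpy as np
import pandas as pd

SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25  # average year, as in the answer

dob = pd.to_datetime(pd.Series(['2000-03-01']))
now = pd.Timestamp('2001-03-01')  # exactly the first birthday
fractional = (now - dob).dt.total_seconds() / SECONDS_PER_YEAR
print(fractional.round(4).tolist())  # [0.9993]
# Flooring gives 0 even though the person has just turned 1
print(np.floor(fractional).astype(int).tolist())  # [0]
```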

