Original question: http://stackoverflow.com/questions/16412099/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date
Asked by Nyxynyx
I have a DataFrame with a column named date. How can we convert/parse the 'date' column to a DateTime object?
I loaded the date column from a PostgreSQL database using sql.read_frame(). An example of the date column is 2013-04-04.
What I am trying to do is select all rows in a DataFrame whose date column falls within a certain period, like after 2013-04-01 and before 2013-04-04.
My attempt below gives the error 'Series' object has no attribute 'read'
Attempt
import dateutil.parser
df['date'] = dateutil.parser.parse(df['date'])
Error
AttributeError Traceback (most recent call last)
<ipython-input-636-9b19aa5f989c> in <module>()
15
16 # Parse 'Date' Column to Datetime
---> 17 df['date'] = dateutil.parser.parse(df['date'])
18
19 # SELECT RECENT SALES
C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(timestr, parserinfo, **kwargs)
695 return parser(parserinfo).parse(timestr, **kwargs)
696 else:
--> 697 return DEFAULTPARSER.parse(timestr, **kwargs)
698
699
C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
299 default = datetime.datetime.now().replace(hour=0, minute=0,
300 second=0, microsecond=0)
--> 301 res = self._parse(timestr, **kwargs)
302 if res is None:
303 raise ValueError, "unknown string format"
C:\Python27\lib\site-packages\dateutil\parser.pyc in _parse(self, timestr, dayfirst, yearfirst, fuzzy)
347 yearfirst = info.yearfirst
348 res = self._result()
--> 349 l = _timelex.split(timestr)
350 try:
351
C:\Python27\lib\site-packages\dateutil\parser.pyc in split(cls, s)
141
142 def split(cls, s):
--> 143 return list(cls(s))
144 split = classmethod(split)
145
C:\Python27\lib\site-packages\dateutil\parser.pyc in next(self)
135
136 def next(self):
--> 137 token = self.get_token()
138 if token is None:
139 raise StopIteration
C:\Python27\lib\site-packages\dateutil\parser.pyc in get_token(self)
66 nextchar = self.charstack.pop(0)
67 else:
---> 68 nextchar = self.instream.read(1)
69 while nextchar == '\x00':
70 nextchar = self.instream.read(1)
AttributeError: 'Series' object has no attribute 'read'
df['date'].apply(dateutil.parser.parse) gives me the error AttributeError: 'datetime.date' object has no attribute 'read'
df['date'].truncate(after='2013/04/01') gives the error TypeError: can't compare datetime.datetime to long
df['date'].dtype returns dtype('O'). Is it already a datetime object?
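A dtype of 'O' (object) only says that the column holds generic Python objects; it does not say which ones. The sketch below shows one way to inspect the element types. The DataFrame is a hypothetical stand-in for the table loaded from the database, built on the assumption (consistent with the error messages above) that the driver returned datetime.date values.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the table loaded from the database.
df = pd.DataFrame({
    "date": [datetime.date(2013, 4, 1),
             datetime.date(2013, 4, 2),
             datetime.date(2013, 4, 4)],
    "sales": [10, 20, 30],
})

# object dtype: pandas is storing plain Python objects, not datetime64 values.
print(df["date"].dtype)

# Look at the Python type(s) actually held in the column.
element_types = df["date"].map(type).unique()
print(element_types)
```

If the only type reported is datetime.date, the values are already date objects rather than strings, which would explain why a string parser such as dateutil.parser.parse rejects them.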
Accepted answer by Ryan Saxe
pandas already reads that as a datetime object! So what you want is to select rows between two dates, and you can do that by masking:
df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]
Because you said that you were getting an error from the string for some reason, try this:
df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]
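A self-contained sketch of the masking approach with datetime.date bounds, assuming the column has object dtype and holds datetime.date values. The sample data and the sales column are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical sample data; the column keeps object dtype with date elements.
df = pd.DataFrame({
    "date": [datetime.date(2013, 3, 31), datetime.date(2013, 4, 2),
             datetime.date(2013, 4, 3), datetime.date(2013, 4, 5)],
    "sales": [5, 10, 20, 30],
})

start = datetime.date(2013, 4, 1)
end = datetime.date(2013, 4, 4)

# Each comparison yields a Boolean Series; & combines them element-wise.
mask = (df["date"] > start) & (df["date"] < end)
df_masked = df[mask]
print(df_masked)
```

The parentheses around each comparison are needed because & binds more tightly than > and < in Python.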
Answer by herrfz
You probably need apply, so something like:
df['date'] = df['date'].apply(dateutil.parser.parse)
Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.
Answer by ryzhiy
You should iterate over the items and parse them independently, then construct a new list.
df['date'] = [dateutil.parser.parse(x) for x in df['date']]
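Element-wise parsing of this kind fits a column of date strings. The sketch below assumes string input, which differs from the question, where the elements were reported to be datetime.date objects that the parser rejects. The sample values are hypothetical.

```python
import dateutil.parser
import pandas as pd

# Hypothetical column of ISO-formatted date strings.
df = pd.DataFrame({"date": ["2013-04-01", "2013-04-02", "2013-04-04"]})

# Parse each string on its own, then assign the resulting list back.
df["date"] = [dateutil.parser.parse(x) for x in df["date"]]
print(df["date"])
```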
Answer by Keith
Pandas is aware of the datetime object type, but when you use some of the import functions the column is read as a string. So you need to make sure the column is set to the datetime type, not string. Then you can make your query.
df['date'] = pd.to_datetime(df['date'])
df_masked = df[(df['date'] > datetime.date(2012,4,1)) & (df['date'] < datetime.date(2012,4,4))]
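Another answer on this page reports that comparing a converted column against datetime.date raises TypeError from Pandas v0.23.0 onward. A hedged variant of the same idea that compares against pd.Timestamp instead could look like this (the sample data is hypothetical):

```python
import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample data with datetime.date elements (object dtype).
df = pd.DataFrame({
    "date": [datetime.date(2012, 3, 31), datetime.date(2012, 4, 2),
             datetime.date(2012, 4, 3), datetime.date(2012, 4, 5)],
})

# Convert the column to a pandas datetime dtype.
df["date"] = pd.to_datetime(df["date"])
converted = is_datetime64_any_dtype(df["date"])

# Compare against pandas-native Timestamp objects rather than datetime.date.
start = pd.Timestamp("2012-04-01")
end = pd.Timestamp("2012-04-04")
df_masked = df[(df["date"] > start) & (df["date"] < end)]
print(df_masked)
```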
Answer by jpp
Don't confuse datetime.date with Pandas pd.Timestamp
A "Pandas datetime series" contains pd.Timestamp elements, not datetime.date elements. The recommended solution for Pandas:
s = pd.to_datetime(s) # convert series to a Pandas datetime series
mask = s > '2018-03-10' # calculate Boolean mask against Pandas-compatible object
The top answers have issues:
- @RyanSaxe's accepted answer's first attempt doesn't work; the second attempt is inefficient.
- As of Pandas v0.23.0, @Keith's highly upvoted answer doesn't work; it gives TypeError.
Any good Pandas solution must ensure:
- The series is a Pandas datetime series, not object dtype.
- The datetime series is compared to a compatible object, e.g. pd.Timestamp, or a string in the correct format.
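The two requirements above can be folded into a small helper. This is only a sketch: rows_between is a hypothetical name, not a pandas function, and it assumes the column's values can be parsed by pd.to_datetime.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def rows_between(df, column, start, end):
    """Return rows whose `column` lies strictly between start and end.

    Hypothetical helper for illustration only.
    """
    dates = df[column]
    # Requirement 1: work on a pandas datetime series, not object dtype.
    if not is_datetime64_any_dtype(dates):
        dates = pd.to_datetime(dates)
    # Requirement 2: compare against compatible objects (pd.Timestamp).
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return df[(dates > start) & (dates < end)]


# Hypothetical usage with string dates.
df = pd.DataFrame({
    "date": ["2013-03-31", "2013-04-02", "2013-04-03", "2013-04-05"],
    "sales": [5, 10, 20, 30],
})
result = rows_between(df, "date", "2013-04-01", "2013-04-04")
print(result)
```

The helper converts on every call; when the same column is filtered repeatedly, converting it once and storing the result avoids paying that cost again.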
Here's a demo with benchmarking, demonstrating that the one-off cost of conversion can be immediately offset by a single operation:
import pandas as pd
from datetime import date
L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L*10**5)
a = s > date(2018, 3, 10) # accepted solution #2, inefficient
b = pd.to_datetime(s) > '2018-03-10' # more efficient, including datetime conversion
assert a.equals(b) # check solutions give same result
%timeit s > date(2018, 3, 10) # 40.5 ms
%timeit pd.to_datetime(s) > '2018-03-10' # 33.7 ms
s = pd.to_datetime(s)
%timeit s > '2018-03-10' # 2.85 ms
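%timeit is an IPython magic, so the timing lines above only run inside IPython or Jupyter. A rough equivalent for a plain Python script, using the standard library's timeit module, might look like the sketch below. The series is smaller than in the original demo to keep the run short, and absolute timings will vary by machine and pandas version.

```python
import timeit
from datetime import date

import pandas as pd

dates = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s_obj = pd.Series(dates * 10**4)   # object dtype with datetime.date elements
s_dt = pd.to_datetime(s_obj)       # one-off conversion to a datetime series

# Both comparisons should select the same rows.
same = (s_obj > date(2018, 3, 10)).equals(s_dt > "2018-03-10")

# Time each comparison over a fixed number of repetitions.
t_obj = timeit.timeit(lambda: s_obj > date(2018, 3, 10), number=10)
t_dt = timeit.timeit(lambda: s_dt > "2018-03-10", number=10)

print("results match:", same)
print("object dtype vs date:     %.4f s" % t_obj)
print("datetime dtype vs string: %.4f s" % t_dt)
```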

