Original question: http://stackoverflow.com/questions/39893420/
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
python - pandas - check if date exists in dataframe
Asked by Leo
I have a dataframe like this:
    category        date  number
0       Cat1  2010-03-01       1
1       Cat2  2010-09-01       1
2       Cat3  2010-10-01       1
3       Cat4  2010-12-01       1
4       Cat5  2012-04-01       1
5       Cat2  2013-02-01       1
6       Cat3  2013-07-01       1
7       Cat4  2013-11-01       2
8       Cat5  2014-11-01       5
9       Cat2  2015-01-01       1
10      Cat3  2015-03-01       1
I would like to check whether a date exists in this dataframe, but I have been unable to. I tried the approaches below, without success:
if pandas.Timestamp("2010-03-01 00:00:00", tz=None) in df['date'].values:
    print('date exist')
if datetime.strptime('2010-03-01', '%Y-%m-%d') in df['date'].values:
    print('date exist')
if '2010-03-01' in df['date'].values:
    print('date exist')
'date exist' never gets printed. How can I check whether the date exists? I want to insert the missing dates, with number equal to 0, for all categories so that I can plot a continuous line chart (one line per category). Any help is appreciated. Thanks in advance.
The last attempt gives me this warning:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
and 'date exist' is not printed.
Accepted answer by jezrael
I think you need to convert the column to datetime first with to_datetime and then, if you need to select all matching rows, use boolean indexing:
df.date = pd.to_datetime(df.date)
print(df.date == pd.Timestamp("2010-03-01 00:00:00"))
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
Name: date, dtype: bool
print(df[df.date == pd.Timestamp("2010-03-01 00:00:00")])
  category       date  number
0     Cat1 2010-03-01       1
To get a single True, check the value against the column converted to a numpy array by values:
if '2010-03-01' in df['date'].values:
    print('date exist')
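The membership check above relies on NumPy comparing a string with date values. A variant that avoids that reliance is to compare like with like by passing an explicit np.datetime64. The sketch below is not from the original answer; the three-row frame is sample data assumed for illustration.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question's frame (dates start out as strings).
df = pd.DataFrame({
    "category": ["Cat1", "Cat2", "Cat3"],
    "date": ["2010-03-01", "2010-09-01", "2010-10-01"],
    "number": [1, 1, 1],
})
df["date"] = pd.to_datetime(df["date"])

values = df["date"].values            # numpy datetime64 array
target = np.datetime64("2010-03-01")  # same kind of value the array holds

# `in` on a numpy array tests whether any element equals the target.
found = bool(target in values)
missing = bool(np.datetime64("2011-01-01") in values)
print(found, missing)
```

Wrapping the result in bool() turns NumPy's boolean scalar into a plain Python bool, which is convenient if the result is stored or compared later.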
Or check for at least one True with any, as EdChum commented:
if (df.date == pd.Timestamp("2010-03-01 00:00:00")).any():
    print('date exist')
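Putting the conversion and the any check together, a minimal self-contained sketch might look like this. The helper name date_exists and the four-row sample frame are assumptions made for illustration, not part of the original answer; the isin call at the end is an additional option for testing several dates at once.

```python
import pandas as pd

def date_exists(dates: pd.Series, when) -> bool:
    """Return True if `when` occurs in `dates` (hypothetical helper)."""
    dates = pd.to_datetime(dates)  # converts strings to datetime values
    when = pd.Timestamp(when)      # accepts a string, datetime, or Timestamp
    return bool((dates == when).any())

# Sample data modeled on the question's frame.
df = pd.DataFrame({
    "category": ["Cat1", "Cat2", "Cat3", "Cat4"],
    "date": ["2010-03-01", "2010-09-01", "2010-10-01", "2010-12-01"],
    "number": [1, 1, 1, 1],
})

print(date_exists(df["date"], "2010-03-01"))
print(date_exists(df["date"], "2010-04-01"))

# isin checks several candidate dates in one pass.
candidates = pd.to_datetime(["2010-03-01", "2010-04-01"])
mask = pd.to_datetime(df["date"]).isin(candidates)
print(mask.tolist())
```

Converting both sides to datetime values before comparing is the key step: the attempts in the question compared values of different types, which is why none of them matched.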
Answer by JoseGzz
For example, to confirm that the 4th value of ds is contained in ds itself:
len(set(ds.isin([ds.iloc[3]]))) > 1
Let ds be a pandas Series of the form [index, pandas._libs.tslib.Timestamp] with example values:
0 2018-01-31 19:08:27.465515
1 2018-02-01 19:08:27.465515
2 2018-02-02 19:08:27.465515
3 2018-02-03 19:08:27.465515
4 2018-02-04 19:08:27.465515
Then, we use the Series method isin to get a Series of booleans where each entry indicates whether that position in ds matches the value passed as the argument (since isin expects a list of values, we need to provide the value in a list).
Next, we use the built-in set function to get a set with 1 or 2 values, depending on whether there was a match (True and False values) or not (only a False value).
Finally, we check whether the set contains more than 1 value. If it does, we have a match; otherwise, there is no match.
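The steps above can be sketched end to end as follows. The series is rebuilt with pd.date_range from the example values, which is an assumption about how ds was created. The sketch also compares the set-based check with calling any on the boolean mask, and shows one edge case worth knowing: if every entry matches, the set contains only True, so its length is 1 and the set-based check reports no match.

```python
import pandas as pd

# Rebuild the example series: five consecutive days at the same time of day.
ds = pd.Series(pd.date_range("2018-01-31 19:08:27.465515", periods=5, freq="D"))
target = ds.iloc[3]  # the 4th value

# The answer's approach: a match yields both True and False in the mask.
set_check = len(set(ds.isin([target]))) > 1

# Asking the mask directly whether any entry is True.
any_check = bool(ds.isin([target]).any())

# Edge case: every entry matches, so the set is {True} and has length 1.
same = pd.Series([target, target])
set_check_same = len(set(same.isin([target]))) > 1
any_check_same = bool(same.isin([target]).any())

print(set_check, any_check, set_check_same, any_check_same)
```

For a series with at least one non-matching entry, both checks agree; any is shorter and also handles the all-matching case.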