Python: summing the number of occurrences per day in pandas
Original question: http://stackoverflow.com/questions/17706109/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
summing the number of occurrences per day pandas
Asked by myusuf3
I have a data set like so in a pandas dataframe:
score
timestamp
2013-06-29 00:52:28+00:00 -0.420070
2013-06-29 00:51:53+00:00 -0.445720
2013-06-28 16:40:43+00:00 0.508161
2013-06-28 15:10:30+00:00 0.921474
2013-06-28 15:10:17+00:00 0.876710
I need the number of measurements that occur each day, so I am looking for something like this:
count
timestamp
2013-06-29 2
2013-06-28 3
I do not care about the sentiment column; I only want the count of occurrences per day.
Accepted answer by unutbu
If your timestamp index is a DatetimeIndex:
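import io
import pandas as pd
# Columns are separated by two or more spaces, which is what sep matches below.
content = '''\
timestamp                  score
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00  0.508161
2013-06-28 15:10:30+00:00  0.921474
2013-06-28 15:10:17+00:00  0.876710
'''
# io.StringIO wraps a text string; io.BytesIO would require bytes on Python 3.
# A regular-expression separator needs the Python parsing engine.
df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python', parse_dates=[0], index_col=[0])
print(df)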
so df looks like this:
score
timestamp
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43 0.508161
2013-06-28 15:10:30 0.921474
2013-06-28 15:10:17 0.876710
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
You can use:
print(df.groupby(df.index.date).count())
which yields
score
2013-06-28 3
2013-06-29 2
Note the importance of the parse_dates parameter. Without it, the index would just be a pandas.core.index.Index object, in which case you could not use df.index.date.
So the answer depends on the type(df.index), which you have not shown...
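If the index turns out not to be a DatetimeIndex, one possible approach is to convert it after loading. The following is a minimal sketch, assuming the timestamps are stored as strings in the index; the variable names raw and daily_counts are chosen here for illustration and do not come from the original answer.

```python
import io
import pandas as pd

# Same sample data as in the answer; columns separated by two or more spaces.
content = """\
timestamp                  score
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00  0.508161
2013-06-28 15:10:30+00:00  0.921474
2013-06-28 15:10:17+00:00  0.876710
"""

# Deliberately omit parse_dates, so the index holds plain strings.
raw = pd.read_csv(io.StringIO(content), sep=r"\s{2,}", engine="python", index_col=0)

# Convert the string index to a (UTC) DatetimeIndex after the fact.
raw.index = pd.to_datetime(raw.index, utc=True)

# Group rows by calendar date and count the non-null scores per day.
daily_counts = raw.groupby(raw.index.date)["score"].count()
print(daily_counts)
```

Once the index has been converted, df.index.date becomes available and the groupby from the answer should apply unchanged.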
Answer by TomAugspurger
In [145]: df
Out[145]:
timestamp
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43 0.508161
2013-06-28 15:10:30 0.921474
2013-06-28 15:10:17 0.876710
Name: score, dtype: float64
In [160]: df.groupby(lambda x: x.date()).count()
Out[160]:
2013-06-28 3
2013-06-29 2
dtype: int64
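For readers who want to reproduce this without the IPython session, here is a self-contained sketch of the same idea. It rebuilds the sample Series by hand; the names score, per_day and rows_per_day are illustrative only.

```python
import pandas as pd

# Rebuild the sample data as a Series with a DatetimeIndex.
index = pd.to_datetime([
    "2013-06-29 00:52:28",
    "2013-06-29 00:51:53",
    "2013-06-28 16:40:43",
    "2013-06-28 15:10:30",
    "2013-06-28 15:10:17",
])
score = pd.Series([-0.420070, -0.445720, 0.508161, 0.921474, 0.876710],
                  index=index, name="score")

# A function passed to groupby is called on each index label (a Timestamp).
# date() has to be called; the bare attribute ts.date is only a bound method.
per_day = score.groupby(lambda ts: ts.date()).count()
print(per_day)

# size() counts rows per group, including rows whose value is NaN,
# whereas count() only counts non-null values.
rows_per_day = score.groupby(lambda ts: ts.date()).size()
print(rows_per_day)
```

With this data the two results agree because no score is missing; they would differ only if some scores were NaN.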
Answer by gowithefloww
Alternatively, use the resample function.
In [419]: df
Out[419]:
timestamp
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43 0.508161
2013-06-28 15:10:30 0.921474
2013-06-28 15:10:17 0.876710
Name: score, dtype: float64
In [420]: df.resample('D', how={'score':'count'})
Out[420]:
2013-06-28 3
2013-06-29 2
dtype: int64
UPDATE: with pandas 0.18+
As @jbochi pointed out, resample with how is now deprecated. Use instead:
df.resample('D').apply({'score':'count'})
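As a rough, self-contained illustration of the resample approach on a recent pandas version, the sketch below rebuilds the sample data as a DataFrame and counts scores per daily bin. The variable name daily is illustrative, and the sort_index() call is a precaution added here rather than part of the original answer.

```python
import pandas as pd

# Rebuild the sample data as a DataFrame with a DatetimeIndex.
index = pd.to_datetime([
    "2013-06-29 00:52:28",
    "2013-06-29 00:51:53",
    "2013-06-28 16:40:43",
    "2013-06-28 15:10:30",
    "2013-06-28 15:10:17",
])
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=index,
).sort_index()  # put the rows in chronological order before resampling

# Resample into daily bins and count the non-null scores in each bin.
daily = df.resample("D")["score"].count()
print(daily)
```

One difference from the groupby approach is worth keeping in mind: resample produces a row for every day between the first and last timestamp, so days without any measurement appear with a count of 0, while grouping by date only lists days that actually occur in the data.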