Python: counting occurrences per day in pandas

Question

Asked by myusuf3

I have a dataset like this in a pandas DataFrame:

                                  score
timestamp                                 
2013-06-29 00:52:28+00:00        -0.420070
2013-06-29 00:51:53+00:00        -0.445720
2013-06-28 16:40:43+00:00         0.508161
2013-06-28 15:10:30+00:00         0.921474
2013-06-28 15:10:17+00:00         0.876710

I need to count the number of measurements that occur each day, so I am looking for something like this:

            count
timestamp
2013-06-29      2
2013-06-28      3

I do not care about the sentiment (score) column; I only want the number of occurrences per day.

Answer 1

Accepted answer by unutbu

If your timestamp index is a DatetimeIndex:

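import io
import pandas as pd

content = '''\
timestamp  score
2013-06-29 00:52:28+00:00        -0.420070
2013-06-29 00:51:53+00:00        -0.445720
2013-06-28 16:40:43+00:00         0.508161
2013-06-28 15:10:30+00:00         0.921474
2013-06-28 15:10:17+00:00         0.876710
'''

# Columns are separated by two or more spaces; a regex separator needs engine='python'.
# parse_dates=[0] turns the timestamp column (used as the index) into a DatetimeIndex.
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python',
                 parse_dates=[0], index_col=[0])

print(df)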

So df looks like this:

                        score
timestamp                    
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43  0.508161
2013-06-28 15:10:30  0.921474
2013-06-28 15:10:17  0.876710

print(type(df.index))
# <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

You can use:

print(df.groupby(df.index.date).count())

which yields

            score
2013-06-28      3
2013-06-29      2
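Note that `count()` counts the non-null values in each column, so a row with a missing `score` would not be tallied. If you want the number of rows per day regardless of NaNs, `GroupBy.size()` may be closer to what the question asks. A minimal sketch, building the sample frame directly with `pd.to_datetime` instead of parsing text; the extra 12:00 row with a NaN score is a hypothetical addition for illustration:

```python
import numpy as np
import pandas as pd

# Sample timestamps from the question, plus one hypothetical measurement
# (2013-06-28 12:00) whose score is missing.
idx = pd.to_datetime([
    "2013-06-29 00:52:28+00:00",
    "2013-06-29 00:51:53+00:00",
    "2013-06-28 16:40:43+00:00",
    "2013-06-28 15:10:30+00:00",
    "2013-06-28 15:10:17+00:00",
    "2013-06-28 12:00:00+00:00",  # hypothetical extra row
])
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710, np.nan]},
    index=idx,
)
df.index.name = "timestamp"

# Group rows by calendar date (datetime.date keys, sorted ascending).
by_day = df.groupby(df.index.date)

# count(): non-null values per column, so the NaN score is not counted.
print(by_day.count())
# size(): number of rows per group, whether or not the score is NaN.
print(by_day.size())
```

Here 2013-06-28 gets 3 from `count()` but 4 from `size()`; without missing values the two agree.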

Note the importance of the parse_dates parameter. Without it, the index would just be a plain pandas Index of strings, in which case you could not use df.index.date.

So the answer depends on type(df.index), which you have not shown.
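If `type(df.index)` turns out to be a plain `Index` of strings (for example because `parse_dates` was left out), one option is to convert it with `pd.to_datetime` before grouping. A sketch under that assumption, with a hypothetical string-indexed frame built inline from the question's data:

```python
import pandas as pd

# Hypothetical frame whose index holds plain strings, as it would without parse_dates.
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=[
        "2013-06-29 00:52:28+00:00",
        "2013-06-29 00:51:53+00:00",
        "2013-06-28 16:40:43+00:00",
        "2013-06-28 15:10:30+00:00",
        "2013-06-28 15:10:17+00:00",
    ],
)
print(type(df.index))             # a plain Index, not a DatetimeIndex
print(hasattr(df.index, "date"))  # False: a plain Index has no .date attribute

# Convert the string labels to a DatetimeIndex, then group by calendar day.
df.index = pd.to_datetime(df.index)
counts = df.groupby(df.index.date).count()
print(counts)
```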

Answer 2

Answer by TomAugspurger

In [145]: df
Out[145]: 
timestamp
2013-06-29 00:52:28   -0.420070
2013-06-29 00:51:53   -0.445720
2013-06-28 16:40:43    0.508161
2013-06-28 15:10:30    0.921474
2013-06-28 15:10:17    0.876710
Name: score, dtype: float64

In [160]: df.groupby(lambda x: x.date()).count()
Out[160]: 
2013-06-28    3
2013-06-29    2
dtype: int64
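A variant of the same idea uses `pd.Grouper(freq='D')` (a swapped-in pandas API, not part of the answer above) to bin the DatetimeIndex by calendar day instead of passing a key function. Note that `Timestamp.date` is a method, so a key function has to call it as `x.date()`. A sketch on a Series shaped like the one shown, with the timestamps from the question:

```python
import pandas as pd

# Series shaped like the one in the answer above (data from the question).
s = pd.Series(
    [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710],
    index=pd.to_datetime([
        "2013-06-29 00:52:28",
        "2013-06-29 00:51:53",
        "2013-06-28 16:40:43",
        "2013-06-28 15:10:30",
        "2013-06-28 15:10:17",
    ]),
    name="score",
)
s.index.name = "timestamp"

# Key-function version: call .date() on each Timestamp label in the index.
by_lambda = s.groupby(lambda ts: ts.date()).count()

# pd.Grouper bins the DatetimeIndex by calendar day; keys are midnight Timestamps.
by_grouper = s.groupby(pd.Grouper(freq="D")).count()

print(by_lambda)
print(by_grouper)
```

The two results hold the same counts; they differ only in the key type (`datetime.date` versus a midnight `Timestamp`).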

Answer 3

Answer by gowithefloww

Alternatively, use the resample function:

In [419]: df
Out[419]: 
timestamp
2013-06-29 00:52:28   -0.420070
2013-06-29 00:51:53   -0.445720
2013-06-28 16:40:43    0.508161
2013-06-28 15:10:30    0.921474
2013-06-28 15:10:17    0.876710
Name: score, dtype: float64

In [420]: df.resample('D', how={'score':'count'})

Out[420]: 
2013-06-28    3
2013-06-29    2
dtype: int64

UPDATE: with pandas 0.18+

As @jbochi pointed out, resample with how is now deprecated. Use instead:

df.resample('D').count()
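One practical difference between the approaches: `resample('D')` produces a bin for every calendar day between the first and last timestamp, so days with no measurements show up with a count of 0, whereas grouping on `df.index.date` only lists days that actually occur. A sketch with hypothetical timestamps that skip 2013-06-29:

```python
import pandas as pd

# Hypothetical measurements: none fall on 2013-06-29.
s = pd.Series(
    [0.5, -0.2, 0.9],
    index=pd.to_datetime([
        "2013-06-28 15:10:17",
        "2013-06-28 16:40:43",
        "2013-06-30 09:00:00",
    ]),
    name="score",
)

# resample emits one row per day in the range, including the empty 2013-06-29.
daily = s.resample("D").count()
# groupby on the calendar date only lists days that have data.
grouped = s.groupby(s.index.date).count()

print(daily)
print(grouped)
```

Which one is preferable depends on whether zero-count days should appear in the result.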