Python & Pandas - Group by day and count for each day
Original question: http://stackoverflow.com/questions/48961892/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Erwan Pesle
I am new to pandas and I don't yet see how to arrange my time series. Take a look at it:
date & time of connection
19/06/2017 12:39
19/06/2017 12:40
19/06/2017 13:11
20/06/2017 12:02
20/06/2017 12:04
21/06/2017 09:32
21/06/2017 18:23
21/06/2017 18:51
21/06/2017 19:08
21/06/2017 19:50
22/06/2017 13:22
22/06/2017 13:41
22/06/2017 18:01
23/06/2017 16:18
23/06/2017 17:00
23/06/2017 19:25
23/06/2017 20:58
23/06/2017 21:03
23/06/2017 21:05
This is a sample from a dataset of 130k rows. I tried:
df.groupby('date & time of connection')['date & time of connection'].apply(list)
That is not enough, I guess.
I think I should:
- Create a dictionary keyed by date, from dd/mm/yyyy to dd/mm/yyyy
- Convert "date & time of connection" from datetime to date
- Group by the date of "date & time of connection" and count
- Put the counts inside the dictionary?
What do you think of my logic? Do you know of any tutorials? Thank you very much.
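The four steps in the plan above can be sketched roughly as follows. This is only an illustration on a handful of rows taken from the sample; the explicit `format` string, the variable names, and the choice to fill days without connections with 0 are assumptions, not part of the original question.

```python
import pandas as pd

# A few rows from the sample; the real data has about 130k rows.
raw = [
    "19/06/2017 12:39",
    "19/06/2017 12:40",
    "19/06/2017 13:11",
    "20/06/2017 12:02",
    "22/06/2017 13:22",
]
df = pd.DataFrame({"date & time of connection": raw})

# Step 2: parse the strings (day first) and drop the time of day.
stamps = pd.to_datetime(df["date & time of connection"], format="%d/%m/%Y %H:%M")
days = stamps.dt.normalize()

# Step 1: every calendar day between the first and last connection.
all_days = pd.date_range(days.min(), days.max(), freq="D")

# Step 3: count per day; days with no connection get 0 (an assumption).
counts = days.value_counts().reindex(all_days, fill_value=0)

# Step 4: put the counts into a dictionary keyed by dd/mm/yyyy.
counts_by_day = {day.strftime("%d/%m/%Y"): int(n) for day, n in counts.items()}
print(counts_by_day)
```

In practice the dictionary is often unnecessary, since the `counts` Series already maps each day to its count.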
Answered by jezrael
You can use dt.floor to convert the datetimes to dates, and then value_counts or groupby with size:
df = (pd.to_datetime(df['date & time of connection'])
        .dt.floor('d')
        .value_counts()
        .rename_axis('date')
        .reset_index(name='count'))
print (df)
date count
0 2017-06-23 6
1 2017-06-21 5
2 2017-06-19 3
3 2017-06-22 3
4 2017-06-20 2
Or:
s = pd.to_datetime(df['date & time of connection'])
df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
print (df)
date & time of connection count
0 2017-06-19 3
1 2017-06-20 2
2 2017-06-21 5
3 2017-06-22 3
4 2017-06-23 6
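Both variants leave it to pd.to_datetime to work out that the strings are day-first. A minimal sketch that spells the format out and checks that the two approaches agree is shown below; the `format` string is an assumption based on the sample rows in the question, and the small frame is only illustrative.

```python
import pandas as pd

col = "date & time of connection"
df = pd.DataFrame({col: [
    "19/06/2017 12:39",
    "19/06/2017 12:40",
    "19/06/2017 13:11",
    "20/06/2017 12:02",
    "20/06/2017 12:04",
]})

# Explicit day-first format, so 01/02/2017 cannot be read as January 2nd.
s = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M")

# value_counts sorts by frequency; sort_index puts it in calendar order.
by_value_counts = s.dt.floor("D").value_counts().sort_index()

# groupby + size is already in calendar order.
by_groupby = s.groupby(s.dt.floor("D")).size()

print(by_value_counts.tolist())
print(by_groupby.tolist())
```

Passing an explicit format also avoids per-element format guessing, which may help on large inputs.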
Timings:
import numpy as np
import pandas as pd

np.random.seed(1542)
N = 220000
a = np.unique(np.random.randint(N, size=int(N/2)))
df = pd.DataFrame(pd.date_range('2000-01-01', freq='37T', periods=N)).drop(a)
df.columns = ['date & time of connection']
df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')
print (df.head())
In [193]: %%timeit
...: df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count()
...:
539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [194]: %%timeit
...: df1 = (pd.to_datetime(df['date & time of connection'])
...: .dt.floor('d')
...: .value_counts()
...: .rename_axis('date')
...: .reset_index(name='count'))
...:
12.4 ms ± 350 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [195]: %%timeit
...: s = pd.to_datetime(df['date & time of connection'])
...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
...:
17.7 ms ± 140 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
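One plausible reason for the gap in these timings (an interpretation, not something the answer states) is that dt.date produces a column of Python date objects with object dtype, while dt.floor keeps the values in a native datetime64 dtype that pandas can group on more efficiently. A small sketch to inspect the two dtypes:

```python
import pandas as pd

s = pd.Series(pd.to_datetime([
    "2017-06-19 12:39",
    "2017-06-19 13:11",
    "2017-06-20 12:02",
]))

floored = s.dt.floor("D")   # stays a datetime64 column
as_dates = s.dt.date        # becomes an object column of datetime.date

print(floored.dtype)
print(as_dates.dtype)
```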
Answered by Allen
First, make sure your column is in datetime format:
df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
Then you can group the data by date and do a count:
df.groupby(by=df['date & time of connection'].dt.date).count()
Out[10]:
date & time of connection
date & time of connection
2017-06-19 3
2017-06-20 2
2017-06-21 5
2017-06-22 3
2017-06-23 6
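The output above repeats the column name because count() returns a DataFrame whose index carries the same name as its only column. If a flat two-column result is preferred, one possible variation (not part of the original answer; the names "date" and "count" are arbitrary) is to use size() and rename the index:

```python
import pandas as pd

col = "date & time of connection"
df = pd.DataFrame({col: ["19/06/2017 12:39", "19/06/2017 12:40", "20/06/2017 12:02"]})
df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M")

counts = (df.groupby(df[col].dt.date)
            .size()                       # one number per day, as a Series
            .rename_axis("date")          # name the index "date"
            .reset_index(name="count"))   # turn it into columns date, count
print(counts)
```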
Answered by Jaan Olev
Hey, I found an easy way to do this with resample.
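# Set the date column as the index, keeping it as a column too (drop=False),
# so that it can still be selected below. It must already be a datetime column.
df = df.set_index('your_date_column', drop=False)
# Count the rows that fall in each calendar day.
df_counts = df.your_date_column.resample('D').count()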
That said, your column name is long and contains spaces, which makes me cringe a little. I would use dashes instead of spaces.
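A small runnable sketch of the resample approach follows. The column name `connected_at` is hypothetical, and the explicit `format` string is an assumption based on the sample in the question. Unlike the groupby variants, resample also emits a row for days that have no connections at all, with a count of 0.

```python
import pandas as pd

# "connected_at" is a made-up, shorter column name used for illustration.
df = pd.DataFrame({"connected_at": [
    "19/06/2017 12:39",
    "19/06/2017 12:40",
    "21/06/2017 09:32",
]})
df["connected_at"] = pd.to_datetime(df["connected_at"], format="%d/%m/%Y %H:%M")

# Index by timestamp but keep the column, then count rows per calendar day.
daily_counts = (df.set_index("connected_at", drop=False)["connected_at"]
                  .resample("D")
                  .count())
print(daily_counts)
```

Here 20/06/2017 appears in the result with a count of 0, which may or may not be what you want.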