Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/47602097/

Date: 2020-08-19 18:16:55  Source: igfitidea

Pandas groupby to to_csv

Tags: python, pandas, csv, pandas-groupby

Asked by kalmdown

Want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.

Python 3.6.1, Pandas 0.20.1

groupby result looks like:

         id  month   year  count
week
0      9066     82  32142    895
1      7679     84  30112    749
2      8368    126  42187    872
3     11038    102  34165    976
4      8815    117  34122    767
5     10979    163  50225   1252
6      8726    142  38159    996
7      5568     63  26143    582

Want a CSV that looks like:

week  count
0   895
1   749
2   872
3   976
4   767
5   1252
6   996
7   582

Current code:

week_grouped = df.groupby('week')
week_grouped.sum() #At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') #Can't do this - .to_csv is not a df function. 

Read SO solutions:

output groupby to csv file pandas

week_grouped.drop_duplicates().to_csv('week_grouped.csv')

Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method

Python pandas - writing groupby output to file

week_grouped.reset_index().to_csv('week_grouped.csv')

Result: AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Answered by Alex Luis Arias

Try doing this:

week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')

That will write the entire dataframe to the file. If you only want those two columns, then:

week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
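To make the approach above concrete, here is a minimal sketch of the whole flow. The sample data is made up for illustration (only the column names come from the question), and the CSV is written to an in-memory buffer instead of a file so the result is easy to inspect. Passing `index=False` is an addition on my part, not something the answer uses: without it, `to_csv` also writes the row index as an extra unnamed first column.

```python
import io

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "week":  [0, 0, 1, 1, 2],
    "id":    [11, 12, 13, 14, 15],
    "month": [1, 1, 1, 2, 2],
    "year":  [2017, 2017, 2017, 2017, 2017],
    "count": [400, 495, 700, 49, 872],
})

# sum() turns the groupby object into a DataFrame indexed by week;
# reset_index() moves week back into an ordinary column.
summed = df.groupby("week").sum().reset_index()

# Keep only the two wanted columns and skip the row index.
buffer = io.StringIO()
summed[["week", "count"]].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, a path such as 'week_grouped.csv' can be passed to `to_csv` in place of the buffer.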

Here's a line by line explanation of the original code:

# This creates a "groupby" object (not a dataframe object) 
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')

# This instructs pandas to sum up all the numeric type columns in each 
# group. This returns a dataframe where each row is the sum of the 
# group's numeric columns. You're not storing this dataframe in your 
# example.
week_grouped.sum() 

# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method. 
# So we should store the previous line's result (a dataframe) into a variable 
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')

# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')

# Or with less typing simply
week_grouped.sum().to_csv('...')

Answered by Peter Leimbigler

Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.

If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see how the statement returns the output to the cell's output, instead of assigning the result back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.
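The contrast between a method that modifies its object in place and one that only returns a new object can be sketched as follows. The data is invented for illustration; the variable names other than week_grouped are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"week": [1, 0, 1], "count": [5, 2, 3]})

# With inplace=True, sort_values modifies df itself and returns None.
returned = df.sort_values(by="week", inplace=True)

week_grouped = df.groupby("week")

# sum() has no inplace option: this line computes a DataFrame and
# throws it away, leaving week_grouped a groupby object.
week_grouped.sum()

# Assigning the result back is what makes to_csv available afterwards.
week_grouped = week_grouped.sum()
print(type(week_grouped).__name__)
```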

EDIT: Does each week number appear only once in your CSV? If so, here's a simpler solution that doesn't use groupby:

import pandas as pd

df = pd.read_csv('input.csv')
df[['id', 'count']].to_csv('output.csv')

Answered by Lucas Dresl

I feel that there is no need to use a groupby; you can just drop the columns you do not want.

df = df.drop(['month','year'], axis=1)
df = df.reset_index()  # reset_index returns a new dataframe, so assign it back
df.to_csv('Your path')

Answered by Revaz

Iterating over a groupby object yields (key, value) pairs, where the key is the identifier of the group and the value is the group itself, i.e. the subset of the original df that matched the key.

In your example, week_grouped = df.groupby('week') is a set of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows:

for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))
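As a self-contained variant of the loop above, the sketch below writes one CSV per group into a temporary directory and records the file names. The sample data, the out_dir directory, and the week_N.csv naming scheme are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"week": [0, 0, 1], "count": [400, 495, 749]})

# Temporary directory so the example does not clutter the working directory.
out_dir = Path(tempfile.mkdtemp())

written = []
for key, group in df.groupby("week"):
    # Each group is an ordinary DataFrame, so to_csv is available on it.
    path = out_dir / "week_{}.csv".format(key)
    group.to_csv(path, index=False)
    written.append(path.name)

print(written)
```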

Alternatively, you can compute an aggregation function on your grouped object:

result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv') 

In your example you need to assign the function result to a variable, because sum() returns a new DataFrame instead of modifying week_grouped in place.

some_variable = week_grouped.sum() 
some_variable.to_csv('week_grouped.csv') # This will work

Basically, result.csv and week_grouped.csv are meant to be the same.

Answered by Waldeyr Mendes da Silva

A pandas groupby can produce a lot of summary information (count, mean, std, ...) through describe(). If you want to save all of it in a CSV file, first you need to convert it to a regular DataFrame:

import pandas as pd
...
...
MyGroupDataFrame = MyDataFrame.groupby('id')
pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
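One thing worth knowing when saving describe() output is that its columns form a two-level MultiIndex (column name, statistic), which produces two header rows in the file. The sketch below flattens them into single names before writing; the flattening step, the sample data, and the value column are my own assumptions and not part of the answer above. The output goes to an in-memory buffer so it can be inspected.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"id": [1, 1, 2, 2], "value": [10.0, 20.0, 30.0, 50.0]})

# describe() on a groupby returns a DataFrame with one row per group
# and MultiIndex columns such as ('value', 'mean').
stats = df.groupby("id").describe()

# Flatten ('value', 'mean') into 'value_mean' for a single header row.
stats.columns = ["_".join(col) for col in stats.columns]

# Write tab-separated text; the group key 'id' is kept as the first column.
buffer = io.StringIO()
stats.to_csv(buffer, sep="\t")
tsv_text = buffer.getvalue()
print(tsv_text)
```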