Pandas groupby to to_csv
Original source: http://stackoverflow.com/questions/47602097/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by kalmdown
Want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.
Python 3.6.1, Pandas 0.20.1
groupby result looks like:
         id  month   year  count
week
0      9066     82  32142    895
1      7679     84  30112    749
2      8368    126  42187    872
3     11038    102  34165    976
4      8815    117  34122    767
5     10979    163  50225   1252
6      8726    142  38159    996
7      5568     63  26143    582
Want a csv that looks like
week count
0 895
1 749
2 872
3 976
4 767
5 1252
6 996
7 582
Current code:
week_grouped = df.groupby('week')
week_grouped.sum() #At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') #Can't do this - .to_csv is not a df function.
Read SO solutions:
output groupby to csv file pandas
week_grouped.drop_duplicates().to_csv('week_grouped.csv')
Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
Python pandas - writing groupby output to file
week_grouped.reset_index().to_csv('week_grouped.csv')
Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"
Answered by Alex Luis Arias
Try doing this:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')
That'll write the entire dataframe to the file. If you only want those two columns, then:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
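One detail worth checking: by default to_csv also writes the row labels, so after reset_index() the file gets an extra unnamed 0..n-1 column. A minimal sketch, assuming you want exactly the two columns from the question; the small DataFrame is made-up sample data, and to_csv is called without a path so it returns the CSV text instead of writing a file:

```python
import pandas as pd

# Hypothetical raw data: several rows per week that need summing.
df = pd.DataFrame({
    "week": [0, 0, 1, 1, 2],
    "id": [10, 11, 12, 13, 14],
    "count": [3, 4, 5, 6, 7],
})

summed = df.groupby("week").sum()                   # DataFrame indexed by week
two_cols = summed.reset_index()[["week", "count"]]  # week becomes a normal column

# index=False keeps the 0..n-1 row labels out of the output.
csv_text = two_cols.to_csv(index=False)
print(csv_text)
```

Passing a filename such as 'week_grouped.csv' as the first argument writes the same text to disk.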
Here's a line by line explanation of the original code:
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')
# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()
# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')
# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')
# Or with less typing simply
week_grouped.sum().to_csv('...')
Answered by Peter Leimbigler
Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.
If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see that the statement returns the result to the cell's output instead of assigning it back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.
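A small sketch of that behaviour, using made-up data: calling sum() without assigning the result leaves week_grouped as a groupby object, while assigning it gives a DataFrame that has to_csv. The variable names still_groupby and week_summed are only for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"week": [0, 0, 1], "count": [1, 2, 3]})

week_grouped = df.groupby("week")
week_grouped.sum()                           # result is computed, then discarded
still_groupby = type(week_grouped).__name__  # week_grouped itself is unchanged

week_summed = week_grouped.sum()             # keep the result this time
print(still_groupby)
print(type(week_summed).__name__, hasattr(week_summed, "to_csv"))
```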
EDIT: does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby:
df = pd.read_csv('input.csv')
df[['id', 'count']].to_csv('output.csv')
Answered by Lucas Dresl
I don't think you need a groupby here; you can just drop the columns you do not want.
df = df.drop(['month','year'], axis=1)
df.reset_index()  # note: returns a new DataFrame; has no effect unless the result is assigned
df.to_csv('Your path')
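As a rough illustration of the drop approach, assuming the frame is already the summed result with week as its index (the first two rows of the question's output are used here), and assuming id should be dropped as well to match the desired file:

```python
import pandas as pd

# First two rows of the summed result shown in the question.
summed = pd.DataFrame(
    {"id": [9066, 7679], "month": [82, 84], "year": [32142, 30112], "count": [895, 749]},
    index=pd.Index([0, 1], name="week"),
)

# Drop everything except count; week is the index, so it is written automatically.
trimmed = summed.drop(["id", "month", "year"], axis=1)
csv_text = trimmed.to_csv()  # no path given, so the CSV text is returned
print(csv_text)
```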
Answered by Revaz
Iterating over a groupby object yields (key, group) pairs, where the key is the identifier of the group and the group is the subset of the original df that matched the key.
In your example, week_grouped = df.groupby('week') is a set of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows:
for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr))  # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))
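The loop above can be tried without touching the disk. In this sketch the data is made up, and an in-memory io.StringIO buffer stands in for each '{}.csv' file so the per-group output can be inspected directly:

```python
import io
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"week": [0, 0, 1], "count": [1, 2, 3]})

buffers = {}
for key, group in df.groupby("week"):
    buf = io.StringIO()             # stand-in for a file named '<key>.csv'
    group.to_csv(buf, index=False)  # each group is an ordinary DataFrame
    buffers[key] = buf.getvalue()

for key, text in buffers.items():
    print(key)
    print(text)
```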
Or alternatively you can compute an aggregation function on your grouped object:
result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv')
In your example you need to assign the function result to a variable, because sum() returns a new DataFrame rather than modifying the groupby object in place.
some_variable = week_grouped.sum()
some_variable.to_csv('week_grouped.csv') # This will work
Basically, result.csv and week_grouped.csv will have the same contents.
Answered by Waldeyr Mendes da Silva
A pandas groupby can produce a lot of summary information (count, mean, std, ...) through describe(). If you want to save all of it to a file, get the result as a regular DataFrame first and then write it out:
import pandas as pd
...
...
MyGroupDataFrame = MyDataFrame.groupby('id')
pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
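A minimal sketch of the same idea with made-up data. describe() on a groupby returns a DataFrame whose columns are (column, statistic) pairs; flattening them into names like value_mean is an optional choice made here so the file has a single header row:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 1, 2, 2], "value": [10.0, 20.0, 30.0, 50.0]})

stats = df.groupby("id").describe()  # one row per id, columns are (column, statistic)

# Optional: flatten ('value', 'mean') into 'value_mean'.
stats.columns = ["_".join(col) for col in stats.columns]

tsv_text = stats.to_csv(sep="\t")    # no path given, so the TSV text is returned
print(tsv_text)
```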