Save the output of a pandas groupby operation to CSV
Original question: http://stackoverflow.com/questions/46087311/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Tom_Hanks
I have a question about pandas groupby. I am using an IPython notebook (Python 3).
For example, take a dataframe like this:
import pandas as pd
df1 = pd.DataFrame({"Score": ["A", "B", "C", "A", "B", "A"],
                    "Class": ["Physics", "Science", "Chemistry", "Biology", "History", "English"]})
Then I want to group by Score:
df1.groupby("Score")
I need an output file of this, so I tried:
df1.groupby("Score").to_csv("Score.txt",sep="\t")
but this does not work. Does anyone know how to write the result to a file?
Answer by piRSquared
What you're asking makes no sense, though you may not realize it. groupby creates a staging area in which to perform aggregations or transformations across groups of data. For example, counting the number of observations in each group is an aggregation.
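To make that concrete, here is a minimal sketch (the names `grouped`, `has_to_csv`, `counts`, and `csv_text` are illustrative, not from the original answer). It suggests that the object returned by groupby is not itself a table, and that an aggregation such as size produces something that can be written out.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Score": ["A", "B", "C", "A", "B", "A"],
    "Class": ["Physics", "Science", "Chemistry", "Biology", "History", "English"],
})

grouped = df1.groupby("Score")

# The groupby object is a staging area, not a DataFrame,
# so it does not expose a to_csv method.
has_to_csv = hasattr(grouped, "to_csv")

# Aggregating (here: counting the rows in each group) yields a Series,
# which can be serialized. With no path given, to_csv returns a string.
counts = grouped.size()
csv_text = counts.to_csv(sep="\t")
```

Passing a file name instead of relying on the returned string would write the same text to disk.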
Because you expected to be able to output the result as a table, I'm going to guess that you thought groupby actually grouped the rows together. That isn't a bad interpretation of the term if you had never seen it used before, even though it is incorrect. The way to do that is to sort with the sort_values method.
df1.sort_values('Score')
       Class Score
0    Physics     A
3    Biology     A
5    English     A
1    Science     B
4    History     B
2  Chemistry     C
If Score were something that isn't already ordered lexicographically, we could use the categorical type to handle the ordering for us.
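# Recent pandas versions no longer accept categories/ordered in astype; pass a CategoricalDtype instead.
score = df1.Score.astype(pd.CategoricalDtype(categories=list('ABCDF'), ordered=True))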
df1.assign(Score=score).sort_values('Score')
       Class Score
0    Physics     A
3    Biology     A
5    English     A
1    Science     B
4    History     B
2  Chemistry     C
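The letter grades happen to sort correctly as plain strings, so the effect of the categorical type is easier to see with labels that do not. The following is a sketch on hypothetical data (the `Level` column and its values are assumptions made for illustration), using `pd.CategoricalDtype` to declare the intended order.

```python
import pandas as pd

# Hypothetical data: sorted as plain strings, these labels would come out
# as High, Low, Medium, which is not the intended order.
df = pd.DataFrame({
    "Level": ["Medium", "High", "Low", "Medium"],
    "Class": ["Physics", "Science", "Chemistry", "Biology"],
})

# Declare the order explicitly, then sort on the categorical column.
level_type = pd.CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True)
ordered = df.assign(Level=df["Level"].astype(level_type)).sort_values("Level")

levels_in_order = ordered["Level"].tolist()
```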
Finally, write the data to the file as you intended:
df1.sort_values('Score').to_csv("Score.txt", sep="\t")
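By default to_csv also writes the row index as the first column. If that is not wanted, `index=False` drops it. The sketch below writes to an in-memory buffer rather than to Score.txt so that it runs without touching the file system; the `buffer` and `round_trip` names are illustrative.

```python
import io

import pandas as pd

df1 = pd.DataFrame({
    "Score": ["A", "B", "C", "A", "B", "A"],
    "Class": ["Physics", "Science", "Chemistry", "Biology", "History", "English"],
})

# Write the sorted rows as tab-separated text, without the index column.
buffer = io.StringIO()
df1.sort_values("Score").to_csv(buffer, sep="\t", index=False)

# Read the text back to check that the rows arrive sorted by Score.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), sep="\t")
```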
Answer by YOBEN_S
Here is a solution that I think is close to what you want:
df1=df1.reset_index()
df1=df1.groupby(['Score','index']).Class.apply(sum).to_frame()
df1
Out[102]:
                 Class
Score index
A     0        Physics
      3        Biology
      5        English
B     1        Science
      4        History
C     2      Chemistry
Answer by u7102456
You need to say how you want to aggregate the groups: counts, means, or something else.
df1.groupby("Score").count().to_csv('d.csv')
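Counting is only one possible aggregation. As a sketch of another option (not part of the original answer), the class names in each group could be joined into a single string before writing. The `classes_per_score` and `csv_text` names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Score": ["A", "B", "C", "A", "B", "A"],
    "Class": ["Physics", "Science", "Chemistry", "Biology", "History", "English"],
})

# Collect the class names of each score into one comma-separated string.
classes_per_score = df1.groupby("Score")["Class"].agg(", ".join)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = classes_per_score.to_csv()
```

Each score then occupies one row of the output, which may be closer to a "grouped" file than a row count is.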