Python: Combining rows in pandas
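Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17438906/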
Question by lightlike
I have a DataFrame with an index called city_id, whose values are cities in the format [city],[state] (e.g., new york,ny), and whose columns contain integer counts. The problem is that I have multiple rows for the same city, and I want to collapse the rows sharing a city_id by adding their column values. I looked at groupby(), but it wasn't immediately obvious how to apply it to this problem.
Edit:
An example: I'd like to change this:
city_id val1 val2 val3
houston,tx 1 2 0
houston,tx 0 0 1
houston,tx 2 1 1
into this:
city_id val1 val2 val3
houston,tx 3 3 2
The real DataFrame has roughly 10-20k rows.
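To make the example reproducible, here is a minimal sketch that rebuilds the input and the desired output as DataFrames. It assumes that city_id is the index rather than a column, and that pandas is imported as pd. The names df and expected are illustrative only.

```python
import pandas as pd

# The question's input: three rows that share the same city_id index label.
df = pd.DataFrame(
    {"val1": [1, 0, 2], "val2": [2, 0, 1], "val3": [0, 1, 1]},
    index=pd.Index(["houston,tx"] * 3, name="city_id"),
)

# The desired output: one row per city_id, with each column summed.
expected = pd.DataFrame(
    {"val1": [3], "val2": [3], "val3": [2]},
    index=pd.Index(["houston,tx"], name="city_id"),
)

print(df)
print(expected)
```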
Accepted answer by DSM
Starting from
>>> df
val1 val2 val3
city_id
houston,tx 1 2 0
houston,tx 0 0 1
houston,tx 2 1 1
somewhere,ew 4 3 7
I might do
>>> df.groupby(df.index).sum()
val1 val2 val3
city_id
houston,tx 3 3 2
somewhere,ew 4 3 7
or
>>> df.reset_index().groupby("city_id").sum()
val1 val2 val3
city_id
houston,tx 3 3 2
somewhere,ew 4 3 7
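Both calls above group on the city_id values. The sketch below shows a few other spellings that should give the same groups in a reasonably recent pandas:

- level=0 groups on the first index level.
- A plain string key may name an index level as well as a column (documented since pandas 0.20).
- as_index=False keeps city_id as an ordinary column in the result.

The frame is rebuilt from the answer's printed data. The variable names are hypothetical.

```python
import pandas as pd

# Same starting frame as in the answer: city_id is the index name, not a column.
df = pd.DataFrame(
    {"val1": [1, 0, 2, 4], "val2": [2, 0, 1, 3], "val3": [0, 1, 1, 7]},
    index=pd.Index(
        ["houston,tx", "houston,tx", "houston,tx", "somewhere,ew"], name="city_id"
    ),
)

# Group on index level 0 (the same keys as passing df.index).
by_level = df.groupby(level=0).sum()

# A string key that names an index level is also accepted (pandas >= 0.20).
by_name = df.groupby("city_id").sum()

# Keep city_id as a regular column instead of moving it into the index.
flat = df.reset_index().groupby("city_id", as_index=False).sum()

print(by_level)
print(flat)
```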
The first approach passes the index values (in this case, the city_id values) to groupby and tells it to use them as the group keys. The second resets the index and then groups on the resulting city_id column. See this section of the docs for more examples. Note that the DataFrameGroupBy object has many other methods, too:
>>> df.groupby(df.index)
<pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790>
>>> df.groupby(df.index).max()
val1 val2 val3
city_id
houston,tx 2 2 1
somewhere,ew 4 3 7
>>> df.groupby(df.index).mean()
val1 val2 val3
city_id
houston,tx 1 1 0.666667
somewhere,ew 4 3 7.000000
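Instead of calling max(), mean() and so on one at a time, you can pass a list of reduction names to agg(). The result then has a two-level column index of (original column, function name). This is a sketch on the answer's sample data, assuming pandas is imported as pd; summary is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame(
    {"val1": [1, 0, 2, 4], "val2": [2, 0, 1, 3], "val3": [0, 1, 1, 7]},
    index=pd.Index(
        ["houston,tx", "houston,tx", "houston,tx", "somewhere,ew"], name="city_id"
    ),
)

# Compute several reductions per group in one pass; the columns become
# (column, function) pairs such as ("val1", "sum").
summary = df.groupby(level="city_id").agg(["sum", "max", "mean"])
print(summary)
```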
Answer by LonelySoul
Something along the same lines. Sorry, it's not an exact replica of your data:
import pandas

mydata = [{'subid': 'B14-111', 'age': 75, 'fdg': 1.78},
          {'subid': 'B14-112', 'age': 22, 'fdg': 1.56},
          {'subid': 'B14-112', 'age': 40, 'fdg': 2.00}]
df = pandas.DataFrame(mydata)
# Sum the numeric columns (age, fdg) of all rows that share a subid
gg = df.groupby("subid", sort=True).sum()
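Summing every column is not always meaningful. In this data, for example, adding ages together may not be what you want. Named aggregation (pandas 0.25 and later) lets you pick a different reduction for each output column. The sketch below uses the same mydata. The output names mean_age, total_fdg and n_rows are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

mydata = [
    {"subid": "B14-111", "age": 75, "fdg": 1.78},
    {"subid": "B14-112", "age": 22, "fdg": 1.56},
    {"subid": "B14-112", "age": 40, "fdg": 2.00},
]
df = pd.DataFrame(mydata)

# Named aggregation: each keyword is an output column, each value is
# (input column, reduction name).
per_subject = df.groupby("subid", sort=True).agg(
    mean_age=("age", "mean"),
    total_fdg=("fdg", "sum"),
    n_rows=("age", "count"),
)
print(per_subject)
```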