Python: combining rows in pandas

Question

Asked by lightlike

I have a DataFrame with an index called city_id of cities in the format [city],[state] (e.g., new york,ny) containing integer counts in the columns. The problem is that I have multiple rows for the same city, and I want to collapse the rows sharing a city_id by adding their column values. I looked at groupby() but it wasn't immediately obvious how to apply it to this problem.

Edit:

An example: I'd like to change this:

city_id    val1 val2 val3
houston,tx    1    2    0
houston,tx    0    0    1
houston,tx    2    1    1

into this:

city_id    val1 val2 val3
houston,tx    3    3    2

The real DataFrame has roughly 10-20k rows.

Answer 1

Accepted answer by DSM

Starting from

>>> df
              val1  val2  val3
city_id                       
houston,tx       1     2     0
houston,tx       0     0     1
houston,tx       2     1     1
somewhere,ew     4     3     7

I might do

>>> df.groupby(df.index).sum()
              val1  val2  val3
city_id                       
houston,tx       3     3     2
somewhere,ew     4     3     7

or

>>> df.reset_index().groupby("city_id").sum()
              val1  val2  val3
city_id                       
houston,tx       3     3     2
somewhere,ew     4     3     7

The first approach passes the index values (in this case, the city_id values) to groupby and tells it to use them as the group keys; the second resets the index and then groups on the city_id column. See this section of the docs for more examples. Note that DataFrameGroupBy objects have lots of other methods, too:

>>> df.groupby(df.index)
<pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790>
>>> df.groupby(df.index).max()
              val1  val2  val3
city_id                       
houston,tx       2     2     1
somewhere,ew     4     3     7
>>> df.groupby(df.index).mean()
              val1  val2      val3
city_id                           
houston,tx       1     1  0.666667
somewhere,ew     4     3  7.000000
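
For anyone who wants to run this outside the interpreter session, here is a minimal self-contained sketch. It rebuilds the example frame by hand (the construction code is my own assumption, not part of the answer) and checks that both approaches give the same result. It also tries groupby(level=...), a third spelling that does not appear in the answer and groups on the named index level directly.

```python
import pandas as pd

# Rebuild the example frame; city_id is the (non-unique) index.
df = pd.DataFrame(
    {"val1": [1, 0, 2, 4], "val2": [2, 0, 1, 3], "val3": [0, 1, 1, 7]},
    index=pd.Index(
        ["houston,tx", "houston,tx", "houston,tx", "somewhere,ew"],
        name="city_id",
    ),
)

# Approach 1: use the index values as the group keys.
by_index = df.groupby(df.index).sum()

# Approach 2: turn the index into a column, then group on that column.
by_column = df.reset_index().groupby("city_id").sum()

# Not in the answer: group on the named index level directly.
by_level = df.groupby(level="city_id").sum()

print(by_index)
```

All three should produce one row per city_id, so which one to use is mostly a matter of taste.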

Answer 2

Answered by LonelySoul

Something along the same lines. Sorry, it's not an exact replica of your data.

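import pandas

# Two of the three records share the same subid.
mydata = [{'subid': 'B14-111', 'age': 75, 'fdg': 1.78},
          {'subid': 'B14-112', 'age': 22, 'fdg': 1.56},
          {'subid': 'B14-112', 'age': 40, 'fdg': 2.00}]
df = pandas.DataFrame(mydata)

# Collapse rows sharing a subid by summing the numeric columns.
gg = df.groupby("subid", sort=True).sum()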
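
If summing every column is not what you want (a summed age is rarely meaningful, for instance), one possible variation is to pick an aggregation per column with agg. This is only a sketch on the same toy data; the choice of mean for age and sum for fdg is my assumption, not something the answer specifies.

```python
import pandas as pd

mydata = [
    {"subid": "B14-111", "age": 75, "fdg": 1.78},
    {"subid": "B14-112", "age": 22, "fdg": 1.56},
    {"subid": "B14-112", "age": 40, "fdg": 2.00},
]
df = pd.DataFrame(mydata)

# Plain sum, as in the answer: every numeric column is added up per subid.
summed = df.groupby("subid", sort=True).sum()

# Per-column aggregation: average the ages, add up the fdg values.
mixed = df.groupby("subid", sort=True).agg({"age": "mean", "fdg": "sum"})

print(summed)
print(mixed)
```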
