Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43514019/

Collapsing rows in a Pandas dataframe

Tags: python-3.x, pandas

Asked by user4896331

I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, but it seems to be slanted towards performing some numeric function on the group - I just want to keep the text. Here's what I've got...

I have a dataframe of the form:

index    ID     apples    pears    oranges
0        101                       oranges
1        134    apples
2        576              pears
3        837    apples
4        576                       oranges
5        134              pears

The columns are clean: the apples column will only ever contain the text "apples", or it will be blank.
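For anyone who wants to reproduce the problem, the frame shown above could be built roughly as follows. This is only a sketch: it assumes the blank cells are empty strings rather than NaN, which the question does not state.

```python
import pandas as pd

# Assumption: blank cells are empty strings (''), not NaN.
df = pd.DataFrame({
    "ID": [101, 134, 576, 837, 576, 134],
    "apples": ["", "apples", "", "apples", "", ""],
    "pears": ["", "", "pears", "", "", "pears"],
    "oranges": ["oranges", "", "", "", "oranges", ""],
})
print(df)
```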

Where there are multiple entries under the same ID (in this example, on IDs 134 & 576), I want to collapse the rows together to get this:

index    ID     apples    pears    oranges
0        101                       oranges
1        134    apples    pears
2        576              pears    oranges
3        837    apples

I could do this by iterating over the rows, but it seems like a non-pandas solution. Is there a better way?

Answered by jezrael

You can use groupby with the aggregation ''.join, sum, or max:

# If blank values are NaN, first replace them with ''
df = df.fillna('')

df = df.groupby('ID').agg(''.join)
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples   
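As a self-contained sketch of the approach above, the following builds the sample frame with NaN blanks (an assumption, to show why fillna matters), collapses it, and uses reset_index to turn ID back into an ordinary column as in the desired output. The variable name collapsed is only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: blanks arrive as NaN, so they must be replaced before joining.
df = pd.DataFrame({
    "ID": [101, 134, 576, 837, 576, 134],
    "apples": [np.nan, "apples", np.nan, "apples", np.nan, np.nan],
    "pears": [np.nan, np.nan, "pears", np.nan, np.nan, "pears"],
    "oranges": ["oranges", np.nan, np.nan, np.nan, "oranges", np.nan],
})

# Replace NaN with '', concatenate the strings within each ID group,
# then move ID from the index back into a regular column.
collapsed = df.fillna("").groupby("ID").agg("".join).reset_index()
print(collapsed)
```

Without the fillna step, ''.join would fail on the NaN values because they are floats, not strings.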

Also works:

df = df.fillna('')
df = df.groupby('ID').sum()
# Alternatively, use max
# df = df.groupby('ID').max()
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples     

If you also need to remove duplicates per group and per column, add unique:

df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
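To illustrate the difference unique makes, here is a small sketch with a made-up group in which the same text appears on more than one row. The data and the names plain and deduped are hypothetical, not from the question.

```python
import pandas as pd

# Hypothetical data: ID 134 has "apples" and "pears" repeated across rows.
df = pd.DataFrame({
    "ID": [134, 134, 134],
    "apples": ["apples", "apples", ""],
    "pears": ["", "pears", "pears"],
})

# A plain join repeats the text once per matching row.
plain = df.groupby("ID").agg("".join)

# Joining only the unique values keeps each text once.
deduped = df.groupby("ID").agg(lambda x: "".join(x.unique()))

print(plain)
print(deduped)
```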

Answered by piRSquared

Assuming blanks are ''

Option 1: pivot_table

df.pivot_table(['apples', 'pears', 'oranges'], 'ID', aggfunc=''.join)

Option 2: sort and take the last row, as '' will be sorted first

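import numpy as np
import pandas as pd

def f(df):
    # Sort each column; '' sorts first, so the last row holds any non-blank text
    return pd.DataFrame(np.sort(df.values, 0)[[-1]], [df.name], df.columns)

df.set_index(
    'ID', append=True
).groupby(level='ID', group_keys=False).apply(f)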
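The sorting trick relies on the empty string comparing as smaller than any non-empty string. A minimal sketch of that step on its own, using a hypothetical two-row group:

```python
import numpy as np

# Hypothetical group: two rows, two columns, blanks are ''.
group = np.array([["", "pears"], ["apples", ""]], dtype=object)

# Sort each column independently (axis=0); '' moves to the top,
# so the last row collects the non-blank text from every column.
last_row = np.sort(group, axis=0)[[-1]]
print(last_row)
```

This only works when each column holds at most one distinct non-blank value per group, as the question states.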


Both yield

     apples  oranges  pears
ID                         
101          oranges       
134  apples           pears
576          oranges  pears
837  apples
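As a rough check, option 1 can be run end to end on the sample data and compared with a plain groupby. This sketch assumes blanks are '' and selects columns explicitly before comparing, since pivot_table may return them in a different order; the names cols, pivoted and grouped are illustrative.

```python
import pandas as pd

# Assumption: blank cells are empty strings.
df = pd.DataFrame({
    "ID": [101, 134, 576, 837, 576, 134],
    "apples": ["", "apples", "", "apples", "", ""],
    "pears": ["", "", "pears", "", "", "pears"],
    "oranges": ["oranges", "", "", "", "oranges", ""],
})
cols = ["apples", "pears", "oranges"]

pivoted = df.pivot_table(values=cols, index="ID", aggfunc="".join)
grouped = df.groupby("ID")[cols].agg("".join)

# Reorder the pivoted columns so both results line up.
print(pivoted[cols])
print(grouped)
```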