Collapse rows in a Pandas dataframe

Question

Asked by user4896331

I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, but it seems to be slanted towards performing some numeric function on the group, whereas I just want to keep the text. Here's what I've got...

I have a dataframe of the form:

index    ID     apples    pears    oranges
0        101                       oranges
1        134    apples
2        576              pears
3        837    apples
4        576                       oranges
5        134              pears

The columns are clean: the apples column will only ever contain the text "apples", or it will be blank.

Where there are multiple entries under the same ID (in this example, on IDs 134 & 576), I want to collapse the rows together to get this:

index    ID     apples    pears    oranges
0        101                       oranges
1        134    apples    pears
2        576              pears    oranges
3        837    apples

I could do this by iterating over the rows, but it seems like a non-pandas solution. Is there a better way?
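For comparison, the row-by-row approach mentioned above might look like the minimal sketch below. It assumes blank cells are stored as empty strings (''), and the names `fruit_cols`, `collapsed` and `result` are illustrative, not from the question.

```python
import pandas as pd

# Sample data from the question, assuming blank cells are empty strings ('').
df = pd.DataFrame({
    'ID': [101, 134, 576, 837, 576, 134],
    'apples': ['', 'apples', '', 'apples', '', ''],
    'pears': ['', '', 'pears', '', '', 'pears'],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})
fruit_cols = ['apples', 'pears', 'oranges']

# Row-by-row baseline: keep one dict per ID and copy in every non-blank cell.
collapsed = {}
for _, row in df.iterrows():
    entry = collapsed.setdefault(row['ID'], dict.fromkeys(fruit_cols, ''))
    for col in fruit_cols:
        if row[col]:
            entry[col] = row[col]

# Rebuild a frame indexed by ID, one row per ID.
result = pd.DataFrame.from_dict(collapsed, orient='index').sort_index()
result.index.name = 'ID'
print(result)
```

It produces the desired shape, but the work happens in a Python loop rather than in pandas.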

Answer 1

Answered by jezrael

You can use groupby with aggregation ''.join, sum or max:

# if blank values are NaN, first replace them with ''
df = df.fillna('')

df = df.groupby('ID').agg(''.join)
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples
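A self-contained version of the ''.join approach, assuming the blanks arrive as NaN so that the fillna('') step actually matters; the sample values are copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming blank cells are NaN.
df = pd.DataFrame({
    'ID': [101, 134, 576, 837, 576, 134],
    'apples': [np.nan, 'apples', np.nan, 'apples', np.nan, np.nan],
    'pears': [np.nan, np.nan, 'pears', np.nan, np.nan, 'pears'],
    'oranges': ['oranges', np.nan, np.nan, np.nan, 'oranges', np.nan],
})

# ''.join fails on NaN (a float), so turn blanks into empty strings first.
df = df.fillna('')

# Within each ID group, concatenate the strings of every remaining column.
collapsed = df.groupby('ID').agg(''.join)
print(collapsed)
```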

This also works:

df = df.fillna('')
df = df.groupby('ID').sum()
# alternatively, use max
# df = df.groupby('ID').max()
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples
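A quick check, assuming blanks are already '', that sum and max agree on the question's data. sum concatenates the strings in each group, while max keeps the largest string, and '' compares less than any non-empty string. They agree here because no ID has more than one non-blank entry in the same column; with repeated entries, sum would repeat the text and max would not.

```python
import pandas as pd

# Sample data from the question, with blanks already replaced by ''.
df = pd.DataFrame({
    'ID': [101, 134, 576, 837, 576, 134],
    'apples': ['', 'apples', '', 'apples', '', ''],
    'pears': ['', '', 'pears', '', '', 'pears'],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})

# sum concatenates strings within each group: '' + 'apples' -> 'apples'.
by_sum = df.groupby('ID').sum()

# max keeps the largest string; '' sorts before any non-empty string.
by_max = df.groupby('ID').max()

print(by_sum)
print((by_sum == by_max).all().all())
```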

If you also need to remove duplicates per group and per column, add unique:

df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
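To see why unique can matter, here is a sketch with hypothetical data (not from the question) in which ID 134 lists "apples" on two rows: plain ''.join repeats the text, while joining x.unique() keeps a single copy.

```python
import pandas as pd

# Hypothetical data: ID 134 has "apples" on two rows.
df = pd.DataFrame({
    'ID': [134, 134, 134, 576],
    'apples': ['apples', 'apples', '', ''],
    'pears': ['', '', 'pears', 'pears'],
})

# Plain join concatenates every value, duplicates included.
plain = df.groupby('ID').agg(''.join)

# Joining the unique values drops the repeated "apples".
deduped = df.groupby('ID').agg(lambda x: ''.join(x.unique()))

print(plain.loc[134, 'apples'])    # applesapples
print(deduped.loc[134, 'apples'])  # apples
```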

Answer 2

Answered by piRSquared

Assuming blanks are ''

option 1
pivot_table

df.pivot_table(values=['apples', 'pears', 'oranges'], index='ID', aggfunc=''.join)
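A runnable sketch of option 1 on the question's sample data, assuming blanks are ''. The variable name `table` is illustrative.

```python
import pandas as pd

# Sample data from the question, assuming blank cells are ''.
df = pd.DataFrame({
    'ID': [101, 134, 576, 837, 576, 134],
    'apples': ['', 'apples', '', 'apples', '', ''],
    'pears': ['', '', 'pears', '', '', 'pears'],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})

# One row per ID; each value column is joined within its ID group.
table = df.pivot_table(values=['apples', 'pears', 'oranges'], index='ID',
                       aggfunc=''.join)
print(table)
```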

option 2
sort and take the last row, since '' will be sorted first

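import numpy as np
import pandas as pd

def f(df):
    # Sort every column of the group; '' sorts first, so the last row holds
    # the non-blank value (if any) for each column.
    return pd.DataFrame(np.sort(df.values, 0)[[-1]], [df.name], df.columns)

df.set_index(
    'ID', append=True
).groupby(level='ID', group_keys=False).apply(f)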

Both yield

     apples  oranges  pears
ID                         
101          oranges       
134  apples           pears
576          oranges  pears
837  apples
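Option 2 relies on np.sort comparing the Python strings in an object array, where '' sorts before any non-empty string. A minimal sketch of what f computes for a single group, using the two rows for ID 576 written out by hand (columns in the order apples, pears, oranges):

```python
import numpy as np

# The two rows for ID 576, columns ordered apples, pears, oranges.
group = np.array([['', 'pears', ''],
                  ['', '', 'oranges']], dtype=object)

# Sort each column independently (axis=0): blanks move to the top.
sorted_cols = np.sort(group, axis=0)

# Indexing with [[-1]] (a list) keeps a 2-D result of shape (1, 3),
# which is what pd.DataFrame in f needs to build a one-row frame.
last_row = sorted_cols[[-1]]
print(last_row)
```

Because each column is sorted on its own, the last row mixes values from different original rows, which is exactly the collapse the question asks for.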
