Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors on Stack Overflow (not to this page).
Original question: http://stackoverflow.com/questions/43514019/
Collapsing rows in a Pandas dataframe
Asked by user4896331
I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, but it seems to be slanted towards performing some numeric function on the group - I just want to keep the text. Here's what I've got...
I have a dataframe of the form:
index   ID  apples  pears  oranges
0      101                 oranges
1      134  apples
2      576          pears
3      837  apples
4      576                 oranges
5      134          pears
The columns are clean: the apples column will only ever contain the text "apples", or it will be blank.
Where there are multiple entries under the same ID (in this example, on IDs 134 & 576), I want to collapse the rows together to get this:
index   ID  apples  pears  oranges
0      101                 oranges
1      134  apples  pears
2      576          pears  oranges
3      837  apples
I could do this by iterating over the rows, but it seems like a non-pandas solution. Is there a better way?
Answered by jezrael
You can use groupby with the aggregation ''.join, sum or max:
# If the blank values are NaN, replace them with '' first
df = df.fillna('')
df = df.groupby('ID').agg(''.join)
print(df)
     apples  pears  oranges
ID
101                 oranges
134  apples  pears
576          pears  oranges
837  apples
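For readers who want to run the approach above end to end, here is a minimal sketch. The sample frame is rebuilt from the question, and it assumes the blanks arrive as missing values (None), which is why the fillna step comes first; the variable name collapsed is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; blanks are assumed to be missing values.
df = pd.DataFrame({
    "ID": [101, 134, 576, 837, 576, 134],
    "apples": [None, "apples", None, "apples", None, None],
    "pears": [None, None, "pears", None, None, "pears"],
    "oranges": ["oranges", None, None, None, "oranges", None],
})

# Replace missing values with empty strings so that ''.join only sees text.
df = df.fillna("")

# One row per ID; each column is the concatenation of that group's strings.
collapsed = df.groupby("ID").agg("".join)
print(collapsed)
```

Because each column holds at most one non-blank value per ID in this data, joining with an empty separator simply keeps that value.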
Also works:
df = df.fillna('')
df = df.groupby('ID').sum()
# alternatively max
# df = df.groupby('ID').max()
print(df)
     apples  pears  oranges
ID
101                 oranges
134  apples  pears
576          pears  oranges
837  apples
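A short sketch of why sum and max also work here, assuming the blanks are already empty strings: sum concatenates the strings in each group, while max picks the largest string, and '' compares smaller than any non-empty text. The names summed and largest are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question, with blanks as empty strings.
df = pd.DataFrame({
    "ID": [101, 134, 576, 837, 576, 134],
    "apples": ["", "apples", "", "apples", "", ""],
    "pears": ["", "", "pears", "", "", "pears"],
    "oranges": ["oranges", "", "", "", "oranges", ""],
})

summed = df.groupby("ID").sum()   # string "sum" is concatenation
largest = df.groupby("ID").max()  # '' sorts before any non-empty string

print(summed)
print(largest)
```

On this clean data the two results agree. They would differ if a value repeated within a group, since sum would concatenate the repeats while max would keep a single copy.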
If you also need to remove duplicates per group and per column, add unique:
df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
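To illustrate the difference the unique step makes, the following sketch uses a small hypothetical frame (not from the question) in which ID 134 lists "apples" twice.

```python
import pandas as pd

# Hypothetical data: the same fruit is recorded twice for one ID.
df = pd.DataFrame({
    "ID": [134, 134, 134],
    "apples": ["apples", "apples", ""],
    "pears": ["", "", "pears"],
})

# Plain join keeps every occurrence, so the repeat is concatenated.
joined = df.groupby("ID").agg("".join)

# Dropping duplicates within each group and column first keeps one copy.
deduped = df.groupby("ID").agg(lambda x: "".join(x.unique()))

print(joined.loc[134, "apples"])
print(deduped.loc[134, "apples"])
```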
Answered by piRSquared
Assuming blanks are ''
Option 1: pivot_table
df.pivot_table(['apples', 'pears', 'oranges'], 'ID', aggfunc=''.join)
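A self-contained sketch of the pivot_table call above, written with keyword arguments for readability. The sample frame is rebuilt from the question and assumes the blanks are empty strings; the name pivoted is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question, with blanks as empty strings.
df = pd.DataFrame({
    "ID": [101, 134, 576, 837, 576, 134],
    "apples": ["", "apples", "", "apples", "", ""],
    "pears": ["", "", "pears", "", "", "pears"],
    "oranges": ["oranges", "", "", "", "oranges", ""],
})

# Aggregate the three text columns per ID by joining their strings.
pivoted = df.pivot_table(
    values=["apples", "pears", "oranges"],
    index="ID",
    aggfunc="".join,
)
print(pivoted)
```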
Option 2: sort and take the last row, since '' will be sorted first
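import numpy as np
import pandas as pd

def f(df):
    # Sort each column independently; '' sorts first, so the last row holds the text
    return pd.DataFrame(np.sort(df.values, 0)[[-1]], [df.name], df.columns)

df.set_index(
    'ID', append=True
).groupby(level='ID', group_keys=False).apply(f)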
Both yield:
     apples  oranges  pears
ID
101          oranges
134  apples           pears
576          oranges  pears
837  apples
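The sorting trick in option 2 can be seen in isolation with a small sketch. The array below is a hypothetical stand-in for the values of one group (ID 576 in the question); it is not part of the original answer.

```python
import numpy as np

# Hypothetical values for one group: columns are apples, pears, oranges.
block = np.array(
    [["", "pears", ""],
     ["", "", "oranges"]],
    dtype=object,
)

# Sort each column independently (axis 0). The empty string compares
# smaller than any non-empty text, so the text ends up in the last row.
last_row = np.sort(block, axis=0)[-1]
print(list(last_row))
```

Taking the last row after a column-wise sort is therefore equivalent to picking the non-blank value in each column, provided each column has at most one distinct non-blank value per group.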