"ValueError: Length of values does not match length of index" when trying to modify column values in a pandas groupby
Original question: http://stackoverflow.com/questions/46446956/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by cs95
I have a dataframe:
A C D
0 one 0.410599 -0.205158
1 one 0.144044 0.313068
2 one 0.333674 -0.742165
3 three 0.761038 -2.552990
4 three 1.494079 2.269755
5 two 1.454274 -0.854096
6 two 0.121675 0.653619
7 two 0.443863 0.864436
Let's assume that A is the anchor column. I now want to display each group value only once, at the top:
A C D
0 one 0.410599 -0.205158
1 0.144044 0.313068
2 0.333674 -0.742165
3 three 0.761038 -2.552990
4 1.494079 2.269755
5 two 1.454274 -0.854096
6 0.121675 0.653619
7 0.443863 0.864436
This is what I've come up with:
df['A'] = df.groupby('A', as_index=False)['A']\
.apply(lambda x: x.str.replace('.*', '').set_value(0, x.values[0])).values
My strategy was to do a groupby and then set all values to an empty string other than the first. This doesn't seem to work, because I get:
ValueError: Length of values does not match length of index
This means the output I get is incorrect. Any ideas, suggestions, or improvements are welcome.
I should add that I want a general solution that can single out the value at the top, bottom, or middle of each group, so I'd prefer an answer that supports all three. The example above only shows the top-of-group case.
Answer by Bharath
Your method didn't work because of an index mismatch. When you group by 'A', each group keeps the index labels it had in the original frame. In groups where set_value(0, ...) cannot find the label 0, it adds a new entry with that label instead of overwriting an existing one. Those extra entries are why the lengths no longer match.
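The length mismatch can be reproduced without set_value. The sketch below is only an illustration: it assumes that assigning through .loc with a missing label enlarges the Series in the same way, and the names df, lengths, and total are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": ["one", "one", "one", "three", "three", "two", "two", "two"]})

lengths = {}
for key, group in df.groupby("A")["A"]:
    s = group.copy()
    # Label 0 exists only in the first group (labels 0, 1, 2).
    # In the other groups this assignment appends a new row labelled 0.
    s.loc[0] = key
    lengths[key] = len(s)

total = sum(lengths.values())
print(lengths)
print(total, "values for", len(df), "rows")
```

Two of the three groups grow by one entry, so the concatenated result has more values than the frame has rows, which is what the ValueError reports.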
Fix 1: reset_index(drop=True)
df['A'] = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
.reset_index(drop=True).set_value(0, x.values[0])).values
df
A C D
0 one 0.410599 -0.205158
1 0.144044 0.313068
2 0.333674 -0.742165
3 three 0.761038 -2.552990
4 1.494079 2.269755
5 two 1.454274 -0.854096
6 0.121675 0.653619
7 0.443863 0.864436
Fix 2: set_value with takeable=True
set_value has a third parameter called takeable, which determines how the index is treated. It is False by default. Setting it to True makes set_value interpret the first argument as a position rather than a label, and that worked for my case.
In addition to Zero's solutions, the solution for isolating values at the centre of their groups is as follows:
df.A = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
.set_value(len(x) // 2, x.values[0], True)).values
df
A C D
0 0.410599 -0.205158
1 one 0.144044 0.313068
2 0.333674 -0.742165
3 0.761038 -2.552990
4 three 1.494079 2.269755
5 1.454274 -0.854096
6 two 0.121675 0.653619
7 0.443863 0.864436
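Note that set_value was deprecated in pandas 0.21 and removed in pandas 1.0, and that since pandas 2.0 str.replace treats its pattern as a literal string unless regex=True is passed, so the snippets above need adjusting on current versions. The following is a minimal sketch of one alternative, not the approach used in this answer: it relies on groupby(...).cumcount() to get each row's position within its group. The function name keep_one_label and its where parameter are hypothetical.

```python
import pandas as pd

def keep_one_label(df, col, where="first"):
    """Return df[col] with every value blanked except one per group."""
    grouped = df.groupby(col)[col]
    pos = grouped.cumcount()          # 0, 1, 2, ... within each group
    size = grouped.transform("size")  # group size repeated on every row
    if where == "first":
        target = 0
    elif where == "last":
        target = size - 1
    elif where == "middle":
        target = size // 2
    else:
        raise ValueError("where must be 'first', 'last' or 'middle'")
    # Keep the value where the position matches, otherwise use ''.
    return df[col].where(pos == target, "")

df = pd.DataFrame({"A": ["one", "one", "one", "three", "three", "two", "two", "two"]})
df["A_middle"] = keep_one_label(df, "A", where="middle")
print(df)
```

Because positions are computed per group, this sketch does not depend on the index labels, so no reset_index or takeable workaround is needed.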
Answer by Zero
Since the values are sorted, use the duplicated method for the first and last cases.
Keep First
In [4233]: df.loc[df.A.duplicated(keep='first'), 'A'] = ''
In [4234]: df
Out[4234]:
A C D
0 one 0.410599 -0.205158
1 0.144044 0.313068
2 0.333674 -0.742165
3 three 0.761038 -2.552990
4 1.494079 2.269755
5 two 1.454274 -0.854096
6 0.121675 0.653619
7 0.443863 0.864436
Keep Last
In [4236]: df.loc[df.A.duplicated(keep='last'), 'A'] = ''
In [4237]: df
Out[4237]:
A C D
0 0.410599 -0.205158
1 0.144044 0.313068
2 one 0.333674 -0.742165
3 0.761038 -2.552990
4 three 1.494079 2.269755
5 1.454274 -0.854096
6 0.121675 0.653619
7 two 0.443863 0.864436
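The duplicated approach relies on the values being sorted, so that each group forms one contiguous block. If the same value can appear in several separate runs, duplicated blanks every later occurrence. The sketch below contrasts that with a comparison against the previous row using shift, which keeps the first value of each run. This variant is an illustrative assumption and not part of the answer; the variable names are made up.

```python
import pandas as pd

a = pd.Series(["one", "one", "two", "one"])

# duplicated() looks at the whole column: the last 'one' counts as a repeat.
by_duplicated = a.where(~a.duplicated(keep="first"), "")

# Comparing with the previous row marks the start of each consecutive run.
first_of_run = a.ne(a.shift())
by_run = a.where(first_of_run, "")

print(by_duplicated.tolist())
print(by_run.tolist())
```

For sorted data such as the example frame the two results are identical, so the simpler duplicated call is enough there.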