Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/25792619/
What is correct syntax to swap column values for selected rows in a pandas data frame using just one line?
Asked by stachyra
I am using pandas version 0.14.1 with Python 2.7.5, and I have a data frame with three columns, e.g.:
import pandas as pd
d = {'L': ['left', 'right', 'left', 'right', 'left', 'right'],
     'R': ['right', 'left', 'right', 'left', 'right', 'left'],
     'VALUE': [-1, 1, -1, 1, -1, 1]}
df = pd.DataFrame(d)
idx = (df['VALUE'] == 1)
results in a data frame which looks like this:
       L      R  VALUE
0   left  right     -1
1  right   left      1
2   left  right     -1
3  right   left      1
4   left  right     -1
5  right   left      1
For rows where VALUE == 1, I would like to swap the contents of the left and right columns, so that all of the "left" values will end up under the "L" column, and the "right" values end up under the "R" column.
Having already defined the idx variable above, I can easily do this in just three more lines, by using a temporary variable as follows:
tmp = df.loc[idx,'L']
df.loc[idx,'L'] = df.loc[idx,'R']
df.loc[idx,'R'] = tmp
However, this seems like really clunky and inelegant syntax to me; surely pandas supports something more succinct? I've noticed that if I swap the column order in the input to the data frame's .loc attribute, then I get the following swapped output:
In [2]: print(df.loc[idx,['R','L']])
      R      L
1  left  right
3  left  right
5  left  right
This suggests to me that I should be able to implement the same swap as above, by using just the following single line:
df.loc[idx,['L','R']] = df.loc[idx,['R','L']]
However when I actually try this, nothing happens--the columns remain unswapped. It's as if pandas automatically recognizes that I've put the columns in the wrong order on the right hand side of the assignment statement, and it automatically corrects for the problem. Is there a way that I can disable this "column order autocorrection" in pandas assignment statements, in order to implement the swap without creating unnecessary temporary variables?
Answered by DSM
One way you could avoid alignment on column names would be to drop down to the underlying array via .values:
In [33]: df
Out[33]:
       L      R  VALUE
0   left  right     -1
1  right   left      1
2   left  right     -1
3  right   left      1
4   left  right     -1
5  right   left      1
In [34]: df.loc[idx,['L','R']] = df.loc[idx,['R','L']].values
In [35]: df
Out[35]:
      L      R  VALUE
0  left  right     -1
1  left  right      1
2  left  right     -1
3  left  right      1
4  left  right     -1
5  left  right      1
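To make the difference concrete, here is a minimal sketch that runs both assignments side by side on copies of the question's data. It uses .to_numpy(), which in current pandas does the same job as .values here; the names `aligned` and `swapped` are only illustrative and do not come from the original posts.

```python
import pandas as pd

df = pd.DataFrame({
    'L': ['left', 'right', 'left', 'right', 'left', 'right'],
    'R': ['right', 'left', 'right', 'left', 'right', 'left'],
    'VALUE': [-1, 1, -1, 1, -1, 1],
})
idx = df['VALUE'] == 1

# A DataFrame on the right-hand side is aligned on its column labels,
# so the reordered columns are matched back to 'L' and 'R': no swap.
aligned = df.copy()
aligned.loc[idx, ['L', 'R']] = aligned.loc[idx, ['R', 'L']]

# A bare NumPy array carries no labels, so values are assigned by position.
swapped = df.copy()
swapped.loc[idx, ['L', 'R']] = swapped.loc[idx, ['R', 'L']].to_numpy()

print(aligned)
print(swapped)
```

Only the second frame should come out with every 'left' under L and every 'right' under R.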
Answered by JohnE
The key thing to note here is that pandas attempts to automatically align rows and columns using the index and column names. Hence, you need to somehow tell pandas to ignore the column names here. One way is as @DSM does, by converting to a numpy array. Another way is to rename the columns:
>>> df.loc[idx] = df.loc[idx].rename(columns={'R':'L','L':'R'})
>>> df
      L      R  VALUE
0  left  right     -1
1  left  right      1
2  left  right     -1
3  left  right      1
4  left  right     -1
5  left  right      1
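The rename approach can also be limited to the two columns being swapped, which leaves VALUE out of the assignment entirely. The following is a sketch under that assumption; the intermediate name `relabeled` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'L': ['left', 'right', 'left', 'right', 'left', 'right'],
    'R': ['right', 'left', 'right', 'left', 'right', 'left'],
    'VALUE': [-1, 1, -1, 1, -1, 1],
})
idx = df['VALUE'] == 1

# After the rename, the old 'R' values carry the label 'L' and vice versa,
# so pandas' own label alignment performs the swap during assignment.
relabeled = df.loc[idx, ['L', 'R']].rename(columns={'L': 'R', 'R': 'L'})
df.loc[idx, ['L', 'R']] = relabeled

print(df)
```

Here alignment is used rather than bypassed: the labels are changed so that matching on them puts each value in the other column.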
Answered by Bharath
You can also do this with np.select and df.where, i.e.:
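Option 1: np.select (needs import numpy as np; the pd.np alias was removed in pandas 2.0, and np.select expects a list of conditions and a list of choices)
df[['L','R']] = np.select([(df['VALUE'] == 1).values[:, None]], [df[['R','L']].values], df[['L','R']].values)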
Option 2: df.where
df[['L','R']] = df[['R','L']].where(df['VALUE'] == 1, df[['L','R']].values)
Option 3: df.mask
df[['L','R']] = df[['L','R']].mask( df['VALUE'] == 1, df[['R','L']].values)
Output:
      L      R  VALUE
0  left  right     -1
1  left  right      1
2  left  right     -1
3  left  right      1
4  left  right     -1
5  left  right      1
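All three options choose, row by row, between the reversed and the original column order. As a rough illustration of that idea in plain NumPy, the sketch below uses np.where, which is a substitute for the calls in the answer rather than one of them; the names `cond`, `original`, and `reversed_cols` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'L': ['left', 'right', 'left', 'right', 'left', 'right'],
    'R': ['right', 'left', 'right', 'left', 'right', 'left'],
    'VALUE': [-1, 1, -1, 1, -1, 1],
})

cond = (df['VALUE'] == 1).to_numpy()          # shape (n,)
original = df[['L', 'R']].to_numpy()          # shape (n, 2)
reversed_cols = df[['R', 'L']].to_numpy()     # same rows, columns reversed

# cond[:, None] has shape (n, 1) and broadcasts across both columns,
# so each row is taken whole from one array or the other.
df[['L', 'R']] = np.where(cond[:, None], reversed_cols, original)

print(df)
```

Because both inputs are plain arrays, no label alignment takes place and the result is assigned back by position.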

