Original source: http://stackoverflow.com/questions/28160808/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Assign columns' value from other columns in Pandas dataframe
Asked by Chet Meinzer
How do I assign columns in my dataframe to be equal to another column if/where a condition is met?
Update
The problem
I need to assign values to many columns (and sometimes a value from another column in that row) when the condition is met.
The condition is not the problem.
I need an efficient way to do this:
df.loc[some condition it doesn't matter,
['a','b','c','d','e','f','g','x','y']]=df['z'],1,3,4,5,6,7,8,df['p']
Simplified example data
import pandas as pd

d = {'var': pd.Series([10, 61]),
     'c': pd.Series([100, 0]),
     'z': pd.Series(['x', 'x']),
     'y': pd.Series([None, None]),
     'x': pd.Series([None, None])}
df = pd.DataFrame(d)
Condition: if var is not missing and its first digit is less than 5
Result: make df.x = df.z and df.y = 1
Here is pseudocode that doesn't work, but it is what I would want.
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x','y']]=df['z'],1
but I get
ValueError: cannot set using a list-like indexer with a different length than the value
Ideal output
     c  var     x  z     y
0  100   10     x  x     1
1    0   61  None  x  None
The code below works, but is too inefficient because I need to assign values to multiple columns.
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x']]=df['z']
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['y']]=1
Accepted answer by elyase
You can work row-wise:
def f(row):
    if row['var'] is not None and int(str(row['var'])[0]) < 5:
        row[['x', 'y']] = row['z'], 1
    return row
>>> df.apply(f, axis=1)
     c  var     x    y  z
0  100   10     x    1  x
1    0   61  None  NaN  x
To overwrite the original df:
df = df.apply(f, axis=1)
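One caveat with the row-wise function: a missing value in a numeric pandas column is usually NaN rather than None, so the `is not None` test lets it through and `int(str(nan)[0])` then raises a ValueError. The sketch below is one possible variant, not the answerer's code. It assumes sample data with an extra row whose var is missing, and the function name `assign_if_small` is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: same shape as the question, plus a row with a missing var.
df = pd.DataFrame({
    'var': [10, 61, float('nan')],
    'c': [100, 0, 5],
    'z': ['x', 'x', 'x'],
    'y': pd.Series([None] * 3, dtype=object),
    'x': pd.Series([None] * 3, dtype=object),
})

def assign_if_small(row):
    # Work on a copy so the function never mutates the row object it was handed.
    row = row.copy()
    # pd.notna() treats both None and NaN as missing.
    if pd.notna(row['var']) and int(str(row['var'])[0]) < 5:
        row['x'] = row['z']
        row['y'] = 1
    return row

result = df.apply(assign_if_small, axis=1)
print(result)
```

Row-wise `apply` calls the Python function once per row, so it is convenient but tends to be slow on large frames.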
Answer by YS-L
This is one way of doing it:
import pandas as pd

d = {'var': pd.Series([1, 6]),
     'c': pd.Series([100, 0]),
     'z': pd.Series(['x', 'x']),
     'y': pd.Series([None, None]),
     'x': pd.Series([None, None])}
df = pd.DataFrame(d)
# Condition 1: var is not missing
cond1 = df['var'].notna()
# Condition 2: first digit is less than 5
cond2 = df['var'].apply(lambda x: int(str(x)[0])) < 5
mask = cond1 & cond2
# .ix was removed in pandas 1.0; .loc accepts the same boolean mask
df.loc[mask, 'x'] = df.loc[mask, 'z']
df.loc[mask, 'y'] = 1
print(df)
Output:
     c  var     x     y  z
0  100    1     x     1  x
1    0    6  None  None  x
As you can see, the Boolean mask has to be applied on both sides of the assignment, and you need to broadcast the value 1 on the y column. It is probably cleaner to split the steps into multiple lines.
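The mask can also be built without a Python-level `apply`. The following is a sketch under assumed data (an extra missing row and a fourth row are added for illustration) that uses pandas string methods to read the first character and `isin` to test it. A missing var never matches because the mask is also combined with `notna()`.

```python
import pandas as pd

# Hypothetical sample data, extended with a missing var to exercise the mask.
df = pd.DataFrame({
    'var': [10, 61, float('nan'), 42],
    'c': [100, 0, 5, 7],
    'z': ['x', 'x', 'x', 'w'],
    'y': pd.Series([None] * 4, dtype=object),
    'x': pd.Series([None] * 4, dtype=object),
})

# First character of the string form of var, computed for the whole column at once.
first_char = df['var'].astype(str).str[0]
# True only where var is present and its first character is a digit below 5.
mask = df['var'].notna() & first_char.isin(list('01234'))

# The same mask is reused on both sides and for every target column.
df.loc[mask, 'x'] = df.loc[mask, 'z']
df.loc[mask, 'y'] = 1
print(df)
```

Note that a negative number starts with '-', so this sketch would treat it as not matching. Whether that is the desired behaviour depends on the data.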
Edit (after the question was updated): More generally, since some assignments depend on other columns while others just broadcast a constant down the column, you can do it in two steps:
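# .to_numpy() drops the column labels; otherwise pandas aligns 'z','p' against 'a','y' and assigns NaN
df.loc[conds, ['a','y']] = df.loc[conds, ['z','p']].to_numpy()
df.loc[conds, ['b','c','d','e','f','g','x']] = [1,3,4,5,6,7,8]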
You may profile and see if this is efficient enough for your use case.
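As a self-contained illustration of the two-step idea, here is a sketch on made-up data with fewer target columns. The column values and the condition are assumptions for the example. The right-hand side of the first step is converted with `.to_numpy()` because assigning a DataFrame through `.loc` aligns on column labels, which would not match here.

```python
import pandas as pd

# Hypothetical data: 'z' and 'p' are source columns, the rest are targets.
df = pd.DataFrame({
    'var': [10, 61, 42],
    'z': ['x', 'y', 'w'],
    'p': [0.5, 1.5, 2.5],
})
for col in ['a', 'y', 'b', 'c', 'x']:
    df[col] = pd.Series([None] * len(df), dtype=object)

# Compute the condition once and reuse it for every assignment.
conds = df['var'].astype(str).str[0].astype(int) < 5

# Step 1: columns copied from other columns (positional, labels dropped).
df.loc[conds, ['a', 'y']] = df.loc[conds, ['z', 'p']].to_numpy()
# Step 2: constants, one per target column, applied to every selected row.
df.loc[conds, ['b', 'c', 'x']] = [1, 3, 8]
print(df)
```

The condition is evaluated a single time, so the cost of adding more target columns is limited to the assignments themselves.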

