Assigning column values from other columns in a Pandas DataFrame

Question

Asked by Chet Meinzer

How do I assign columns in my dataframe to be equal to another column if/where a condition is met?

Update
The problem
I need to assign values to many columns (and sometimes a value taken from another column in that row) when the condition is met.

The condition is not the problem.

I need an efficient way to do this:

df.loc[some_condition,  # the condition itself doesn't matter
       ['a','b','c','d','e','f','g','x','y']] = df['z'],1,3,4,5,6,7,8,df['p']

Simplified example data

import pandas as pd

d = {'var': pd.Series([10, 61]),
     'c': pd.Series([100, 0]),
     'z': pd.Series(['x', 'x']),
     'y': pd.Series([None, None]),
     'x': pd.Series([None, None])}
df = pd.DataFrame(d)

Condition: if var is not missing and its first digit is less than 5
Result: set df.x = df.z and df.y = 1

Here is pseudocode that doesn't work, but it is what I would want.

df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x','y']]=df['z'],1

but I get

ValueError: cannot set using a list-like indexer with a different length than the value

Ideal output

     c  var     x  z     y
0  100   10     x  x     1
1    0   61  None  x  None

The code below works, but it is too inefficient because I need to assign values to multiple columns.

df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x']]=df['z']
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['y']]=1

Answer 1

Accepted answer by elyase

You can work row-wise:

def f(row):
    # pd.notnull also rejects NaN, which "is not None" would let through
    if pd.notnull(row['var']) and int(str(row['var'])[0]) < 5:
        row[['x', 'y']] = row['z'], 1
    return row

>>> df.apply(f, axis=1)
     c  var     x   y  z
0  100   10     x   1  x
1    0   61  None NaN  x

To overwrite the original df:

df = df.apply(f, axis=1)
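
For reference, here is a self-contained sketch of the row-wise approach that can be run as is. The third row with a missing var is hypothetical sample data added only to exercise the "not missing" part of the condition, and copying the row inside f is a precaution on my part, not something the answer requires.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the question's frame plus one row with a missing 'var'.
df = pd.DataFrame({
    'var': [10, 61, np.nan],
    'c': [100, 0, 5],
    'z': ['x', 'x', 'x'],
    'y': [None, None, None],
    'x': [None, None, None],
})

def f(row):
    row = row.copy()  # work on a copy rather than the object pandas passes in
    value = row['var']
    # pd.notnull is False for both None and NaN, so missing values are skipped
    if pd.notnull(value) and int(str(value)[0]) < 5:
        row['x'] = row['z']
        row['y'] = 1
    return row

result = df.apply(f, axis=1)
print(result)
```

Row-wise apply calls f once per row in Python, so it is easy to read but will likely be slow on large frames.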

Answer 2

Answer by YS-L

This is one way of doing it:

import pandas as pd
import numpy as np

d = {'var' : pd.Series([1,6]),
'c' : pd.Series([100,0]),
'z' : pd.Series(['x','x']),
'y' : pd.Series([None,None]),
'x' : pd.Series([None,None])}
df = pd.DataFrame(d)

# Condition 1: if var is not missing
cond1 = ~df['var'].apply(np.isnan)
# Condition 2: first number is less than 5
cond2 = df['var'].apply(lambda x: int(str(x)[0])) < 5
mask = cond1 & cond2
# .ix was removed in pandas 1.0; .loc accepts the same Boolean mask
df.loc[mask, 'x'] = df.loc[mask, 'z']
df.loc[mask, 'y'] = 1
print(df)

Output:

     c  var     x     y  z
0  100    1     x     1  x
1    0    6  None  None  x

As you can see, the Boolean mask has to be applied on both sides of the assignment, and you need to broadcast the value 1 on the y column. It is probably cleaner to split the steps into multiple lines.
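
A minimal sketch of the same mask-based assignment that also tolerates missing values in var. The extra NaN row and the reindex step are my own additions for illustration; the idea is to build the mask only from the non-missing values and then fill the gaps with False so it has one entry per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing 'var'.
df = pd.DataFrame({
    'var': [10, 61, np.nan],
    'z': ['x', 'x', 'x'],
    'x': [None, None, None],
    'y': [None, None, None],
})

# First digit of every non-missing value.
first_digit = df['var'].dropna().astype(str).str[0].astype(int)

# Rows dropped above are put back as False, so the mask covers the whole frame.
mask = (first_digit < 5).reindex(df.index, fill_value=False)

df.loc[mask, 'x'] = df.loc[mask, 'z']  # copy from another column, same rows
df.loc[mask, 'y'] = 1                  # scalar is broadcast to the selected rows
print(df)
```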

Edit (after the question was updated): More generally, since some assignments depend on other columns and others simply broadcast a constant along a column, you can do it in two steps:

# .to_numpy() passes a plain array, so 'z'/'p' are not aligned by label against 'a'/'y'
df.loc[conds, ['a','y']] = df.loc[conds, ['z','p']].to_numpy()
df.loc[conds, ['b','c','d','e','f','g','x']] = [1,3,4,5,6,7,8]

You may profile and see if this is efficient enough for your use case.
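
Here is a small runnable sketch of the two-step pattern. The frame, the column subset (a, y, b, c, d) and the condition are hypothetical stand-ins for the question's real data. It assumes, as far as I know, that pandas aligns a DataFrame on the right-hand side by column label, which is why the copied values are converted to a plain array first.

```python
import pandas as pd

# Hypothetical data: 'z' and 'p' are sources, the None columns are targets.
df = pd.DataFrame({
    'var': [10, 61, 23],
    'z': ['x', 'x', 'x'],
    'p': ['q', 'q', 'q'],
    'a': [None] * 3,
    'y': [None] * 3,
    'b': [None] * 3,
    'c': [None] * 3,
    'd': [None] * 3,
})

# Stand-in condition: first digit of 'var' is below 5.
conds = df['var'].astype(str).str[0].astype(int) < 5

# Step 1: values that come from other columns of the same rows.
# to_numpy() drops the labels so 'z' -> 'a' and 'p' -> 'y' are matched by position.
df.loc[conds, ['a', 'y']] = df.loc[conds, ['z', 'p']].to_numpy()

# Step 2: constants, one per target column, broadcast down the selected rows.
df.loc[conds, ['b', 'c', 'd']] = [1, 3, 4]
print(df)
```

Each step is one vectorized assignment, so the number of statements stays at two no matter how many columns are involved.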
