Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19346033/

Time: 2020-09-13 21:14:17  Source: igfitidea

How to collectively set the values of multiple columns for selected rows of a pandas DataFrame?

python pandas

Asked by bigbug

I have a dataframe df which has 'TPrice', 'THigh', 'TLow', 'TOpen', 'TClose' and 'TPCLOSE' columns. For the rows whose TPrice value is zero, I want to set the 'TPrice', 'THigh', 'TLow', 'TOpen' and 'TClose' columns to the value of the 'TPCLOSE' column.

Show some rows whose TPrice is 0:

>>> df[df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE
13       0      0     0      0       0     4.19
19       0      0     0      0       0     7.74
32       0      0     0      0       0     3.27
43       0      0     0      0       0    12.98
60       0      0     0      0       0     7.48

Then I try the assignment:

>>> df[df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose']] = df['TPCLOSE']

But pandas doesn't actually change df; the query below still finds rows whose TPrice is 0:

>>> df[df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE
13       0      0     0      0       0     4.19
19       0      0     0      0       0     7.74
32       0      0     0      0       0     3.27
43       0      0     0      0       0    12.98
60       0      0     0      0       0     7.48
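
A likely reason, sketched on a hypothetical toy frame (the name `toy` and its values are made up for illustration): selecting rows with a boolean mask builds a new object, so the chained assignment writes into that temporary copy and the original frame is never touched.

```python
import warnings

import pandas as pd

# Hypothetical toy data; float columns keep the dtypes consistent
toy = pd.DataFrame({
    "TPrice": [0.0, 5.0, 0.0],
    "THigh": [0.0, 5.5, 0.0],
    "TPCLOSE": [4.19, 5.1, 7.74],
})
before = toy.copy()

with warnings.catch_warnings():
    # pandas may warn about chained assignment here; silenced for the demo
    warnings.simplefilter("ignore")
    # toy[...] with a boolean mask returns a copy, so this assignment
    # modifies a temporary object that is discarded right away
    toy[toy["TPrice"] == 0][["TPrice", "THigh"]] = -1.0

# The original frame is unchanged
print(toy.equals(before))  # True
```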

So how should I do this?

Update after trying Jeff's solution:

>>> quote_df = get_quote()
>>> quote_df[quote_df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE','RT','TVol']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE   RT  TVol
13       0      0     0      0       0     4.19 -100     0
32       0      0     0      0       0     3.27 -100     0
43       0      0     0      0       0    12.98 -100     0
45       0      0     0      0       0    26.74 -100     0
60       0      0     0      0       0     7.48 -100     0
>>> row_selection = quote_df['TPrice']==0
>>> col_selection = ['THigh','TLow','TOpen','TClose']
>>> for col in col_selection:
...     quote_df.loc[row_selection, col] = quote_df['TPCLOSE']
... 
>>> quote_df[quote_df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE','RT','TVol']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE   RT  TVol
13       0   4.19  4.19   4.19    4.19     4.19 -100     0
32       0   4.19  4.19   4.19    4.19     3.27 -100     0
43       0   4.19  4.19   4.19    4.19    12.98 -100     0
45       0   4.19  4.19   4.19    4.19    26.74 -100     0
60       0   4.19  4.19   4.19    4.19     7.48 -100     0

Answered by Jeff

Assigning a Series to a selection that spans several columns is not broadcast automatically, so you need to assign one column at a time, like this:

In [16]: from pandas import DataFrame

In [17]: df = DataFrame(dict(A = [1,2,0,0,0],B=[0,0,0,10,11],C=[3,4,5,6,7]))

In [18]: df
Out[18]: 
   A   B  C
0  1   0  3
1  2   0  4
2  0   0  5
3  0  10  6
4  0  11  7

Compute the row mask first. You are modifying A, the column the mask is built from (as you are here), so the selected rows might otherwise change as you go:

In [19]: mask = df['A'] == 0

In [20]: for col in ['A','B']:
   ....:     df.loc[mask,col] = df['C']
   ....:     

In [21]: df
Out[21]: 
   A  B  C
0  1  0  3
1  2  0  4
2  5  5  5
3  6  6  6
4  7  7  7
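
To illustrate why the mask is saved before the loop, here is a small sketch on the same data (the name `recomputed` is only for this illustration). Once A has been overwritten, re-evaluating `df['A'] == 0` no longer selects the original rows, while the saved mask still does.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame(dict(A=[1, 2, 0, 0, 0], B=[0, 0, 0, 10, 11], C=[3, 4, 5, 6, 7]))

mask = df["A"] == 0          # computed once, up front
df.loc[mask, "A"] = df["C"]  # A is the column the mask was built from

# Re-evaluating the condition now selects nothing: A has no zeros left
recomputed = df["A"] == 0
print(mask.tolist())        # [False, False, True, True, True]
print(recomputed.tolist())  # [False, False, False, False, False]

# The saved mask still points at the originally selected rows
df.loc[mask, "B"] = df["C"]
print(df["B"].tolist())     # [0, 0, 5, 6, 7]
```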

pandas needs a change to make this more natural: you are assigning a Series on the rhs to a DataFrame on the lhs, which right now doesn't broadcast the way you would expect. See https://github.com/pydata/pandas/issues/5206
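
One way to avoid the explicit loop, offered as a sketch and not as the behaviour the linked issue asks for: `DataFrame.mask` accepts a Series as `other` together with `axis=0`, which lines the Series up with the rows and reuses it for every selected column. The data is the same toy frame as above; the name `cols` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 0, 0, 0], B=[0, 0, 0, 10, 11], C=[3, 4, 5, 6, 7]))

mask = df["A"] == 0   # rows to overwrite, computed before A changes
cols = ["A", "B"]     # columns that should receive the value of C

# Where mask is True, take the value of C from the same row;
# axis=0 aligns the Series with the row index for each column in cols
df[cols] = df[cols].mask(mask, df["C"], axis=0)

print(df)
```

The result should match the loop version: rows 2 to 4 of A and B take the values 5, 6 and 7 from C, and the other rows stay as they were.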

Answered by Def_Os

>>> import pandas as pd
>>> test=pd.DataFrame({'A': [0,1,2], 'B': [3,4,5], 'C': [6,7,8]})
>>> test
   A  B  C
0  0  3  6
1  1  4  7
2  2  5  8
>>> test.apply(lambda x: x.where(test.A!=0, test.C), axis=0)
   A  B  C
0  6  6  6
1  1  4  7
2  2  5  8
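
Note that `apply` returns a new frame and leaves `test` as it was. A minimal sketch of writing the result back to only the columns of interest follows; picking `['A', 'B']` as the targets and the names `keep` and `targets` are assumptions for illustration.

```python
import pandas as pd

test = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5], 'C': [6, 7, 8]})

keep = test['A'] != 0   # rows to leave untouched, computed before A changes
targets = ['A', 'B']    # hypothetical choice of columns to overwrite

# Series.where keeps values where `keep` is True and takes C elsewhere
test[targets] = test[targets].apply(lambda col: col.where(keep, test['C']))

print(test)
```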