Original source: http://stackoverflow.com/questions/19346033/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to collectively set the values of multiple columns for certain selected rows for Pandas dataframe?
Asked by bigbug
I have a dataframe df which has 'TPrice', 'THigh', 'TLow', 'TOpen', 'TClose', 'TPCLOSE' columns, and now I want to set the 'TPrice', 'THigh', 'TLow', 'TOpen', 'TClose' column values to be the same as the 'TPCLOSE' column for the rows whose TPrice column value is zero.
Here are some rows whose TPrice is 0:
>>> df[df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE
13       0      0     0      0       0     4.19
19       0      0     0      0       0     7.74
32       0      0     0      0       0     3.27
43       0      0     0      0       0    12.98
60       0      0     0      0       0     7.48
Then I try the assignment:
>>> df[df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose']] = df['TPCLOSE']
But pandas doesn't actually change df, because the code below still finds rows whose values are zero:
>>> df[df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE
13       0      0     0      0       0     4.19
19       0      0     0      0       0     7.74
32       0      0     0      0       0     3.27
43       0      0     0      0       0    12.98
60       0      0     0      0       0     7.48
So how can I do this?
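For context, the chained form `df[rows][cols] = value` most likely has no effect because the first indexing step, `df[rows]`, builds a new DataFrame, so the assignment lands on that temporary object rather than on df. A minimal sketch of the effect, using a small made-up frame (the values and the name `selected` are illustrative, not the original quote data):

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the quote data in the question.
df = pd.DataFrame({
    "TPrice": [0.0, 5.0, 0.0],
    "THigh": [0.0, 5.5, 0.0],
    "TPCLOSE": [4.19, 7.74, 3.27],
})
mask = df["TPrice"] == 0

with warnings.catch_warnings():
    # Older pandas versions may emit SettingWithCopyWarning here.
    warnings.simplefilter("ignore")
    selected = df[mask]                      # boolean indexing builds a new DataFrame
    selected["THigh"] = selected["TPCLOSE"]  # changes only `selected`

print(selected["THigh"].tolist())  # [4.19, 3.27]
print(df["THigh"].tolist())        # [0.0, 5.5, 0.0], df is untouched
```

Writing through a single `.loc[rows, cols]` call on df itself avoids the intermediate object.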
Update after trying Jeff's solution:
>>> quote_df = get_quote()
>>> quote_df[quote_df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE','RT','TVol']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE    RT  TVol
13       0      0     0      0       0     4.19  -100     0
32       0      0     0      0       0     3.27  -100     0
43       0      0     0      0       0    12.98  -100     0
45       0      0     0      0       0    26.74  -100     0
60       0      0     0      0       0     7.48  -100     0
>>> row_selection = quote_df['TPrice']==0
>>> col_selection = ['THigh','TLow','TOpen','TClose']
>>> for col in col_selection:
...     quote_df.loc[row_selection, col] = quote_df['TPCLOSE']
...
>>> quote_df[quote_df['TPrice']==0][['TPrice','THigh','TLow','TOpen','TClose','TPCLOSE','RT','TVol']][0:5]
    TPrice  THigh  TLow  TOpen  TClose  TPCLOSE    RT  TVol
13       0   4.19  4.19   4.19    4.19     4.19  -100     0
32       0   4.19  4.19   4.19    4.19     3.27  -100     0
43       0   4.19  4.19   4.19    4.19    12.98  -100     0
45       0   4.19  4.19   4.19    4.19    26.74  -100     0
60       0   4.19  4.19   4.19    4.19     7.48  -100     0
>>>
Answered by Jeff
This operation is not automatically broadcast, so you need to do something like this:
In [16]: from pandas import DataFrame
In [17]: df = DataFrame(dict(A = [1,2,0,0,0],B=[0,0,0,10,11],C=[3,4,5,6,7]))
In [18]: df
Out[18]:
   A   B  C
0  1   0  3
1  2   0  4
2  0   0  5
3  0  10  6
4  0  11  7
If you are modifying A (as you are here), compute the mask of rows you want to change first; otherwise the selected rows might change as you go.
In [19]: mask = df['A'] == 0
In [20]: for col in ['A','B']:
....:     df.loc[mask,col] = df['C']
....:
In [21]: df
Out[21]:
   A  B  C
0  1  0  3
1  2  0  4
2  5  5  5
3  6  6  6
4  7  7  7
Making this more natural requires a change in pandas: you are assigning a Series on the right-hand side to a DataFrame on the left-hand side, which right now doesn't broadcast the way you would expect. See https://github.com/pydata/pandas/issues/5206
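If the loop feels awkward, one possible alternative is to build a right-hand side that already has the shape of the selected block, so nothing needs to be broadcast. This is only a sketch on the same toy frame, assuming NumPy is available; the names `cols` and `fill` are illustrative and not part of the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 0, 0, 0], "B": [0, 0, 0, 10, 11], "C": [3, 4, 5, 6, 7]})

cols = ["A", "B"]
mask = df["A"] == 0  # compute the row mask once, before A is modified

# Take the replacement values for the masked rows and repeat them once per
# target column, giving a block of shape (number of masked rows, len(cols)).
fill = np.repeat(df.loc[mask, "C"].to_numpy()[:, None], len(cols), axis=1)

# A single .loc assignment writes the whole block into df itself.
df.loc[mask, cols] = fill
print(df)
```

The per-column loop is arguably easier to read; the block form mainly helps when many columns are involved.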
Answered by Def_Os
>>> import pandas as pd
>>> test=pd.DataFrame({'A': [0,1,2], 'B': [3,4,5], 'C': [6,7,8]})
>>> test
   A  B  C
0  0  3  6
1  1  4  7
2  2  5  8
>>> test.apply(lambda x: x.where(test.A!=0, test.C), axis=0)
   A  B  C
0  6  6  6
1  1  4  7
2  2  5  8
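Here `where` keeps each value where the condition is True and takes the replacement from `test.C` elsewhere, and `apply` runs it on every column. The call returns a new frame, so `test` itself stays unchanged unless the result is assigned back. A small sketch of how the same idea might be limited to chosen columns and stored in place (the names `cols` and `keep` are assumptions for illustration):

```python
import pandas as pd

test = pd.DataFrame({"A": [0, 1, 2], "B": [3, 4, 5], "C": [6, 7, 8]})

cols = ["A", "B"]       # only these columns are overwritten
keep = test["A"] != 0   # rows where the existing values are kept

# Series.where keeps values where `keep` is True and uses test["C"] elsewhere.
test[cols] = test[cols].apply(lambda col: col.where(keep, test["C"]))
print(test)
```

Computing `keep` before the assignment matters here for the same reason as the mask in the loop-based answer: column A is both the condition and one of the targets.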

