Pandas DataFrame Apply

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11794935/

Time: 2020-09-13 15:47:20  Source: igfitidea

Tags: python, pandas

Asked by Freddie Witherden

I have a Pandas DataFrame with four columns, A, B, C, D. It turns out that, sometimes, the values of B and C can be 0. I therefore wish to obtain the following:

B[i] = B[i] if B[i] else min(A[i], D[i])
C[i] = C[i] if C[i] else max(A[i], D[i])

where I have used i to indicate a run over all rows of the frame. With Pandas it is easy to find the rows in which one of these columns is zero:

df[df.B == 0] and df[df.C == 0]

However, I have no idea how to easily perform the above transformation. I can think of various inefficient and inelegant methods (for loops over the entire frame) but nothing simple.

Answered by Wouter Overmeire

A combination of boolean indexing and apply can do the trick. Below is an example of replacing the zero elements in column C.

In [22]: df
Out[22]:
   A  B  C  D
0  8  3  5  8
1  9  4  0  4
2  5  4  3  8
3  4  8  5  1

In [23]: bi = df.C==0

In [24]: df.loc[bi, 'C'] = df.loc[bi, ['A', 'D']].apply(max, axis=1)

In [25]: df
Out[25]:
   A  B  C  D
0  8  3  5  8
1  9  4  9  4
2  5  4  3  8
3  4  8  5  1
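
As a minimal sketch, the same boolean-indexing idea can be applied to both columns from the question in a standalone script. The sample frame below reuses the values from the session above, except that one zero has been placed in column B (an assumption made purely for illustration), and the built-in row-wise min and max methods stand in for apply. The names zero_b and zero_c are hypothetical.

```python
import pandas as pd

# Sample frame from the session above; the zero in column B (row 2)
# is an assumption added so that both replacements have something to do.
df = pd.DataFrame({
    "A": [8, 9, 5, 4],
    "B": [3, 4, 0, 8],
    "C": [5, 0, 3, 5],
    "D": [8, 4, 8, 1],
})

# Boolean masks selecting the rows where B or C is zero
zero_b = df["B"] == 0
zero_c = df["C"] == 0

# Replace zeros in B with the row-wise minimum of A and D,
# and zeros in C with the row-wise maximum of A and D.
df.loc[zero_b, "B"] = df.loc[zero_b, ["A", "D"]].min(axis=1)
df.loc[zero_c, "C"] = df.loc[zero_c, ["A", "D"]].max(axis=1)

print(df)
```

Only the masked rows are touched, so non-zero values of B and C are left unchanged.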

Answered by THM

Try the 'iterrows' DataFrame method to iterate through the rows of a DataFrame. See chapter 6.7.2 of the pandas 0.8.1 guide.

import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3], 'B': [0, 0, 0], 'C': [0, 0, 0], 'D': [3, 4, 5]})

# iterrows() yields a copy of each row, so write results back with df.at
for idx, row in df.iterrows():
    if row['B'] == 0:
        df.at[idx, 'B'] = min(row['A'], row['D'])
    if row['C'] == 0:
        # the question asks for max(A, D) in column C
        df.at[idx, 'C'] = max(row['A'], row['D'])
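
For comparison, the loop above can probably be avoided altogether. The following is a sketch of one possible loop-free variant, not the answerer's method: it uses Series.mask, which replaces values wherever a condition is true. The variable names row_min and row_max are hypothetical, and column C takes the maximum of A and D, as the question asks.

```python
import pandas as pd

# Same sample frame as in the answer above
df = pd.DataFrame({"A": [5, 6, 3], "B": [0, 0, 0], "C": [0, 0, 0], "D": [3, 4, 5]})

# Row-wise minimum and maximum of columns A and D
row_min = df[["A", "D"]].min(axis=1)
row_max = df[["A", "D"]].max(axis=1)

# Series.mask(cond, other) keeps the original value where cond is False
# and takes the value from `other` where cond is True.
df["B"] = df["B"].mask(df["B"] == 0, row_min)
df["C"] = df["C"].mask(df["C"] == 0, row_max)

print(df)
```

Because the replacement values are computed for whole columns at once, this form avoids a Python-level loop over the rows.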