Python Pandas - Conditionally select the source column for a new column based on row values

Question

Asked by aensm

Is there a pandas function that allows selection from different columns based on a condition? This is analogous to a CASE expression in a SQL SELECT clause. For example, say I have the following DataFrame:

import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y')
)

I want to select from column 'x' when Country == 'USA', and from column 'y' when Country == 'Canada', resulting in something like the following:

  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

Answer 1

Accepted answer by falsetru

Using the other argument of where (here Series.where, the Series counterpart of DataFrame.where) and pandas.concat:

>>> import pandas as pd
>>>
>>> foo = pd.DataFrame([
...     ['USA',1,2],
...     ['Canada',3,4],
...     ['Canada',5,6]
... ], columns=('Country', 'x', 'y'))
>>>
>>> z = foo['x'].where(foo['Country'] == 'USA', foo['y'])
>>> pd.concat([foo['Country'], z], axis=1)
  Country  x
0     USA  1
1  Canada  4
2  Canada  6

If you want z as the column name, specify keys:

>>> pd.concat([foo['Country'], z], keys=['Country', 'z'], axis=1)
  Country  z
0     USA  1
1  Canada  4
2  Canada  6
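
If the goal is to keep x and y and simply add z to the original frame, as in the question's expected output, the same where call can probably just be assigned as a new column. A minimal sketch, assuming the foo frame from the question:

```python
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y'),
)

# Take 'x' where the condition holds; everywhere else fall back to 'y'.
foo['z'] = foo['x'].where(foo['Country'] == 'USA', foo['y'])
print(foo)
```

Note that this treats every non-USA row as the "else" branch, so it fits a two-way choice; more than two cases would need chained where calls or another approach.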

Answer 2

Answered by EdChum

This would work:

In [84]:

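import numpy as np

def func(row):
    # Return the value from the column that matches the row's country.
    if row['Country'] == 'USA':
        return row['x']
    if row['Country'] == 'Canada':
        return row['y']
    return np.nan

# Pass the function itself; apply calls it once per row when axis=1.
foo['z'] = foo.apply(func, axis=1)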
foo
Out[84]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

You can use loc:

In [137]:

foo.loc[foo['Country']=='Canada','z'] = foo['y']
foo.loc[foo['Country']=='USA','z'] = foo['x']
foo
Out[137]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

EDIT

Although unwieldy, using loc will scale better with larger DataFrames: apply calls the function once for every row, whereas boolean indexing is vectorised.
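
Another vectorised option that reads closer to a SQL CASE is numpy.select, which takes a list of conditions and a matching list of choices. This is a sketch rather than part of the original answer; the names conditions and choices are illustrative, and foo is assumed to be the frame from the question:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y'),
)

# One boolean mask per case, checked in order (like WHEN ... THEN ...).
conditions = [foo['Country'] == 'USA', foo['Country'] == 'Canada']
choices = [foo['x'], foo['y']]

# Rows matching no condition get the default, like ELSE in SQL.
foo['z'] = np.select(conditions, choices, default=np.nan)
print(foo)
```

Because the default is NaN, z comes back as a float column here; an integer default would keep it integer.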

Answer 3

Answered by Alexander McFarlane

Here is a generic solution for selecting from arbitrary columns given a value in another column.

This has the additional benefit of separating the lookup logic into a simple dict structure, which makes it easy to modify.

import pandas as pd
df = pd.DataFrame(
    [['UK', 'burgers', 4, 5, 6],
    ['USA', 4, 7, 9, 'make'],
    ['Canada', 6, 4, 6, 'you'],
    ['France', 3, 6, 'fat', 8]],
    columns = ('Country', 'a', 'b', 'c', 'd')
)

I extend this to the case where the conditional mapping is stored in an external lookup structure (a dict):

lookup = {'Canada': 'd', 'France': 'c', 'UK': 'a', 'USA': 'd'}

Loop over each entry stored in the dict and use the values in the lookup to determine which column of the pd.DataFrame to select from:

for k, v in lookup.items():  # dict.iteritems() was Python 2 only
    filt = df['Country'] == k
    df.loc[filt, 'result'] = df.loc[filt, v]  # modifies in place

To give the life lesson:

In [69]: df
Out[69]:
  Country        a  b    c     d   result
0      UK  burgers  4    5     6  burgers
1     USA        4  7    9  make     make
2  Canada        6  4    6   you      you
3  France        3  6  fat     8      fat
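
For a large lookup, the per-key loop can possibly be replaced with a single positional selection. The sketch below is an assumption-laden alternative, not the answer's method: it maps each country to its source column name, converts those names to column positions, and picks one cell per row from the underlying array. The names source_cols and col_idx are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['UK', 'burgers', 4, 5, 6],
     ['USA', 4, 7, 9, 'make'],
     ['Canada', 6, 4, 6, 'you'],
     ['France', 3, 6, 'fat', 8]],
    columns=('Country', 'a', 'b', 'c', 'd'),
)
lookup = {'Canada': 'd', 'France': 'c', 'UK': 'a', 'USA': 'd'}

# Name of the source column for each row, e.g. 'a' for the UK row.
source_cols = df['Country'].map(lookup)

# Integer position of each of those columns within df.
col_idx = df.columns.get_indexer(source_cols)

# Pick one cell per row: (row 0, col_idx[0]), (row 1, col_idx[1]), ...
df['result'] = df.to_numpy()[np.arange(len(df)), col_idx]
print(df)
```

One caveat: a country missing from lookup maps to NaN, get_indexer then returns -1, and the row would silently read the last column, so it is worth checking that (col_idx >= 0).all() before relying on the result.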

Answer 4

Answered by Super Mario

My try:

temp1 = foo[(foo['Country'] == 'Canada')][['Country', 'y']].rename(columns={'y': 'z'})
temp2 = foo[(foo['Country'] == 'USA')][['Country', 'x']].rename(columns={'x': 'z'})
wanted_df = pd.concat([temp1, temp2])
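
As written, wanted_df lists the Canada rows before the USA row and no longer carries x and y. If the original row order or the full frame is wanted, a small follow-up along these lines may help; it is a sketch that assumes foo is the frame from the question:

```python
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y'),
)

temp1 = foo[foo['Country'] == 'Canada'][['Country', 'y']].rename(columns={'y': 'z'})
temp2 = foo[foo['Country'] == 'USA'][['Country', 'x']].rename(columns={'x': 'z'})

# concat stacks the pieces in the order given, so sort by the
# preserved index labels to restore the original row order.
wanted_df = pd.concat([temp1, temp2]).sort_index()

# Assignment aligns on the index, attaching each z to its original row.
foo['z'] = wanted_df['z']
print(foo)
```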
