pandas np.where with multiple return values

Question

Asked by DGraham

Using pandas and numpy, I am trying to process a column in a dataframe and create a new column with values derived from it. So if column x contains the value 1, the new column should contain a; for the value 2 it should contain b, and so on.

I can do this for a single condition, i.e.:

df['new_col'] = np.where(df['col_1'] == 1, 'a', 'n/a')

I can also find examples of multiple conditions, i.e. if x = 3 or x = 4 the value should be a, but not of how to do something like: if x = 3 the value should be a, and if x = 4 the value should be c.

I tried simply running two lines of code, such as:

df['new_col'] = np.where(df['col_1'] == 1, 'a', 'n/a')
df['new_col'] = np.where(df['col_1'] == 2, 'b', 'n/a')

But obviously the second line overwrites the result of the first. Am I missing something crucial?

Answer 1

Answered by jezrael

I think you can use loc:

df.loc[df['col_1'] == 1, 'new_col'] = 'a'
df.loc[df['col_1'] == 2, 'new_col'] = 'b'

Or:

# 'a' and 'b' stand in for the desired values; None leaves unmatched rows empty
df['new_col'] = np.where(df['col_1'] == 1, 'a', np.where(df['col_1'] == 2, 'b', None))
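With more than two or three cases, nested np.where calls become hard to read. One alternative, which is not part of the answer above, is np.select: it takes a list of conditions and a matching list of values. The following is only a minimal sketch; the sample frame and the labels 'a', 'b' and 'n/a' are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({'col_1': [1, 2, 3, 2]})

# One condition per output value, in matching order
conditions = [df['col_1'] == 1, df['col_1'] == 2]
values = ['a', 'b']

# Rows that match none of the conditions receive the default
df['new_col'] = np.select(conditions, values, default='n/a')
print(df)
```

If several conditions are true for the same row, np.select uses the first matching one, so the order of the lists matters.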

Answer 2

Answered by Stop harming Monica

I think numpy's choose() is the best option for you.

import numpy as np
choices = 'abcde'
N = 10
np.random.seed(0)
data = np.random.randint(1, len(choices) + 1, size=N)
print(data)
print(np.choose(data - 1, choices))

Output:

[5 1 4 4 4 2 4 3 5 1]
['e' 'a' 'd' 'd' 'd' 'b' 'd' 'c' 'e' 'a']
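To relate this back to the question, the same call can probably be applied to a DataFrame column. This is a sketch under assumptions: the frame and the choices list are made up, and every value in col_1 is assumed to lie between 1 and len(choices), because np.choose raises an error for out-of-range indices in its default mode.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; every value must be in 1..len(choices)
df = pd.DataFrame({'col_1': [1, 2, 4, 2]})
choices = ['a', 'b', 'c', 'd']  # value k maps to choices[k - 1]

# np.choose expects 0-based indices, so shift the 1-based values down by one
df['new_col'] = np.choose(df['col_1'].to_numpy() - 1, choices)
print(df)
```

This only fits when the column holds small consecutive integers; for arbitrary keys a lookup by dict is likely the better match.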

Answer 3

Answered by SpeedCoder5

Use pandas Series.map instead of where.

import pandas as pd
df = pd.DataFrame({'col_1' : [1,2,4,2]})
print(df)

def ab_ify(v):
    if v == 1:
        return 'a'
    elif v == 2:
        return 'b'
    else:
        return None

df['new_col'] = df['col_1'].map(ab_ify)
print(df)

# output:
#
#    col_1
# 0      1
# 1      2
# 2      4
# 3      2
#    col_1 new_col
# 0      1       a
# 1      2       b
# 2      4    None
# 3      2       b

Answer 4

Answered by rde

You could define a dict with your desired transformations, then loop through the DataFrame column and fill the new column.

There may be more elegant ways, but this will work:

import numpy as np
import pandas as pd

# create a dummy DataFrame
df = pd.DataFrame(np.random.randint(2, size=(6, 4)),
                  columns=['col_1', 'col_2', 'col_3', 'col_4'],
                  index=range(6))

# create a dict with your desired substitutions:
swap_dict = {0: 'a',
             1: 'b',
             999: 'zzz'}

# introduce new column and fill with swapped information:
for i in df.index:
    df.loc[i, 'new_col'] = swap_dict[df.loc[i, 'col_1']]

print(df)

This returns something like:

   col_1  col_2  col_3  col_4 new_col
0      1      1      1      1       b
1      1      1      1      1       b
2      0      1      1      0       a
3      0      1      0      0       a
4      0      0      1      1       a
5      0      0      1      0       a
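As a possible "more elegant way", the row-by-row loop can likely be replaced by passing the dict directly to Series.map, which performs the lookup for the whole column at once. The sketch below reuses the swap_dict from the answer; the fixed col_1 values are an assumption chosen to mirror the sample output.

```python
import pandas as pd

# Fixed sample values (assumed) instead of random data, for reproducibility
df = pd.DataFrame({'col_1': [1, 1, 0, 0, 0, 0]})

swap_dict = {0: 'a', 1: 'b', 999: 'zzz'}

# map looks up every value of col_1 in the dict in one vectorized call
df['new_col'] = df['col_1'].map(swap_dict)
print(df)
```

One difference from the loop: a value that is missing from the dict becomes NaN here, whereas the loop would raise a KeyError.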
