pandas: mapping ranges of values in a pandas DataFrame

Question

Asked by E. Sommer

Apologies if this has been asked before, but I looked extensively without results.

import pandas as pd    
import numpy as np    
df = pd.DataFrame(data = np.random.randint(1,10,10),columns=['a'])    

   a
0  7
1  8
2  8
3  3
4  1
5  1
6  2
7  8
8  6
9  6

I'd like to create a new column b that maps several values of a according to some rule, say a = [1,2,3] is 1, a = [4,5,6,7] is 2, a = [8,9,10] is 3. One-to-one mapping is clear to me, but what if I want to map by a list of values or a range?

I thought along these lines...

df['b'] = df['a'].map({[1,2,3]:1,range(4,7):2,[8,9,10]:3})

Answer 1

Answered by jpp

There are a few alternatives.

Pandas via `pd.cut` / NumPy via `np.digitize`

You can construct a list of boundaries, then use specialist library functions. This is described in @EdChum's solution, and also in this answer.
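
The `np.digitize` route mentioned above could look roughly like this. It is only a sketch: the sample data is copied from the question's printed frame, and the `bins` list is an assumption chosen to reproduce the 1-3 / 4-7 / 8-10 grouping.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's printed DataFrame
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# Upper edges of the first two groups (assumed for this illustration).
# With right=True, a value x gets index i where bins[i-1] < x <= bins[i].
bins = [3, 7]

# 1..3 -> 0, 4..7 -> 1, 8 and above -> 2; add 1 to get the labels 1, 2, 3
df['b'] = np.digitize(df['a'].to_numpy(), bins, right=True) + 1

print(df)
```

Note that `np.digitize` has no upper bound here, so anything above 7 lands in group 3; add another edge if out-of-range values need separate handling.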

NumPy via `np.select`

df = pd.DataFrame(data=np.random.randint(1,10,10), columns=['a'])

criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
values = [1, 2, 3]

df['b'] = np.select(criteria, values, 0)

The elements of criteria are Boolean series, so for lists of values, you can use df['a'].isin([1, 3]), etc.
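
As a rough illustration of the `isin` variant, the lists from the question can be used directly as criteria. The sample data is copied from the question's printed frame; the rest is a sketch rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's printed DataFrame
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# One Boolean Series per group, built from explicit lists of values
criteria = [
    df['a'].isin([1, 2, 3]),
    df['a'].isin([4, 5, 6, 7]),
    df['a'].isin([8, 9, 10]),
]
values = [1, 2, 3]

# Rows matching none of the criteria get the default value 0
df['b'] = np.select(criteria, values, default=0)

print(df)
```

This form is handy when the groups are not contiguous, since each list can hold arbitrary values.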

Dictionary mapping via `range`

d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}

df['c'] = df['a'].apply(lambda x: next((v for k, v in d.items() if x in k), 0))

print(df)

   a  b  c
0  1  1  1
1  7  2  2
2  5  2  2
3  1  1  1
4  3  1  1
5  5  2  2
6  4  2  2
7  4  2  2
8  9  3  3
9  3  1  1
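
If the ranges are small, one possible variation is to expand the range-keyed dictionary into a flat one and pass it to `Series.map`, which avoids the Python-level `apply`. This is a sketch, not part of the original answer; the name `flat` is made up for illustration and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question's printed DataFrame
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}

# Expand each range into individual keys: {1: 1, 2: 1, 3: 1, 4: 2, ...}
flat = {x: v for r, v in d.items() for x in r}

# Values missing from the dict become NaN, so fill with 0 as the default
df['c'] = df['a'].map(flat).fillna(0).astype(int)

print(df)
```

The trade-off is memory: the flat dictionary holds one entry per integer, so this only suits narrow integer ranges.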

Answer 2

Answered by EdChum

IIUC, you could use cut to achieve this:

In[33]:
pd.cut(df['a'], bins=[0,3,7,11], right=True, labels=False)+1

Out[33]: 
0    2
1    3
2    3
3    1
4    1
5    1
6    1
7    3
8    2
9    2

Here you pass the cutoff values to cut, which categorises your values. Passing labels=False gives each value the ordinal (zero-based) index of its bin, so you just add 1 to the result.

Here you can see how the cuts were calculated:

In[34]:
pd.cut(df['a'], bins=[0,3,7,11], right=True)

Out[34]: 
0     (3, 7]
1    (7, 11]
2    (7, 11]
3     (0, 3]
4     (0, 3]
5     (0, 3]
6     (0, 3]
7    (7, 11]
8     (3, 7]
9     (3, 7]
Name: a, dtype: category
Categories (3, interval[int64]): [(0, 3] < (3, 7] < (7, 11]]
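
Instead of labels=False followed by +1, the target values can probably be passed straight to cut through its labels parameter. The sketch below assumes the same bins as above and uses the question's sample data; the final astype(int) is an added step to turn the categorical result into plain integers.

```python
import pandas as pd

# Sample data copied from the question's printed DataFrame
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# labels needs one entry per bin: (0, 3] -> 1, (3, 7] -> 2, (7, 11] -> 3.
# cut returns a categorical Series; astype(int) works here because no
# value falls outside the bins (which would produce NaN).
df['b'] = pd.cut(df['a'], bins=[0, 3, 7, 11], labels=[1, 2, 3]).astype(int)

print(df)
```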
