Mapping ranges of values in pandas dataframe
Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and author information and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/50098025/
Asked by E. Sommer
Apologies if this has been asked before; I searched extensively without finding an answer.
import pandas as pd
import numpy as np
df = pd.DataFrame(data = np.random.randint(1,10,10),columns=['a'])
   a
0  7
1  8
2  8
3  3
4  1
5  1
6  2
7  8
8  6
9  6
I'd like to create a new column b that maps several values of a according to some rule, say a = [1,2,3] is 1, a = [4,5,6,7] is 2, and a = [8,9,10] is 3. One-to-one mapping is clear to me, but what if I want to map by a list of values or a range?
I thought along these lines...
df['b'] = df['a'].map({[1,2,3]:1,range(4,7):2,[8,9,10]:3})
Answer by jpp
There are a few alternatives.
Pandas via pd.cut / NumPy via np.digitize
You can construct a list of bin boundaries, then pass it to a specialist library function such as pd.cut or np.digitize. This is described in @EdChum's solution, and also in this answer.
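The np.digitize route is not spelled out in the answer, so here is a minimal sketch of what it might look like. The sample data is copied from the question's printed frame, and the choice of bins = [3, 7] with right=True is an assumption made to reproduce the 1-3 / 4-7 / 8-10 grouping.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question's printed output
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# Upper edges of the first two groups (assumed here for illustration).
# With right=True, x lands in bin i when bins[i-1] < x <= bins[i];
# values above the last edge get index len(bins).
bins = [3, 7]

# np.digitize returns zero-based bin indices, so add 1 to get labels 1, 2, 3
df['b'] = np.digitize(df['a'].to_numpy(), bins, right=True) + 1
print(df)
```

Unlike pd.cut, np.digitize never yields missing values: anything below the first edge goes into the first group and anything above the last edge into the last one, so out-of-range inputs need a separate check if that matters.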
NumPy via np.select
df = pd.DataFrame(data=np.random.randint(1,10,10), columns=['a'])
criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
values = [1, 2, 3]
df['b'] = np.select(criteria, values, 0)
The elements of criteria are Boolean series, so for lists of values you can use df['a'].isin([1, 3]), etc.
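As a sketch of the list-of-values variant mentioned above, the same np.select call can take isin conditions instead of between. The sample data is copied from the question's printed frame and the lists are the ones the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question's printed output
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# One Boolean Series per group, built from explicit lists of values
criteria = [
    df['a'].isin([1, 2, 3]),
    df['a'].isin([4, 5, 6, 7]),
    df['a'].isin([8, 9, 10]),
]
values = [1, 2, 3]

# Rows matching none of the conditions fall back to the default, 0
df['b'] = np.select(criteria, values, default=0)
print(df)
```

This form also works when the groups are not contiguous, which between cannot express.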
Dictionary mapping via range
d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}
df['c'] = df['a'].apply(lambda x: next((v for k, v in d.items() if x in k), 0))
print(df)
   a  b  c
0  1  1  1
1  7  2  2
2  5  2  2
3  1  1  1
4  3  1  1
5  5  2  2
6  4  2  2
7  4  2  2
8  9  3  3
9  3  1  1
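A possible variation on the range dictionary, not part of the original answer: expand the ranges into a flat one-to-one dictionary once, then use Series.map, which avoids scanning every range for every row. The name flat is hypothetical, and the sample data is copied from the question's printed frame.

```python
import pandas as pd

# Sample data taken from the question's printed output
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}

# Expand each range key into individual integer keys (hypothetical helper dict)
flat = {x: v for r, v in d.items() for x in r}

# Values missing from the dict map to NaN; replace them with 0 to mirror
# the default used in the apply-based version
df['c'] = df['a'].map(flat).fillna(0).astype(int)
print(df)
```

This only suits integer data with reasonably small ranges, since every value in each range becomes a dictionary key.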
Answer by EdChum
IIUC you could use cut to achieve this:
In[33]:
pd.cut(df['a'], bins=[0,3,7,11], right=True, labels=False)+1
Out[33]:
0 2
1 3
2 3
3 1
4 1
5 1
6 1
7 3
8 2
9 2
Here you pass the cutoff values to cut, which categorises your values. Passing labels=False gives each value the ordinal (zero-based) index of its bin, so you just add 1 to get labels starting at 1.
Here you can see how the cuts were calculated:
In[34]:
pd.cut(df['a'], bins=[0,3,7,11], right=True)
Out[34]:
0 (3, 7]
1 (7, 11]
2 (7, 11]
3 (0, 3]
4 (0, 3]
5 (0, 3]
6 (0, 3]
7 (7, 11]
8 (3, 7]
9 (3, 7]
Name: a, dtype: category
Categories (3, interval[int64]): [(0, 3] < (3, 7] < (7, 11]]
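As an alternative to labels=False followed by +1, pd.cut also accepts an explicit list of labels, one per bin. The sketch below assumes every value falls inside the bins (otherwise cut produces NaN and the integer conversion would fail); the sample data is copied from the question's printed frame.

```python
import pandas as pd

# Sample data taken from the question's printed output
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# Bins are (0, 3], (3, 7], (7, 11]; labels are assigned in that order
binned = pd.cut(df['a'], bins=[0, 3, 7, 11], right=True, labels=[1, 2, 3])

# cut returns a categorical Series; convert to plain integers for column b
df['b'] = binned.astype(int)
print(df)
```

Keeping the categorical result instead of converting it can be useful when the groups are later used for grouping or ordered comparisons.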