pandas DataFrame: add column whose values are the quantile number/rank of an existing column?
Original question: http://stackoverflow.com/questions/38356156/
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and authors, and attribute it to the original authors (not me): StackOverflow
Asked by luca
I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column.
I can use DataFrame.rank to rank a column, but then I don't know how to get the quantile number of this ranked value and add this quantile number as a new column.
Example: if this is my DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])
   a    b
0  1    1
1  2   10
2  3  100
3  4  100
and I'd like to know the quantile number (using 2 quantiles) of column b. I'd expect this result:
   a    b  quantile
0  1    1         1
1  2   10         1
2  3  100         2
3  4  100         2
Answered by luca
I discovered it is quite easy:
df['quantile'] = pd.qcut(df['b'], 2, labels=False)
   a    b  quantile
0  1    1         0
1  2   10         0
2  3  100         1
3  4  100         1
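The codes in this output start at 0, while the question asks for quantile numbers starting at 1. A minimal sketch of one way to shift them, assuming the same df as in the question (the + 1 offset is an addition, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

# labels=False makes qcut return 0-based bin codes;
# adding 1 turns them into quantile numbers 1 and 2.
df['quantile'] = pd.qcut(df['b'], 2, labels=False) + 1
print(df)
```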
Also interesting to know: the difference between pandas.qcut and pandas.cut.
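As a rough illustration of that difference, the sketch below bins one small Series both ways. The Series s and its values are hypothetical, chosen so that the two functions disagree:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 100])

# qcut: bin edges are sample quantiles, so each bin holds
# roughly the same number of values.
equal_count = pd.qcut(s, 2, labels=False)

# cut: bin edges split the value range into equal-width intervals,
# so a single large value can end up alone in its bin.
equal_width = pd.cut(s, 2, labels=False)

print(equal_count.tolist())
print(equal_width.tolist())
```

Here qcut splits at the median, while cut splits halfway between the minimum and the maximum.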
Answered by jeyoor
You can use DataFrame.quantile with q=[0.25, 0.5, 0.75] on the existing column to produce a quartile column.
Then, you can use DataFrame.rank on that quartile column.
See below for an example of adding a quartile column:
import pandas as pd

d = {'one': pd.Series([40., 45., 50., 55, 60, 65], index=['val1', 'val2', 'val3', 'val4', 'val5', 'val6'])}
df = pd.DataFrame(d)
# The rows of quantile_frame are labelled 0.25, 0.5 and 0.75
quantile_frame = df.quantile(q=[0.25, 0.5, 0.75])
quantile_ranks = []
for index, row in df.iterrows():
    # .loc is used here because the .ix indexer has been removed from pandas
    if row['one'] <= quantile_frame.loc[0.25, 'one']:
        quantile_ranks.append(1)
    elif row['one'] <= quantile_frame.loc[0.5, 'one']:
        quantile_ranks.append(2)
    elif row['one'] <= quantile_frame.loc[0.75, 'one']:
        quantile_ranks.append(3)
    else:
        quantile_ranks.append(4)
df['quartile'] = quantile_ranks
Note: There's probably a more idiomatic way to accomplish this with pandas, but it's beyond me.
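One candidate for that more idiomatic form is pd.qcut with four bins. This is a sketch, not the answer author's method; it rebuilds the same example frame and adds 1 so the quartiles are numbered from 1 as in the loop version:

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [40., 45., 50., 55., 60., 65.]},
    index=['val1', 'val2', 'val3', 'val4', 'val5', 'val6'],
)

# qcut with q=4 bins the column at its 25%, 50% and 75% quantiles.
# labels=False yields bin codes 0..3, so add 1 to number quartiles from 1.
df['quartile'] = pd.qcut(df['one'], 4, labels=False) + 1
print(df)
```

For this data the result matches the loop, since both put a value equal to a boundary in the lower quartile.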
Answered by feetwet
df['quantile'] = pd.qcut(df['b'], 2, labels=False) seems to tend to throw a SettingWithCopyWarning.
The only general way I have found of doing this without complaints is like:
quantiles = pd.qcut(df['b'], 2, labels=False)
df = df.assign(quantile=quantiles.values)
This will assign the quantile rank values as a new DataFrame column df['quantile'].
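A minimal sketch of this pattern on a filtered frame, which is a typical situation where in-place column assignment can trigger SettingWithCopyWarning in pandas versions that emit it. The names base, subset and result, and the data, are hypothetical:

```python
import pandas as pd

base = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 10, 100, 100, 1000]})

# A boolean-mask selection: writing a new column into it in place
# is the kind of assignment that can raise the warning.
subset = base[base['a'] <= 4]

quantiles = pd.qcut(subset['b'], 2, labels=False)

# assign returns a new DataFrame and leaves subset untouched.
result = subset.assign(quantile=quantiles.values)
print(result)
```

Because assign builds a new object, the result has to be bound to a name (here result) rather than relying on subset being modified.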
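Answered by Abhishek Singh
import math

# Sort by b, then expose the 0-based sorted position as a row_num column
df.sort_values(['b'], inplace=True)
df.reset_index(inplace=True, drop=True)
df.reset_index(inplace=True)
df.rename(columns={'index': 'row_num'}, inplace=True)
# Map each sorted position to a decile number from 1 to 10
df['quantile'] = df['row_num'].apply(lambda x: math.ceil(10 * (x + 1) / df.shape[0]))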
I used to use this, but I guess I can use quantile instead.
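The same position-based decile can be sketched without sorting or renaming, by ranking the column instead. This is an illustrative variant, not the answer author's code: the frame and its values are hypothetical, and ties are broken by order of appearance, which may differ from how sort_values orders tied rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [50, 10, 40, 20, 30, 60, 100, 90, 80, 70]})

# rank(method='first') gives each row its 1-based position in sorted order,
# which plays the role of row_num + 1 without reordering the frame.
position = df['b'].rank(method='first')

# Map each position to a decile number from 1 to 10.
df['quantile'] = np.ceil(10 * position / len(df)).astype(int)
print(df)
```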