pandas DataFrame: add a column whose values are the quantile/rank of an existing column?

Question

Asked by luca

I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column.

I can use DataFrame.rank to rank a column, but then I don't know how to get the quantile number of this ranked value and add it as a new column.

Example: if this is my DataFrame

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

   a    b
0  1    1
1  2   10
2  3  100
3  4  100

and I'd like to know the quantile number (using 2 quantiles) of column b. I'd expect this result:

   a    b  quantile
0  1    1         1
1  2   10         1
2  3  100         2
3  4  100         2

Answer 1

Answered by luca

I discovered it is quite easy:

df['quantile'] = pd.qcut(df['b'], 2, labels=False)

   a    b  quantile
0  1    1         0
1  2   10         0
2  3  100         1
3  4  100         1
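
Note that labels=False returns 0-based bin indices, whereas the question asks for 1 and 2. A minimal sketch, assuming 1-based numbering is wanted, is to add 1 to the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

# qcut with labels=False yields integer bin indices starting at 0;
# adding 1 shifts them to the 1-based numbering used in the question.
df['quantile'] = pd.qcut(df['b'], 2, labels=False) + 1
print(df)
```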

It is also interesting to know the difference between pandas.qcut and pandas.cut.
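
In short, qcut picks bin edges from the sample quantiles, so bins hold roughly equal numbers of rows, while cut splits the value range into bins of equal width. A small sketch on made-up, skewed data (the Series below is an illustration, not taken from the question):

```python
import pandas as pd

# Made-up skewed data: most values are small, two are large.
s = pd.Series([1, 2, 3, 4, 100, 1000])

# Quantile-based: the median (3.5) is the edge, so each bin gets three values.
by_count = pd.qcut(s, 2, labels=False)   # 0, 0, 0, 1, 1, 1

# Width-based: the range 1..1000 is halved at 500.5, so only 1000 lands in the upper bin.
by_width = pd.cut(s, 2, labels=False)    # 0, 0, 0, 0, 0, 1
```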

Answer 2

Answered by jeyoor

You can use DataFrame.quantile with q=[0.25, 0.5, 0.75] on the existing column to produce a quartile column.

Then, you can use DataFrame.rank on that quartile column.

See below for an example of adding a quartile column:

import pandas as pd

d = {'one': pd.Series([40., 45., 50., 55, 60, 65], index=['val1', 'val2', 'val3', 'val4', 'val5', 'val6'])}
df = pd.DataFrame(d)

# Quartile boundaries of column 'one', indexed by q (0.25, 0.5, 0.75).
quantile_frame = df.quantile(q=[0.25, 0.5, 0.75])
# .ix has been removed from pandas; label-based .loc does the same lookup.
q1 = quantile_frame.loc[0.25, 'one']
q2 = quantile_frame.loc[0.5, 'one']
q3 = quantile_frame.loc[0.75, 'one']

quantile_ranks = []
for index, row in df.iterrows():
    # Assign 1..4 according to which quartile interval the value falls in.
    if row['one'] <= q1:
        quantile_ranks.append(1)
    elif row['one'] <= q2:
        quantile_ranks.append(2)
    elif row['one'] <= q3:
        quantile_ranks.append(3)
    else:
        quantile_ranks.append(4)

df['quartile'] = quantile_ranks

Note: There's probably a more idiomatic way to accomplish this with Pandas... but it's beyond me
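
One candidate for that more idiomatic way is pd.qcut with four bins, as used in Answer 1. This is a sketch rather than a drop-in replacement; on this sample data it gives the same quartile numbers as the loop:

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [40., 45., 50., 55., 60., 65.]},
    index=['val1', 'val2', 'val3', 'val4', 'val5', 'val6'],
)

# qcut with 4 bins assigns indices 0..3; add 1 to number the quartiles 1..4.
df['quartile'] = pd.qcut(df['one'], 4, labels=False) + 1
print(df)
```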

Answer 3

Answered by feetwet

df['quantile'] = pd.qcut(df['b'], 2, labels=False) tends to raise a SettingWithCopyWarning.

The only general way I have found of doing this without complaints is:

quantiles = pd.qcut(df['b'], 2, labels=False)
df = df.assign(quantile=quantiles.values)

This will assign the quantile rank values as a new DataFrame column df['quantile'].
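
For context, that warning usually shows up when df was itself produced by slicing another DataFrame. Another way to avoid it, sketched here on made-up data (the names full and sub are hypothetical), is to take an explicit copy before adding the column:

```python
import pandas as pd

# Hypothetical parent frame; 'sub' stands in for a df obtained by filtering it.
full = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 10, 100, 1000, 10000]})

# .copy() makes sub an independent frame, so adding a column is unambiguous.
sub = full[full['a'] > 1].copy()
sub['quantile'] = pd.qcut(sub['b'], 2, labels=False)
print(sub)
```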

A solution for a more generalized case, in which one wants to partition the cut by multiple columns, is given here.
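
The linked solution is not reproduced here. As a rough sketch of what such a partitioned cut could look like, assuming it means computing quantiles separately within each group, groupby with transform applies qcut per group (the group column and data are made up):

```python
import pandas as pd

# Made-up data: two groups whose 'b' values are on very different scales.
df = pd.DataFrame({
    'group': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'b': [1, 2, 3, 4, 10, 20, 30, 40],
})

# qcut runs once per group, so each row is binned relative to its own group.
df['quantile'] = df.groupby('group')['b'].transform(
    lambda s: pd.qcut(s, 2, labels=False)
)
print(df)
```

To partition by several columns, pass a list of column names to groupby instead of a single name.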

Answer 4

Answered by Abhishek Singh

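import math

# Sort by 'b', then expose the 0-based position as a row number.
df.sort_values(['b'], inplace=True)
df.reset_index(inplace=True, drop=True)
df.reset_index(inplace=True)
df.rename(columns={'index': 'row_num'}, inplace=True)
# Map each row number to a decile from 1 to 10.
df['quantile'] = df['row_num'].apply(lambda x: math.ceil(10 * (x + 1) / df.shape[0]))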

I used to use this, but I guess I can use quantile
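
A minimal sketch of that idea, assuming the goal is deciles numbered 1 to 10 as in the code above (the data is made up for illustration). It shows a qcut version next to a rank-based version of the row-number arithmetic that does not reorder the frame:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: the values 1..10 in shuffled order.
df = pd.DataFrame({'b': [5, 3, 8, 1, 9, 2, 7, 4, 6, 10]})

# Decile via qcut: bin edges come from the sample quantiles of 'b'.
df['decile_qcut'] = pd.qcut(df['b'], 10, labels=False) + 1

# Decile via rank: same arithmetic as the row-number approach, without sorting.
ranks = df['b'].rank(method='first')
df['decile_rank'] = np.ceil(10 * ranks / len(df)).astype(int)
print(df)
```

With tied values the two can disagree: qcut puts equal values in the same bin (and raises an error if bin edges coincide, unless duplicates='drop' is passed), while rank(method='first') splits ties by order of appearance.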
