Drop columns with low standard deviation in a Pandas DataFrame

Question

Asked by Ashkan

Is there any way of doing this without writing a for loop?

Suppose we have the following data:

import pandas as pd

d = {'A': {-1: 0.19052041339798062,
      0: -0.0052531481871952871,
      1: -0.0022017467720961644,
      2: -0.051109629013311737,
      3: 0.18569441222621336},
     'B': {-1: 0.029181417300734112,
      0: -0.0031021862533310743,
      1: -0.014358516787430284,
      2: 0.0046386615308068877,
      3: 0.056676322314857898},
     'C': {-1: 0.071883343375205785,
      0: -0.011930096520251999,
      1: -0.011836365865654104,
      2: -0.0033930358388315237,
      3: 0.11812543193496111},
     'D': {-1: 0.17670604006475121,
      0: -0.088756293654161142,
      1: -0.093383245649534194,
      2: 0.095649943383654359,
      3: 0.51030339029516592},
     'E': {-1: 0.30273513342295627,
      0: -0.30640233455497284,
      1: -0.32698263145105921,
      2: 0.60257484810641992,
      3: 0.36859978928328413},
     'F': {-1: 0.25328469046380131,
      0: -0.063890702001567143,
      1: -0.10007720832198815,
      2: 0.08153164759036724,
      3: 0.36606175240021183},
     'G': {-1: 0.28764606940509913,
      0: -0.11022209861109525,
      1: -0.1264164305949009,
      2: 0.17030074112227081,
      3: 0.30100292424380881}}
df = pd.DataFrame(d)

I know I can get the standard deviations with std_vals = df.std(), which gives the result below, and then use those values to drop the columns one by one.

In[]:
        pd.DataFrame(d).std()
Out[]:
        A    0.115374
        B    0.028435
        C    0.059394
        D    0.247617
        E    0.421117
        F    0.200776
        G    0.209710
        dtype: float64

However, I don't know how to use pandas indexing to drop the columns with low std values directly.

Is there a way to do this, or do I need to loop over each column?

Answer 1

Answered by maxymoo

You can use the loc method of a dataframe to select columns based on a Boolean indexer. Create the indexer like this (the comparison relies on NumPy-style broadcasting, so the condition is applied to every column's standard deviation at once):

df.std() > 0.3

Out[84]: 
A    False
B    False
C    False
D    False
E     True
F    False
G    False
dtype: bool

Then call loc with : in the first position to indicate that you want to return all rows:

df.loc[:, df.std() > .3]
Out[85]: 
           E
-1  0.302735
 0 -0.306402
 1 -0.326983
 2  0.602575
 3  0.368600
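
For readers who want to try the two steps above end to end, here is a minimal sketch. It assumes a small made-up frame rather than the question's data, and the names threshold, keep, and filtered are only illustrative:

```python
import pandas as pd

# Small made-up frame (not the question's data); 'flat' barely varies.
df = pd.DataFrame({
    "flat": [1.0, 1.01, 0.99, 1.0],
    "wide": [0.0, 5.0, -5.0, 10.0],
    "mid": [1.0, 2.0, 3.0, 4.0],
})

threshold = 0.5  # hypothetical cutoff, chosen only for this example

# Boolean Series indexed by column name: True where the std exceeds the cutoff.
keep = df.std() > threshold

# ':' selects every row; the mask selects which columns to keep.
filtered = df.loc[:, keep]

print(filtered.columns.tolist())
```

Storing the mask in a variable first makes it easy to inspect which columns will survive before the frame is actually filtered.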

Answer 2

Answered by Jianxun Li

To drop columns, you need their column names.

threshold = 0.2

df.drop(df.std()[df.std() < threshold].index.values, axis=1)

         D       E       F       G
-1  0.1767  0.3027  0.2533  0.2876
 0 -0.0888 -0.3064 -0.0639 -0.1102
 1 -0.0934 -0.3270 -0.1001 -0.1264
 2  0.0956  0.6026  0.0815  0.1703
 3  0.5103  0.3686  0.3661  0.3010
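
A possible variation on the drop approach above, as a sketch only: it computes the standard deviations once instead of twice and passes the labels through the columns keyword of drop. The frame below is made up for illustration, and std_vals and low_std_cols are illustrative names:

```python
import pandas as pd

# Made-up frame for illustration; 'c' is constant, so its std is 0.
df = pd.DataFrame({
    "a": [1.0, 1.1, 0.9, 1.0],
    "b": [0.0, 1.0, 2.0, 3.0],
    "c": [5.0, 5.0, 5.0, 5.0],
})

threshold = 0.2

# Compute the per-column standard deviations a single time.
std_vals = df.std()

# Labels of the columns whose std falls below the cutoff.
low_std_cols = std_vals[std_vals < threshold].index

# drop returns a new frame; df itself is left unchanged.
result = df.drop(columns=low_std_cols)

print(result.columns.tolist())
```

Note that the two approaches differ at the edges: dropping with < keeps a column whose std equals the threshold or is NaN, while selecting with loc and > discards it, because any comparison against NaN is False.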
