Find numeric column names in Pandas

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/43898414/

Time: 2020-09-14 03:35:06  Source: igfitidea

find numeric column names in Pandas

python pandas dataframe

Asked by Arnold Klein

I need to select columns in Pandas which contain only numeric values in column names, for example:

df=
          0     1     2     3     4 window_label next_states       ids
0      17.0  18.0  16.0  15.0  15.0        ddddd           d      13.0
1      18.0  16.0  15.0  15.0  16.0        ddddd           d      13.0
2      16.0  15.0  15.0  16.0  15.0        ddddd           d      13.0
3      15.0  15.0  16.0  15.0  17.0        ddddd           d      13.0
4      15.0  16.0  15.0  17.0   NaN        ddddd           d      13.0

So I need to select only the first five columns. Something like:

df[df.columns.isnumeric()]

EDIT

I came up with a solution:

digit_column_names = [num for num in list(df.columns) if isinstance(num, (int,float))]
df_new = df[digit_column_names]

Not very Pythonic or pandasian, but it works.
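
The isinstance check in the question can also be written against numbers.Number, which additionally accepts NumPy scalar labels such as np.int64. The following is only a sketch on made-up data: the helper name is_numeric_label is hypothetical, and excluding bool labels is an assumption about what is wanted.

```python
import numbers

import numpy as np
import pandas as pd


def is_numeric_label(label):
    """Return True for int/float-like labels (hypothetical helper, not a pandas API)."""
    # bool is a subclass of int, so exclude it explicitly (an assumption).
    return isinstance(label, numbers.Number) and not isinstance(label, bool)


# Made-up data for illustration only.
df = pd.DataFrame(
    [[17.0, 18.0, 16.0, "ddddd", 13.0]],
    columns=[0, 1, 2.5, "window_label", "ids"],
)

digit_column_names = [col for col in df.columns if is_numeric_label(col)]
df_new = df[digit_column_names]
print(df_new.columns.tolist())
```

Note that this checks the type of each label, so a string label such as '4' is not selected.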

Answered by Vaishali

Try:

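import numpy as np
# Cast 'ids' to object so that select_dtypes does not treat it as numeric.
df['ids'] = df['ids'].astype('object')
new_df = df.select_dtypes([np.number])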


    0       1       2       3       4       
0   17.0    18.0    16.0    15.0    15.0    
1   18.0    16.0    15.0    15.0    16.0    
2   16.0    15.0    15.0    16.0    15.0    
3   15.0    15.0    16.0    15.0    17.0    
4   15.0    16.0    15.0    17.0    NaN     
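
The cast of ids matters because select_dtypes filters on the dtype of the data in each column, not on the column label. A small sketch on a made-up frame (the data and the variable names by_dtype and hidden are illustrative assumptions) shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up frame mirroring the question: numeric data under numeric names,
# plus a numeric 'ids' column that has a string name.
df = pd.DataFrame({
    0: [17.0, 18.0],
    1: [18.0, 16.0],
    "window_label": ["ddddd", "ddddd"],
    "ids": [13.0, 13.0],
})

# select_dtypes looks at the data dtype, so 'ids' is selected as well.
by_dtype = df.select_dtypes(include=[np.number])
print(by_dtype.columns.tolist())

# Casting 'ids' to object hides it from the numeric selection.
hidden = df.assign(ids=df["ids"].astype("object"))
print(hidden.select_dtypes(include=[np.number]).columns.tolist())
```

This approach therefore only answers the question when every numeric column with a non-numeric name is cast away first.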

EDIT: If you are interested in selecting column names that are numeric, here is something that you can do.

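import pandas as pd
df = pd.DataFrame({0: [1, 2], '1': [3, 4], 'blah': [5, 6], 2: [7, 8]})
# Labels that cannot be parsed become NaN; the remaining labels become floats.
df.columns = pd.to_numeric(df.columns, errors='coerce')
df[df.columns.dropna()]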

You get:

    0.0 1.0 2.0
0   1   3   7
1   2   4   8

Answered by MaxU

Here is an answer for the EDIT part:

I've intentionally created a mixture of column names: real numbers and strings that can be converted to numbers:

In [44]: df.columns.tolist()
Out[44]: [0, 1, 2, 3, '4', 'window_label', 'next_states', 'ids']
# NOTE:                ^

We can use pd.to_numeric(..., errors='coerce'):

In [41]: df.columns[pd.to_numeric(df.columns, errors='coerce').to_series().notnull()]
Out[41]: Index([0, 1, 2, 3, '4'], dtype='object')

In [42]: cols = df.columns[pd.to_numeric(df.columns, errors='coerce').to_series().notnull()]

In [43]: df[cols]
Out[43]:
      0     1     2     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN
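
The same idea can be wrapped in a small function. This is a sketch only: the name numeric_named_columns is hypothetical, the frame is made up, and the labels are passed to pd.to_numeric as an object Series rather than as the Index itself.

```python
import pandas as pd


def numeric_named_columns(df):
    """Return the columns of df whose labels parse as numbers.

    Hypothetical helper for illustration; not part of pandas.
    """
    labels = pd.Series(list(df.columns), dtype=object)
    # With errors='coerce', labels that cannot be parsed become NaN.
    parsed = pd.to_numeric(labels, errors="coerce")
    mask = parsed.notna().to_numpy()
    # Boolean-mask selection keeps the original labels unchanged.
    return df.loc[:, mask]


# Made-up data for illustration only.
df = pd.DataFrame(
    [[17.0, 18.0, 16.0, "ddddd", 13.0]],
    columns=[0, 1, "4", "window_label", "ids"],
)
print(numeric_named_columns(df).columns.tolist())
```

Because the mask is positional, the selected columns keep their original labels, including the string '4'.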

Answered by Eric Ed Lohmar

I found another question on this website that is closely related. I used the code from it and applied it to your problem. I also threw a float into the column names to make sure it worked with int and float. It looks like:

import pandas as pd

df = pd.DataFrame({0: [17.0, 18, 16, 15, 15],
                   1: [18.0, 16, 15, 15, 16],
                   2.0: [16.0, 15, 15, 16, 15],
                   3: [15.0, 15, 16, 15, 17],
                   4: [15.0, 16, 15, 17, None],
                   'window_label': ['ddddd' for i in range(5)],
                   'next_states': ['d' for i in range(5)],
                   'ids': [13.0 for i in range(5)]})

num_cols = []
for col in df.columns.values:
    try:
        float(col)
        num_cols.append(col)
    except ValueError:
        pass

print(df[num_cols])

And the result looks like:

      0     1   2.0     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN

Edit 1: I just realized that you can move the numeric check into a generator function, which gives a slightly faster and certainly less memory-intensive way of doing the same thing.

import pandas as pd


def is_num(cols):
    for col in cols:
        try:
            float(col)
            yield col
        except ValueError:
            continue

df = pd.DataFrame({0: [17.0, 18, 16, 15, 15],
                   1: [18.0, 16, 15, 15, 16],
                   2.0: [16.0, 15, 15, 16, 15],
                   3: [15.0, 15, 16, 15, 17],
                   4: [15.0, 16, 15, 17, None],
                   'window_label': ['ddddd' for i in range(5)],
                   'next_states': ['d' for i in range(5)],
                   'ids': [13.0 for i in range(5)]})

print(df[list(is_num(df.columns.values))])

This yields exactly the same result as above, although it is somewhat less readable.
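
Two edge cases of the float() test may be worth guarding against: float() also accepts strings such as 'nan' and 'inf', and it raises TypeError rather than ValueError for labels like None. A hedged variant, where the helper name parses_as_finite_number is hypothetical and rejecting non-finite values is an assumption about the desired behaviour:

```python
import math

import pandas as pd


def parses_as_finite_number(label):
    """Hypothetical helper: True if float(label) succeeds and is finite."""
    try:
        value = float(label)
    except (ValueError, TypeError):
        # TypeError covers labels such as None or tuples.
        return False
    # float() also accepts 'nan' and 'inf'; reject those here (an assumption).
    return math.isfinite(value)


# Made-up data for illustration only.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=[0, 2.0, "3", "nan", "window_label"],
)
num_cols = [col for col in df.columns if parses_as_finite_number(col)]
print(df[num_cols].columns.tolist())
```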

Answered by Moondra

If you are only looking for numeric column names, I think this should work:

df.columns[df.columns.str.isnumeric()]

Or this:

df.iloc[:,df.columns.str.isnumeric()]
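
The .str accessor is meant for string labels. When some labels are ints or floats, as in the question, my understanding is that it yields missing values for the non-string labels, so the mask may not be usable directly. One possible workaround, sketched on made-up data, is to convert every label to str first:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only: int labels mixed with string labels.
df = pd.DataFrame(
    [[17.0, 18.0, 16.0, "ddddd", 13.0]],
    columns=[0, 1, "4", "window_label", "ids"],
)

# Convert every label to str first so that .str works on mixed labels.
mask = np.asarray(df.columns.astype(str).str.isnumeric(), dtype=bool)
print(df.loc[:, mask].columns.tolist())
```

Keep in mind that str.isnumeric() is False for strings such as '2.0' or '-1', so float or negative labels are not matched by this test.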