Find numeric column names in Pandas
Notice: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/43898414/
Asked by Arnold Klein
I need to select the columns in Pandas whose names are numeric values, for example:
df=
      0     1     2     3     4  window_label  next_states   ids
0  17.0  18.0  16.0  15.0  15.0         ddddd            d  13.0
1  18.0  16.0  15.0  15.0  16.0         ddddd            d  13.0
2  16.0  15.0  15.0  16.0  15.0         ddddd            d  13.0
3  15.0  15.0  16.0  15.0  17.0         ddddd            d  13.0
4  15.0  16.0  15.0  17.0   NaN         ddddd            d  13.0
So I need to select only the first five columns. Something like:
df[df.columns.isnumeric()]
EDIT
I came up with a solution:
digit_column_names = [num for num in list(df.columns) if isinstance(num, (int,float))]
df_new = df[digit_column_names]
It is not very Pythonic or pandas-idiomatic, but it works.
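A minimal sketch of the same isinstance idea, wrapped in a helper. The names numeric_labels and example are made up for illustration, and using numbers.Real while skipping bool is my own assumption about what should count as a numeric label, not something stated in the question.

```python
import numbers

import pandas as pd


def numeric_labels(columns):
    """Return the labels that are real numbers (bool labels are skipped)."""
    return [
        label
        for label in columns
        if isinstance(label, numbers.Real) and not isinstance(label, bool)
    ]


# Hypothetical example frame with int, float, and string labels.
example = pd.DataFrame({0: [1, 2], 1.5: [3, 4], "4": [5, 6], "ids": [7, 8]})
selected = example[numeric_labels(example.columns)]
print(selected.columns.tolist())
```

Note that a string label such as "4" is not picked up here, because the check looks at the label's type rather than at whether it could be parsed as a number.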
Answer by Vaishali
Try:
import numpy as np
df['ids'] = df['ids'].astype('object')  # cast 'ids' so it no longer counts as numeric
new_df = df.select_dtypes(include=[np.number])
      0     1     2     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN
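To make clear why the cast of ids is needed, here is a small sketch on a made-up frame (example, by_dtype, and without_ids are illustrative names). It assumes the goal is to keep only the numerically named columns and shows that select_dtypes decides by the dtype of the values, not by the label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question's layout.
example = pd.DataFrame({
    0: [17.0, 18.0],
    1: [18.0, 16.0],
    "window_label": ["ddddd", "ddddd"],
    "ids": [13.0, 13.0],
})

# select_dtypes looks at the dtype of each column's values, not at its label,
# so the float column "ids" is kept unless it is excluded or cast first.
by_dtype = example.select_dtypes(include=[np.number])
print(by_dtype.columns.tolist())

# Dropping "ids" up front is one alternative to casting it to object.
without_ids = example.drop(columns="ids").select_dtypes(include=[np.number])
print(without_ids.columns.tolist())
```

So this approach selects by content type and only matches the question's intent when every numerically named column holds numbers and every other numeric column is handled separately.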
EDIT: If you are interested in selecting columns whose names are numeric, here is something that you can do.
df = pd.DataFrame({0: [1,2], '1': [3,4], 'blah': [5,6], 2: [7,8]})
df.columns = pd.to_numeric(df.columns, errors = 'coerce')
df[df.columns.dropna()]
You get:
   0.0  1.0  2.0
0    1    3    7
1    2    4    8
Answer by MaxU
Here is an answer for the EDIT part:
I've intentionally created a mixture of column names: real numbers and strings that can be converted to numbers:
In [44]: df.columns.tolist()
Out[44]: [0, 1, 2, 3, '4', 'window_label', 'next_states', 'ids']
# NOTE: the fifth label, '4', is a string
We can use the pd.to_numeric(..., errors='coerce') function:
In [41]: df.columns[pd.to_numeric(df.columns, errors='coerce').to_series().notnull()]
Out[41]: Index([0, 1, 2, 3, '4'], dtype='object')
In [42]: cols = df.columns[pd.to_numeric(df.columns, errors='coerce').to_series().notnull()]
In [43]: df[cols]
Out[43]:
      0     1     2     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN
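The same idea can be sketched as a small helper that leaves the original labels untouched. The function name columns_with_numeric_names and the example frame are hypothetical. It relies only on pd.to_numeric with errors='coerce' and pd.notna.

```python
import pandas as pd


def columns_with_numeric_names(frame):
    """Select columns whose labels can be parsed as numbers, keeping labels as-is."""
    # Labels that cannot be parsed become NaN.
    parsed = pd.to_numeric(frame.columns, errors="coerce")
    # True where parsing succeeded.
    mask = pd.notna(parsed)
    return frame.loc[:, mask]


# Hypothetical frame with integer labels, a numeric string label, and text labels.
example = pd.DataFrame({
    0: [17.0, 18.0],
    1: [18.0, 16.0],
    "4": [15.0, 16.0],
    "window_label": ["ddddd", "ddddd"],
    "ids": [13.0, 13.0],
})
result = columns_with_numeric_names(example)
print(result.columns.tolist())
```

Because only a boolean mask is derived from the parsed labels, the string label '4' stays a string in the result instead of being renamed to 4.0.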
Answer by Eric Ed Lohmar
I found another question on this website that is closely related. I used the code from it and applied it to your problem. I also threw a float into the column names to make sure it worked with both int and float. It looks like:
import pandas as pd
df = pd.DataFrame({0: [17.0, 18, 16, 15, 15],
                   1: [18.0, 16, 15, 15, 16],
                   2.0: [16.0, 15, 15, 16, 15],
                   3: [15.0, 15, 16, 15, 17],
                   4: [15.0, 16, 15, 17, None],
                   'window_label': ['ddddd' for i in range(5)],
                   'next_states': ['d' for i in range(5)],
                   'ids': [13.0 for i in range(5)]})
# Keep every column label that float() can parse.
num_cols = []
for col in df.columns.values:
    try:
        float(col)
        num_cols.append(col)
    except ValueError:
        pass
print(df[num_cols])
and the result looks like:
      0     1   2.0     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN
Edit 1: I just realized that you can move the numeric check into a generator function, which gives a slightly faster and certainly less memory-intensive way of doing the same thing.
import pandas as pd
def is_num(cols):
    # Yield only the labels that float() can parse.
    for col in cols:
        try:
            float(col)
            yield col
        except ValueError:
            continue
df = pd.DataFrame({0: [17.0, 18, 16, 15, 15],
                   1: [18.0, 16, 15, 15, 16],
                   2.0: [16.0, 15, 15, 16, 15],
                   3: [15.0, 15, 16, 15, 17],
                   4: [15.0, 16, 15, 17, None],
                   'window_label': ['ddddd' for i in range(5)],
                   'next_states': ['d' for i in range(5)],
                   'ids': [13.0 for i in range(5)]})
print(df[list(is_num(df.columns.values))])
This yields exactly the same result as above, although it is somewhat less readable.
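One caveat with the float() test, sketched on a plain list of made-up labels: float() raises TypeError rather than ValueError for labels such as None or a tuple, and it accepts strings like "nan". The helper name numeric_like is hypothetical, and catching TypeError as well is my own addition, not part of the answer above.

```python
def numeric_like(labels):
    """Yield labels that float() accepts; all other labels are skipped."""
    for label in labels:
        try:
            float(label)
        except (TypeError, ValueError):
            # TypeError covers labels such as None or tuples.
            continue
        yield label


# Hypothetical labels, including some that are not numbers or strings.
labels = [0, 2.0, "4", "ids", None, ("a", 1), "nan"]
print(list(numeric_like(labels)))
```

If labels such as "nan" or "inf" should not count as numeric, an extra check (for example math.isfinite on the parsed value) would be needed.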
Answer by Moondra
If you are only looking for numeric column names, I think this should work:
df.columns[df.columns.str.isnumeric()]
or this:
df.iloc[:,df.columns.str.isnumeric()]
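When some labels are not strings (as in the question, where 0 to 4 look like integers), .str.isnumeric() is not guaranteed to give a clean boolean mask. A hedged variant is to cast the labels to str first. The frame and variable names below are made up for illustration, and the np.asarray call is only a precaution to force a plain boolean array.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with int, string, and float labels.
example = pd.DataFrame({0: [1, 2], "4": [3, 4], 2.5: [5, 6], "ids": [7, 8]})

# Cast every label to str first so the .str accessor sees only strings,
# then test each label with str.isnumeric.
labels_as_text = example.columns.astype(str)
mask = np.asarray(labels_as_text.str.isnumeric(), dtype=bool)
digit_named = example.loc[:, mask]
print(digit_named.columns.tolist())
```

str.isnumeric is only true for strings made up entirely of numeric characters, so labels such as 2.5 or '-1' are not selected by this variant. The pd.to_numeric approach is a better fit when such labels should count.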