Python: Drop non-numeric columns from a pandas DataFrame

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12725417/

Date: 2020-08-18 11:42:50  Source: igfitidea

Tags: python, pandas

Asked by Einar

In my application I load text files that are structured as follows:

  • A first non-numeric column (ID)
  • A number of non-numeric columns (strings)
  • A number of numeric columns (floats)

The number of the non-numeric columns is variable. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices, since this should be doable by reading their dtype. Is this possible with pandas, or do I have to cook up something on my own?

Accepted answer by sapo_cosmico

To avoid using a private method you can also use select_dtypes, where you can either include or exclude the dtypes you want.

I ran into it in this post on the exact same thing.

Or in your case, specifically:
source.select_dtypes(['number']) or source.select_dtypes([np.number])
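
A minimal sketch of both spellings on a toy frame, plus the inverse selection with exclude. The column names and values are invented for illustration; only select_dtypes and its include/exclude arguments come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one string column, one integer column, one float column.
source = pd.DataFrame(
    {"name": ["foo", "bar"], "count": [1, 2], "score": [0.5, 1.5]},
    index=["id1", "id2"],
)

# Keep only the numeric columns (integers and floats alike).
numeric = source.select_dtypes(include=[np.number])

# Same selection, using the string alias instead of the NumPy type.
numeric_alias = source.select_dtypes(include=["number"])

# The inverse: keep every column that is not numeric.
non_numeric = source.select_dtypes(exclude=["number"])

print(list(numeric.columns))      # ['count', 'score']
print(list(non_numeric.columns))  # ['name']
```

select_dtypes returns a new DataFrame and leaves source untouched, so assign the result back if the non-numeric columns should be gone for good.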

Answered by Wouter Overmeire

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2

Answered by Luigi Bungaro

Here is another possible solution that drops the columns holding categorical values in two lines of code: the first line builds the list of categorical columns, and the second line drops them. Here df is our DataFrame.

[Image: df before dropping]

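  # Line 1: collect the names of the non-numeric (string/categorical) columns
  to_be_dropped = df.select_dtypes(exclude=['number']).columns
  # Line 2: drop them
  df = df.drop(to_be_dropped, axis=1)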

[Image: df after dropping]
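
Since the original screenshots are not available, here is a small runnable sketch of the same two-step idea. The frame and its column names are made up, and selecting the columns through select_dtypes(exclude=['number']) is an assumption about what the answer intends by "columns of categorical values".

```python
import pandas as pd

# Hypothetical frame: a string column, a categorical column and a numeric column.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Rome"],
        "grade": pd.Categorical(["a", "b"]),
        "value": [1.0, 2.0],
    }
)

# Line 1: collect the names of the columns that are not numeric.
to_be_dropped = df.select_dtypes(exclude=["number"]).columns
# Line 2: drop them.
df = df.drop(to_be_dropped, axis=1)

print(list(df.columns))  # ['value']
```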

Answered by Thomas Gotwig

This removes every column whose dtype is not float64. Note that integer columns are dropped as well.

df = pd.read_csv('sample.csv', index_col=0)
non_floats = []
for col in df:
    if df[col].dtypes != "float64":
        non_floats.append(col)
df = df.drop(columns=non_floats)
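
If integer columns, or floats of other widths such as float32, should survive as well, a looser dtype check than comparing against "float64" may be more convenient. A minimal sketch, assuming a small in-memory CSV in place of sample.csv (the data is invented for illustration) and using pandas.api.types.is_numeric_dtype:

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Invented stand-in for sample.csv: an ID column, a string column,
# an integer column and a float column.
csv_text = "id,label,n,x\na,foo,1,0.5\nb,bar,2,1.5\n"
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Collect every column whose dtype is not numeric.
# Note: is_numeric_dtype also counts boolean columns as numeric.
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
df = df.drop(columns=non_numeric)

print(list(df.columns))  # ['n', 'x']
```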