Python: Selecting Pandas columns by dtype

Question

Asked by caner

I was wondering if there is an elegant, shorthand way to select columns of a Pandas DataFrame by data type (dtype). For example, select only the int64 columns from a DataFrame.

To elaborate, something along the lines of

df.select_columns(dtype=float64)

Thanks in advance for the help

Answer 1

Accepted answer by Dan Allan

df.loc[:, df.dtypes == np.float64]
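
The one-liner above can be tried end to end with a small frame. This is only a sketch: the sample data and the column names `x`, `y`, `z` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original answer.
df = pd.DataFrame({
    "x": np.array([1.5, 2.5], dtype="float64"),
    "y": np.array([1.0, 2.0], dtype="float32"),
    "z": np.array([1, 2], dtype="int64"),
})

# df.dtypes is a Series indexed by column name; comparing it with a dtype
# gives a boolean mask that .loc can use to pick columns.
mask = df.dtypes == np.float64
float64_df = df.loc[:, mask]

print(list(float64_df.columns))
```

Because the comparison is exact, only `x` should be selected here; the float32 column `y` does not match np.float64.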

Answer 2

Answered by normonics

df.select_dtypes(include=[np.float64])

Answer 3

Answered by Andy Hayden

Since pandas 0.14.1 there is a select_dtypes method, so you can do this more elegantly and more generally.

In [11]: df = pd.DataFrame([[1, 2.2, 'three']], columns=['A', 'B', 'C'])

In [12]: df.select_dtypes(include=['int'])
Out[12]:
   A
0  1

To select all numeric types, use the NumPy dtype numpy.number:

In [13]: df.select_dtypes(include=[np.number])
Out[13]:
   A    B
0  1  2.2

In [14]: df.select_dtypes(exclude=[object])
Out[14]:
   A    B
0  1  2.2
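
The include and exclude arguments can also be combined in one call. The sketch below assumes a small made-up frame (the bool column `D` is an addition for illustration) and keeps numeric and boolean columns while dropping floating-point ones.

```python
import pandas as pd

# Hypothetical sample data; column D is added here for illustration.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [2.2, 3.3],
    "C": ["three", "four"],
    "D": [True, False],
})

# Keep numeric and boolean columns, but drop the floating-point ones.
subset = df.select_dtypes(include=["number", "bool"], exclude=["floating"])

print(list(subset.columns))
```

With this data, `A` and `D` should remain: `B` is excluded as a float column, and `C` is neither numeric nor boolean.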

Answer 4

Answered by MaxU

I'd like to extend the existing answers by adding options for selecting all floating dtypes or all integer dtypes:

Demo:

np.random.seed(1234)

df = pd.DataFrame({
        'a':np.random.rand(3), 
        'b':np.random.rand(3).astype('float32'), 
        'c':np.random.randint(10,size=(3)).astype('int16'),
        'd':np.arange(3).astype('int32'), 
        'e':np.random.randint(10**7,size=(3)).astype('int64'),
        'f':np.random.choice([True, False], 3),
        'g':pd.date_range('2000-01-01', periods=3)
     })

This yields:

In [2]: df
Out[2]:
          a         b  c  d        e      f          g
0  0.191519  0.785359  6  0  7578569  False 2000-01-01
1  0.622109  0.779976  8  1  7981439   True 2000-01-02
2  0.437728  0.272593  0  2  2558462   True 2000-01-03

In [3]: df.dtypes
Out[3]:
a           float64
b           float32
c             int16
d             int32
e             int64
f              bool
g    datetime64[ns]
dtype: object

Selecting all floating-point columns:

In [4]: df.select_dtypes(include=['floating'])
Out[4]:
          a         b
0  0.191519  0.785359
1  0.622109  0.779976
2  0.437728  0.272593

In [5]: df.select_dtypes(include=['floating']).dtypes
Out[5]:
a    float64
b    float32
dtype: object

Selecting all integer columns:

In [6]: df.select_dtypes(include=['integer'])
Out[6]:
   c  d        e
0  6  0  7578569
1  8  1  7981439
2  0  2  2558462

In [7]: df.select_dtypes(include=['integer']).dtypes
Out[7]:
c    int16
d    int32
e    int64
dtype: object

Selecting all numeric columns:

In [8]: df.select_dtypes(include=['number'])
Out[8]:
          a         b  c  d        e
0  0.191519  0.785359  6  0  7578569
1  0.622109  0.779976  8  1  7981439
2  0.437728  0.272593  0  2  2558462

In [9]: df.select_dtypes(include=['number']).dtypes
Out[9]:
a    float64
b    float32
c      int16
d      int32
e      int64
dtype: object

Answer 5

Answered by hui chen

Alternatively, if you don't want to create a subset of the DataFrame in the process, you can iterate directly over the column dtypes.

I haven't benchmarked the code below, but I assume it will be faster on very large datasets.

[col for col in df.columns.tolist() if df[col].dtype not in ['object','<M8[ns]']]
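
A related sketch of the same idea, collecting column names without building a sub-frame. Note the swap: it uses `is_numeric_dtype` from `pandas.api.types` instead of the string comparison above, so it keeps only numeric columns rather than "everything except object and datetime". The sample frame and its column names are assumptions for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data, not from the original answer.
df = pd.DataFrame({
    "n": [1, 2],
    "f": [0.5, 1.5],
    "s": ["a", "b"],
    "t": pd.date_range("2000-01-01", periods=2),
})

# Walk over (column name, dtype) pairs; no column data is copied.
numeric_cols = [col for col, dtype in df.dtypes.items() if is_numeric_dtype(dtype)]

print(numeric_cols)
```

Here the string column `s` and the datetime column `t` should be filtered out, leaving `n` and `f`.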

Answer 6

Answered by Gurubux

To select columns of several dtypes at once, pass a list of types to include, for example float64 and int64:

df_numeric = df.select_dtypes(include=[np.float64,np.int64])

Answer 7

Answered by Anjan Prasad

df.select_dtypes(include=[int])  # np.int, an alias for the built-in int, was removed in NumPy 1.24

Answer 8

Answered by Jake Drew

If you want to select int64 columns and then update them in place, you can use:

int64_cols = [col for col in df.columns if is_int64_dtype(df[col].dtype)]
df[int64_cols]

For example, notice that all the int64 columns in df are set to zero below:

In [1]:

    import pandas as pd
    from pandas.api.types import is_int64_dtype

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['red','blue'] * 3,
                       'e': pd.Series(['red','blue'] * 3, dtype="category"),
                       'f': pd.Series([1, 2] * 3, dtype="int64")})

    int64_cols = [col for col in df.columns if is_int64_dtype(df[col].dtype)] 
    print('int64 Cols: ',int64_cols)

    print(df[int64_cols])

    df[int64_cols] = 0

    print(df[int64_cols]) 

Out [1]:

    int64 Cols:  ['a', 'f']

           a  f
        0  1  1
        1  2  2
        2  1  1
        3  2  2
        4  1  1
        5  2  2
           a  f
        0  0  0
        1  0  0
        2  0  0
        3  0  0
        4  0  0
        5  0  0

Just for completeness:

Selecting columns with df.loc[:, df.dtypes == np.int64] or with df.select_dtypes() gives you a copy of a slice of the DataFrame. This means that if you try to update values through the result of df.select_dtypes(), you may get a SettingWithCopyWarning, and df itself will not be updated.

For example, notice that when I try to update df through a selection made with .loc or .select_dtypes(), nothing happens to df:

In [2]:

    import numpy as np

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['red','blue'] * 3,
                       'e': pd.Series(['red','blue'] * 3, dtype="category"),
                       'f': pd.Series([1, 2] * 3, dtype="int64")})

    df_bool = df.select_dtypes(include='bool')
    df_bool.b[0] = False

    print(df_bool.b[0])
    print(df.b[0])

    df.loc[:, df.dtypes == np.int64].a[0]=7
    print(df.a[0])

Out [2]:

    False
    True
    1
