Pandas read_csv dtype: specify all columns except one

Question

Asked by elaspog

I have a CSV file. I want to read most of its values as strings, but one column should be read as bool if a column with the given title exists.

Because the CSV file has a lot of columns, I don't want to specify the data type for each column explicitly, like this:

data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool})

Is it possible to define the string type on each column but one and read an optional column as a bool at the same time?

My current solution is the following (but it's very inefficient and slow):

data = read_csv('sample.csv', dtype=str)  # reads all columns as strings
if 'X' in data.columns:
    l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None
    data['X'] = data.apply(l, axis=1)

UPDATE: Sample CSV:

A;B;C;X
a1;b1;c1;True
a2;b2;c2;False
a3;b3;c3;True

Or the same file can come without the 'X' column (because the column is optional):

A;B;C
a1;b1;c1
a2;b2;c2
a3;b3;c3

Answer 1

Accepted answer by jezrael

You can first filter the columns whose names contain the value X, using str.contains with boolean indexing, and then replace:

cols = df.columns[df.columns.str.contains('X')]
df[cols] = df[cols].replace({'True': True, 'False': False})

Or, if you need to select only the column named X:

cols = df.columns[df.columns == 'X']
df[cols] = df[cols].replace({'True': True, 'False': False})
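
The difference between the two filters only shows up when other column names also contain the letter X. A minimal sketch, assuming some hypothetical column names ('X_raw' and 'MAX' are made up for illustration and are not in the question's file):

```python
import pandas as pd

# Hypothetical column names, chosen only to show how the two filters differ.
df = pd.DataFrame(columns=['A', 'X', 'X_raw', 'MAX'])

# Substring match: selects every column whose name contains 'X'.
contains_cols = list(df.columns[df.columns.str.contains('X')])

# Exact match: selects only the column named exactly 'X'.
exact_cols = list(df.columns[df.columns == 'X'])

print(contains_cols)
print(exact_cols)
```

If the file may hold other columns with an X in their names, the exact match is probably the safer choice.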

Sample:

import pandas as pd

df = pd.DataFrame({'A':['a1','a2','a3'],
                   'B':['b1','b2','b3'],
                   'C':['c1','c2','c3'],
                   'X':['True','False','True']})

print (df)
    A   B   C      X
0  a1  b1  c1   True
1  a2  b2  c2  False
2  a3  b3  c3   True

print (df.dtypes)
A    object
B    object
C    object
X    object
dtype: object

cols = df.columns[df.columns.str.contains('X')]
print (cols)

Index(['X'], dtype='object')

df[cols] = df[cols].replace({'True': True, 'False': False})

print (df.dtypes)
A    object
B    object
C    object
X      bool
dtype: object
print (df)

    A   B   C      X
0  a1  b1  c1   True
1  a2  b2  c2  False
2  a3  b3  c3   True
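
Another option, closer to what the question literally asks for, is to read only the header first and build the dtype mapping from it. This is just a sketch under some assumptions: the helper name read_str_except_bool and the CSV_TEXT constant are hypothetical, the file is small enough to parse twice, and the X column has no missing values (dtype=bool would fail on those).

```python
import io
import pandas as pd

# Sample content from the question, inlined so the sketch is self-contained.
CSV_TEXT = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"

def read_str_except_bool(csv_text, bool_col='X', sep=';'):
    """Read every column as str, except bool_col (if present) as bool."""
    # First pass: nrows=0 parses only the header row.
    header = pd.read_csv(io.StringIO(csv_text), sep=sep, nrows=0).columns
    # Default every column to str, then override the optional column.
    dtypes = {name: str for name in header}
    if bool_col in dtypes:
        dtypes[bool_col] = bool
    # Second pass: read the data with the explicit per-column dtypes.
    return pd.read_csv(io.StringIO(csv_text), sep=sep, dtype=dtypes)

with_x = read_str_except_bool(CSV_TEXT)
without_x = read_str_except_bool("A;B;C\na1;b1;c1\n")
print(with_x.dtypes)
print(without_x.dtypes)
```

With a real file you would pass the path to both read_csv calls instead of wrapping text in io.StringIO.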

Answer 2

Answer by TheLazyScripter

Why not use the bool() data type? bool() evaluates to True if an argument is passed and that argument is not False, None, '', or 0.

if 'X' in data.columns:
    # bool('False') is True, since any non-empty string is truthy,
    # so treat the text 'False' as 0 before calling bool() on each value.
    data['X'] = data['X'].map(lambda v: bool(0 if v == 'False' else v))
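
One thing to watch with bool(): the text 'False' is a non-empty string, so it is truthy. That is why 'False' has to be swapped for 0 before converting. A small sketch of the difference (the variable names are only for illustration):

```python
import pandas as pd

values = pd.Series(['True', 'False', 'True'], dtype=object)

# Naive conversion: every non-empty string is truthy, so 'False' becomes True.
naive = values.map(bool).tolist()

# Treat the text 'False' as 0 first, then apply bool() to each value.
fixed = values.map(lambda v: bool(0 if v == 'False' else v)).tolist()

print(naive)
print(fixed)
```

Note that this approach also turns any other non-empty string into True, so it only makes sense if the column is known to hold just 'True' and 'False'.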

Answer 3

Answer by gbakie

Actually you don't need any special handling when using read_csv from pandas (tested on version 0.17). Using your example file with X:

import pandas as pd

df = pd.read_csv("file.csv", delimiter=";")
print(df.dtypes)

A    object
B    object
C    object
X      bool
dtype: object
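
A caveat worth checking before relying on this: type inference applies to every column, not just X. A quick sketch with a hypothetical variant of the sample file, where column B holds digit-only text (this data is made up for illustration):

```python
import io
import pandas as pd

# Hypothetical variant of the sample: column B contains digit-only strings.
csv_text = "A;B;X\na1;001;True\na2;002;False\n"

df = pd.read_csv(io.StringIO(csv_text), sep=';')

# X is inferred as bool, but B is inferred as an integer column,
# so its leading zeros are lost.
print(df.dtypes)
print(df['B'].tolist())
```

So the default inference is fine when the other columns can never look like numbers; otherwise they need an explicit str dtype.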
