Pandas read_csv dtype specify all columns but one
Original URL: http://stackoverflow.com/questions/37515896/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by elaspog
I have a CSV file. I want to read most of its values as strings, but I want to read one column as bool if a column with the given title exists.
Because the CSV file has a lot of columns, I don't want to specify the datatype of every column explicitly, like this:
data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool})
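Listing every key by hand is not required even with this approach. A minimal sketch, assuming the semicolon-delimited layout of the sample files in the update below: read only the header row first (nrows=0), then build the dtype mapping from the column names. The helper name read_strings_except is hypothetical, and the data is inlined with io.StringIO so the snippet runs on its own.

```python
import io

import pandas as pd

# Inline copies of the sample files (semicolon-delimited), with and without X.
with_x = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"
without_x = "A;B;C\na1;b1;c1\na2;b2;c2\na3;b3;c3\n"


def read_strings_except(text, bool_col='X'):
    """Hypothetical helper: read every column as str, except bool_col as bool."""
    # First pass: parse only the header row to learn the column names.
    columns = pd.read_csv(io.StringIO(text), sep=';', nrows=0).columns
    # Build {'A': str, 'B': str, ..., 'X': bool}; bool_col only appears if present.
    dtypes = {col: (bool if col == bool_col else str) for col in columns}
    # Second pass: read the data with the generated mapping.
    return pd.read_csv(io.StringIO(text), sep=';', dtype=dtypes)


df1 = read_strings_except(with_x)
df2 = read_strings_except(without_x)
```

For a real file, pass the same path to both read_csv calls instead of building two StringIO objects; the extra cost is one more read of the header line.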
Is it possible to define the string type on each column but one and read an optional column as a bool at the same time?
My current solution is the following (but it's very inefficient and slow):
from pandas import read_csv

data = read_csv('sample.csv', dtype=str)  # reads all columns as strings
if 'X' in data.columns:
    # row-wise apply: the lambda is called once per row
    l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None
    data['X'] = data.apply(l, axis=1)
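The apply call above invokes a Python lambda once per row, which is what makes it slow. A vectorized sketch of the same conversion, assuming the column holds the strings 'True'/'False', uses Series.map with a dict. The DataFrame here is a hypothetical stand-in for the one read with dtype=str. One behavioral difference: values not in the dict become NaN rather than None.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read with dtype=str.
data = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                     'X': ['True', 'False', 'True']}, dtype=object)

BOOL_MAP = {'True': True, 'False': False}

if 'X' in data.columns:
    # One vectorized lookup for the whole column instead of one lambda call per row.
    data['X'] = data['X'].map(BOOL_MAP)

# Values missing from the dict come back as NaN (the lambda returned None instead).
odd = pd.Series(['True', 'maybe'], dtype=object).map(BOOL_MAP)
```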
UPDATE: Sample CSV:
A;B;C;X
a1;b1;c1;True
a2;b2;c2;False
a3;b3;c3;True
Or the same file can come without the 'X' column (because the column is optional):
A;B;C
a1;b1;c1
a2;b2;c2
a3;b3;c3
Accepted answer by jezrael
You can first filter the columns whose names contain X (str.contains with boolean indexing) and then use replace:
cols = df.columns[df.columns.str.contains('X')]
df[cols] = df[cols].replace({'True': True, 'False': False})
Or, if you need to select exactly the column X:
cols = df.columns[df.columns == 'X']
df[cols] = df[cols].replace({'True': True, 'False': False})
Sample:
import pandas as pd
df = pd.DataFrame({'A':['a1','a2','a3'],
'B':['b1','b2','b3'],
'C':['c1','c2','c3'],
'X':['True','False','True']})
print (df)
A B C X
0 a1 b1 c1 True
1 a2 b2 c2 False
2 a3 b3 c3 True
print (df.dtypes)
A object
B object
C object
X object
dtype: object
cols = df.columns[df.columns.str.contains('X')]
print (cols)
Index(['X'], dtype='object')
df[cols] = df[cols].replace({'True': True, 'False': False})
print (df.dtypes)
A object
B object
C object
X bool
dtype: object
print (df)
A B C X
0 a1 b1 c1 True
1 a2 b2 c2 False
2 a3 b3 c3 True
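The `X bool` dtype in the output above relies on replace silently downcasting the object column to bool. As far as I know, pandas 2.2 deprecated that implicit downcast (it emits a FutureWarning), so on newer versions X may stay object after replace alone. A minimal sketch that makes the conversion explicit with infer_objects(), using the exact-match column filter from above; the DataFrame is built with dtype=object to mimic columns read as strings on older pandas:

```python
import pandas as pd

# dtype=object mimics columns read with dtype=str on older pandas versions.
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                   'X': ['True', 'False', 'True']}, dtype=object)

cols = df.columns[df.columns == 'X']  # empty Index if 'X' is absent
# replace() maps the strings to Python bools; infer_objects() then converts
# the resulting object column to bool explicitly instead of relying on replace.
df[cols] = df[cols].replace({'True': True, 'False': False}).infer_objects()

print(df.dtypes)
```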
Answer by TheLazyScripter
Why not use the bool() type? bool() returns True if an argument is passed and that argument is not False, None, '', 0 or another empty or zero value.
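if 'X' in data.columns:
    # bool('False') is True (any non-empty string is truthy),
    # so replace 'False' with 0 first, then apply bool() to each value.
    # Note: a missing value (NaN) would become True, since bool(float('nan')) is True.
    data['X'] = data['X'].replace('False', 0).map(bool)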
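One caveat with bool() for this data: called on a string, it only checks whether the string is empty, so bool('False') is True. A plain-Python sketch contrasting it with an explicit lookup (the same mapping the question's lambda performs):

```python
# bool() on a string only tests whether it is empty, not what it says.
values = ['True', 'False', '']
naive = [bool(v) for v in values]

# An explicit lookup distinguishes the two spellings and maps anything else
# to None, like the lambda in the question.
lookup = {'True': True, 'False': False}
explicit = [lookup.get(v) for v in values]

print(naive)     # [True, True, False]
print(explicit)  # [True, False, None]
```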
Answer by gbakie
Actually, you don't need any special handling when using pandas' read_csv (tested on version 0.17). Using your example file with X:
import pandas as pd
df = pd.read_csv("file.csv", delimiter=";")
print(df.dtypes)
A object
B object
C object
X bool
dtype: object
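Default type inference applies to every column, though, so a column holding values such as 01 or 02 would be parsed as integers and lose its leading zeros. If the other columns must stay strings, a sketch assuming pandas 1.5 or later, whose read_csv accepts a collections.defaultdict for dtype (the default applies to every column not listed explicitly). It also assumes that the unused 'X' entry is simply ignored when the file has no X column.

```python
import io
from collections import defaultdict

import pandas as pd

# Inline copies of the question's sample files (semicolon-delimited).
with_x = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"
without_x = "A;B;C\na1;b1;c1\na2;b2;c2\na3;b3;c3\n"

# Every column not listed falls back to the default (str); only 'X' is bool.
# Requires pandas >= 1.5 (defaultdict support in read_csv's dtype parameter).
dtypes = defaultdict(lambda: str, X=bool)

df1 = pd.read_csv(io.StringIO(with_x), sep=';', dtype=dtypes)
# Assumption: the 'X' entry is ignored when the column is absent.
df2 = pd.read_csv(io.StringIO(without_x), sep=';', dtype=dtypes)
```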