Original URL: http://stackoverflow.com/questions/30314153/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python Pandas DtypeWarning Specify dtype option on import - How?
Asked by Jarad
I have these columns:
['Campaign', 'Ad group', 'Keyword', 'Status', 'Match type', 'Max. CPC', 'Quality score', 'Impressions', 'Clicks', 'CTR', 'Avg. CPC', 'Cost', 'Avg. position', 'Converted clicks', 'Click conversion rate', 'Cost / converted click', 'Bounce rate', 'Pages / session', 'Avg. session duration (seconds)', '% new sessions']
The error I'm receiving says:
Warning (from warnings module):
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1164
data = self._reader.read(nrows)
DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
What does the Columns (5) part mean? Is that the column position? Does the Campaign column start at position 0 or 1?
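For reference, the number in the warning is a zero-based column position, so Campaign is position 0. A minimal check against the column list given above (the variable name columns is only for illustration):

```python
# Column names exactly as listed in the question.
columns = ['Campaign', 'Ad group', 'Keyword', 'Status', 'Match type',
           'Max. CPC', 'Quality score', 'Impressions', 'Clicks', 'CTR',
           'Avg. CPC', 'Cost', 'Avg. position', 'Converted clicks',
           'Click conversion rate', 'Cost / converted click', 'Bounce rate',
           'Pages / session', 'Avg. session duration (seconds)',
           '% new sessions']

# Python lists are zero-based, like the positions in the DtypeWarning.
print(columns[5])  # Max. CPC
```

This matches the suspicion in the question that the Max. CPC column is the one with mixed types.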
Also, I suspect this error occurs because my Max. CPC column has ' --' in a few places instead of zeros. I want this column's datatype to be float. How do I convert these ' --' values to 0.00 and also set this column to a float datatype when reading the CSV?
I've tried:
import pandas as pd
import numpy as np
df = pd.read_csv('file.csv', dtype={'Max. CPC': np.float64})
print(df.head())
But I get a ValueError:
ValueError: could not convert string to float: ' --'
Accepted answer by EdChum
There are two approaches I can think of. One is to pass a list of values that read_csv should treat as NaN. The listed values are converted to NaN on import, so the dtype of that column remains float and not object:
df = pd.read_csv('file.csv', dtype={'Max. CPC': float}, na_values=[' --'])
You can then convert these NaN values to 0.00 by calling fillna:
df['Max. CPC'] = df['Max. CPC'].fillna(0.00)
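To try the first approach without the original file, here is a minimal sketch that reads a small in-memory CSV. The sample rows and the csv_text name are made up for illustration; only the na_values and fillna steps come from the answer.

```python
import io

import pandas as pd

# Hypothetical sample standing in for file.csv; note the leading space in ' --'.
csv_text = "Campaign,Max. CPC\nA,0.50\nB, --\nC,1.25\n"

# Treat ' --' as missing so the column can be parsed directly as float.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'Max. CPC': float},
    na_values=[' --'],
)

# Replace the resulting NaN values with 0.00.
df['Max. CPC'] = df['Max. CPC'].fillna(0.00)

print(df['Max. CPC'].dtype)
print(df)
```

The string in na_values has to match the cell text exactly, including the leading space, unless whitespace is stripped some other way.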
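The other is to load the file without the dtype argument, replace these values with 0.00, and then cast the column to float (after replace alone, the column may still be object):
df['Max. CPC'] = df['Max. CPC'].replace(' --', 0.00).astype(float)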
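A variant of this load-then-clean approach, sketched here as an alternative rather than as what the answer uses, is pd.to_numeric with errors='coerce'. It turns every unparseable cell into NaN, not only ' --', so it is worth checking that nothing else gets silently zeroed. The sample data and the csv_text name are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for file.csv.
csv_text = "Campaign,Max. CPC\nA,0.50\nB, --\nC,1.25\n"

# Load without a dtype; the ' --' cell keeps the column from parsing as float.
df = pd.read_csv(io.StringIO(csv_text))

# Coerce anything that is not a number to NaN, then fill those with 0.00.
df['Max. CPC'] = pd.to_numeric(df['Max. CPC'], errors='coerce').fillna(0.00)

print(df['Max. CPC'].dtype)
print(df)
```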

