Python Pandas DtypeWarning: Specify dtype option on import - How?

Notice: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/30314153/

Date: 2020-09-13 23:22:01  Source: igfitidea

Tags: python, csv, pandas

Asked by Jarad

I have these columns:

['Campaign', 'Ad group', 'Keyword', 'Status', 'Match type', 'Max. CPC', 'Quality score', 'Impressions', 'Clicks', 'CTR', 'Avg. CPC', 'Cost', 'Avg. position', 'Converted clicks', 'Click conversion rate', 'Cost / converted click', 'Bounce rate', 'Pages / session', 'Avg. session duration (seconds)', '% new sessions']

The error I'm receiving says:

Warning (from warnings module):
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1164
    data = self._reader.read(nrows)
DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.

What does the Columns (5) part mean? Is that the column position? Does the Campaign column start at position 0 or 1?

Also, I suspect this error occurs because my Max. CPC column has ' --' in a few places instead of zeros. I want this column's datatype to be float. How do I convert these ' --' values to 0.00 and also set this column to a float datatype when reading the CSV?

I've tried:

import pandas as pd
import numpy as np

df = pd.read_csv('file.csv', dtype={'Max. CPC': np.float64})

print(df.head())

But I get a ValueError:

ValueError: could not convert string to float: ' --'

Accepted answer by EdChum

There are two approaches I can think of. One is to pass read_csv a list of values that it should treat as NaN, using the na_values argument. The listed values are converted to NaN on load, so the dtype of that column stays float rather than object:

df = pd.read_csv('file.csv', dtype={'Max. CPC': np.float64}, na_values=[' --'])

You can then convert these NaN values to 0.00 by calling fillna:

df['Max. CPC'] = df['Max. CPC'].fillna(0.00)
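
To make the first approach concrete, here is a minimal, self-contained sketch. The CSV content is made up for illustration (an in-memory stand-in for file.csv that reproduces only the first six of the question's columns). It also checks the question about Columns (5): the positions reported in the warning are 0-based, so position 5 is the Max. CPC column.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for file.csv; only the first six
# of the question's columns are reproduced here.
csv_text = (
    "Campaign,Ad group,Keyword,Status,Match type,Max. CPC\n"
    "Brand,Group A,shoes,enabled,Exact,1.25\n"
    "Brand,Group A,boots,paused,Broad, --\n"
    "Generic,Group B,sandals,enabled,Phrase,0.80\n"
)

# Treat the literal string ' --' (with its leading space) as missing,
# so the column can be parsed directly as float64.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Max. CPC": np.float64},
    na_values=[" --"],
)

# Replace the resulting NaN values with 0.00.
df["Max. CPC"] = df["Max. CPC"].fillna(0.00)

# Column positions are 0-based: Campaign is 0, Max. CPC is 5.
col5 = df.columns[5]
print(col5)
print(df["Max. CPC"].dtype)
print(df["Max. CPC"].tolist())
```

Note that the na_values entry has to match the cell text exactly, including the leading space, unless the file is read with skipinitialspace=True.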

The other is to load the file without the dtype argument, replace these values with 0.00, and then cast the column to float (without the cast, the column stays as object):

df['Max. CPC'] = df['Max. CPC'].replace(' --', 0.00).astype(float)
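
The second approach (load first, then clean up) can be sketched the same way. This is a variation under stated assumptions rather than the answer's exact code: the CSV content is made up for illustration, and the placeholder is replaced with the string '0.00' instead of the float 0.00, so that the column holds only strings until the final cast.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = (
    "Campaign,Max. CPC\n"
    "Brand,1.25\n"
    "Brand, --\n"
    "Generic,0.80\n"
)

# Load without a dtype; the ' --' cell forces the column to hold strings.
df = pd.read_csv(io.StringIO(csv_text))

# Swap the placeholder for a numeric string, then cast the whole column.
df["Max. CPC"] = df["Max. CPC"].replace(" --", "0.00").astype(float)

print(df["Max. CPC"].dtype)
print(df["Max. CPC"].tolist())
```

On a large file this route may still emit the DtypeWarning during loading, since the mixed column is only cleaned after read_csv returns; the na_values approach avoids that by resolving the placeholder at parse time.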