Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/45478070/

Time: 2020-09-14 04:09:53  Source: igfitidea

pd.read_csv gives me str but need float

python pandas numpy

Question by steff

I have a CSV which looks like this:

Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,4.741000,4.818000,4.741000,4.788000,4.788000,0
2007-07-30,4.763000,4.810000,4.763000,4.804000,4.804000,0

After

data = pd.read_csv(file, index_col='Date').drop(['Open','Close','Adj Close','Volume'], axis=1)
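
As a quick check (a minimal sketch, with io.StringIO standing in for the real file), the four sample rows shown above parse as float64 on their own. That suggests the str values come from something further down in the full file rather than from these rows:

```python
import io

import pandas as pd

# The four sample rows from the question, held in memory instead of a file.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,4.741000,4.818000,4.741000,4.788000,4.788000,0
2007-07-30,4.763000,4.810000,4.763000,4.804000,4.804000,0
"""

# Same call as in the question, reading from the in-memory text.
data = pd.read_csv(io.StringIO(csv_text), index_col='Date').drop(
    ['Open', 'Close', 'Adj Close', 'Volume'], axis=1)

# With only clean numbers present, both remaining columns are inferred as float64.
print(data.dtypes)
```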

I end up with a df that looks like this:

                High       Low
Date                          
2007-07-25  4.946000  4.896000
2007-07-26  4.867000  4.759000
2007-07-27  4.818000  4.741000
2007-07-30  4.810000  4.763000
2007-07-31  4.843000  4.769000

Now I want to get High - Low. I tried:

np.diff(data.values, axis=1)

but I get an error: unsupported operand type(s) for -: 'str' and 'str'
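
Once the columns are numeric, note that np.diff(..., axis=1) subtracts each column from the one after it, so with the columns ordered High, Low it yields Low - High rather than High - Low. A small sketch with a hand-built frame (values copied from the first two rows of the output above) compares it with plain column subtraction:

```python
import numpy as np
import pandas as pd

# Hand-built numeric frame mirroring the first two rows of the question's output.
data = pd.DataFrame(
    {'High': [4.946, 4.867], 'Low': [4.896, 4.759]},
    index=pd.Index(['2007-07-25', '2007-07-26'], name='Date'),
)

# np.diff along axis=1 computes column[i + 1] - column[i], i.e. Low - High here.
low_minus_high = np.diff(data.values, axis=1)

# Subtracting the named columns gives High - Low directly and keeps the Date index.
spread = data['High'] - data['Low']
print(spread)
```

Column subtraction also returns a Series aligned on Date, whereas np.diff returns a bare 2-D array.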

I'm not sure why the values in the df are str in the first place. Grateful for any solution.
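
One way to see how this can happen is to add a non-numeric cell to the data. The sketch below assumes a hypothetical row whose prices are the word missing (the real file's bad values are not shown in the question). A single such cell makes read_csv keep the whole column as text, and pd.to_numeric with errors='coerce' can then be used to locate the offending rows:

```python
import io

import pandas as pd

# Hypothetical file: two clean rows plus one row holding the text "missing".
# The bad row is an assumption for illustration; the question does not show one.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,missing,missing,missing,missing,missing,missing
"""

data = pd.read_csv(io.StringIO(csv_text), index_col='Date')[['High', 'Low']]

# One non-numeric cell is enough for read_csv to keep the column as strings.
print(data.dtypes)

# Flag rows where either column fails numeric conversion.
bad = data.apply(pd.to_numeric, errors='coerce').isna().any(axis=1)
print(data[bad])
```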

Answer by jezrael

I think you need to_numeric with errors='coerce', because it seems there is some bad data:

data = pd.read_csv(file, index_col='Date', usecols=['Date','High','Low'])

data = data.apply(pd.to_numeric, errors='coerce')
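
Applied to a hypothetical file with one non-numeric row, the approach might look like the sketch below. It assumes the placeholder text missing. 'Date' is listed in usecols because the index column has to be among the parsed columns:

```python
import io

import pandas as pd

# Hypothetical file: two clean rows plus one row holding the text "missing".
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,missing,missing,missing,missing,missing,missing
"""

# 'Date' is included in usecols because it is also used as the index column.
data = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                   usecols=['Date', 'High', 'Low'])

# Convert each column; cells that cannot be parsed become NaN.
data = data.apply(pd.to_numeric, errors='coerce')

spread = data['High'] - data['Low']
print(spread)
```

The bad row ends up as NaN in the spread; data.dropna() could remove it first if those rows are not wanted.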

Answer by Sébastien S.

Doesn't the read_csv dtype option work?

From the documentation: dtype : Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use str or object to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

import numpy as np
import pandas as pd

data = pd.read_csv(file,
    index_col='Date',
    usecols=['Date','High','Low'],
    dtype={'High': np.float64, 'Low': np.float64})
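
A caveat with the dtype route, sketched below under the same kind of assumption (a hypothetical row containing the text missing): when a column really contains non-numeric text, forcing float64 makes read_csv raise ValueError instead of returning strings. If the placeholder is known, read_csv's na_values parameter (a swapped-in option, not part of either answer) turns it into NaN first so the dtype can be applied:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file: two clean rows plus one row holding the text "missing".
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,missing,missing,missing,missing,missing,missing
"""

read_kwargs = dict(
    index_col='Date',
    usecols=['Date', 'High', 'Low'],
    dtype={'High': np.float64, 'Low': np.float64},
)

# Forcing float64 fails when a cell in those columns is not a number.
try:
    pd.read_csv(io.StringIO(csv_text), **read_kwargs)
    dtype_failed = False
except ValueError as exc:
    dtype_failed = True
    print('dtype conversion failed:', exc)

# Declaring the placeholder as an NA marker lets the float64 dtype succeed.
data = pd.read_csv(io.StringIO(csv_text), na_values=['missing'], **read_kwargs)
print(data)
```

When the bad values are unknown or vary, the to_numeric(errors='coerce') approach from the first answer is more forgiving.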