pandas: reading a CSV file into a pandas DataFrame as float

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/45027400/

Time: 2020-09-14 03:58:21  Source: igfitidea

reading csv file to pandas dataframe as float

python csv pandas parsing

Asked by doctorer

I have a .csv file with strings in the top row and first column, with the rest of the data as floating point numbers. I want to read it into a dataframe with the first row and column as column names and index respectively, and all the floating point values as float64.

If I use df = pd.read_csv(filename, index_col=0), all the numeric values are left as strings.

If I use df = pd.read_csv(filename, index_col=0, dtype=np.float64), I get an exception, ValueError: could not convert string to float, as it attempts to parse the first column as float.

There are a large number of columns, and I do not have the column names, so I don't want to identify each column for parsing as float; I want to parse every column except the first one.

Accepted answer by doctorer

The original code was correct


df = pd.read_csv(filename,index_col=0)

but the .csv file had been constructed incorrectly.

As @juanpa.arrivillaga pointed out, pandas will infer the dtypes without any arguments, provided all the data in a column is of the same dtype. The columns were being interpreted as strings because, although most of the data was numeric, one row contained non-numeric data (actually dates). Removing this row from the .csv file solved the problem.
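The behaviour described in this answer can be reproduced with a small sketch. The sample data, the row labels and the variable names below are made up for illustration; with a real file you would pass its path to pd.read_csv instead of an in-memory buffer. It shows one stray row of dates keeping whole columns from being read as floats, and one possible way to locate such rows with pd.to_numeric and errors="coerce". Note that this check would also flag rows that contain genuinely missing values.

```python
import io

import pandas as pd

# Hypothetical sample data: row "r2" holds dates instead of numbers.
csv_text = """name,a,b
r1,1.5,2.5
r2,2020-01-01,2020-01-02
r3,3.0,4.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.dtypes)  # neither column is numeric, because of row "r2"

# Coerce every column to numbers; cells that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Labels of rows with at least one cell that failed to parse.
bad_rows = numeric.index[numeric.isna().any(axis=1)].tolist()
print(bad_rows)

# Drop the offending rows; the remaining values are float64.
clean = numeric.drop(index=bad_rows)
print(clean.dtypes)
```

Once the offending rows are known, they can be fixed or removed in the .csv file itself, which is what resolved the problem here.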

Answer by Ragsparkz

Get the list of all column names, remove the first one, and cast the other columns to float.

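# df.columns is an Index, which has no remove(); convert it to a list first
cols = list(df.columns)
cols.remove('firstcolumn')  # placeholder for the name of the first (string) column
for col in cols:
    df[col] = df[col].astype(float)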
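If the column names are not known in advance, a variation on this idea is to read only the header row first and build the dtype mapping from it. This is a minimal sketch, not part of the original answer: the sample data and the names header and float_dtypes are assumptions for illustration. Like the cast above, it will still raise an error if a column contains non-numeric data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real file.
csv_text = """name,a,b
r1,1,2
r2,3,4
"""

# Read only the header row (nrows=0) to learn the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)

# Request float64 for every column except the first, which becomes the index.
float_dtypes = {col: np.float64 for col in header.columns[1:]}

df = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=float_dtypes)
print(df.dtypes)
```

Passing a dict to dtype leaves the first column alone, so it can still be used as a string index.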