pandas: pd.read_csv treats integers as floats by default

Question

Asked by codingknob

I have a CSV that looks like this (the first row is the header):

name,a,a1,b,b1
arnold,300311,arnld01,300311,arnld01
sam,300713,sam01,300713,sam01

When I run:

df = pd.read_csv('file.csv')

Columns a and b have a .0 appended to each value, like so:

df.head()

name,a,a1,b,b1
arnold,300311.0,arnld01,300311.0,arnld01
sam,300713.0,sam01,300713.0,sam01

Columns a and b contain only integers or blanks, so why does pd.read_csv() treat them as floats, and how do I ensure they are integers on read?

Answer 1

Answered by Andy

As root mentioned in the comments, this is a limitation of pandas (and NumPy). NaN is a float, and the empty values in your CSV are read as NaN, so an integer column that contains blanks is stored as floats.

This is also listed in the pandas gotchas.
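The effect can be seen by reading the same column with and without a blank value. This is a minimal sketch; the two small CSV strings are made up for illustration and are not the asker's file.

```python
from io import StringIO

import pandas as pd

# Illustrative data: identical except for one blank value in column a
complete = "name,a\narnold,300311\nsam,300713\n"
with_blank = "name,a\narnold,300311\nsam,300713\ntest,\n"

df_complete = pd.read_csv(StringIO(complete))
df_blank = pd.read_csv(StringIO(with_blank))

print(df_complete["a"].dtype)  # int64
print(df_blank["a"].dtype)     # float64, because the blank became NaN
```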

You can work around this in a few ways.

For the examples below I used the following to import the data. Note that I added a row with an empty value in columns a and b:

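import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

# Sample data with one row ("test") that has blanks in columns a and b
data = """name,a,a1,b,b1
arnold,300311,arnld01,300311,arnld01
sam,300713,sam01,300713,sam01
test,,test01,,test01"""

df = pd.read_csv(StringIO(data), sep=",")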

Drop NaN rows

Your first option is to drop the rows that contain a NaN value. The downside is that you lose the entire row. After getting your data into a dataframe, run this:

df.dropna(inplace=True)
df.a = df.a.astype(int)
df.b = df.b.astype(int)

This drops every row containing NaN from the dataframe, then converts columns a and b to int:

>>> df.dtypes
name    object
a        int32
a1      object
b        int32
b1      object
dtype: object

>>> df
     name       a       a1       b       b1
0  arnold  300311  arnld01  300311  arnld01
1     sam  300713    sam01  300713    sam01
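Note that dropna() with no arguments drops a row if any column is missing. As a hedged variation, the sketch below restricts the check to columns a and b using the subset parameter, so a blank elsewhere does not cost you the row. The blank in a1 on the sam row is added here purely for illustration.

```python
from io import StringIO

import pandas as pd

# Same layout as above, but with a1 left blank on the "sam" row (illustrative)
data = """name,a,a1,b,b1
arnold,300311,arnld01,300311,arnld01
sam,300713,,300713,sam01
test,,test01,,test01"""

df = pd.read_csv(StringIO(data))

# Only drop rows where a or b is missing; the blank a1 is tolerated
df = df.dropna(subset=["a", "b"])

# Convert both columns in one call; int64 is spelled out to avoid platform differences
df = df.astype({"a": "int64", "b": "int64"})

print(df)
```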

Fill NaN with placeholder data

This option replaces all your NaN values with a throwaway value. Which value to use is something you need to decide; for this test, I used -999999. This lets you keep the rest of the data, convert the columns to int, and makes it obvious which data is invalid. You can filter these rows out later if you make calculations based on those columns.

df.fillna(-999999, inplace=True)
df.a = df.a.astype(int)
df.b = df.b.astype(int)

This produces a dataframe like so:

这会产生一个像这样的数据帧：

>>> df.dtypes
name    object
a        int32
a1      object
b        int32
b1      object
dtype: object

>>> df
     name       a       a1       b       b1
0  arnold  300311  arnld01  300311  arnld01
1     sam  300713    sam01  300713    sam01
2    test -999999   test01 -999999   test01
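Filtering the placeholder rows out before a calculation might look like the following. This is a small sketch on a hand-built frame; the SENTINEL name is introduced here for illustration only.

```python
import pandas as pd

SENTINEL = -999999  # the placeholder value used in the fillna step

df = pd.DataFrame({
    "name": ["arnold", "sam", "test"],
    "a": [300311, 300713, SENTINEL],
})

# Keep only rows with real data before computing anything
valid = df[df["a"] != SENTINEL]
mean_a = valid["a"].mean()

print(mean_a)  # 300512.0
```

Forgetting this filter would pull the placeholder into the result, which is the main risk of this approach.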

Leave the float values

Finally, you can simply leave the float values (and NaN) as they are and not worry about the non-integer data type.
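A further possibility, which the options above do not cover and which assumes pandas 0.24 or later: the nullable integer dtype "Int64" (capital I) can hold missing values, so the columns can be read as integers directly. A minimal sketch using the same sample data:

```python
from io import StringIO

import pandas as pd

data = """name,a,a1,b,b1
arnold,300311,arnld01,300311,arnld01
sam,300713,sam01,300713,sam01
test,,test01,,test01"""

# "Int64" is pandas' nullable integer dtype, distinct from NumPy's "int64"
df = pd.read_csv(StringIO(data), dtype={"a": "Int64", "b": "Int64"})

print(df.dtypes)
print(df)
```

The blanks show up as missing values rather than floats, and all three rows are kept.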

Answer 2

Answered by user2515138

Converting float to integer values after pandas read_csv (working example)

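import pandas as pd

# Importing the dataset
dataset = pd.read_csv('WorldWarWeather_Data.csv')

# Select feature columns 3..10 and target column 2 as NumPy arrays
X = dataset.iloc[:, 3:11].values
y = dataset.iloc[:, 2].values

# Cast both arrays to integers
X = X.astype(int)
y = y.astype(int)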
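This cast only gives meaningful integers when the selected columns contain no missing values. As a hedged sketch, you could check for NaN first; the frame below is a hypothetical stand-in, since the contents of the CSV file are not shown.

```python
import pandas as pd

# Hypothetical stand-in for the selected columns
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, None, 6.0]})

# List the columns that contain at least one missing value
nan_columns = frame.columns[frame.isna().any()].tolist()
print(nan_columns)  # ['y']

# Drop incomplete rows before converting to an integer array
clean = frame.dropna()
X = clean.values.astype(int)
print(X)
```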
