Python: Prevent Pandas from converting int to float

Question

Asked by user2570465

I have a DataFrame. Two relevant columns are the following: one is a column of int and another is a column of str.

I understand that if I insert NaN into the int column, Pandas will convert all the int values into float because there is no NaN value for an int.
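The upcast described above can be reproduced in isolation. This is a minimal sketch, not part of the original question; the variable names `ints` and `with_nan` are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series built from plain integers gets an integer dtype.
ints = pd.Series([0, 1, 2])
print(ints.dtype)

# Adding NaN forces a float dtype, because NumPy integer arrays
# have no representation for a missing value.
with_nan = pd.Series([0, 1, np.nan])
print(with_nan.dtype)  # float64
```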

However, when I insert None into the str column, Pandas converts all my int values to float as well. This doesn't make sense to me: why does the value I put in column 2 affect column 1?

Here's a simple working example (Python 2):

import pandas as pd
df = pd.DataFrame()
df["int"] = pd.Series([], dtype=int)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print df
print
df.loc[1] = [1, None]
print df

The output is

   int   str
0    0  zero

   int   str
0  0.0  zero
1  1.0   NaN

Is there any way to make the output the following:

   int   str
0    0  zero

   int   str
0    0  zero
1    1   NaN

without recasting the first column to int.

I prefer using int instead of float because the actual data in that column are integers. If there's no workaround, I'll just use float though.
I prefer not having to recast because in my actual code, I don't store the actual dtype.
I also need the data inserted row-by-row.

Answer 1

Answered by maxymoo

If you set dtype=object, your series will be able to contain arbitrary data types:

df["int"] = pd.Series([], dtype=object)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)

   int   str
0    0  zero
1  NaN   NaN

  int   str
0   0  zero
1   1  None
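To see what dtype=object does on its own, here is a small sketch separate from the DataFrame above (the Series `s` is illustrative). Each element stays an ordinary Python object, so integers and None can coexist. The trade-off is that operations on object columns generally run at Python speed rather than using vectorized integer arithmetic.

```python
import pandas as pd

# With dtype=object, values are stored as Python objects and are not upcast.
s = pd.Series([0, 1, None], dtype=object)
print(s.dtype)     # object
print(s.tolist())  # [0, 1, None]
```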

Answer 2

Answered by fuglede

If you use DataFrame.append to add the data, the dtypes are preserved, and you do not have to recast or rely on object:

In [157]: df
Out[157]:
   int   str
0    0  zero

In [159]: df.append(pd.DataFrame([[1, None]], columns=['int', 'str']), ignore_index=True)
Out[159]:
   int   str
0    0  zero
1    1  None
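Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is the documented replacement. The following is a sketch of the same idea with pd.concat, assuming a starting frame like the one in the question (here built directly from a dict for brevity; `new_row` is an illustrative name).

```python
import pandas as pd

# Starting frame: one int column and one str column, as in the question.
df = pd.DataFrame({"int": [0], "str": ["zero"]})

# Wrap the new row in a one-row DataFrame and concatenate it.
new_row = pd.DataFrame([[1, None]], columns=["int", "str"])
df = pd.concat([df, new_row], ignore_index=True)

print(df)
print(df.dtypes)
```

Because both frames hold integers in the "int" column, the concatenated column should keep an integer dtype. Concatenating once per row copies the frame each time, so for many rows it is usually cheaper to collect the rows in a list and concatenate once.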

Answer 3

Answered by totalhack

As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA, avoiding the issues with NaN/None.

...

df = df.convert_dtypes()
df.loc[1] = [1, None]
print(df)

#   int   str
# 0   0  zero
# 1   1  NaN
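For reference, the nullable dtypes that convert_dtypes selects can be inspected directly. This is a minimal sketch with illustrative data, not the answerer's code; it only shows that the nullable "Int64" extension dtype can hold a missing value without falling back to float.

```python
import pandas as pd

# convert_dtypes switches columns to extension dtypes that support pd.NA.
df = pd.DataFrame({"int": [0, 1], "str": ["zero", None]})
converted = df.convert_dtypes()
print(converted.dtypes)

# A nullable integer Series keeps its integer dtype even with a missing entry.
s = pd.Series([0, 1, None], dtype="Int64")
print(s.dtype)            # Int64
print(s.isna().tolist())  # [False, False, True]
```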
