Python 阻止 Pandas 将 int 转换为 float
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/40251948/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Stop Pandas from converting int to float
提问by user2570465
I have a DataFrame
. Two relevant columns are the following: one is a column of int
and another is a column of str
.
我有一个DataFrame
. 两个相关的列如下:一个是 的列,int
另一个是 的列str
。
I understand that if I insert NaN
into the int
column, Pandas will convert all the int
into float
because there is no NaN
value for an int
.
我明白,如果我插入NaN
到int
列,熊猫将全部转换int
成float
,因为没有NaN
一个值int
。
However, when I insert None
into the str
column, Pandas converts all my int
to float
as well. This doesn't make sense to me - why does the value I put in column 2 affect column 1?
然而,当我插入None
到str
列,熊猫将所有我int
要float
为好。这对我来说没有意义 - 为什么我放在第 2 列中的值会影响第 1 列?
Here's a simple working example (Python 2):
这是一个简单的工作示例(Python 2):
import pandas as pd
df = pd.DataFrame()
df["int"] = pd.Series([], dtype=int)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print df
print
df.loc[1] = [1, None]
print df
The output is
输出是
int str
0 0 zero
int str
0 0.0 zero
1 1.0 NaN
Is there any way to make the output the following:
有什么办法可以使输出如下:
int str
0 0 zero
int str
0 0 zero
1 1 NaN
without recasting the first column to int
.
无需将第一列重铸为int
.
I prefer using
int
instead offloat
because the actual data in that column are integers. If there's not workaround, I'll just usefloat
though.I prefer not having to recast because in my actual code, I don't
store the actualdtype
.I also need the data inserted row-by-row.
我更喜欢使用
int
而不是float
因为该列中的实际数据是整数。如果没有解决方法,我只会使用float
。我更喜欢不必重铸,因为在我的实际代码中,我不
存储实际的dtype
.我还需要逐行插入数据。
回答by maxymoo
If you set dtype=object
, your series will be able to contain arbitrary data types:
如果您设置dtype=object
,您的系列将能够包含任意数据类型:
df["int"] = pd.Series([], dtype=object)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)
int str
0 0 zero
1 NaN NaN
int str
0 0 zero
1 1 None
回答by fuglede
If you use DataFrame.append
to add the data, the dtypes are preserved, and you do not have to recast or rely on object
:
如果DataFrame.append
用于添加数据,则保留 dtypes,并且您不必重新转换或依赖object
:
In [157]: df
Out[157]:
int str
0 0 zero
In [159]: df.append(pd.DataFrame([[1, None]], columns=['int', 'str']), ignore_index=True)
Out[159]:
int str
0 0 zero
1 1 None
回答by totalhack
As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA, avoiding the issues with NaN/None.
从 pandas 1.0.0 开始,我相信您还有另一种选择,即首先使用convert_dtypes。这会将数据帧列转换为支持 pd.NA 的数据类型,从而避免 NaN/None 的问题。
...
df = df.convert_dtypes()
df.loc[1] = [1, None]
print(df)
# int str
# 0 0 zero
# 1 1 NaN