Pandas DataFrame export to_csv change dtype of columns

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/50165799/

Date: 2020-09-14 05:31:35  Source: igfitidea

Tags: python-3.x, pandas

Asked by Aklys

Hopefully a simple request.

I'm finding that when I build a DataFrame, set the column datatypes, and then export it to CSV, numeric strings are being converted to integers.

For example, a value might be "0000", but the CSV ends up with the value 0. I need it to keep every character of the string and save "0000" in the CSV.

Anyone know of a way to retain the string rather than the converted datatype?

Setting the datatype after import doesn't solve the issue (before anyone suggests setting it on or after import): once a value has been converted to an integer, turning it back into a string means re-adding the leading zeros on every import, which is not ideal.

Hoping I'm overlooking something simple.

(EDIT) Oh, and my export line is just a simple export, so it may be that I'm not aware of an argument that needs to be provided:

df.to_csv("Test.csv", index=False)
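To check whether the zeros are lost on export or on re-import, one quick test is to call to_csv without a path, which returns the CSV text instead of writing a file. The sketch below assumes a hypothetical column named code that already holds strings; if the padded values appear in that text, the export is not the step that drops them.

```python
import pandas as pd

# Hypothetical column holding zero-padded codes as Python strings
df = pd.DataFrame({"code": ["0000", "0042", "0100"]})

# With no path argument, to_csv returns the CSV text instead of writing a file,
# so you can inspect exactly what would be written to Test.csv
csv_text = df.to_csv(index=False)

# splitlines() ignores the platform-specific line terminator
print(csv_text.splitlines())  # ['code', '0000', '0042', '0100']
```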

Accepted answer by ASGM

Assuming df['your_column'] is the column you want to preserve, you can use the dtype argument of read_csv():

df = pd.read_csv('temp.csv', dtype={'your_column': str})
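A self-contained sketch of the difference, using an in-memory CSV via io.StringIO in place of temp.csv (the column names code and qty are hypothetical). Without dtype, pandas infers the all-digit column as integers; with dtype={'code': str} it keeps the text exactly as written, while the other columns are still inferred as usual.

```python
import io

import pandas as pd

csv_text = "code,qty\n0000,1\n0042,2\n0100,3\n"

# Default type inference turns the all-digit "code" column into integers
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["code"].tolist())  # [0, 42, 100]

# Mapping only "code" to str keeps the leading zeros from the file
kept = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
print(kept["code"].tolist())  # ['0000', '0042', '0100']

# Columns not listed in dtype are still inferred (integers here)
print(kept["qty"].tolist())  # [1, 2, 3]
```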

If that's not working, are you sure your columns contain strings to begin with? Here's the behavior I see:

>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': ['0000', '0000', '0100']})
>>> df1
      a
0  0000
1  0000
2  0100
>>> df1.to_csv('temp.csv', index=False)
>>> df2 = pd.read_csv('temp.csv', dtype={'a': str})
>>> df2
      a
0  0000
1  0000
2  0100

Maybe your problem isn't on export or import, but on creation.

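df = pd.DataFrame({'a': [0, 0, 100]})

This creates a DataFrame of integers with the values 0, 0, 100, so the leading zeros are gone before to_csv ever runs. (Typing the values as unquoted 0000, 0000, 0100 doesn't help: in Python 3, 0100 is a SyntaxError, because decimal integer literals can't have leading zeros.) If you want them to be strings, you need to create them as strings.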
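A minimal sketch contrasting the two ways of building the column. The final step goes beyond the answer and is an assumption: if the data only ever arrives as integers, Series.str.zfill can re-pad it once before export, here assuming every code should be exactly 4 characters wide.

```python
import pandas as pd

# Built from string literals: the padding is part of the data
as_text = pd.DataFrame({"a": ["0000", "0000", "0100"]})

# Built from integers: there are no leading zeros left to preserve
as_int = pd.DataFrame({"a": [0, 0, 100]})

# If only integers are available, convert to str and left-pad with zeros;
# the width of 4 is an assumed code length, not something pandas knows
repadded = as_int["a"].astype(str).str.zfill(4)

print(as_text["a"].tolist())  # ['0000', '0000', '0100']
print(as_int["a"].tolist())   # [0, 0, 100]
print(repadded.tolist())      # ['0000', '0000', '0100']
```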