Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/42676982/

Time: 2020-08-19 22:01:13  Source: igfitidea

Python - Turn all items in a Dataframe to strings

Tags: python, string, pandas, dataframe

Asked by theprowler

I followed the procedure from "In Python, how do I convert all of the items in a list to floats?" because each column of my Dataframe is a list, but instead of floats I chose to change all the values to strings.

df = [str(i) for i in df]

But this failed.

It simply erased all the data except for the first row of column names.

Then, trying df = [str(i) for i in df.values] turned the entire Dataframe into one big list, which messes up the data too much to meet the goal of my script: exporting the Dataframe to my Oracle table.
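
For anyone wondering why both attempts behave this way, here is a minimal sketch. The sample frame and its column names are made up for illustration and are not the asker's data. Iterating over a DataFrame walks its column labels, while iterating over df.values walks whole rows.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# Iterating over a DataFrame yields its column labels, not its cell values,
# so this produces a plain list of the column names.
labels = [str(i) for i in df]
print(labels)  # ['a', 'b']

# Iterating over df.values yields one NumPy array per row, so str() is
# applied to each whole row instead of to each individual cell.
rows = [str(i) for i in df.values]
print(len(rows))  # 2, one string per row
```

In both cases the result is an ordinary Python list, so the DataFrame structure needed for the export is gone.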

Is there a way to convert all the items that are in my Dataframe that are NOT strings into strings?

Answered by PdevG

You can use this:

df = df.astype(str)
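
As a small illustration of what this does, the sketch below uses a made-up frame with mixed column types (the names n, x and s are assumptions). The shape and labels stay the same, and every cell becomes a string.

```python
import pandas as pd

# Hypothetical frame with an integer, a float and a text column.
df = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5], "s": ["a", "b"]})

# astype(str) converts every cell to a string and keeps the frame's layout.
df_str = df.astype(str)

print(df_str.shape)             # (2, 3)
print(df_str.iloc[0].tolist())  # ['1', '0.5', 'a']
```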

Out of curiosity, I decided to see whether there is any difference in efficiency between the accepted solution and mine.

The results are below:

example df:

df = pd.DataFrame([list(range(1000))], index=[0])

test df.astype:

%timeit df.astype(str) 
>> 100 loops, best of 3: 2.18 ms per loop

test df.applymap:

%timeit df.applymap(str)
1 loops, best of 3: 245 ms per loop

It seems df.astype is quite a lot faster :)
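
If you want to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The repeat count is an arbitrary choice and the timings will vary by machine and pandas version. Note that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so the sketch picks whichever one is available.

```python
import timeit

import pandas as pd

# Same example frame as above: one row, 1000 integer columns.
df = pd.DataFrame([list(range(1000))], index=[0])

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Time 20 runs of each approach (20 is an arbitrary number for this sketch).
t_astype = timeit.timeit(lambda: df.astype(str), number=20)
t_elementwise = timeit.timeit(lambda: elementwise(str), number=20)

print(f"astype(str):      {t_astype:.4f} s for 20 runs")
print(f"elementwise str:  {t_elementwise:.4f} s for 20 runs")

# Both approaches should produce the same strings.
same = df.astype(str).iloc[0].tolist() == elementwise(str).iloc[0].tolist()
print(same)
```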

Answered by Psidom

You can use the applymap method:

df = df.applymap(str)
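
One case where an elementwise function may still be worth it is when each cell needs custom formatting rather than plain str(). The sketch below is only an illustration: the frame, the to_text helper and its two-decimal rule are all hypothetical. It also falls back between DataFrame.map (pandas >= 2.1) and applymap, since applymap was deprecated in 2.1.

```python
import pandas as pd

# Hypothetical frame with an integer and a float column.
df = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.25]})

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap


def to_text(value):
    """Hypothetical rule: two decimals for floats, plain str() otherwise."""
    if isinstance(value, float):
        return f"{value:.2f}"
    return str(value)


out = elementwise(to_text)
print(out.iloc[0].tolist())  # ['1', '0.50']
print(out.iloc[1].tolist())  # ['2', '1.25']
```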

Answered by Sander van den Oord

With pandas >= 1.0 there is now a dedicated string datatype:

You can convert your column to this pandas string datatype using .astype('string'):

df = df.astype('string')

This is different from using str, which sets the pandas 'object' datatype:

df = df.astype(str)

You can see the difference in datatypes when you look at the info of the dataframe:

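import pandas as pd

df = pd.DataFrame({
    'zipcode_str': [90210, 90211],
    'zipcode_string': [90210, 90211],
})

# astype(str) stores Python strings in an object-dtype column
df['zipcode_str'] = df['zipcode_str'].astype(str)
# astype('string') converts to the dedicated pandas string dtype
df['zipcode_string'] = df['zipcode_str'].astype('string')

df.info()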

# you can see that the first column has dtype object
# while the second column has the new dtype string
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   zipcode_str     2 non-null      object
 1   zipcode_string  2 non-null      string
dtypes: object(1), string(1)


From the docs:


The 'string' extension type solves several issues with object-dtype NumPy arrays:

1) You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

2) object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn't a clear way to select just text while excluding non-text, but still object-dtype columns.

3) When reading code, the contents of an object dtype array is less clear than string.
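
To make point 2 concrete, here is a small sketch with made-up column names. A column using the 'string' dtype can be picked out with select_dtypes, while an object column holding a mix of strings and non-strings is left out.

```python
import pandas as pd

# Hypothetical frame: "mixed" is an object column holding a string and an int,
# "text" uses the dedicated pandas string dtype.
df = pd.DataFrame({
    "mixed": ["a", 1],
    "text": pd.array(["x", "y"], dtype="string"),
})

# Select only the columns that use the 'string' extension dtype.
text_only = df.select_dtypes(include="string")
print(list(text_only.columns))  # ['text']
```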


Information about pandas 1.0 can be found here:
https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html


Answered by Sarbari Roy

This worked for me:

dt.applymap(lambda x: x[0] if type(x) is list else None)