Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/33957720/

Date: 2020-08-19 14:14:51  Source: igfitidea

How to convert column with dtype as object to string in Pandas Dataframe

Tags: python, pandas

Question by user3546523

When I read a CSV file into a pandas DataFrame, each column is cast to its own data type. I have a column that was read as dtype object. I want to perform string operations on this column, such as splitting the values and creating a list. But no such operation is possible because its dtype is object. Can anyone let me know how to convert all the items of a column to strings instead of objects?

I tried several ways but nothing worked. I used astype, str(), to_string etc.

a=lambda x: str(x).split(',')
df['column'].apply(a)

or

df['column'].astype(str)

Answer by Hypothetical Ninja

Did you try assigning it back to the column?

df['column'] = df['column'].astype('str') 
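
A minimal sketch, assuming a small hypothetical DataFrame whose `column` holds mixed values as dtype object. It shows that `astype(str)` returns a new Series, so `df` only changes once the result is assigned back:

```python
import pandas as pd

# Hypothetical data: an object-dtype column holding mixed values
df = pd.DataFrame({"column": [1, "a,b", 2.5]}, dtype=object)

converted = df["column"].astype(str)  # returns a new Series; df is untouched
print(df["column"].tolist())          # [1, 'a,b', 2.5]

df["column"] = converted              # assigning back keeps the conversion
print(df["column"].tolist())          # ['1', 'a,b', '2.5']
```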

Referring to this question, a pandas DataFrame stores pointers to the string objects, and hence the column is of dtype 'object'. Such a column can still use the .str string methods directly. As per the docs, you could try:

df['column_new'] = df['column'].str.split(',') 
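
A short sketch with hypothetical comma-separated values, keeping the answer's `column` and `column_new` names. Because the values are already Python strings, no conversion is needed before calling `.str.split`:

```python
import pandas as pd

# Hypothetical object-dtype column of comma-separated strings
df = pd.DataFrame({"column": ["a,b,c", "d", "e,f"]}, dtype=object)

# .str.split returns a Series whose elements are lists
df["column_new"] = df["column"].str.split(",")
print(df["column_new"].tolist())  # [['a', 'b', 'c'], ['d'], ['e', 'f']]
```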

Answer by Siraj S.

Since string data has variable length, it is stored as object dtype by default. If you want to store it as a string type, you can do something like this:

df['column'] = df['column'].astype('|S80')  # fixed-width bytes dtype, max length set at 80 bytes

or alternatively

df['column'] = df['column'].astype('|S')  # by default, sets the length to the longest value it encounters
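
One caveat worth checking: '|S80' and '|S' are NumPy's fixed-width bytes dtypes, so the converted values are `bytes` rather than `str`. A minimal sketch with hypothetical values, using `Series.str.decode` to get `str` objects back:

```python
import pandas as pd

# Hypothetical object-dtype column of text values
s = pd.Series(["apple", "kiwi"], dtype=object)

# '|S80' is NumPy's fixed-width *bytes* dtype (at most 80 bytes per value)
as_bytes = s.astype("|S80")
print(type(as_bytes.iloc[0]))  # a bytes type, not str

# Decoding turns the bytes back into Python str objects
as_text = as_bytes.str.decode("utf-8")
print(as_text.tolist())  # ['apple', 'kiwi']
```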

Answer by koshmaster

You could try using df['column'].str. and then call any string function on it. The pandas documentation lists these, such as split.
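
A sketch of a few `Series.str` methods on a hypothetical object column; `strip`, `split` with `expand=True`, and `len` are picked only as examples of what the accessor offers:

```python
import pandas as pd

# Hypothetical object-dtype column with stray whitespace
df = pd.DataFrame({"column": ["  red,green ", "blue"]}, dtype=object)

cleaned = df["column"].str.strip()           # trim surrounding whitespace
parts = cleaned.str.split(",", expand=True)  # one column per piece; short rows padded with missing values
lengths = cleaned.str.len()                  # character count per value

print(cleaned.tolist())  # ['red,green', 'blue']
print(lengths.tolist())  # [9, 4]
```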

Answer by zurfyx

Not answering the question directly, but it might help someone else.

I have a column called Volume containing both - (invalid/NaN) and numbers formatted with , as a thousands separator:

import pandas as pd
df['Volume'] = df['Volume'].astype('str')
df['Volume'] = df['Volume'].str.replace(',', '')
df['Volume'] = pd.to_numeric(df['Volume'], errors='coerce')

Casting to string is required for str.replace to apply.
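
A minimal sketch of why the cast matters, using hypothetical Volume values that mix text with a bare integer. On an object column, `.str` methods return NaN for non-string elements, so the integer 5 is lost unless the column is cast first. `regex=False` is passed to make the literal replacement explicit:

```python
import pandas as pd

# Hypothetical Volume data: text with a thousands separator, a bare int, and '-'
volume = pd.Series(["1,000", 5, "-"], dtype=object)

# Without casting, .str.replace yields NaN for the non-string element 5
no_cast = pd.to_numeric(volume.str.replace(",", "", regex=False), errors="coerce")

# Casting first turns 5 into "5", so it survives the replace
with_cast = pd.to_numeric(
    volume.astype(str).str.replace(",", "", regex=False), errors="coerce"
)

print(no_cast.tolist())    # [1000.0, nan, nan]
print(with_cast.tolist())  # [1000.0, 5.0, nan]
```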

pandas.Series.str.replace
pandas.to_numeric
