How to convert "bytes" objects to literal strings in a Pandas DataFrame, Python 3.x?

Question

Asked by ShanZhengYang

I have a Python 3.x pandas DataFrame in which certain columns are strings that are expressed as bytes (as in Python 2.x):

import pandas as pd
df = pd.DataFrame(...)
df
       COLUMN1         ....
0      b'abcde'        ....
1      b'dog'          ....
2      b'cat1'         ....
3      b'bird1'        ....
4      b'elephant1'    ....

When I access the column with df.COLUMN1, I see Name: COLUMN1, dtype: object.

However, if I access a single element, it is a "bytes" object:

df.COLUMN1.iloc[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'dtype'

How do I convert these into "regular" strings? That is, how can I get rid of the b'' prefix?

Answer 1

Answered by EdChum

You can use the vectorised str.decode to decode byte strings into ordinary strings:

df['COLUMN1'].str.decode("utf-8")
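A minimal sketch of the call above, assuming a small made-up frame whose COLUMN1 holds UTF-8 encoded bytes (the sample data is hypothetical, not the asker's). Note that str.decode returns a new Series rather than changing the column in place, so the result has to be assigned back:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1"]})

# str.decode does not modify the column in place; assign the result back.
df["COLUMN1"] = df["COLUMN1"].str.decode("utf-8")

print(df["COLUMN1"].tolist())
```

If the bytes are not UTF-8, pass the encoding that was actually used to produce them.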

To do this for multiple columns, you can first select just the object-dtype columns (the dtype under which the bytes values are stored):

str_df = df.select_dtypes(include=[object])  # np.object was removed in NumPy 1.24; the builtin object works

Then convert all of them:

str_df = str_df.stack().str.decode('utf-8').unstack()

You can then replace the original df columns with the converted ones:

for col in str_df:
    df[col] = str_df[col]
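One caveat with converting every object column at once: object columns can also hold values that are already str, and str.decode typically yields NaN for anything that is not bytes. The sketch below is an alternative to the stack/unstack approach, not the answer's own method: it loops over the object columns and decodes only those whose values are all bytes. The frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: two bytes columns, one text column, one numeric column.
df = pd.DataFrame({
    "name": [b"dog", b"cat1"],
    "kind": [b"mammal", b"mammal"],
    "label": ["already", "text"],
    "count": [1, 2],
})

for col in df.select_dtypes(include=[object]).columns:
    # Decode only columns in which every value is a bytes object,
    # so columns that already hold text are left untouched.
    if df[col].map(lambda v: isinstance(v, bytes)).all():
        df[col] = df[col].str.decode("utf-8")

print(df)
```

Columns that mix bytes with missing values would be skipped by this check; relaxing it is left as a design choice that depends on the data.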

Answer 2

Answered by Yu Zhou

df['COLUMN1'].apply(lambda x: x.decode("utf-8"))
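The lambda above raises AttributeError if the column contains anything other than bytes, since str has no decode method in Python 3. A hedged variant, assuming a column that mixes bytes with values that are already text: the helper to_text is a hypothetical name introduced here, and the sample data is made up.

```python
import pandas as pd

def to_text(value, encoding="utf-8"):
    """Decode bytes to str; return any other value unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Hypothetical column mixing bytes with a value that is already a str.
df = pd.DataFrame({"COLUMN1": [b"abcde", "dog", b"cat1"]})

# apply returns a new Series, so assign it back to keep the result.
df["COLUMN1"] = df["COLUMN1"].apply(to_text)

print(df["COLUMN1"].tolist())
```

This runs a Python function per element, so on large all-bytes columns the vectorised str.decode is usually the more natural choice.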

Answer 3

Answered by Dinesh.hmn

df.columns = [x.decode("utf-8") for x in df.columns]

This decodes the column labels, for cases where the column names themselves are bytes, and is a quick and simple way to do it.
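The list comprehension above works on the column labels, not on the cell values, and it fails if any label is already a str. A small sketch under that assumption, with made-up labels, that decodes only the labels that are bytes:

```python
import pandas as pd

# Hypothetical frame with one bytes label and one ordinary str label.
df = pd.DataFrame({b"COLUMN1": [1, 2], "COLUMN2": [3, 4]})

# Decode bytes labels and keep every other label as it is.
df.columns = [
    c.decode("utf-8") if isinstance(c, bytes) else c
    for c in df.columns
]

print(list(df.columns))
```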
