Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/40389764/

Date: 2020-08-19 23:29:05  Source: igfitidea

How to translate "bytes" objects into literal strings in pandas Dataframe, Python3.x?

python, arrays, python-3.x, pandas, byte

Asked by ShanZhengYang

I have a Python3.x pandas DataFrame in which certain columns are strings that are expressed as bytes (like in Python2.x):

import pandas as pd
df = pd.DataFrame(...)
df
       COLUMN1         ....
0      b'abcde'        ....
1      b'dog'          ....
2      b'cat1'         ....
3      b'bird1'        ....
4      b'elephant1'    ....

When I access by column with df.COLUMN1, I see Name: COLUMN1, dtype: object

However, if I access by element, it is a "bytes" object

df.COLUMN1.iloc[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'dtype'

How do I convert these into "regular" strings? That is, how can I get rid of the b'' prefix?

Answered by EdChum

You can use the vectorised str.decode to decode byte strings into ordinary strings:

df['COLUMN1'].str.decode("utf-8")
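
Note that str.decode returns a new Series rather than modifying the DataFrame, so the result has to be assigned back. A minimal sketch, assuming a small made-up DataFrame whose values mirror the question (the construction itself is hypothetical):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's COLUMN1.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1"]})

# .str.decode returns a new Series; assign it back to replace the bytes.
df["COLUMN1"] = df["COLUMN1"].str.decode("utf-8")

print(df["COLUMN1"].tolist())  # ['abcde', 'dog', 'cat1']
```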

To do this for multiple columns, you can select just the object-dtype (string) columns:

str_df = df.select_dtypes([object])

Convert all of them:

str_df = str_df.stack().str.decode('utf-8').unstack()

You can then replace the original df columns with the converted ones:

for col in str_df:
    df[col] = str_df[col]
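
One caveat with selecting every object column: a column that already holds ordinary str values is also selected, and such values generally do not decode and can come back as missing values. The following is a hedged sketch of one way to restrict the conversion to columns made up entirely of bytes; the sample data and the name bytes_cols are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data: two bytes columns, one str column, one numeric.
df = pd.DataFrame({
    "a": [b"dog", b"cat1"],
    "b": [b"x", b"y"],
    "c": ["already", "text"],
    "n": [1, 2],
})

# Keep only object columns whose values are all bytes, so that columns
# already holding str are left untouched.
bytes_cols = [
    col for col in df.select_dtypes([object]).columns
    if df[col].map(lambda v: isinstance(v, bytes)).all()
]

# Decode each selected column and write it back into the original frame.
for col in bytes_cols:
    df[col] = df[col].str.decode("utf-8")

print(df)
```

This trades the single stack/unstack pass for a per-column loop, which keeps the non-bytes columns out of the decode step entirely.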

Answered by Yu Zhou

df['COLUMN1'].apply(lambda x: x.decode("utf-8"))
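
The one-liner above assumes every value in the column is bytes; a None or an existing str has no decode method and would raise AttributeError. A minimal sketch of a more defensive variant, where the helper to_text and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical column mixing bytes, str, and a missing value.
df = pd.DataFrame({"COLUMN1": [b"abcde", "dog", None]})

def to_text(value):
    """Decode bytes as UTF-8; pass any other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value

# apply returns a new Series, so assign it back to the column.
df["COLUMN1"] = df["COLUMN1"].apply(to_text)

print(df["COLUMN1"].tolist())
```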

Answered by Dinesh.hmn

df.columns = [x.decode("utf-8") for x in df.columns]

This decodes the column labels themselves rather than the cell values, which is a quick fix when it is the column names that are bytes.
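
As an illustration of decoding column labels, here is a small sketch assuming a made-up DataFrame whose labels are bytes; the isinstance guard is an addition so that labels which are already str are left alone:

```python
import pandas as pd

# Hypothetical frame whose column labels (not values) are bytes.
df = pd.DataFrame([[1, 2]], columns=[b"COLUMN1", b"COLUMN2"])

# Decode only labels that are bytes; leave other labels as they are.
df.columns = [
    c.decode("utf-8") if isinstance(c, bytes) else c
    for c in df.columns
]

print(list(df.columns))  # ['COLUMN1', 'COLUMN2']
```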