pandas: How to get the text out of b'Text' in an object-type column after using read_sas?

Question

Asked by doktr

I'm trying to read data from a SAS file in .sas7bdat format using the pandas function read_sas:

import pandas as pd
df = pd.read_sas('D:/input/houses.sas7bdat', format = 'sas7bdat')
df.head()

The df dataframe has two data types: float64 and object. I am completely satisfied with the float64 columns, since I can freely convert them to int, string, etc. The problem is with the object columns, whose values are displayed in the dataframe wrapped like this:

b'Text'

or like this:

b'12345'

instead of

Text

or

12345

I can't convert these values to string or int respectively, or to a "normal" object data type. I also can't eliminate the b'' using slicing or replace techniques, so I'm not able to use the columns with the object data type. Please tell me how I can get rid of the b''.

Answer 1

Answered by MAFiA303

Add the argument encoding="utf-8" to the read_sas call, so that the line becomes:

df = pd.read_sas('D:/input/houses.sas7bdat', format = 'sas7bdat', encoding="utf-8")
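It may help to see why slicing or replace cannot remove the b'': the prefix is not part of the data, it is only how Python displays a bytes object. The following is a minimal sketch with plain Python values (no SAS file involved; the sample values are taken from the question) of the bytes-to-str decoding step, assuming the data is valid UTF-8:

```python
raw = b"Text"                  # a bytes object, like the values in the object columns
print(raw)                     # b'Text'  (the b'' is display only)
print(len(raw))                # 4        (prefix and quotes are not stored)

text = raw.decode("utf-8")     # bytes -> str
print(text)                    # Text

# For numeric-looking values, decode first, then convert
number = int(b"12345".decode("utf-8"))
print(number)                  # 12345
```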

Answer 2

Answered by Eric

First, figure out the encoding of your SAS dataset. In SAS, run proc contents on the dataset and check the "Encoding" field. In my case, the encoding was "latin1 Western (ISO)". Then pass the matching encoding to read_sas:

df = pd.read_sas('filename', format = 'sas7bdat', encoding = 'latin-1')
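To illustrate why the encoding has to match the dataset, here is a small sketch using an invented sample value ("café" is an assumption for demonstration, not data from the question). The same bytes decode correctly as latin-1 but are not valid UTF-8:

```python
# Hypothetical sample: a latin-1 encoded value with a non-ASCII character
raw = "café".encode("latin-1")
print(raw)                       # b'caf\xe9'

# Decoding with the matching encoding recovers the text
print(raw.decode("latin-1"))     # café

# Decoding with the wrong encoding raises an error
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("decodes as utf-8:", utf8_ok)
```

Pure ASCII values such as b'Text' decode identically under both encodings, so a mismatch may only show up on rows containing accented or other non-ASCII characters.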

Answer 3

Answered by Adrien Pacifico

Using the encoding argument of pd.read_sas() gave me very large dataframes, which led to memory-related errors.

Another way to deal with the problem is to decode the byte strings into regular strings yourself, using the appropriate encoding (e.g. utf8).

Example dataframe:


import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [b"a", b"b", b"c"],
                   "C": ["a", "b", "c"]})

Transform byte strings to strings:

for col in df:
    # Inspect the first value by position to see whether the column holds bytes
    if isinstance(df[col].iloc[0], bytes):
        print(col, "will be transformed from bytestring to string")
        df[col] = df[col].str.decode("utf8")  # or any other encoding
print(df)

Output:

B will be transformed from bytestring to string
   A  B  C
0  1  a  a
1  2  b  b
2  3  c  c
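The loop above checks only the first value of each column, which can miss a bytes column whose first entry is missing. As a hedged variant, the sketch below wraps the same Series.str.decode idea in a helper that looks at the first non-null value instead. The function name decode_bytes_columns and the sample data are assumptions made for illustration; they are not part of pandas or of the original answer.

```python
import pandas as pd

def decode_bytes_columns(df, encoding="utf-8"):
    """Return a copy of df with bytes-valued object columns decoded to str.

    Hypothetical helper, not a pandas function.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # numeric columns are left untouched
        non_null = out[col].dropna()
        # Decide based on the first non-missing value of the column
        if not non_null.empty and isinstance(non_null.iloc[0], bytes):
            out[col] = out[col].str.decode(encoding)
    return out

# Invented sample data resembling what read_sas returns without encoding=
df = pd.DataFrame({"A": [1.0, 2.0, 3.0],
                   "B": [None, b"b", b"c"],
                   "D": [b"12345", b"678", b"9"]})

clean = decode_bytes_columns(df)
# Once decoded, numeric-looking text can be converted to numbers
clean["D"] = pd.to_numeric(clean["D"])
print(clean)
print(clean.dtypes)
```

The helper works on a copy, so the original dataframe keeps its bytes values until you decide to replace it.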

Useful links:

Pandas Series.str.decode() page of GeeksforGeeks (where I found my solution)
What is the difference between a string and a byte string?
