How to convert bytes data into a Python pandas DataFrame?

Question

Asked by user7188934

I would like to convert 'bytes' data into a Pandas dataframe.

The data looks like this (first few lines):

    (b'#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT'
     b',OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n2017-01-01,1,7727,0,3815,7404,3'
     b'923,0,944,0,2123,948,296,856,238,\n2017-01-01,2,8338,0,3815,7403,3658,16,'
     b'909,0,2124,998,298,874,288,\n2017-01-01,3,7927,0,3801,7408,3925,0,864,0,2'
     b'122,998,298,816,286,\n2017-01-01,4,6996,0,3803,7407,4393,0,863,0,2122,998'

The column headers appear at the top. Each subsequent line is a timestamp followed by numbers.

Is there a straightforward way to do this?

Thank you very much

@Paula Livingstone:

This seems to work:

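import pandas as pd

s = str(bytes_data, 'utf-8')  # decode the bytes into text

# Using "with" closes (and flushes) the file before pandas reads it back
with open("data.txt", "w") as file:
    file.write(s)

df = pd.read_csv('data.txt')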

Maybe this can be done without an intermediate file.

Answer 1

Answered by Tim

I had the same issue and found this library, https://docs.python.org/2/library/stringio.html, from the answer here: How to create a Pandas DataFrame from a string

Try something like:

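from io import StringIO

import pandas as pd

s = str(bytes_data, 'utf-8')  # decode the bytes into text

data = StringIO(s)  # in-memory text buffer that behaves like a file

df = pd.read_csv(data)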
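
As a possible variation on the same idea, read_csv also accepts a binary file-like object, so the bytes can be wrapped in io.BytesIO without decoding them first. The sketch below is only an illustration: it reuses the first three complete rows from the question as sample data, and the final rename that drops the leading '#' from the first header is an optional extra that nobody in the thread asked for.

```python
from io import BytesIO

import pandas as pd

# Sample built from the first rows shown in the question. Python joins
# adjacent bytes literals into one bytes object.
bytes_data = (
    b'#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT'
    b',OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n2017-01-01,1,7727,0,3815,7404,3'
    b'923,0,944,0,2123,948,296,856,238,\n2017-01-01,2,8338,0,3815,7403,3658,16,'
    b'909,0,2124,998,298,874,288,\n2017-01-01,3,7927,0,3801,7408,3925,0,864,0,2'
    b'122,998,298,816,286,\n'
)

# BytesIO gives read_csv a file-like view of the bytes; no temp file needed.
df = pd.read_csv(BytesIO(bytes_data))

# Optional tidy-up (an assumption, not part of the original answers):
# remove the leading '#' from the first column name.
df = df.rename(columns=lambda name: name.lstrip('#'))

print(df.head())
```

Each data row ends with a trailing comma, so the last column (BIOMASS) comes out empty for these rows.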

Answer 2

Answered by Paula Livingstone

Ok cool, your input formatting is quite awkward but the following works:

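import pandas as pd

with open('file.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')  # read the file as one string, dropping real line breaks

# Strip the b'...' wrappers, rejoin the chunks, then split on the literal "\n" text left in the string
df = pd.Series("".join(data.strip(' b\'').strip('\'').split('\' b\'')).split('\\n')).str.split(',', expand=True)

print(df)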

This produces the following:

                 0                  1     2    3     4        5      6   7   \
0  #Settlement Date  Settlement Period  CCGT  OIL  COAL  NUCLEAR   WIND  PS   
1        2017-01-01                  1  7727    0  3815     7404   3923   0   
2        2017-01-01                  2  8338    0  3815     7403   3658  16   
3        2017-01-01                  3  7927    0  3801     7408   3925   0   

       8      9      10     11      12      13     14       15  
0  NPSHYD  OCGT   OTHER  INTFR  INTIRL  INTNED  INTEW  BIOMASS  
1     944      0   2123    948     296     856    238           
2     909      0   2124    998     298     874    288           
3     864      0   2122    998     298     816    286     None

In order for this to work you will need to ensure that your input file contains only a collection of complete rows. For this reason I removed the partial row for the purposes of the test.
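
If the data is held as a bytes object rather than a text file, one way to drop an incomplete final row is to cut everything after the last newline. This is just a sketch: the shortened four-column payload is made up for illustration, and it assumes the data contains at least one newline.

```python
from io import BytesIO

import pandas as pd

# Hypothetical truncated payload (shortened header): the last row is cut off.
bytes_data = (
    b'#Settlement Date,Settlement Period,CCGT,OIL\n'
    b'2017-01-01,1,7727,0\n'
    b'2017-01-01,2,8338,0\n'
    b'2017-01-01,3,79'
)

# Keep everything up to and including the last newline; the partial
# tail after it is discarded.
last_newline = bytes_data.rfind(b'\n')
complete = bytes_data[:last_newline + 1]

df = pd.read_csv(BytesIO(complete))
print(len(df))
```

Only the two complete rows are parsed; the partial third row never reaches read_csv.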

Since you have said that the data source is an HTTP GET request, the initial read would take place using pandas.read_html.

More detail on this can be found here. Note specifically the section on io (io : str or file-like).
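
Note that pandas.read_html parses HTML tables. If the GET response body is the CSV text itself, as the bytes in the question suggest, a sketch along the following lines might be closer to what is needed. The helper name read_csv_from_url and the URL in the usage comment are hypothetical, and using the standard-library urlopen for the request is an assumption, since the thread does not say how the data is fetched.

```python
from io import BytesIO
from urllib.request import urlopen

import pandas as pd


def read_csv_from_url(url: str) -> pd.DataFrame:
    """Fetch url with a GET request and parse the response body as CSV."""
    with urlopen(url) as response:
        raw = response.read()  # bytes, like bytes_data in the question
    return pd.read_csv(BytesIO(raw))


# Usage (hypothetical URL, not from the original thread):
# df = read_csv_from_url("https://example.com/generation.csv")
```

pd.read_csv can also be given a URL string directly, which may be enough when no custom request handling is required.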
