How to convert bytes data into a Python pandas DataFrame?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/47379476/

Retrieved: 2020-09-14 04:47:30. Source: igfitidea

Tags: python-3.x, pandas

Asked by user7188934

I would like to convert 'bytes' data into a Pandas dataframe.

The data looks like this (first few lines):

    (b'#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT'
 b',OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n2017-01-01,1,7727,0,3815,7404,3'
 b'923,0,944,0,2123,948,296,856,238,\n2017-01-01,2,8338,0,3815,7403,3658,16,'
 b'909,0,2124,998,298,874,288,\n2017-01-01,3,7927,0,3801,7408,3925,0,864,0,2'
 b'122,998,298,816,286,\n2017-01-01,4,6996,0,3803,7407,4393,0,863,0,2122,998'

The column headers appear at the top. Each subsequent line is a timestamp followed by numbers.

Is there a straightforward way to do this?

Thank you very much

@Paula Livingstone:

This seems to work:

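import pandas as pd

s = str(bytes_data, 'utf-8')  # decode the bytes into text

# write the text to a file, closing it before it is read back
with open("data.txt", "w") as file:
    file.write(s)

df = pd.read_csv('data.txt')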

Maybe this can be done without an intermediate file.

Answered by Tim

I had the same issue and found this library, https://docs.python.org/2/library/stringio.html, from the answer here: How to create a Pandas DataFrame from a string

Try something like:

from io import StringIO
import pandas as pd

s = str(bytes_data, 'utf-8')  # decode the bytes into text
data = StringIO(s)            # wrap the text in a file-like object
df = pd.read_csv(data)
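
If the explicit decode step is not needed, one possible variant wraps the bytes directly in io.BytesIO, which pandas.read_csv also accepts as a file-like object. This is only a minimal sketch: the `bytes_data` value below is a shortened, made-up stand-in for the real payload, assumed to have the same CSV layout.

```python
from io import BytesIO

import pandas as pd

# Shortened stand-in for the real payload (assumption: same CSV layout).
bytes_data = (
    b"#Settlement Date,Settlement Period,CCGT,OIL\n"
    b"2017-01-01,1,7727,0\n"
    b"2017-01-01,2,8338,0\n"
)

# read_csv accepts a binary file-like object and decodes it (UTF-8 by default).
df = pd.read_csv(BytesIO(bytes_data))

print(df.shape)
```

Skipping the intermediate string avoids holding a second, decoded copy of the payload; for a different encoding, the `encoding` argument of read_csv can be passed instead of decoding by hand.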

Answered by Paula Livingstone

OK cool, your input formatting is quite awkward, but the following works:

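import pandas as pd

with open('file.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')  # read in the file as one string, dropping real line breaks

# strip the b'...' wrappers, rejoin the pieces, then split rows on the literal "\n" text and fields on commas
pieces = data.strip(' b\'').strip('\'').split('\' b\'')
df = pd.Series("".join(pieces).split('\\n')).str.split(',', expand=True)

print(df)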

This produces the following:

                 0                  1     2    3     4        5      6   7   \
0  #Settlement Date  Settlement Period  CCGT  OIL  COAL  NUCLEAR   WIND  PS   
1        2017-01-01                  1  7727    0  3815     7404   3923   0   
2        2017-01-01                  2  8338    0  3815     7403   3658  16   
3        2017-01-01                  3  7927    0  3801     7408   3925   0   

       8      9      10     11      12      13     14       15  
0  NPSHYD  OCGT   OTHER  INTFR  INTIRL  INTNED  INTEW  BIOMASS  
1     944      0   2123    948     296     856    238           
2     909      0   2124    998     298     874    288           
3     864      0   2122    998     298     816    286     None 

In order for this to work you will need to ensure that your input file contains only a collection of complete rows. For this reason I removed the partial row for the purposes of the test.
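
The manual removal of the partial row can also be done in code when working with the raw bytes. The following is a rough sketch, assuming every complete row ends with a newline, so anything after the last newline is an incomplete row; the `raw` sample is made up for illustration.

```python
from io import BytesIO

import pandas as pd

# Made-up sample whose last row was cut off mid-record.
raw = (
    b"#Settlement Date,Settlement Period,CCGT\n"
    b"2017-01-01,1,7727\n"
    b"2017-01-01,2,8338\n"
    b"2017-01-01,3,79"
)

# Keep everything up to and including the last newline;
# the remainder is treated as the partial row and dropped.
complete = raw[: raw.rfind(b"\n") + 1]

df = pd.read_csv(BytesIO(complete))
print(len(df))
```

Note that if the payload ends with a complete row but no trailing newline, this approach would discard that row as well, so it only fits data known to be truncated.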

As you have said that the data source is an HTTP GET request, the initial read would take place using pandas.read_html.

More detail on this can be found here. Note specifically the section on io (io : str or file-like).
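
As an illustration of the file-like idea for a GET response, here is a hedged sketch. It swaps in pandas.read_csv rather than read_html, on the assumption that the response body is comma-separated text like the sample in the question. The helper names `frame_from_body` and `fetch_frame` are hypothetical, and no real URL is implied.

```python
from io import BytesIO
from urllib.request import urlopen

import pandas as pd


def frame_from_body(body: bytes) -> pd.DataFrame:
    """Parse a CSV response body (bytes) into a DataFrame."""
    return pd.read_csv(BytesIO(body))


def fetch_frame(url: str) -> pd.DataFrame:
    """Download `url` with a GET request and parse the body as CSV."""
    with urlopen(url) as response:
        return frame_from_body(response.read())


# Offline check with a made-up body; fetch_frame needs a real URL and network access.
sample = b"#Settlement Date,Settlement Period,CCGT\n2017-01-01,1,7727\n"
print(frame_from_body(sample))
```

Keeping the parsing separate from the download makes it possible to try the parsing step on saved bytes without touching the network.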
