How to convert bytes data into a python pandas dataframe?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/47379476/
Asked by user7188934
I would like to convert 'bytes' data into a Pandas dataframe.
The data looks like this (first few lines):
(b'#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT'
b',OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n2017-01-01,1,7727,0,3815,7404,3'
b'923,0,944,0,2123,948,296,856,238,\n2017-01-01,2,8338,0,3815,7403,3658,16,'
b'909,0,2124,998,298,874,288,\n2017-01-01,3,7927,0,3801,7408,3925,0,864,0,2'
b'122,998,298,816,286,\n2017-01-01,4,6996,0,3803,7407,4393,0,863,0,2122,998'
The column headers appear at the top. Each subsequent line is a timestamp followed by numbers.
Is there a straightforward way to do this?
Thank you very much
@Paula Livingstone:
This seems to work:
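import pandas as pd

s = str(bytes_data, 'utf-8')  # decode the bytes into a str
with open("data.txt", "w") as file:  # the with block closes the file before it is read back
    file.write(s)
df = pd.read_csv('data.txt')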
Maybe this can be done without using an intermediate file.
Answered by Tim
I had the same issue and found this library, https://docs.python.org/2/library/stringio.html, from the answer here: How to create a Pandas DataFrame from a string
Try something like:
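from io import StringIO
import pandas as pd

s = str(bytes_data, 'utf-8')  # decode the bytes into a str
data = StringIO(s)  # in-memory text buffer, no file on disk needed
df = pd.read_csv(data)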
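A variant of the same idea, as a minimal sketch: pandas.read_csv also accepts a binary file-like object, so the bytes can be wrapped in io.BytesIO without decoding them to a string first. The bytes_data below is a shortened stand-in made up for illustration, not the full data from the question.

```python
from io import BytesIO

import pandas as pd

# Shortened stand-in for the bytes in the question (assumed sample data).
bytes_data = (
    b"#Settlement Date,Settlement Period,CCGT,OIL,COAL\n"
    b"2017-01-01,1,7727,0,3815\n"
    b"2017-01-01,2,8338,0,3815\n"
)

# Wrap the raw bytes in an in-memory binary buffer and let read_csv
# handle the decoding (UTF-8 by default).
df = pd.read_csv(BytesIO(bytes_data))
print(df)
```

If the bytes are not UTF-8, the encoding parameter of read_csv can be set explicitly.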
Answered by Paula Livingstone
Ok cool, your input formatting is quite awkward but the following works:
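import pandas as pd

with open('file.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')  # read in file as a single string, dropping real line breaks
# strip the b'...' wrappers, then split on the literal backslash-n sequences left in the text
df = pd.Series(" ".join(data.strip(' b\'').strip('\'').split('\' b\'')).split('\\n')).str.split(',', expand=True)
print(df)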
This produces the following:
                  0                  1     2    3     4        5     6   7  \
0  #Settlement Date  Settlement Period  CCGT  OIL  COAL  NUCLEAR  WIND  PS
1        2017-01-01                  1  7727    0  3815     7404  3923   0
2        2017-01-01                  2  8338    0  3815     7403  3658  16
3        2017-01-01                  3  7927    0  3801     7408  3925   0

        8     9     10     11      12      13     14       15
0  NPSHYD  OCGT  OTHER  INTFR  INTIRL  INTNED  INTEW  BIOMASS
1     944     0   2123    948     296     856    238
2     909     0   2124    998     298     874    288
3     864     0   2122    998     298     816    286     None
In order for this to work you will need to ensure that your input file contains only a collection of complete rows. For this reason I removed the partial row for the purposes of the test.
Since you have said that the data source is an HTTP GET request, the initial read would take place using pandas.read_html.
More detail on this can be found here. Note specifically the section on io (io : str or file-like).