Python pandas.read_csv from string or package data
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/20696479/
pandas.read_csv from string or package data
Asked by John Salvatier
I have some CSV text data in a package which I want to read using read_csv. I was doing this with:
from pkgutil import get_data
from StringIO import StringIO  # Python 2 only
from pandas import read_csv
data = read_csv(StringIO(get_data('package.subpackage', 'path/to/data.csv')))
However, StringIO.StringIO disappears in Python 3, and io.StringIO only accepts Unicode. Is there a simple way to do this?
Edit: the following does not appear to work:
import numpy as np
import pandas as pd
import pkgutil
from io import StringIO

def get_data_file(pkg, path):
    f = StringIO()
    contents = unicode(pkgutil.get_data('pymc.examples', 'data/wells.dat'))
    f.write(contents)
    return f

wells = get_data_file('pymc.examples', 'data/wells.dat')
data = pd.read_csv(wells, delimiter=' ', index_col='id',
                   dtype={'switch': np.int8})
failing with:
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 209, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 509, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 611, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 893, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 441, in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3940)
  File "parser.pyx", line 551, in pandas._parser.TextReader._get_header (pandas/src/parser.c:5096)
pandas._parser.CParserError: Passed header=0 but only 0 lines in file
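The "only 0 lines in file" message is consistent with the buffer being passed to read_csv while its position is still at the end: f.write(contents) advances the stream, and a reader that starts from the current position finds nothing left. A minimal sketch of that behaviour, using only the standard library and made-up sample text (not the real wells.dat):

```python
import io

buf = io.StringIO()
buf.write(u"id switch\n1 1\n")  # writing leaves the position at the end

at_end = buf.read()             # nothing is left to read from here
buf.seek(0)                     # rewind to the start of the buffer
after_rewind = buf.read()       # now the full contents come back
```

If that is indeed the cause, calling f.seek(0) before `return f` in get_data_file, or constructing the buffer directly as StringIO(contents), should let read_csv see the data.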
Accepted answer by DSM
The following worked for me in Python 3.3:
>>> import numpy as np, pandas as pd
>>> import io, pkgutil
>>> wells = pkgutil.get_data('pymc.examples', 'data/wells.dat')
>>> type(wells)
<class 'bytes'>
>>> df = pd.read_csv(io.BytesIO(wells), encoding='utf8', sep=" ", index_col="id", dtype={"switch": np.int8})
>>> df.head()
    switch  arsenic       dist  assoc  educ
id
1        1     2.36  16.826000      0     0
2        1     0.71  47.321999      0     0
3        0     2.07  20.966999      0    10
4        1     1.15  21.486000      0    12
5        1     1.10  40.874001      1    14

[5 rows x 5 columns]
N.B. I had to manually put wells.dat in that location, so I can't swear that I copied it correctly and that there isn't trailing whitespace, because I deleted some. But passing read_csv a BytesIO object and an encoding parameter should work. (Actually, you can probably get away without the encoding, but it's a good habit. io.TextIOWrapper might be another option.)
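The io.TextIOWrapper option mentioned above could look roughly like the sketch below. The `raw` bytes are a hypothetical stand-in for what pkgutil.get_data returns, not the real contents of wells.dat, and the UTF-8 encoding is an assumption about the file.

```python
import io

# Hypothetical stand-in for the bytes returned by pkgutil.get_data().
raw = b"id switch arsenic\n1 1 2.36\n2 1 0.71\n"

# Wrap the bytes in a binary buffer, then decode it as UTF-8 text on read.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")

header = text_stream.readline().split()  # peek at the column names
text_stream.seek(0)                      # rewind so the next reader starts at the top
```

The rewound text_stream is an ordinary text file object, so it can be handed to pd.read_csv(text_stream, sep=" ", index_col="id") without an encoding argument, since the decoding already happens in the wrapper.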
Answer by CONvid19
To pass a string to pandas read_csv(), you can use io.StringIO, i.e.:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("csv string..."))
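For package data, which arrives as bytes rather than str in Python 3, the same idea needs a decoding step first. A small sketch, where the helper name bytes_to_text_buffer is made up for illustration and UTF-8 is an assumed encoding:

```python
import io

def bytes_to_text_buffer(raw, encoding="utf-8"):
    """Decode raw bytes and return a StringIO positioned at the start."""
    return io.StringIO(raw.decode(encoding))

# Hypothetical sample bytes standing in for pkgutil.get_data(...) output.
buf = bytes_to_text_buffer(b"a,b\n1,2\n")
```

Because the text is passed to the StringIO constructor instead of being written afterwards, the buffer starts at position 0 and can go straight into pd.read_csv(buf).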

