Importing text file: No columns to parse from file
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/40193452/
Asked by mezz
I am trying to read input from sys.stdin. This is a map-reduce program for Hadoop. The input is a plain-text file. Preview of the data set:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457
303 785 3 879485318
122 387 5 879270459
194 274 2 879539794
291 1042 4 874834944
The code I have been trying:
import sys
import pandas as pd
df = pd.read_csv(sys.stdin, error_bad_lines=False)
I have also tried delimiter='\t', header=False, and defining column names.
Nothing seems to work; this is the error I get:
[root@sandbox lab]# cat /root/lab/u.data | python /root/lab/mid-1-mapper.py |python /root/lab/mid-1-reducer.py
Traceback (most recent call last):
File "/root/lab/mid-1-reducer.py", line 8, in <module>
df = pd.read_csv(sys.stdin,delimiter='\t')
File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 645, in parser_f
return _read(filepath_or_buffer, kwds)
File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 388, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 729, in __init__
self._make_engine(self.engine)
File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 922, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 1389, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 538, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5896)
pandas.io.common.EmptyDataError: No columns to parse from file
However, when I try this directly in Python (not in Hadoop), it works fine.
I have looked through StackOverflow posts; one of them suggested using try and except. Applying that leaves me with an empty file. Can anybody help? Thanks
Accepted answer by DerWeh
Using try and except just lets you continue in spite of errors and handle them. It won't magically fix your errors.
read_csv expects CSV files, which your input is obviously not. A quick look into the documentation:
delim_whitespace : boolean, default False
Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
This seems like the right argument. Use
pandas.read_csv(filepath_or_buffer, delim_whitespace=True).
Using delimiter='\t' should also work, unless the tabs are expanded (replaced by spaces). As we can't really tell, delim_whitespace seems to be the better option.
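As a minimal sketch of why the whitespace option is the safer choice, the snippet below parses the first rows of the preview from an in-memory buffer instead of sys.stdin. The sample string and the column names (user_id, item_id, rating, timestamp) are assumptions made for illustration. It uses sep=r"\s+", the regex form that the documentation quoted above describes as equivalent to delim_whitespace=True.

```python
import io

import pandas as pd

# Sample rows from the preview, separated by single spaces.
# Assumption: the real file may use tabs instead.
SAMPLE = "196 242 3 881250949\n186 302 3 891717742\n22 377 1 878887116\n"

# Hypothetical column names, chosen only for this example.
COLUMNS = ["user_id", "item_id", "rating", "timestamp"]


def parse_whitespace(text):
    """Parse whitespace-separated text that has no header row."""
    return pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, names=COLUMNS)


df = parse_whitespace(SAMPLE)

# The same call handles tab-separated input, because \s+ also matches tabs.
df_tabs = parse_whitespace(SAMPLE.replace(" ", "\t"))

print(df)
```

In the actual script, the io.StringIO buffer would be replaced by sys.stdin.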
If this doesn't help, just print out the contents of sys.stdin to check whether the text is passed in properly.
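One way to do that check, sketched under the assumption that diagnostics should go to stderr so they do not mix with the script's real output. The helper name describe_input is hypothetical. Printing the first line with %r makes tabs visible as \t, which also shows whether the separator is a tab or spaces.

```python
import io
import sys


def describe_input(stream, log=sys.stderr):
    """Read the whole stream, report what arrived, and return the text."""
    text = stream.read()
    lines = text.splitlines()
    log.write("received %d line(s)\n" % len(lines))
    if lines:
        # %r shows tabs as \t, so the separator is easy to identify.
        log.write("first line: %r\n" % lines[0])
    return text


# Demonstration with in-memory buffers standing in for sys.stdin.
demo = describe_input(io.StringIO("196\t242\t3\t881250949\n"))
empty = describe_input(io.StringIO(""))
```

In the reducer itself the call would be describe_input(sys.stdin); a report of zero lines means nothing reached the script.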
Edit: I just saw that you use
cat /root/lab/u.data | python /root/lab/mid-1-mapper.py |python /root/lab/mid-1-reducer.py
Is this intended? This way, mid-1-reducer.py processes the output of mid-1-mapper.py. If you want to process the content of the file u.data, consider reading the file and not sys.stdin.
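In that pipeline, the reducer's stdin is whatever the mapper writes to its stdout, so a mapper that prints nothing leaves the reducer with empty input, and pandas raises EmptyDataError when it is given an empty buffer. The contents of mid-1-mapper.py are not shown in the question, so the following is only a hypothetical mapper, assuming four whitespace-separated fields per record and an item_id/rating output that is invented for illustration.

```python
import io


def run_mapper(lines, out):
    """Hypothetical mapper: emit item_id<TAB>rating for each well-formed record."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed records instead of failing
        user_id, item_id, rating, timestamp = fields
        out.write("%s\t%s\n" % (item_id, rating))


# In-memory stand-ins for the mapper's stdin and stdout.
source = io.StringIO("196 242 3 881250949\n186 302 3 891717742\n")
mapped = io.StringIO()
run_mapper(source, mapped)

# This text is what the reducer would receive on its stdin.
mapper_output = mapped.getvalue()
```

In a real mapper script the call would be run_mapper(sys.stdin, sys.stdout), and the reducer would then need to parse two tab-separated columns rather than the original four.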
Answer by Grainier
You have to set delim_whitespace to True to use whitespace as the separator.
import sys
import pandas as pd

if __name__ == '__main__':
    # header=None: the data has no header row
    df = pd.read_csv(sys.stdin, header=None, delim_whitespace=True)
    print(df)