.gz file to pandas DataFrame with Hive delimiter
Original source: http://stackoverflow.com/questions/25063920/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Keith
I am getting a very odd result when I try to load my .gz data file.
My code is pretty simple:
dt = pd.read_table(gzip.open("file.gz"))
but I get a very odd delimiter. I had expected a tab ('\t'), but IPython displays it as a WHITE LEFT-POINTING TRIANGLE. Most other programs do not display it at all.
The data originally comes from Hive through paramiko; if that matters, I can give more details. Does anybody have a suggestion for how to split on such a delimiter?
EDIT:
print(gzip.open("file.gz").read()[-5])
Returns exactly this character.
And
In [28]: gzip.open("file.gz").read()[-5]
Out[28]: '\x01'
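The '\x01' shown above is the Ctrl-A (Start of Heading) control character, which Hive uses as its default field delimiter for text output. On Python 3, indexing the bytes returned by read() gives the integer 1 rather than the string '\x01'. Below is a minimal sketch of one way to confirm which control character separates the fields, using only the standard library. The sample rows and the temporary file path are invented for illustration and stand in for the real file.gz.

```python
import gzip
import os
import tempfile
from collections import Counter

# Hypothetical sample data: two rows whose fields are separated by '\x01'.
sample = b"1\x01alice\x0130\n2\x01bob\x0125\n"

# Write the sample to a temporary gzip file (stand-in for the real file.gz).
path = os.path.join(tempfile.mkdtemp(), "file.gz")
with gzip.open(path, "wb") as f:
    f.write(sample)

# Read the decompressed bytes back.
with gzip.open(path, "rb") as f:
    raw = f.read()

# Count control bytes (below 0x20), ignoring newline and carriage return.
# Iterating over bytes yields integers on Python 3.
control_counts = Counter(b for b in raw if b < 0x20 and b not in (0x0A, 0x0D))

# The most frequent control byte is the likely field delimiter.
delimiter_byte, count = control_counts.most_common(1)[0]
print(repr(chr(delimiter_byte)), count)
```

If a tab were the real delimiter, it would show up here as byte 9 instead of byte 1.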
Answered by Keith
pd.read_table("file.gz",compression='gzip',sep='\x01')
or
pd.read_table(gzip.open('file.gz'),sep='\x01')
Both will work: '\x01' (Ctrl-A) is Hive's default field delimiter for text output.
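For a self-contained check of the two approaches, here is a minimal sketch that builds a small gzip file and reads it back both ways. It swaps in pd.read_csv with an explicit sep, which parses the same way as read_table here. The sample rows, the column names, and the header=None setting (assuming the Hive output has no header row) are all assumptions for illustration.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical sample data with '\x01' between fields and no header row.
sample = "1\x01alice\x0130\n2\x01bob\x0125\n"

# Write the sample to a temporary gzip file (stand-in for the real file.gz).
path = os.path.join(tempfile.mkdtemp(), "file.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(sample)

# Column names are invented for this example.
columns = ["id", "name", "age"]

# Approach 1: let pandas decompress the file itself.
df_path = pd.read_csv(path, compression="gzip", sep="\x01",
                      header=None, names=columns)

# Approach 2: hand pandas an already-opened gzip file object (text mode).
with gzip.open(path, "rt", encoding="utf-8") as f:
    df_handle = pd.read_csv(f, sep="\x01", header=None, names=columns)

print(df_path)
```

Both calls should produce the same DataFrame. Passing the path lets pandas manage the file handle, while the gzip.open form is useful when the data arrives as an already-open stream.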

