Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/24366449/

Date: 2020-08-19 04:29:10  Source: igfitidea

Python Pandas: How to skip columns when reading a file?

python pandas

Asked by jrjc

I have a table formatted as follows:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

When I open it with pandas using:

a = pd.read_table("file", header=None, sep=" ")

It tells me:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 2, saw 12

What I'd like is something similar to the skiprows option, which would let me write something like:

a = pd.read_table("file", header=None, sep=" ", skipcolumns=[8:])

I'm aware that I could reformat this table with awk, but I'd like to know whether a Pandas solution exists.

Thanks.

Answered by otus

The usecols parameter allows you to select which columns to use:

a = pd.read_table("file", header=None, sep=" ", usecols=range(8))

However, to accept irregular column counts you need to also use engine='python'.
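
As a minimal sketch of the two options combined, the snippet below reads the sample rows from the question out of an in-memory string rather than a real file. The helper name read_leading_columns and the SAMPLE constant are made up for illustration; only usecols, engine, header and sep come from the answer above.

```python
import io

import pandas as pd

# Sample rows from the question; the trailing free-text field has a
# different number of space-separated words on each line.
SAMPLE = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)


def read_leading_columns(source, n_columns=8):
    """Read only the first n_columns space-separated fields of each row.

    Hypothetical helper, not part of pandas: it just wraps read_table
    with the usecols / engine='python' combination described above.
    """
    return pd.read_table(
        source,
        header=None,
        sep=" ",
        usecols=list(range(n_columns)),  # positional column indices to keep
        engine="python",                 # as suggested for ragged rows
    )


df = read_leading_columns(io.StringIO(SAMPLE))
print(df)
```

With n_columns=8 the first word of the description is still kept as the last column; passing 7 instead would keep only the fields in front of the free text.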

Answered by Martin Konecny

If you are using Linux/OS X/Windows Cygwin, you should be able to prepare the file as follows:

cat your_file |  cut -d' ' -f1,2,3,4,5,6,7 > out.file

Then in Python:

a = pd.read_table("out.file", header=None, sep=" ")

Example:

Input:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

Output:

foo - bar - 10 2e-5 0.0
quz - baz - 4 1e-2 1

You can run this command manually on the command line, or simply call it from within Python using the subprocess module.
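
One way the subprocess call might look is sketched below, assuming a cut executable is available on the PATH (as on Linux, OS X or Cygwin). The function name read_with_cut and the temporary file are illustrative only. The sketch captures the output of cut directly instead of writing out.file, and uses -f 1-7 as shorthand for -f1,2,3,4,5,6,7.

```python
import io
import os
import subprocess
import tempfile

import pandas as pd


def read_with_cut(path, last_field=7):
    """Trim each line to its first last_field fields with cut, then parse.

    Hypothetical helper; assumes the cut utility is installed.
    """
    result = subprocess.run(
        ["cut", "-d", " ", "-f", f"1-{last_field}", path],
        capture_output=True,  # collect stdout instead of writing out.file
        text=True,
        check=True,           # raise CalledProcessError if cut fails
    )
    return pd.read_table(io.StringIO(result.stdout), header=None, sep=" ")


# Write the sample rows from the question to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("foo - bar - 10 2e-5 0.0 some information\n")
    tmp.write("quz - baz - 4 1e-2 1 some other description in here\n")

try:
    df = read_with_cut(tmp.name)
finally:
    os.unlink(tmp.name)  # clean up the temporary file

print(df)
```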
