Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/49677313/

Date: 2020-09-14 05:25:37  Source: igfitidea

Skip specific set of columns when reading excel frame - pandas

Tags: python, excel, python-3.x, pandas

Asked by Juan David

I know beforehand which columns I don't need from an Excel file, and I'd like to skip them when reading the file to improve performance. Something like this:

import pandas as pd
# Desired call (hypothetical): read_excel has no skip_cols parameter
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])

I found nothing related to this in the documentation. Is there a workaround?

Accepted answer by MaxU

You can use the following technique:

In [7]: cols2skip = [2,5,8]

In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

and then pass that list of column indices to usecols:

df = pd.read_excel(filename, usecols=cols)
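
The same idea can be wrapped in a small helper. This is only a sketch: the names columns_to_keep and read_excel_skipping are made up for illustration, and it assumes the total number of columns in the sheet is known in advance, as range(10) does above.

```python
def columns_to_keep(n_columns, skip):
    """Return the positional indices in range(n_columns) that are not in `skip`."""
    skip = set(skip)  # set membership is cheap even for long skip lists
    return [i for i in range(n_columns) if i not in skip]


def read_excel_skipping(path, n_columns, skip):
    """Read `path`, leaving out the columns at the positions listed in `skip`.

    Hypothetical wrapper: n_columns is an assumption the caller must supply.
    """
    import pandas as pd  # imported lazily so columns_to_keep works without pandas
    return pd.read_excel(path, usecols=columns_to_keep(n_columns, skip))


print(columns_to_keep(10, [2, 5, 8]))  # [0, 1, 3, 4, 6, 7, 9]
```

Calling read_excel_skipping('large_excel_file.xlsx', 10, [2, 5, 8]) would then be equivalent to the usecols=cols call above.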

Answer by MarMat

If your version of pandas allows it (check first whether usecols accepts a function), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)

This should skip all columns without header names. To skip specific named columns instead, reverse the test and check each name against a list of unwanted names, for example usecols=lambda x: x not in ['col_a', 'col_b'].
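
A slightly fuller sketch of the callable approach, combining both filters. The helper names skip_columns and read_excel_without are hypothetical, and it assumes a pandas version in which read_excel accepts a callable for usecols; the callable returns True for every column that should be kept.

```python
def skip_columns(names_to_skip, skip_unnamed=True):
    """Build a predicate for usecols: True means 'keep this column'."""
    unwanted = set(names_to_skip)

    def keep(column_name):
        name = str(column_name)  # header values are not always strings
        if skip_unnamed and "Unnamed" in name:
            return False  # columns without a header get an 'Unnamed...' label
        return name not in unwanted

    return keep


def read_excel_without(path, names_to_skip):
    """Hypothetical wrapper: read `path` without the named columns."""
    import pandas as pd  # imported lazily so skip_columns works without pandas
    return pd.read_excel(path, usecols=skip_columns(names_to_skip))


keep = skip_columns(["col_a", "col_b"])
print([c for c in ["id", "col_a", "Unnamed: 3", "price", "col_b"] if keep(c)])
# ['id', 'price']
```

Unlike the index-based approach, this does not require knowing how many columns the sheet has, only the names to leave out.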