pandas: skipping a specific set of columns when reading an Excel file

Question

Asked by Juan David

I know beforehand which columns I don't need from an Excel file, and I'd like to skip them when reading the file to improve performance. Something like this:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])

There is nothing related to this in the documentation. Is there any workaround for this?

Answer 1

Accepted answer by MaxU

You can use the following technique:

In [7]: cols2skip = [2,5,8]

In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

and then

df = pd.read_excel(filename, usecols=cols)
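The same idea can be wrapped in a small helper so the column count is not hard-coded. This is only a sketch: `columns_to_keep` and `read_excel_skipping` are hypothetical names, and the caller is assumed to know how many columns the sheet has.

```python
def columns_to_keep(n_columns, skip):
    """Return the 0-based positions in range(n_columns) that are not in skip."""
    skip = set(skip)  # set membership checks are cheap
    return [i for i in range(n_columns) if i not in skip]


def read_excel_skipping(path, n_columns, skip):
    """Read an Excel file, leaving out the columns at the given positions."""
    # Imported here so columns_to_keep can be used without pandas installed.
    import pandas as pd
    return pd.read_excel(path, usecols=columns_to_keep(n_columns, skip))


print(columns_to_keep(10, [2, 5, 8]))  # [0, 1, 3, 4, 6, 7, 9]
```

With these helpers, the call above would become something like read_excel_skipping(filename, 10, [2, 5, 8]).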

Answer 2

Answer by MarMat

If your version of pandas allows it (check first whether you can pass a function to usecols), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)

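This should skip every column whose header contains 'Unnamed', which is the placeholder pandas assigns to columns without a header name. To skip specific columns instead, test each header against a list of the names you do not want, for example lambda x: x not in ['col_a', 'col_b'].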
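As a rough sketch of the name-based variant, the filter can be built once from the list of unwanted headers. `skip_columns` and `read_excel_without` are hypothetical helper names, and the column names are placeholders taken from the question; this also assumes a pandas version whose read_excel accepts a callable for usecols.

```python
def skip_columns(names):
    """Build a usecols callable that rejects the given header names."""
    unwanted = set(names)
    # pandas calls this once per header and keeps the column when it returns True.
    return lambda header: header not in unwanted


def read_excel_without(path, names):
    """Read an Excel file, leaving out the columns with the given headers."""
    # Imported here so skip_columns can be tried without pandas installed.
    import pandas as pd
    return pd.read_excel(path, usecols=skip_columns(names))


keep = skip_columns(["col_a", "col_b"])
print([h for h in ["col_a", "col_c", "col_b", "col_d"] if keep(h)])  # ['col_c', 'col_d']
```

Usage would then look like read_excel_without('large_excel_file.xlsx', ['col_a', 'col_b']).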
