
Note: This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/13236098/

Date: 2020-09-13 20:28:57  Source: igfitidea

How to load only specific columns from csv file into a DataFrame

Tags: python, pandas, csv

Asked by Ian Langmore

Suppose I have a csv file with 400 columns. I cannot load the entire file into a DataFrame (it won't fit in memory). However, I only really want 50 columns, and those will fit in memory. I don't see any built-in pandas way to do this. What do you suggest? I'm open to using the PyTables interface, or pandas.io.sql.


The best-case scenario would be a function like: pandas.read_csv(...., columns=['name', 'age',...,'income']). I.e. we pass a list of column names (or numbers) that will be loaded.


Accepted answer by Chang She

There's no default way to do this right now. I would suggest reading the file in chunks, iterating over them, and discarding the columns you don't want. So something like: pd.concat([x.ix[:, cols_to_keep] for x in pd.read_csv(..., chunksize=200)])
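
The chunking idea above can be sketched as a small runnable example. Note that `.ix` was removed in pandas 1.0, so this version uses `.loc` instead, which assumes `cols_to_keep` holds column labels (use `.iloc` for integer positions). The CSV text, the column names, and the chunk size of 2 are made up for illustration; they stand in for the large file in the question.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the large 400-column file.
csv_text = (
    "name,age,city,income\n"
    "Ann,31,Oslo,52000\n"
    "Bob,45,Lima,61000\n"
    "Cid,28,Rome,48000\n"
)

cols_to_keep = ["name", "income"]  # assumed column labels

# Read two rows at a time; each chunk is an ordinary DataFrame.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Keep only the wanted columns from each chunk, then stitch the pieces together.
df = pd.concat([chunk.loc[:, cols_to_keep] for chunk in chunks])

print(df)
```

Each chunk still parses every column before the unwanted ones are dropped, so this bounds memory by the chunk size rather than avoiding the parsing work.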


Answered by Wes McKinney

Ian, I implemented a usecols option which does exactly what you describe. It will be in the upcoming pandas 0.10; a development version will be available soon.




Since 0.10, you can use usecols like this:


df = pd.read_csv(...., usecols=['name', 'age',..., 'income'])
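
For a self-contained illustration of the call above, here is a minimal sketch that reads from an in-memory string instead of a file. The CSV contents and column names are invented for the example; `usecols` accepts either column names or zero-based integer positions, which matches the "names (or numbers)" wish in the question.

```python
import io

import pandas as pd

# Hypothetical CSV data; a real call would pass a file path instead.
csv_text = (
    "name,age,city,income\n"
    "Ann,31,Oslo,52000\n"
    "Bob,45,Lima,61000\n"
)

# Select columns by name; the remaining columns are not loaded into the DataFrame.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age", "income"])

# Select columns by zero-based position instead.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

print(by_name.columns.tolist())
print(by_position.columns.tolist())
```

The resulting columns appear in file order, regardless of the order given in `usecols`.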