pandas: How to load only specific columns from a csv file into a DataFrame

Question

Asked by Ian Langmore

Suppose I have a csv file with 400 columns. I cannot load the entire file into a DataFrame (it won't fit in memory). However, I only really want 50 columns, and those will fit in memory. I don't see any built-in pandas way to do this. What do you suggest? I'm open to using the PyTables interface, or pandas.io.sql.

The best-case scenario would be a function like: pandas.read_csv(...., columns=['name', 'age',...,'income']). I.e. we pass a list of column names (or numbers) that will be loaded.

Answer 1

Accepted answer by Chang She

There's no default way to do this right now. I would suggest reading the file in chunks, iterating over them, and discarding the columns you don't want. So something like pd.concat([x.ix[:, cols_to_keep] for x in pd.read_csv(..., chunksize=200)])
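The chunking approach can be sketched as follows for current pandas, where the `.ix` indexer no longer exists (it was removed in pandas 1.0) and `.loc` takes its place. The inline CSV text and the column names here are hypothetical stand-ins for the real 400-column file, and the small `chunksize` is only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large csv file (assumption, not from the question).
csv_text = (
    "name,age,city,income\n"
    "Ann,34,Oslo,50000\n"
    "Bob,45,Rome,62000\n"
    "Cy,29,Lima,41000\n"
)
cols_to_keep = ["name", "income"]

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Keep only the wanted columns from each chunk, then stitch the pieces together.
df = pd.concat(
    [chunk.loc[:, cols_to_keep] for chunk in chunks],
    ignore_index=True,
)
print(df)
```

Each chunk is still parsed with all of its columns, so this limits peak memory by row count rather than skipping the unwanted columns during parsing.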

Answer 2

Answered by Wes McKinney

Ian, I implemented a usecols option which does exactly what you describe. It will be in the upcoming pandas 0.10; a development version will be available soon.

Since 0.10, you can use usecols like this:

df = pd.read_csv(...., usecols=['name', 'age',..., 'income'])
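As a minimal runnable sketch of the call above, the following reads a small in-memory CSV with `usecols` given first as column names and then as integer positions. The sample data and column names are assumptions made up for illustration; only `pd.read_csv` and its `usecols` parameter come from the answer.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file.
csv_text = (
    "name,age,city,income\n"
    "Ann,34,Oslo,50000\n"
    "Bob,45,Rome,62000\n"
)

# Select columns by name: only these columns are kept in the result.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["name", "income"])

# Select the same columns by zero-based position.
by_pos = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

print(by_name)
```

For a real file, the `io.StringIO(...)` argument would be replaced by the file path.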
