Selecting specific CSV columns (filtering) - Python/pandas

Question

Asked by user3378649

I have a very large CSV file with 100 columns. To illustrate my problem, I will use a very basic example.

Let's suppose we have this CSV file:

in  value   d     f
0    975   f01    5
1    976   F      4
2    977   d4     1
3    978   B6     0
4    979   2C     0
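
To try the answers below without the real file, here is a minimal sketch that loads the sample table from memory. It assumes ThisFile.csv is comma-separated with the header in,value,d,f; the CSV_TEXT string is a hypothetical stand-in for the file, not its actual contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for ThisFile.csv: the sample table above,
# written as comma-separated text so it can be read without a file on disk.
CSV_TEXT = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# io.StringIO makes the string behave like an open file for read_csv.
data = pd.read_csv(io.StringIO(CSV_TEXT))
print(data.columns.tolist())  # ['in', 'value', 'd', 'f']
```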

I want to select specific columns.

import pandas
data = pandas.read_csv("ThisFile.csv")

To select the first two columns, I used:

data.ix[:,:2]

What should I do to select non-adjacent columns, such as the 2nd and the 4th?

Another way to solve this would be to rewrite the CSV file, but it's a huge file, so I'd rather avoid that.

Answer 1

Answered by unutbu

This selects the second and fourth columns (since Python uses 0-based indexing):

In [272]: df.iloc[:, [1, 3]]
Out[272]: 
   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0

[5 rows x 2 columns]

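df.ix can select by location or by label; df.iloc always selects by location. When indexing by location, use df.iloc to signal your intention more explicitly. It is also a bit faster, since pandas does not have to check whether your index uses labels. (Note that df.ix was deprecated in pandas 0.20 and removed in pandas 1.0; in current versions, use df.iloc for positions and df.loc for labels.)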
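
A minimal sketch contrasting positional selection with .iloc and label-based selection with .loc, run against an in-memory copy of the sample data (an assumed stand-in for ThisFile.csv). Because this frame's column labels are strings, data.iloc[:, :2] picks the same first two columns the question selected with data.ix[:, :2].

```python
import io

import pandas as pd

# Assumed in-memory copy of the sample data from the question.
data = pd.read_csv(io.StringIO(
    "in,value,d,f\n0,975,f01,5\n1,976,F,4\n2,977,d4,1\n3,978,B6,0\n4,979,2C,0\n"
))

# By position: the 2nd and 4th columns (0-based positions 1 and 3).
by_position = data.iloc[:, [1, 3]]

# By label: the same two columns, named explicitly.
by_label = data.loc[:, ["value", "f"]]

# Positional equivalent of the question's data.ix[:, :2].
first_two = data.iloc[:, :2]

print(by_position.columns.tolist())  # ['value', 'f']
print(first_two.columns.tolist())    # ['in', 'value']
print(by_position.equals(by_label))  # True
```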

Another possibility is to use the usecols parameter of read_csv:

data = pandas.read_csv("ThisFile.csv", usecols=[1,3])

This loads only the second and fourth columns into the data DataFrame.
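
A short sketch of usecols, reading an in-memory string as a hypothetical stand-in for ThisFile.csv. The pandas documentation notes that element order in usecols is ignored, so the columns come back in file order; if you need a different order, select again after reading. Only the requested columns end up in the DataFrame, which the documentation describes as giving faster parsing and lower memory use.

```python
import io

import pandas as pd

# Hypothetical stand-in for ThisFile.csv.
CSV_TEXT = "in,value,d,f\n0,975,f01,5\n1,976,F,4\n2,977,d4,1\n3,978,B6,0\n4,979,2C,0\n"

# Keep only the 2nd and 4th columns (0-based positions 1 and 3).
subset = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[1, 3])

# Element order in usecols is ignored: columns still come back in file order.
reversed_request = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[3, 1])

# To control the output order, index the result after reading.
f_first = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[1, 3])[["f", "value"]]

print(subset.columns.tolist())            # ['value', 'f']
print(reversed_request.columns.tolist())  # ['value', 'f']
print(f_first.columns.tolist())           # ['f', 'value']
```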

Answer 2

Answered by Wai Yip Tung

If you would rather select columns by name, you can use:

data[['value','f']]

   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0
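
The double brackets matter here: the inner list names the columns, and indexing with a list returns a DataFrame, while indexing with a single name returns a Series. A sketch using an in-memory copy of the sample data (a hypothetical stand-in for ThisFile.csv); the wanted list is an illustrative name, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample data.
data = pd.read_csv(io.StringIO(
    "in,value,d,f\n0,975,f01,5\n1,976,F,4\n2,977,d4,1\n3,978,B6,0\n4,979,2C,0\n"
))

# A list of names (double brackets) returns a DataFrame.
subset = data[["value", "f"]]

# The list can also come from a variable, e.g. built at runtime.
wanted = ["value", "f"]
same_subset = data[wanted]

# A single name (single brackets) returns a Series.
values = data["value"]

print(type(subset).__name__)  # DataFrame
print(type(values).__name__)  # Series
```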

Answer 3

Answered by dasilvadaniel

As Wai Yip Tung said, you can select columns by name right after reading the file, for example:

import pandas as pd
data = pd.read_csv("ThisFile.csv")[['value','d']]

This solved my problem.
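
Indexing the result of read_csv this way still parses every column before discarding the ones you don't need. For a very large file, passing the same names to usecols keeps only those columns; on this data, both approaches produce the same DataFrame. A sketch using an in-memory string as a hypothetical stand-in for ThisFile.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for ThisFile.csv.
CSV_TEXT = "in,value,d,f\n0,975,f01,5\n1,976,F,4\n2,977,d4,1\n3,978,B6,0\n4,979,2C,0\n"

# Parses every column, then keeps two of them.
after_read = pd.read_csv(io.StringIO(CSV_TEXT))[["value", "d"]]

# Keeps only the two requested columns while reading.
during_read = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["value", "d"])

print(after_read.equals(during_read))  # True
```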
