Select specific CSV columns (Filtering) - Python/pandas
Original question: http://stackoverflow.com/questions/22394598/
Warning: this content is a translated copy of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Asked by user3378649
I have a very large CSV file with 100 columns. To illustrate my problem, I will use a very basic example.
Let's suppose that we have a CSV file.
in  value  d    f
0   975    f01  5
1   976    F    4
2   977    d4   1
3   978    B6   0
4   979    2C   0
I want to select specific columns.
import pandas
data = pandas.read_csv("ThisFile.csv")
To select the first 2 columns, I used:
data.ix[:,:2]
What should I do to select different columns, such as the 2nd and the 4th?
Another way to solve this problem would be to rewrite the CSV file, but it is a huge file, so I want to avoid that.
Answered by unutbu
This selects the second and fourth columns (since Python uses 0-based indexing):
In [272]: df.iloc[:, [1, 3]]
Out[272]:
value f
0 975 5
1 976 4
2 977 1
3 978 0
4 979 0
[5 rows x 2 columns]
df.ix can select by location or label. df.iloc always selects by location. When indexing by location, use df.iloc to signal your intention more explicitly. It is also a bit faster, since pandas does not have to check whether your index is using labels.
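As a minimal sketch of the positional selection described above, the following rebuilds the question's sample data in memory and compares df.iloc (by position) with df.loc (by label). The comma delimiter and the use of io.StringIO in place of the real file are assumptions made for illustration. Note that df.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so df.iloc and df.loc are the supported choices today. A list of positions is passed to df.iloc here.

```python
import io

import pandas as pd

# Assumption: the sample data is comma-separated; the original file's
# delimiter is not shown in the question.
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# io.StringIO stands in for "ThisFile.csv" so the sketch runs without a file.
df = pd.read_csv(io.StringIO(csv_text))

# Select the 2nd and 4th columns by position (0-based, so 1 and 3).
by_position = df.iloc[:, [1, 3]]

# Select the same columns by label.
by_label = df.loc[:, ["value", "f"]]

# Both selections should contain the same columns and values.
print(by_position.equals(by_label))
```

Selecting by position is convenient when the column names are unknown or unwieldy; selecting by label keeps working if the column order in the file changes.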
Another possibility is to use the usecols parameter:
data = pandas.read_csv("ThisFile.csv", usecols=[1,3])
This will load only the second and fourth columns into the data DataFrame.
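The usecols approach can be tried without the real file. This is a small sketch that assumes a comma-separated copy of the question's sample data, held in memory with io.StringIO (both are assumptions for illustration, not part of the original answer).

```python
import io

import pandas as pd

# Assumption: comma-separated sample data standing in for "ThisFile.csv".
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# usecols=[1, 3] asks the parser to keep only the 2nd and 4th columns,
# so the other columns never end up in the resulting DataFrame.
data = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3])

print(list(data.columns))
print(data.shape)
```

For a file with 100 columns, this avoids building the unwanted columns at all, which is usually cheaper than loading everything and slicing afterwards.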
Answered by Wai Yip Tung
If you would rather select columns by name, you can use:
data[['value','f']]
value f
0 975 5
1 976 4
2 977 1
3 978 0
4 979 0
Answered by dasilvadaniel
As Wai Yip Tung said, you can filter your DataFrame right after reading it by specifying the column names, for example:
import pandas as pd
data = pd.read_csv("ThisFile.csv")[['value','d']]
This solved my problem.
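The snippet above reads every column and then keeps two of them. As a hedged alternative sketch, usecols also accepts column names, which restricts the columns during parsing. The comma-separated sample data and io.StringIO below are assumptions standing in for the real file.

```python
import io

import pandas as pd

# Assumption: comma-separated sample data standing in for "ThisFile.csv".
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# Read everything, then select two columns by name (as in the answer above).
selected_after = pd.read_csv(io.StringIO(csv_text))[["value", "d"]]

# Restrict the columns while parsing by passing their names to usecols.
selected_while_reading = pd.read_csv(io.StringIO(csv_text), usecols=["value", "d"])

# Both approaches should give the same DataFrame for this sample.
print(selected_while_reading.equals(selected_after))
```

One design point to keep in mind: usecols does not reorder columns, so the result follows the order in the file. If a specific order is needed, index the result with a list of names afterwards.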

