Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/48899051/

Date: 2020-08-19 18:53:31  Source: igfitidea

How to drop a specific column of a csv file while reading it using pandas?

python, pandas, csv, dataframe

Asked by Anon George

I need to remove the column labeled 'name' at the time of loading a csv using pandas. I am reading the csv as follows and want to add parameters to this call to do so. Thanks.

pd.read_csv("sample.csv")

I know how to do this after reading the csv:

df.drop('name', axis=1)

Answered by Sociopath

If you know the column names beforehand, you can do it by setting the usecols parameter.

When you know which columns to use

Suppose you have a csv file with columns ['id','name','last_name'] and you want just ['name','last_name']. You can do it as below:

import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])

When you want the first N columns

If you don't know the column names but want the first N columns of the file, you can select them by position:

import pandas as pd
n = 2  # number of leading columns to keep (example value)
df = pd.read_csv("sample.csv", usecols=list(range(n)))
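To try positional selection without a file on disk, here is a minimal sketch that reads from an in-memory string. The sample data and the value of n are made up for illustration; they stand in for the sample.csv used above.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for sample.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Ray\n"

n = 2  # number of leading columns to keep (example value)

# Integer entries in usecols are treated as column positions
df = pd.read_csv(io.StringIO(csv_text), usecols=list(range(n)))

print(list(df.columns))
```

With this data the first two columns, id and name, are kept and last_name is never loaded.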

Edit

When you know name of the column to be dropped

import pandas as pd

# Read only the header row to get the column names
cols = list(pd.read_csv("sample_data.csv", nrows=1))
print(cols)

# Use a list comprehension to leave the unwanted column out of usecols
df = pd.read_csv("sample_data.csv", usecols=[i for i in cols if i != 'name'])
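Another possibility, assuming a pandas version in which usecols accepts a callable, is to filter by column name in a single read, so the header does not have to be read separately. This is a sketch on made-up in-memory data, not part of the original answer.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for sample_data.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Ray\n"

# The callable is evaluated against each column name;
# columns for which it returns True are kept.
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != "name")

print(list(df.columns))
```

Here the 'name' column is skipped during parsing, and the remaining columns come back in file order.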

Answered by cs95

Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.

headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name'])

Alternatively, you can do the same thing (read only the headers) very efficiently using the csv module:

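import csv
import pandas as pd

# Read only the header row with the csv module
with open("sample.csv", 'r') as f:
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()

# Keep every column except 'name'
df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))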
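The set difference above yields the column names in arbitrary order. That should not matter here, because read_csv ignores the order of the entries in usecols and returns columns in file order. A minimal sketch on made-up in-memory data to check this:

```python
import csv
import io
import pandas as pd

# Hypothetical sample data standing in for sample.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Ray\n"

# Read only the header row with the csv module
header = next(csv.reader(io.StringIO(csv_text)))

# The set difference has no guaranteed order
keep = list(set(header) - {"name"})

# usecols only selects columns; the result follows the file's column order
df = pd.read_csv(io.StringIO(csv_text), usecols=keep)

print(list(df.columns))
```

If a specific column order is needed, reindexing the result afterwards, for example df[['last_name', 'id']], is one way to get it.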

Answered by Ege

Using df = df.drop(['ID','prediction'], axis=1) did the job for me. I dropped the 'ID' and 'prediction' columns. Make sure you put them in square brackets like ['column1','column2']. There is no need for other complicated solutions.
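A minimal sketch of this drop-after-reading approach, using made-up in-memory data (the 'feature' column is hypothetical). Note that the whole file is parsed first and the unwanted columns are discarded afterwards, which is simpler but does more work than usecols on wide files.

```python
import io
import pandas as pd

# Hypothetical sample data; only the column names ID and prediction
# come from the answer above
csv_text = "ID,feature,prediction\n1,0.5,yes\n2,0.7,no\n"

df = pd.read_csv(io.StringIO(csv_text))

# drop returns a new DataFrame, so assign the result back;
# columns=[...] is equivalent to passing the list with axis=1
df = df.drop(columns=["ID", "prediction"])

print(list(df.columns))
```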
