Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/48899051/
How to drop a specific column of csv file while reading it using pandas?
Asked by Anon George
I need to remove a column with the label 'name' at the time of loading a CSV using pandas. I am reading the CSV as follows and want to add parameters to this call to do so. Thanks.
pd.read_csv("sample.csv")
I know how to do this after reading the CSV:
df.drop('name', axis=1)
Answered by Sociopath
If you know the column names beforehand, you can do it by setting the usecols parameter.
When you know which columns to use
Suppose you have a CSV file with columns ['id','name','last_name'] and you want just ['name','last_name']. You can do it as below:
import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])
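As a quick check of the usecols approach above, here is a minimal sketch that reads from an in-memory CSV instead of sample.csv. The sample data and the name csv_text are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for sample.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Kim\n"

# Only the listed columns are parsed; 'id' is never loaded
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "last_name"])
print(list(df.columns))
```

If a name listed in usecols does not exist in the file, read_csv raises a ValueError, so a typo in a column name fails loudly rather than silently returning fewer columns.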
When you want the first N columns
If you don't know the column names but want the first N columns of the file, you can do it as follows:
import pandas as pd
n = 2  # example value: number of leading columns to keep
df = pd.read_csv("sample.csv", usecols=list(range(n)))
Edit
When you know the name of the column to be dropped
import pandas as pd
# Read the column names from the file (only one data row is parsed)
cols = list(pd.read_csv("sample_data.csv", nrows=1))
print(cols)
# Use a list comprehension to exclude the unwanted column from usecols
df = pd.read_csv("sample_data.csv", usecols=[i for i in cols if i != 'name'])
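Not part of the original answer: in reasonably recent pandas versions, usecols also accepts a callable, which is evaluated against each column name; columns for which it returns True are kept. Assuming that form is available in your installation, the unwanted column can be skipped in a single read, without fetching the header first. The in-memory data below is hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for sample_data.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Kim\n"

# Keep every column whose header is not 'name'
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != "name")
print(list(df.columns))
```

To skip several columns at once, the callable could test membership instead, for example lambda col: col not in {"name", "id"}.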
Answered by cs95
Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.
headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name'])
Alternatively, you can do the same thing (read only the headers) very efficiently using the csv module:
import csv
import pandas as pd
with open("sample.csv", 'r', newline='') as f:
    # Read only the first row (the header) of the file
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()
df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
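The two steps above (read the header with the csv module, then pass the remaining names to usecols) can be wrapped in a small helper. This is only a sketch: the function name read_csv_without and the temporary demo file are hypothetical, not something pandas provides.

```python
import csv
import os
import tempfile

import pandas as pd


def read_csv_without(path, unwanted):
    """Read a CSV file, skipping the columns named in `unwanted` (hypothetical helper)."""
    unwanted = set(unwanted)
    with open(path, newline="") as f:
        header = next(csv.reader(f))  # parse only the first row
    keep = [c for c in header if c not in unwanted]
    return pd.read_csv(path, usecols=keep)


# Demo on a temporary file with made-up data
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("id,name,last_name\n1,Ann,Lee\n2,Bob,Kim\n")
    df = read_csv_without(path, ["name"])

print(list(df.columns))
```

pandas ignores the order of the entries in usecols, so whether the names come from a list or a set, the resulting columns follow the order in the file.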
Answered by Ege
Using df = df.drop(['ID','prediction'], axis=1) did the job for me. I dropped the 'ID' and 'prediction' columns. Make sure you put them in square brackets like ['column1','column2'].
There is no need for other complicated solutions.
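For comparison, a minimal sketch of the drop-after-reading approach on made-up data. It uses the columns= keyword of DataFrame.drop, which is equivalent to passing axis=1. Unlike usecols, this parses the whole file first and discards the columns afterwards.

```python
import io
import pandas as pd

# Made-up data; the 'feature' column is purely illustrative
csv_text = "ID,feature,prediction\n1,0.5,1\n2,0.7,0\n"
df = pd.read_csv(io.StringIO(csv_text))

# drop returns a new DataFrame, so assign the result back
df = df.drop(columns=["ID", "prediction"])
print(list(df.columns))
```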