Access column by position or index in pandas
Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46072129/
Asked by jezrael
I have the following list, and I look up its values in a csv file to get the item code associated with each one. E.g., for 0 the item code is 11nm.
L = [0, 2]
CSV file:
0, 11nm
1, 22nm
2, 33nm
3, 44nm
I am currently doing it as follows.
df = pd.read_csv('item_code.csv', sep = ',')
item_codes= df[df["No"].isin(L)]["item_code"].tolist()
However, now I want to know how to do the same thing for a csv file when the file headings (No, item_code) are unavailable.
Please help me.
Answer by cs95
When the column names are unavailable, you can refer to the columns by position using df.iloc:
item_codes = df[df.iloc[:, 0].isin(L)].iloc[:, 1].tolist()
MCVE:
import pandas as pd
import numpy as np
import io
text = \
'''0, 11nm
1, 22nm
2, 33nm
3, 44nm'''
buf = io.StringIO(text)
df = pd.read_csv(buf, sep=r',\s*', header=None, engine='python') # raw-string regex separator, no column names
print(df)
0 1
0 0 11nm
1 1 22nm
2 2 33nm
3 3 44nm
L = [0, 2]
item_codes = df[df.iloc[:, 0].isin(L)].iloc[:, 1]
print(item_codes)
0 11nm
2 33nm
Name: 1, dtype: object
print(item_codes.tolist())
['11nm', '33nm']
Notes:
sep=',\s*' is a regex pattern (to specify column delimiters)
header=None will prevent any row from being used as the column names
engine='python' selects the parser engine that supports regex separators
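As an alternative to the regex separator described in the notes, read_csv also accepts skipinitialspace=True, which drops the whitespace that follows each comma and works with the default parser. The sketch below is only illustrative: the sample data mirrors the question's file, and the variable names (raw, clean, raw_codes, clean_codes) are assumptions, not part of the original answer.

```python
import io
import pandas as pd

text = "0, 11nm\n1, 22nm\n2, 33nm\n3, 44nm"
L = [0, 2]

# Default separator: the space after each comma stays inside the values.
raw = pd.read_csv(io.StringIO(text), header=None)

# skipinitialspace=True strips whitespace that follows the delimiter.
clean = pd.read_csv(io.StringIO(text), header=None, skipinitialspace=True)

raw_codes = raw[raw.iloc[:, 0].isin(L)].iloc[:, 1].tolist()
clean_codes = clean[clean.iloc[:, 0].isin(L)].iloc[:, 1].tolist()

print(raw_codes)    # values keep their leading space
print(clean_codes)  # values are stripped
```

Either approach should give clean item codes for this file; the regex separator is more general, while skipinitialspace only handles whitespace directly after the delimiter.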
Answer by jezrael
You can use the names parameter to specify the column names, and loc to select the column:
df = pd.read_csv('item_code.csv', names=['No','item_code'])
print (df)
No item_code
0 0 11nm
1 1 22nm
2 2 33nm
3 3 44nm
item_codes= df.loc[df["No"].isin(L), "item_code"].tolist()
print (item_codes)
['11nm', '33nm']
Or use the parameter header=None to get the default column names 0, 1, ...:
df = pd.read_csv('item_code.csv', header=None)
print (df)
0 1
0 0 11nm
1 1 22nm
2 2 33nm
3 3 44nm
#first column selected by position with iloc
item_codes= df.loc[df.iloc[:,0].isin(L), 1].tolist()
print (item_codes)
['11nm', '33nm']
#first column selected by column name
item_codes= df.loc[df[0].isin(L), 1].tolist()
print (item_codes)
['11nm', '33nm']
Answer by Mohamed Ali JAMAOUI
Read the csv file with header=None to let pandas know that the file has no header:
df = pd.read_csv('item_code.csv', sep = ',', header=None)
You can use the column index instead of the column name.
Like this:
df[df[0].isin(L)][1].tolist()
Or this:
df[df.iloc[:,0].isin(L)][1].tolist()
Explanation:
If you print the dataframe after reading it without a header, print(df) shows:
0 1
0 0 11nm
1 1 22nm
2 2 33nm
3 3 44nm
You will notice that pandas assigns the numbers [0, 1] as the column names instead of the ["No", "item_code"] labels that weren't present as a header. Thus, you can reference each column by its index, like df[0] or df.iloc[:, 0].
The latter, df.iloc[:, 0], tells pandas to take all rows and only column 0.
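The difference between the two forms matters once the labels stop being 0 and 1: df[0] looks up a column by label, while df.iloc[:, 0] looks it up by position. The following is a minimal sketch under the assumption that the file content matches the question's sample (without spaces after the commas); the variable names are illustrative only.

```python
import io
import pandas as pd

text = "0,11nm\n1,22nm\n2,33nm\n3,44nm"
L = [0, 2]

# With header=None the labels are 0 and 1, so both forms select the same column.
df = pd.read_csv(io.StringIO(text), header=None)
same = df[0].equals(df.iloc[:, 0])

# After naming the columns, the label 0 no longer exists, but position 0 still does.
named = pd.read_csv(io.StringIO(text), names=["No", "item_code"])
try:
    named[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based selection works regardless of the column labels.
item_codes = named[named.iloc[:, 0].isin(L)].iloc[:, 1].tolist()
print(same, label_lookup_failed, item_codes)
```

Position-based access with iloc is therefore the safer choice when the code should not depend on how the columns happen to be labelled.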