Slicing multiple ranges of columns in Pandas, by list of names
Source: http://stackoverflow.com/questions/40698043/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the source address, and attribute the content to the original authors on Stack Overflow (not to this page).
Asked by Guga
I am trying to select multiple columns in a Pandas dataframe using two different approaches:
1) via the column numbers, for example, columns 1-3 and columns 6 onwards,
and
2) via a list of column names, for instance:
years = list(range(2000, 2017))
months = list(range(1, 13))
# Start with the non-date columns, then append one "year-month" label per month
years_month = ["A", "B", "C"]
for y in years:
    for m in months:
        y_m = str(y) + "-" + str(m)
        years_month.append(y_m)
Then years_month would produce a list beginning as follows (the listing shown here stops at 2001-12):
['A',
'B',
'C',
'2000-1',
'2000-2',
'2000-3',
'2000-4',
'2000-5',
'2000-6',
'2000-7',
'2000-8',
'2000-9',
'2000-10',
'2000-11',
'2000-12',
'2001-1',
'2001-2',
'2001-3',
'2001-4',
'2001-5',
'2001-6',
'2001-7',
'2001-8',
'2001-9',
'2001-10',
'2001-11',
'2001-12']
That said, what is the best (or correct) way to load only the columns whose names are in the list years_month, using each of the two approaches?
Answered by jezrael
I think you need numpy.r_ to concatenate the positions of the columns, and then iloc to select them:
print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
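As a small illustration of what numpy.r_ builds here, the sketch below evaluates the same index expression on its own. The column count of 12 is an assumption chosen to match the sample dataframe used in this answer. Note that each slice is half-open, so 1:3 yields positions 1 and 2 only.

```python
import numpy as np

n_columns = 12  # assumed column count, matching the sample dataframe

# np.r_ concatenates the slices into one integer array of positions
positions = np.r_[1:3, 6:n_columns]
print(positions.tolist())  # [1, 2, 6, 7, 8, 9, 10, 11]

# Slices exclude their stop value; to include position 3, extend the stop to 4
with_third = np.r_[1:4, 6:n_columns]
print(with_third.tolist())  # [1, 2, 3, 6, 7, 8, 9, 10, 11]
```

If "columns 1-3" in the question is meant as an inclusive or 1-based range, the slice bounds would need adjusting accordingly.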
For the second approach, subset by list:
print (df[years_month])
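One caveat worth sketching: in recent pandas versions, selecting with a list raises a KeyError if any name in the list is not a column. The example below is a minimal sketch with a made-up dataframe and a shortened years_month list (both are assumptions for illustration), keeping only the names that actually exist.

```python
import pandas as pd

# Hypothetical data: only some of the requested names exist as columns
df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [4, 5, 6],
                   "2000-1": [7, 8, 9],
                   "extra": [0, 0, 0]})
years_month = ["A", "B", "C", "2000-1", "2000-2"]

# Keep the order of years_month, dropping names that are not columns of df
present = [name for name in years_month if name in df.columns]
missing = [name for name in years_month if name not in df.columns]

subset = df[present]
print(list(subset.columns))  # ['A', 'B', '2000-1']
print(missing)               # ['C', '2000-2']
```

If the missing columns should appear anyway, filled with NaN, df.reindex(columns=years_month) is an alternative.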
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'2000-1':[1,3,5],
                   '2000-2':[5,3,6],
                   '2000-3':[7,8,9],
                   '2000-4':[1,3,5],
                   '2000-5':[5,3,6],
                   '2000-6':[7,8,9],
                   '2000-7':[1,3,5],
                   '2000-8':[5,3,6],
                   '2000-9':[7,4,3],
                   'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})
print (df)
   2000-1  2000-2  2000-3  2000-4  2000-5  2000-6  2000-7  2000-8  2000-9  A  \
0       1       5       7       1       5       7       1       5       7  1
1       3       3       8       3       3       8       3       3       4  2
2       5       6       9       5       6       9       5       6       3  3

   B  C
0  4  7
1  5  8
2  6  9
print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9
You can also add the ranges together with + (in Python 3, each range must first be cast to a list):
rng = list(range(1,3)) + list(range(6, len(df.columns)))
print (rng)
[1, 2, 6, 7, 8, 9, 10, 11]
print (df.iloc[:, rng])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9
Answered by wonce
I'm not sure what exactly you are asking, but in general DataFrame.loc allows you to select by label, and DataFrame.iloc by integer position (index).
For example, selecting columns 0, 1 and 4:
dataframe.iloc[:, [0, 1, 4]]
and selecting columns labelled 'A', 'B' and 'C':
dataframe.loc[:, ['A', 'B', 'C']]
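To connect the two selectors, here is a minimal sketch of how a range of column names could be turned into positions, so that several name-based ranges can be combined in one iloc call. The dataframe and its column names are assumptions for illustration, and the approach assumes the column labels are unique (so that get_loc returns a single integer).

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with unique column labels
names = ["2000-1", "2000-2", "2000-3", "2000-4", "A", "B", "C"]
df = pd.DataFrame({name: [1, 2, 3] for name in names})

# Label slices with loc include BOTH endpoints
by_label = df.loc[:, "2000-2":"2000-3"]

# Position slices with iloc exclude the stop position
by_position = df.iloc[:, 1:3]

# Translate labels to positions, then combine several ranges with np.r_
start = df.columns.get_loc("2000-2")
stop = df.columns.get_loc("2000-3") + 1  # +1 so the end label is included
first_letter = df.columns.get_loc("A")
combined = df.iloc[:, np.r_[start:stop, first_letter:len(df.columns)]]

print(list(combined.columns))  # ['2000-2', '2000-3', 'A', 'B', 'C']
```

The two single-range selections above return the same columns; the position-based form is only needed when more than one range has to be selected at once.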