Original question: http://stackoverflow.com/questions/16096627/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Selecting a row of pandas series/dataframe by integer index
Question
I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.
In [26]: df.ix[2]
Out[26]:
A    1.027680
B    1.514210
C   -1.466963
D   -0.162339
Name: 2000-01-03 00:00:00
In [27]: df[2:3]
Out[27]:
                  A        B         C         D
2000-01-03  1.02768  1.51421 -1.466963 -0.162339
I would expect df[2] to work the same way as df[2:3], to be consistent with the Python indexing convention. Is there a design reason for not supporting indexing a row by a single integer?
Accepted answer by Jeff
Echoing @HYRY, see the new docs in 0.11:
http://pandas.pydata.org/pandas-docs/stable/indexing.html
Here we have new operators: .iloc, to explicitly support only integer indexing, and .loc, to explicitly support only label indexing.
For example, imagine this scenario:
In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))
In [2]: df
Out[2]:
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654
In [5]: df.iloc[[2]]
Out[5]:
          A         B
4 -0.284561  0.756029
In [6]: df.loc[[2]]
Out[6]:
          A         B
2 -0.470056  1.192211
[] slices the rows (by label location) only.
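The difference between the two operators can be reproduced with fixed values instead of random ones. This is a minimal sketch, assuming a recent pandas; the frame contents and the names by_position and by_label are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same index as the scenario above: labels 0, 2, 4, 6, 8.
df = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    index=range(0, 10, 2),
    columns=list("AB"),
)

by_position = df.iloc[[2]]  # third row by integer position; its label is 4
by_label = df.loc[[2]]      # row whose label is 2; it is the second row

print(by_position)
print(by_label)
```

Passing a list ([2]) keeps the result a one-row DataFrame; passing a bare integer (df.iloc[2]) would return that row as a Series instead.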
Answer by HYRY
You can think of a DataFrame as a dict of Series. df[key] tries to select the column labeled key and returns a Series object.
However, slicing inside [] slices the rows, because it's a very common operation.
You can read the documentation for details:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
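A small sketch of the two behaviors of [] described in this answer, using a made-up two-column frame (the column names and values are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

col = df["A"]   # a key selects a column and returns a Series
rows = df[0:2]  # a slice selects rows and returns a DataFrame

print(type(col).__name__)
print(rows)
```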
Answer by waitingkuo
You can take a look at the source code.
DataFrame has a private function _slice() to slice the DataFrame, and its axis parameter determines which axis to slice. The __getitem__() for DataFrame doesn't set the axis when invoking _slice(), so _slice() slices along the default axis, 0.
You can run a simple experiment that might help:
print(df._slice(slice(0, 2)))      # default axis 0: first two rows
print(df._slice(slice(0, 2), 0))   # explicit axis 0: same result
print(df._slice(slice(0, 2), 1))   # axis 1: first two columns
Answer by user1401491
You can loop through the data frame like this:
for ad in range(len(dataframe_c)):  # one pass per row; .size counts every cell, not rows
    print(dataframe_c.values[ad])
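As a sketch of the same row-by-row loop written with pandas accessors, the example below uses .iloc for positional access and itertuples() for plain iteration. The frame contents and the names pos, rows_seen and record are hypothetical.

```python
import pandas as pd

dataframe_c = pd.DataFrame({"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]})

# Positional access: one Series per row.
rows_seen = []
for pos in range(len(dataframe_c)):
    row = dataframe_c.iloc[pos]
    rows_seen.append((row["x"], row["y"]))

# itertuples() yields one namedtuple per row without building a Series each time.
for record in dataframe_c.itertuples(index=False):
    print(record.x, record.y)
```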
Answer by Pavel Prochazka
For index-based access to a pandas table, one can also convert the table to a NumPy array, for example with numpy.asarray(df) or
np_df = df.to_numpy()  # originally df.as_matrix(), which was removed in pandas 1.0
and then
np_df[i]
would work.
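A minimal sketch of the array route, assuming pandas 0.24 or later (where DataFrame.to_numpy() is available); the frame contents and the value of i are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

np_df = df.to_numpy()  # plain 2-D ndarray; row and column labels are dropped
i = 1
row = np_df[i]         # second row as a 1-D array

print(row)
```

Since the labels are lost in the conversion, this only suits cases where purely positional access is enough.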
Answer by Ted Petrou
The primary purpose of the DataFrame indexing operator, [], is to select columns.
When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.
So, in the question above, df[2] searches for a column name matching the integer value 2. This column does not exist, so a KeyError is raised.
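The column lookup can be checked directly. In this sketch the frames and the names raised, df_int_cols and third_column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Columns are labeled A-D, so there is no column named 2.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

try:
    df[2]
    raised = False
except KeyError:
    raised = True

# With default integer column labels, the same expression selects a column.
df_int_cols = pd.DataFrame(np.arange(12).reshape(3, 4))
third_column = df_int_cols[2]

print(raised)
print(third_column)
```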
The DataFrame indexing operator completely changes behavior to select rows when slice notation is used
Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.
df[2:3]
This slices from the row at integer location 2 up to, but not including, integer location 3, so it returns just a single row. The following selects every third row beginning at integer location 6 up to, but not including, 20.
df[6:20:3]
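Both slices can be tried on a small frame. This sketch assumes a 25-row frame with string labels r0 to r24 (made up for illustration), so that the integers in the slice can only be read as positions.

```python
import pandas as pd

df = pd.DataFrame({"value": range(25)}, index=[f"r{i}" for i in range(25)])

single = df[2:3]      # just the row at position 2
stepped = df[6:20:3]  # positions 6, 9, 12, 15, 18

print(single)
print(stepped)
```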
You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.
I almost never use this slice notation with the indexing operator, as it's not explicit and is hardly ever used. When slicing by rows, stick with .loc/.iloc.
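A short sketch of the explicit alternatives on a frame with string labels (the names and scores are made up for illustration). One difference worth knowing: a .loc label slice includes its end label, while an .iloc position slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [90, 85, 70, 60]},
    index=["ann", "bob", "cat", "dan"],
)

by_position = df.iloc[1:3]      # positions 1 and 2; the end position is excluded
by_label = df.loc["bob":"cat"]  # labels bob through cat; the end label is included

print(by_position)
print(by_label)
```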

