Python: indexing a Pandas DataFrame with integer rows and named columns

Question

Asked by Hillary Sanders

Say df is a pandas DataFrame.

df.loc[] only accepts names
df.iloc[] only accepts integers (actual placements)
df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names, e.g.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]
print(df)

   1      a
2  2  three
3  3   four
4  4   five
5  5    six

df.ix[0, 'a']

throws an error; it doesn't return 'two'.

When referencing columns, df.ix prefers integers, not names, e.g.

df.ix[2, 1]

returns 'three', not 2. (Although df.ix[2, '1'] does return 2.)

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add my own new method to pandas.core.frame.DataFrame, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?

Answer 1

Answered by brunston

It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'] where # is a valid integer index and NAME is the name of the column.
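
A minimal sketch of this pattern, assuming a frame shaped like the one in the question (built here with .iloc slicing rather than .ix, since .ix is no longer available in current pandas releases). The names n, row_label, and value are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; after slicing, the row labels are 2..5.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

n = 0                            # integer position of the wanted row
row_label = df.index[n]          # translate the position into a row label
value = df.loc[row_label, 'a']   # label-based lookup on both axes

print(row_label, value)
```

The lookup happens in two steps: df.index[n] turns the position into a label, and .loc then resolves that label together with the column name.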

Answer 2

Answered by Krishna

We can reset the index and then use 0-based indexing like this:

df.reset_index(drop=True).loc[0,'a']

Edit: removed [] from the column name index 'a' so it just outputs the value.

Answer 3

Answered by prashansa agrawal

Something like df["a"][0] is working fine for me. You may try it out!
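
One caveat worth checking before relying on this: with an integer row index, df["a"][0] is looked up by label, not by position. The sketch below assumes the sliced frame from the question (row labels 2..5), where label 0 no longer exists; df["a"].iloc[0] is shown as a positional alternative. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Frame shaped like the question's: row labels are 2..5 after slicing.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

try:
    df['a'][0]                  # label lookup: there is no row labelled 0
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

first_value = df['a'].iloc[0]   # positional lookup on the selected column

print(label_lookup_failed, first_value)
```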

Answer 4

Answered by Darkonaut

For getting or setting a single value in a DataFrame by row/column labels, you are better off using DataFrame.at instead of DataFrame.loc, because:

it is faster
you are more explicit about wanting to access only a single value.

As others have already shown, if you start out with an integer position for the row, you still have to find the row label first with DataFrame.index, as DataFrame.at only accepts labels:

df.at[df.index[0], 'a']
# Out: 'three'

Benchmark:

%timeit df.at[df.index[0], 'a']
# 7.57 μs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.loc[df.index[0], 'a']
# 10.9 μs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.iloc[0, df.columns.get_loc("a")]
# 13.3 μs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For completeness:

DataFrame.iat is for accessing a single value for a row/column pair by integer position.
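
As a rough illustration of how .iat could serve the integer-row, named-column case, the sketch below converts the column name to a position with df.columns.get_loc and compares the result with the .at lookup. The frame mirrors the question's; col_pos, via_iat, and via_at are names made up for the example.

```python
import numpy as np
import pandas as pd

# Frame shaped like the question's: row labels are 2..5 after slicing.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

col_pos = df.columns.get_loc('a')   # integer position of column 'a'
via_iat = df.iat[0, col_pos]        # purely positional scalar access
via_at = df.at[df.index[0], 'a']    # purely label-based scalar access

print(via_iat, via_at)
```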

Answer 5

Answered by Ben

The existing answers seem short-sighted to me.

Problematic Solutions

df.loc[df.index[0], 'a']
The strategy here is to get the row label of the 0th row and then use .loc as normal. I see two issues.
1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
2. .loc is slower than .iloc, so you're sacrificing speed here.

df.reset_index(drop=True).loc[0, 'a']
The strategy here is to reset the index so the row labels become 0, 1, 2, ..., so .loc[0] gives the same result as .iloc[0]. Still, the problem here is runtime: .loc is slower than .iloc, and you'll incur a cost for resetting the index.
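
The repeated-label issue can be reproduced with a small sketch. The frame dup and its index values are hypothetical, chosen only so that the first row label occurs twice.

```python
import pandas as pd

# Hypothetical frame whose first row label (7) is repeated.
dup = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

# Label-based: every row labelled 7 matches, so a Series comes back.
by_label = dup.loc[dup.index[0], 'a']

# Position-based: exactly one cell is selected, so a scalar comes back.
by_position = dup.iloc[0, dup.columns.get_loc('a')]

print(type(by_label).__name__, len(by_label), by_position)
```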

Better Solution

I suggest following @Landmaster's solution in the comments.

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is the same as df.iloc[0, 0], except that we get the column index dynamically using df.columns.get_loc("a"). The multi-column generalization of this would be something like

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
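
For several columns at once, Index.get_indexer may be a more compact way to do the same name-to-position translation. The sketch below uses a hypothetical frame with columns 'a', 'b', 'c' (the question's frame has no 'b' or 'c'). Note that get_indexer returns -1 for names it cannot find, so it is worth checking the result before indexing.

```python
import pandas as pd

# Hypothetical frame with non-default row labels and three named columns.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=[10, 20, 30])

cols = ['a', 'c']
positions = df.columns.get_indexer(cols)   # integer positions of the names

# Guard against misspelled names, which get_indexer reports as -1.
if (positions == -1).any():
    raise KeyError("some requested columns are missing")

row = df.iloc[0, positions]                # first row, selected columns

print(positions.tolist(), row.tolist())
```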
