Python: indexing Pandas data frames by integer rows and named columns

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28754603/

Date: 2020-08-19 03:41:20  Source: igfitidea

Indexing Pandas data frames: integer rows, named columns

Tags: python, pandas, dataframe

Asked by Hillary Sanders

Say df is a pandas DataFrame.

  • df.loc[] only accepts names
  • df.iloc[] only accepts integers (actual placements)
  • df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names, e.g.

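import numpy as np
import pandas as pd

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]  # .ix was removed in pandas 1.0; df.loc[2:6] selects the same rows
print(df)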

   1      a
2  2  three
3  3   four
4  4   five
5  5    six

df.ix[0, 'a']

throws an error; it does not return 'three', the value in the first row.

When referencing columns, ix prefers integers, not names, e.g.

df.ix[2, 1]

returns 'three', not 2. (Although df.ix[2, '1'] does return 2.)
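
For readers on current pandas, where .ix no longer exists (it was deprecated in 0.20 and removed in 1.0), the following is a minimal sketch of the same situation using only .loc and .iloc. The column order is fixed explicitly so that 'a' is the first column; that choice is an assumption made for the illustration.

```python
import numpy as np
import pandas as pd

# Same data as the question; columns= pins the column order for the example.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)},
                  columns=['a', '1'])
df = df.loc[2:5]               # label-based slice; the end label is included

by_label = df.loc[2, 'a']      # row label 2, column label 'a'
by_position = df.iloc[0, 0]    # first row, first column (column 'a' here)

try:
    df.loc[0, 'a']             # there is no row labelled 0 any more
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_label, by_position, label_zero_exists)
```

Both lookups reach the same cell, but only .iloc is unaffected by the row labels left over after slicing.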

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add my own new method to pandas.core.frame.DataFrame, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?

Answered by brunston

It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'] where # is a valid integer index and NAME is the name of the column.
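
As a small illustration of this pattern, here is a sketch on a hypothetical frame whose row labels start at 2 rather than 0. The frame and the variable name n are made up for the example.

```python
import pandas as pd

# Hypothetical frame whose row labels do not start at 0.
df = pd.DataFrame({'a': ['three', 'four', 'five', 'six']}, index=[2, 3, 4, 5])

n = 0                               # integer position of the row
row_label = df.index[n]             # the label stored at position n (2 here)
value = df.loc[row_label, 'a']      # ordinary label-based lookup
print(value)
```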

Answered by Krishna

We can reset the index and then use 0-based indexing like this:

df.reset_index(drop=True).loc[0,'a']

Edit: removed [] from the column name index 'a' so it just outputs the value.

Answered by prashansa agrawal

Something like df["a"][0] works fine for me. You may try it out!
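
One caveat may be worth checking before relying on this: with an integer row index, df["a"][0] looks up the row label 0, not the first row. The sketch below uses a hypothetical frame with row labels 2 to 5 to show the difference, together with a positional alternative on the column.

```python
import pandas as pd

# Hypothetical frame with integer row labels that do not include 0.
df = pd.DataFrame({'a': ['three', 'four', 'five', 'six']}, index=[2, 3, 4, 5])

try:
    df['a'][0]                  # label lookup: there is no row labelled 0
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

first = df['a'].iloc[0]         # positional lookup on the column
print(label_lookup_failed, first)
```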

Answered by Darkonaut

For getting or setting a single value in a DataFrame by row/column labels, you are better off using DataFrame.at instead of DataFrame.loc, because:

  1. it is faster
  2. you are more explicit about wanting to access only a single value.

As others have already shown, if you start out with an integer position for the row, you still have to find the row label first with DataFrame.index, as DataFrame.at only accepts labels:

df.at[df.index[0], 'a']
# Out: 'three'

Benchmark:

%timeit df.at[df.index[0], 'a']
# 7.57 μs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.loc[df.index[0], 'a']
# 10.9 μs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.iloc[0, df.columns.get_loc("a")]
# 13.3 μs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


For completeness:

DataFrame.iat is for accessing a single value for a row/column pair by integer position.
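
A minimal sketch of how DataFrame.iat could serve the integer-row, named-column case: the column name is translated to a position with df.columns.get_loc first. The frame below is hypothetical and assumes unique column names.

```python
import pandas as pd

# Hypothetical frame; column names are assumed to be unique.
df = pd.DataFrame({'a': ['three', 'four', 'five', 'six'], 'b': [2, 3, 4, 5]},
                  index=[2, 3, 4, 5])

col = df.columns.get_loc('a')   # integer position of column 'a'
value = df.iat[0, col]          # read the cell at row position 0
df.iat[0, col] = 'THREE'        # .iat can also set a single cell
print(value, df.iat[0, col])
```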

Answered by Ben

The existing answers seem short-sighted to me.

Problematic Solutions

  1. df.loc[df.index[0], 'a']
    The strategy here is to get the row label of the 0th row and then use .loc as normal. I see two issues.

    1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
    2. .loc is slower than .iloc, so you're sacrificing speed here.
  2. df.reset_index(drop=True).loc[0,'a']
    The strategy here is to reset the index so the row labels become 0, 1, 2, ..., so that .loc[0] gives the same result as .iloc[0]. Still, the problem here is runtime, as .loc is slower than .iloc and you'll incur a cost for resetting the index.
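
The repeated-label issue can be reproduced with a small sketch. The frame below is hypothetical and deliberately uses the row label 7 twice.

```python
import pandas as pd

# Hypothetical frame in which the row label 7 appears twice.
df = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

via_label = df.loc[df.index[0], 'a']                  # matches both rows labelled 7
via_position = df.iloc[0, df.columns.get_loc('a')]    # always exactly one cell

print(type(via_label).__name__, len(via_label))
print(via_position)
```

The label-based lookup hands back a Series with two entries, while the positional lookup returns the single value from the first row.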

Better Solution

I suggest following @Landmaster's solution in the comments.

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is the same as df.iloc[0, 0], except we get the column index dynamically using df.columns.get_loc("a"). The multi-column generalization of this would be something like

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]

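
The question also asked whether a helper such as df.idx could be added to DataFrame. One possible sketch, not something proposed in the answers above, wraps the iloc plus get_loc approach in a custom accessor registered through pandas.api.extensions.register_dataframe_accessor. The accessor name idx, the class name, and the bracket syntax df.idx[rows, cols] are choices made for this illustration, and unique column names are assumed.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("idx")  # "idx" is a made-up name
class PositionalRowAccessor:
    """Hypothetical accessor: integer row positions, named columns."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, key):
        rows, cols = key
        columns = self._frame.columns
        if isinstance(cols, (list, tuple)):
            # Several column names: translate each one to its position.
            col_pos = [columns.get_loc(c) for c in cols]
        else:
            # A single column name (assumes column names are unique).
            col_pos = columns.get_loc(cols)
        return self._frame.iloc[rows, col_pos]


df = pd.DataFrame({'a': ['three', 'four'], 'b': [2, 3]}, index=[2, 3])
single = df.idx[0, 'a']
several = df.idx[0, ['a', 'b']]
print(single)
print(list(several))
```

Because get_loc raises KeyError for an unknown column name, a misspelled column fails loudly instead of silently selecting the wrong column.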