Python pandas: iterating over DataFrame index with loc

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27501694/

Time: 2020-08-19 01:53:23  Source: igfitidea

python, pandas, indexing

Question by user3176500

I can't seem to find the reasoning behind the behaviour of .loc. I know it is label-based, so if I iterate over the Index object, the following minimal example should work. But it doesn't. I googled, of course, but I need additional explanation from someone who has already got a grip on indexing.

import datetime
import pandas as pd

# Map ISO weekday numbers (Monday=1 ... Sunday=7) to short names
dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 15), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])

# Expected: print the weekday name stored for each row label
for idx in df.index:
    print(df.loc[idx, 'Weekday'])
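To see exactly what the loop gets back, here is a Python 3 sketch that records the type of each value returned by df.loc[idx, 'Weekday'] and lists any value that is not a string. It builds the same frame as the question, but from a dict, which is a choice of this sketch rather than the asker's code. On the pandas version discussed in the answers below (around 0.15), the answers report a Timestamp coming back; on recent pandas versions the list of unexpected values should be empty.

```python
import pandas as pd

# Same mapping and dates as the question (frame built from a dict here).
dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-15', freq='D')})
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])

# Record what label-based lookup returns for every row.
values = [df.loc[idx, 'Weekday'] for idx in df.index]

# Anything that is not a str (e.g. a Timestamp) signals the surprising cast.
unexpected = [(idx, v) for idx, v in zip(df.index, values) if not isinstance(v, str)]

for idx, v in zip(df.index, values):
    print(idx, type(v).__name__, repr(v))
print('non-string values:', unexpected)
```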

Accepted answer by unutbu

The problem is not in df.loc itself; df.loc[idx, 'Weekday'] just builds a Series (the row) and returns a value from it. The surprising behavior is due to the way pd.Series tries to cast datetime-like values to Timestamps.

df.loc[0, 'Weekday']

forms the Series

pd.Series(np.array([pd.Timestamp('2014-01-01 00:00:00'), 'WED'], dtype=object))

When pd.Series(...) is called, it tries to cast the data to an appropriate dtype.
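The inference step can be observed in isolation. This sketch, written against a recent pandas version, builds one Series from Timestamps only, which pandas infers as a datetime64 dtype, and another with dtype=object passed explicitly, which stores the Timestamp and the string 'WED' unchanged. Passing dtype=object is shown here only to illustrate switching inference off; it is not something the answer itself uses.

```python
import pandas as pd

# Only Timestamps: pandas infers a datetime64 dtype for the Series.
inferred = pd.Series([pd.Timestamp('2014-01-01'), pd.Timestamp('2014-01-02')])
print(inferred.dtype)

# dtype=object is passed explicitly, so no inference happens and each
# element is stored exactly as given.
kept = pd.Series([pd.Timestamp('2014-01-01'), 'WED'], dtype=object)
print(kept.dtype)
print(type(kept.iloc[0]).__name__, repr(kept.iloc[1]))
```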

pd.Series(...)被调用时,它会尝试将数据转换为适当的 dtype。

If you trace through the code, you'll find that it eventually arrives at these lines in pandas.core.common._possibly_infer_to_datetimelike:

sample = v[:min(3,len(v))]
inferred_type = lib.infer_dtype(sample)

which inspects the first few elements of the data and tries to infer the dtype. When one of the values is a pd.Timestamp, pandas checks whether all of the data can be cast to Timestamps. Indeed, 'Wed' can be cast to a pd.Timestamp:

In [138]: pd.Timestamp('Wed')
Out[138]: Timestamp('2014-12-17 00:00:00')

This is the root of the problem, which results in pd.Series returning two Timestamps instead of a Timestamp and a string:

In [139]: pd.Series(np.array([pd.Timestamp('2014-01-01 00:00:00'), 'WED'], dtype=object))
Out[139]: 
0   2014-01-01
1   2014-12-17
dtype: datetime64[ns]

and thus this returns

In [140]: df.loc[0, 'Weekday']
Out[140]: Timestamp('2014-12-17 00:00:00')

instead of 'WED'.
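The sampling step quoted above (look at the first few values, then infer a type) can be imitated with the public pandas.api.types.infer_dtype function. The helper sample_inferred_type below is hypothetical; it reproduces only the sampling part and not the conversion to Timestamps that followed it in pandas 0.15.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def sample_inferred_type(values, n=3):
    """Hypothetical helper: infer a type label from the first n values only,
    mirroring the two pandas 0.15 lines quoted above."""
    sample = values[:min(n, len(values))]
    return infer_dtype(sample, skipna=False)


row = [pd.Timestamp('2014-01-01'), 'WED']        # the mixed row from the example
dates = [pd.Timestamp('2014-01-01'), pd.Timestamp('2014-01-02')]
names = ['MON', 'TUE', 'WED', 'THU']             # only the first 3 are inspected

print(sample_inferred_type(row))
print(sample_inferred_type(dates))   # datetime
print(sample_inferred_type(names))   # string
```

Only a uniformly datetime-like sample yields 'datetime'; the mixed Timestamp/string sample does not.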



Alternative: select the Series df['Weekday'] first:

There are many workarounds; EdChum shows that adding a non-datelike (integer) value to the sample can prevent pd.Series from casting all the values to Timestamps.

Alternatively, you could access df['Weekday'] before using .loc:

for idx in df.index:
    print(df['Weekday'].loc[idx])
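A Python 3 sketch of the same workaround that collects the names instead of printing them, assuming the frame is built as in the question (here from a dict for brevity). Because df['Weekday'] contains only strings, each label lookup happens inside that column and never passes through a row Series that mixes a Timestamp with a string.

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-15', freq='D')})
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])

# Select the one-column Series once; it holds only weekday strings.
weekday = df['Weekday']

# Label lookups now happen inside that column, not inside a mixed row.
names = [weekday.loc[idx] for idx in df.index]
print(names[:3])  # ['WED', 'THU', 'FRI']
```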


Alternative: df.loc[[idx], 'Weekday']:

Another alternative is

for idx in df.index:
    print(df.loc[[idx], 'Weekday'].item())

df.loc[[idx], 'Weekday'] first selects the DataFrame df.loc[[idx]]. For example, when idx equals 0,

In [10]: df.loc[[0]]
Out[10]: 
        Date Weekday
0 2014-01-01     WED

whereas df.loc[0] returns the Series:

In [11]: df.loc[0]
Out[11]: 
Date      2014-01-01
Weekday   2014-12-17
Name: 0, dtype: datetime64[ns]

Series tries to cast the values to a single useful dtype. DataFrames can have a different dtype for each column. So the Timestamp in the Date column does not affect the dtype of the value in the Weekday column.

So the problem was avoided by using an index selector which returns a DataFrame.
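A short sketch, assuming a recent pandas version, that makes the two-dimensional result visible: df.loc[[0]] is a one-row DataFrame in which Date and Weekday keep their own dtypes, and .item() unwraps the single element of df.loc[[0], 'Weekday'].

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-15', freq='D')})
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])

# A list of labels keeps the result two-dimensional: a one-row DataFrame.
one_row = df.loc[[0]]
print(type(one_row).__name__)  # DataFrame
print(one_row.dtypes)          # Date and Weekday keep separate dtypes

# The column selection gives a length-1 Series; .item() returns its element.
value = df.loc[[0], 'Weekday'].item()
print(repr(value))  # 'WED'
```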



Alternative: use integers for Weekday

Yet another alternative is to store the isoweekday integer in Weekday, and convert to strings only at the end when you print:

import datetime
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 15), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].dt.weekday + 1   # .dt.weekday is Monday=0, so add 1 for isoweekday

for idx in df.index:
    print(dict_weekday[df.loc[idx, 'Weekday']])
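When the names are only needed at the end, the integer-to-name conversion can also be done for the whole column at once with Series.map and the same dictionary. This is a sketch of that variation, not part of the original answer; the frame is built from a dict for brevity.

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-15', freq='D')})

# .dt.weekday is Monday=0 ... Sunday=6; add 1 to match isoweekday().
df['Weekday'] = df['Date'].dt.weekday + 1

# Convert the integers to names only at the end, for the whole column at once.
labels = df['Weekday'].map(dict_weekday)
print(labels.head(3).tolist())  # ['WED', 'THU', 'FRI']
```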


Alternative: use df.ix:

df.loc is a _LocIndexer, whereas df.ix is an _IXIndexer. They have different __getitem__ methods. If you step through the code (for example, using pdb), you'll find that df.ix calls df.get_value:

def __getitem__(self, key):
    if type(key) is tuple:
        try:
            values = self.obj.get_value(*key)

and the DataFrame method df.get_value succeeds in returning 'WED':

In [14]: df.get_value(0, 'Weekday')
Out[14]: 'WED'

This is why df.ix is another alternative that works here.
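Since this answer was written, df.ix was deprecated in pandas 0.20 and DataFrame.get_value in pandas 0.21, and both were removed in pandas 1.0. For one row label plus one column label, df.at is the documented scalar accessor. The sketch below swaps in df.at for df.ix; that substitution is an assumption of this sketch, not what the answer used.

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-15', freq='D')})
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])

# .at takes exactly one row label and one column label and returns a scalar.
for idx in df.index:
    print(df.at[idx, 'Weekday'])
```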

Answer by EdChum

This seems like a bug to me. For reference, I am using Python 3.3.5 (64-bit), pandas 0.15.1 and NumPy 1.9.1:

Running your code shows that although the column prints as strings, the value returned is a Timestamp:

In [56]:

df.iloc[0]['Weekday']
Out[56]:
Timestamp('2014-12-17 00:00:00')

If I do the following then it stays as a string:

In [58]:

df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
df['WeekdayInt'] = df['Date'].map(lambda x: x.isoweekday())
df.iloc[0]['Weekday']
Out[58]:
'WED'

The above is odd, as all I did was add a second column.

Similarly, if I first create a column to store the integer day value and then perform the apply, it also works:

In [60]:

df['WeekdayInt'] = df['Date'].map(lambda x: x.isoweekday())
df['Weekday'] = df['WeekdayInt'].apply(lambda x: dict_weekday[x])
df.iloc[0]['Weekday']
Out[60]:
'WED'

It looks like the dtype is somehow persisting or not being assigned correctly if it's the first column appended.
