Python pandas: iterating over DataFrame index with loc
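Disclaimer: this page is based on a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow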
Original URL: http://stackoverflow.com/questions/27501694/
Question by user3176500
I can't seem to find the reasoning behind the behaviour of .loc. I know it is label-based, so if I iterate over the Index object, the following minimal example should work. But it doesn't: instead of the weekday strings, it prints Timestamps. I googled, of course, but I need additional explanation from someone who has already got a grip on indexing.
import datetime
import pandas as pd
dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 15), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
for idx in df.index:
    print(df.loc[idx, 'Weekday'])
Accepted answer by unutbu
The problem is not in df.loc itself; df.loc[idx, 'Weekday'] just builds a Series from the row and returns its 'Weekday' element. The surprising behavior is due to the way pd.Series tries to cast datetime-like values to Timestamps.
df.loc[0, 'Weekday'] forms the Series
pd.Series(np.array([pd.Timestamp('2014-01-01 00:00:00'), 'WED'], dtype=object))
When pd.Series(...) is called, it tries to cast the data to an appropriate dtype.
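To see this casting in isolation, here is a small sketch (assuming a reasonably recent pandas; the variable names are only for illustration) that builds a few Series from plain Python values and prints the dtype pandas settles on. The last case mirrors the answer's example: an object-dtype NumPy array that happens to hold only Timestamps is converted to a datetime64 dtype instead of staying dtype=object.

```python
import numpy as np
import pandas as pd

# Only integers: pandas picks an integer dtype.
ints = pd.Series([1, 2, 3])

# An int mixed with a float: everything is upcast to float64.
mixed_numbers = pd.Series([1, 2.5])

# An object array that holds only Timestamps: pandas infers a datetime64 dtype
# rather than keeping dtype=object.
stamps = pd.Series(
    np.array([pd.Timestamp("2014-01-01"), pd.Timestamp("2014-01-02")], dtype=object)
)

print(ints.dtype, mixed_numbers.dtype, stamps.dtype)
```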
If you trace through the code, you'll find that it eventually arrives at these lines in pandas.core.common._possibly_infer_to_datetimelike:
sample = v[:min(3,len(v))]
inferred_type = lib.infer_dtype(sample)
These lines inspect the first few elements of the data and try to infer the dtype.
When one of the values is a pd.Timestamp, pandas checks whether all the data can be cast to Timestamps. Indeed, 'Wed' can be cast to pd.Timestamp:
In [138]: pd.Timestamp('Wed')
Out[138]: Timestamp('2014-12-17 00:00:00')
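Why does a bare weekday name parse at all? A plausible explanation, and it is an assumption here rather than something traced through the pandas 0.15 source, is that pandas handed strings it could not read as ISO 8601 to dateutil, whose parser accepts a lone weekday name and resolves it to that weekday on or after the default date (today, unless one is passed). The sketch below calls dateutil.parser.parse directly with a pinned default of Tuesday 2014-12-16, an arbitrary date chosen only to make the output reproducible; it lands on the same 2014-12-17 seen in Out[138].

```python
from datetime import datetime

from dateutil import parser

# Pin the fallback date so the result does not depend on when this runs.
# 2014-12-16 was a Tuesday; the date itself is an arbitrary choice.
default = datetime(2014, 12, 16)

# dateutil matches weekday names case-insensitively and moves forward
# to the matching weekday on or after the default date.
parsed = parser.parse("WED", default=default)
print(parsed)  # 2014-12-17 00:00:00
```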
This is the root of the problem, which results in pd.Series returning two Timestamps instead of a Timestamp and a string:
In [139]: pd.Series(np.array([pd.Timestamp('2014-01-01 00:00:00'), 'WED'], dtype=object))
Out[139]:
0 2014-01-01
1 2014-12-17
dtype: datetime64[ns]
And thus this returns
In [140]: df.loc[0, 'Weekday']
Out[140]: Timestamp('2014-12-17 00:00:00')
instead of 'WED'.
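The two ingredients described above, sampling the first few values and then trying to cast everything once a datetime shows up, can be modeled in a few lines of plain Python. This is a hypothetical, heavily simplified model written for illustration: toy_infer_datetimelike is not a pandas function, it uses dateutil as the string parser by assumption, and it is not the actual logic of _possibly_infer_to_datetimelike. It also reproduces the effect mentioned below for EdChum's workaround: a single value that cannot be parsed (here an int) makes the whole cast fail, so the strings survive.

```python
from datetime import datetime

from dateutil import parser


def toy_infer_datetimelike(values, default=None):
    """Hypothetical, simplified model of the inference heuristic.

    Returns (new_values, kind), where kind is "datetime" if every value
    was converted and "object" if the data was left untouched.
    """
    # Look at no more than the first three values, like the pandas snippet above.
    sample = values[:min(3, len(values))]
    if not any(isinstance(v, datetime) for v in sample):
        return list(values), "object"

    converted = []
    for v in values:
        if isinstance(v, datetime):
            converted.append(v)
            continue
        try:
            # Lenient parsing: a bare weekday name such as "WED" succeeds.
            converted.append(parser.parse(v, default=default))
        except (TypeError, ValueError, OverflowError):
            # One unparseable value (e.g. an int) aborts the whole cast.
            return list(values), "object"
    return converted, "datetime"


row = [datetime(2014, 1, 1), "WED"]
print(toy_infer_datetimelike(row, default=datetime(2014, 12, 16)))
print(toy_infer_datetimelike(row + [3], default=datetime(2014, 12, 16)))
```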
Alternative: select the Series df['Weekday'] first:
There are many workarounds; EdChum shows that adding a non-datelike (integer) value to the sample can prevent pd.Series from casting all the values to Timestamps.
Alternatively, you could access df['Weekday'] before using .loc:
for idx in df.index:
    print(df['Weekday'].loc[idx])
Alternative: df.loc[[idx], 'Weekday']:
Another alternative is
for idx in df.index:
    print(df.loc[[idx], 'Weekday'].item())
df.loc[[idx], 'Weekday'] first selects the DataFrame df.loc[[idx]]. For example, when idx equals 0,
In [10]: df.loc[[0]]
Out[10]:
Date Weekday
0 2014-01-01 WED
whereas df.loc[0] returns the Series:
In [11]: df.loc[0]
Out[11]:
Date 2014-01-01
Weekday 2014-12-17
Name: 0, dtype: datetime64[ns]
Series tries to cast the values to a single useful dtype, whereas DataFrames can have a different dtype for each column. So the Timestamp in the Date column does not affect the dtype of the value in the Weekday column.
So the problem was avoided by using an index selector which returns a DataFrame.
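A quick way to see the per-column dtypes at work is to compare the two selectors side by side. This sketch assumes a current pandas release, where the row-wise Series is expected to keep 'WED' as a string, so it only illustrates the structural difference (one dtype per column versus one dtype for the whole row); the three-day DataFrame is a cut-down version of the question's data.

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({"Date": pd.date_range("2014-01-01", "2014-01-03", freq="D")})
df["Weekday"] = df["Date"].apply(lambda x: dict_weekday[x.isoweekday()])

one_row_frame = df.loc[[0]]   # DataFrame: keeps a separate dtype for each column
one_row_series = df.loc[0]    # Series: a single dtype shared by Date and Weekday

print(one_row_frame.dtypes)
print(one_row_series.dtype)

# Selecting the cell out of the one-row DataFrame, as in the answer.
print(df.loc[[0], "Weekday"].item())
```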
Alternative: use integers for Weekday
Yet another alternative is to store the isoweekday integer in Weekday, and convert to strings only at the end, when you print:
import datetime
import pandas as pd
dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 15), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].dt.weekday+1 # add 1 for isoweekday
for idx in df.index:
    print(dict_weekday[df.loc[idx, 'Weekday']])
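If the labels are only needed for display, a variation on this idea (not part of the original answer) is to translate the whole column at once with Series.map, which looks each integer up in dict_weekday, and then iterate over the resulting Series of strings. A minimal sketch, assuming a pandas version that has the .dt accessor:

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({"Date": pd.date_range("2014-01-01", "2014-01-15", freq="D")})

# .dt.weekday is 0 for Monday, so add 1 to match isoweekday (Monday=1 ... Sunday=7).
df["Weekday"] = df["Date"].dt.weekday + 1

# Series.map with a dict replaces each integer with its label in one pass.
labels = df["Weekday"].map(dict_weekday)

for idx in df.index:
    print(labels.loc[idx])
```

Keeping the integers in the DataFrame and mapping only for output means the stored column never holds date-like strings in the first place.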
Alternative: use df.ix:
df.loc is a _LocIndexer, whereas df.ix is a _IXIndexer. They have different __getitem__ methods. If you step through the code (for example, using pdb), you'll find that df.ix calls df.get_value:
def __getitem__(self, key):
    if type(key) is tuple:
        try:
            values = self.obj.get_value(*key)
and the DataFrame method df.get_value succeeds in returning 'WED':
In [14]: df.get_value(0, 'Weekday')
Out[14]: 'WED'
This is why df.ix is another alternative that works here.
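df.ix and DataFrame.get_value were both removed in pandas 1.0, so this alternative no longer runs on current versions. The documented replacement for get_value is the scalar accessor .at, which looks up one value by row label and column label. A sketch of the question's loop rewritten with it, assuming a recent pandas (whether .at would also have sidestepped the conversion in 0.15 has not been checked here):

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({"Date": pd.date_range("2014-01-01", "2014-01-15", freq="D")})
df["Weekday"] = df["Date"].apply(lambda x: dict_weekday[x.isoweekday()])

weekdays = []
for idx in df.index:
    # .at[row_label, column_label] fetches a single scalar,
    # the job df.get_value(idx, 'Weekday') used to do.
    value = df.at[idx, "Weekday"]
    weekdays.append(value)
    print(value)
```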
Answer by EdChum
This seems like a bug to me. For reference, I am using Python 3.3.5 64-bit, pandas 0.15.1 and NumPy 1.9.1:
Your code shows that although the column prints as strings, the value comes back as a Timestamp:
In [56]:
df.iloc[0]['Weekday']
Out[56]:
Timestamp('2014-12-17 00:00:00')
If I do the following then it stays as a string:
In [58]:
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
df['WeekdayInt'] = df['Date'].map(lambda x: x.isoweekday())
df.iloc[0]['Weekday']
Out[58]:
'WED'
The above is odd as all I did was add a second column.
Similarly, if I first create a column to store the integer day value and then perform the apply, it also works:
In [60]:
df['WeekdayInt'] = df['Date'].map(lambda x: x.isoweekday())
df['Weekday'] = df['WeekdayInt'].apply(lambda x: dict_weekday[x])
df.iloc[0]['Weekday']
Out[60]:
'WED'
It looks like the dtype is somehow persisting or not being assigned correctly if it's the first column appended.
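Because the symptom depends on how the value is reached, one way to check a given pandas installation is to read every cell twice: once through the row (df.loc[idx][col], which builds a row Series first, much like df.iloc[0]['Weekday'] above) and once through the column (df[col].loc[idx]), and report the labels where the two disagree. row_vs_column_mismatches is a hypothetical helper written for this page, not a pandas API. On an affected version such as the 0.15.1 above it should flag the rows whose weekday string was converted, while an empty list means both paths agree.

```python
import pandas as pd


def row_vs_column_mismatches(df, col):
    """Hypothetical helper: labels where row-first and column-first access disagree."""
    mismatches = []
    for idx in df.index:
        via_row = df.loc[idx][col]     # whole row becomes a Series, then pick col
        via_column = df[col].loc[idx]  # pick the column Series, then the row
        # Compare types first so a Timestamp is never compared with a str.
        if type(via_row) is not type(via_column) or via_row != via_column:
            mismatches.append(idx)
    return mismatches


dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({"Date": pd.date_range("2014-01-01", "2014-01-15", freq="D")})
df["Weekday"] = df["Date"].apply(lambda x: dict_weekday[x.isoweekday()])
print(row_vs_column_mismatches(df, "Weekday"))
```

Note that missing values would also be reported, since NaN never compares equal to itself.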