Pandas reindex and fill missing values: "Index must be monotonic"
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/37982170/
Asked by michael_j_ward
In answering this stackoverflow question, I found some interesting behavior when using a fill method while reindexing a dataframe.
This old bug report in pandas says that df.reindex(newIndex, method='ffill') should be equivalent to df.reindex(newIndex).ffill(), but that is NOT the behavior I'm witnessing.
Here's a code snippet that illustrates the behavior:
import pandas as pd

# The frame's index is intentionally not sorted
df = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-05-04', '2016-06-03']))
newIndex = pd.DatetimeIndex(['2016-05-04', '2016-06-01', '2016-06-02', '2016-06-03', '2016-06-05'])
print(df.reindex(newIndex).ffill())  # reindex first, then fill
print(df.reindex(newIndex, method='ffill'))  # fill as part of reindex
The first print statement works as expected. The second raises:
ValueError: index must be monotonic increasing or decreasing
What's going on here?
EDIT: Note that the sample df intentionally has a non-monotonic index. The question pertains to the order of operations in df.reindex(newIndex, method='ffill'). My expectation is that it should work the way the bug report describes: first reindex with the new index, then fill.
As you can see, newIndex.is_monotonic is True, and the fill works when called separately but fails when passed as a parameter to reindex.
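A quick way to inspect both indexes involved is sketched below. It assumes a recent pandas, where is_monotonic_increasing and is_monotonic_decreasing are used in place of the older is_monotonic attribute from the question; the variable names follow the snippet above, and new_is_sorted and old_is_sorted are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame(
    {'values': 2},
    index=pd.DatetimeIndex(['2016-06-02', '2016-05-04', '2016-06-03']),
)
newIndex = pd.DatetimeIndex(
    ['2016-05-04', '2016-06-01', '2016-06-02', '2016-06-03', '2016-06-05']
)

# Monotonicity of the target index passed to reindex
new_is_sorted = newIndex.is_monotonic_increasing

# Monotonicity of the frame's existing index (either direction counts)
old_is_sorted = (
    df.index.is_monotonic_increasing or df.index.is_monotonic_decreasing
)

print(new_is_sorted, old_is_sorted)
```

Checking the frame's own index as well as the new one helps narrow down which of the two the error message refers to.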
Answered by piRSquared
Some element of reindex requires the incoming index to be sorted. I'm deducing that when method is passed, it fails to presort the incoming index and subsequently fails. I'm drawing this conclusion based on the fact that this works:
print(df.sort_index().reindex(newIndex.sort_values(), method='ffill'))
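The workaround above can be narrowed down a little. The pandas documentation for reindex notes that the method argument only applies to objects whose index is monotonically increasing or decreasing, which suggests that the frame's own index is what needs sorting. The sketch below reuses the question's data to check this; since newIndex is already sorted, sorting only df appears to be enough. The names raised, filled_in_one_step, filled_in_two_steps and same_values are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'values': 2},
    index=pd.DatetimeIndex(['2016-06-02', '2016-05-04', '2016-06-03']),
)
newIndex = pd.DatetimeIndex(
    ['2016-05-04', '2016-06-01', '2016-06-02', '2016-06-03', '2016-06-05']
)

# Passing method= while the frame's index is unsorted raises
try:
    df.reindex(newIndex, method='ffill')
    raised = False
except ValueError:
    raised = True

# Sorting only the frame's index lets the one-step call go through
filled_in_one_step = df.sort_index().reindex(newIndex, method='ffill')

# Two-step version from the question: reindex first, then forward-fill
filled_in_two_steps = df.reindex(newIndex).ffill()

# Compare the filled values of the two approaches for this data
same_values = bool(
    (filled_in_one_step['values'] == filled_in_two_steps['values']).all()
)
print(raised, same_values)
```

For this data the two approaches agree. They are not guaranteed to agree in general, because the one-step form fills by label lookup against the sorted original index while the two-step form fills by position after reindexing.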
Answered by Eric
It seems that this needs to be done on the columns as well.
In [76]: frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
In [77]: frame.reindex(index=['a', 'b', 'c', 'd'], method='ffill', columns=states)
---> ValueError: index must be monotonic increasing or decreasing
In [78]: frame.reindex(index=['a', 'b', 'c', 'd'], method='ffill', columns=states.sort())
Out[78]:
   Ohio  Texas  California
a     0      1           2
b     0      1           2
c     3      4           5
d     6      7           8
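One caveat about the last call: if states is a plain Python list, states.sort() sorts in place and returns None, so columns=None would be passed and the columns would not be reindexed at all. That would explain why the output keeps the original column order. The sketch below assumes a hypothetical states list (the post does not show its definition) and instead sorts the frame's existing columns, so that both axes are monotonic before a fill method is applied.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=['a', 'c', 'd'],
    columns=['Ohio', 'Texas', 'California'],
)

# Hypothetical target columns; the original post does not define `states`
states = ['Texas', 'Utah', 'California']

# list.sort() sorts in place and returns None (called on a copy here)
sort_return_value = list(states).sort()

# Sort the frame's own columns so that both axes are monotonic before filling
result = frame.sort_index(axis=1).reindex(
    index=['a', 'b', 'c', 'd'],
    columns=states,
    method='ffill',
)
print(result)
```

With this approach the missing row 'b' is filled from 'a', and the missing column 'Utah' is filled from the nearest preceding label among the frame's sorted columns, which is 'Texas'. Whether filling across columns is actually wanted depends on the data; if it is not, reindexing the two axes in separate calls avoids it.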