Python "ValueError: cannot reindex from a duplicate axis"
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/27711623/
Asked by Marzia
I have the following df:
Timestamp A B C ...
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
And I would like to make the following:
Timestamp A B C ...
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 08:23:00 NaN NaN NaN NaN
2014-11-09 08:24:00 105 NaN 1 NaN
2014-11-09 08:25:00 NaN NaN NaN NaN
2014-11-09 08:26:00 NaN NaN NaN NaN
2014-11-09 08:27:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
That is, I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN in every column that has no value.
I tried the following:
df.groupby('Timestamp').sum()
and
df = df.resample('1Min', how='max')
but I obtained the following error:
ValueError: cannot reindex from a duplicate axis
How can I solve this problem? I'm just learning Python, so I have no experience at all.
Thank you!
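For context, pandas raises this kind of ValueError when it is asked to reindex an axis that contains duplicate labels, which may be what happened here since several rows share one timestamp. A minimal sketch of that situation, using a made-up three-row Series (the names `s`, `minutes`, `raised` and `merged` are illustrative and not from the original post):

```python
import pandas as pd

# Two rows share the label 00:00, so the index is not unique.
s = pd.Series(
    [1, 2, 3],
    index=pd.to_datetime(
        ["2014-11-09 00:00", "2014-11-09 00:00", "2014-11-09 00:02"]
    ),
)
minutes = pd.date_range("2014-11-09 00:00", "2014-11-09 00:02", freq="1min")

try:
    # Reindexing a non-unique axis onto new labels is ambiguous.
    s.reindex(minutes)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Collapsing duplicates first leaves one row per label, so reindex works.
merged = s.groupby(level=0).max().reindex(minutes)
print(merged)
```

The exact wording of the message differs between pandas versions, so the sketch only checks the exception type.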
Answered by Anzel
Assuming that you have Timestamp as the index to begin with, you need to resample first and then call reset_index before doing the groupby. Here is a working sample:
import pandas as pd
df
A B C ...
Timestamp
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
df.resample('1Min', how='max').reset_index().groupby('Timestamp').sum()
A B C ...
Timestamp
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
2014-11-09 00:03:00 NaN NaN NaN NaN
2014-11-09 00:04:00 NaN NaN NaN NaN
...
2014-11-09 09:17:00 NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
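In newer pandas releases the `how=` keyword of `resample` was deprecated and later removed, so the one-liner above may fail as written. A minimal sketch of a method-style equivalent, assuming the sample rows from the question (only columns A, B and C are reproduced, and the variable name `result` is illustrative):

```python
import numpy as np
import pandas as pd

# Sample rows mirroring the question: several rows share one timestamp.
index = pd.to_datetime([
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 08:24:00",
    "2014-11-09 08:24:00",
    "2014-11-09 09:19:00",
])
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=index,
)
df.index.name = "Timestamp"

# Method-style replacement for resample('1Min', how='max'):
# each 1-minute bin keeps the max of its non-NaN values; empty bins stay NaN.
result = df.resample("1min").max()

print(result.loc[pd.Timestamp("2014-11-09 00:00:00")])
print(result.loc[pd.Timestamp("2014-11-09 08:24:00")])
```

In this sketch the resample aggregation already merges the rows that share a timestamp, so the extra reset_index and groupby step is left out. If you do keep a trailing `sum()`, be aware that recent pandas versions return 0 rather than NaN for all-NaN groups by default.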
Hope this helps.
Updated:
As mentioned in the comments, your 'Timestamp' is not a datetime and is probably stored as a string, so you cannot resample on a DatetimeIndex. Just call reset_index and convert it, something like this:
df = df.reset_index()
df['ts'] = pd.to_datetime(df['Timestamp'])
# 'ts' is now datetime of 'Timestamp', you just need to set it to index
df = df.set_index('ts')
...
Now run the previous code again, replacing 'Timestamp' with 'ts', and you should be OK.
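Putting the conversion and the resample together, a minimal end-to-end sketch might look like the following. It assumes a small made-up frame whose 'Timestamp' index holds strings, uses the method-style `resample(...).max()` that newer pandas expects instead of `how='max'`, and drops the original string column so that only numeric columns are aggregated (that drop is an addition for this sketch, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical input: 'Timestamp' is stored as plain strings in the index.
df = pd.DataFrame(
    {
        "A": [np.nan, 2, 105],
        "C": [3, np.nan, 1],
    },
    index=pd.Index(
        ["2014-11-09 00:00:00", "2014-11-09 00:00:00", "2014-11-09 00:03:00"],
        name="Timestamp",
    ),
)

# Same steps as in the update: move the index to a column and parse it.
df = df.reset_index()
df["ts"] = pd.to_datetime(df["Timestamp"])
# Set the parsed column as the index; drop the leftover string column.
df = df.set_index("ts").drop(columns="Timestamp")

# With a DatetimeIndex in place, resampling at 1-minute granularity works.
result = df.resample("1min").max()
print(result)
```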

