Python "ValueError: cannot reindex from a duplicate axis"
原文地址: http://stackoverflow.com/questions/27711623/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
"ValueError: cannot reindex from a duplicate axis"
Asked by Marzia
I have the following df:
Timestamp A B C ...
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
And I would like to make the following:
Timestamp A B C ...
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 08:23:00 NaN NaN NaN NaN
2014-11-09 08:24:00 105 NaN 1 NaN
2014-11-09 08:25:00 NaN NaN NaN NaN
2014-11-09 08:26:00 NaN NaN NaN NaN
2014-11-09 08:27:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
That is, I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN in the columns that have no values.
I started in the following ways:
df.groupby('Timestamp').sum()
and
df = df.resample('1Min', how='max')
but I obtained the following error:
ValueError: cannot reindex from a duplicate axis
How can I solve this problem? I'm just learning Python, so I have no experience at all.
Thank you!
Answered by Anzel
Assuming Timestamp is already your index, you need to resample first and then reset_index before doing the groupby. Here's a working sample:
import pandas as pd
df
A B C ...
Timestamp
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
df.resample('1Min', how='max').reset_index().groupby('Timestamp').sum()
A B C ...
Timestamp
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
2014-11-09 00:03:00 NaN NaN NaN NaN
2014-11-09 00:04:00 NaN NaN NaN NaN
...
2014-11-09 09:17:00 NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
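For readers on a recent pandas release, where resample no longer accepts the how= keyword, the same idea can be sketched with the method-style call. This is a minimal sketch, not the answerer's original code: the three-column frame below is rebuilt by hand from the question's sample, and "1min" is assumed as the frequency alias.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (only columns A, B, C are shown there).
# The index is a DatetimeIndex that contains duplicate timestamps.
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=pd.DatetimeIndex(
        [
            "2014-11-09 00:00:00",
            "2014-11-09 00:00:00",
            "2014-11-09 00:00:00",
            "2014-11-09 08:24:00",
            "2014-11-09 08:24:00",
            "2014-11-09 09:19:00",
        ],
        name="Timestamp",
    ),
)

# Aggregate per 1-minute bin: rows sharing a timestamp collapse into one row,
# max() skips NaN within a bin, and bins with no data stay NaN.
result = df.resample("1min").max()

print(result.head())
```

Because the aggregation already returns one row per minute, the index of result is unique, so this sketch leaves out the extra reset_index and groupby step.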
Hope this helps.
Updated:
As noted in the comments, your 'Timestamp' is not a datetime and is probably a string, so you cannot resample on it as a DatetimeIndex. Just reset_index and convert it, something like this:
df = df.reset_index()
df['ts'] = pd.to_datetime(df['Timestamp'])
# 'ts' is now datetime of 'Timestamp', you just need to set it to index
df = df.set_index('ts')
...
Now just run the previous code again but replace 'Timestamp' with 'ts' and you should be OK.
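Put together, the string-to-datetime route might look like the sketch below. It assumes a recent pandas release (method-style resample rather than how=) and a small hypothetical frame whose 'Timestamp' index holds plain strings. The string column is dropped after conversion so that only numeric columns are aggregated.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the 'Timestamp' index contains strings, not datetimes.
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, 105],
        "C": [3, np.nan, 1, np.nan],
    },
    index=pd.Index(
        [
            "2014-11-09 00:00:00",
            "2014-11-09 00:00:00",
            "2014-11-09 08:24:00",
            "2014-11-09 08:24:00",
        ],
        name="Timestamp",
    ),
)

# Move the string index into a column and parse it into real datetimes.
df = df.reset_index()
df["ts"] = pd.to_datetime(df["Timestamp"])

# Use the parsed column as the index; the original string column is no longer needed.
df = df.set_index("ts").drop(columns="Timestamp")

# With a DatetimeIndex in place, resampling per minute works.
result = df.resample("1min").max()

print(result.dropna(how="all"))
```

The final print keeps only the minutes that carry at least one value, which makes it easy to check that rows with the same timestamp were merged.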