Python "ValueError: cannot reindex from a duplicate axis"

Question

Asked by Marzia

I have the following df:

Timestamp                            A      B      C     ...     
2014-11-09 00:00:00                     NaN     1      NaN   NaN      
2014-11-09 00:00:00                      2     NaN     NaN   NaN             
2014-11-09 00:00:00                     NaN    NaN     3     NaN   
2014-11-09 08:24:00                     NaN    NaN     1     NaN         
2014-11-09 08:24:00                     105    NaN     NaN   NaN           
2014-11-09 09:19:00                     NaN    NaN     23    NaN

And I would like to make the following:

Timestamp                            A      B      C     ...     
2014-11-09 00:00:00                  2      1      3     NaN      
2014-11-09 00:01:00                  NaN    NaN    NaN   NaN
2014-11-09 00:02:00                  NaN    NaN    NaN   NaN
...                                  NaN    NaN    NaN   NaN
2014-11-09 08:23:00                  NaN    NaN    NaN   NaN
2014-11-09 08:24:00                  105    NaN     1    NaN         
2014-11-09 08:25:00                  NaN    NaN     NaN  NaN     
2014-11-09 08:26:00                  NaN    NaN     NaN  NaN
2014-11-09 08:27:00                  NaN    NaN     NaN  NaN      
...                                  NaN    NaN     NaN  NaN      
2014-11-09 09:18:00                  NaN    NaN     NaN  NaN  
2014-11-09 09:19:00                  NaN    NaN     23   NaN

That is, I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN in any column that has no value.

I started in the following ways:

df.groupby('Timestamp').sum()

and

df = df.resample('1Min', how='max')

but I obtained the following error:

ValueError: cannot reindex from a duplicate axis

How can I solve this problem? I'm just learning Python so I don't have experience at all.

Thank you!

Answer 1

Answered by Anzel

Assuming Timestamp is already your index, you need to resample first and then call reset_index before doing the groupby. Here's a working sample:

import pandas as pd

df
                       A   B   C  ...
Timestamp                            
2014-11-09 00:00:00  NaN   1 NaN  NaN
2014-11-09 00:00:00    2 NaN NaN  NaN
2014-11-09 00:00:00  NaN NaN   3  NaN
2014-11-09 08:24:00  NaN NaN   1  NaN
2014-11-09 08:24:00  105 NaN NaN  NaN
2014-11-09 09:19:00  NaN NaN  23  NaN

df.resample('1Min', how='max').reset_index().groupby('Timestamp').sum()

                      A   B   C  ...
Timestamp                           
2014-11-09 00:00:00   2   1   3  NaN
2014-11-09 00:01:00 NaN NaN NaN  NaN
2014-11-09 00:02:00 NaN NaN NaN  NaN
2014-11-09 00:03:00 NaN NaN NaN  NaN
2014-11-09 00:04:00 NaN NaN NaN  NaN
...
2014-11-09 09:17:00 NaN NaN NaN  NaN
2014-11-09 09:18:00 NaN NaN NaN  NaN
2014-11-09 09:19:00 NaN NaN  23  NaN
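
For readers on a recent pandas release, where the how= argument to resample is no longer accepted, the same idea can be sketched by calling the aggregation as a method instead. This is only an illustration under stated assumptions: the sample frame below is hypothetical (three columns rebuilt from the rows shown above), and it relies on resample(...).max() alone to collapse the duplicate timestamps, so the extra reset_index/groupby step is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the rows shown above (three columns only).
index = pd.DatetimeIndex(
    [
        "2014-11-09 00:00:00", "2014-11-09 00:00:00", "2014-11-09 00:00:00",
        "2014-11-09 08:24:00", "2014-11-09 08:24:00", "2014-11-09 09:19:00",
    ],
    name="Timestamp",
)
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=index,
)

# One row per minute. max() skips NaN inside each one-minute bin, so rows
# sharing a timestamp collapse into one row, and minutes with no data stay NaN.
result = df.resample("1min").max()

print(result.head())
print(result.tail())
```

Taking max() per bin works here because each column has at most one non-NaN value per timestamp; if several real values could share a minute, a different aggregation may be more appropriate.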

Hope this helps.

Updated:

As noted in the comments, your 'Timestamp' is not a datetime and is probably stored as a string, so you cannot resample on a DatetimeIndex. Just reset_index and convert it, something like this:

df = df.reset_index()
df['ts'] = pd.to_datetime(df['Timestamp'])
# 'ts' is now datetime of 'Timestamp', you just need to set it to index
df = df.set_index('ts')
...

Now just run the previous code again but replace 'Timestamp' with 'ts' and you should be OK.
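
Put together, the conversion and the resample might look like the sketch below. The small frame is hypothetical and only stands in for data whose 'Timestamp' index holds strings; dropping the original string column after conversion is an assumption made here to keep the aggregation numeric, and the resample call uses the method-style API of newer pandas rather than how=.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose 'Timestamp' index holds strings, not datetimes.
df = pd.DataFrame(
    {"A": [np.nan, 2.0, 105.0], "C": [3.0, np.nan, 1.0]},
    index=pd.Index(
        ["2014-11-09 00:00:00", "2014-11-09 00:00:00", "2014-11-09 00:02:00"],
        name="Timestamp",
    ),
)

# Move the string index into a column and parse it into real datetimes.
df = df.reset_index()
df["ts"] = pd.to_datetime(df["Timestamp"])

# Use the parsed column as the index; the string column is dropped here
# (an assumption for this sketch) so only numeric columns are aggregated.
df = df.set_index("ts").drop(columns="Timestamp")

# Quick checks: the index is now a DatetimeIndex and still has duplicates.
print(type(df.index).__name__)
print(df.index.is_unique)

# Resampling collapses the duplicates into one row per minute.
result = df.resample("1min").max()
print(result)
```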
