pandas: a convenient way to handle ValueError: cannot reindex from a duplicate axis
Original source: http://stackoverflow.com/questions/51953869/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convenient way to deal with ValueError: cannot reindex from a duplicate axis
Asked by dia
I can find posts that explain the cause of this error message, but not how to address it.
I encounter this problem every time I try to add a new column to a pandas dataframe by concatenating the string values of two existing columns.
For instance:
wind['timestamp'] = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp']
It works if the two items joined with ' ' are each a separate DataFrame/Series.
The goal of these attempts is to merge the date and time into the same column so that pandas recognizes them as datetime stamps.
I am not sure whether I am using the command incorrectly or whether the pandas library is internally limited, as it keeps returning the duplicate axis error message. I understand the latter is highly unlikely, hahaha...
Is there a quick and easy solution to this?
I mean, I thought sums, subtractions, and similar operations between column values in a dataframe would be quite easy. It shouldn't be too hard to have the result visible in the table either, right?
Answered by jpp
Operations between Series align on index labels and require non-duplicated indices; otherwise pandas doesn't know how to align the values in calculations. This isn't the case with your data currently.
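To confirm that duplicated labels are the culprit, one can inspect the index before combining the columns. The following is a minimal sketch, assuming a frame named `temp` whose index repeats a label; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical data: the index label 1 appears twice.
temp = pd.DataFrame({'stamp': ['1', '2', '3']}, index=[0, 1, 1])

# False here means at least one index label is repeated.
print(temp.index.is_unique)

# Boolean mask marking every row whose label is repeated
# (keep=False flags all copies, not just the later ones).
dup_mask = temp.index.duplicated(keep=False)
print(temp[dup_mask])
```

If `is_unique` is already True for both frames, the error most likely comes from somewhere else.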
If you are certain that your Series are aligned by position, you can call reset_index on each DataFrame:
import pandas as pd
wind = pd.DataFrame({'DATE (MM/DD/YYYY)': ['2018-01-01', '2018-02-01', '2018-03-01']})
temp = pd.DataFrame({'stamp': ['1', '2', '3']}, index=[0, 1, 1])
# ATTEMPT 1: FAIL
wind['timestamp'] = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp']
# ValueError: cannot reindex from a duplicate axis
# ATTEMPT 2: SUCCESS
wind = wind.reset_index(drop=True)
temp = temp.reset_index(drop=True)
wind['timestamp'] = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp']
print(wind)
  DATE (MM/DD/YYYY)     timestamp
0        2018-01-01  2018-01-01 1
1        2018-02-01  2018-02-01 2
2        2018-03-01  2018-03-01 3
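If resetting the index is not desirable, for example because `wind` should keep its own labels, another option is to skip label alignment by passing a plain NumPy array and then convert the result with `pd.to_datetime`. This is a sketch under assumptions and not part of the original answer: the time strings and the `format` string are made up for illustration, and the approach is only valid when both frames have the same length and row order.

```python
import pandas as pd

# Hypothetical data in MM/DD/YYYY form, plus made-up time strings.
wind = pd.DataFrame({'DATE (MM/DD/YYYY)': ['01/01/2018', '01/02/2018', '01/03/2018']})
temp = pd.DataFrame({'stamp': ['00:00', '06:30', '12:00']}, index=[0, 1, 1])

# .to_numpy() drops the index, so values are combined strictly by position
# and the duplicated labels in temp never take part in alignment.
combined = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp'].to_numpy()

# Parse the combined strings into real datetime values (assumed format).
wind['timestamp'] = pd.to_datetime(combined, format='%m/%d/%Y %H:%M')
print(wind.dtypes)
```

Because alignment is bypassed entirely, pandas cannot warn about mismatched rows, so it is worth checking that `len(wind) == len(temp)` first.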