pandas ValueError: DataFrame index must be unique for orient='columns'
Original question: http://stackoverflow.com/questions/29271520/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
ValueError: DataFrame index must be unique for orient='columns'
Asked by user3675188
I merged many DataFrames into a bigger one:
pd.concat(dfs, axis=0)
Then I cannot dump it to JSON:
(Pdb) df.to_json()
*** ValueError: DataFrame index must be unique for orient='columns'.
How can I fix it?


Answered by davs2rt
The error indicates that your dataframe index has non-unique (repeated) values. Since it appears you're not using the index, you could create a new one with:
df.reset_index(inplace=True)
or
df.reset_index(drop=True, inplace=True)
if you want to remove the previous index instead of keeping it as a column.
See http://pandas.pydata.org/pandas-docs/stable/indexing.html#set-reset-index
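As a minimal sketch of the situation in the question, the snippet below builds two small, made-up frames (the column name `a` and the values are assumptions for illustration), shows that `pd.concat` leaves a repeated index, and then applies the fix from this answer. It also shows `ignore_index=True`, which avoids the repeated labels at concat time.

```python
import pandas as pd

# Two hypothetical frames; each gets its own default index 0..1.
dfs = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
]

merged = pd.concat(dfs, axis=0)
# The index is now 0, 1, 0, 1, so it is no longer unique.
index_was_unique = merged.index.is_unique

# Option 1: discard the old index after the fact.
fixed = merged.reset_index(drop=True)

# Option 2: never keep the old labels in the first place.
fixed_at_concat = pd.concat(dfs, axis=0, ignore_index=True)

# With a unique index, the default orient='columns' no longer raises.
json_text = fixed.to_json()
```

Either option should work when the original row labels carry no meaning; if they do, keeping them as a column with `reset_index()` (without `drop=True`) may be the better choice.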
Answered by Jónás Balázs
Pandas provides different strategies for formatting data as JSON. The 'orient' parameter has several allowed values, as described in the Pandas IO tools documentation. The 'index' and 'columns' strategies require a unique index, while the others do not.
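To illustrate the difference between orients, here is a small sketch using a made-up frame whose index label repeats (the column name `a` and the values are assumptions). It shows `orient='columns'` raising, while `'records'` and `'split'` accept the same data.

```python
import pandas as pd

# Hypothetical frame with a repeated index label (0 appears twice).
df = pd.DataFrame({"a": [1, 2]}, index=[0, 0])

# 'columns' (the default) keys each value by index label, so it
# needs unique labels and raises ValueError here.
try:
    df.to_json(orient="columns")
    columns_failed = False
except ValueError:
    columns_failed = True

# 'records' leaves the index out entirely, so duplicates do not matter.
records_json = df.to_json(orient="records")

# 'split' stores the index as a plain list, duplicates included.
split_json = df.to_json(orient="split")
```

Which orient fits depends on what the consumer of the JSON expects; `'records'` is a common choice when the index is not needed.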
Another solution is possible if you have a primary key: you can set it as the index of the DataFrame, e.g.
df = df.set_index(['col1', 'col2'])
Example here: Set multi column index in pandas
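A small sketch of this idea, assuming a made-up frame where `col1` and `col2` together form the key (the `value` column and all data are hypothetical). Passing `verify_integrity=True` asks `set_index` to raise if the chosen key is not actually unique, which catches the problem before serialization.

```python
import pandas as pd

# Hypothetical data: neither col1 nor col2 is unique alone,
# but the pair (col1, col2) is.
df = pd.DataFrame({
    "col1": ["x", "x", "y"],
    "col2": [1, 2, 1],
    "value": [10.0, 20.0, 30.0],
})

# verify_integrity=True raises ValueError if the new index has duplicates.
keyed = df.set_index(["col1", "col2"], verify_integrity=True)
key_is_unique = keyed.index.is_unique

# Using only col1 as the key fails the same check, because "x" repeats.
try:
    df.set_index(["col1"], verify_integrity=True)
    single_column_ok = True
except ValueError:
    single_column_ok = False
```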
Answered by Gerard
In my case I had duplicate columns in my pandas DataFrame. I read from a SQL query that did a join on two columns, which is allowed but becomes problematic when you want to create a JSON. Drop the columns:
df = df.drop(columns="duplicate_column")
Or simply rename them:
df = df.rename(index=str, columns={"duplicate_column": "duplicate_column_2"})
In my case, since the data came from SQL, it is better to change the query so that it does not return the duplicate column you are joining on.
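One caveat worth noting: when two columns share the same label, selecting that label by name for a drop or rename affects every column carrying it. A minimal sketch of an alternative, assuming a made-up join result where `id` appears twice (all names and values are hypothetical), keeps only the first occurrence of each column name using `columns.duplicated()`:

```python
import pandas as pd

# Hypothetical result of a join that returned the key column twice.
df = pd.DataFrame(
    [[1, 1, "a"], [2, 2, "b"]],
    columns=["id", "id", "name"],
)

# columns.duplicated() marks the second and later occurrences of a name,
# so negating it keeps the first occurrence of each column.
deduped = df.loc[:, ~df.columns.duplicated()]

json_text = deduped.to_json(orient="records")
```

This assumes the duplicated columns hold the same data, as they would for a join key; if they differ, renaming them positionally is the safer route.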
Answered by Brian
You could try dropping duplicate rows:
df = df.drop_duplicates()
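Note that `drop_duplicates()` compares row values, not index labels, so it may not make the index unique on its own. The sketch below, using a made-up frame (column `a` and its values are assumptions), contrasts it with filtering on `index.duplicated()`, which keeps one row per index label.

```python
import pandas as pd

# Hypothetical frame: three rows that all share the index label 0.
df = pd.DataFrame({"a": [1, 1, 2]}, index=[0, 0, 0])

# drop_duplicates removes the repeated row value (the second 1),
# but the two remaining rows still share the label 0.
by_values = df.drop_duplicates()

# Keep only the first row for each index label instead.
by_index = df[~df.index.duplicated(keep="first")]
```

Filtering on the index discards rows, so it only makes sense when rows with the same label are genuinely redundant.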

