pandas ValueError: DataFrame index must be unique for orient='columns'

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/29271520/

Date: 2020-09-13 23:06:17  Source: igfitidea

python pandas

Asked by user3675188

I merged many DataFrames into one larger DataFrame:

pd.concat(dfs, axis=0)

Now I cannot dump it to JSON:

(Pdb) df.to_json()
*** ValueError: DataFrame index must be unique for orient='columns'.

How can I fix this?

Answered by davs2rt

The error indicates that your DataFrame's index contains non-unique (repeated) values. Since it appears you are not using the index, you could create a new one with:

df.reset_index(inplace=True)

or, if you want to discard the previous index instead of keeping it as a column:

df.reset_index(drop=True, inplace=True)

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#set-reset-index
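
A minimal sketch of the problem and the fix described above, assuming two small made-up frames (`dfs` and the column name `a` are illustrative, not from the question). It also shows `ignore_index=True`, a `pd.concat` option that builds a fresh index during concatenation so no reset is needed afterwards:

```python
import json
import pandas as pd

# Two hypothetical frames; each gets its own default index 0..1.
dfs = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
]

df = pd.concat(dfs, axis=0)
print(df.index.is_unique)  # the labels are 0, 1, 0, 1, so this is False

# The default orient for a DataFrame is 'columns', which rejects this index.
try:
    df.to_json()
    raised = False
except ValueError:
    raised = True

# Option 1: renumber the rows after concatenating.
fixed = df.reset_index(drop=True)

# Option 2: let concat build a fresh 0..n-1 index directly.
fixed_alt = pd.concat(dfs, axis=0, ignore_index=True)

json_text = fixed.to_json()
print(json.loads(json_text))
```

Both options give the same result here; which one reads better depends on whether you still need the original labels at any point before serializing.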

Answered by Jónás Balázs

Pandas provides different strategies for formatting data as JSON. The 'orient' parameter had 5 allowed values when this was written, as described in the Pandas IO tools documentation. The 'index' and 'columns' strategies require a unique index, while the others do not.
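
To make the difference between the strategies concrete, here is a small sketch that tries several `orient` values on a frame with a repeated index label. The frame and the helper `try_orient` are made up for illustration:

```python
import json
import pandas as pd

# Hypothetical frame with a repeated index label (0 appears twice).
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=[0, 0])


def try_orient(orient):
    """Return the parsed JSON for this orient, or None if pandas rejects it."""
    try:
        return json.loads(df.to_json(orient=orient))
    except ValueError:
        return None


orients = ["columns", "index", "records", "split", "values"]
results = {o: try_orient(o) for o in orients}

for orient, parsed in results.items():
    print(orient, "->", parsed)
```

Note that 'records' and 'values' leave the index out of the output entirely, while 'split' keeps it as a plain list, so pick the one whose shape your consumer expects.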

Alternatively, if your data has a primary key (a column, or combination of columns, whose values are unique per row), you can use it as the DataFrame's index, e.g.:

df = df.set_index(['col1', 'col2'])

Example here: Set multi column index in pandas
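
A minimal sketch of the same idea with a single key column, assuming a made-up frame in which `id` is unique per row (`id` and `value` are illustrative names). Passing `verify_integrity=True` asks `set_index` to raise if the chosen key turns out to contain duplicates:

```python
import json
import pandas as pd

# Hypothetical data: "id" is unique per row, but the existing index is not.
df = pd.DataFrame(
    {"id": [10, 11, 12], "value": [1.5, 2.5, 3.5]},
    index=[0, 1, 0],
)

# Replace the index with the key column; fail early if the key repeats.
keyed = df.set_index("id", verify_integrity=True)

parsed = json.loads(keyed.to_json())
print(parsed)
```

The key values become the JSON object keys, which is often more useful downstream than an arbitrary row number.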

Answered by Gerard

In my case I had duplicate columns in my pandas DataFrame. I read from a SQL query that joined on two columns, which is allowed but becomes a problem when you want to create JSON. Drop the duplicated column (note that dropping by label removes every column with that name):

df = df.drop(columns="duplicate_column")

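Or rename them. Note that a label mapping passed to rename applies to every column with that label, so both copies would receive the same new name. To give them distinct names, assign the full list of column names instead (the names below are placeholders for your own):

df.columns = ["duplicate_column", "duplicate_column_2", "other_column"]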

Since my data came from SQL, the better fix was to change the query so that it does not return the duplicate of the column you are joining on.
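
If changing the query is not an option, the following sketch shows one way to handle repeated column names on the pandas side. The frame is made up (a join that returned `key` twice), and the boolean-mask approach is a common idiom rather than something taken from this answer:

```python
import json
import pandas as pd

# Hypothetical result of a join that returned the key column twice.
df = pd.DataFrame(
    [[1, 1, "a"], [2, 2, "b"]],
    columns=["key", "key", "name"],
)

# Keep the first occurrence of each column name and drop later repeats.
deduped = df.loc[:, ~df.columns.duplicated()]

# For comparison: dropping by label removes every column with that label.
dropped = df.drop(columns="key")

# Likewise, rename maps by label, so both copies get the same new name.
renamed = df.rename(columns={"key": "key_2"})

print(list(deduped.columns))
print(list(dropped.columns))
print(list(renamed.columns))

parsed = json.loads(deduped.to_json())
```

The mask keeps one copy of the key, which is usually what you want after a join on equal values.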

Answered by Brian

You could try dropping duplicate rows:

df = df.drop_duplicates()
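
One caveat worth checking: `drop_duplicates` compares row values, not index labels, so it only helps when the rows sharing a label are also identical. The sketch below uses a made-up frame to show the difference, and filters on `df.index.duplicated` as an alternative that targets the labels themselves:

```python
import pandas as pd

# Hypothetical concatenation result: label 0 repeats, but the rows differ.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 0])

# drop_duplicates looks at the values in each row, so nothing is removed here.
by_value = df.drop_duplicates()

# Keep only the first row seen for each index label.
by_label = df[~df.index.duplicated(keep="first")]

print(by_value.index.is_unique)
print(by_label.index.is_unique)
```

Filtering by label discards rows, so it is only appropriate when the later rows really are redundant; otherwise resetting the index keeps all of the data.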