pandas ValueError: DataFrame index must be unique for orient='columns'

Question

Asked by user3675188

I merged many DataFrames into a bigger one:

df = pd.concat(dfs, axis=0)

Then I cannot dump it to JSON:

(Pdb) df.to_json()
*** ValueError: DataFrame index must be unique for orient='columns'.

How can I fix it?
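For reference, here is a minimal reproduction, assuming two small hypothetical frames a and b in place of the question's dfs list. pd.concat keeps each input's own index labels, so two default 0..n-1 indexes collide:

```python
import pandas as pd

# Hypothetical stand-ins for the frames in dfs; each gets the default index 0, 1
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# concat keeps the original labels, so the combined index is 0, 1, 0, 1
df = pd.concat([a, b], axis=0)
print(df.index.tolist())   # [0, 1, 0, 1]
print(df.index.is_unique)  # False

error_message = None
try:
    df.to_json()  # a DataFrame's default orient is 'columns'
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```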

Answer 1

Answer by davs2rt

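The error indicates that your DataFrame index has non-unique (repeated) values. This is expected after pd.concat(dfs, axis=0), which keeps the original index labels of every input frame. Since it appears you're not using the index, you can create a new one with: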

df.reset_index(inplace=True)

or

df.reset_index(drop=True, inplace=True)

if you want to discard the previous index instead. Without drop=True, the old index is kept as a new column (named index by default).

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#set-reset-index
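A short sketch of this fix, again using hypothetical frames a and b. The non-inplace form df = df.reset_index(drop=True) is equivalent to the inplace=True call above, and passing ignore_index=True to pd.concat avoids the duplicate labels in the first place:

```python
import json
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})
df = pd.concat([a, b], axis=0)  # index 0, 1, 0, 1

# Without drop=True the old labels survive as a column named "index"
kept = df.reset_index()
print(kept.columns.tolist())  # ['index', 'x']

# With drop=True the old labels are discarded and replaced by 0..n-1
fixed = df.reset_index(drop=True)
print(fixed.index.tolist())   # [0, 1, 2, 3]

# Alternative: ask concat for a fresh index up front
fixed_concat = pd.concat([a, b], axis=0, ignore_index=True)

# The default orient='columns' now works
print(json.loads(fixed.to_json()))  # {'x': {'0': 1, '1': 2, '2': 3, '3': 4}}
```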

Answer 2

Answer by Jónás Balázs

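Pandas provides different strategies for formatting a DataFrame as JSON, selected with the orient parameter: 'split', 'records', 'index', 'columns', 'values' and, since pandas 0.20, 'table'. They are described in the Pandas IO tools documentation. The default for a DataFrame is 'columns', which is why a plain df.to_json() fails. The 'index' and 'columns' strategies use the index labels as JSON object keys, so they require a unique index, while 'split', 'records' and 'values' do not.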
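To see which orientations accept a duplicated index, you can simply try each one on a small hypothetical frame ('table' is left out of this sketch):

```python
import json
import pandas as pd

# Hypothetical frame whose index is 0, 1, 0, 1 after concat
df = pd.concat([pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3, 4]})])

results = {}
for orient in ["split", "records", "index", "columns", "values"]:
    try:
        df.to_json(orient=orient)
        results[orient] = "ok"
    except ValueError as exc:
        results[orient] = f"error: {exc}"

for orient, outcome in results.items():
    print(f"{orient:8} {outcome}")

# 'records' drops the index entirely, which is often what you want after a concat
records = json.loads(df.to_json(orient="records"))
print(records)  # [{'x': 1}, {'x': 2}, {'x': 3}, {'x': 4}]
```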

Alternatively, if your data has a primary key (one or more columns whose values uniquely identify each row), you can use it as the DataFrame's index, e.g.:

df = df.set_index(['col1', 'col2'])

Example here: Set multi column index in pandas
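A sketch of the primary-key approach, assuming a hypothetical order_id column that uniquely identifies each row (a single key column keeps the JSON output simple). set_index(..., verify_integrity=True) raises a ValueError if the chosen key is not actually unique, so a bad key is caught here rather than later in to_json:

```python
import json
import pandas as pd

# Hypothetical data: order_id is assumed to be a primary key
a = pd.DataFrame({"order_id": [101, 102], "amount": [9.5, 3.0]})
b = pd.DataFrame({"order_id": [103, 104], "amount": [7.25, 1.0]})
df = pd.concat([a, b])  # index 0, 1, 0, 1, so df.to_json() would fail

# verify_integrity=True raises ValueError if order_id contains duplicates
keyed = df.set_index("order_id", verify_integrity=True)

print(json.loads(keyed.to_json()))
```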

Answer 3

Answer by Gerard

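In my case, I had duplicate column names in my pandas DataFrame (pandas then reports "DataFrame columns must be unique for orient='columns'."). I read the data with a SQL query that performed a join, and the join column came back twice in the result. SQL allows this, but it becomes a problem when you want to create JSON. Drop the duplicate columns, keeping the first column with each name:

df = df.loc[:, ~df.columns.duplicated()]

Note that df.drop(columns="duplicate_column") would remove every column with that name, not only the extra copy.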

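Or rename them so that every column name is unique. df.rename(columns={"duplicate_column": "duplicate_column_2"}) is not enough on its own: it renames every column called duplicate_column, so the copies still clash (and it returns a new DataFrame that must be assigned back). Instead, assign the full list of names, e.g. for a frame whose columns are other_column, duplicate_column, duplicate_column:

df.columns = ["other_column", "duplicate_column", "duplicate_column_2"]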
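Both in-pandas fixes can be sketched on a hypothetical frame whose columns are id, name, id. dedupe_columns below is a small helper written for this example, not a pandas function; it appends _2, _3, ... to repeated names and does not check whether a generated name already exists:

```python
import pandas as pd

# Hypothetical frame with a repeated column name, as a SQL join might produce
df = pd.DataFrame([[1, "a", 1], [2, "b", 2]], columns=["id", "name", "id"])
print(df.columns.duplicated().tolist())  # [False, False, True]

# Option 1: keep only the first column with each name
first_only = df.loc[:, ~df.columns.duplicated()]


# Option 2: keep every column but make the names unique (hypothetical helper)
def dedupe_columns(columns):
    seen = {}
    result = []
    for name in columns:
        seen[name] = seen.get(name, 0) + 1
        # First occurrence keeps its name; later ones get _2, _3, ...
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


renamed = df.copy()
renamed.columns = dedupe_columns(renamed.columns)
print(renamed.columns.tolist())  # ['id', 'name', 'id_2']

# Both results can now be serialized with the default orient
print(first_only.to_json())
print(renamed.to_json())
```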

In my case, since the data came from SQL, it was better to change the query so that it does not return the duplicate join column.

Answer 4

Answer by Brian

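You could try dropping duplicate rows. Note that drop_duplicates() compares column values and ignores the index, so this only helps when the rows that share an index label are exact duplicates of each other: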

df = df.drop_duplicates()
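A quick way to check whether drop_duplicates() is enough, on two hypothetical concatenated frames: it fixes the index only when the rows sharing a label are identical. If you genuinely want one row per index label, df[~df.index.duplicated()] filters on the index instead, at the cost of discarding rows:

```python
import pandas as pd

# Case 1: the repeated labels belong to identical rows, so drop_duplicates helps
same = pd.concat([pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [1, 2]})])
print(same.drop_duplicates().index.is_unique)  # True

# Case 2: different rows share labels; nothing is dropped and the index stays duplicated
diff = pd.concat([pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3, 4]})])
print(diff.drop_duplicates().index.is_unique)  # False

# Filtering on the index itself keeps the first row for each label (and loses the rest)
by_label = diff[~diff.index.duplicated(keep="first")]
print(by_label["x"].tolist())  # [1, 2]
```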
