pandas: how to avoid an Unnamed column when merging

Question

Asked by Cheng

There are two DataFrames that I want to merge:

DataFrame A columns: index, userid, locale  (2000 rows)  
DataFrame B columns: index, userid, age     (300 rows)

When I perform the following:

pd.merge(A, B, on='userid', how='outer')

I got a DataFrame with the following columns:

index, Unnamed: 0, userid, locale, age

The index column and the Unnamed: 0 column are identical. I guess the Unnamed: 0 column is the index column of DataFrame B.

My question is: is there a way to avoid this Unnamed column when merging two DataFrames?

I can drop the Unnamed column afterwards, but I am wondering whether there is a better way to do it.

Answer 1

Accepted answer by Thanos

In summary, you are saving the index to the file, and when you read the file back, the column that was previously saved as the index is loaded as a regular column.
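
The round trip described above can be reproduced in a few lines. This is a minimal sketch, assuming a small made-up stand-in for DataFrame B and an in-memory string instead of a file on disk:

```python
import io

import pandas as pd

# Hypothetical stand-in for DataFrame B from the question.
B = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})

# to_csv writes the index by default, under an empty header cell.
csv_text = B.to_csv()

# read_csv finds a column with no header and labels it "Unnamed: 0".
B_reloaded = pd.read_csv(io.StringIO(csv_text))
print(list(B_reloaded.columns))  # ['Unnamed: 0', 'userid', 'age']
```

The extra column therefore comes from the save and load steps, not from pd.merge itself.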

There are a few ways to deal with this:

Method 1

When saving a pandas.DataFrame to disk, use index=False like this:

df.to_csv(path, index=False)
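
As a quick check of Method 1, here is a sketch that writes to an in-memory buffer rather than a real path (the buffer and the sample rows are assumptions made for illustration):

```python
import io

import pandas as pd

# Hypothetical sample data shaped like DataFrame A from the question.
df = pd.DataFrame({"userid": ["A1092", "B9032"], "locale": ["EN-US", "SV-SE"]})

# Write without the index, then read the same text back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))  # ['userid', 'locale']
```

Because the index was never written, nothing is left for read_csv to label as Unnamed.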

Method 2

When reading from file, you can define the column that is to be used as index, like this:

df = pd.read_csv(path, index_col='index')
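
Note that index_col='index' only works if the file's header actually contains a column named index. If the file was written with the default settings, the first header cell is empty, and a positional index_col=0 may be the better fit. A minimal sketch, assuming hypothetical CSV text of that shape:

```python
import io

import pandas as pd

# Hypothetical CSV text as written with the default index=True:
# the first header cell is empty.
csv_text = ",userid,age\n0,A1092,31\n1,B9032,23\n"

# Use the first column as the index instead of loading it as data.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df.columns))  # ['userid', 'age']
```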

Method 3

If method #2 does not suit you for some reason, you can always set the column to be used as the index later on, like this:

df.set_index('index', inplace=True)

After this point, your DataFrame should look like this:

        userid    locale    age
index
    0    A1092     EN-US     31
    1    B9032     SV-SE     23
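
For reference, a sketch of Method 3 on made-up data matching the table above. It uses the assigning form of set_index rather than inplace=True, which is a stylistic choice and not something the answer prescribes:

```python
import pandas as pd

# Hypothetical frame in which the old index was loaded as a regular column.
df = pd.DataFrame(
    {
        "index": [0, 1],
        "userid": ["A1092", "B9032"],
        "locale": ["EN-US", "SV-SE"],
        "age": [31, 23],
    }
)

# Promote the "index" column back to being the index.
df = df.set_index("index")

print(list(df.columns))  # ['userid', 'locale', 'age']
```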

I hope this helps.

Answer 2

Answer by MaxU

Either don't write the index when saving the DataFrame to a CSV file (df.to_csv('...', index=False)), or, if you have to deal with CSV files that you can't change or edit, use the usecols parameter:

A = pd.read_csv('/path/to/fileA.csv', usecols=['userid','locale'])

in order to get rid of the Unnamed: 0 column ...
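
Putting the usecols approach together with the merge from the question, a minimal sketch might look like this. The CSV contents are made up, and in-memory strings stand in for the real files:

```python
import io

import pandas as pd

# Hypothetical CSV contents; both were saved with the index included.
csv_a = ",userid,locale\n0,A1092,EN-US\n1,B9032,SV-SE\n"
csv_b = ",userid,age\n0,A1092,31\n"

# Load only the named columns, so the unnamed index column is skipped.
A = pd.read_csv(io.StringIO(csv_a), usecols=["userid", "locale"])
B = pd.read_csv(io.StringIO(csv_b), usecols=["userid", "age"])

merged = pd.merge(A, B, on="userid", how="outer")
print(list(merged.columns))  # ['userid', 'locale', 'age']
```

With how='outer', users that appear only in A keep their row and get a missing value for age.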
