Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/41087619/

Time: 2020-09-14 02:36:04  Source: igfitidea

Pandas merge how to avoid unnamed column

python pandas

Asked by Cheng

There are two DataFrames that I want to merge:

DataFrame A columns: index, userid, locale  (2000 rows)  
DataFrame B columns: index, userid, age     (300 rows)

When I perform the following:

pd.merge(A, B, on='userid', how='outer')

I got a DataFrame with the following columns:

index, Unnamed:0, userid, locale, age

The index column and the Unnamed:0 column are identical. I guess the Unnamed:0 column is the index column of DataFrame B.

My question is: is there a way to avoid this Unnamed column when merging two DFs?

I can drop the Unnamed column afterwards, but I am just wondering if there is a better way to do it.

Accepted answer by Thanos

In summary, what you're doing is saving the index to file, and when you read the file back, the column previously saved as the index is loaded as a regular column.
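
As a minimal sketch of that round trip, the snippet below writes a small frame to an in-memory buffer and reads it back. The sample rows and the use of io.StringIO in place of a real file are assumptions made for illustration. Note that pandas spells the generated column name with a space, as "Unnamed: 0".

```python
import io

import pandas as pd

# Hypothetical sample data standing in for DataFrame B
B = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})

# to_csv writes the index by default, as a first column with an empty header
buffer = io.StringIO()
B.to_csv(buffer)
buffer.seek(0)

# read_csv treats that header-less column as data and names it "Unnamed: 0"
B_reloaded = pd.read_csv(buffer)
print(list(B_reloaded.columns))  # ['Unnamed: 0', 'userid', 'age']
```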

There are a few ways to deal with this:

Method 1

When saving a pandas.DataFrame to disk, use index=False like this:

df.to_csv(path, index=False)
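
A short sketch of Method 1 end to end, assuming hypothetical sample rows and an in-memory buffer instead of a path on disk: with index=False the header contains only the real columns, so nothing extra appears on reload.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for DataFrame A
A = pd.DataFrame({"userid": ["A1092", "B9032"], "locale": ["EN-US", "SV-SE"]})

# Skip the index when writing, so the file holds only the data columns
buffer = io.StringIO()
A.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the text back yields exactly the original columns
A_reloaded = pd.read_csv(io.StringIO(csv_text))
print(list(A_reloaded.columns))  # ['userid', 'locale']
```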

Method 2

When reading from file, you can define the column that is to be used as index, like this:

df = pd.read_csv(path, index_col='index')
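
The call above assumes the file has a column literally named index. As a hedged sketch, the snippet below covers both that case and the case where the index was written with an empty header, which can be selected by position with index_col=0. The sample data and in-memory CSV strings are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"userid": ["A1092", "B9032"], "locale": ["EN-US", "SV-SE"]})

# Case 1: the index was written with an empty header, so select it by position
unnamed_csv = df.to_csv()
by_position = pd.read_csv(io.StringIO(unnamed_csv), index_col=0)

# Case 2: the index was written under the label "index", so select it by name
labelled_csv = df.to_csv(index_label="index")
by_name = pd.read_csv(io.StringIO(labelled_csv), index_col="index")

print(list(by_position.columns))  # ['userid', 'locale']
print(by_name.index.name)         # index
```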

Method 3

If method #2 does not suit you for some reason, you can always set the column to be used as index later on, like this:

df.set_index('index', inplace=True)

After this point, your DataFrame should look like this:

        userid    locale    age
index
    0    A1092     EN-US     31
    1    B9032     SV-SE     23
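
A minimal sketch of Method 3 on an already loaded frame, using the two rows shown above as sample data. It uses the non-inplace form of set_index, which returns a new frame. The second half is an assumption beyond the original answer: if the stray column is named "Unnamed: 0" rather than index, the same call works, and rename_axis(None) clears the leftover index name.

```python
import pandas as pd

# Frame as loaded from disk, with the old index present as a regular column
df = pd.DataFrame({
    "index": [0, 1],
    "userid": ["A1092", "B9032"],
    "locale": ["EN-US", "SV-SE"],
    "age": [31, 23],
})

# Promote the "index" column to be the actual index
indexed = df.set_index("index")

# Assumed variant: the stray column carries pandas' generated name instead
stray = df.rename(columns={"index": "Unnamed: 0"})
cleaned = stray.set_index("Unnamed: 0").rename_axis(None)

print(list(indexed.columns))  # ['userid', 'locale', 'age']
```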

I hope this helps.

Answer by MaxU

Either don't write the index when saving the DataFrame to a CSV file (df.to_csv('...', index=False)) or, if you have to deal with CSV files that you can't change/edit, use the usecols parameter:

A = pd.read_csv('/path/to/fileA.csv', usecols=['userid','locale'])

in order to get rid of the Unnamed:0 column.
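
Putting usecols together with the merge from the question, a sketch might look like the following. The CSV contents are made-up sample data held in strings rather than files, and both inputs are assumed to carry a saved index with an empty header.

```python
import io

import pandas as pd

# Hypothetical CSV contents, each with a saved index column (empty header)
csv_a = ",userid,locale\n0,A1092,EN-US\n1,B9032,SV-SE\n"
csv_b = ",userid,age\n0,A1092,31\n"

# usecols keeps only the named columns, so "Unnamed: 0" is never loaded
A = pd.read_csv(io.StringIO(csv_a), usecols=["userid", "locale"])
B = pd.read_csv(io.StringIO(csv_b), usecols=["userid", "age"])

# Same outer merge as in the question, now without the stray column
merged = pd.merge(A, B, on="userid", how="outer")
print(list(merged.columns))  # ['userid', 'locale', 'age']
```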
