Python: How to get rid of the "Unnamed: 0" column in a pandas DataFrame?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36519086/

Date: 2020-08-19 17:59:06  Source: igfitidea

How to get rid of "Unnamed: 0" column in a pandas DataFrame?

python pandas csv dataframe

Asked by Michael Perdue

I have a situation where sometimes, when I read a csv into a df, I get an unwanted index-like column named Unnamed: 0.

file.csv

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

The CSV is read with this:

pd.read_csv('file.csv')

   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9

This is very annoying! Does anyone have an idea on how to get rid of this?

Answered by EdChum

It's the index column. Pass index=False to to_csv so that it is not written out; see the docs.

Example:

In [37]:
import io
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

Compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
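
The three cases above can be put side by side in one self-contained sketch. The small DataFrame here is made up for illustration; any DataFrame with a default RangeIndex should behave the same way.

```python
import io

import pandas as pd

# Illustrative data only; the column names and values are assumptions.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default to_csv() writes the index as a first column with an empty header,
# which read_csv then labels "Unnamed: 0".
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False skips the index on write, so no extra column comes back.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# index_col=0 tells read_csv to use the first column as the index again.
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

print(list(with_index.columns))
print(list(without_index.columns))
print(list(restored.columns))
```

Which of the two fixes applies depends on whether you control the code that writes the file (index=False) or only the code that reads it (index_col=0).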

Answered by cs95

This issue most likely arises because your CSV was saved along with its RangeIndex (which usually doesn't have a name). The proper fix is to change how the DataFrame is saved, but this isn't always an option.

Avoiding the Problem: read_csv with index_col argument

IMO, the simplest solution is to read the unnamed column as the index. Pass index_col=[0] to pd.read_csv; this reads the first column in as the index.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')

pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by passing index=False when creating the output CSV, if your DataFrame's index holds no meaningful data to begin with.

df.to_csv('file.csv', index=False)

But as mentioned above, this isn't always an option.

Stopgap Solution: Filtering with str.match

If you cannot modify the code that reads or writes the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
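
If this filter is needed in several places, it could be wrapped in a small helper. The sketch below is only one way to do it: drop_unnamed is a hypothetical name, not a pandas function, and it assumes every column label is a string.

```python
import pandas as pd


def drop_unnamed(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` without columns whose name starts with 'Unnamed'.

    Hypothetical helper for illustration; assumes all column labels are strings.
    """
    # str.match anchors the pattern at the start of each column name.
    mask = frame.columns.str.match("Unnamed")
    return frame.loc[:, ~mask].copy()


# Made-up data that mimics a CSV read back with its index column.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": ["x", "x"], "b": ["x", "x"]})
cleaned = drop_unnamed(df)
print(list(cleaned.columns))
```

Returning a copy leaves the original DataFrame untouched, which is usually safer when the helper is called from different parts of a program.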

Answered by Brendan

This can also happen if your data was improperly written to your csv so that each row ends with a comma. This will leave you with an unnamed column, Unnamed: x, at the end of your data when you try to read it into a df.
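
A short sketch of the trailing-comma case, using made-up CSV text in which every row (header included) ends with a comma. Dropping all-NaN columns is just one possible cleanup, and it assumes you have no genuine column that is entirely empty.

```python
import io

import pandas as pd

# Made-up CSV text: each line ends with a comma, creating an empty last field.
raw = "A,B,C,\n1,2,3,\n4,5,6,\n"

df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))

# The extra column holds only missing values, so dropping columns that are
# entirely NaN removes it. This would also drop any real all-NaN column.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```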

Answered by Sarah

To get rid of all Unnamed columns, you can also use a regex, such as df.drop(df.filter(regex="Unname"), axis=1, inplace=True)
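
A variant of the same idea as a self-contained sketch. It differs from the one-liner above in two ways that are my own choices, not part of the answer: the pattern is anchored with ^ so that only names starting with "Unnamed" match, and a new DataFrame is returned instead of using inplace=True. The data is made up.

```python
import pandas as pd

# Made-up data with two unwanted columns.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": [1, 2], "Unnamed: 3": [None, None]})

# filter(regex=...) keeps columns whose name contains a match (re.search),
# so the ^ anchor restricts it to names that begin with "Unnamed".
unnamed_cols = df.filter(regex="^Unnamed").columns

# Drop those columns and keep the result in a new variable.
cleaned = df.drop(columns=unnamed_cols)
print(list(cleaned.columns))
```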

Answered by ssareen

Simply delete that column using: del df['column_name']

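
For the column in this question, that could look like the sketch below. The membership check is an addition of mine, since del raises a KeyError when the column is absent; the data is made up.

```python
import pandas as pd

# Made-up data that mimics a CSV read back with its index column.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": [1, 2]})

col = "Unnamed: 0"
# Guard against a KeyError in case the file was written without an index.
if col in df.columns:
    del df[col]  # modifies df in place

print(list(df.columns))
```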