Python: How to get rid of the "Unnamed: 0" column in a pandas DataFrame?

Question

Asked by Michael Perdue

I have a situation where, when I read a CSV into a df, I sometimes get an unwanted index-like column named Unnamed: 0.

file.csv

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

The CSV is read with this:

pd.read_csv('file.csv')

   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9

This is very annoying! Does anyone have an idea on how to get rid of this?

Answer 1

Answered by EdChum

It's the index column. Pass index=False to to_csv so that it is not written out; see the docs.

Example:

In [37]:
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

Compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
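
The three cases above can be put side by side in one self-contained sketch. The small integer frame and the variable names are made up for illustration; only to_csv(index=False) and read_csv(index_col=0) come from the answer.

```python
import io

import numpy as np
import pandas as pd

# Illustrative data (not from the question).
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# By default to_csv writes the index as a first column with an empty header.
with_index = df.to_csv()
# index=False leaves the index out of the file entirely.
without_index = df.to_csv(index=False)

# Reading the default output back yields the extra "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(with_index))
# Either fix avoids it: skip the index on write, or claim it as the index on read.
clean_write = pd.read_csv(io.StringIO(without_index))
clean_read = pd.read_csv(io.StringIO(with_index), index_col=0)

print(list(naive.columns))
print(list(clean_write.columns))
print(list(clean_read.columns))
```

Which fix is preferable probably depends on whether you control the code that writes the file or only the code that reads it.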

Answer 2

Answered by cs95

This issue most likely occurs because your CSV was saved along with its RangeIndex (which usually doesn't have a name). The proper fix belongs at the point where the DataFrame is saved, but this isn't always an option.

Avoiding the Problem: `read_csv` with `index_col` argument

IMO, the simplest solution is to read the unnamed column as the index. Pass index_col=[0] to pd.read_csv; this reads the first column in as the index.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')

pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by passing index=False when creating the output CSV, if your DataFrame does not have a meaningful index to begin with.
df.to_csv('file.csv', index=False)
But as mentioned above, this isn't always an option.
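
One situation where index=False is not an option is when the index carries real labels. The sketch below assumes a hypothetical frame with string labels "x", "y", "z" as its index (not taken from the question) and shows index_col=0 restoring them on read.

```python
import io

import pandas as pd

# Hypothetical data: the index holds real labels, so dropping it on write
# would lose information.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
csv_text = df.to_csv()

# Without index_col the labels land in an ordinary "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(naive)
print(restored)
```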

Stopgap Solution: Filtering with `str.match`

If you cannot modify the code that reads or writes the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
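
The filtering step can be wrapped in a small helper for reuse. This is only a sketch: drop_unnamed is a hypothetical name, not a pandas function, and the astype(str) call is an added precaution for frames whose column labels are not all strings. Note that str.match anchors at the start of the label, so a column such as "my Unnamed col" would be kept.

```python
import pandas as pd


def drop_unnamed(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without columns whose label starts with 'Unnamed'.

    Hypothetical helper for illustration; not part of pandas.
    """
    # str.match tests the pattern at the beginning of each column label.
    keep = ~df.columns.astype(str).str.match("Unnamed")
    return df.loc[:, keep]


# Illustrative frame mimicking a CSV read back with its index column.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": ["x", "x"], "b": ["x", "x"]})
result = drop_unnamed(df)
print(list(result.columns))
```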

Answer 3

Answered by Brendan

Another case where this can happen is if your data was improperly written to your CSV so that each row ends with a comma. This will leave you with an unnamed column, Unnamed: x, at the end of your data when you try to read it into a df.
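
A minimal reproduction of the trailing-comma case, using made-up CSV text. The cleanup with dropna(axis=1, how="all") is one possible approach that this answer does not itself propose; it assumes that no legitimate column is entirely empty.

```python
import io

import pandas as pd

# Made-up CSV where every row, including the header, ends with a comma.
csv_text = "A,B,C,\n1,2,3,\n4,5,6,\n"

df = pd.read_csv(io.StringIO(csv_text))
# The empty field after the last comma becomes an extra, all-NaN column.
print(list(df.columns))

# Assumption: columns that are entirely NaN are artifacts and safe to drop.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```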

Answer 4

Answered by Sarah

To get rid of all Unnamed columns, you can also use a regex, for example df.drop(df.filter(regex="Unname"), axis=1, inplace=True)
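
A slightly more explicit variant of the same idea, as a sketch on a made-up frame: it anchors the pattern with ^ so that only labels starting with "Unnamed" match, passes the matching labels through the columns= keyword, and returns a new frame instead of modifying in place. These are stylistic choices, not something the answer requires.

```python
import pandas as pd

# Illustrative frame with two stray unnamed columns.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": [1, 2], "Unnamed: 3": [None, None]})

# filter(regex=...) selects the columns whose label matches the pattern.
unnamed_cols = df.filter(regex=r"^Unnamed").columns

# drop returns a new DataFrame; df itself is left unchanged.
cleaned = df.drop(columns=unnamed_cols)
print(list(cleaned.columns))
```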

Answer 5

Answered by ssareen

Simply delete that column using: del df['column_name']
