Source: http://stackoverflow.com/questions/36519086/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How to get rid of "Unnamed: 0" column in a pandas DataFrame?
Asked by Michael Perdue
I have a situation wherein sometimes when I read a CSV file into a DataFrame, I get an unwanted index-like column named Unnamed: 0.
file.csv
,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9
The CSV is read with this:
pd.read_csv('file.csv')
   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9
This is very annoying! Does anyone have an idea on how to get rid of this?
Answered by EdChum
It's the index column. Pass index=False to to_csv so that it is not written out; see the docs.
Example:
In [37]:
import io
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))
Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335
compare with:
In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))
Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
You could also optionally tell read_csv that the first column is the index column by passing index_col=0:
In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
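The three calls above can be collected into one self-contained script. This is only a sketch: the seed and the variable names (with_index, without_index, as_index) are illustrative choices, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=list("abc"))

# 1. Default to_csv() writes the index; reading it back yields "Unnamed: 0".
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# 2. Skip the index when writing, so there is nothing extra to read back.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# 3. Keep the index in the file, but tell read_csv which column holds it.
as_index = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

print(list(with_index.columns))
print(list(without_index.columns))
print(list(as_index.columns))
```

Only the first read should show the extra column; the other two should list just a, b and c.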
Answered by cs95
This issue most likely manifests because your CSV was saved along with its RangeIndex (which usually doesn't have a name). The fix would ideally be applied when saving the DataFrame, but this isn't always an option.
Avoiding the Problem: read_csv with index_col argument
IMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0] argument to pd.read_csv; this reads in the first column as the index.
import pandas as pd
df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
# Save DataFrame to CSV.
df.to_csv('file.csv')
pd.read_csv('file.csv')
   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x
# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
Note
You could have avoided this in the first place by using index=False when creating the output CSV, if your DataFrame does not have a meaningful index to begin with:
df.to_csv('file.csv', index=False)
But as mentioned above, this isn't always an option.
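To illustrate why index=False is not always an option, here is a small sketch in which the index carries real labels. The data, the index name "fruit", and the variable names lossy and restored are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data: the index holds real labels rather than 0..n-1.
df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25]},
    index=pd.Index(["apple", "pear", "plum"], name="fruit"),
)

# index=False would silently drop the fruit labels from the file.
lossy = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Writing the index and reading it back with index_col=0 keeps them.
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

print(list(lossy.columns))
print(restored.index.name)
print(list(restored.index))
```

Because this index has a name, to_csv writes that name into the header, so no "Unnamed" column appears here even without index_col; the argument is still what turns the column back into the index.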
Stopgap Solution: Filtering with str.match
If you cannot modify the code that reads or writes the CSV file, you can just remove the column by filtering with str.match:
df
   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x
df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')
df.columns.str.match('Unnamed')
# array([ True, False, False, False])
df.loc[:, ~df.columns.str.match('Unnamed')]
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
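The filtering steps above can be run end to end as follows. This is a minimal sketch; the inline csv_text string stands in for a real file and, like the names unnamed_mask and cleaned, is an assumption made for the example.

```python
import io

import pandas as pd

# Stand-in for a CSV that was written together with its default index.
csv_text = ",a,b,c\n0,x,x,x\n1,x,x,x\n"
df = pd.read_csv(io.StringIO(csv_text))

# str.match anchors at the start of each column name.
unnamed_mask = df.columns.str.match("Unnamed")

# Keep only the columns whose name does not start with "Unnamed".
cleaned = df.loc[:, ~unnamed_mask]

print(list(cleaned.columns))
```

Note that this returns a new selection and leaves df itself unchanged.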
Answered by Brendan
Another case where this can happen is when your data was improperly written to the CSV so that each row ends with a comma. This leaves you with an unnamed column, Unnamed: x, at the end of your data when you read it into a DataFrame.
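The trailing-comma case can be reproduced with a short sketch. The inline csv_text is a hypothetical file in which every line, header included, ends with a comma, and dropping all-empty columns is one possible cleanup rather than something the original answer prescribes.

```python
import io

import pandas as pd

# Hypothetical file where every line, header included, ends with a comma.
csv_text = "A,B,C,\n1,2,3,\n4,5,6,\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty trailing field becomes an extra column named after its position.
print(list(df.columns))

# That column holds only missing values, so dropping all-NaN columns removes it.
# Caution: this also drops any genuine column that happens to be entirely empty.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```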
Answered by Sarah
To get rid of all Unnamed columns, you can also use a regex, such as df.drop(df.filter(regex="Unnamed"), axis=1, inplace=True)
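A variant of the same idea that avoids inplace=True might look like the sketch below. The helper name drop_unnamed and the sample frame are hypothetical, and the pattern is anchored with ^ so that only names starting with "Unnamed" are matched.

```python
import pandas as pd


def drop_unnamed(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame without columns whose name starts with 'Unnamed'."""
    unnamed = frame.filter(regex=r"^Unnamed").columns
    return frame.drop(columns=unnamed)


# Made-up frame with two stray columns.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": ["x", "y"], "Unnamed: 3": [None, None]})
cleaned = drop_unnamed(df)
print(list(cleaned.columns))
```

Returning a new frame leaves the original df untouched, which makes the function safe to reuse across several inputs.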
Answered by ssareen
Simply delete that column using: del df['column_name']
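As a rough sketch of that deletion, with a made-up two-column frame: del removes the column in place, while drop with errors="ignore" is an alternative (not part of the original answer) that does not fail when the column is absent.

```python
import pandas as pd

# Made-up frame that carries the unwanted column.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": ["x", "y"]})

# del modifies df in place and raises KeyError if the column is missing.
del df["Unnamed: 0"]

# drop(..., errors="ignore") returns a new frame and tolerates a missing column,
# which helps when only some input files carry the extra column.
df = df.drop(columns=["Unnamed: 0"], errors="ignore")

print(list(df.columns))
```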