Python: how to strip whitespace from Pandas DataFrame headers
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/21606987/
How can I strip the whitespace from Pandas DataFrame headers?
Asked by Spike Williams
I am parsing data from an Excel file that has extra white space in some of the column headings.
When I check the columns of the resulting dataframe, with df.columns, I see:
Index(['Year', 'Month ', 'Value'])
                     ^
# Note the unwanted trailing space on 'Month '
Consequently, I can't do:
df["Month"]
Because it will tell me the column is not found, as I asked for "Month", not "Month ".
My question, then, is how can I strip out the unwanted white space from the column headings?
Accepted answer by TomAugspurger
You can pass a function to the rename method. The str.strip() method should do what you want.
In [5]: df
Out[5]:
   Year  Month   Value
0     1       2      3

[1 rows x 3 columns]

In [6]: df.rename(columns=lambda x: x.strip())
Out[6]:
   Year  Month  Value
0     1      2      3

[1 rows x 3 columns]
Note that this returns a new DataFrame object, which is shown as output on screen, but the change is not actually applied to your columns. To make the change take effect, do one of the following:
- Use the inplace=True argument:
df.rename(columns=lambda x: x.strip(), inplace=True)
- Assign the result back to your df variable:
df = df.rename(columns=lambda x: x.strip())
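The difference between the returned copy and the original frame can be checked with a small sketch. The sample frame below is made up for illustration and is not the asker's actual data.

```python
import pandas as pd

# Hypothetical frame reproducing the stray trailing space in 'Month '.
df = pd.DataFrame({"Year": [1], "Month ": [2], "Value": [3]})

# rename returns a new DataFrame; df itself is unchanged at this point.
stripped = df.rename(columns=lambda x: x.strip())
print(df.columns.tolist())        # labels of the original frame
print(stripped.columns.tolist())  # labels of the returned copy

# Assigning the result back makes the change stick.
df = df.rename(columns=lambda x: x.strip())
print(df["Month"].tolist())
```

One caveat: the lambda calls x.strip() on every label, so it raises AttributeError if any label is not a string (for example an integer column name). Guarding with isinstance(x, str) inside the lambda is one way around that.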
Answer by EdChum
If you're using a recent version of pandas, you can now just call .str.strip on the columns:
In [5]:
df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
print(df.columns.tolist())
df.columns = df.columns.str.strip()
df.columns.tolist()
['Year', 'Month ', 'Value']
Out[5]:
['Year', 'Month', 'Value']
Timings
In[26]:
df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', ' sa', ' asdas '])
df
Out[26]:
Empty DataFrame
Columns: [ year, month , day, asdas , asdas, as , sa, asdas ]
%timeit df.rename(columns=lambda x: x.strip())
%timeit df.columns.str.strip()
1000 loops, best of 3: 293 μs per loop
10000 loops, best of 3: 143 μs per loop
So str.strip is ~2x faster here, and I expect it to scale better for larger DataFrames.
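The %timeit magic only works inside IPython. As a rough sketch, the same comparison could be run from a plain script with the standard timeit module. The loop count n is an arbitrary choice, and the numbers will differ by machine and pandas version, so no expected output is shown.

```python
import timeit

import pandas as pd

# Same column labels as in the timing example above.
cols = [' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', ' sa', ' asdas ']
df = pd.DataFrame(columns=cols)

n = 1000  # number of repetitions, chosen arbitrarily for this sketch

# Time the rename-with-lambda approach.
t_rename = timeit.timeit(lambda: df.rename(columns=lambda x: x.strip()), number=n)
# Time the vectorized string accessor on the column Index.
t_str = timeit.timeit(lambda: df.columns.str.strip(), number=n)

print(f"rename:    {t_rename / n * 1e6:.1f} microseconds per call")
print(f"str.strip: {t_str / n * 1e6:.1f} microseconds per call")
```

Note that the two calls are not doing identical work: rename builds a whole new DataFrame, while df.columns.str.strip() only builds a new Index that still has to be assigned back to df.columns.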
Answer by Eric Duminil
If you use CSV format to export from Excel and read as Pandas DataFrame, you can specify:
skipinitialspace=True
when calling pd.read_csv.
From the documentation:
skipinitialspace : bool, default False
Skip spaces after delimiter.
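As the documentation excerpt says, this option skips spaces after the delimiter, so it should handle leading spaces in a header but presumably not a trailing one such as 'Month '. A minimal sketch, using made-up CSV text read from an in-memory buffer, combines it with a strip of the column labels:

```python
import io

import pandas as pd

# Made-up CSV text: a space follows each delimiter, and 'Month ' also
# carries a trailing space before the next delimiter.
csv_text = "Year, Month , Value\n1, 2, 3\n"

# skipinitialspace drops the spaces that follow each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df.columns.tolist())

# Strip the labels as well, to cover any trailing spaces that remain.
df.columns = df.columns.str.strip()
print(df.columns.tolist())
print(df["Month"].tolist())
```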

