Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21606987/

Date: 2020-08-18 23:14:38  Source: igfitidea

How can I strip the whitespace from Pandas DataFrame headers?

Tags: python, pandas, whitespace

Asked by Spike Williams

I am parsing data from an Excel file that has extra white space in some of the column headings.

When I check the columns of the resulting DataFrame with df.columns, I see:

Index(['Year', 'Month ', 'Value'])
                     ^
#                    Note the unwanted trailing space on 'Month '

Consequently, I can't do:

df["Month"]

It fails with a KeyError saying the column is not found, because I asked for "Month", not "Month ".
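A minimal sketch reproducing the problem, assuming a small hand-built DataFrame in place of the Excel file (the values are made up for illustration). Wrapping each label in repr() makes the trailing space visible, and comparing each label with its stripped form lists the offending headers:

```python
import pandas as pd

# Hand-built stand-in for the Excel data; note the trailing space in "Month "
df = pd.DataFrame([[2014, 1, 3.5]], columns=["Year", "Month ", "Value"])

# repr() makes the stray whitespace visible
print([repr(c) for c in df.columns])

# Any label that changes when stripped carries leading/trailing whitespace
dirty = [c for c in df.columns if c != c.strip()]
print(dirty)

try:
    df["Month"]
except KeyError as exc:
    print("KeyError:", exc)
```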

My question, then, is how can I strip out the unwanted white space from the column headings?

Accepted answer by TomAugspurger

You can pass a function to the rename method. The str.strip() method should do what you want.

In [5]: df
Out[5]: 
   Year  Month   Value
0     1       2      3

[1 rows x 3 columns]

In [6]: df.rename(columns=lambda x: x.strip())
Out[6]: 
   Year  Month  Value
0     1      2      3

[1 rows x 3 columns]

Note that this returns a new DataFrame object, which is shown as output on screen, but the changes are not actually applied to your columns. To make the changes take effect, do one of the following:

  1. Use the inplace=True argument:
df.rename(columns=lambda x: x.strip(), inplace=True)
  2. Assign the result back to your df variable:
df = df.rename(columns=lambda x: x.strip())
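A short sketch of both options, assuming the same three headers as the question and made-up values. It passes str.strip itself instead of a lambda; that is equivalent here because every column label is a string (a non-string label such as an integer would make either version raise an error).

```python
import pandas as pd

# Made-up data with the same headers as the question
df = pd.DataFrame([[2014, 1, 3.5]], columns=["Year", "Month ", "Value"])

# rename returns a new DataFrame; df itself is untouched
stripped = df.rename(columns=str.strip)
print(list(df.columns))        # still contains 'Month '
print(list(stripped.columns))

# Option 1: assign the result back to the same name
df = df.rename(columns=str.strip)

# Option 2: modify in place; rename then returns None
df2 = pd.DataFrame([[2014, 1, 3.5]], columns=["Year", "Month ", "Value"])
result = df2.rename(columns=str.strip, inplace=True)

print(df["Month"].iloc[0], df2["Month"].iloc[0], result)
```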

Answer by EdChum

If you're using a recent version of pandas, you can now just call .str.strip on the columns:

In [5]:
df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
print(df.columns.tolist())
df.columns = df.columns.str.strip()
df.columns.tolist()

['Year', 'Month ', 'Value']
Out[5]:
['Year', 'Month', 'Value']
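Stripping can make two headers collide, for example if a sheet has both 'Month' and 'Month '. Below is a sketch of a hypothetical helper, strip_headers (the name and the choice to raise ValueError are assumptions, not a pandas API), built on the same .str.strip call, that refuses to silently create duplicate labels. It assumes all column labels are strings.

```python
import pandas as pd


def strip_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with whitespace stripped
    from its column labels, refusing to create duplicate labels.

    Assumes every column label is a string, since the .str accessor
    is meant for string data.
    """
    new_cols = df.columns.str.strip()
    clashes = new_cols[new_cols.duplicated()]
    if len(clashes) > 0:
        raise ValueError(f"stripping would duplicate columns: {list(clashes)}")
    out = df.copy()
    out.columns = new_cols
    return out


clean = strip_headers(pd.DataFrame(columns=["Year", "Month ", "Value"]))
print(clean.columns.tolist())

try:
    strip_headers(pd.DataFrame(columns=["Month", "Month "]))
except ValueError as exc:
    print(exc)
```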

Timings

In[26]:
df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', '  sa', ' asdas '])
df
Out[26]: 
Empty DataFrame
Columns: [ year,  month ,  day,  asdas ,  asdas, as ,   sa,  asdas ]


%timeit df.rename(columns=lambda x: x.strip())
%timeit df.columns.str.strip()
1000 loops, best of 3: 293 μs per loop
10000 loops, best of 3: 143 μs per loop

So str.strip is about 2x faster, and I expect it to scale better for larger DataFrames.
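The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module; the function names via_rename and via_str are made up for this example, and absolute numbers will depend on your machine and pandas version. The two calls do not do identical work: rename builds a whole new DataFrame, while .str.strip only builds a new Index.

```python
import timeit

import pandas as pd

# Same messy headers as the timing example above
cols = [" year", " month ", " day", " asdas ", " asdas", "as ", "  sa", " asdas "]
df = pd.DataFrame(columns=cols)


def via_rename():
    # Builds a new DataFrame and returns its stripped column Index
    return df.rename(columns=lambda x: x.strip()).columns


def via_str():
    # Builds only a new Index with stripped labels
    return df.columns.str.strip()


# Both approaches must agree before comparing their speed
assert list(via_rename()) == list(via_str())

n = 1000
for fn in (via_rename, via_str):
    total = timeit.timeit(fn, number=n)
    print(f"{fn.__name__}: {total / n * 1e6:.1f} µs per call")
```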

Answer by Eric Duminil

If you export the Excel data to CSV and read it into a pandas DataFrame, you can specify:

skipinitialspace=True

when calling pd.read_csv.

From the documentation:

skipinitialspace : bool, default False

Skip spaces after delimiter.

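Keep in mind that skipinitialspace only skips spaces that directly follow a delimiter, so a trailing space like the one in 'Month ' survives. A minimal sketch with inline CSV text (made up for illustration) shows this and finishes the cleanup with .str.strip:

```python
import io

import pandas as pd

# Made-up CSV text: the second header has a space on both sides
csv_text = "Year, Month ,Value\n2014, 1,3.5\n"

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
raw_cols = df.columns.tolist()
print(raw_cols)  # the leading space is gone, the trailing one is not

# Strip whatever whitespace is left
df.columns = df.columns.str.strip()
print(df.columns.tolist())
```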