Python: How can I strip the whitespace from Pandas DataFrame headers?

Question

Asked by Spike Williams

I am parsing data from an Excel file that has extra white space in some of the column headings.

When I check the columns of the resulting dataframe, with df.columns, I see:

Index(['Year', 'Month ', 'Value'])
                     ^
#                    Note the unwanted trailing space on 'Month '

Consequently, I can't do:

df["Month"]

This fails with a "column not found" error, because I asked for "Month", not "Month ".

My question, then, is how can I strip out the unwanted white space from the column headings?

Answer 1

Accepted answer by TomAugspurger

You can pass a function to the rename method. The str.strip() method should do what you want.

In [5]: df
Out[5]: 
   Year  Month   Value
0     1       2      3

[1 rows x 3 columns]

In [6]: df.rename(columns=lambda x: x.strip())
Out[6]: 
   Year  Month  Value
0     1      2      3

[1 rows x 3 columns]

Note that this returns a new DataFrame object, which is shown as output on screen, but the change is not actually applied to your columns. To make the change take effect, do one of the following:

Use the inplace=True argument:

df.rename(columns=lambda x: x.strip(), inplace=True)

Assign the result back to your df variable:

df = df.rename(columns=lambda x: x.strip())
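If some column labels might not be strings (integers, for example), calling x.strip() on them would raise an AttributeError. A minimal sketch of a guarded variant follows; the helper name strip_label and the sample data are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample frame mirroring the question: 'Month ' has a trailing space.
df = pd.DataFrame({"Year": [1], "Month ": [2], "Value": [3]})


def strip_label(label):
    """Strip surrounding whitespace from string labels; leave other labels alone."""
    return label.strip() if isinstance(label, str) else label


# rename returns a new DataFrame; the original df is left unchanged.
cleaned = df.rename(columns=strip_label)

print(list(df.columns))       # original labels, still with the trailing space
print(list(cleaned.columns))  # stripped labels
print(cleaned["Month"])       # the lookup from the question now works
```

Assigning the result to a new name, as here, keeps the original frame around for comparison; assigning back to df behaves the same way as the second option above.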

Answer 2

Answered by EdChum

If you're using a recent version of pandas, you can now just call .str.strip on the columns:

In [5]:
df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
print(df.columns.tolist())
df.columns = df.columns.str.strip()
df.columns.tolist()

['Year', 'Month ', 'Value']
Out[5]:
['Year', 'Month', 'Value']

Timings

In[26]:
df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', '  sa', ' asdas '])
df
Out[26]: 
Empty DataFrame
Columns: [ year,  month ,  day,  asdas ,  asdas, as ,   sa,  asdas ]


%timeit df.rename(columns=lambda x: x.strip())
%timeit df.columns.str.strip()
1000 loops, best of 3: 293 μs per loop
10000 loops, best of 3: 143 μs per loop

So str.strip is about 2x faster, and I expect it to scale better for larger DataFrames.
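One thing worth checking before assigning the stripped labels back: labels that differed only by whitespace (such as ' asdas ' and ' asdas' in the timing example) collapse into the same name. A rough sketch of that check, using a made-up three-column frame and the hypothetical variable names stripped and collisions:

```python
import pandas as pd

# Empty frame whose labels differ only by surrounding whitespace.
df = pd.DataFrame(columns=[" asdas ", " asdas", "as "])

# Compute the stripped labels without modifying df yet.
stripped = df.columns.str.strip()

# Index.duplicated() flags every repeat of an earlier label.
collisions = stripped[stripped.duplicated()].unique().tolist()

if collisions:
    print("Stripping would create duplicate column names:", collisions)
else:
    df.columns = stripped
```

Whether duplicates are acceptable depends on the data; this only surfaces them before the assignment happens.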

Answer 3

Answered by Eric Duminil

If you export from Excel to CSV and read the file into a Pandas DataFrame, you can specify:

skipinitialspace=True

when calling pd.read_csv.

From the documentation:

skipinitialspace : bool, default False
Skip spaces after delimiter.
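As the documentation excerpt says, this option skips spaces after the delimiter, so it handles leading whitespace only. A trailing space like the one in 'Month ' from the question would presumably still need a strip afterwards. A small sketch combining both, reading from an in-memory string instead of a real file (the CSV text is invented for illustration):

```python
import io

import pandas as pd

# Header with a space after each comma, plus a trailing space on 'Month '.
csv_text = "Year, Month , Value\n1, 2, 3\n"

# skipinitialspace=True drops the spaces that follow each delimiter.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Strip whatever whitespace is left, such as trailing spaces.
df.columns = df.columns.str.strip()

print(df.columns.tolist())
print(df["Month"])
```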
