Pandas: trim leading and trailing whitespace in a DataFrame

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/49551336/

Time: 2020-09-14 05:23:06  Source: igfitidea


python, pandas

Asked by S.Gu

I need to develop a function that trims leading and trailing whitespace.

Here is a simple sample, but the real file contains far more complex rows and columns.

import numpy as np
import pandas as pd

df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])

The result should eliminate all leading and trailing whitespace but keep the spaces inside the text:

df = pd.DataFrame([["A b", 2, 3], [np.nan, 2, 3],
                   ["random", 43, 4], ["any txt is possible", "2 1", 22],
                   ["", 23, 99], ["help", 23, np.nan]], columns=['A', 'B', 'C'])

Mind that the function needs to cover all possible situations. Thank you.

Answered by jezrael

I think you need to check whether each value is a string, because the columns hold mixed values (numbers alongside strings), and call strip only on the strings:

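# Strip only str cells; numbers and NaN pass through unchanged.
# (On pandas >= 2.1, DataFrame.map is the non-deprecated name for applymap.)
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)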
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
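
Since the question asks for a reusable function, here is a minimal sketch of the element-wise approach wrapped in one. The name `strip_strings` is hypothetical (it is not from the original answer), and using `Series.map` per column instead of `applymap` is an assumption made to sidestep the `applymap` deprecation in newer pandas:

```python
import numpy as np
import pandas as pd


def strip_strings(df):
    """Hypothetical helper: return a new DataFrame with every str cell stripped."""
    # Apply column by column; Series.map calls the lambda on each cell,
    # and non-string cells (numbers, NaN) are returned as they are.
    return df.apply(
        lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x)
    )


# Sample data from the question
df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])

result = strip_strings(df)
print(result)
```

Because only `str` values are touched, numbers, NaN and empty strings pass through unchanged, and the input DataFrame is not modified.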

If each column holds values of a single type, you can instead use the vectorized str.strip on the object columns. With mixed columns it turns the non-string values into NaN, as happens to the numeric values in column B of your sample:

# Select the object columns and strip them with the vectorized .str accessor.
cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print(df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
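
The NaNs in column B come from `.str.strip()` returning NaN for every non-string cell. One possible workaround, sketched below under stated assumptions, is to build a boolean mask of the string cells and assign the stripped values back only there. The helper name `strip_string_cells` is hypothetical, and skipping numeric columns via `select_dtypes(exclude=["number"])` is a design choice of this sketch, not part of the original answer:

```python
import numpy as np
import pandas as pd


def strip_string_cells(df):
    """Hypothetical helper: strip str cells in non-numeric columns, keep all other values."""
    out = df.copy()
    for col in out.select_dtypes(exclude=["number"]).columns:
        # True where the cell is a Python str, False for numbers and NaN
        mask = out[col].map(lambda v: isinstance(v, str))
        if mask.any():
            # Strip only the string cells, so nothing else is turned into NaN
            out.loc[mask, col] = out.loc[mask, col].str.strip()
    return out


# Sample data from the question
df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])

cleaned = strip_string_cells(df)
print(cleaned)
```

With this variant the numbers in column B are kept and only the `" 2 1"` entry is stripped.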