Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/46522652/

Time: 2020-09-14 04:33:34  Source: igfitidea

Replacing newlines with spaces for str columns through pandas dataframe

python string pandas replace strip

Asked by alvas

Given an example dataframe whose columns 2 and 3 hold free text, e.g.

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n

The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:

>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

How can I replace newlines with spaces in specific columns of a pandas dataframe?

I have tried this:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]

>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()

>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]

>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

But there must be a better/simpler way.

Answered by jezrael

Use replace: first strip the leading and trailing whitespace, then replace \n:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
print(df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
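
Since the question asks about specific columns, the same replacement can probably be limited to a subset of the frame. A minimal sketch, assuming the frame is built from the question's lol list and that columns 2 and 3 are the text columns (text_cols is a name introduced here for illustration only):

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

# Hypothetical name: the labels of the columns that hold free text.
text_cols = [2, 3]

# Strip leading/trailing whitespace first, then turn the remaining
# newlines into spaces, touching only the selected columns.
df[text_cols] = (
    df[text_cols]
    .replace({r'^\s+': '', r'\s+$': ''}, regex=True)
    .replace(r'\n', ' ', regex=True)
)
print(df)
```

The numeric columns 0 and 1 are never passed to replace, so they are left exactly as they were.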

Answered by Wiktor Stribiżew

You may use the following approach with two regex replacements:

>>> df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

Details

  • '\A\s+|\s+\Z' -> '' will act like strip(), removing all leading and trailing whitespace:
    • \A\s+ - matches 1 or more whitespace symbols at the start of the string
    • | - or
    • \s+\Z - matches 1 or more whitespace symbols at the end of the string
  • '\n' -> ' ' will replace any newline with a space.
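
The two patterns can be checked outside pandas with the standard re module. A small sketch, where samples and strip_then_replace are names made up for illustration and the sample strings are not from the original data:

```python
import re

samples = ['  foo\nbar ', 'love it\n', '\n\tdef\nhaha']

def strip_then_replace(text):
    # \A\s+|\s+\Z removes whitespace anchored at the very start or very end.
    stripped = re.sub(r'\A\s+|\s+\Z', '', text)
    # Any newline left inside the string becomes a single space.
    return stripped.replace('\n', ' ')

results = [strip_then_replace(s) for s in samples]
print(results)
```

For these inputs the result matches s.strip().replace('\n', ' '), which is the behavior the list above describes.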

Answered by zipa

You can use select_dtypes to select the columns of type object and use applymap on those columns.

Because these functions have no inplace argument, a workaround is to assign the result back to the dataframe:

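df = pd.DataFrame(lol)  # lol is the list of lists from the question
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map
strs = df.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
df[strs.columns] = strs
df
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it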
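
The lambda above calls x.replace on every cell, so a missing value (None or NaN) in one of those columns would raise an AttributeError. A sketch of a guarded variant, mirroring the type check from the question; the None in the sample data, the clean helper, and the text_cols list are assumptions added for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'], [3, 1, None, 'love it\n']])

def clean(value):
    # Only real strings are cleaned; None/NaN and other objects pass through.
    if isinstance(value, str):
        return value.replace('\n', ' ').strip()
    return value

text_cols = [2, 3]  # assumed labels of the free-text columns
# Series.map calls clean on each element, one column at a time.
df[text_cols] = df[text_cols].apply(lambda col: col.map(clean))
print(df)
```

Using Series.map inside apply sidesteps the applymap/map naming difference between pandas versions.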

Answered by Mohamed Ali JAMAOUI

Adding to the other nice answers, this is a vectorized version of your initial idea:

columns = [2, 3]
# Clean each column with the vectorized .str methods; apply keeps every column aligned.
df[columns] = df[columns].apply(lambda s: s.str.strip().str.replace('\n', ' '))


Details:

In [49]: df[columns] = df[columns].apply(lambda s: s.str.strip().str.replace('\n', ' '))

In [50]: df
Out[50]: 
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it