pandas: Replace newlines with spaces in str columns through a pandas dataframe

Question

Asked by alvas

Given an example dataframe whose 2nd and 3rd columns contain free text, e.g.:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n

The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:

>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

How can I replace newlines with spaces in specific columns of a pandas dataframe?

I have tried this:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]

>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()

>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]

>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

But there must be a better/simpler way.

Answer 1

Answered by jezrael

Use replace: first strip the leading and trailing whitespace, then replace \n with a space:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
print(df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
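A self-contained sketch of the same two-step chain, assuming df is built from the question's lol list (the answer does not show that step). DataFrame.replace with regex=True only rewrites string cells, so the integer columns 0 and 1 pass through unchanged. Stripping comes first so that the trailing \n in 'love it\n' is removed rather than turned into a trailing space.

```python
import pandas as pd

# Example data from the question; building df like this is an assumption.
lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

# Step 1: remove trailing (\s+$) and leading (^\s+) whitespace in string cells.
# Step 2: turn every remaining newline into a single space.
df = (
    df.replace({r'\s+$': '', r'^\s+': ''}, regex=True)
      .replace(r'\n', ' ', regex=True)
)
print(df)
```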

Answer 2

Answered by Wiktor Stribiżew

You may use the following two-regex replace approach:

>>> df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
>>>

Details

'\A\s+|\s+\Z' -> '' will act like strip(), removing all leading and trailing whitespace:
- \A\s+ - matches 1 or more whitespace symbols at the start of the string
- | - or
- \s+\Z - matches 1 or more whitespace symbols at the end of the string
'\n' -> ' ' will replace any newline with a space.
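To see the strip()-like behaviour outside pandas, here is a small check with Python's re module. The function names regex_strip and clean and the sample strings are illustrative, not part of the answer. For these ASCII samples the pattern and str.strip() agree; \Z anchors only at the very end of the string, whereas $ may also match just before a final newline.

```python
import re

# Same pattern as in the answer: a leading OR a trailing run of whitespace.
STRIP_RE = re.compile(r'\A\s+|\s+\Z')


def regex_strip(s: str) -> str:
    """Remove leading and trailing whitespace, like str.strip()."""
    return STRIP_RE.sub('', s)


def clean(s: str) -> str:
    """Hypothetical helper: strip first, then replace remaining newlines with spaces."""
    return regex_strip(s).replace('\n', ' ')


samples = ['abc', 'foo\nbar', 'def\nhaha', 'love it\n', '  padded \t', '   ']
for s in samples:
    # The regex version should match the built-in for every sample.
    assert regex_strip(s) == s.strip()
    print(repr(s), '->', repr(clean(s)))
```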

Answer 3

Answered by zipa

You can use select_dtypes to select the columns of type object and then use applymap on those columns.

Because these functions have no inplace argument, this workaround writes the changed columns back into the dataframe:

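df = pd.DataFrame(lol)
# applymap was renamed DataFrame.map in pandas 2.1; applymap still works there but is deprecated.
strs = df.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
df[strs.columns] = strs
df
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it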

Answer 4

Answered by Mohamed Ali JAMAOUI

Adding to the other nice answers, this is a vectorized version of your initial idea:

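columns = [2, 3]
# Assign one column at a time: assigning a list of Series to df.iloc[:, columns]
# fills the block row by row, which transposes the cleaned values.
for col in columns:
    df.iloc[:, col] = df.iloc[:, col].str.strip().str.replace('\n', ' ')

Details:

In [49]: for col in columns:
    ...:     df.iloc[:, col] = df.iloc[:, col].str.strip().str.replace('\n', ' ')
    ...:

In [50]: df
Out[50]: 
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it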
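An alternative sketch for a fixed set of columns: DataFrame.apply hands each selected column to the lambda as a Series, so the vectorized .str methods run once per column and the result is assigned back by label. The column list text_cols is an assumption matching the question's columns 2 and 3.

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

# Hypothetical choice of columns to clean (integer labels 2 and 3).
text_cols = [2, 3]

# Each column arrives as a Series, so the .str accessor is available;
# regex=False treats '\n' as a literal string rather than a pattern.
df[text_cols] = df[text_cols].apply(
    lambda s: s.str.replace('\n', ' ', regex=False).str.strip()
)
print(df)
```

Note that .str methods return NaN for non-string cells, so this suits columns that hold only strings; the question's type(col) == str guard is the safer choice for mixed columns.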