Replacing newlines with spaces for str columns through pandas dataframe
Original question: http://stackoverflow.com/questions/46522652/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not this page) on Stack Overflow.
Asked by alvas
Given an example dataframe whose columns 2 and 3 hold free text, e.g.
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n
The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:
>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
How can newlines be replaced with spaces for specific columns of a pandas dataframe?
I have tried this:
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()
>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]
>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
But there must be a better/simpler way.
Answered by jezrael
Answered by Wiktor Stribiżew
You may use the following approach, which applies two regex replacements:
>>> df = pd.DataFrame(lol)
>>> df.replace({r'\A\s+|\s+\Z': '', '\n': ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
Details
'\A\s+|\s+\Z' -> '' will act like strip(), removing all leading and trailing whitespace:
  \A\s+ - matches 1 or more whitespace symbols at the start of the string
  | - or
  \s+\Z - matches 1 or more whitespace symbols at the end of the string
'\n' -> ' ' will replace any newline with a space.
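To see what the two patterns do in isolation, here is a minimal sketch that applies them to plain strings with Python's re module instead of pandas. The helper name clean and the extra padded sample are made up for illustration; the check only claims agreement with strip() for these particular inputs.

```python
import re

# Same pattern as the first key of the df.replace dictionary.
STRIP_PATTERN = re.compile(r'\A\s+|\s+\Z')

def clean(text):
    """Remove leading/trailing whitespace, then turn remaining newlines into spaces."""
    stripped = STRIP_PATTERN.sub('', text)
    return stripped.replace('\n', ' ')

samples = ['foo\nbar', 'def\nhaha', 'love it\n', '  \n padded \n']
for s in samples:
    # For these inputs the regex pair agrees with str.strip() plus str.replace().
    assert clean(s) == s.strip().replace('\n', ' ')
    print(repr(s), '->', repr(clean(s)))
```

The order of the two substitutions does not matter for these samples, because a newline at either end is removed by the strip pattern and an inner newline is only touched by the second replacement.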
Answered by zipa
You can use select_dtypes to select the columns of type object and apply applymap to those columns.
Because these functions have no inplace argument, a workaround is to assign the result back to the dataframe:
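lol = pd.DataFrame(lol)  # this answer treats lol as the dataframe, not the nested list
# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map
strs = lol.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
lol[strs.columns] = strs
lol
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it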
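A possible variation on the element-wise idea, sketched under a few assumptions: the helper clean_cell is hypothetical, text columns are picked with pd.api.types.is_string_dtype rather than select_dtypes (so the sketch does not depend on how a given pandas version names its string dtype), and DataFrame.map is used when it exists, since applymap is deprecated from pandas 2.1 onward.

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

def clean_cell(value):
    # Hypothetical helper: only touch str cells, pass everything else through.
    if isinstance(value, str):
        return value.replace('\n', ' ').strip()
    return value

# Assumption: a column counts as text if pandas reports a string-like dtype for it.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]
text = df[text_cols]

# DataFrame.map was added in pandas 2.1; fall back to applymap on older versions.
elementwise = text.map if hasattr(text, 'map') else text.applymap
df[text_cols] = elementwise(clean_cell)
print(df)
```

The isinstance guard means a stray None or NaN inside a text column is left alone instead of raising an AttributeError, which the bare lambda would do.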
Answered by Mohamed Ali JAMAOUI
Adding to the other nice answers, this is a vectorized version of your initial idea:
columns = [2, 3]
# Assign column by column: a list of Series assigned to df.iloc[:, columns]
# is filled in row by row, which transposes the values.
for col in columns:
    df[col] = df[col].str.strip().str.replace('\n', ' ')
Details:
In [49]: for col in columns:
    ...:     df[col] = df[col].str.strip().str.replace('\n', ' ')
In [50]: df
Out[50]:
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
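One caveat with the .str accessor is that cells which are not strings come back as NaN. The following is a minimal sketch of one way to keep such cells unchanged; the mixed Series is made-up data, and using Series.where for the merge is just one option, not something taken from the answers.

```python
import pandas as pd

# Made-up column mixing an integer with strings.
s = pd.Series([1, 'def\nhaha', 'love it\n'])

cleaned = s.str.strip().str.replace('\n', ' ', regex=False)
# .str methods yield NaN for non-string cells; keep the original value there.
result = s.where(cleaned.isna(), cleaned)
print(result.tolist())
```

Cells that were already missing in the input stay missing, since both the original and the cleaned value are NaN at those positions.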