Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/44227748/

removing newlines from messy strings in pandas dataframe cells?

Tags: python, string, pandas, split

Asked by Calvin

I've tried several ways of splitting and stripping the strings in my pandas DataFrame to remove all the '\n' characters, but the ones attached to other words are never removed, even after I split the strings. The DataFrame has a column of text captured from web pages with BeautifulSoup. BeautifulSoup has already cleaned the text a bit, but it did not remove the newlines attached to other characters. My strings look a bit like this:

"hands-on\ndevelopment of games. We will study a variety of software technologies\nrelevant to games including programming languages, scripting\nlanguages, operating systems, file systems, networks, simulation\nengines, and multi-media design systems. We will also study some of\nthe underlying scientific concepts from computer science and related\nfields including"

Is there an easy python way to remove these "\n" characters?

Thanks in advance!

Answered by jezrael

EDIT: the right answer to this was:

df = df.replace(r'\n',' ', regex=True) 

I think you need replace:

df = df.replace('\n','', regex=True)

Or:

df = df.replace('\n',' ', regex=True)

Or:

df = df.replace(r'\n',' ', regex=True)

Sample:

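import pandas as pd

# Short sample string; the \n escapes become real newline characters
text = '''hands-on\ndev nologies\nrelevant scripting\nlang
'''
df = pd.DataFrame({'A':[text]})
print(df)
                                                   A
0  hands-on\ndev nologies\nrelevant scripting\nla...

# Replace every newline with a space in all string cells
df = df.replace('\n',' ', regex=True)
print(df)
                                                A
0  hands-on dev nologies relevant scripting lang 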
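
One possible reason a replacement seems to do nothing is that the cells hold the two literal characters backslash and n rather than a real newline. The sketch below is only an illustration under that assumption; the sample cells are made up, and whether it applies depends on how the text was scraped.

```python
import pandas as pd

# Hypothetical data: the first cell holds a real newline character, the
# second holds the two literal characters backslash + n (escaped text).
df = pd.DataFrame({
    "A": ["hands-on\ndevelopment", "scripting\\nlanguages"],
})

# The regex \n matches only the real newline character.
real_fixed = df.replace(r"\n", " ", regex=True)

# The regex \\n matches a literal backslash followed by the letter n.
literal_fixed = df.replace(r"\\n", " ", regex=True)

# An alternation covers both kinds of cell in one pass.
both_fixed = df.replace(r"\\n|\n", " ", regex=True)

print(both_fixed)
```

If the plain `\n` pattern leaves a cell unchanged, checking `repr()` of that cell shows which of the two cases it is.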

Answered by Pawel Piela

In messy data it might be a good idea to remove all whitespace: df.replace(r'\s', '', regex=True, inplace=True). Note that \s also matches the ordinary spaces between words, so this joins the words together.
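
To make the effect of the `\s` pattern concrete, here is a small sketch comparing it with a variant that collapses each run of whitespace into a single space. The sample sentence is invented for illustration, and the `\s+` variant is an alternative suggestion, not part of the answer above.

```python
import pandas as pd

# Hypothetical cell containing a newline, a double space and a tab.
df = pd.DataFrame({"A": ["simulation\nengines,  and multi-media\tdesign"]})

# Removing every whitespace character also removes the spaces between words.
no_whitespace = df.replace(r"\s", "", regex=True)

# Collapsing each run of whitespace to one space keeps the words separate.
collapsed = df.replace(r"\s+", " ", regex=True)

print(no_whitespace.loc[0, "A"])
print(collapsed.loc[0, "A"])
```

Which of the two is appropriate depends on whether the words still need to be readable afterwards.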

Answered by Harshini Kanukuntla

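   import pandas as pd

   # 'text' is a placeholder column name; a plain str has no regex= argument, so the data is wrapped in a DataFrame
   df = pd.DataFrame({'text': ['Sarah Marie Wimberly So so beautiful!!!\nAbram Staten You guys look good man.\nTJ Sloan I miss you guys\n']})

   df = df.replace(r'\n',' ', regex=True)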

This worked for the messy data I had.
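
For comparison, the same cleanup can be done on a plain Python string or on a single column rather than the whole DataFrame. This is a rough sketch; the column name `comment` is a hypothetical choice, and the final `.str.strip()` is an optional extra that drops the space left by the trailing newline.

```python
import pandas as pd

text = "Sarah Marie Wimberly So so beautiful!!!\nAbram Staten You guys look good man.\n"

# Plain Python string: str.replace takes no regex argument.
cleaned_text = text.replace("\n", " ")

# Single pandas column ("comment" is a hypothetical column name).
df = pd.DataFrame({"comment": [text]})
df["comment"] = df["comment"].str.replace("\n", " ", regex=False).str.strip()

print(repr(cleaned_text))
print(df)
```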