Python: remove newlines from messy strings in pandas DataFrame cells?

Question

Asked by Calvin

I've tried several ways of splitting and stripping the strings in my pandas DataFrame to remove all the '\n' characters, but for some reason the newlines that are attached to other words simply won't go away, even though I split the strings. My DataFrame has a column that holds text captured from web pages with BeautifulSoup. BeautifulSoup already cleaned the text a bit, but it did not remove the newlines attached to other characters. My strings look a bit like this:

"hands-on\ndevelopment of games. We will study a variety of software technologies\nrelevant to games including programming languages, scripting\nlanguages, operating systems, file systems, networks, simulation\nengines, and multi-media design systems. We will also study some of\nthe underlying scientific concepts from computer science and related\nfields including"

Is there an easy Python way to remove these "\n" characters?

Thanks in advance!
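For context, a short illustration of why stripping and splitting may not have helped. The question does not show the code that was tried, so the split(" ") line below is only an assumed cause. str.strip() trims characters at the two ends of a string only, and split(" ") splits on the space character only, so a '\n' between two words survives both. By contrast, split() with no argument splits on any whitespace run, '\n' included, and rejoining the pieces with single spaces removes the embedded newlines. The string is a shortened copy of the sample above.

```python
# Shortened copy of the question's sample text; '\n' sits between words.
s = "hands-on\ndevelopment of games. We will study a variety of software technologies\nrelevant to games"

# str.strip() only trims the ends of the string, so inner newlines survive.
print(repr(s.strip()))

# split(" ") splits on the space character only, so '\n' stays glued to a word.
print(repr(s.split(" ")[0]))  # 'hands-on\ndevelopment'

# split() with no argument splits on any whitespace run, '\n' included;
# joining with single spaces removes the embedded newlines.
cleaned = " ".join(s.split())
print(cleaned)
```

On a DataFrame column the same idea would be df['text'].str.split().str.join(' '), where 'text' is a hypothetical column name. Note that this also collapses repeated spaces.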

Answer 1

Answered by jezrael

EDIT: the right answer to this was:

df = df.replace(r'\n',' ', regex=True)

I think you need DataFrame.replace:

df = df.replace('\n','', regex=True)

Or:

df = df.replace('\n',' ', regex=True)

Or:

df = df.replace(r'\n',' ', regex=True)

Sample:

import pandas as pd

text = '''hands-on\ndev nologies\nrelevant scripting\nlang
'''
df = pd.DataFrame({'A':[text]})
print (df)
                                                   A
0  hands-on\ndev nologies\nrelevant scripting\nla...

df = df.replace('\n',' ', regex=True)
print (df)
                                                A
0  hands-on dev nologies relevant scripting lang
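The variants above differ only in the pattern and replacement strings. The sketch below uses the same sample frame to check three points. First, with regex=True, '\n' (a real newline character, which as a regex matches itself) and r'\n' (backslash plus n, which the regex engine reads as a newline) give the same result. Second, without regex=True, DataFrame.replace only replaces cells whose whole value equals the pattern, so nothing changes here. Third, replacing with '' instead of ' ' glues neighbouring words together.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hands-on\ndev nologies\nrelevant scripting\nlang\n"]})

# '\n' is an actual newline character; as a regex it matches itself.
a = df.replace("\n", " ", regex=True)
# r'\n' is backslash + 'n'; the regex engine interprets it as a newline.
b = df.replace(r"\n", " ", regex=True)
print(a.equals(b))  # True

# Without regex=True, only cells exactly equal to '\n' would be replaced.
c = df.replace("\n", " ")
print(c.equals(df))  # True: nothing changed

# Replacing with '' instead of ' ' glues neighbouring words together.
d = df.replace(r"\n", "", regex=True)
print(repr(d.loc[0, "A"]))  # 'hands-ondev nologiesrelevant scriptinglang'
```

With ' ' as the replacement, a trailing newline becomes a trailing space. Calling .str.strip() on the column afterwards removes it if that matters.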

Answer 2

Answered by Pawel Piela

In messy data it might be a good idea to remove all whitespace: df.replace(r'\s', '', regex=True, inplace=True).
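Be aware that r'\s' matches ordinary spaces as well as newlines and tabs, so this pattern also deletes the spaces between words. If the goal is readable text, a substitute approach (not this answer's own) is to collapse every whitespace run into a single space with r'\s+'. The sketch below compares the two on the sample frame from Answer 1.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hands-on\ndev nologies\nrelevant scripting\nlang\n"]})

# r'\s' also matches plain spaces, so every word gets glued together.
glued = df.replace(r"\s", "", regex=True)
print(glued.loc[0, "A"])  # hands-ondevnologiesrelevantscriptinglang

# r'\s+' -> ' ' turns each whitespace run (spaces, '\n', '\t', ...) into one space.
collapsed = df.replace(r"\s+", " ", regex=True)
print(repr(collapsed.loc[0, "A"]))  # 'hands-on dev nologies relevant scripting lang '
```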

Answer 3

Answered by Harshini Kanukuntla

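   import pandas as pd

   # the text must live in a DataFrame; 'comments' is a hypothetical column name
   df = pd.DataFrame({'comments': ['Sarah Marie Wimberly So so beautiful!!!\nAbram Staten You guys look good man.\nTJ Sloan I miss you guys\n']})

   df = df.replace(r'\n',' ', regex=True)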

This worked for the messy data I had.
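When the frame also has non-text columns, the substitution can be limited to the scraped-text column with Series.str.replace. The sketch below assumes a hypothetical frame with a 'comments' text column and a 'likes' count column. The pattern [\r\n]+ is an added assumption: it also covers Windows-style \r\n line endings, which scraped pages sometimes contain.

```python
import pandas as pd

# Hypothetical frame: one scraped-text column plus a numeric column.
df = pd.DataFrame({
    "comments": ["Sarah Marie Wimberly So so beautiful!!!\nAbram Staten You guys look good man.\nTJ Sloan I miss you guys\n"],
    "likes": [3],
})

# Only the text column is touched; regex=True makes '[\r\n]+' a pattern,
# and .str.strip() drops the space left by the trailing newline.
df["comments"] = df["comments"].str.replace(r"[\r\n]+", " ", regex=True).str.strip()
print(df.loc[0, "comments"])
```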