pandas: how to remove carriage returns from a DataFrame

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37160929/

Time: 2020-09-14 01:13:16  Source: igfitidea

How to remove carriage return in a dataframe

Tags: python, pandas, replace, carriage-return, data-cleaning

Asked by Saranya Krishnamurthy

I have a DataFrame with columns named id, country_name, location and total_deaths. While cleaning the data, I came across a row in which one value has '\r' attached. Once cleaning is complete, I store the resulting DataFrame in destination.csv. Because that row has \r attached, it always creates a new row in the file.

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.

Is there any other solution? Can somebody help?

Edit:


In the above process, I am iterating over df to see whether \r is present and, if it is, replacing it. Here row.replace() or row.str.strip() doesn't seem to work, or I could be doing it the wrong way.

I don't want to specify the column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.

import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\r", str(row)):
        print(type(row))               # Return type is pandas.Series
        row.replace({r'\r': ''}, regex=True)   # return value is not assigned to anything
        print(row)
        count += 1

Answered by jezrael

Another solution is to use str.strip:

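# str.strip takes a set of characters, so pass the real carriage return '\r', not the raw string r'\r'
df['29'] = df['29'].str.strip('\r')
print(df)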
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
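
To illustrate why the argument to str.strip matters, here is a minimal sketch on made-up sample values (the Series below is an assumption, not the asker's data). str.strip treats its argument as a set of characters to remove from both ends, so '\r' and r'\r' behave very differently:

```python
import pandas as pd

# Hypothetical sample values; only the first one ends with a real carriage return.
s = pd.Series(["Uttar Pradesh\r", "Kashmir", "India"])

# '\r' is a single carriage-return character, so only that character is stripped.
stripped = s.str.strip("\r")

# r'\r' is two characters: a backslash and the letter r. Because str.strip
# works on a set of characters, this strips leading/trailing backslashes and
# "r" letters and leaves the carriage return in place.
raw_stripped = s.str.strip(r"\r")

print(stripped.tolist())
print(raw_stripped.tolist())
```

Calling s.str.strip() with no argument would also remove the carriage return, along with any other leading or trailing whitespace.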

If you want to use replace, pass the pattern as a raw string, r'\r':

print(df.replace({r'\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

In replace you can specify the column in which to replace, like this:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

Edit (in response to a comment):

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69

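If you need to replace only in the location column:

# regex=True is required in recent pandas, where str.replace treats the pattern literally by default
df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)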
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
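
Since the question asks for a way that does not name any column, the same per-column idea can be sketched as a loop over whichever columns hold text. This is only an illustration under assumptions: the sample frame is made up, and using pd.api.types.is_string_dtype to pick the text columns is one possible choice, not something taken from the answer above.

```python
import pandas as pd

# Hypothetical sample data with stray line endings in two different columns.
df = pd.DataFrame({
    "id": [8, 9],
    "country_name": ["India", "India\r"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n"],
    "total_deaths": [1, 20],
})

# Find the text columns by dtype instead of naming them.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# Work on a copy and remove every carriage return / line feed in those columns.
cleaned = df.copy()
for col in text_cols:
    cleaned[col] = cleaned[col].str.replace(r"[\r\n]+", "", regex=True)

print(text_cols)
print(cleaned)
```

Numeric columns are skipped entirely, so their dtypes are left as they were.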

Answered by EdChum

Use str.replace with a raw-string pattern and regex=True, so that the regular-expression engine treats \r as a carriage return rather than as the literal characters \ and r:

In [15]:
df['29'] = df['29'].str.replace(r'\r', '', regex=True)
df

Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
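
The difference between a regex pattern and a literal pattern can be seen in a small sketch on made-up values (the Series is an assumption for illustration). Passing regex explicitly avoids depending on the default, which has differed between pandas versions:

```python
import pandas as pd

# Hypothetical sample values; the first ends with a real carriage return.
s = pd.Series(["Uttar Pradesh\r", "India"])

# regex=True: the regex engine reads the raw pattern \r as a carriage return.
as_regex = s.str.replace(r"\r", "", regex=True)

# regex=False: the pattern is matched literally, so it must contain the
# actual carriage-return character ("\r" without the r prefix).
as_literal = s.str.replace("\r", "", regex=False)

# regex=False with a raw string searches for a backslash followed by "r",
# which does not occur in the data, so nothing changes.
unchanged = s.str.replace(r"\r", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
print(unchanged.tolist())
```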

Answered by Gwen Au

The code below removes \t tabs, \n newlines and \r carriage returns, and is great for condensing data into one row. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a

df.replace(to_replace=[r"\t|\n|\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
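
A runnable sketch of the same idea follows. It is a variation rather than the gist's exact code: it uses a single character class instead of alternation and returns a new frame instead of modifying the original. The sample frame and its column names are made up.

```python
import pandas as pd

# Hypothetical sample data containing tabs, newlines and carriage returns.
df = pd.DataFrame({
    "note": ["line one\r\nline two", "a\tb"],
    "n": [1, 2],
})

# Remove every tab, carriage return and newline from the text values.
cleaned = df.replace(to_replace=r"[\t\r\n]", value="", regex=True)

print(cleaned)
```

Replacing with a single space instead of an empty string may be preferable when the control characters separate words.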

Answered by user13078533

Just assign the result of df.replace back to df, and then print df.

df=df.replace({'\r': ''}, regex=True) 
print(df)
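
The reason the assignment matters is that DataFrame.replace returns a new object by default and leaves the original untouched. A minimal sketch on made-up data (the frame below is an assumption, not the asker's file):

```python
import pandas as pd

# Hypothetical sample data; the first location ends with a carriage return.
df = pd.DataFrame({
    "location": ["Uttar Pradesh\r", "Orissa"],
    "total_deaths": [20, 69],
})

# Calling replace without keeping its return value leaves df unchanged.
df.replace({"\r": ""}, regex=True)
still_dirty = df["location"].tolist()

# Assigning the result back keeps the cleaned copy.
df = df.replace({"\r": ""}, regex=True)
now_clean = df["location"].tolist()

print(still_dirty)
print(now_clean)
```

The same applies to Series.replace, which is why calling row.replace(...) inside a loop without using its result has no visible effect.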