How to remove carriage return in a dataframe
Original question: http://stackoverflow.com/questions/37160929/
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow.
Question by Saranya Krishnamurthy
I have a dataframe with columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in one row that has '\r' attached. Once cleaning is complete, I store the resulting dataframe in a destination.csv file. Because that particular row has \r attached, it always creates a new row.
id 29
location Uttar Pradesh\r
country_name India
total_deaths 20
I want to remove \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.
Is there any other solution? Can somebody help?
Edit:
In the above process, I am iterating over df to see whether \r is present. If it is, it needs to be replaced. Here row.replace() or row.str.strip() doesn't seem to work, or I could be doing it the wrong way.
I don't want to specify the column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.
import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\r", str(row)):
        print(type(row))  # Return type is pandas.Series
        row.replace({r'\r': ''}, regex=True)
        print(row)
        count += 1
Answer by jezrael
Another solution is to use str.strip:
df['29'] = df['29'].str.strip(r'\r')
print(df)
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
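One point worth keeping in mind with str.strip is that its argument is a set of characters to trim from both ends of each value, not a substring to remove everywhere. A minimal sketch, assuming the cells hold a real carriage-return character (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: a real carriage-return character at the end of
# one value and in the middle of another.
s = pd.Series(["Uttar Pradesh\r", "Tamil\rNadu", "India"])

# str.strip trims the listed characters from both ends of each value only.
stripped = s.str.strip("\r\n")

print(stripped.tolist())
```

The trailing carriage return is trimmed, while the one in the middle of the second value stays, which is where replace becomes the better fit.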
If you want to use replace, add r and one \:
print(df.replace({r'\r': ''}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
In replace you can specify the column in which to replace, like:
print(df)
id 29
0 location Uttar Pradesh\r
1 country_name India
2 total_deaths\r 20
print(df.replace({'29': {r'\r': ''}}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths\r 20
print(df.replace({r'\r': ''}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
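The difference between the column-scoped and frame-wide forms can be reproduced with a small sketch. It assumes the values contain a real carriage-return character; the frame below is hypothetical sample data modeled on the output above.

```python
import pandas as pd

# Hypothetical sample data with a real carriage return in both columns.
df = pd.DataFrame({
    "id": ["location", "country_name", "total_deaths\r"],
    "29": ["Uttar Pradesh\r", "India", "20"],
})

# Nested dict: the pattern is applied only to column '29'.
only_29 = df.replace({"29": {r"\r": ""}}, regex=True)

# Flat dict: the pattern is applied to every column.
everywhere = df.replace({r"\r": ""}, regex=True)

print(only_29["id"].tolist())
print(everywhere["id"].tolist())
```

With the nested dict, the carriage return in the id column is left alone; the flat dict cleans both columns.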
EDIT by comment:
import pandas as pd
df = pd.read_csv('data_source_test.csv')
print(df)
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh\r\n 20
9 10 India Orissa 69
print(df.replace({r'\r\n': ''}, regex=True))
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh 20
9 10 India Orissa 69
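If you need to replace only in the location column:
# regex=True keeps the pattern a regular expression on pandas 2.0+, where str.replace defaults to regex=False
df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)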
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh 20
9 10 India Orissa 69
Answer by EdChum
Use str.replace; you need to escape the sequence so that it is treated as a carriage return rather than the literal \r:
In [15]:
df['29'] = df['29'].str.replace(r'\r', '', regex=True)
df
Out[15]:
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
Answer by Gwen Au
The code below removes \t tabs, \n newlines and \r carriage returns, and is handy for condensing data into one row. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a
df.replace(to_replace=[r"\t|\n|\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
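As a variant on the same idea, a single character class can stand in for the two alternation patterns. This is only a sketch under assumptions: the sample frame is hypothetical, and the result is assigned to a new name rather than using inplace, which is a choice made here and not what the gist does.

```python
import pandas as pd

# Hypothetical sample data with real tab, newline and carriage-return characters.
df = pd.DataFrame({
    "location": ["Uttar\tPradesh\r\n", "New\nDelhi"],
    "total_deaths": [20, 354],
})

# One character class covers tab, newline and carriage return.
cleaned = df.replace(r"[\t\n\r]", "", regex=True)

print(cleaned["location"].tolist())
print(cleaned["total_deaths"].tolist())
```

Because the matched characters are replaced with an empty string, words that were separated only by a tab or newline end up joined; replacing with a space instead may suit some data better.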
Answer by user13078533
Just assign the result of the df.replace call back to df, and then print df.
df=df.replace({'\r': ''}, regex=True)
print(df)
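The reason assignment matters is that df.replace returns a new DataFrame by default and leaves the original alone. A small sketch, assuming a real carriage-return character in the data (the frame is hypothetical sample data):

```python
import pandas as pd

# Hypothetical sample data with a real carriage return in one value.
df = pd.DataFrame({"location": ["Uttar Pradesh\r", "Orissa"]})

# Calling replace without keeping the result leaves df untouched.
df.replace({"\r": ""}, regex=True)
unchanged = df["location"].tolist()

# Assigning the result back keeps the cleaned values.
df = df.replace({"\r": ""}, regex=True)
cleaned = df["location"].tolist()

print(unchanged)
print(cleaned)
```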