This page is a Chinese–English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/43730422/

Time: 2020-09-14 03:31:20  Source: igfitidea

How to split one column into multiple columns in Pandas using regular expression?

Tags: python, pandas

Question by designil

For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named 'address'. I would like to split it into columns 'street', 'city', 'state', respectively.

What is the best way to achieve this using Pandas?

I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex").

But I got the error "Must have equal len keys and value when setting with an iterable".

Thank you for your help :)
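A minimal sketch of the regex route the question asks about, assuming every address has the form "street, city, state" with a two-letter state code (the pattern below is an illustrative assumption, not the asker's myregex). It also suggests why the attempt fails: str.findall returns one list per row, i.e. a single column of lists, so there is nothing to unpack into three columns. Series.str.extract with named groups instead returns a DataFrame whose columns take the group names:

```python
import pandas as pd

df = pd.DataFrame({
    "address": ["71 Pilgrim Avenue, Chevy Chase, MD",
                "72 Main St, Chevy Chase, MD"],
    "id": ["a", "b"],
})

# str.findall returns ONE list per row (a single column of lists),
# which is why it cannot be assigned to three columns at once.
found = df["address"].str.findall(r"[^,]+")
print(found)

# Assumed format: "<street>, <city>, <state>" with a two-letter state code.
# Each named group becomes a column of the DataFrame returned by str.extract.
pattern = r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})$"
df = df.join(df["address"].str.extract(pattern))
print(df)
```

Rows that do not match the pattern get NaN in all three new columns, which makes malformed addresses easy to spot afterwards.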

Answer by jezrael

You can use split with the regex ,\s+ (a comma followed by one or more whitespace characters):

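import pandas as pd

# sample data borrowed from Allen's answer
df = pd.DataFrame({'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                               '72 Main St, Chevy Chase, MD'],
                   'id': ['a', 'b']})
# a pattern longer than one character is treated as a regex (raw string avoids escape warnings);
# expand=True returns a DataFrame with one column per piece instead of a Series of lists
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)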
                              address id             street          city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue   Chevy Chase   
1         72 Main St, Chevy Chase, MD  b         72 Main St   Chevy Chase   

  state  
0    MD  
1    MD  

And if you need to remove the column address, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
# axis=1 drops a column, not a row
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD

Answer by Allen

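import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply use a split function.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(),
                           columns=['street', 'city', 'state']))
# Strip the spaces left after each comma (every column here holds strings;
# apply with str.strip avoids applymap, which is deprecated in newer pandas).
df2 = df2.apply(lambda col: col.str.strip())
print(df2)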
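The split above relies on every address having exactly two commas. If some addresses carry extra comma-separated parts (the apartment number below is a made-up example), one option is sketched here, assuming city and state never contain commas themselves: split from the right with str.rsplit and n=2, so only the last two commas are used and anything before them stays in the street column.

```python
import pandas as pd

# Hypothetical sample: the first address has an extra comma-separated part,
# so a plain split(',') would produce four pieces instead of three.
df = pd.DataFrame({
    "address": ["Apt 5, 71 Pilgrim Avenue, Chevy Chase, MD",
                "72 Main St, Chevy Chase, MD"],
    "id": ["a", "b"],
})

# rsplit splits from the right; n=2 allows at most two splits, so the last
# two fields become city and state and everything before them is the street.
parts = df["address"].str.rsplit(",", n=2, expand=True)
parts.columns = ["street", "city", "state"]

# Remove the whitespace left after each comma.
parts = parts.apply(lambda col: col.str.strip())

df2 = df.join(parts)
print(df2)
```

Note that str.rsplit splits on a literal string rather than a regex, which is why the whitespace after each comma is stripped in a separate step.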