How to split one column into multiple columns in Pandas using regular expression?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/43730422/
Asked by designil
For example, if I have a home address like this:
71 Pilgrim Avenue, Chevy Chase, MD
in a column named 'address', I would like to split it into the columns 'street', 'city', and 'state', respectively.
What is the best way to achieve this using Pandas?
I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex").
But the error I got is: Must have equal len keys and value when setting with an iterable.
Thank you for your help :)
Answered by jezrael
You can use split with the regex ,\s+ (a comma followed by one or more whitespace characters):
# borrowing the sample DataFrame from `Allen`
# raw string so that \s reaches the regex engine unchanged
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
address id street city \
0 71 Pilgrim Avenue, Chevy Chase, MD a 71 Pilgrim Avenue Chevy Chase
1 72 Main St, Chevy Chase, MD b 72 Main St Chevy Chase
state
0 MD
1 MD
If you need to remove the address column, add drop:
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
# drop the original column once it has been split
df = df.drop('address', axis=1)
print(df)
id street city state
0 a 71 Pilgrim Avenue Chevy Chase MD
1 b 72 Main St Chevy Chase MD
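Since the question asks about regular expressions specifically, here is a minimal sketch of a capture-group variant using str.extract. It is not part of the answer above; the pattern and the sample rows are assumptions for illustration, and it assumes every address has exactly three comma-separated parts.

```python
import pandas as pd

# Sample data mirroring the example above (assumed for illustration).
df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD'],
    'id': ['a', 'b'],
})

# Each named group becomes a column; rows that do not match yield NaN.
pattern = r'^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[^,]+)$'
parts = df['address'].str.extract(pattern)

# Attach the extracted columns to the original frame by index.
df = df.join(parts)
print(df[['street', 'city', 'state']])
```

Compared with split, this keeps the column names next to the pattern and leaves non-matching rows as NaN instead of producing a different number of columns.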
Answered by Allen
import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# if your address format is consistent, you can simply use a split function.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(),
                           columns=['street', 'city', 'state']))
# strip the whitespace left after each comma (all columns here hold strings)
df2 = df2.apply(lambda col: col.str.strip())
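The "consistent format" condition can be checked before splitting. The sketch below counts the commas in each row and only splits rows that have exactly two. The third sample row is a made-up malformed address, and the name well_formed is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD',
                'no commas here'],   # made-up malformed row
    'id': ['a', 'b', 'c'],
})

# A row can be split into three parts only if it has exactly two commas.
well_formed = df['address'].str.count(',') == 2

# Split only the well-formed rows; \s* also absorbs the space after each comma.
parts = df.loc[well_formed, 'address'].str.split(r',\s*', expand=True)
parts.columns = ['street', 'city', 'state']

# join aligns on the index, so malformed rows get NaN in the new columns.
df2 = df.join(parts)
print(df2)
```

Rows that fail the check stay in the result with missing values, so they can be inspected or fixed separately.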

