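How to split one column into multiple columns in Pandas using regular expression?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/43730422/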
Asked by designil
For example, if I have a home address like this:
71 Pilgrim Avenue, Chevy Chase, MD
in a column named 'address'. I would like to split it into the columns 'street', 'city', and 'state', respectively.
What is the best way to achieve this using Pandas?
I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex").
But the error I got is: Must have equal len keys and value when setting with an iterable.
Thank you for your help :)
Answered by jezrael
You can use split with the regex ,\s+ (a comma followed by one or more whitespace characters):
# borrowing the sample DataFrame from Allen's answer
# raw string so that \s is passed to the regex engine unchanged
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street         city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue  Chevy Chase
1         72 Main St, Chevy Chase, MD  b         72 Main St  Chevy Chase

  state
0    MD
1    MD
If you also need to remove the address column, add drop:
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
# drop the original column once it has been split
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
Answered by Allen
import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply use a split function.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(), columns=['street', 'city', 'state']))
# Strip the whitespace left around each piece (all columns here hold strings).
df2 = df2.apply(lambda col: col.str.strip())
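Since the question asks specifically about regular expressions, and the split-based approaches above depend on a consistent address format, another option may be Series.str.extract with named capture groups. The following is only a minimal sketch and not part of either answer: the pattern, the assumption that the state is a two-letter uppercase code, and the third malformed sample row are all illustrative assumptions.

```python
import pandas as pd

# Sample data from the answers above, plus one hypothetical malformed row.
df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD',
                'no commas here'],
    'id': ['a', 'b', 'c'],
})

# Assumed pattern: street, city, and a two-letter state separated by commas.
# Each named group becomes a column in the extracted DataFrame.
pattern = r'^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})$'

# Rows that do not match the pattern get NaN in every extracted column.
parts = df['address'].str.extract(pattern)
df = df.join(parts)
print(df)
```

Unlike findall, which returns a list of matches for each row, str.extract returns a DataFrame with one column per capture group, so it lines up with the three target columns. Rows that do not fit the pattern are left as NaN instead of raising an error.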