How to split one column into multiple columns in Pandas using a regular expression?

Question

Asked by designil

For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named 'address'. I would like to split it into columns 'street', 'city', 'state', respectively.

What is the best way to achieve this using Pandas?

I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex").

But I got this error: Must have equal len keys and value when setting with an iterable.

Thank you for your help :)

Answer 1

Answered by jezrael

You can use str.split with the regex ,\s+ (a comma followed by one or more whitespace characters):

# borrowing the sample DataFrame from Allen's answer
# raw string so that \s is passed to the regex engine unchanged
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street          city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue   Chevy Chase   
1         72 Main St, Chevy Chase, MD  b         72 Main St   Chevy Chase   

  state  
0    MD  
1    MD
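
To see what the pattern itself does, here is a minimal sketch that applies the same regex to single strings with Python's standard re module. The second address, with no space after the first comma, is a made-up input used only to illustrate the difference between \s+ and \s*.

```python
import re

# Sample address taken from the question.
address = "71 Pilgrim Avenue, Chevy Chase, MD"

# Split on a comma followed by one or more whitespace characters.
parts = re.split(r",\s+", address)
print(parts)

# Hypothetical input with no space after the first comma:
# ",\s+" does not match there, so that comma stays inside the first piece.
strict = re.split(r",\s+", "72 Main St,Chevy Chase, MD")

# ",\s*" (zero or more whitespace characters) splits on every comma.
lenient = re.split(r",\s*", "72 Main St,Chevy Chase, MD")
print(strict, lenient)
```

If the spacing after commas may vary in your data, ,\s* is probably the safer pattern to pass to str.split.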

If you also need to remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
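
As an alternative that is closer to the regex attempt in the question, Series.str.extract may be a better fit than findall. str.findall returns a single Series whose elements are lists of matches, which is likely why assigning it to three columns failed, whereas str.extract returns a DataFrame with one column per capture group. The sketch below is not part of the answer above, and the pattern is an assumption about the address format (three comma-separated parts).

```python
import pandas as pd

# Sample data from the question and from Allen's answer.
df = pd.DataFrame({
    "address": ["71 Pilgrim Avenue, Chevy Chase, MD", "72 Main St, Chevy Chase, MD"],
    "id": ["a", "b"],
})

# Assumed format: street, city, state separated by commas.
# Named groups become the column names of the result.
pattern = r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[^,]+)$"

# One column per capture group; rows that do not match are filled with NaN.
parts = df["address"].str.extract(pattern)

result = df.join(parts)
print(result)
```

Compared with splitting, this makes the expected format explicit: a row that does not have exactly three parts shows up as missing values instead of shifting values into the wrong columns.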

Answer 2

Answered by Allen

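import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply use a split function.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(), columns=['street', 'city', 'state']))
# Strip the whitespace left around each piece (every column here holds strings).
# apply + str.strip is used because DataFrame.applymap is deprecated since pandas 2.1.
df2 = df2.apply(lambda col: col.str.strip())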
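
The approach above relies on every address having the same format. One way to check that before splitting is to count the commas in each row, as in the rough sketch below. The third address is a hypothetical row added only to show an inconsistent case, and the rule "exactly two commas" is an assumption about what a valid address looks like.

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "71 Pilgrim Avenue, Chevy Chase, MD",
        "72 Main St, Chevy Chase, MD",
        "Unit 5, 9 Elm Road, Springfield, IL",  # hypothetical row with an extra comma
    ],
})

# A three-part address (street, city, state) should contain exactly two commas.
comma_counts = df["address"].str.count(",")
consistent = comma_counts.eq(2)

# Rows that need manual attention before splitting.
print(df[~consistent])

# Split only the rows that match the expected format.
parts = df.loc[consistent, "address"].str.split(r",\s*", expand=True)
parts.columns = ["street", "city", "state"]
print(parts)
```

Rows that fail the check can then be cleaned by hand or handled with a more specific pattern.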
