Python Pandas DataFrame: split a column into multiple columns, right-aligning inconsistent cell entries

Question

Asked by jamesbev

I have a pandas DataFrame with a column named 'City, State, Country'. I want to separate this column into three new columns: 'City', 'State' and 'Country'.

0                 HUN
1                 ESP
2                 GBR
3                 ESP
4                 FRA
5             ID, USA
6             GA, USA
7    Hoboken, NJ, USA
8             NJ, USA
9                 AUS

Splitting the column into three columns is trivial enough:

location_df = df['City, State, Country'].apply(lambda x: pd.Series(x.split(',')))

However, this creates left-aligned data:

     0        1       2
0    HUN      NaN     NaN
1    ESP      NaN     NaN
2    GBR      NaN     NaN
3    ESP      NaN     NaN
4    FRA      NaN     NaN
5    ID       USA     NaN
6    GA       USA     NaN
7    Hoboken  NJ      USA
8    NJ       USA     NaN
9    AUS      NaN     NaN

How would one go about creating the new columns with the data right-aligned? Would I need to iterate through every row, count the number of commas and handle the contents individually?

Answer 1

Accepted answer by Karl D.

I'd do something like the following:

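import pandas as pd

# Split on commas, strip stray spaces, and reverse so the country always lands in column 0
foo = lambda x: pd.Series([i.strip() for i in reversed(x.split(','))])
rev = df['City, State, Country'].apply(foo)
print(rev)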

      0    1        2
0   HUN  NaN      NaN
1   ESP  NaN      NaN
2   GBR  NaN      NaN
3   ESP  NaN      NaN
4   FRA  NaN      NaN
5   USA   ID      NaN
6   USA   GA      NaN
7   USA   NJ  Hoboken
8   USA   NJ      NaN
9   AUS  NaN      NaN

I think that gets you what you want, but if you also want to tidy things up and get a City, State, Country column order, you could add the following:

# Label the reversed columns, then reorder them as City, State, Country
rev = rev.rename(columns={0: 'Country', 1: 'State', 2: 'City'})
rev = rev[['City', 'State', 'Country']]
print(rev)

      City State Country
0      NaN   NaN     HUN
1      NaN   NaN     ESP
2      NaN   NaN     GBR
3      NaN   NaN     ESP
4      NaN   NaN     FRA
5      NaN    ID     USA
6      NaN    GA     USA
7  Hoboken    NJ     USA
8      NaN    NJ     USA
9      NaN   NaN     AUS
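
As an alternative to reversing and renaming, the same right-aligned result can be sketched by left-padding each split entry. This is only an illustration, not the answer's method: `right_align` is a hypothetical helper, the sample data is copied from the question, and it assumes no entry has more than three comma-separated parts.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'City, State, Country': [
    'HUN', 'ESP', 'GBR', 'ESP', 'FRA',
    'ID, USA', 'GA, USA', 'Hoboken, NJ, USA', 'NJ, USA', 'AUS',
]})

def right_align(entry, width=3):
    """Hypothetical helper: split on commas and left-pad with None up to `width` parts."""
    parts = [p.strip() for p in entry.split(',')]
    return [None] * (width - len(parts)) + parts

# Build the new frame directly from the padded lists, keeping the original index.
location_df = pd.DataFrame(
    df['City, State, Country'].map(right_align).tolist(),
    columns=['City', 'State', 'Country'],
    index=df.index,
)
print(location_df)
```

Because every padded list has exactly three items, the columns can be named in their final order up front and no reordering step is needed.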

Answer 2

Answered by Naufal

Since you are dealing with strings, I would suggest the following amendment to your current code:

location_df = df['City, State, Country'].apply(lambda x: pd.Series(str(x).split(',')))

This worked for me when I tested it on a single column, so give it a try.
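
One thing to keep in mind with the `str(x)` cast: a missing value becomes the literal text 'nan'. If that is not wanted, a possible alternative is the `.str` accessor, which passes missing values through. The sketch below uses a made-up Series `s` and assumes the delimiter is always a comma followed by one space.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value alongside string entries.
s = pd.Series(['NJ, USA', np.nan, 'AUS'])

# Calling x.split(',') directly on the float NaN would raise AttributeError;
# the .str accessor leaves the missing value as NaN instead.
parts = s.str.split(', ')

# The last element of each list is the country.
country = parts.str[-1]
print(country)
```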

Answer 3

Answered by Dolittle Wang

Assume the column is named target:

df[["City", "State", "Country"]] = df["target"].str.split(pat=",", expand=True)
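
Note that `expand=True` fills the new columns from the left, so shorter entries such as 'ID, USA' would put the state under "City". A rough sketch of a right-aligned variant is to index the split lists from the end. The three-row frame here is made up for illustration, and it assumes the delimiter is always a comma followed by one space.

```python
import pandas as pd

# Small made-up sample with one, two, and three parts.
df = pd.DataFrame({'target': ['HUN', 'ID, USA', 'Hoboken, NJ, USA']})

# Each cell becomes a list of parts, e.g. ['ID', 'USA'].
parts = df['target'].str.split(', ')

# Index from the end: -1 is the country, -2 the state, -3 the city.
# Positions that do not exist come back as NaN rather than raising.
df['Country'] = parts.str[-1]
df['State'] = parts.str[-2]
df['City'] = parts.str[-3]
print(df[['City', 'State', 'Country']])
```

Negative indexing keeps the country in the last column no matter how many parts an entry has, so no reversing or renaming is required.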
