Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/23317342/

Time: 2020-08-19 02:44:08  Source: igfitidea

Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries

Tags: python, split, pandas

Asked by jamesbev

I have a pandas DataFrame with a column named 'City, State, Country'. I want to separate this column into three new columns: 'City', 'State' and 'Country'.

0                 HUN
1                 ESP
2                 GBR
3                 ESP
4                 FRA
5             ID, USA
6             GA, USA
7    Hoboken, NJ, USA
8             NJ, USA
9                 AUS

Splitting the column into three columns is trivial enough:

location_df = df['City, State, Country'].apply(lambda x: pd.Series(x.split(',')))

However, this creates left-aligned data:

     0        1       2
0    HUN      NaN     NaN
1    ESP      NaN     NaN
2    GBR      NaN     NaN
3    ESP      NaN     NaN
4    FRA      NaN     NaN
5    ID       USA     NaN
6    GA       USA     NaN
7    Hoboken  NJ      USA
8    NJ       USA     NaN
9    AUS      NaN     NaN

How would one go about creating the new columns with the data right-aligned? Would I need to iterate through every row, count the number of commas and handle the contents individually?

Accepted answer by Karl D.

I'd do something like the following:

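import pandas as pd

# Reverse the comma-separated parts so the country always lands in column 0;
# strip() drops the space left after each comma
foo = lambda x: pd.Series([i.strip() for i in reversed(x.split(','))])
rev = df['City, State, Country'].apply(foo)
print(rev)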

      0    1        2
0   HUN  NaN      NaN
1   ESP  NaN      NaN
2   GBR  NaN      NaN
3   ESP  NaN      NaN
4   FRA  NaN      NaN
5   USA   ID      NaN
6   USA   GA      NaN
7   USA   NJ  Hoboken
8   USA   NJ      NaN
9   AUS  NaN      NaN

I think that gets you what you want, but if you also want to tidy things up and get a City, State, Country column order, you could add the following:

# Name the reversed columns, then reorder them
rev = rev.rename(columns={0: 'Country', 1: 'State', 2: 'City'})
rev = rev[['City', 'State', 'Country']]
print(rev)

      City State Country
0      NaN   NaN     HUN
1      NaN   NaN     ESP
2      NaN   NaN     GBR
3      NaN   NaN     ESP
4      NaN   NaN     FRA
5      NaN    ID     USA
6      NaN    GA     USA
7  Hoboken    NJ     USA
8      NaN    NJ     USA
9      NaN   NaN     AUS
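
The snippets above rely on a df that is never defined on this page. As a minimal, self-contained sketch of the same reverse-then-rename approach, the following rebuilds the sample data from the question and runs under Python 3. The helper name reversed_parts is made up for illustration, and the strip() call is an assumption that the space after each comma is unwanted.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'City, State, Country': [
    'HUN', 'ESP', 'GBR', 'ESP', 'FRA',
    'ID, USA', 'GA, USA', 'Hoboken, NJ, USA', 'NJ, USA', 'AUS',
]})

def reversed_parts(value):
    # Hypothetical helper: split on commas, strip spaces,
    # and reverse so the country always comes first
    return pd.Series([part.strip() for part in reversed(value.split(','))])

rev = df['City, State, Country'].apply(reversed_parts)

# Columns 0, 1, 2 now hold country, state, city; name and reorder them
rev = rev.rename(columns={0: 'Country', 1: 'State', 2: 'City'})
rev = rev[['City', 'State', 'Country']]
print(rev)
```

Rows with fewer than three parts end up with NaN in the leftmost columns, which is the right-aligned layout the question asks for.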

Answer by Naufal

Since you are dealing with strings, I would suggest the following amendment to your current code:

# Select the column as a Series so str(x) converts each cell, not the whole column
location_df = df['City, State, Country'].apply(lambda x: pd.Series(str(x).split(',')))

I got mine to work by testing it on one of the columns, but give this one a try.

Answer by Dolittle Wang

Assume the column is named target:

df[["City", "State", "Country"]] = df["target"].str.split(pat=",", expand=True)
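
Note that str.split with expand=True pads short rows on the right, so on its own it produces the left-aligned layout from the question. One possible way to right-align instead, sketched here under the assumption that no entry has more than three comma-separated parts, is to left-pad each list of parts with None before building the new columns. The three-row sample, the column name target, and the variable names are illustrative only.

```python
import pandas as pd

# Small illustrative sample; 'target' is the assumed column name
df = pd.DataFrame({'target': ['HUN', 'ID, USA', 'Hoboken, NJ, USA']})

# Split each entry into a list of parts and strip surrounding spaces
parts = df['target'].str.split(',').apply(lambda p: [s.strip() for s in p])

# Left-pad each list with None so the last element is always the country
width = 3  # assumption: at most three parts per entry
padded = [[None] * (width - len(p)) + p for p in parts]

aligned = pd.DataFrame(padded, index=df.index,
                       columns=['City', 'State', 'Country'])
result = df.join(aligned)
print(result)
```

Building the padded lists in plain Python avoids creating one pd.Series per row, which may matter on larger frames.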