From tuples to multiple columns in pandas

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/25559202/

Time: 2020-09-13 22:24:25  Source: igfitidea

python pandas tuples

Asked by ba_ul

How do I convert this dataframe

                                          location  value                       
0                   (Richmond, Virginia, nan, USA)    100                       
1              (New York City, New York, nan, USA)    200                       

to this:

    city            state       region    country   value
0   Richmond        Virginia    nan       USA       100
1   New York City   New York    nan       USA       200

Note that the location column in the first dataframe contains tuples. I want to create four columns out of the location column.
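
To try the answers on real data, the input frame can be rebuilt along these lines. This is only a sketch: it assumes the `nan` entries are float NaN (`np.nan`) and that `location` holds plain Python tuples, as the question states.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
# Assumption: the "nan" entries are float NaN rather than the string "nan".
df = pd.DataFrame({
    "location": [
        ("Richmond", "Virginia", np.nan, "USA"),
        ("New York City", "New York", np.nan, "USA"),
    ],
    "value": [100, 200],
})

print(df)
```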

Accepted answer by exp1orer

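# Column names for the four tuple positions, in order
new_col_list = ['city', 'state', 'region', 'country']
for n, col in enumerate(new_col_list):
    # Take the n-th element of every location tuple
    df[col] = df['location'].apply(lambda location: location[n])

# Drop the original tuple column
df = df.drop('location', axis=1)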
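
A self-contained version of the same loop might look like the sketch below. The helper name `split_location` is made up for illustration, and the sample frame assumes the `nan` entries are float NaN.

```python
import numpy as np
import pandas as pd

def split_location(df, names):
    """Return a copy of df with the 'location' tuples spread over the columns in names."""
    out = df.copy()
    for n, col in enumerate(names):
        # n=n binds the current position so each lambda picks its own element
        out[col] = out["location"].apply(lambda location, n=n: location[n])
    return out.drop("location", axis=1)

# Sample data from the question (np.nan is an assumption for "nan")
df = pd.DataFrame({
    "location": [
        ("Richmond", "Virginia", np.nan, "USA"),
        ("New York City", "New York", np.nan, "USA"),
    ],
    "value": [100, 200],
})

result = split_location(df, ["city", "state", "region", "country"])
print(result)
```

Working on a copy leaves the caller's frame untouched, which is a choice made here and not something the answer requires.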

Answer by meloncholy

If you return a Series of the (split) location, you can merge the resulting DF directly with your value column (use join to merge on the index).

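from pandas import Series  # Series is used unqualified below
addr = ['city', 'state', 'region', 'country']
# Each tuple becomes a labelled Series (one row); join aligns on the index
df[['value']].join(df.location.apply(lambda loc: Series(loc, index=addr)))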

   value           city     state  region country
0    100       Richmond  Virginia     NaN     USA
1    200  New York City  New York     NaN     USA
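
To see why joining on the index works here, the sketch below runs the same idea on a frame with a non-default index. The labels `"a"` and `"b"` are made up for illustration, and the `nan` entries are assumed to be float NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with a hypothetical non-default index
df = pd.DataFrame(
    {
        "location": [
            ("Richmond", "Virginia", np.nan, "USA"),
            ("New York City", "New York", np.nan, "USA"),
        ],
        "value": [100, 200],
    },
    index=["a", "b"],
)
addr = ["city", "state", "region", "country"]

# Returning a Series from the lambda makes apply produce a DataFrame
# that keeps the original index of df
parts = df["location"].apply(lambda loc: pd.Series(loc, index=addr))

# join matches rows by index label, so each value lines up with its own location
result = df[["value"]].join(parts)
print(result)
```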

Answer by Martin Alley

I haven't timed this, but I would suggest this option:

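# One map call per tuple position; 'region' matches the column name asked for in the question
df.loc[:, 'city'] = df.location.map(lambda x: x[0])
df.loc[:, 'state'] = df.location.map(lambda x: x[1])
df.loc[:, 'region'] = df.location.map(lambda x: x[2])
df.loc[:, 'country'] = df.location.map(lambda x: x[3])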

I'm guessing that avoiding an explicit for loop might lend itself to SIMD instructions (NumPy certainly looks for that, though other libraries perhaps do not).
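
As far as I understand, `map` with a Python lambda still calls the lambda once per row, so it probably does not benefit from SIMD on its own. A different technique, not taken from the answers above, is to hand the whole list of tuples to the `DataFrame` constructor and skip the per-row lambda entirely. A rough sketch, untimed, assuming every tuple has exactly four elements and the `nan` entries are float NaN:

```python
import numpy as np
import pandas as pd

# Sample data from the question (np.nan is an assumption for "nan")
df = pd.DataFrame({
    "location": [
        ("Richmond", "Virginia", np.nan, "USA"),
        ("New York City", "New York", np.nan, "USA"),
    ],
    "value": [100, 200],
})
addr = ["city", "state", "region", "country"]

# Build all four columns in one constructor call;
# index=df.index keeps the new rows aligned with the original frame
parts = pd.DataFrame(df["location"].tolist(), index=df.index, columns=addr)

# Replace the tuple column with the expanded columns
result = df.drop(columns="location").join(parts)
print(result)
```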
