From tuples to multiple columns in Pandas

Question

Asked by ba_ul

How do I convert this dataframe

                                          location  value                       
0                   (Richmond, Virginia, nan, USA)    100                       
1              (New York City, New York, nan, USA)    200

to this:

    city            state       region    country   value
0   Richmond        Virginia    nan       USA       100
1   New York City   New York    nan       USA       200

Note that the location column in the first dataframe contains tuples. I want to create four columns out of the location column.
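To make the answers in this thread reproducible, here is a minimal sketch that rebuilds the question's input DataFrame. It assumes the missing region is stored as NumPy's nan (which is what the printed nan suggests); the name df matches the answers.

```python
import numpy as np
import pandas as pd

# Sample input from the question: each 'location' cell holds a 4-tuple
# (city, state, region, country); region is assumed to be np.nan here.
df = pd.DataFrame({
    'location': [
        ('Richmond', 'Virginia', np.nan, 'USA'),
        ('New York City', 'New York', np.nan, 'USA'),
    ],
    'value': [100, 200],
})
print(df)
```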

Answer 1

Accepted answer by exp1orer

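import pandas as pd

new_col_list = ['city', 'state', 'region', 'country']
for n, col in enumerate(new_col_list):
    # Take the n-th element of each tuple in 'location'
    df[col] = df['location'].apply(lambda location: location[n])

# The original tuple column is no longer needed
df = df.drop('location', axis=1)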
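A self-contained run of the accepted answer on the question's data, as a sketch. The sample frame is rebuilt inline (assuming the missing region is np.nan), and the final reordering line is an addition: after drop('location', axis=1) alone, value would come first, because the new columns are appended after it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': [
        ('Richmond', 'Virginia', np.nan, 'USA'),
        ('New York City', 'New York', np.nan, 'USA'),
    ],
    'value': [100, 200],
})

new_col_list = ['city', 'state', 'region', 'country']
for n, col in enumerate(new_col_list):
    # apply() runs immediately, so each lambda sees the current value of n
    df[col] = df['location'].apply(lambda location: location[n])

df = df.drop('location', axis=1)
# Reorder so the columns match the layout asked for in the question
df = df[new_col_list + ['value']]
print(df)
```

Because apply() is evaluated inside each loop iteration, the usual late-binding pitfall with lambdas in loops does not bite here.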

Answer 2

Answer by meloncholy

If you return a Series of the (split) location, you can merge the resulting DataFrame directly with your value column (using join, which merges on the index).

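import pandas as pd
addr = ['city', 'state', 'region', 'country']
# Expand each tuple into a row keyed by addr, then join with 'value' on the index
df[['value']].join(df.location.apply(lambda loc: pd.Series(loc, index=addr)))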

   value           city     state  region country
0    100       Richmond  Virginia     NaN     USA
1    200  New York City  New York     NaN     USA
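As the output shows, joining onto df[['value']] puts value first. A minimal sketch of the same idea with the join reversed, so the columns come out in the order the question asks for; the sample data is rebuilt inline (assuming np.nan for the missing region), and the names expanded and result are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': [
        ('Richmond', 'Virginia', np.nan, 'USA'),
        ('New York City', 'New York', np.nan, 'USA'),
    ],
    'value': [100, 200],
})

addr = ['city', 'state', 'region', 'country']
# Each tuple becomes a Series indexed by addr, so apply() returns a DataFrame
expanded = df['location'].apply(lambda loc: pd.Series(loc, index=addr))
# Joining in this direction puts the address columns before 'value'
result = expanded.join(df[['value']])
print(result)
```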

Answer 3

Answer by Martin Alley

I haven't timed this, but I would suggest this option:

df.loc[:, 'city'] = df.location.map(lambda x: x[0])
df.loc[:, 'state'] = df.location.map(lambda x: x[1])
df.loc[:, 'region'] = df.location.map(lambda x: x[2])
df.loc[:, 'country'] = df.location.map(lambda x: x[3])

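I'm guessing that avoiding an explicit for loop might lend itself to SIMD instructions (NumPy certainly looks for that, though perhaps other libraries don't). Note, however, that Series.map with a Python lambda still calls the lambda once per element, so this is not vectorized in the NumPy sense.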
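A commonly used alternative, not taken from any of the answers above, is to build all four columns in one step by passing df['location'].tolist() to the DataFrame constructor. This is a sketch, not a benchmarked claim: tolist() still iterates in Python, so treat any speed-up as something to measure. The sample data is rebuilt inline (assuming np.nan for the missing region), and cols, parts and result are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': [
        ('Richmond', 'Virginia', np.nan, 'USA'),
        ('New York City', 'New York', np.nan, 'USA'),
    ],
    'value': [100, 200],
})

cols = ['city', 'state', 'region', 'country']
# tolist() gives a list of 4-tuples; the constructor turns each tuple into a row.
# index=df.index keeps the rows aligned with the original frame for the join.
parts = pd.DataFrame(df['location'].tolist(), index=df.index, columns=cols)
result = parts.join(df[['value']])
print(result)
```

Passing index=df.index matters when df does not have a default RangeIndex; without it, join would align on 0..n-1 and could mismatch rows.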
