Python 如何将多个列值连接到 Panda 数据框中的单个列中

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/39291499/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 22:05:29  来源:igfitidea点击:

How to concatenate multiple column values into a single column in Panda dataframe

pythonpandasdataframe

提问by NamAshena

This question is same to this postedearlier. I want to concatenate three columns instead of concatenating two columns:

这个问题与之前发布的这个问题相同。我想连接三列而不是连接两列:

Here is the combining two columns:

这是组合两列:

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined']=df.apply(lambda x:'%s_%s' % (x['foo'],x['bar']),axis=1)

df
    bar foo new combined
0   1   a   apple   a_1
1   2   b   banana  b_2
2   3   c   pear    c_3

I want to combine three columns with this command but it is not working, any idea?

我想将三列与此命令结合使用,但它不起作用,知道吗?

df['combined']=df.apply(lambda x:'%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)

回答by Allen

Another solution using DataFrame.apply(), with slightly less typing and more scalable when you want to join more columns:

使用 的另一种解决方案DataFrame.apply(),当您想要加入更多列时,输入略少,可扩展性更强:

cols = ['foo', 'bar', 'new']
df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)

回答by shivsn

you can simply do:

你可以简单地做:

In[17]:df['combined']=df['bar'].astype(str)+'_'+df['foo']+'_'+df['new']

In[17]:df
Out[18]: 
   bar foo     new    combined
0    1   a   apple   1_a_apple
1    2   b  banana  2_b_banana
2    3   c    pear    3_c_pear

回答by cbrnr

If you have even more columns you want to combine, using the Series method str.catmight be handy:

如果您想要合并更多列,使用 Series 方法str.cat可能会很方便:

df["combined"] = df["foo"].str.cat(df[["bar", "new"]].astype(str), sep="_")

Basically, you select the first column (if it is not already of type str, you need to append .astype(str)), to which you append the other columns (separated by an optional separator character).

基本上,您选择第一列(如果它不是 type str,则需要 append .astype(str)),然后将其他列(由可选的分隔符分隔)附加到该列。

回答by MaxU

Just wanted to make a time comparison for both solutions (for 30K rows DF):

只是想对两种解决方案进行时间比较(对于 30K 行 DF):

In [1]: df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

In [2]: big = pd.concat([df] * 10**4, ignore_index=True)

In [3]: big.shape
Out[3]: (30000, 3)

In [4]: %timeit big.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
1 loop, best of 3: 881 ms per loop

In [5]: %timeit big['bar'].astype(str)+'_'+big['foo']+'_'+big['new']
10 loops, best of 3: 44.2 ms per loop

a few more options:

还有几个选项:

In [6]: %timeit big.ix[:, :-1].astype(str).add('_').sum(axis=1).str.cat(big.new)
10 loops, best of 3: 72.2 ms per loop

In [11]: %timeit big.astype(str).add('_').sum(axis=1).str[:-1]
10 loops, best of 3: 82.3 ms per loop

回答by milos.ai

I think you are missing one %s

我想你少了一个%s

df['combined']=df.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)

回答by Manivannan Murugavel

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined'] = df['foo'].astype(str)+'_'+df['bar'].astype(str)

If you concatenate with string('_') please you convert the column to string which you want and after you can concatenate the dataframe.

如果您与 string('_') 连接,请将列转换为您想要的字符串,然后连接数据框。

回答by Nipun Kumar Goel

df['New_column_name'] = df['Column1'].map(str) + 'X' + df['Steps']

X= x is any delimiter (eg: space) by which you want to separate two merged column.

X= x 是您想要分隔两个合并列的任何分隔符(例如:空格)。

回答by derchambers

The answer given by @allen is reasonably generic but can lack in performance for larger dataframes:

@allen 给出的答案相当通用,但对于较大的数据帧可能缺乏性能:

Reduce does a lotbetter:

确实减少了很多更好:

from functools import reduce

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'


def reduce_join(df, columns):
    assert len(columns) > 1
    slist = [df[x].astype(str) for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])


def apply_join(df, columns):
    assert len(columns) > 1
    return df[columns].apply(lambda row:'_'.join(row.values.astype(str)), axis=1)

# ensure outputs are equal
df1 = reduce_join(df, list('1234'))
df2 = apply_join(df, list('1234'))
assert df1.equals(df2)

# profile
%timeit df1 = reduce_join(df, list('1234'))  # 733 ms
%timeit df2 = apply_join(df, list('1234'))   # 8.84 s

回答by Grzegorz

@derchambers I found one more solution:

@derchambers 我找到了另一种解决方案:

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'

def eval_join(df, columns):

    sum_elements = [f"df['{col}']" for col in list('1234')]
    to_eval = "+ '_' + ".join(sum_elements)

    return eval(to_eval)


#profile
%timeit df3 = eval_join(df, list('1234')) # 504 ms