Python: How to concatenate multiple column values into a single column in a pandas DataFrame

Question

Asked by NamAshena

This question is the same as one posted earlier, except that I want to concatenate three columns instead of two.

Here is the code that combines two columns:

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined']=df.apply(lambda x:'%s_%s' % (x['foo'],x['bar']),axis=1)

df
    bar foo new combined
0   1   a   apple   a_1
1   2   b   banana  b_2
2   3   c   pear    c_3

I want to combine three columns with the following command, but it is not working. Any ideas?

df['combined']=df.apply(lambda x:'%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)

Answer 1

Answered by Allen

Another solution using DataFrame.apply(), which needs slightly less typing and scales better when you want to join more columns:

cols = ['foo', 'bar', 'new']
df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)
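A closely related variant, shown here only as a sketch: convert the selected columns to strings once with astype(str) and then join each row with agg. The sample data is the DataFrame from the question; using agg instead of apply is a substitution of mine, not something the answer itself proposes.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'],
                   'bar': [1, 2, 3],
                   'new': ['apple', 'banana', 'pear']})

cols = ['foo', 'bar', 'new']

# Cast all selected columns to str up front, then join each row's values.
df['combined'] = df[cols].astype(str).agg('_'.join, axis=1)

print(df['combined'].tolist())
```

Changing the order of names in cols changes the order of the pieces in the combined string.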

Answer 2

Answered by shivsn

You can simply do:

In [17]: df['combined'] = df['bar'].astype(str) + '_' + df['foo'] + '_' + df['new']

In [18]: df
Out[18]: 
   bar foo     new    combined
0    1   a   apple   1_a_apple
1    2   b  banana  2_b_banana
2    3   c    pear    3_c_pear
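One thing to watch with the + approach is missing values: a missing entry in any column makes the whole combined value missing for that row. The sketch below assumes a None in the foo column (not part of the original data) and uses fillna('') as one possible workaround.

```python
import pandas as pd

# Same data as the question, except that one 'foo' value is missing (an assumption).
df = pd.DataFrame({'foo': ['a', None, 'c'],
                   'bar': [1, 2, 3],
                   'new': ['apple', 'banana', 'pear']})

# Plain concatenation: the row with the missing 'foo' value ends up missing.
plain = df['bar'].astype(str) + '_' + df['foo'] + '_' + df['new']

# Replacing missing values with an empty string keeps the row.
filled = df['bar'].astype(str) + '_' + df['foo'].fillna('') + '_' + df['new']

print(plain.isna().tolist())
print(filled.tolist())
```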

Answer 3

Answered by cbrnr

If you have even more columns to combine, the Series method str.cat might be handy:

df["combined"] = df["foo"].str.cat(df[["bar", "new"]].astype(str), sep="_")

Basically, you select the first column (if it is not already of type str, you need to append .astype(str) to it) and then append the other columns to it, separated by an optional separator string.
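To illustrate the case where the first column is not a string column, here is a small sketch that starts from the integer column bar. The data is the question's DataFrame; the column order is chosen only for the example.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'],
                   'bar': [1, 2, 3],
                   'new': ['apple', 'banana', 'pear']})

# 'bar' holds integers, so cast it to str before using the .str accessor.
# The remaining columns are already strings and can be passed as a DataFrame.
df['combined'] = df['bar'].astype(str).str.cat(df[['foo', 'new']], sep='_')

print(df['combined'].tolist())
```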

Answer 4

Answered by MaxU

I just wanted to compare the timings of both solutions (for a 30K-row DataFrame):

In [1]: df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

In [2]: big = pd.concat([df] * 10**4, ignore_index=True)

In [3]: big.shape
Out[3]: (30000, 3)

In [4]: %timeit big.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
1 loop, best of 3: 881 ms per loop

In [5]: %timeit big['bar'].astype(str)+'_'+big['foo']+'_'+big['new']
10 loops, best of 3: 44.2 ms per loop

A few more options:

In [6]: %timeit big.ix[:, :-1].astype(str).add('_').sum(axis=1).str.cat(big.new)
10 loops, best of 3: 72.2 ms per loop

In [11]: %timeit big.astype(str).add('_').sum(axis=1).str[:-1]
10 loops, best of 3: 82.3 ms per loop

Answer 5

Answered by milos.ai

I think you are missing one %s:

df['combined']=df.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
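The failure can be reproduced without pandas. In this sketch a plain dict stands in for one DataFrame row (the name row is just for illustration): two placeholders with three values raise a TypeError, while three placeholders work.

```python
# A plain dict standing in for one row of the DataFrame.
row = {'bar': 1, 'foo': 'a', 'new': 'apple'}

# Two placeholders but three values: Python refuses to format the string.
try:
    '%s_%s' % (row['bar'], row['foo'], row['new'])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# One placeholder per value works as expected.
fixed = '%s_%s_%s' % (row['bar'], row['foo'], row['new'])

print(error_message)
print(fixed)
```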

Answer 6

Answered by Manivannan Murugavel

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined'] = df['foo'].astype(str)+'_'+df['bar'].astype(str)

If you concatenate with a string such as '_', first convert each column you want to combine to a string; after that you can concatenate the columns.

Answer 7

Answered by Nipun Kumar Goel

df['New_column_name'] = df['Column1'].map(str) + 'X' + df['Steps']

Here X is any delimiter (e.g., a space) that you want to use to separate the two merged columns.

Answer 8

Answered by derchambers

The answer given by @allen is reasonably generic, but it can perform poorly on larger DataFrames. Using reduce does a lot better:

from functools import reduce

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'


def reduce_join(df, columns):
    assert len(columns) > 1
    slist = [df[x].astype(str) for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])


def apply_join(df, columns):
    assert len(columns) > 1
    return df[columns].apply(lambda row:'_'.join(row.values.astype(str)), axis=1)

# ensure outputs are equal
df1 = reduce_join(df, list('1234'))
df2 = apply_join(df, list('1234'))
assert df1.equals(df2)

# profile
%timeit df1 = reduce_join(df, list('1234'))  # 733 ms
%timeit df2 = apply_join(df, list('1234'))   # 8.84 s

Answer 9

Answered by Grzegorz

@derchambers I found one more solution:

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'

def eval_join(df, columns):
    # Build an expression such as "df['1'] + '_' + df['2'] + ..." and evaluate it.
    sum_elements = [f"df['{col}']" for col in columns]
    to_eval = " + '_' + ".join(sum_elements)

    return eval(to_eval)


#profile
%timeit df3 = eval_join(df, list('1234')) # 504 ms
