How to concatenate multiple column values into a single column in a pandas DataFrame
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39291499/
Asked by NamAshena
This question is the same as one posted earlier, except that I want to concatenate three columns instead of two.
Here is the code that combines two columns:
from pandas import DataFrame
df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})
df['combined']=df.apply(lambda x:'%s_%s' % (x['foo'],x['bar']),axis=1)
df
   bar foo     new combined
0    1   a   apple      a_1
1    2   b  banana      b_2
2    3   c    pear      c_3
I tried to combine three columns with the following command, but it does not work. Any ideas?
df['combined']=df.apply(lambda x:'%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
Answered by Allen
Another solution uses DataFrame.apply(). It takes slightly less typing and scales better when you want to join more columns:
cols = ['foo', 'bar', 'new']
df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)
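As a rough sketch of how this generalizes, the snippet below wraps the same DataFrame.apply() call in a small helper. The name join_columns and its sep parameter are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def join_columns(df, cols, sep="_"):
    """Join the given columns row by row into one Series of strings (hypothetical helper)."""
    # Cast each row's values to str so that numeric columns can be joined too.
    return df[cols].apply(lambda row: sep.join(row.values.astype(str)), axis=1)


df = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3], "new": ["apple", "banana", "pear"]})
df["combined"] = join_columns(df, ["foo", "bar", "new"])
print(df["combined"].tolist())  # ['a_1_apple', 'b_2_banana', 'c_3_pear']
```

Adding a fourth column then only means adding its name to the list.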
Answered by shivsn
You can simply do:
In [17]: df['combined'] = df['bar'].astype(str) + '_' + df['foo'] + '_' + df['new']
In [18]: df
Out[18]:
   bar foo     new    combined
0    1   a   apple   1_a_apple
1    2   b  banana  2_b_banana
2    3   c    pear    3_c_pear
Answered by cbrnr
If you have even more columns to combine, the Series method str.cat might be handy:
df["combined"] = df["foo"].str.cat(df[["bar", "new"]].astype(str), sep="_")
Basically, you select the first column (if it is not already of type str, you need to append .astype(str)) and then append the other columns to it, separated by an optional separator string.
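One detail worth illustrating is how str.cat treats missing values. The sketch below assumes a variant of the example frame in which one foo value is missing (that data is made up for illustration): by default the whole row becomes missing, while the na_rep parameter substitutes a placeholder.

```python
import pandas as pd

# Illustrative data: the second "foo" value is deliberately missing.
df = pd.DataFrame({
    "foo": ["a", None, "c"],
    "bar": [1, 2, 3],
    "new": ["apple", "banana", "pear"],
})
others = df[["bar", "new"]].astype(str)

# Without na_rep, a missing value in any input makes the result missing.
plain = df["foo"].str.cat(others, sep="_")
print(plain.isna().tolist())  # [False, True, False]

# With na_rep, missing values are replaced before joining.
filled = df["foo"].str.cat(others, sep="_", na_rep="?")
print(filled.tolist())  # ['a_1_apple', '?_2_banana', 'c_3_pear']
```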
Answered by MaxU
Here is a timing comparison of both solutions (on a 30K-row DataFrame):
In [1]: df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})
In [2]: big = pd.concat([df] * 10**4, ignore_index=True)
In [3]: big.shape
Out[3]: (30000, 3)
In [4]: %timeit big.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
1 loop, best of 3: 881 ms per loop
In [5]: %timeit big['bar'].astype(str)+'_'+big['foo']+'_'+big['new']
10 loops, best of 3: 44.2 ms per loop
A few more options:
In [6]: %timeit big.iloc[:, :-1].astype(str).add('_').sum(axis=1).str.cat(big.new)
10 loops, best of 3: 72.2 ms per loop
In [11]: %timeit big.astype(str).add('_').sum(axis=1).str[:-1]
10 loops, best of 3: 82.3 ms per loop
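The %timeit magic only exists inside IPython. A minimal sketch of a similar comparison with the standard-library timeit module follows; the function names are made up for illustration, the frame is smaller than the one above, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3], "new": ["apple", "banana", "pear"]})
big = pd.concat([df] * 1000, ignore_index=True)  # 3,000 rows to keep the run short


def with_apply():
    # Row-wise string formatting, as in the question.
    return big.apply(lambda x: "%s_%s_%s" % (x["bar"], x["foo"], x["new"]), axis=1)


def vectorized():
    # Column-wise concatenation, as in shivsn's answer.
    return big["bar"].astype(str) + "_" + big["foo"] + "_" + big["new"]


# Both approaches should produce the same strings.
same = with_apply().tolist() == vectorized().tolist()
print("same result:", same)

for func in (with_apply, vectorized):
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per call")
```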
Answered by milos.ai
I think you are missing one %s:
df['combined']=df.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
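To see why the original command fails, it may help to reproduce the problem outside pandas. The sketch below formats three values with only two placeholders, which raises a TypeError in plain Python, and then applies the corrected format string; the variable name message is just for illustration.

```python
import pandas as pd

message = None
try:
    # Two %s placeholders but three values.
    "%s_%s" % (1, "a", "apple")
except TypeError as exc:
    # Python complains that not all arguments were converted.
    message = str(exc)
print(message)

df = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3], "new": ["apple", "banana", "pear"]})
# One placeholder per value fixes the error.
df["combined"] = df.apply(lambda x: "%s_%s_%s" % (x["bar"], x["foo"], x["new"]), axis=1)
print(df["combined"].tolist())  # ['1_a_apple', '2_b_banana', '3_c_pear']
```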
Answered by Manivannan Murugavel
from pandas import DataFrame
df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})
df['combined'] = df['foo'].astype(str)+'_'+df['bar'].astype(str)
If you concatenate with a string such as '_', first convert each column you want to combine to a string. After that you can concatenate the columns.
Answered by Nipun Kumar Goel
df['New_column_name'] = df['Column1'].map(str) + 'X' + df['Steps']
Here 'X' stands for any delimiter (e.g. a space) that you want between the two merged columns.
Answered by derchambers
The answer given by @allen is reasonably generic but can perform poorly on larger DataFrames.
reduce does a lot better:
from functools import reduce
import pandas as pd
# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'
def reduce_join(df, columns):
    assert len(columns) > 1
    slist = [df[x].astype(str) for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])
def apply_join(df, columns):
    assert len(columns) > 1
    return df[columns].apply(lambda row:'_'.join(row.values.astype(str)), axis=1)
# ensure outputs are equal
df1 = reduce_join(df, list('1234'))
df2 = apply_join(df, list('1234'))
assert df1.equals(df2)
# profile
%timeit df1 = reduce_join(df, list('1234')) # 733 ms
%timeit df2 = apply_join(df, list('1234')) # 8.84 s
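For readers less familiar with functools.reduce, here is a small sketch of what the fold in reduce_join does. It assumes a tiny three-column frame (made up for illustration) and compares the reduce result with the same concatenation written out by hand.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({"foo": ["a", "b"], "bar": [1, 2], "new": ["apple", "banana"]})
slist = [df[c].astype(str) for c in ["foo", "bar", "new"]]

# reduce folds left to right: (foo + '_' + bar) + '_' + new.
# Each step is one vectorized Series addition rather than a per-row Python call.
folded = reduce(lambda x, y: x + "_" + y, slist[1:], slist[0])

# The same chain written out by hand.
manual = slist[0] + "_" + slist[1] + "_" + slist[2]

print(folded.tolist())  # ['a_1_apple', 'b_2_banana']
print(folded.tolist() == manual.tolist())  # True
```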
Answered by Grzegorz
@derchambers I found one more solution:
import pandas as pd
# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'
def eval_join(df, columns):
    # Build an expression such as "df['1']+ '_' + df['2']" from the requested columns.
    sum_elements = [f"df['{col}']" for col in columns]
    to_eval = "+ '_' + ".join(sum_elements)
    return eval(to_eval)
#profile
%timeit df3 = eval_join(df, list('1234')) # 504 ms
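To make the eval approach easier to follow, the sketch below prints the expression string it builds for a two-column frame (the data is made up for illustration). Passing an explicit namespace to eval is an addition here, not part of the original answer. Because the column names are pasted into Python source, this only suits trusted names that contain no quote characters.

```python
import pandas as pd

df = pd.DataFrame({"1": ["CO", "X"], "2": ["BOB", "Y"]})
columns = ["1", "2"]

# Each column becomes a "df['name']" term, joined by the separator expression.
sum_elements = [f"df['{col}']" for col in columns]
to_eval = "+ '_' + ".join(sum_elements)
print(to_eval)  # df['1']+ '_' + df['2']

# Evaluate with an explicit namespace so that only df is visible to the expression.
result = eval(to_eval, {"df": df})
print(result.tolist())  # ['CO_BOB', 'X_Y']
```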