Python: How to combine multiple rows of strings into one using Pandas?
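Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow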
Original URL: http://stackoverflow.com/questions/33279940/
How to combine multiple rows of strings into one using pandas?
Question by eclairs
I have a DataFrame with multiple rows. Is there any way to combine them into one string?
For example:
words
0 I, will, hereby
1 am, gonna
2 going, far
3 to
4 do
5 this
Expected output:
I, will, hereby, am, gonna, going, far, to, do, this
Accepted answer by Alex Riley
You can use str.cat to join the strings from all rows into a single string. For a Series or column s, write:
>>> s.str.cat(sep=', ')
'I, will, hereby, am, gonna, going, far, to, do, this'
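A self-contained version of this, assuming the question's data lives in a column named words (as in the question's example), might look like:

```python
import pandas as pd

# Rebuild the question's example: each row holds a comma-separated string
df = pd.DataFrame({'words': ['I, will, hereby', 'am, gonna', 'going, far',
                             'to', 'do', 'this']})

# With no `others` argument, str.cat concatenates every element of the
# Series into one string, inserting `sep` between elements
result = df['words'].str.cat(sep=', ')
print(result)  # I, will, hereby, am, gonna, going, far, to, do, this
```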
Answer by Zhong Dai
If you have a DataFrame rather than a Series, and you want to concatenate values (text values only, I think) from different rows, using another column as the "group by" key, you can use the .agg method of the DataFrameGroupBy class (see the pandas API reference for DataFrameGroupBy.agg).
Sample code tested with Pandas v0.18.1:
import pandas as pd

df = pd.DataFrame({
    'category': ['A'] * 3 + ['B'] * 2,
    'name': ['A1', 'A2', 'A3', 'B1', 'B2'],
    'num': range(1, 6)
})

# Per category: join the 'name' strings and keep the largest 'num'
result = df.groupby('category').agg({
    'name': lambda x: ', '.join(x),
    'num': lambda x: x.max()
})
print(result)
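On newer pandas (0.25 and later), the same aggregation can be sketched with named aggregation, which also lets you choose the output column names. This is an alternative syntax, not the one the answer tested on 0.18.1; the output names names and max_num are arbitrary choices:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A'] * 3 + ['B'] * 2,
    'name': ['A1', 'A2', 'A3', 'B1', 'B2'],
    'num': range(1, 6),
})

# Named aggregation: output_column=(input_column, aggfunc).
# ', '.join is called once per group with that group's 'name' values.
out = df.groupby('category').agg(
    names=('name', ', '.join),
    max_num=('num', 'max'),
)
print(out)
```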
Answer by Zero
How about plain Python's str.join? It's also faster.
In [209]: ', '.join(df.words)
Out[209]: 'I, will, hereby, am, gonna, going, far, to, do, this'
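One caveat worth knowing when choosing between the two: str.join requires every element to be a str, so it raises TypeError on missing values (and on numbers, which would need .astype(str) first), while str.cat called without others or na_rep simply omits missing values. A small sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['I', None, 'do', 'this'])

# str.join fails because one element is not a str
try:
    ', '.join(s)
    join_failed = False
except TypeError:
    join_failed = True

# str.cat omits missing values by default (others=None, na_rep=None)
cat_result = s.str.cat(sep=', ')

# To use str.join here, drop (or fill) the missing values first
join_result = ', '.join(s.dropna())

print(join_failed, cat_result, join_result)
```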
Timings from December 2016 on pandas 0.18.1:
In [214]: df.shape
Out[214]: (6, 1)
In [215]: %timeit df.words.str.cat(sep=', ')
10000 loops, best of 3: 72.2 μs per loop
In [216]: %timeit ', '.join(df.words)
100000 loops, best of 3: 14 μs per loop
In [217]: df = pd.concat([df]*10000, ignore_index=True)
In [218]: df.shape
Out[218]: (60000, 1)
In [219]: %timeit df.words.str.cat(sep=', ')
100 loops, best of 3: 5.2 ms per loop
In [220]: %timeit ', '.join(df.words)
100 loops, best of 3: 1.91 ms per loop
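To rerun a similar comparison outside IPython, here is a rough sketch using the standard timeit module. Absolute numbers depend on the machine and the pandas version, so none are claimed here; the repetition count n is an arbitrary choice:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'words': ['I, will, hereby', 'am, gonna', 'going, far',
                             'to', 'do', 'this']})
# Scale up to 60,000 rows, matching the larger timing run
big = pd.concat([df] * 10000, ignore_index=True)

# Sanity check: both approaches must build the same string
assert big['words'].str.cat(sep=', ') == ', '.join(big['words'])

n = 20  # repetitions per method
t_cat = timeit.timeit(lambda: big['words'].str.cat(sep=', '), number=n)
t_join = timeit.timeit(lambda: ', '.join(big['words']), number=n)
print(f"str.cat:  {t_cat / n * 1e3:.2f} ms per call")
print(f"str.join: {t_join / n * 1e3:.2f} ms per call")
```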
Answer by Kevin Chou
For anyone who wants to know how to combine multiple rows of strings in a DataFrame, here is a method that concatenates the strings within a "window-like" range of nearby rows:
# Add a column joining each row's 'key' with the keys of the next
# (windows_size - 1) rows in the same 'bycol' group
df['windows_key_list'] = df['key'].str.cat([df.groupby('bycol')['key'].shift(-i) for i in range(1, windows_size)], sep=' ')
Note: this can't be achieved with a plain groupby aggregation, because we don't want to combine all rows that share the same id, only nearby rows.
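A runnable sketch of the windowed concatenation with hypothetical data: the column names bycol and key and the value windows_size = 2 are placeholders standing in for the answer's. Each row's key is joined with the keys of the next windows_size - 1 rows in the same group; rows near the end of a group get NaN because the shifted values there are missing:

```python
import pandas as pd

# Hypothetical data: 'bycol' defines the groups, 'key' holds the strings
df = pd.DataFrame({
    'bycol': ['x', 'x', 'x', 'y', 'y'],
    'key': ['a', 'b', 'c', 'd', 'e'],
})
windows_size = 2  # hypothetical: current row plus 1 following row

# Shift 'key' upward within each group, once per extra row in the window
following = [df.groupby('bycol')['key'].shift(-i) for i in range(1, windows_size)]

# Row-wise concatenation; any missing shifted value makes that row NaN
df['windows_key_list'] = df['key'].str.cat(following, sep=' ')
print(df)
```

Passing na_rep='' to str.cat instead would keep those end-of-group rows, at the cost of a trailing separator (e.g. 'c ').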