Python: Converting a Panda DF List into a string
Original URL: http://stackoverflow.com/questions/37347725/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Rusty Coder
I have a pandas DataFrame. One of its columns contains lists. I want each list in that column to become a single string.
For example, my list ['one', 'two', 'three'] should simply become 'one, two, three'.
df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))
gives me ['one', 'two', 'three'], ['four', 'five', 'six'] in every row, where the second list comes from the next row. Needless to say, with millions of rows this concatenation across rows is not only incorrect, it also exhausts my memory.
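To see why the attempt in the question misbehaves, here is a minimal reproduction on a hypothetical two-row frame (the data is an assumption that mirrors the question's example). The lambda ignores its argument and joins the entire column, so every row ends up holding the same string built from all rows. With n rows, each of the n cells holds a string whose length grows with n, which is why memory use blows up.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the question's example data.
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# The lambda never uses its argument x; it joins the string form of the
# *whole* column, so every row receives the same cross-row string.
buggy = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

print(buggy[0])  # ['one', 'two', 'three'], ['four', 'five', 'six']
print(buggy[0] == buggy[1])  # True
```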
Answer by IanS
You should certainly not convert the column to strings before joining the lists. Try:
df['col'].apply(', '.join)
Also note that apply applies the function to each element of the Series, so using df['col'] inside the lambda function is probably not what you want.
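A minimal sketch of the per-element behaviour described above, reusing the question's example data (the two-row frame is an assumption for illustration): apply hands each list to the function separately, so the function must work on its own argument, and the result has to be assigned back to replace the column.

```python
import pandas as pd

df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# apply calls the function once per element, passing that row's list as x.
with_lambda = df['col'].apply(lambda x: ', '.join(x))

# The bound method ', '.join already takes one iterable, so the lambda is optional.
# Assigning the result back replaces the list column with the joined strings.
df['col'] = df['col'].apply(', '.join)

print(df['col'].tolist())  # ['one, two, three', 'four, five, six']
```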
Edit: thanks to Yakym for pointing out that there is no need for a lambda function.
Edit: as noted by Anton Protopopov, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
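For comparison, the native accessor mentioned in the edit can be used like this; on lists of strings it gives the same values as apply(', '.join). This sketch reuses a hypothetical two-row frame, and relative speed will likely vary with the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# Series.str.join joins the items of each list-valued element with the separator.
via_str = df['col'].str.join(', ')
via_apply = df['col'].apply(', '.join)

# Compare as plain Python lists so differing result dtypes cannot matter.
print(via_str.tolist() == via_apply.tolist())  # True
```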
Answer by hilberts_drinking_problem
When you cast col to str with astype, you get the string representation of a Python list, brackets and all. You do not need to do that; just apply join directly:
import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
})
# df:
#            A
# 0  [a, b, c]
# 1  [A, B, C]

df['Joined'] = df.A.apply(', '.join)
# df:
#            A   Joined
# 0  [a, b, c]  a, b, c
# 1  [A, B, C]  A, B, C
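To make the point about astype concrete, a short sketch on the same frame shows what the cast actually produces compared with joining the list items:

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})

# astype(str) calls str() on each list, producing its Python repr with
# brackets and quotes, which is not the joined form the question wants.
as_repr = df['A'].astype(str)
print(as_repr[0])  # ['a', 'b', 'c']

# Joining the list items directly yields the desired string.
joined = df['A'].apply(', '.join)
print(joined[0])  # a, b, c
```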
Answer by Anton Protopopov
You could convert your lists to strings with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:
In [114]: df
Out[114]:
           A
0  [a, b, c]
1  [A, B, C]

In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)  # regex=True is required in pandas >= 2.0
Out[115]:
0    a, b, c
1    A, B, C
Name: A, dtype: object
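Two caveats for this approach, shown in a small sketch with hypothetical data: since pandas 2.0, Series.str.replace defaults to regex=False, so the pattern has to be passed with regex=True; and because the method strips characters from the list's repr, any brackets or quotes inside the list items themselves are removed as well, whereas apply(', '.join) leaves them intact.

```python
import pandas as pd

# Hypothetical data: the second row contains an item with its own brackets.
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['[x]', 'y']]})

# Character class matching '[', ']' and "'"; regex=True is required in pandas >= 2.0.
stripped = df['A'].astype(str).str.replace(r"[\[\]']", '', regex=True)
joined = df['A'].apply(', '.join)

print(stripped.tolist())  # ['a, b, c', 'x, y']
print(joined.tolist())    # ['a, b, c', '[x], y']
```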
Timing
import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)
In [2]: timeit df['A'].apply(', '.join)
292 μs ± 10.8 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: timeit df['A'].str.join(', ')
368 μs ± 24.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 μs ± 5.74 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
2.43 ms ± 62.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
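Absolute numbers depend on hardware and pandas version, so the figures above are the answerer's measurements rather than a guarantee. A plain-script sketch of the same comparison using the standard timeit module might look like this. Unlike In [5] above, it converts with astype(str) first, so the replace variant actually operates on strings, and it passes regex=True as pandas >= 2.0 requires.

```python
import timeit

import pandas as pd

# Same 2,000-row frame as the timing session: two rows repeated 1000 times.
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df] * 1000)

candidates = {
    "apply(', '.join)": lambda: df['A'].apply(', '.join),
    "str.join(', ')": lambda: df['A'].str.join(', '),
    "apply(lambda x: ', '.join(x))": lambda: df['A'].apply(lambda x: ', '.join(x)),
    "astype(str).str.replace": lambda: df['A'].astype(str).str.replace(r"[\[\]']", '', regex=True),
}

for name, fn in candidates.items():
    # Take the fastest of 5 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    print(f"{name}: {best * 1e6:.0f} us per call")
```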
Answer by AMC
Pandas offers a method for this: Series.str.join.
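One practical difference, per the pandas documentation for Series.str.join: it returns NaN for missing values and for lists that contain non-string items, whereas ', '.join raises a TypeError in both cases. A short sketch with hypothetical mixed data, including one possible way (an assumption, not the only option) to stringify mixed items first:

```python
import numpy as np
import pandas as pd

# Hypothetical column: a clean list, a missing value, and a list with an int.
s = pd.Series([['a', 'b'], np.nan, ['c', 1]])

# str.join yields 'a, b' for the first row and NaN for the other two.
via_str = s.str.join(', ')

# s.apply(', '.join) would raise TypeError here. Converting items with str()
# and passing non-list values through unchanged avoids that.
safe = s.apply(lambda x: ', '.join(map(str, x)) if isinstance(x, list) else x)

print(via_str.tolist())
print(safe.tolist())
```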