Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/45901018/
Convert pandas series of lists to dataframe
Asked by Hatshepsut
I have a Series made of lists:
import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])
and I want a DataFrame in which each column is one of the lists.
None of from_items, from_records, DataFrame, or Series.to_frame seems to work.
How can I do this?
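To make the problem concrete, here is a minimal sketch of what the most obvious attempt produces. It assumes a reasonably recent pandas; the variable name frame is only illustrative.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Series.to_frame keeps every list as a single cell, so the result has
# one object column with two rows rather than one column per list.
frame = s.to_frame()
print(frame.shape)
print(type(frame.iloc[0, 0]))
```

Each cell still holds a whole Python list, which is why the lists have to be unpacked explicitly, as the answers do.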
Answer by Cleb
You can use from_items like this (assuming that your lists are of the same length):
pd.DataFrame.from_items(zip(s.index, s.values))
   0  1
0  1  4
1  2  5
2  3  6
or
pd.DataFrame.from_items(zip(s.index, s.values)).T
   0  1  2
0  1  2  3
1  4  5  6
depending on your desired output.
This can be much faster than using apply (as in @Wen's answer, which, however, also works for lists of different lengths):
%timeit pd.DataFrame.from_items(zip(s.index, s.values))
1000 loops, best of 3: 669 μs per loop
%timeit s.apply(lambda x:pd.Series(x)).T
1000 loops, best of 3: 1.37 ms per loop
and
%timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
1000 loops, best of 3: 919 μs per loop
%timeit s.apply(lambda x:pd.Series(x))
1000 loops, best of 3: 1.26 ms per loop
@Hatshepsut's answer is also quite fast (and works for lists of different lengths):
%timeit pd.DataFrame(item for item in s)
1000 loops, best of 3: 636 μs per loop
and
%timeit pd.DataFrame(item for item in s).T
1000 loops, best of 3: 884 μs per loop
The fastest solution seems to be @Abdou's answer (tested on Python 2, where the function is itertools.izip_longest; it also works for lists of different lengths; use itertools.zip_longest in Python 3):
%timeit pd.DataFrame.from_records(izip_longest(*s.values))
1000 loops, best of 3: 529 μs per loop
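The figures above come from one machine and an older pandas, so they are best treated as indicative. A rough way to repeat the comparison outside IPython is sketched below with the standard timeit module. The candidates dictionary and its labels are hypothetical names chosen for this sketch, and only constructors available in current pandas are included.

```python
import timeit

import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Hypothetical labels mapped to the conversions discussed in this thread.
candidates = {
    "tolist": lambda: pd.DataFrame(s.tolist()),
    "generator": lambda: pd.DataFrame(item for item in s),
    "dict_zip": lambda: pd.DataFrame(dict(zip(s.index, s.values))),
}

runs = 100
# Best of three repeats, converted to seconds per single call.
timings = {
    name: min(timeit.repeat(fn, number=runs, repeat=3)) / runs
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>10}: {seconds * 1e6:8.1f} microseconds per call")
```

Results depend heavily on the pandas version and on the size of the Series, so it is worth running this on data shaped like your own.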
An additional option:
pd.DataFrame(dict(zip(s.index, s.values)))
   0  1
0  1  4
1  2  5
2  3  6
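The dict-based option requires lists of equal length. A possible variant for ragged input, sketched here as an assumption rather than as part of the original answer, wraps each list in a pd.Series so that pandas aligns the columns and pads the shorter ones with NaN.

```python
import pandas as pd

# The second list is deliberately shorter than the first.
ragged = pd.Series([[1, 2, 3], [4, 5]])

# A dict of Series is aligned on the index, so missing positions become NaN.
df = pd.DataFrame({key: pd.Series(values) for key, values in ragged.items()})
print(df)
```

Building one Series per list adds overhead, so this is likely slower than the plain dict version on large inputs.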
Answer by Abdou
pd.DataFrame.from_records should also work, using itertools.zip_longest:
from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(*s.values))
#    0  1
# 0  1  4
# 1  2  5
# 2  3  6
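The benefit of zip_longest over plain zip shows up when the lists differ in length. The sketch below uses a made-up ragged Series; it materializes the iterator with list() as a conservative choice, not because from_records is known to require it.

```python
from itertools import zip_longest

import pandas as pd

ragged = pd.Series([[1, 2, 3], [4, 5]])

# zip_longest pads the shorter list with None, so no values are dropped.
rows = list(zip_longest(*ragged.values))
df = pd.DataFrame.from_records(rows)
print(df)
```

With plain zip the third row would be silently discarded, because zip stops at the shortest input.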
Answer by Z.Webber
If the series is very long (more than 1 million entries), you can use:
s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(s.tolist())
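One detail to watch with this approach: s.tolist() discards the Series index. The sketch below, which uses a hypothetical string index and a ragged second list, passes the index back in explicitly.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5]], index=["a", "b"])

# Each list becomes one row; passing index=s.index keeps the original labels.
df = pd.DataFrame(s.tolist(), index=s.index)
print(df)
```

Shorter lists are padded with missing values on the right, and adding .T gives the one-column-per-list layout.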
Answer by Hatshepsut
Iterate over the series like this:
series = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(item for item in series)
   0  1  2
0  1  2  3
1  4  5  6
Answer by YOBEN_S
You may be looking for
s.apply(lambda x:pd.Series(x))
   0  1  2
0  1  2  3
1  4  5  6
Or
s.apply(lambda x:pd.Series(x)).T
   0  1
0  1  4
1  2  5
2  3  6
Answer by Dataman
Note that the from_items() method in the accepted answer is deprecated in recent versions of pandas (and was removed in pandas 1.0); the from_dict() method should be used instead. Here is how:
pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
## OR
pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T
Also note that from_dict() gives us the fastest approach so far:
%timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
376 μs ± 14.4 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
## OR
%timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T
487 μs ± 3.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
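If the row-per-list layout is the goal, from_dict also accepts orient='index', which may avoid the extra transpose. This is a small sketch of that option, not something taken from the answer above.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# orient='index' treats each dict key as a row label instead of a column label.
df_rows = pd.DataFrame.from_dict(dict(zip(s.index, s.values)), orient="index")
print(df_rows)
```

Whether this is faster than calling .T on the default result was not measured here.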
Answer by Evan Rosica
Try:
import numpy as np
import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(np.vstack(s))
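Unlike several of the other answers, np.vstack needs every list to have the same length. The sketch below shows both cases. It calls tolist() before stacking, which is a choice made for this sketch, and the flag stacked_ragged is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

equal = pd.Series([[1, 2, 3], [4, 5, 6]])
ragged = pd.Series([[1, 2, 3], [4, 5]])

# Equal-length lists stack into a regular 2-D array with one row per list.
df_equal = pd.DataFrame(np.vstack(equal.tolist()))
print(df_equal)

# Rows of different lengths cannot form a rectangular array.
try:
    np.vstack(ragged.tolist())
    stacked_ragged = True
except ValueError:
    stacked_ragged = False
print(stacked_ragged)
```

For ragged input, the list-based or zip_longest approaches from the other answers are the safer choice.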