pandas: Convert a pandas Series of lists to a DataFrame

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: StackOverflow, http://stackoverflow.com/questions/45901018/

Date: 2020-09-14 04:20:13  Source: igfitidea

Convert pandas series of lists to dataframe

python pandas dataframe

Asked by Hatshepsut

I have a Series made of lists:

import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])

and I want a DataFrame in which each list becomes a column.

None of from_items, from_records, DataFrame, or Series.to_frame seems to work.

How can I do this?

Answered by Cleb

You can use from_items like this (assuming that your lists are all of the same length):

pd.DataFrame.from_items(zip(s.index, s.values))

   0  1
0  1  4
1  2  5
2  3  6

or

pd.DataFrame.from_items(zip(s.index, s.values)).T

   0  1  2
0  1  2  3
1  4  5  6

depending on your desired output.

This can be faster than using apply (as in @Wen's answer, which, however, also works for lists of different lengths). On this two-row example the difference is roughly 1.4 to 2 times:

%timeit pd.DataFrame.from_items(zip(s.index, s.values))
1000 loops, best of 3: 669 μs per loop

%timeit s.apply(lambda x:pd.Series(x)).T
1000 loops, best of 3: 1.37 ms per loop

and

%timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
1000 loops, best of 3: 919 μs per loop

%timeit s.apply(lambda x:pd.Series(x))
1000 loops, best of 3: 1.26 ms per loop

@Hatshepsut's answer is also quite fast (and also works for lists of different lengths):

%timeit pd.DataFrame(item for item in s)
1000 loops, best of 3: 636 μs per loop

and

%timeit pd.DataFrame(item for item in s).T
1000 loops, best of 3: 884 μs per loop

The fastest solution seems to be @Abdou's answer (tested on Python 2; it also works for lists of different lengths; in Python 3, use itertools.zip_longest instead of izip_longest):

%timeit pd.DataFrame.from_records(izip_longest(*s.values))
1000 loops, best of 3: 529 μs per loop

An additional option:

pd.DataFrame(dict(zip(s.index, s.values)))

   0  1
0  1  4
1  2  5
2  3  6
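
The timings above were taken on a two-row series, so they mostly measure fixed overhead. A minimal sketch for rerunning the comparison on larger input follows; the name `big`, the row count of 2,000, and the choice of candidates are assumptions made for illustration, and the results will vary with machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 2,000 lists of length 3 (size chosen arbitrarily).
big = pd.Series([[i, i + 1, i + 2] for i in range(2_000)])

# Approaches taken from the answers on this page, one row per list.
candidates = {
    "tolist": lambda: pd.DataFrame(big.tolist()),
    "generator": lambda: pd.DataFrame(item for item in big),
    "apply": lambda: big.apply(lambda x: pd.Series(x)),
}

for name, func in candidates.items():
    # Average over a few calls; increase `number` for more stable figures.
    seconds = timeit.timeit(func, number=3)
    print(f"{name}: {seconds / 3:.4f} s per call")
```

The standard-library timeit module is used here so that the sketch also runs outside IPython, where %timeit is not available.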

Answered by Abdou

pd.DataFrame.from_records should also work, using itertools.zip_longest:

from itertools import zip_longest

pd.DataFrame.from_records(zip_longest(*s.values))

#    0  1
# 0  1  4
# 1  2  5
# 2  3  6
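
As a rough illustration of why zip_longest is useful here, the sketch below applies the same call to lists of unequal length. The series name `ragged` is made up for this example, and the iterator is wrapped in list() only to make the input explicit.

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical input: the second list is one element shorter than the first.
ragged = pd.Series([[1, 2, 3], [4, 5]])

# zip_longest pads the shorter list with None, so every row has two entries.
rows = list(zip_longest(*ragged.values))
df = pd.DataFrame.from_records(rows)

print(df)
print(df.isna())  # the padded position shows up as a missing value
```

Plain zip would instead stop at the shortest list and silently drop the trailing 3.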

Answered by Z.Webber

If the series is very long (more than 1 million elements), you can use:

s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(s.tolist())
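
One thing to watch with this approach is that s.tolist() discards the index of the series. A small sketch, assuming a hypothetical string index ["a", "b"], shows how the index can be passed back in and how to get the column-per-list layout asked for in the question:

```python
import pandas as pd

# Hypothetical series with a non-default index.
s = pd.Series([[1, 2, 3], [4, 5, 6]], index=["a", "b"])

# One row per list, keeping the original index labels.
wide = pd.DataFrame(s.tolist(), index=s.index)

# One column per list: transpose the result.
tall = wide.T

print(wide)
print(tall)
```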

Answered by Hatshepsut

Iterate over the series like this:

series = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(item for item in series)

   0  1  2
0  1  2  3
1  4  5  6

Answered by YOBEN_S

You may be looking for:

s.apply(lambda x:pd.Series(x))
   0  1  2
0  1  2  3
1  4  5  6

Or

s.apply(lambda x:pd.Series(x)).T
   0  1
0  1  4
1  2  5
2  3  6

Answered by Dataman

Note that the from_items() method used in the accepted answer is deprecated in recent pandas (deprecated since 0.23 and removed in 1.0), and the from_dict() method should be used instead. Here is how:

pd.DataFrame.from_dict(dict(zip(s.index, s.values)))

## OR  

pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T

Also note that from_dict() is the fastest approach so far:

%timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
376 μs ± 14.4 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

## OR

%timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T
487 μs ± 3.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
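
As an alternative to transposing, from_dict accepts an orient argument. The following is a minimal sketch of both orientations, with the variable names `by_column` and `by_row` chosen only for illustration; it has not been timed against the calls above.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])
mapping = dict(zip(s.index, s.values))

# Default orient="columns": each list becomes a column.
by_column = pd.DataFrame.from_dict(mapping)

# orient="index": each list becomes a row, with no explicit transpose.
by_row = pd.DataFrame.from_dict(mapping, orient="index")

print(by_column)
print(by_row)
```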

Answered by Evan Rosica

Try:

import numpy as np, pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(np.vstack(s))
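
np.vstack needs all lists to have the same length, unlike some of the other answers. One possible way to handle both cases is sketched below; the helper name `series_of_lists_to_frame` is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def series_of_lists_to_frame(s):
    """Hypothetical helper: one row per list, keeping the series index."""
    lengths = s.map(len)
    if lengths.nunique() == 1:
        # All lists have the same length, so they stack into a 2-D array.
        return pd.DataFrame(np.vstack(s.tolist()), index=s.index)
    # Ragged lists: let pandas pad the short rows with missing values.
    return pd.DataFrame(s.tolist(), index=s.index)


even = series_of_lists_to_frame(pd.Series([[1, 2, 3], [4, 5, 6]]))
ragged = series_of_lists_to_frame(pd.Series([[1, 2, 3], [4, 5]]))
print(even)
print(ragged)
```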