pandas: convert a Series of lists to a DataFrame

Question

Asked by Hatshepsut

I have a Series made of lists:


import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])

and I want a DataFrame with each list as a column.


None of from_items, from_records, DataFrame, or Series.to_frame seems to work.


How to do this?
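To see why to_frame (and the plain DataFrame constructor) does not help here, a minimal sketch using only the Series from the question: both wrap the Series as a single column whose cells are still lists, whereas handing the constructor a list of lists expands each list into its own row.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# to_frame() and DataFrame(s) both give one column; each cell still holds a list.
one_col = s.to_frame()
plain = pd.DataFrame(s)
print(one_col.shape)       # (2, 1)
print(one_col.iloc[0, 0])  # [1, 2, 3]

# A plain list of lists lets the constructor expand each list into a row.
expanded = pd.DataFrame(s.tolist())
print(expanded.shape)      # (2, 3)
```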


Answer 1

Answered by Cleb

You can use from_items like this (assuming that your lists are of the same length):


# DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0; see Answer 6
pd.DataFrame.from_items(zip(s.index, s.values))

   0  1
0  1  4
1  2  5
2  3  6

or


pd.DataFrame.from_items(zip(s.index, s.values)).T

   0  1  2
0  1  2  3
1  4  5  6

depending on your desired output.
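Because from_items no longer exists in current pandas, here is a minimal sketch of the same two layouts built with the dict-based constructor, assuming equal-length lists and Python 3.7+ (where dicts keep insertion order). The helper name lists_to_frame and its lists_as parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def lists_to_frame(s: pd.Series, lists_as: str = "columns") -> pd.DataFrame:
    """Hypothetical helper: expand a Series of equal-length lists.

    lists_as="columns": each list becomes a column named by its index label
    (the layout from_items(zip(s.index, s.values)) produced).
    lists_as="rows": each list becomes a row (the transposed layout).
    """
    # The dict keeps insertion order, so the columns follow s.index.
    wide = pd.DataFrame(dict(zip(s.index, s.values)))
    if lists_as == "columns":
        return wide
    if lists_as == "rows":
        return wide.T
    raise ValueError("lists_as must be 'columns' or 'rows'")


s = pd.Series([[1, 2, 3], [4, 5, 6]])
print(lists_to_frame(s))
print(lists_to_frame(s, "rows"))
```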


This can be much faster than using apply (as in @Wen's answer, which, however, also works for lists of different lengths):


%timeit pd.DataFrame.from_items(zip(s.index, s.values))
1000 loops, best of 3: 669 μs per loop

%timeit s.apply(lambda x:pd.Series(x)).T
1000 loops, best of 3: 1.37 ms per loop

and


%timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
1000 loops, best of 3: 919 μs per loop

%timeit s.apply(lambda x:pd.Series(x))
1000 loops, best of 3: 1.26 ms per loop

@Hatshepsut's answer is also quite fast (and also works for lists of different lengths):


%timeit pd.DataFrame(item for item in s)
1000 loops, best of 3: 636 μs per loop

and


%timeit pd.DataFrame(item for item in s).T
1000 loops, best of 3: 884 μs per loop

The fastest solution seems to be @Abdou's answer (tested with Python 2; also works for lists of different lengths; in Python 3, use itertools.zip_longest instead of izip_longest):


from itertools import izip_longest  # Python 2 only; Python 3 names it zip_longest
%timeit pd.DataFrame.from_records(izip_longest(*s.values))
1000 loops, best of 3: 529 μs per loop
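The numbers above come from IPython's %timeit magic on a two-element Series, so they mostly measure constructor overhead. A rough sketch for rerunning a similar comparison as a plain script with the standard-library timeit module follows; the candidate labels are illustrative, each candidate is written to produce the one-row-per-list layout, and absolute timings will depend on your machine and pandas version.

```python
import timeit
from itertools import zip_longest

import numpy as np
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Each candidate returns a 2 x 3 frame with one row per original list.
candidates = {
    "tolist": lambda: pd.DataFrame(s.tolist()),
    "generator": lambda: pd.DataFrame(item for item in s),
    "dict + T": lambda: pd.DataFrame(dict(zip(s.index, s.values))).T,
    "zip_longest + T": lambda: pd.DataFrame.from_records(zip_longest(*s.values)).T,
    "vstack": lambda: pd.DataFrame(np.vstack(s)),
}

N = 100  # calls per repeat
results = {}
for name, fn in candidates.items():
    # Best of 3 repeats; store seconds per single call.
    results[name] = min(timeit.repeat(fn, repeat=3, number=N)) / N
    print(f"{name:>16}: {results[name] * 1e6:8.1f} µs per call")
```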

An additional option:


pd.DataFrame(dict(zip(s.index, s.values)))

   0  1
0  1  4
1  2  5
2  3  6

Answer 2

Answered by Abdou

pd.DataFrame.from_records should also work using itertools.zip_longest:


from itertools import zip_longest

pd.DataFrame.from_records(zip_longest(*s.values))

#    0  1
# 0  1  4
# 1  2  5
# 2  3  6
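With lists of different lengths, zip_longest pads the missing positions with None (or with a fillvalue you pass), so from_records still receives equal-length tuples. A minimal sketch, assuming Python 3 and a hypothetical ragged Series:

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical ragged input: the second list is one element shorter.
ragged = pd.Series([[1, 2, 3], [4, 5]])

# zip_longest yields (1, 4), (2, 5), (3, None); each tuple becomes one row,
# so each original list ends up as a column with a missing value at the end.
df = pd.DataFrame.from_records(zip_longest(*ragged.values))
print(df)
```

Passing, for example, fillvalue=0 to zip_longest would pad with 0 instead of a missing value.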

Answer 3

Answered by Z.Webber

If the Series is very long (more than 1 million elements), you can use:


s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(s.tolist())
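Two details of the tolist() approach are worth spelling out: the constructor pads shorter lists with NaN, and the result gets a fresh 0..n-1 index unless you pass the original one. A minimal sketch with a hypothetical labelled, ragged Series:

```python
import pandas as pd

# Hypothetical Series with string labels and lists of unequal length.
s = pd.Series([[1, 2, 3], [4, 5]], index=["a", "b"])

# tolist() gives a plain list of lists; the short row is padded with NaN.
# index=s.index keeps the original labels instead of 0..n-1.
df = pd.DataFrame(s.tolist(), index=s.index)
print(df)
```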

Answer 4

Answered by Hatshepsut

Iterate over the series like this:


series = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(item for item in series)

   0  1  2
0  1  2  3
1  4  5  6

Answer 5

Answered by YOBEN_S

You may be looking for


s.apply(lambda x: pd.Series(x))

   0  1  2
0  1  2  3
1  4  5  6

Or


s.apply(lambda x: pd.Series(x)).T

   0  1
0  1  4
1  2  5
2  3  6

Answer 6

Answered by Dataman

Note that the from_items() method in the accepted answer is deprecated in recent pandas versions (it was removed in pandas 1.0), and the from_dict() method should be used instead. Here is how:


pd.DataFrame.from_dict(dict(zip(s.index, s.values)))

## OR  

pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T

Also note that using from_dict() gives us the fastest approach so far:


%timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
376 μs ± 14.4 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

## OR

%timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T
487 μs ± 3.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
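The default from_dict orientation treats each list as a column, so all lists must have the same length. A minimal sketch, assuming a hypothetical ragged input, showing that orient="index" instead treats each list as a row and pads the short ones with NaN:

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5]])  # hypothetical ragged input
d = dict(zip(s.index, s.values))

# orient="index": dict keys become row labels; short rows are padded with NaN.
rows = pd.DataFrame.from_dict(d, orient="index")
print(rows)

# The default orient="columns" needs equal-length lists and raises otherwise.
try:
    pd.DataFrame.from_dict(d)
except ValueError as exc:
    print("columns orient failed:", exc)
```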

Answer 7

Answered by Evan Rosica

Try:


import numpy as np, pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(np.vstack(s))
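np.vstack stacks the lists as rows of a 2-D array, so it only works when every list has the same length. A minimal sketch illustrating both cases (the ragged Series is a hypothetical example):

```python
import numpy as np
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])
arr = np.vstack(s)   # 2-D array, one row per list
print(arr.shape)     # (2, 3)
df = pd.DataFrame(arr)

# With unequal lengths the rows cannot be stacked, and NumPy raises ValueError.
ragged = pd.Series([[1, 2, 3], [4, 5]])
try:
    np.vstack(ragged)
except ValueError as exc:
    print("vstack needs equal-length lists:", exc)
```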
