Python: combining two Series into a DataFrame in pandas

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18062135/

Time: 2020-08-19 09:50:01  Source: igfitidea

Combining two Series into a DataFrame in pandas

Tags: python, pandas, series, dataframe

Asked by user7289

I have two Series, s1 and s2, with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep one of the indices as a third column?

Accepted answer by Andy Hayden

I think concat is a nice way to do this. If the Series have name attributes, it uses them as the column names (otherwise it simply numbers the columns):

In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')

In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')

In [3]: pd.concat([s1, s2], axis=1)
Out[3]:
   s1  s2
A   1   3
B   2   4

In [4]: pd.concat([s1, s2], axis=1).reset_index()
Out[4]:
  index  s1  s2
0     A   1   3
1     B   2   4

Note: This extends to more than two Series.
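As a minimal sketch of the note above, the same concat call can take three Series. The third Series (s3) and the use of keys= to label the columns are additions for illustration and are not part of the original answer; keys= is handy when some Series have no name.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["A", "B"], name="s1")
s2 = pd.Series([3, 4], index=["A", "B"], name="s2")
s3 = pd.Series([5, 6], index=["A", "B"])  # illustrative Series with no name

# keys= supplies the column labels, so the unnamed Series gets one too
df = pd.concat([s1, s2, s3], axis=1, keys=["s1", "s2", "s3"])

# reset_index() moves the shared index into an ordinary column called "index"
result = df.reset_index()
print(result)
```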

Answer by jbn

Example code:

a = pd.Series([1,2,3,4], index=[7,2,8,9])
b = pd.Series([5,6,7,8], index=[7,2,8,9])
data = pd.DataFrame({'a': a,'b':b, 'idx_col':a.index})

Pandas allows you to create a DataFrame from a dict with Series as the values and the column names as the keys. When it finds a Series as a value, it uses the Series index as part of the DataFrame index. This data alignment is one of the main perks of Pandas. Consequently, unless you have other needs, the freshly created DataFrame contains duplicated data: in the above example, data['idx_col'] has the same data as data.index.
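To make the alignment behaviour described above concrete, here is a small sketch in which the two Series only partly share their index. The values and index labels are made up for illustration; the question itself assumes identical indices, in which case no gaps appear.

```python
import pandas as pd

# Two Series whose indices overlap only on the labels 2 and 8
a = pd.Series([1, 2, 3], index=[7, 2, 8])
b = pd.Series([5, 6, 7], index=[2, 8, 9])

# The dict constructor aligns on index labels, not on position;
# labels missing from one Series are filled with NaN
df = pd.DataFrame({"a": a, "b": b})
print(df)
```

Because values are matched by label, row 2 holds a=2 and b=5 even though those values sit at different positions in the two Series.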

Answer by Bertrand L

Not sure I fully understand your question, but is this what you want to do?

pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)

(index=s1.index is not even necessary here)

Answer by Jeff

Pandas will automatically align the passed-in Series and create the joint index; the indices happen to be the same here. reset_index moves the index to a column.

In [1]: from pandas import Series, DataFrame; from numpy.random import randn

In [2]: s1 = Series(randn(5),index=[1,2,4,5,6])

In [4]: s2 = Series(randn(5),index=[1,2,4,5,6])

In [8]: DataFrame(dict(s1 = s1, s2 = s2)).reset_index()
Out[8]: 
   index        s1        s2
0      1 -0.176143  0.128635
1      2 -1.286470  0.908497
2      4 -0.995881  0.528050
3      5  0.402241  0.458870
4      6  0.380457  0.072251
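If the default column label "index" is not what you want, one option is to name the index before resetting it. The sketch below assumes a seeded NumPy generator in place of randn (so it is reproducible) and uses "idx" as a purely illustrative column name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
s1 = pd.Series(rng.standard_normal(5), index=[1, 2, 4, 5, 6])
s2 = pd.Series(rng.standard_normal(5), index=[1, 2, 4, 5, 6])

# rename_axis() names the index, so reset_index() uses that name
# for the new column instead of the default "index"
df = pd.DataFrame({"s1": s1, "s2": s2}).rename_axis("idx").reset_index()
print(df)
```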

Answer by swmfg

Why don't you just use .to_frame() if both have the same index?

For pandas >= v0.23:

a.to_frame().join(b)

For pandas < v0.23:

a.to_frame().join(b.to_frame())

Answer by Lorenzo A. Rossi

A simplification of the join()-based solution:

df = a.to_frame().join(b)
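One detail worth keeping in mind with the join-based approach: join() needs the Series passed to it to have a name, since that name becomes the column label. The following is a rough sketch with made-up data; the names "a", "b", "c" and "d" are illustrative only.

```python
import pandas as pd

# Named Series: the names become the column labels
a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9], name="a")
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9], name="b")
df = a.to_frame().join(b)

# Unnamed Series: supply the labels explicitly, via to_frame(name)
# for the left side and rename(name) for the right side
c = pd.Series([1, 2], index=[7, 2])
d = pd.Series([3, 4], index=[7, 2])
df2 = c.to_frame("c").join(d.rename("d"))

print(df)
print(df2)
```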

Answer by Sateesh

If I may answer this.

The key to converting Series to a DataFrame is to understand that:

1. At a conceptual level, every column in a DataFrame is a Series.

2. Every column name is a key that maps to a Series.

If you keep the above two concepts in mind, you can think of many ways to convert Series to a DataFrame. One easy solution looks like this:

Create two Series:

import pandas as pd

series_1 = pd.Series(list(range(10)))

series_2 = pd.Series(list(range(20,30)))

Create an empty DataFrame with just the desired column names:

df = pd.DataFrame(columns = ['Column_name#1', 'Column_name#2'])

Put the Series values into the DataFrame using the mapping concept:

df['Column_name#1'] = series_1

df['Column_name#2'] = series_2

Check the results:

df.head(5)
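The same "column name maps to a Series" idea can also be written in a single step by passing a dict to the DataFrame constructor. This is only a sketch of an equivalent formulation, reusing the Series and column names from this answer; it is not the approach the answer itself takes.

```python
import pandas as pd

series_1 = pd.Series(list(range(10)))
series_2 = pd.Series(list(range(20, 30)))

# Each dict key becomes a column name and each Series becomes that column
df = pd.DataFrame({"Column_name#1": series_1, "Column_name#2": series_2})
print(df.head(5))
```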

Answer by Golden Lion

I used pandas to convert my NumPy array or Series to a DataFrame, then added the additional column under the key 'prediction'. If you need the DataFrame converted back to a list, use values.tolist().

output=pd.DataFrame(X_test)
output['prediction']=y_pred

rows=output.values.tolist()  # named rows to avoid shadowing the built-in list
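For a self-contained version of the snippet above, the sketch below substitutes small made-up arrays for X_test and y_pred, which the answer does not define. Note that a NumPy array is assigned to the new column by position, whereas a Series would be aligned on its index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the answer's X_test and y_pred
X_test = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([0, 1])

output = pd.DataFrame(X_test)   # columns are numbered 0, 1
output["prediction"] = y_pred   # array values are assigned positionally

# Convert the DataFrame back to a list of rows
rows = output.values.tolist()
print(rows)
```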