Original source: http://stackoverflow.com/questions/18062135/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Combining two Series into a DataFrame in pandas
Asked by user7289
I have two Series s1 and s2 with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep one of the indices as a third column?
Accepted answer by Andy Hayden
I think concat is a nice way to do this. If the Series have name attributes, concat uses them as the column names (otherwise it simply numbers the columns):
In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')
In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')
In [3]: pd.concat([s1, s2], axis=1)
Out[3]:
   s1  s2
A   1   3
B   2   4
In [4]: pd.concat([s1, s2], axis=1).reset_index()
Out[4]:
  index  s1  s2
0     A   1   3
1     B   2   4
Note: This extends to more than 2 Series.
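As a minimal sketch of the note above, the same call can take a longer list. The third Series, s3, is a hypothetical addition for illustration and is not part of the original answer. The last lines show the numbering behaviour for Series without a name.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["A", "B"], name="s1")
s2 = pd.Series([3, 4], index=["A", "B"], name="s2")
s3 = pd.Series([5, 6], index=["A", "B"], name="s3")  # hypothetical third Series

# Any number of Series can be passed in the list; each becomes one column.
combined = pd.concat([s1, s2, s3], axis=1).reset_index()
print(combined)

# Series without a name get integer column labels instead.
unnamed = pd.concat([pd.Series([1, 2]), pd.Series([3, 4])], axis=1)
print(list(unnamed.columns))
```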
Answer by jbn
Example code:
import pandas as pd
a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9])
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])
# 'idx_col' copies the shared index into an ordinary column
data = pd.DataFrame({'a': a, 'b': b, 'idx_col': a.index})
Pandas allows you to create a DataFrame from a dict with Series as the values and the column names as the keys. When it finds a Series as a value, it uses the Series index as part of the DataFrame index. This data alignment is one of the main perks of Pandas. Consequently, unless you have other needs, the freshly created DataFrame contains duplicated values: in the above example, data['idx_col'] holds the same data as data.index.
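To make the alignment concrete, here is a small sketch with two Series whose indices only partly overlap. These values are assumptions chosen for illustration and differ from the answer's example, where both indices are identical. Labels present in only one Series should end up as NaN in the other column.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[7, 2, 8])
b = pd.Series([5, 6, 7], index=[2, 8, 9])  # overlaps a only on labels 2 and 8

# The DataFrame index is the union of both indices; values are matched by label.
data = pd.DataFrame({"a": a, "b": b})
print(data)

# Label 7 exists only in a, label 9 only in b, so the other column is NaN there.
print(data.isna())
```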
Answer by Bertrand L
Not sure I fully understand your question, but is this what you want to do?
pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
(index=s1.index is not even necessary here)
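A quick way to check that remark, sketched with two small example Series (the values are assumptions for illustration): when both Series already share the same index, passing index=s1.index should make no difference.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["A", "B"])
s2 = pd.Series([3, 4], index=["A", "B"])

with_index = pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
without_index = pd.DataFrame(data=dict(s1=s1, s2=s2))

# Both frames have the same labels, values, and dtypes.
print(with_index.equals(without_index))
```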
Answer by Jeff
Pandas will automatically align the passed-in Series and create the joint index. The indices happen to be the same here. reset_index moves the index to a column.
In [1]: from numpy.random import randn; from pandas import DataFrame, Series
In [2]: s1 = Series(randn(5),index=[1,2,4,5,6])
In [4]: s2 = Series(randn(5),index=[1,2,4,5,6])
In [8]: DataFrame(dict(s1 = s1, s2 = s2)).reset_index()
Out[8]:
   index        s1        s2
0      1 -0.176143  0.128635
1      2 -1.286470  0.908497
2      4 -0.995881  0.528050
3      5  0.402241  0.458870
4      6  0.380457  0.072251
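If the default column label "index" is not wanted, one option is to name the index before resetting it. This is a sketch only: the label "idx" and the seeded random generator are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

labels = [1, 2, 4, 5, 6]
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
s1 = pd.Series(rng.standard_normal(5), index=labels)
s2 = pd.Series(rng.standard_normal(5), index=labels)

# rename_axis names the index; reset_index then uses that name for the new column.
df = pd.DataFrame({"s1": s1, "s2": s2}).rename_axis("idx").reset_index()
print(df)
```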
Answer by swmfg
Why don't you just use .to_frame if both have the same indexes?
>= v0.23
a.to_frame().join(b)
< v0.23
a.to_frame().join(b.to_frame())
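One caveat worth sketching: DataFrame.join only accepts a Series that has a name, because the name becomes the column label. The example below assumes a and b carry the names "a" and "b", which the answer does not state.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9], name="a")
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9], name="b")

# Named Series: the names become the column labels.
df = a.to_frame().join(b)
print(df)

# An unnamed Series cannot be joined directly and raises a ValueError.
unnamed = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])
join_failed = False
try:
    a.to_frame().join(unnamed)
except ValueError as err:
    join_failed = True
    print("join failed:", err)
```

Giving the Series a name first, or converting it with to_frame(name=...), avoids the error.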
Answer by Lorenzo A. Rossi
A simplification of the solution based on join():
df = a.to_frame().join(b)
Answer by Sateesh
If I may answer this.
The key to converting Series into a DataFrame is to understand that:
1. At a conceptual level, every column in a DataFrame is a Series.
2. Every column name is a key that maps to a Series.
If you keep these two concepts in mind, you can think of many ways to convert Series into a DataFrame. One easy solution looks like this:
Create two Series:
import pandas as pd
series_1 = pd.Series(list(range(10)))
series_2 = pd.Series(list(range(20,30)))
Create an empty DataFrame with just the desired column names:
df = pd.DataFrame(columns=['Column_name#1', 'Column_name#2'])
Put the Series values into the DataFrame using the mapping concept:
df['Column_name#1'] = series_1
df['Column_name#2'] = series_2
Check the results:
df.head(5)
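The same mapping idea can also be written in a single step by passing a dict of column name to Series directly to the constructor. This is a sketch of an alternative, not the answer's own method; it reuses the answer's names.

```python
import pandas as pd

series_1 = pd.Series(list(range(10)))
series_2 = pd.Series(list(range(20, 30)))

# Each dict key becomes a column name; each Series becomes that column.
df = pd.DataFrame({"Column_name#1": series_1, "Column_name#2": series_2})
print(df.head(5))
```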
Answer by Golden Lion
I used pandas to convert my NumPy array or Series to a DataFrame, then added an additional column under the key 'prediction'. If you need the DataFrame converted back to a list, use values.tolist().
output = pd.DataFrame(X_test)
output['prediction'] = y_pred
rows = output.values.tolist()  # named "rows" to avoid shadowing the built-in list
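The snippet above depends on X_test and y_pred, which the answer does not define. A self-contained sketch, assuming both are small NumPy arrays (the values below are stand-ins, not real model output):

```python
import numpy as np
import pandas as pd

# Stand-in data; in the answer these presumably come from a model workflow.
X_test = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_pred = np.array([0, 1, 0])

output = pd.DataFrame(X_test)   # feature columns are labelled 0 and 1
output["prediction"] = y_pred   # add the predictions as a new column
rows = output.values.tolist()   # one inner list per DataFrame row
print(rows)
```

Note that values returns a single array, so mixed integer and float columns are upcast to a common dtype before the conversion to lists.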