Python Pandas DataFrame creation

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/46562479/

Date: 2020-09-14 04:34:55  Source: igfitidea


Tags: python-2.7, pandas, numpy, dataframe

Asked by Sarvagya Dubey

I tried to create a DataFrame df using the code below:

import numpy as np
import pandas as pd

index = [0, 1, 2, 3, 4, 5]
s = pd.Series([1, 2, 3, 4, 5, 6], index=index)
t = pd.Series([2, 4, 6, 8, 10, 12], index=index)
df = pd.DataFrame(s, columns=["MUL1"])  # one-column frame built from s
df["MUL2"] = t                          # add t as a second column, aligned on the index

print(df)


   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12

When I try to create the same DataFrame using the syntax below, I get a weird output.

df = pd.DataFrame([s, t], columns=["MUL1", "MUL2"])

print(df)

   MUL1  MUL2
0   NaN   NaN
1   NaN   NaN

Please explain why NaN is displayed in the DataFrame when both Series are non-empty, and why only two rows are displayed and not the rest.

Also, please show the correct way to create the same DataFrame as above using the columns argument of the pandas DataFrame constructor.

Accepted answer by Divakar

One correct way is to stack the array data of the Series in the input list as columns:

In [161]: pd.DataFrame(np.c_[s,t],columns = ["MUL1","MUL2"])
Out[161]: 
   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12

Behind the scenes, the stacking creates a 2D array, which is then converted to a DataFrame. Here is what the stacked array looks like:

In [162]: np.c_[s,t]
Out[162]: 
array([[ 1,  2],
       [ 2,  4],
       [ 3,  6],
       [ 4,  8],
       [ 5, 10],
       [ 6, 12]])
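
One point worth keeping in mind with the stacking approach: np.c_ works on the underlying values, so the index of the original Series is not carried into the array. With the default 0..5 index used in the question this makes no difference, but with other labels it would. The sketch below assumes a hypothetical label list (`labels`, not part of the original question) to show the effect and one way to pass the index back in:

```python
import numpy as np
import pandas as pd

# Hypothetical labels, used only to make the index visible in the result.
labels = ["a", "b", "c", "d", "e", "f"]
s = pd.Series([1, 2, 3, 4, 5, 6], index=labels)
t = pd.Series([2, 4, 6, 8, 10, 12], index=labels)

# Plain 2D NumPy array of shape (6, 2); the labels are not part of it.
stacked = np.c_[s, t]

# Without index=..., the DataFrame gets a default 0..5 index.
df_default = pd.DataFrame(stacked, columns=["MUL1", "MUL2"])

# Passing the original index restores the labels.
df_labeled = pd.DataFrame(stacked, columns=["MUL1", "MUL2"], index=s.index)

print(df_default.index.tolist())
print(df_labeled.index.tolist())
```

If the Series share a meaningful index, passing `index=s.index` (or using one of the index-aware approaches in the other answer) keeps it.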

Answer by jezrael

If you remove the columns argument, you get:

df = pd.DataFrame([s,t])

print (df)
   0  1  2  3   4   5
0  1  2  3  4   5   6
1  2  4  6  8  10  12

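Each Series in the list becomes one row, and the Series index (0 to 5) becomes the column labels. The columns argument then selects from those existing labels, so any name that does not exist among them produces a column of NaNs: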

df = pd.DataFrame([s,t], columns=[0,'MUL2'])

print (df)
     0  MUL2
0  1.0   NaN
1  2.0   NaN
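
To tie this back to the question, here is a small sketch of the same behavior with the original column names, plus one possible workaround (transposing and then naming the columns). The variable names `wide` and `nan_df` are only illustrative:

```python
import pandas as pd

index = [0, 1, 2, 3, 4, 5]
s = pd.Series([1, 2, 3, 4, 5, 6], index=index)
t = pd.Series([2, 4, 6, 8, 10, 12], index=index)

# Each Series becomes one ROW; its index labels (0..5) become the column labels.
wide = pd.DataFrame([s, t])
print(wide.shape)

# "MUL1" and "MUL2" are not among the labels 0..5, so both columns are all NaN,
# and there are only two rows because the list holds two Series.
nan_df = pd.DataFrame([s, t], columns=["MUL1", "MUL2"])
print(nan_df)

# Transposing turns the two rows into two columns, which can then be named.
df = wide.T
df.columns = ["MUL1", "MUL2"]
print(df)
```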


A better approach is to use a dictionary:

df = pd.DataFrame({'MUL1':s,'MUL2':t})

print (df)
   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12
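
A detail that may be useful here: when a DataFrame is built from a dictionary of Series, the Series are aligned on their index labels rather than by position. The sketch below uses two hypothetical Series (`a` and `b`, not from the question) whose indexes only partly overlap to illustrate this:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])  # hypothetical, shifted index

# The result uses the union of both indexes; labels missing from one
# Series are filled with NaN in that Series' column.
df = pd.DataFrame({"A": a, "B": b})
print(df)
```

In the question both Series share the same index, so no NaN appears and the columns line up row by row.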

If you need to change the column order, add the columns parameter:

df = pd.DataFrame({'MUL1':s,'MUL2':t}, columns=['MUL2','MUL1'])

print (df)
   MUL2  MUL1
0     2     1
1     4     2
2     6     3
3     8     4
4    10     5
5    12     6
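
With a dictionary, the columns parameter selects among the dictionary keys as well as ordering them. As a rough illustration, the sketch below requests a hypothetical key "MUL3" that is not in the dictionary; that column comes back filled with NaN, and "MUL1" is left out because it was not requested:

```python
import pandas as pd

index = [0, 1, 2, 3, 4, 5]
s = pd.Series([1, 2, 3, 4, 5, 6], index=index)
t = pd.Series([2, 4, 6, 8, 10, 12], index=index)

# "MUL3" is a made-up name used only to show the missing-key behavior.
df = pd.DataFrame({"MUL1": s, "MUL2": t}, columns=["MUL2", "MUL3"])
print(df)
```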

More information is in the DataFrame documentation.

Another solution uses concat, in which case the DataFrame constructor is not necessary:

df = pd.concat([s,t], axis=1, keys=['MUL1','MUL2'])

print (df)
   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12
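
As a variation on the concat approach, the column names can also come from the Series themselves instead of the keys argument. A minimal sketch, assuming the Series are given names with rename first (the names `s_named` and `t_named` are illustrative):

```python
import pandas as pd

index = [0, 1, 2, 3, 4, 5]
s = pd.Series([1, 2, 3, 4, 5, 6], index=index)
t = pd.Series([2, 4, 6, 8, 10, 12], index=index)

# Give each Series a name; concat along axis=1 uses those names as column labels.
s_named = s.rename("MUL1")
t_named = t.rename("MUL2")

df = pd.concat([s_named, t_named], axis=1)
print(df)
```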