Create pandas dataframe from numpy array
Warning: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/50518158/
Asked by blue-sky
To create a pandas DataFrame from a NumPy array, I can use:
import numpy as np
import pandas as pd
columns = ['1', '2']
data = np.array([[1, 2], [1, 5], [2, 3]])  # each inner list is one row
df_1 = pd.DataFrame(data, columns=columns)
df_1
If I instead use:
columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])
df_1 = pd.DataFrame(data, columns=columns)
df_1
Here each inner array is meant to be a column of data, but this throws an error:
ValueError: Wrong number of items passed 3, placement implies 2
Does pandas support this data format, or must I use the format in example 1?
Answer by jpp
You need to transpose your NumPy array:
df_1 = pd.DataFrame(data.T, columns=columns)
To see why this is necessary, consider the shape of your array:
print(data.shape)
(2, 3)
The second number in the shape tuple, or the number of columns in the array, must be equal to the number of columns in your dataframe.
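A minimal sketch of that rule, using the array from the question. `fits` is a hypothetical helper introduced only for illustration. Because the exact `ValueError` message differs between pandas versions, the sketch only checks whether one is raised.

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])  # shape (2, 3): 2 rows, 3 columns


def fits(arr, cols):
    """Hypothetical helper: True if pd.DataFrame accepts arr with these column labels."""
    try:
        pd.DataFrame(arr, columns=cols)
        return True
    except ValueError:
        return False


print(data.shape[1], len(columns))  # 3 2 -> array columns do not match the labels
print(fits(data, columns))          # False
print(fits(data.T, columns))        # True: data.T has shape (3, 2)
```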
When we transpose the array, both its data and its shape are transposed, so it can be passed into a DataFrame with two columns:
print(data.T.shape)
(3, 2)
print(data.T)
[[1 1]
 [2 5]
 [2 3]]
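Putting this answer together as one self-contained snippet (imports added, data taken from the question), a sketch of what the transposed DataFrame contains:

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
# Each inner list holds the values of one intended column.
data = np.array([[1, 2, 2], [1, 5, 3]])

# Transposing turns the (2, 3) array into a (3, 2) array,
# so each original inner list becomes a column of the DataFrame.
df_1 = pd.DataFrame(data.T, columns=columns)

print(df_1.shape)          # (3, 2)
print(df_1['1'].tolist())  # [1, 2, 2]
print(df_1['2'].tolist())  # [1, 5, 3]
```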
Answer by Lzkatz
When a DataFrame is built from a 2-D array, each inner array always becomes a row.
Either way, you need to transpose something.
One option is to pass index=columns and then transpose the whole DataFrame. This gives the same output.
columns = ['1','2']
data = np.array([[1,2,2] , [1,5,3]])
df_1 = pd.DataFrame(data, index=columns).T
df_1
Passing in data.T as mentioned above is also perfectly acceptable (assuming the data is an ndarray type).
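A quick check, assuming the question's `columns` and `data`, that transposing the DataFrame gives the same result as transposing the array first. `DataFrame.equals` compares the axis labels, values, and dtypes of the two frames.

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])

# Use the labels as row labels first, then transpose the DataFrame
df_a = pd.DataFrame(data, index=columns).T
# Transpose the NumPy array before building the DataFrame
df_b = pd.DataFrame(data.T, columns=columns)

print(df_a.equals(df_b))  # True
```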
Answer by llllllllll
In the second case, you can use:
df_1 = pd.DataFrame(dict(zip(columns, data)))
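This works because iterating over a 2-D NumPy array yields its rows, so `zip(columns, data)` pairs each label with one inner array, and a dict of equal-length arrays becomes one column per key. A small sketch, assuming the question's `columns` and `data` (the name `mapping` is just for illustration), confirming it matches the transposed version:

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])

# Iterating a 2-D array yields its rows: array([1, 2, 2]), then array([1, 5, 3])
mapping = dict(zip(columns, data))
df_1 = pd.DataFrame(mapping)  # each dict key becomes a column

print(df_1['2'].tolist())  # [1, 5, 3]
print(df_1.equals(pd.DataFrame(data.T, columns=columns)))  # True
```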