Python: Creating a Pandas DataFrame from a numpy array containing multiple types

Question

Asked by bfcondon

I want to create a pandas DataFrame with default values of zero, with one column of integers and the other of floats. I am able to create a numpy array with the correct types; see the values variable below. However, when I pass it to the DataFrame constructor, it returns only NaN values (see df below). I have included the untyped code, which returns an array of floats (see df2).

import pandas as pd
import numpy as np

values = np.zeros((2,3), dtype='int32,float32')
index = ['x', 'y']
columns = ['a','b','c']

df = pd.DataFrame(data=values, index=index, columns=columns)
df.values.dtype

values2 = np.zeros((2,3))
df2 = pd.DataFrame(data=values2, index=index, columns=columns)
df2.values.dtype

Any suggestions on how to construct the dataframe?

Answer 1

Accepted answer by unutbu

Here are a few options you could choose from:

import numpy as np
import pandas as pd

index = ['x', 'y']
columns = ['a','b','c']

# Option 1: Set the column names in the structured array's dtype 
dtype = [('a','int32'), ('b','float32'), ('c','float32')]
values = np.zeros(2, dtype=dtype)
df = pd.DataFrame(values, index=index)

# Option 2: Alter the structured array's column names after it has been created
values = np.zeros(2, dtype='int32, float32, float32')
values.dtype.names = columns
df2 = pd.DataFrame(values, index=index, columns=columns)

# Option 3: Alter the DataFrame's column names after it has been created
values = np.zeros(2, dtype='int32, float32, float32')
df3 = pd.DataFrame(values, index=index)
df3.columns = columns

# Option 4: Use a dict of arrays, each of the right dtype:
df4 = pd.DataFrame(
    {'a': np.zeros(2, dtype='int32'),
     'b': np.zeros(2, dtype='float32'),
     'c': np.zeros(2, dtype='float32')}, index=index, columns=columns)

# Option 5: Concatenate DataFrames of the simple dtypes
# (each piece gets the same index so the result matches the other options):
df5 = pd.concat([
    pd.DataFrame(np.zeros((2,), dtype='int32'), index=index, columns=['a']),
    pd.DataFrame(np.zeros((2,2), dtype='float32'), index=index, columns=['b','c'])], axis=1)

# Option 6: Alter the dtypes after the DataFrame has been formed. (This is not very efficient)
values2 = np.zeros((2, 3))
df6 = pd.DataFrame(values2, index=index, columns=columns)
for col, dtype in zip(df6.columns, 'int32 float32 float32'.split()):
    df6[col] = df6[col].astype(dtype)

Each of the options above produces the same result:

   a  b  c
x  0  0  0
y  0  0  0

with dtypes:

a      int32
b    float32
c    float32
dtype: object
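
As a quick way to confirm the dtypes for any of the options, here is a minimal sketch. It also shows a more compact variant of Option 6 that passes a column-to-dtype mapping to astype instead of looping; the name dtype_names is only an illustrative helper and is not part of the original answer.

```python
import numpy as np
import pandas as pd

index = ['x', 'y']
columns = ['a', 'b', 'c']

# Start from an all-float64 frame of zeros, as in Option 6.
df = pd.DataFrame(np.zeros((2, 3)), index=index, columns=columns)

# astype accepts a {column: dtype} mapping and returns a new DataFrame.
df = df.astype({'a': 'int32', 'b': 'float32', 'c': 'float32'})

# Collect the per-column dtypes as strings for easy comparison.
dtype_names = df.dtypes.astype(str).to_dict()
print(dtype_names)
```

Like the loop in Option 6, this still builds a float64 frame first and converts afterwards, so the structured-array or dict-of-arrays options are likely preferable when the data is large.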

Why pd.DataFrame(values, index=index, columns=columns) produces a DataFrame with NaNs:

values is a structured array with column names f0, f1, f2:

In [171]: values
Out[171]: 
array([(0, 0.0, 0.0), (0, 0.0, 0.0)], 
      dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '<f4')])

If you pass the argument columns=['a', 'b', 'c'] to pd.DataFrame, then Pandas will look for columns with those names in the structured array values. When those columns are not found, Pandas places NaNs in the DataFrame to represent missing values.
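
The lookup-by-name behavior can be seen in a small sketch. It assumes a one-dimensional structured array like the one used in Options 2 and 3, and a reasonably recent pandas; the names df_nan and df_ok are illustrative only.

```python
import numpy as np
import pandas as pd

# NumPy auto-generates the field names f0, f1, f2 for this dtype string.
values = np.zeros(2, dtype='int32, float32, float32')
print(values.dtype.names)

# Requested names match no field, so every cell is filled with NaN.
df_nan = pd.DataFrame(values, index=['x', 'y'], columns=['a', 'b', 'c'])

# Requested names match the existing fields, so the zeros are kept.
df_ok = pd.DataFrame(values, index=['x', 'y'], columns=['f0', 'f1', 'f2'])

print(df_nan.isna().all().all())
print(df_ok.dtypes)
```

In other words, for a structured array the columns argument selects fields rather than renaming them, which is why the options above rename either the fields or the DataFrame columns instead.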
