pandas: How to place an array in a single dataframe cell?

Question

Asked by amadzebra

So I currently have a dataframe that looks like:

[Image: Current Dataframe]

And I want to add a completely new column called "Predictors" with only one cell that contains an array.

So [0, 'Predictors'] should contain an array and everything below that cell in the same column should be empty.

Here's my attempt: I created a separate dataframe containing only the "Predictors" column and tried appending it to the current dataframe, but I get: 'Length mismatch: Expected axis has 3 elements, new values have 4 elements.'

How do I append a single cell containing an array to my dataframe?

# create a list and dataframe to hold the names of predictors
dataframe = dataframe.drop(['price', 'Date'], axis=1)
predictorsList = dataframe.columns.tolist()  # Index.get_values() was removed in pandas 1.0
predictorsList = np.array(predictorsList, dtype=object)

# Combine actual and forecasted lists to one dataframe
combinedResults = pd.DataFrame({'Actual': actual, 'Forecasted': forecasted})

predictorsDF = pd.DataFrame({'Predictors': [predictorsList]})

# Add Predictors to dataframe
#combinedResults.at[0, 'Predictors'] = predictorsList
pd.concat([combinedResults,predictorsDF], ignore_index=True, axis=1)

Answer 1

Accepted answer by Tomas Farias

You could fill the rest of the cells in the desired column with NaN, but they will not be "empty". To do that, use pd.merge on both indexes:

Setup

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185],
    'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053]
})

arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})

Merging df and df_arr

result = pd.merge(
    df,
    df_arr,
    how='left',
    left_index=True, # Merge on both indexes, since right only has 0...
    right_index=True # all the other rows will be NaN
)

Results

>>> print(result)
    Actual  Forecasted       Predictors
0  18.4420     19.6377  [0.0, 0.0, 0.0]
1  15.4233     13.1665              NaN
2  20.6217     19.3992              NaN
3  16.7000     17.4557              NaN
4  18.1850     14.0053              NaN

>>> result.loc[0, 'Predictors']
array([0., 0., 0.])

>>> result.loc[1, 'Predictors'] # actually contains a NaN value
nan
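
The merge above works because pandas aligns rows on the index. As a minimal sketch of the same idea without pd.merge (not the answer author's method, and using a shortened three-row frame chosen here for illustration), a one-element Series of object dtype can be assigned as a new column. The name predictors is an assumption introduced for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217],
    'Forecasted': [19.6377, 13.1665, 19.3992],
})
arr = np.zeros(3)

# One-element Series of object dtype, labelled with index 0 only.
predictors = pd.Series([arr], index=[0], dtype=object)

# Column assignment aligns on the index, so rows 1 and 2 are filled with NaN.
df['Predictors'] = predictors
```

As with the merge, the remaining cells hold NaN rather than being truly empty.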

Answer 2

Answered by Markus Dutschke

You first need to change the dtype of the column (in your case Predictors) to object:

import pandas as pd
import numpy as np


df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('abcd'))
df = df.astype(object)  # this line allows the assignment of the array
df.iloc[1, 2] = np.array([99, 99, 99])
print(df)

gives:

    a   b             c   d
0   0   1             2   3
1   4   5  [99, 99, 99]   7
2   8   9            10  11
3  12  13            14  15
4  16  17            18  19
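
If converting the whole frame to object is undesirable, a variation is to convert only the target column. The following is a sketch under that assumption, not part of the original answer; it uses .at for the single-cell assignment and leaves the other columns with their integer dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('abcd'))

# Convert only column 'c' to object dtype; the other columns stay integer.
df['c'] = df['c'].astype(object)

# .at addresses exactly one cell, so the array is stored as a single object.
df.at[1, 'c'] = np.array([99, 99, 99])
```

Keeping the numeric columns numeric means arithmetic and aggregation on them stay fast, while only column c pays the cost of object storage.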
