Pandas: how to place an array in a single DataFrame cell?

Warning: this page is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/51218488/

Date: 2020-09-14 05:46:43  Source: igfitidea

Tags: python, pandas, dataframe, statistics, data-science

Question by amadzebra

So I currently have a dataframe that looks like:

[Image: current DataFrame]

And I want to add a completely new column called "Predictors" with only one cell that contains an array.

So [0, 'Predictors'] should contain an array and everything below that cell in the same column should be empty.

Here's my attempt: I created a separate dataframe containing only the "Predictors" column and tried appending it to the current dataframe, but I get: 'Length mismatch: Expected axis has 3 elements, new values have 4 elements.'

How do I append a single cell containing an array to my dataframe?

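import numpy as np
import pandas as pd

# create a list and dataframe to hold the names of predictors
dataframe = dataframe.drop(['price', 'Date'], axis=1)
# Index.get_values() was removed in pandas 1.0; Index.tolist() gives the same list
predictorsList = dataframe.columns.tolist()
predictorsList = np.array(predictorsList, dtype=object)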

# Combine actual and forecasted lists to one dataframe
combinedResults = pd.DataFrame({'Actual': actual, 'Forecasted': forecasted})

predictorsDF = pd.DataFrame({'Predictors': [predictorsList]})

# Add Predictors to dataframe
#combinedResults.at[0, 'Predictors'] = predictorsList
pd.concat([combinedResults,predictorsDF], ignore_index=True, axis=1)
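
As a hedged aside: `pd.concat` with `axis=1` already aligns rows on the index, so the attempt above is close. Two things get in the way: with `axis=1`, `ignore_index=True` relabels the columns as 0..n-1 (losing the 'Predictors' name), and the concatenated frame is never assigned back. A minimal sketch, assuming `actual` and `forecasted` are equal-length sequences; the values and predictor names below are hypothetical stand-ins:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data
actual = [18.442, 15.4233, 20.6217]
forecasted = [19.6377, 13.1665, 19.3992]
predictorsList = np.array(['bedrooms', 'sqft'], dtype=object)  # hypothetical names

combinedResults = pd.DataFrame({'Actual': actual, 'Forecasted': forecasted})
# One-row frame (index 0) whose single cell holds the whole array
predictorsDF = pd.DataFrame({'Predictors': [predictorsList]})

# Without ignore_index the column labels are kept; rows are outer-joined on
# the index, so every row except 0 gets NaN in 'Predictors'
combinedResults = pd.concat([combinedResults, predictorsDF], axis=1)
print(combinedResults)
```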

Accepted answer by Tomas Farias

You can fill the rest of the cells in the desired column with NaN, but they will not be truly "empty". To do that, use pd.merge on both indexes:

Setup

import pandas as pd
import numpy as np

df = pd.DataFrame({
     'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185], 
     'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053]
})

arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})

Merging df and df_arr

result = pd.merge(
    df,
    df_arr,
    how='left',
    left_index=True,  # merge on both indexes; the right frame only has index 0,
    right_index=True  # so all the other rows will be NaN
)

Results

>>> print(result)
    Actual  Forecasted       Predictors
0  18.4420     19.6377  [0.0, 0.0, 0.0]
1  15.4233     13.1665              NaN
2  20.6217     19.3992              NaN
3  16.7000     17.4557              NaN
4  18.1850     14.0053              NaN

>>> result.loc[0, 'Predictors']
array([0., 0., 0.])

>>> result.loc[1, 'Predictors'] # actually contains a NaN value
nan 
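
A shorter route to the same layout, offered as an alternative to the merge rather than as part of the accepted answer, and assuming `df` has the default 0..n-1 index as in the setup: assigning a one-element object Series labelled 0 to a new column makes pandas align it on the index, so every other row gets NaN. `isna()` then confirms that those cells hold NaN rather than being truly empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185],
    'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053]
})
arr = np.zeros(3)

# One-element object Series labelled 0; column assignment aligns on the
# index, so rows 1-4 of the new column are filled with NaN
df['Predictors'] = pd.Series([arr], index=[0], dtype=object)

print(df.loc[0, 'Predictors'])  # the stored array
print(df['Predictors'].isna())  # True for every row except 0
```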

Answer by Markus Dutschke

You need to change the dtype of the column (in your case Predictors) to object first:

import pandas as pd
import numpy as np


df=pd.DataFrame(np.arange(20).reshape(5,4), columns=list('abcd'))
df = df.astype(object)  # object dtype allows assigning an array to a single cell
df.iloc[1,2] = np.array([99,99,99])
print(df)

gives

    a   b             c   d
0   0   1             2   3
1   4   5  [99, 99, 99]   7
2   8   9            10  11
3  12  13            14  15
4  16  17            18  19
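
Converting the whole frame with `astype(object)` also turns the numeric columns into object columns. A narrower variant, sketched here as an alternative rather than as the answerer's code, creates only the new column with object dtype and then writes the array into one cell with `.at`, the label-based accessor the question had commented out. Using `None` as the placeholder for the other cells is an assumption; the predictor names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Actual': [18.442, 15.4233, 20.6217],
                   'Forecasted': [19.6377, 13.1665, 19.3992]})

# Broadcasting the scalar None creates an object-dtype column,
# so any cell in it can hold an arbitrary Python object
df['Predictors'] = None

# Label-based single-cell assignment (hypothetical predictor names);
# the numeric columns keep their float dtype
df.at[0, 'Predictors'] = np.array(['bedrooms', 'sqft'], dtype=object)
print(df)
print(df.dtypes)
```

Here the cells below row 0 hold `None` instead of NaN; either way they are placeholders, not truly empty cells.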