Original source: http://stackoverflow.com/questions/51218488/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas how to place an array in a single dataframe cell?
Asked by amadzebra
So I currently have a dataframe that looks like:


And I want to add a completely new column called "Predictors" with only one cell that contains an array.
So [0, 'Predictors'] should contain an array and everything below that cell in the same column should be empty.
Here's my attempt: I created a separate dataframe containing only the "Predictors" column and tried appending it to the current dataframe, but I get: 'Length mismatch: Expected axis has 3 elements, new values have 4 elements.'
How do I append a single cell containing an array to my dataframe?
# create a list and dataframe to hold the names of predictors
dataframe=dataframe.drop(['price','Date'],axis=1)
predictorsList = dataframe.columns.get_values().tolist()
predictorsList = np.array(predictorsList, dtype=object)
# Combine actual and forecasted lists to one dataframe
combinedResults = pd.DataFrame({'Actual': actual, 'Forecasted': forecasted})
predictorsDF = pd.DataFrame({'Predictors': [predictorsList]})
# Add Predictors to dataframe
#combinedResults.at[0, 'Predictors'] = predictorsList
pd.concat([combinedResults,predictorsDF], ignore_index=True, axis=1)
Accepted answer by Tomas Farias
You could fill the rest of the cells in the desired column with NaN, but they will not be "empty". To do that, use pd.merge on both indexes:
Setup
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185],
    'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053]
})
arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})
Merging df and df_arr
result = pd.merge(
    df,
    df_arr,
    how='left',
    left_index=True,  # Merge on both indexes; since right only has index 0...
    right_index=True  # ...all the other rows will be NaN
)
Results
>>> print(result)
    Actual  Forecasted       Predictors
0  18.4420     19.6377  [0.0, 0.0, 0.0]
1  15.4233     13.1665              NaN
2  20.6217     19.3992              NaN
3  16.7000     17.4557              NaN
4  18.1850     14.0053              NaN
>>> result.loc[0, 'Predictors']
array([0., 0., 0.])
>>> result.loc[1, 'Predictors'] # actually contains a NaN value
nan
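As a possibly shorter alternative (not part of the original answer), the same index-aligned result can be sketched with DataFrame.join, which joins on the index and defaults to a left join. The data is copied from the setup above; the names joined and missing are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185],
    'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053],
})
arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})

# join() aligns on the index and defaults to how='left',
# so only row 0 receives the array; the other rows become NaN.
joined = df.join(df_arr)

# isna() treats the array cell as a non-missing object,
# which makes it easy to find the one populated cell.
missing = joined['Predictors'].isna()
print(joined)
print(missing.tolist())
```

Either way, the remaining cells hold NaN rather than being truly empty, so downstream code should check for missing values with isna() or notna() before using a cell as an array.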
Answer by Markus Dutschke
You need to change the dtype of the column (in your case Predictors) to object first
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('abcd'))
df = df.astype(object)  # this line allows the assignment of the array
df.iloc[1, 2] = np.array([99, 99, 99])
print(df)
gives
    a   b             c   d
0   0   1             2   3
1   4   5  [99, 99, 99]   7
2   8   9            10  11
3  12  13            14  15
4  16  17            18  19
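Applied to the question's case, the same idea might look like the sketch below: create the new column with object dtype first, then assign the array to a single cell with .at (the call that was commented out in the question). The numbers and predictor names here are made up for illustration, and behavior may differ across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's combinedResults dataframe.
df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217],
    'Forecasted': [19.6377, 13.1665, 19.3992],
})

# Hypothetical predictor names, stored as an object array as in the question.
predictors = np.array(['open', 'volume', 'high'], dtype=object)

# Create the column as object dtype up front, filled with None,
# so a single cell is allowed to hold an arbitrary Python object.
df['Predictors'] = pd.Series([None] * len(df), index=df.index, dtype=object)

# .at addresses exactly one cell by label, so the whole array lands in row 0.
df.at[0, 'Predictors'] = predictors

print(df)
```

Creating the object column explicitly avoids converting the numeric columns to object, which df.astype(object) on the whole dataframe would do.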

