原文地址: http://stackoverflow.com/questions/51218488/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas how to place an array in a single dataframe cell?
Asked by amadzebra
So I currently have a dataframe that looks like:
And I want to add a completely new column called "Predictors" with only one cell that contains an array.
So [0, 'Predictors'] should contain an array and everything below that cell in the same column should be empty.
Here's my attempt: I created a separate dataframe containing only the "Predictors" column and tried appending it to the current dataframe, but I get: 'Length mismatch: Expected axis has 3 elements, new values have 4 elements.'
How do I append a single cell containing an array to my dataframe?
# create a list and dataframe to hold the names of predictors
dataframe=dataframe.drop(['price','Date'],axis=1)
predictorsList = dataframe.columns.tolist()  # Index.get_values() was removed in pandas 1.0
predictorsList = np.array(predictorsList, dtype=object)
# Combine actual and forecasted lists to one dataframe
combinedResults = pd.DataFrame({'Actual': actual, 'Forecasted': forecasted})
predictorsDF = pd.DataFrame({'Predictors': [predictorsList]})
# Add Predictors to dataframe
#combinedResults.at[0, 'Predictors'] = predictorsList
pd.concat([combinedResults,predictorsDF], ignore_index=True, axis=1)
Accepted answer by Tomas Farias
You could fill the rest of the cells in the desired column with NaN, but they will not be "empty". To do that, use pd.merge on both indexes:
Setup
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185],
'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053]
})
arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})
Merging df and df_arr
result = pd.merge(
df,
df_arr,
how='left',
left_index=True, # Merge on both indexes, since right only has 0...
right_index=True # all the other rows will be NaN
)
Results
>>> print(result)
    Actual  Forecasted       Predictors
0  18.4420     19.6377  [0.0, 0.0, 0.0]
1  15.4233     13.1665              NaN
2  20.6217     19.3992              NaN
3  16.7000     17.4557              NaN
4  18.1850     14.0053              NaN
>>> result.loc[0, 'Predictors']
array([0., 0., 0.])
>>> result.loc[1, 'Predictors'] # actually contains a NaN value
nan
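As a variation on the merge above, the same left join on the index can probably be written more briefly with DataFrame.join, which joins on the index with how='left' by default. This is only a sketch under that assumption; the shortened data and the names joined and missing are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217],
    'Forecasted': [19.6377, 13.1665, 19.3992],
})
arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})  # one row, index 0

# DataFrame.join performs a left join on the index by default,
# so only row 0 receives the array; the other rows get NaN.
joined = df.join(df_arr)

# Boolean mask of the rows that hold NaN instead of an array
missing = joined['Predictors'].isna()
```

The resulting column has object dtype, so the cell at index 0 holds the array itself and the remaining cells can be detected with isna().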
Answer by Markus Dutschke
You need to change the dtype of the column (in your case Predictors) to object first
import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(20).reshape(5,4), columns=list('abcd'))
df=df.astype(object) # this line allows the assignment of the array
df.iloc[1,2] = np.array([99,99,99])
print(df)
gives
    a   b             c   d
0   0   1             2   3
1   4   5  [99, 99, 99]   7
2   8   9            10  11
3  12  13            14  15
4  16  17            18  19
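Applied to the question's layout, it may be enough to give only the new column object dtype and leave the numeric columns alone. The sketch below assumes that scalar assignment with .at stores the array as a single object in an object-dtype column; the predictor names ('open', 'volume', 'high') and the shortened data are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217],
    'Forecasted': [19.6377, 13.1665, 19.3992],
})

# Create the new column with object dtype only; the other columns stay float64.
df['Predictors'] = pd.Series([None] * len(df), index=df.index, dtype=object)

# Hypothetical predictor names, stored as one array in a single cell
predictors = np.array(['open', 'volume', 'high'], dtype=object)
df.at[0, 'Predictors'] = predictors
```

Unlike astype(object) on the whole frame, this keeps Actual and Forecasted numeric, so arithmetic and aggregation on them behave as before.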