Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/47338203/

Time: 2020-09-14 04:46:48  Source: igfitidea

Pandas: append row to DataFrame with multiindex in columns

Tags: python, pandas, dataframe, data-structures

Asked by lhk

I have a DataFrame with a multiindex in the columns and would like to use dictionaries to append new rows.

Let's say that each row in the DataFrame is a city. The columns contain the levels "distance" and "vehicle", and each cell is the percentage of the population that chooses this vehicle for this distance.

I'm constructing an index like this:

import pandas as pd

index_tuples = []

for distance in ["near", "far"]:
    for vehicle in ["bike", "car"]:
        index_tuples.append((distance, vehicle))

index = pd.MultiIndex.from_tuples(index_tuples, names=["distance", "vehicle"])

Then I'm creating a dataframe:

dataframe = pd.DataFrame(index=["city"], columns = index)

The structure of the dataframe looks good, although pandas has filled it with NaN as the default values.

[Image: layout of the dataframe]
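Since the image is not available here, the layout can be reproduced with a minimal sketch of the setup described so far. It uses MultiIndex.from_product as a shorter equivalent of the loop in the question (an assumption of this sketch, not the asker's code) and checks that every cell starts out as NaN.

```python
import pandas as pd

# Column MultiIndex: every (distance, vehicle) combination, in the same
# order as the loop in the question.
index = pd.MultiIndex.from_product(
    [["near", "far"], ["bike", "car"]], names=["distance", "vehicle"]
)

# One placeholder row labelled "city"; no data is supplied, so pandas
# fills every cell with NaN.
dataframe = pd.DataFrame(index=["city"], columns=index)

print(dataframe)                      # two header rows: distance, vehicle
print(dataframe.columns.nlevels)      # number of column levels
print(dataframe.isna().all().all())   # whether every cell is missing
```

The NaN values are only placeholders for data that was never provided; they do not indicate a problem with the index itself.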

Now I would like to set up a dictionary for the new city and add it:

my_home_city = {"near":{"bike":1, "car":0},"far":{"bike":0, "car":1}}
dataframe["my_home_city"] = my_home_city

But this fails:

ValueError: Length of values does not match length of index

Here is the complete error message (pastebin).

UPDATE:

Thank you for all the good answers. I'm afraid I've oversimplified the problem in my example. Actually my index is nested with 3 levels (and it could become more).

So I've accepted the universal answer of converting my dictionary into a list of tuples. This might not be as clean as the other approaches but works for any multiindex setup.
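For a nesting depth that is not known in advance, the conversion to tuple keys can be sketched with a small recursive helper. The helper name flatten, the third "weather" level, and all numbers below are hypothetical and only serve as an illustration.

```python
import pandas as pd

def flatten(nested, prefix=()):
    """Turn a nested dict into {tuple_of_keys: leaf_value}, at any depth."""
    flat = {}
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level and carry the keys collected so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# Hypothetical 3-level data: distance -> vehicle -> weather (made-up values).
my_home_city = {
    "near": {"bike": {"sun": 5, "rain": 1}, "car": {"sun": 2, "rain": 3}},
    "far": {"bike": {"sun": 1, "rain": 0}, "car": {"sun": 4, "rain": 6}},
}

flat = flatten(my_home_city)

# A Series built from a dict with tuple keys gets a MultiIndex.
row = pd.Series(flat, name="my_home_city")
print(row)
```

The same helper works for two levels, so it can stand in for the dict comprehension whenever the number of levels may grow.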

Accepted answer by YOBEN_S

A MultiIndex is a list of tuples, so we just need to modify your dict so that its keys are tuples; then we can assign the values directly.

d = {(x,y):my_home_city[x][y] for x in my_home_city for y in my_home_city[x]}
df.loc['my_home_city',:]=d
df
Out[994]: 
distance     near       far     
vehicle      bike  car bike  car
city          NaN  NaN  NaN  NaN
my_home_city    1    0    0    1

More Info

d
Out[995]: 
{('far', 'bike'): 0,
 ('far', 'car'): 1,
 ('near', 'bike'): 1,
 ('near', 'car'): 0}

df.columns.values
Out[996]: array([('near', 'bike'), ('near', 'car'), ('far', 'bike'), ('far', 'car')], dtype=object)
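A self-contained sketch of this approach, using the names from the question (dataframe, my_home_city). One deviation is an assumption of this sketch: the tuple-keyed dict is wrapped in a Series before assignment, so that pandas aligns the keys with the column MultiIndex by label rather than by position.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["near", "far"], ["bike", "car"]], names=["distance", "vehicle"]
)
dataframe = pd.DataFrame(index=["city"], columns=index)

my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}

# Flatten the two-level dict so each key is a (distance, vehicle) tuple.
d = {(x, y): my_home_city[x][y] for x in my_home_city for y in my_home_city[x]}

# Assumption of this sketch: wrap the dict in a Series so the new row is
# aligned with the existing columns by label. Assigning to a row label
# that does not exist yet enlarges the frame by one row.
dataframe.loc["my_home_city"] = pd.Series(d)

print(dataframe)
```

Because alignment is by label, the order of the keys in d does not have to match the order of the columns.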

Answer by Scott Boston

You can append to your dataframe like this:

my_home_city = {"near":{"bike":1, "car":0},"far":{"bike":0, "car":1}}
dataframe.append(pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city'))

Output:

distance     near       far     
vehicle      bike  car bike  car
city          NaN  NaN  NaN  NaN
my_home_city    1    0    0    1

The trick is to create the dataframe row with from_dict, then unstack to get the structure of your original dataframe with multiindex columns, then rename to set the index label, and finally append.

Or if you don't want to create the empty dataframe first you can use this method to create the dataframe with the new data.

pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city').to_frame().T

Output:

              far     near    
             bike car bike car
my_home_city    0   1    1   0

Explained:

pd.DataFrame.from_dict(my_home_city)

      far  near
bike    0     1
car     1     0

Now, let's unstack to create the multiindex and get the new data into the structure of the original dataframe.

pd.DataFrame.from_dict(my_home_city).unstack()

far   bike    0
      car     1
near  bike    1
      car     0
dtype: int64

We use rename to give that series a name which becomes the index label of that dataframe row when appended to the original dataframe.

far   bike    0
      car     1
near  bike    1
      car     0
Name: my_home_city, dtype: int64

Now, if you converted that series to a frame and transposed it, it would look very much like a new row. However, there is no need to do this: pandas does intrinsic data alignment, so appending this series to the dataframe will auto-align it and add the new record.

dataframe.append(pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city'))
distance     near       far     
vehicle      bike  car bike  car
city          NaN  NaN  NaN  NaN
my_home_city    1    0    0    1
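Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls above only run on older versions. A minimal sketch of the same idea with pd.concat follows; since pd.concat stacks a Series as a column rather than a row, the series is first turned into a one-row frame (the variable names row and result are illustrative).

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["near", "far"], ["bike", "car"]], names=["distance", "vehicle"]
)
dataframe = pd.DataFrame(index=["city"], columns=index)
my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}

# Same chain as in the answer: from_dict() puts the outer keys in the columns,
# unstack() yields a Series indexed by (distance, vehicle), and rename()
# gives it the label the new row should carry.
row = pd.DataFrame.from_dict(my_home_city).unstack().rename("my_home_city")

# Convert the Series into a one-row frame, then concatenate along the rows;
# pandas aligns the columns by label.
result = pd.concat([dataframe, row.to_frame().T])

print(result)
```

Unlike append, pd.concat accepts a list, so several new rows can be added in one call.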

Answer by cs95

I don't think you even need to initialise an empty dataframe. With your d (judging by the output below, the nested dictionary from the question), I can get your desired output with unstack and a transpose:

pd.DataFrame(d).unstack().to_frame().T

   far     near    
  bike car bike car
0    0   1    1   0

Answer by Alexander

Initialize your empty dataframe using MultiIndex.from_product.

distances = ['near', 'far']
vehicles = ['bike', 'car']
df = pd.DataFrame([], columns=pd.MultiIndex.from_product([distances, vehicles]), 
                  index=pd.Index([], name='city'))

Your dictionary results in a square matrix (distance by vehicle), so unstack it (which will result in a Series), then convert it into a dataframe row by calling to_frame with the relevant city name and transposing the column into a row.

>>> df.append(pd.DataFrame(my_home_city).unstack().to_frame('my_home_city').T)
              far     near    
             bike car bike car
city                          
my_home_city    0   1    1   0
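The same steps can be sketched for several cities at once without DataFrame.append (which was removed in pandas 2.0). The dict cities and the second city are hypothetical; each nested dict is turned into a one-row frame, and the rows are concatenated in a single call.

```python
import pandas as pd

distances = ["near", "far"]
vehicles = ["bike", "car"]
columns = pd.MultiIndex.from_product(
    [distances, vehicles], names=["distance", "vehicle"]
)

# Hypothetical input: one nested dict per city (the second city is made up).
cities = {
    "my_home_city": {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}},
    "other_city": {"near": {"bike": 0, "car": 1}, "far": {"bike": 0, "car": 1}},
}

# unstack() gives a Series indexed by (distance, vehicle); to_frame(city).T
# turns it into a single row labelled with the city name.
rows = [
    pd.DataFrame(nested).unstack().to_frame(city).T
    for city, nested in cities.items()
]

# Concatenate once, then impose the desired column order and level names.
df = pd.concat(rows).reindex(columns=columns).rename_axis("city")

print(df)
```

Building all rows first and concatenating once avoids copying the frame for every added city.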

Answer by Yanni Papadakis

Try this workaround:

  • append to a dict
  • then convert to a pandas DataFrame
  • at the very last step, select the desired columns to create the multi-index with set_index()

# predictor_types and ames come from the answerer's own data and are not defined in this thread
d = dict()
for g in predictor_types:
    for col in predictor_types[g]:
        tot = len(ames) - ames[col].count()
        if tot:
            d.setdefault('type',[]).append(g)
            d.setdefault('predictor',[]).append(col)
            d.setdefault('missing',[]).append(tot)
pd.DataFrame(d).set_index(['type','predictor']).style.bar(color='DodgerBlue')
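Applied to the city example from the question, the three steps might look like the sketch below. The input dict cities, the second city, and the column name share are hypothetical. The final unstack is an extra step that is not part of this answer: set_index builds the multi-index on the rows, and unstack moves two of its levels into the columns.

```python
import pandas as pd

# Hypothetical nested input: city -> distance -> vehicle -> value.
cities = {
    "my_home_city": {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}},
    "other_city": {"near": {"bike": 0, "car": 1}, "far": {"bike": 0, "car": 1}},
}

# Step 1: append plain values to a dict of lists (no pandas involved yet).
d = {}
for city, by_distance in cities.items():
    for distance, by_vehicle in by_distance.items():
        for vehicle, share in by_vehicle.items():
            d.setdefault("city", []).append(city)
            d.setdefault("distance", []).append(distance)
            d.setdefault("vehicle", []).append(vehicle)
            d.setdefault("share", []).append(share)

# Step 2: convert to a long-format DataFrame, one row per cell.
long_df = pd.DataFrame(d)

# Step 3: create the multi-index with set_index(). The unstack() call is an
# addition of this sketch: it moves distance and vehicle into the columns.
wide = (
    long_df.set_index(["city", "distance", "vehicle"])["share"]
    .unstack(["distance", "vehicle"])
)

print(wide)
```

This keeps all row appends in plain Python and touches pandas only once at the end.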