Dictionary of lists to dataframe (pandas)

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/42869544/

Date: 2020-09-14 03:13:13  Source: igfitidea

Tags: python, pandas

Asked by NewGuy

I have a dictionary with each key holding a list of float values. These lists are not all the same size.

I'd like to convert this dictionary to a pandas DataFrame so that I can easily perform some analysis functions on the data, such as min, max, average, standard deviation and more.

My dictionary looks like this:

{
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

What is the best way to get this into a DataFrame so that I can use basic functions like sum, mean, describe and std?

The examples I found (like the link above) all assume that each key has the same number of values in its list.
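For reference, a quick sketch of why those examples break here (assuming a reasonably recent pandas): passing the ragged dictionary straight to the DataFrame constructor raises a ValueError because the lists differ in length. The exact error message varies between pandas versions, so the snippet just prints whatever it is.

```python
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# The plain constructor treats each list as a column and requires
# them all to have the same length, so this raises a ValueError.
try:
    pd.DataFrame(d)
except ValueError as exc:
    print("ValueError:", exc)
```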

Answer by Miriam Farber

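import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

# orient='index' makes each key a row, padding shorter lists with NaN;
# transpose() then turns the keys into columns
df = pd.DataFrame.from_dict(d, orient='index').transpose()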

Then df is:

    key3    key2    key1
0   1.00    72.5    10.00
1   5.20    NaN     100.10
2   71.20   NaN     0.98
3   9.00    NaN     1.20
4   10.11   NaN     NaN
5   12.21   NaN     NaN
6   65.00   NaN     NaN
7   7.00    NaN     NaN

Note that numpy has some built-in functions that can do calculations while ignoring NaN values, which may be relevant here. For example, if you want to find the mean of the 'key1' column, you can do it as follows:

import numpy as np

# nanmean skips the NaN padding below the 4 real values
np.nanmean(df['key1'])  # approximately 28.07

Other useful functions include numpy.nanstd, numpy.nanvar, numpy.nanmedian, numpy.nansum.
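A short sketch applying those NaN-aware numpy functions to the 'key1' column of the df built above. The snippet rebuilds df so it runs on its own; note that nanstd and nanvar default to ddof=0.

```python
import numpy as np
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.DataFrame.from_dict(d, orient='index').transpose()

col = df['key1']  # 4 real values followed by NaN padding

# Each nan* function ignores the NaN padding.
print(np.nanstd(col))     # standard deviation, ddof=0 by default
print(np.nanvar(col))     # variance, ddof=0 by default
print(np.nanmedian(col))  # median of 0.98, 1.2, 10, 100.1
print(np.nansum(col))     # sum of the 4 real values
```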

EDIT: Note that the functions from your basic functions link can also handle NaN values (they skip them by default). However, their estimators may differ from numpy's. For example, pandas' var and std compute the unbiased estimator of sample variance (ddof=1), while numpy's nanvar and nanstd compute the "usual" (biased) estimator (ddof=0) unless you pass ddof=1.
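To make the estimator difference concrete, a minimal sketch comparing pandas' var() (which skips NaN and uses ddof=1 by default) with np.nanvar (which uses ddof=0). Passing ddof=1 to numpy should make the two agree.

```python
import numpy as np
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.DataFrame.from_dict(d, orient='index').transpose()
col = df['key1']

# pandas skips NaN and uses ddof=1 by default: divides by n - 1
pandas_var = col.var()
# numpy's nanvar also skips NaN but defaults to ddof=0: divides by n
numpy_var = np.nanvar(col)
# passing ddof=1 makes numpy use the same estimator as pandas
numpy_var_ddof1 = np.nanvar(col, ddof=1)

print(pandas_var, numpy_var, numpy_var_ddof1)
```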

Answer by piRSquared

import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

# each list becomes a Series; concat stacks them under the dict keys
pd.concat({k: pd.Series(v) for k, v in your_dict.items()})

key1  0     10.00
      1    100.10
      2      0.98
      3      1.20
key2  0     72.50
key3  0      1.00
      1      5.20
      2     71.20
      3      9.00
      4     10.11
      5     12.21
      6     65.00
      7      7.00
dtype: float64
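Since the outer index level of this Series holds the original keys, one way to get the per-key statistics the question asks for is to group on that level. A sketch, assuming the your_dict above; the names s and stats are just illustrative.

```python
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
s = pd.concat({k: pd.Series(v) for k, v in your_dict.items()})

# Level 0 of the MultiIndex holds the original keys, so grouping on it
# gives one row of statistics per key, with no NaN padding involved.
stats = s.groupby(level=0).agg(['min', 'max', 'mean', 'std', 'sum'])
print(stats)
```

The std for 'key2' comes out as NaN, since pandas uses ddof=1 and that key has only one value.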

Or, with axis=1, to get one column per key:

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

pd.concat({k: pd.Series(v) for k, v in your_dict.items()}, axis=1)

     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00
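With one column per key, pandas' column-wise reductions skip the NaN padding by default (skipna=True), so describe(), mean() and similar methods only see each key's real values. A short sketch:

```python
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.concat({k: pd.Series(v) for k, v in your_dict.items()}, axis=1)

# Column-wise reductions use skipna=True by default, so each statistic
# only sees that key's real values, not the NaN padding.
print(df.describe())
print(df.mean())
print(df.count())  # non-NaN values per key: 4, 1 and 8
```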

Answer by John Zwinck

I suggest you just create a dict of Series, since your keys do not have the same number of values:

# x is your dictionary of lists
{key: pd.Series(val) for key, val in x.items()}

You can then do Pandas operations on each column individually.
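For example, a sketch of per-key operations on the dict of Series. Here x is the question's dictionary, and series_by_key, means and stds are illustrative names:

```python
import pandas as pd

# x is the dictionary of lists from the question
x = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
series_by_key = {key: pd.Series(val) for key, val in x.items()}

# Each Series holds only its own values (no NaN padding), so ordinary
# Series methods work per key.
means = {key: s.mean() for key, s in series_by_key.items()}
stds = {key: s.std() for key, s in series_by_key.items()}  # ddof=1
print(means)
print(series_by_key['key3'].describe())
```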

Once you have that, if you really want a DataFrame, you can:

pd.DataFrame({ key: pd.Series(val) for key, val in x.items() })

     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00

Answer by aerijman

You can:

define the index as

idx = list(counts.keys())  # counts is your dictionary of lists

then concatenate the Series side by side and transpose, so that each key becomes a row:

df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T

lastly, assign the keys as the index:

df.index = idx
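Note that this layout has one row per key (the same shape Miriam Farber's answer builds before calling transpose()), so per-key statistics are taken across columns with axis=1. A sketch, assuming counts is the question's dictionary:

```python
import pandas as pd

# counts stands in for the question's dictionary of lists
counts = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
idx = list(counts.keys())
df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T
df.index = idx

# Each key is a row here, so reduce across columns with axis=1;
# the NaN padding is skipped by default.
print(df.mean(axis=1))
print(df.max(axis=1))
print(df.count(axis=1))  # real values per key
```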