Pandas: dictionary of lists to DataFrame

Question

Asked by NewGuy

I have a dictionary in which each key holds a list of float values. The lists are not all the same length.

I'd like to convert this dictionary to a pandas DataFrame so that I can easily run analysis functions on the data (min, max, average, standard deviation, and so on).

My dictionary looks like this:

{
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

What is the best way to get this into a DataFrame so that I can use basic functions like sum, mean, describe, and std?

The examples I find (like the link above) all assume that each key has the same number of values in its list.

Answer 1

Answered by Miriam Farber

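import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

# Build one row per key (short lists are padded with NaN), then flip to columns
df = pd.DataFrame.from_dict(d, orient='index').transpose()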

Then df is:

    key3    key2    key1
0   1.00    72.5    10.00
1   5.20    NaN     100.10
2   71.20   NaN     0.98
3   9.00    NaN     1.20
4   10.11   NaN     NaN
5   12.21   NaN     NaN
6   65.00   NaN     NaN
7   7.00    NaN     NaN

Note that numpy has built-in functions that ignore NaN values when calculating, which may be relevant here. For example, to find the mean of the 'key1' column, you can do the following:

import numpy as np
np.nanmean(df[['key1']])  # approximately 28.07

Other useful functions include numpy.nanstd, numpy.nanvar, numpy.nanmedian, numpy.nansum.

EDIT: Note that the functions from your basic functions link can also handle NaN values. However, their estimators may differ from numpy's. For example, the pandas methods compute the unbiased estimator of the sample variance (dividing by n - 1, i.e. ddof=1 by default), while the numpy versions compute the "usual" estimator (dividing by n, i.e. ddof=0 by default).
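To make the difference between the two estimators concrete, here is a minimal sketch comparing the pandas and numpy defaults on the 'key1' column. The variable names (col, pandas_std, and so on) are illustrative only, and the astype(float) call is a defensive assumption in case the padded column is not already float.

```python
import numpy as np
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.DataFrame.from_dict(d, orient='index').transpose()

# Defensive cast: has no effect if the column is already float64
col = df['key1'].astype(float)

# pandas skips NaN by default and divides by n - 1 (ddof=1)
pandas_std = col.std()

# numpy's NaN-aware version divides by n by default (ddof=0)
numpy_std = np.nanstd(col.to_numpy())

# Passing ddof=1 explicitly makes numpy agree with pandas
numpy_std_unbiased = np.nanstd(col.to_numpy(), ddof=1)

print(pandas_std, numpy_std, numpy_std_unbiased)
```

With only four non-NaN values in 'key1', the gap between the two defaults is noticeable, so it is worth choosing ddof deliberately.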

Answer 2

Answered by piRSquared

import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

pd.concat({k: pd.Series(v) for k, v in your_dict.items()})

key1  0     10.00
      1    100.10
      2      0.98
      3      1.20
key2  0     72.50
key3  0      1.00
      1      5.20
      2     71.20
      3      9.00
      4     10.11
      5     12.21
      6     65.00
      7      7.00
dtype: float64

Or with axis=1:

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

pd.concat({k: pd.Series(v) for k, v in your_dict.items()}, axis=1)

     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00
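The long form from the first pd.concat call can be summarized directly, without ever building the padded table. The following is a sketch under the assumption that per-key statistics are the goal; the names long_series and stats are illustrative and do not come from the answer.

```python
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# Long form: the outer index level holds the dict keys
long_series = pd.concat(
    {k: pd.Series(v, dtype=float) for k, v in your_dict.items()}
)

# Group on the outer level to get one row of statistics per key
stats = long_series.groupby(level=0).agg(['count', 'min', 'max', 'mean', 'std'])

print(stats)
```

Because no padding is involved, there are no NaN values to skip. The one exception is the std of 'key2', which is NaN because a single observation has no sample standard deviation.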

Answer 3

Answered by John Zwinck

I suggest you just create a dict of Series, since your keys do not have the same number of values:

{ key: pd.Series(val) for key, val in x.items() }

You can then run pandas operations on each Series individually.
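As a rough illustration of working with the dict of Series, the sketch below assumes x is the dictionary from the question. The names series_by_key, means, and lengths are made up for this example.

```python
import pandas as pd

# Assumption: x is the dictionary from the question
x = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

series_by_key = {key: pd.Series(val, dtype=float) for key, val in x.items()}

# Each Series keeps its own length, so no NaN padding is introduced
lengths = {key: len(s) for key, s in series_by_key.items()}
means = {key: s.mean() for key, s in series_by_key.items()}

print(lengths)
print(means)
```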

Once you have that, if you really want a DataFrame, you can:

pd.DataFrame({ key: pd.Series(val) for key, val in x.items() })

     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00
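The functions named in the question can then be called on this DataFrame directly, since pandas reductions skip NaN by default. A minimal sketch, again assuming x is the question's dictionary (summary, non_null, and totals are illustrative names):

```python
import pandas as pd

# Assumption: x is the dictionary from the question
x = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

df = pd.DataFrame({key: pd.Series(val) for key, val in x.items()})

# describe() reports count, mean, std, min, quartiles, and max per column
summary = df.describe()

# count() gives the number of non-NaN entries, i.e. the original list lengths
non_null = df.count()

# sum() ignores the NaN padding as well
totals = df.sum()

print(summary)
```

The 'count' row of summary is a quick check that the padding did not leak into the statistics.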

Answer 4

Answered by aerijman

You can first define the index as:

idx = counts.keys()

Then concatenate the Series:

df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T

Lastly, add the index:

df.index=idx
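Put together, the three steps might look like the sketch below. It assumes counts is the dictionary from the question, and it wraps the keys in list() as a precaution. Note that, because of the transpose, the keys end up as rows rather than columns.

```python
import pandas as pd

# Assumption: counts is the dictionary from the question
counts = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# Step 1: fix the key order once so the rows and labels stay aligned
idx = list(counts.keys())

# Step 2: one column per key, then transpose so each key becomes a row
df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T

# Step 3: label the rows with the keys
df.index = idx

# Row-wise statistics need axis=1 in this layout
row_means = df.mean(axis=1)

print(df)
print(row_means)
```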
