Original URL: http://stackoverflow.com/questions/42869544/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Dictionary of lists to dataframe
Question by NewGuy
I have a dictionary in which each key holds a list of float values. These lists are not all the same size.
I'd like to convert this dictionary to a pandas DataFrame so that I can easily run analysis functions on the data, such as min, max, average, standard deviation, and more.
My dictionary looks like this:
{
'key1': [10, 100.1, 0.98, 1.2],
'key2': [72.5],
'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}
What is the best way to get this into a DataFrame so that I can use basic functions like sum, mean, describe, and std?
The examples I've found (like the link above) all assume that every key has the same number of values in its list.
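To see the problem concretely, here is a minimal sketch (assuming a recent pandas release) of what happens when the ragged dictionary is passed straight to the DataFrame constructor: pandas refuses because the lists have different lengths. The variable name `failed` is just for illustration.

```python
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

try:
    # The plain constructor requires every list to have the same length
    pd.DataFrame(d)
    failed = False
except ValueError as err:
    # pandas raises ValueError because the lists differ in length
    failed = True
    print(err)
```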
Answer by Miriam Farber
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}
df = pd.DataFrame.from_dict(d, orient='index').transpose()
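Then df is (with Python 3.7+, where dicts keep insertion order, the columns come out as key1, key2, key3):
     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00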
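A minimal sketch to check this approach, assuming Python 3.7+ (dicts keep insertion order): the frame gets one column per key and as many rows as the longest list, and pandas reductions such as count, mean, and describe skip the NaN padding by default.

```python
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# orient='index' turns each key into a row (shorter rows are padded with NaN);
# transpose() then turns the keys into columns
df = pd.DataFrame.from_dict(d, orient='index').transpose()

print(df.shape)       # (8, 3): 8 rows (longest list) x 3 columns (one per key)
print(df.count())     # non-NaN values per column: 4, 1, 8
print(df.mean())      # the NaN padding is skipped
print(df.describe())
```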
Note that NumPy has some built-in functions that compute statistics while ignoring NaN values, which may be relevant here. For example, to find the mean of the 'key1' column:
import numpy as np
np.nanmean(df[['key1']])
# 28.07
Other useful functions include numpy.nanstd, numpy.nanvar, numpy.nanmedian, and numpy.nansum.
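A small sketch of those NaN-aware reductions applied to one padded column, rebuilding the same df so the block stands alone. The plain np.mean is included for contrast: it returns nan as soon as one value is missing. The name `key1` for the extracted array is only illustrative.

```python
import numpy as np
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.DataFrame.from_dict(d, orient='index').transpose()

# key1 holds 4 real values followed by 4 NaN padding cells
key1 = df['key1'].to_numpy()

print(np.mean(key1))       # nan: the plain version propagates NaN
print(np.nanmean(key1))    # about 28.07, NaN ignored
print(np.nanmedian(key1))  # about 5.6, the median of 0.98, 1.2, 10, 100.1
print(np.nansum(key1))     # about 112.28
print(np.nanstd(key1), np.nanvar(key1))  # both use ddof=0 by default
```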
EDIT: Note that the pandas functions from your "basic functions" link can also handle NaN values (they skip them by default). However, their estimators may differ from NumPy's. For example, pandas' var and std use the unbiased estimator of sample variance (ddof=1), while numpy.nanvar and numpy.nanstd default to the "usual" (biased) estimator (ddof=0).
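A short sketch of that estimator difference, using key1 plus one NaN as it would appear after padding. Passing ddof=1 to numpy.nanstd makes it agree with pandas; the variable names are my own.

```python
import numpy as np
import pandas as pd

# key1 from the question, plus one NaN as it would appear after padding
key1 = pd.Series([10, 100.1, 0.98, 1.2, np.nan])
values = key1.to_numpy()

pandas_std = key1.std()                      # skips NaN, ddof=1 (from the unbiased sample variance)
numpy_std = np.nanstd(values)                # skips NaN, ddof=0 (the "usual" estimator)
numpy_std_ddof1 = np.nanstd(values, ddof=1)  # same estimator as pandas

print(pandas_std, numpy_std, numpy_std_ddof1)
```

With only four real values the gap between the two estimators is noticeable (a factor of sqrt(4/3) on the standard deviation), so it is worth picking one convention deliberately.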
Answer by piRSquared
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}
pd.concat({k: pd.Series(v) for k, v in your_dict.items()})
key1  0     10.00
      1    100.10
      2      0.98
      3      1.20
key2  0     72.50
key3  0      1.00
      1      5.20
      2     71.20
      3      9.00
      4     10.11
      5     12.21
      6     65.00
      7      7.00
dtype: float64
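Because this Series is in long format (key in the outer index level, position in the inner level), per-key statistics can be computed with a groupby on the first level, with no NaN padding involved. A sketch reusing the answer's your_dict; the name `stats` is only illustrative.

```python
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# Outer index level = key, inner level = position within that key's list
s = pd.concat({k: pd.Series(v) for k, v in your_dict.items()})

# One row per key; std is NaN for key2 because it has a single value (ddof=1)
stats = s.groupby(level=0).agg(['count', 'mean', 'std'])
print(stats)
```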
Or, with axis=1:
your_dict = {
'key1': [10, 100.1, 0.98, 1.2],
'key2': [72.5],
'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}
pd.concat({k: pd.Series(v) for k, v in your_dict.items()}, axis=1)
     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00
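With the wide frame, the statistics the question asks about can be computed in one call. A sketch using DataFrame.agg with a list of reduction names; each reduction skips the NaN padding, so key2 is summarized from its single value (its sample std is NaN). The name `summary` is my own.

```python
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.concat({k: pd.Series(v) for k, v in your_dict.items()}, axis=1)

# One row per statistic, one column per key
summary = df.agg(['min', 'max', 'mean', 'std'])
print(summary)
```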
Answer by John Zwinck
I suggest you just create a dict of Series, since your keys do not have the same number of values:
import pandas as pd
{key: pd.Series(val) for key, val in x.items()}  # x is the dictionary of lists from the question
You can then do Pandas operations on each column individually.
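A minimal sketch of working column by column with the dict of Series, assuming x is the question's dictionary. Each Series keeps its own length, so nothing is padded; the names `series`, `lengths`, and `means` are only illustrative.

```python
import pandas as pd

# x is the question's dictionary of lists
x = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

series = {key: pd.Series(val) for key, val in x.items()}

# Each Series has exactly as many values as its list: no NaN padding
lengths = {key: len(s) for key, s in series.items()}
means = {key: s.mean() for key, s in series.items()}
print(lengths)
print(means)
```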
Once you have that, if you really want a DataFrame, you can:
pd.DataFrame({ key: pd.Series(val) for key, val in x.items() })
     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00
Answer by aerijman
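You can first define the index (counts is your dictionary of lists):
import pandas as pd
idx = list(counts.keys())
then concatenate the Series side by side and transpose, so that each key becomes a row:
df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T
and lastly set the index:
df.index = idx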
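Unlike the other answers, this layout has one row per key, so per-key statistics are computed across columns with axis=1. A sketch assuming counts is the question's dictionary; `row_means` and `row_counts` are illustrative names.

```python
import pandas as pd

# counts is the question's dictionary of lists
counts = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

idx = list(counts.keys())
df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T
df.index = idx

# Keys are rows here, so reduce along axis=1; NaN padding is skipped
row_means = df.mean(axis=1)
row_counts = df.count(axis=1)
print(row_means)
print(row_counts)
```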