Disclaimer: this page is translated from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/25292568/

Time: 2020-09-13 22:22:13  Source: igfitidea

Converting a dictionary with lists for values into a dataframe

python list dictionary pandas dataframe

Asked by stoves

I spent a while looking through SO, and it seems I have a unique problem.

I have a dictionary that looks like the following:

dict={
    123: [2,4],
    234: [6,8],
    ...
}

I want to convert this dictionary that has lists for values into a 3 column data frame like the following:

time, value1, value2
123, 2, 4
234, 6, 8
...

I can run:

pandas.DataFrame(dict)

but this generates the following:

123, 234, ...
2, 6, ...
4, 8, ...

This is probably a simple fix, but I'm still picking up pandas.

Answered by Roger Fan

You can either preprocess the data as levi suggests, or you can transpose the data frame after creating it.

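import pandas as pd

testdict = {
    123: [2, 4],
    234: [6, 8],
    456: [10, 12]
}
# Each key becomes a column, so transpose to turn keys into rows
df = pd.DataFrame(testdict)
df = df.transpose()

print(df)
#       0   1
# 123   2   4
# 234   6   8
# 456  10  12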
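
The transposed frame above still has the keys as its index and 0/1 as column names, while the question asks for three columns named time, value1, and value2. A minimal sketch of one way to get there, assuming a pandas version where DataFrame.from_dict accepts columns together with orient="index" (this is not part of the original answer):

```python
import pandas as pd

testdict = {
    123: [2, 4],
    234: [6, 8],
    456: [10, 12],
}

# orient="index" treats each dictionary key as a row label,
# so no transpose is needed afterwards.
df = pd.DataFrame.from_dict(testdict, orient="index", columns=["value1", "value2"])

# Name the index "time" and move it into an ordinary column.
df = df.rename_axis("time").reset_index()

print(df)
```

The column names are taken from the question's desired output; any other labels would work the same way.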

Answered by Robert Yi

It may be of interest to some that the pandas.DataFrame(dict).transpose() approach from Roger Fan's answer is actually pretty slow if the dictionary has a large number of keys. A faster way is to preprocess the data into separate lists and then create a DataFrame out of these lists. (Perhaps this was explained in levi's answer, but it is gone now.)

For example, consider this dictionary, dict1, where each value is a list. Specifically, dict1[i] = [i*10, i*100] (for ease of checking the final dataframe).

# Build dict1[i] = [i*10, i*100] for i in 0..999, with list values as described
dict1 = {i: [i * 10, i * 100] for i in range(1000)}

It takes roughly 30 to 40 times as long with the pandas transpose method (about 0.119 s versus 0.003 s in the run shown here). E.g.

import time
import pandas as pd

t = time.time()
test1 = pd.DataFrame(dict1).transpose()
print(time.time() - t)

0.118762016296

versus:

t = time.time()
keys = []
list1 = []
list2 = []
# Split each [first, second] value into two parallel lists
for k in dict1:
    keys.append(k)
    list1.append(dict1[k][0])
    list2.append(dict1[k][1])
test2 = pd.DataFrame({'element1': list1, 'element2': list2}, index=keys)
print(time.time() - t)

0.00310587882996
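
A single time.time() measurement can be noisy, so the comparison may be easier to reproduce with timeit. The sketch below is an assumption-laden rerun of the same idea, not the original author's benchmark: the function names are hypothetical, and the third variant (DataFrame.from_dict with orient="index") is added only for comparison. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Same shape of data as above: dict1[i] = [i*10, i*100]
dict1 = {i: [i * 10, i * 100] for i in range(1000)}


def via_transpose(d):
    # Keys become columns first, then the frame is transposed.
    return pd.DataFrame(d).transpose()


def via_lists(d):
    # Preprocess into parallel lists, then build the frame once.
    keys = list(d)
    first = [d[k][0] for k in keys]
    second = [d[k][1] for k in keys]
    return pd.DataFrame({"element1": first, "element2": second}, index=keys)


def via_from_dict(d):
    # Let pandas treat each key as a row label directly.
    return pd.DataFrame.from_dict(d, orient="index", columns=["element1", "element2"])


for fn in (via_transpose, via_lists, via_from_dict):
    # Average over 10 calls to smooth out one-off fluctuations.
    seconds = timeit.timeit(lambda: fn(dict1), number=10) / 10
    print(f"{fn.__name__}: {seconds:.6f} s per call")
```

All three functions should produce the same 1000 x 2 table of values, so any difference in the printed times comes from how the frame is constructed.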