pandas: Making a pandas DataFrame from a .npy file

Question

Asked by Arnold

I'm trying to make a pandas dataframe from a .npy file which, when read in using np.load, returns a numpy array containing a dictionary. My initial instinct was to extract the dictionary and then create a dataframe using pd.from_dict, but this fails every time because I can't seem to get the dictionary out of the array returned from np.load. It looks like it's just np.array([dictionary, dtype=object]), but I can't get the dictionary by indexing the array or anything like that. I've also tried using np.load('filename').item() but the result still isn't recognized by pandas as a dictionary.

Alternatively, I tried pd.read_pickle and that didn't work either.

How can I get this .npy dictionary into my dataframe? Here's the code that keeps failing...

import pandas as pd
import numpy as np
import os

targetdir = '../test_dir/'

filenames = []
successful = []
unsuccessful = []
for dirs, subdirs, files in os.walk(targetdir):
    for name in files:
        filenames.append(name)
        path_to_use = os.path.join(dirs, name)
        if path_to_use.endswith('.npy'):
            try:
                file_dict = np.load(path_to_use).item()
                df = pd.from_dict(file_dict)
                #df = pd.read_pickle(path_to_use)
                successful.append(path_to_use)
            except:
                unsuccessful.append(path_to_use)
                continue

print str(len(successful)) + " files were loaded successfully!"
print "The following files were not loaded:"
for item in unsuccessful:
    print item + "\n"

print df

Answer 1

Answered by Grainier

Let's assume that once you load the .npy file, the item (np.load(path_to_use).item()) looks similar to this:

{'user_c': 'id_003', 'user_a': 'id_001', 'user_b': 'id_002'}

If you need to build a DataFrame like the one below from that dictionary:

  user_name user_id
0    user_c  id_003
1    user_a  id_001
2    user_b  id_002

You can use:

# x is the array returned by np.load(path_to_use)
df = pd.DataFrame(list(x.item().items()), columns=['user_name','user_id'])
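
As a minimal sketch of the full round trip, assuming the file was written with np.save on a plain dict: the file name users.npy, the temporary directory, and the sample data are made up for illustration. Recent NumPy versions refuse to load object arrays unless allow_pickle=True is passed, so only use it on files you trust. np.save wraps the dict in a 0-dimensional object array, and .item() unwraps it again.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical example data; the real file contents may differ.
users = {'user_a': 'id_001', 'user_b': 'id_002', 'user_c': 'id_003'}

# Write the dict to a throwaway .npy file (np.save stores it as a 0-d object array).
path = os.path.join(tempfile.mkdtemp(), 'users.npy')
np.save(path, users)

# allow_pickle=True is needed for object arrays in recent NumPy versions.
arr = np.load(path, allow_pickle=True)
user_dict = arr.item()  # unwrap the 0-d array to get the dict back

# One row per (key, value) pair.
df = pd.DataFrame(list(user_dict.items()), columns=['user_name', 'user_id'])
print(df)
```

If .item() raises an error here, the array probably holds more than one element, and it is worth printing arr.shape and arr.dtype before going further.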

If you have a list of dictionaries like the one below:

users = [{'u_name': 'user_a', 'u_id': 'id_001'}, {'u_name': 'user_b', 'u_id': 'id_002'}]

You can simply use:

df = pd.DataFrame(users)

This gives a DataFrame similar to:

     u_id  u_name
0  id_001  user_a
1  id_002  user_b

It seems you have a dictionary similar to this:

data = {
    'Center': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    'Vpeak': [1.1, 2.2],
    'ID': ['id_001', 'id_002']
}

In this case, you can simply use:

df = pd.DataFrame(data)  # df = pd.DataFrame(file_dict) in your case, since file_dict already holds the result of .item()

This gives a DataFrame similar to:

            Center      ID  Vpeak
0  [0.1, 0.2, 0.3]  id_001    1.1
1  [0.4, 0.5, 0.6]  id_002    2.2
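
Applied to the loop in the question, this could look like the sketch below. Note that pandas has no top-level pd.from_dict, so that call raises an AttributeError, and the bare except in the question hides it. The helper name load_npy_dicts, the demo directory, and the sample files are hypothetical; the sketch assumes each good file holds a dict of equal-length lists.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_npy_dicts(targetdir):
    """Return ({path: DataFrame}, {path: error message}) for .npy files under targetdir."""
    frames, errors = {}, {}
    for dirpath, _subdirs, files in os.walk(targetdir):
        for name in files:
            if not name.endswith('.npy'):
                continue
            path = os.path.join(dirpath, name)
            try:
                # allow_pickle=True is required for object arrays; use on trusted files only.
                file_dict = np.load(path, allow_pickle=True).item()
                frames[path] = pd.DataFrame(file_dict)
            except Exception as exc:
                # Record the error instead of silently swallowing it.
                errors[path] = repr(exc)
    return frames, errors


# Demo with a throwaway directory and made-up data.
demo_dir = tempfile.mkdtemp()
np.save(os.path.join(demo_dir, 'good.npy'),
        {'Vpeak': [1.1, 2.2], 'ID': ['id_001', 'id_002']})
np.save(os.path.join(demo_dir, 'bad.npy'), np.arange(4))  # not a dict, .item() fails

frames, errors = load_npy_dicts(demo_dir)
print(len(frames), "files loaded;", len(errors), "failed")
for path, message in errors.items():
    print(path, message)
```

Keeping the error message per file makes it much easier to see why a given file was rejected than a plain list of unsuccessful paths.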

If the dict contains ndarray values, convert them to lists first with preprocessing similar to the code below, then use the result to create the df:

for key in data:
    if isinstance(data[key], np.ndarray):
        data[key] = data[key].tolist()

df = pd.DataFrame(data)
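
A self-contained version of that preprocessing step might look like the following sketch. The sample values are invented, and it builds a new dict rather than modifying data in place, which is a design choice and not something the original answer requires. The conversion matters most for multi-dimensional arrays such as Center, which the DataFrame constructor may reject as a column value.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the dictionary above, but with ndarray values.
data = {
    'Center': np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
    'Vpeak': np.array([1.1, 2.2]),
    'ID': ['id_001', 'id_002'],
}

# Convert every ndarray to a (possibly nested) list; leave other values alone.
converted = {
    key: value.tolist() if isinstance(value, np.ndarray) else value
    for key, value in data.items()
}

# Each row of the 2-D 'Center' array becomes one list-valued cell.
df = pd.DataFrame(converted)
print(df)
```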
