pandas: Making a pandas dataframe from a .npy file
Original URL: http://stackoverflow.com/questions/40201026/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Making a pandas dataframe from a .npy file
Asked by Arnold
I'm trying to make a pandas dataframe from a .npy file which, when read in using np.load, returns a numpy array containing a dictionary. My initial instinct was to extract the dictionary and then create a dataframe using pd.from_dict, but this fails every time because I can't seem to get the dictionary out of the array returned from np.load. It looks like it's just np.array([dictionary, dtype=object]), but I can't get the dictionary by indexing the array or anything like that. I've also tried using np.load('filename').item() but the result still isn't recognized by pandas as a dictionary.
Alternatively, I tried pd.read_pickle and that didn't work either.
How can I get this .npy dictionary into my dataframe? Here's the code that keeps failing...
import pandas as pd
import numpy as np
import os

targetdir = '../test_dir/'
filenames = []
successful = []
unsuccessful = []

for dirs, subdirs, files in os.walk(targetdir):
    for name in files:
        filenames.append(name)
        path_to_use = os.path.join(dirs, name)
        if path_to_use.endswith('.npy'):
            try:
                file_dict = np.load(path_to_use).item()
                df = pd.from_dict(file_dict)
                #df = pd.read_pickle(path_to_use)
                successful.append(path_to_use)
            except:
                unsuccessful.append(path_to_use)
                continue

print str(len(successful)) + " files were loaded successfully!"
print "The following files were not loaded:"
for item in unsuccessful:
    print item + "\n"
print df
Answered by Grainier
Let's assume that once you load the .npy file, the item (np.load(path_to_use).item()) looks similar to this:
{'user_c': 'id_003', 'user_a': 'id_001', 'user_b': 'id_002'}
So, if you need to build a DataFrame like the one below from the dictionary above:
  user_name user_id
0    user_c  id_003
1    user_a  id_001
2    user_b  id_002
You can use:
df = pd.DataFrame(list(file_dict.items()), columns=['user_name', 'user_id'])  # file_dict = np.load(path_to_use).item()
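A minimal Python 3 sketch of the same idea, assuming the loaded dictionary is the small example shown above. The name `user_map` is hypothetical and simply stands in for whatever `np.load(...).item()` returns in your case:

```python
import pandas as pd

# Hypothetical stand-in for the dictionary returned by np.load(...).item()
user_map = {'user_c': 'id_003', 'user_a': 'id_001', 'user_b': 'id_002'}

# Each (key, value) pair becomes one row; `columns` names the two fields
df = pd.DataFrame(list(user_map.items()), columns=['user_name', 'user_id'])
print(df)
```

The call to `dict.items()` works on Python 3, where dictionaries no longer have an `iteritems()` method.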
If you have a list of dictionaries like the one below:
users = [{'u_name': 'user_a', 'u_id': 'id_001'}, {'u_name': 'user_b', 'u_id': 'id_002'}]
You can simply use:
df = pd.DataFrame(users)
This gives a DataFrame similar to:
     u_id  u_name
0  id_001  user_a
1  id_002  user_b
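As a self-contained sketch of the list-of-dictionaries case (the explicit column selection is an addition here, not part of the original answer; it is only there so the column order does not depend on the pandas version):

```python
import pandas as pd

users = [
    {'u_name': 'user_a', 'u_id': 'id_001'},
    {'u_name': 'user_b', 'u_id': 'id_002'},
]

# Each dictionary becomes one row; the keys become column names
df = pd.DataFrame(users)

# Select the columns explicitly so the order is the same on any pandas version
df = df[['u_id', 'u_name']]
print(df)
```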
It seems you have a dictionary similar to this:
data = {
'Center': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
'Vpeak': [1.1, 2.2],
'ID': ['id_001', 'id_002']
}
In this case, you can simply use:
df = pd.DataFrame(data)  # in your case: df = pd.DataFrame(file_dict), since file_dict already holds the result of .item()
This gives a DataFrame similar to:
            Center      ID  Vpeak
0  [0.1, 0.2, 0.3]  id_001    1.1
1  [0.4, 0.5, 0.6]  id_002    2.2
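A runnable sketch of the dictionary-of-lists case. The length check is an illustrative addition: pandas expects every value to supply one entry per row, so it can help to verify that before building the frame. Passing `columns` fixes the column order explicitly:

```python
import pandas as pd

data = {
    'Center': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    'Vpeak': [1.1, 2.2],
    'ID': ['id_001', 'id_002'],
}

# Every value must have the same length: one entry per row
lengths = {key: len(value) for key, value in data.items()}
assert len(set(lengths.values())) == 1, lengths

# Each inner list in 'Center' ends up as a single list-valued cell
df = pd.DataFrame(data, columns=['Center', 'ID', 'Vpeak'])
print(df)
```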
If the dict contains ndarray values, preprocess them as shown below, then use the result to create the DataFrame:
for key in data:
    if isinstance(data[key], np.ndarray):
        data[key] = data[key].tolist()

df = pd.DataFrame(data)
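Putting the pieces together, here is an end-to-end sketch under a few assumptions: the file was written with `np.save` on a plain dictionary, the file name `example.npy` and the sample values are made up for illustration, and a recent NumPy is in use (where `np.load` needs `allow_pickle=True` to read object arrays). It is not necessarily how the files in the question were produced.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data: 'Center' is a 2-D ndarray, which cannot be used
# directly as a single DataFrame column
data = {
    'Center': np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
    'Vpeak': np.array([1.1, 2.2]),
    'ID': ['id_001', 'id_002'],
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.npy')
    np.save(path, data)  # the dict is pickled inside a 0-d object array

    # allow_pickle=True is needed to load object arrays in recent NumPy
    loaded = np.load(path, allow_pickle=True)
    file_dict = loaded.item()  # unwrap the 0-d array to get the dict back

# Convert ndarrays to lists so each row's 'Center' becomes one list-valued cell
for key in file_dict:
    if isinstance(file_dict[key], np.ndarray):
        file_dict[key] = file_dict[key].tolist()

df = pd.DataFrame(file_dict)
print(df)
```

Only pass `allow_pickle=True` for files you trust, since unpickling can execute arbitrary code.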