Create a pandas DataFrame from multiple dicts
Original question: http://stackoverflow.com/questions/17751626/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by tinproject
I'm new to pandas and this is my first question on Stack Overflow. I'm trying to do some analytics with pandas.
I have some text files with data records that I want to process. Each line of a file corresponds to one record, whose fields sit at fixed positions and have a fixed length in characters. There are different kinds of records in the same file; all records share the first field, a two-character code that identifies the record type. As an example:
Some file:
01Jhon Smith 555-1234
03Cow Bos primigenius taurus 00401
01Jannette Jhonson 00100000000
...
field    start  length
type     1      2       * common to all records, e.g. 01 = person, 03 = animal
name     3      10
surname  13     10
phone    23     8
credit   31     11
(any remaining positions are filled with spaces)
I'm writing some code to convert one record to a dictionary:
person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
If a field is empty (filled with spaces), it is left out of the dictionary.
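As a rough illustration of the conversion described above (not the asker's actual code), a minimal sketch can slice each line using the start/length table and skip fields that are all spaces. It assumes the layout applies to type 01 (person) records; the names PERSON_FIELDS and parse_person_record, and converting type to int and credit to float, are hypothetical choices.

```python
# Layout of type "01" (person) records from the table above:
# (field name, 1-based start column, length in characters)
PERSON_FIELDS = [
    ("type", 1, 2),
    ("name", 3, 10),
    ("surname", 13, 10),
    ("phone", 23, 8),
    ("credit", 31, 11),
]


def parse_person_record(line):
    """Turn one fixed-width person line into a dict, leaving out blank fields."""
    record = {}
    for field, start, length in PERSON_FIELDS:
        # 1-based start column -> 0-based, half-open slice
        value = line[start - 1:start - 1 + length].strip()
        if value:  # an all-space field is simply omitted
            record[field] = value
    # Assumed type conversions (the question does not specify them).
    if "type" in record:
        record["type"] = int(record["type"])
    if "credit" in record:
        record["credit"] = float(record["credit"])
    return record


line = "01" + "Jhon".ljust(10) + "Smith".ljust(10) + "555-1234"
print(parse_person_record(line))
# {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
```

Slicing past the end of a short line just yields an empty string, so a record that stops after the phone number gets no credit key.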
For all records of one kind, I want to create a pandas DataFrame with the dict keys as column names. I've tried pandas.DataFrame.from_dict() without success.
And here comes my question: is there any way to do this with pandas so that the dict keys become column names? Is there any other standard method for dealing with this kind of file?
Answer by DSM
To make a DataFrame from dictionaries, you can pass a list of dictionaries:
>>> import pandas as pd
>>> person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
>>> person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
>>> animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
>>> pd.DataFrame([person1])
name phone surname type
0 Jhon 555-1234 Smith 1
>>> pd.DataFrame([person1, person2])
credit name phone surname type
0 NaN Jhon 555-1234 Smith 1
1 1000000 Jannette NaN Jhonson 1
>>> pd.DataFrame.from_dict([person1, person2])
credit name phone surname type
0 NaN Jhon 555-1234 Smith 1
1 1000000 Jannette NaN Jhonson 1
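Because the question wants one DataFrame per record type, a minimal sketch (assuming every record has already been parsed into a dict like the ones above and collected in a hypothetical list called records) groups the dicts by their 'type' key and passes each group to pd.DataFrame; keys absent from a dict show up as NaN.

```python
from collections import defaultdict

import pandas as pd

# Example dicts in the shape shown above ('type' written as a plain int).
person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
records = [person1, person2, animal1]

# Collect the dicts for each record type, keeping their original order.
by_type = defaultdict(list)
for record in records:
    by_type[record['type']].append(record)

# One DataFrame per type; a key missing from a dict becomes NaN in that row.
frames = {rtype: pd.DataFrame(group) for rtype, group in by_type.items()}

print(frames[1])
print(frames[3])
```

Depending on the pandas version, the column order may differ from the alphabetical order in the output above.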
For the more fundamental issue of two differently formatted record types intermixed in one file, and assuming the files aren't so big that we can't read them into memory, I'd use StringIO to make a file-like object that contains only the lines we want, and then use read_fwf (read a fixed-width file). For example:
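from io import StringIO

def get_filelike_object(filename, line_prefix):
    """Return an in-memory file holding only the lines that start with line_prefix."""
    s = StringIO()
    with open(filename, "r") as fp:
        for line in fp:
            if line.startswith(line_prefix):
                s.write(line)
    s.seek(0)  # rewind so the reader starts at the first kept line
    return s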
and then
>>> type01 = get_filelike_object("animal.dat", "01")
>>> df = pd.read_fwf(type01, names="type name surname phone credit".split(),
...                  widths=[2, 10, 10, 8, 11], header=None)
>>> df
type name surname phone credit
0 1 Jhon Smith 555-1234 NaN
1 1 Jannette Jhonson NaN 100000000
should work. Of course, you could also separate the files into different types before pandas ever sees them, which might be easiest of all.
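A minimal sketch of that last suggestion: split the records by their two-character type prefix into separate files, then read the person file with read_fwf using the same widths and column names as above. The helper split_by_type, the <type>.dat naming scheme, and the temporary sample file are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def split_by_type(path, out_dir):
    """Copy each line of `path` into out_dir/<type>.dat, keyed on its 2-char prefix.

    Returns a dict mapping each record type (e.g. "01") to the file written.
    """
    handles = {}
    try:
        with open(path) as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                rtype = line[:2]
                if rtype not in handles:
                    out_path = os.path.join(out_dir, rtype + ".dat")
                    handles[rtype] = open(out_path, "w")
                handles[rtype].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return {rtype: handle.name for rtype, handle in handles.items()}


# Hypothetical sample data: two person records (01) around an animal record (03).
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "records.dat")
with open(source, "w") as f:
    f.write("01" + "Jhon".ljust(10) + "Smith".ljust(10) + "555-1234\n")
    f.write("03Cow Bos primigenius taurus 00401\n")
    f.write("01" + "Jannette".ljust(10) + "Jhonson".ljust(10) + " " * 8 + "00100000000\n")

paths = split_by_type(source, work_dir)

# Only the person file is parsed, so only the person layout is needed here.
people = pd.read_fwf(paths["01"],
                     names=["type", "name", "surname", "phone", "credit"],
                     widths=[2, 10, 10, 8, 11], header=None)
print(people)
```

Each record type then gets its own file and can be read with its own widths and names, so no single read has to cope with mixed layouts.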

