Creating a Pandas DataFrame from multiple dicts

Question

Asked by tinproject

I'm new to pandas and this is my first question on Stack Overflow. I'm trying to do some analytics with pandas.

I have some text files with data records that I want to process. Each line of a file corresponds to one record, whose fields sit at fixed positions and have a fixed length in characters. Different kinds of records appear in the same file; all records share the first field, a two-character code that identifies the record type. As an example:

Some file:
01Jhon      Smith     555-1234                                        
03Cow            Bos primigenius taurus        00401                  
01Jannette  Jhonson           00100000000                             
...


field    start  length   
type         1       2   *common to all records, example: 01 = person, 03 = animal
name         3      10
surname     13      10
phone       23       8
credit      31      11
(the rest of the line is filled with spaces)

I'm writing some code to convert one record to a dictionary:


person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}

If a field is empty (filled with spaces), it will not be in the dictionary.
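A minimal sketch of that record-to-dict step, assuming the person layout from the field table above. `PERSON_FIELDS` and `parse_record` are hypothetical names used only for illustration, and every value is left as a string (no numeric conversion is attempted):

```python
# Layout of a type "01" (person) record, copied from the field table:
# (field name, 1-based start position, length in characters).
# PERSON_FIELDS and parse_record are hypothetical names for this sketch.
PERSON_FIELDS = [
    ("type", 1, 2),
    ("name", 3, 10),
    ("surname", 13, 10),
    ("phone", 23, 8),
    ("credit", 31, 11),
]


def parse_record(line, fields):
    """Slice one fixed-width line into a dict, skipping blank fields."""
    record = {}
    for name, start, length in fields:
        # Convert the 1-based start position into a 0-based slice.
        value = line[start - 1:start - 1 + length].strip()
        if value:  # fields that contain only spaces are left out
            record[name] = value
    return record


# Build a sample line with explicit padding so the columns line up.
line = "01" + "Jhon".ljust(10) + "Smith".ljust(10) + "555-1234"
print(parse_record(line, PERSON_FIELDS))
```

Slicing past the end of a short line simply yields an empty string, so trailing fields that are missing are skipped the same way as blank ones.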

With all the records of one kind I want to create a pandas DataFrame with the dict keys as column names. I've tried pandas.DataFrame.from_dict() without success.

And here comes my question: is there any way to do this with pandas so that the dict keys become column names? Is there any other standard method for dealing with this kind of file?

Answer 1

Answered by DSM

To make a DataFrame from dictionaries, you can pass a list of dictionaries:

>>> import pandas as pd
>>> person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
>>> person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
>>> animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
>>> pd.DataFrame([person1])
   name     phone surname  type
0  Jhon  555-1234   Smith     1
>>> pd.DataFrame([person1, person2])
    credit      name     phone  surname  type
0      NaN      Jhon  555-1234    Smith     1
1  1000000  Jannette       NaN  Jhonson     1
>>> pd.DataFrame.from_dict([person1, person2])
    credit      name     phone  surname  type
0      NaN      Jhon  555-1234    Smith     1
1  1000000  Jannette       NaN  Jhonson     1
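As a script-style sketch of the same idea, the records can be filtered to one kind before the frame is built. The `records` list and the filter on `type` are assumptions made for illustration. Note that the session above shows columns sorted alphabetically; recent pandas versions generally keep the keys in first-seen order instead, so the exact column order and number formatting may differ:

```python
import pandas as pd

person1 = {"type": 1, "name": "Jhon", "surname": "Smith", "phone": "555-1234"}
person2 = {"type": 1, "name": "Jannette", "surname": "Jhonson", "credit": 1000000.00}
animal1 = {"type": 3, "cname": "cow", "sciname": "Bos....", "legs": 4, "tails": 1}

# Hypothetical mixed list, as it might come out of parsing one file.
records = [person1, animal1, person2]

# Keep only one kind of record; the dict keys become the column names,
# and keys missing from a given dict show up as NaN in that row.
people = pd.DataFrame([r for r in records if r["type"] == 1])

print(people.columns.tolist())
print(people["credit"].isna().tolist())
```

Building one frame per record type avoids a wide frame in which the animal-only columns would be empty for every person.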

For the more fundamental issue of two differently formatted record types intermixed in one file, and assuming the files aren't so big that we can't read them and hold them in memory, I'd use StringIO to make an object that behaves like a file but contains only the lines we want, and then use read_fwf (fixed-width file). For example:

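from io import StringIO  # on Python 2 this was "from StringIO import StringIO"

def get_filelike_object(filename, line_prefix):
    """Return an in-memory file holding only the lines that start with line_prefix."""
    s = StringIO()
    with open(filename, "r") as fp:
        for line in fp:
            if line.startswith(line_prefix):
                s.write(line)
    s.seek(0)  # rewind so the reader starts at the first kept line
    return s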

and then


>>> type01 = get_filelike_object("animal.dat", "01")
>>> df = pd.read_fwf(type01, names="type name surname phone credit".split(), 
                     widths=[2, 10, 10, 8, 11], header=None)
>>> df
   type      name  surname     phone     credit
0     1      Jhon    Smith  555-1234        NaN
1     1  Jannette  Jhonson       NaN  100000000

should work. Of course, you could also separate the files into different types before pandas ever sees them, which might be easiest of all.
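Splitting the records by type up front might look roughly like this. It is only a sketch: the sample lines are built in memory rather than read from a file, `lines_by_type` is a name made up for illustration, and `dtype=str` is an optional choice made here so that the "01" code and the zero-padded credit value stay as text:

```python
from collections import defaultdict
from io import StringIO

import pandas as pd

# Sample data built in memory; real input would come from the text file.
sample_lines = [
    "01" + "Jhon".ljust(10) + "Smith".ljust(10) + "555-1234" + " " * 11,
    "03Cow            Bos primigenius taurus        00401",
    "01" + "Jannette".ljust(10) + "Jhonson".ljust(10) + " " * 8 + "00100000000",
]

# Bucket the raw lines by their two-character type prefix.
lines_by_type = defaultdict(list)
for line in sample_lines:
    lines_by_type[line[:2]].append(line)

# Parse only the "01" (person) bucket, using the layout from the question.
people = pd.read_fwf(
    StringIO("\n".join(lines_by_type["01"])),
    names=["type", "name", "surname", "phone", "credit"],
    widths=[2, 10, 10, 8, 11],
    header=None,
    dtype=str,  # assumption: keep every field as text, no numeric parsing
)
print(people)
```

Each bucket can then be parsed with its own `names` and `widths`, so every record type ends up in a separate DataFrame.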
