Python: how to convert a list into a pandas DataFrame

Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/28227612/

Date: 2020-08-19 02:57:02  Source: igfitidea

how to convert a list into a pandas dataframe

python, pandas

Asked by Elizabeth Susan Joseph

I have the following code:

rows = []
for dt in new_info:
    x = dt['state']
    est = dt['estimates']

    # Choices listed under each party
    col_R = [val['choice'] for val in est if val['party'] == 'Rep']
    col_D = [val['choice'] for val in est if val['party'] == 'Dem']

    # Party of the incumbent, if there is one
    incumb = [val['party'] for val in est if val['incumbent']]

    rows.append((x, col_R, col_D, incumb))

Now I want to convert my rows list into a pandas data frame. The structure of my rows list is shown below; the list has 32 entries.

enter image description here

When I convert this into a pandas data frame, the entries in the data frame are lists:

pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])  

enter image description here
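
A minimal sketch that reproduces this behaviour, assuming three sample rows of the shape described above (the sample values and the `cell_types` name are illustrative, not taken from the asker's data):

```python
import pandas as pd

# Illustrative rows: (state, [Rep choice], [Dem choice], [incumbent party])
rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

df = pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])

# Every cell in the R column is still a Python list, not a string
cell_types = {type(value) for value in df["R"]}
print(df)
print(cell_types)
```

The constructor stores each tuple element as given, so a list in the input becomes a list object in the cell.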

But I want my data frame to look like this:

enter image description here

The new_info variable looks like this:

enter image description here

Accepted answer by Aaron Hall

Since you mind the objects in the columns being lists, I would use a generator to remove the lists wrapping your items:

import pandas as pd
import numpy as np
rows = [(u'KY', [u'McConnell'], [u'Grimes'], [u'Rep']),
        (u'AR', [u'Cotton'], [u'Pryor'], [u'Dem']),
        (u'MI', [u'Land'], [u'Peters'], [])]

def get(r, nth):
    '''helper function to retrieve item from nth list in row r'''
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

The generator works similarly to this function, but instead of materializing a list unnecessarily in memory as an intermediate step, it just passes each row that would be in the list to the consumer of the list of rows:

def remove_list_items(list_of_records):
    result = []
    for r in list_of_records:
        result.append((r[0], get(r, 1), get(r, 2), get(r, 3)))
    return result
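
To see the difference between the two versions, here is a small sketch that runs them side by side. The names `remove_list_items_lazy` and `remove_list_items_eager` are hypothetical renamings used only so both can coexist in one script:

```python
import types
import numpy as np

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """Return the first item of the nth list in row r, or NaN if it is empty."""
    return r[nth][0] if r[nth] else np.nan

def remove_list_items_lazy(records):
    # Generator version: produces one unwrapped row at a time
    for r in records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

def remove_list_items_eager(records):
    # List version: builds every unwrapped row before returning
    return [(r[0], get(r, 1), get(r, 2), get(r, 3)) for r in records]

lazy = remove_list_items_lazy(rows)
eager = remove_list_items_eager(rows)

is_generator = isinstance(lazy, types.GeneratorType)
first = next(lazy)   # only the first row has been computed at this point
rest = list(lazy)    # consuming the generator computes the remaining rows
```

Both versions yield the same rows; the generator simply avoids holding the intermediate list, which matters little for 32 entries but more for large inputs.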

Then build your DataFrame by passing your data through the generator (or the list version, if you prefer):

>>> df = pd.DataFrame.from_records(
...     remove_list_items(rows),
...     columns=["State", "R", "D", "incumbent"])
>>> df
  State          R       D incumbent
0    KY  McConnell  Grimes       Rep
1    AR     Cotton   Pryor       Dem
2    MI       Land  Peters       NaN

Or you could use a list comprehension or a generator expression (shown) to do essentially the same:

>>> df = pd.DataFrame.from_records(
...     ((r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows),
...     columns=["State", "R", "D", "incumbent"])
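
For completeness, a sketch of the list-comprehension variant mentioned above, written as a standalone script. The `records` name is an illustrative choice; the `get` helper is repeated so the block runs on its own:

```python
import numpy as np
import pandas as pd

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """Return the first item of the nth list in row r, or NaN if it is empty."""
    return r[nth][0] if r[nth] else np.nan

# List comprehension: same rows as the generator expression, built eagerly
records = [(r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows]

df = pd.DataFrame.from_records(
    records,
    columns=["State", "R", "D", "incumbent"])
print(df)
```

The only difference from the generator expression is the square brackets, which make Python build the full list before pandas sees it.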

Answer by aus_lacy

You can use Python's built-in string joining together with apply and do something like:

df['col1'] = df['col1'].apply(lambda i: ''.join(i))

which will produce:

    col1 col2
0    a  [d]
1    b  [e]
2    c  [f]
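
The answer does not show how the frame was built, so here is a minimal sketch that assumes a two-column frame whose cells are single-item lists and applies the same line to `col1` only:

```python
import pandas as pd

# Assumed starting frame: every cell is a one-element list
df = pd.DataFrame({"col1": [["a"], ["b"], ["c"]],
                   "col2": [["d"], ["e"], ["f"]]})

# Join the strings inside each list in col1; col2 is left untouched
df["col1"] = df["col1"].apply(lambda i: "".join(i))
print(df)
```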

Obviously, col2 hasn't been formatted, in order to show the contrast.

Edit

As requested by the OP, if you want to use apply(lambda...) on all the columns, you can either set each column explicitly with a line like the one above, replacing 'col1' with each column name you wish to alter, or simply loop over the columns.

If you have a data frame like this:

x = [['a'],['b'],['c'],['d']]
y = [['e'],['f'],['g'],['h']]
z = [['i'],['j'],['k'],['l']]

df = pd.DataFrame({'col1':x, 'col2':y, 'col3':z})

then you can loop over the columns:

for col in df.columns:
    df[col] = df[col].apply(lambda i: ''.join(i))

which converts a data frame that starts like:

   col1 col2 col3
0  [a]  [e]  [i]
1  [b]  [f]  [j]
2  [c]  [g]  [k]
3  [d]  [h]  [l]

into:

    col1 col2 col3
0    a    e    i
1    b    f    j
2    c    g    k
3    d    h    l
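
One caveat when carrying this loop over to the question's data: `''.join([])` returns an empty string, so a row with no incumbent would end up as `''` rather than NaN. A hedged sketch of the same column loop with a hypothetical helper, `first_or_nan`, that takes the first list item instead (the sample rows are illustrative):

```python
import numpy as np
import pandas as pd

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]
df = pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])

def first_or_nan(items):
    """Hypothetical helper: first element of a list, or NaN if the list is empty."""
    return items[0] if items else np.nan

# Loop only over the columns that hold lists; State is already a plain string
for col in ["R", "D", "incumbent"]:
    df[col] = df[col].apply(first_or_nan)

empty_join = "".join([])  # what the join-based lambda would give for an empty list
print(df)
```

Whether an empty string or NaN is preferable depends on how missing values are handled downstream; NaN has the advantage that `isna()` and `dropna()` recognise it.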