Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/35604186/

Time: 2020-09-14 00:45:00  Source: igfitidea

Convert psycopg2 DictRow query to Pandas dataframe

Tags: python, pandas, psycopg2

Asked by n1000

I would like to convert the result of a psycopg2 DictRow query to a pandas dataframe, but pandas keeps complaining:

import pandas as pd
import psycopg2.extras

# conn is an open psycopg2 connection
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print(type(data))
print(pd.DataFrame(list(data)))

However, I always get an error, even though I specifically passed a list:

<type 'list'>
TypeError: Expected list, got DictRow

The result is the same if I do pd.DataFrame(data). Could someone please help me make this work?

It would also be nice if the dataframe got the right column names (i.e. extract them from the DictRow and pass them to the dataframe).

Update:
Since I need to process the data, I would like to use the data from the psycopg2 query as is, rather than a pandas approach such as read_sql_query.

Answer by n1000

Hmm, I eventually found this hacky solution:

print(pd.DataFrame([i.copy() for i in data]))

The copy() method of the DictRow class returns an actual dictionary. With the list comprehension I create a list of dictionaries (all with identical keys), which pandas happily accepts.
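A minimal sketch of the same idea that runs without a database. FakeDictRow is a hypothetical stand-in that mimics only the parts of psycopg2.extras.DictRow used here (a list subclass with keys() and lookup by column name); with real rows you would use curs.fetchall() instead. dict(row) builds a plain dict through keys() and row[key], which is the kind of result row.copy() is used for above.

```python
import pandas as pd


class FakeDictRow(list):
    """Hypothetical stand-in for psycopg2.extras.DictRow: a list of values
    that can also be indexed by column name."""

    def __init__(self, columns, values):
        super().__init__(values)
        self._index = {name: i for i, name in enumerate(columns)}

    def __getitem__(self, key):
        # String keys are looked up by column name, everything else by position
        if isinstance(key, str):
            key = self._index[key]
        return super().__getitem__(key)

    def keys(self):
        return list(self._index)


columns = ["id", "name"]
data = [FakeDictRow(columns, [1, "a"]), FakeDictRow(columns, [2, "b"])]

# dict(row) uses keys() and row[key], giving one plain dict per row
df = pd.DataFrame([dict(row) for row in data])
print(df)
```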

I am still puzzled why list(data) produced a TypeError. Maybe someone can still enlighten me.
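A likely explanation, offered as an assumption and not confirmed anywhere in this thread: DictRow is a subclass of list, and the error text suggests that the pandas routine behind pd.DataFrame(list(data)) checks for exact list objects, which a subclass fails even though isinstance() accepts it. If that is right, copying each row into a plain list should be enough. The sketch uses a hypothetical empty subclass, ListSubclass, in place of DictRow, and made-up column names.

```python
import pandas as pd


class ListSubclass(list):
    """Hypothetical stand-in for DictRow: a list subclass, not an exact list."""


data = [ListSubclass([1, "a"]), ListSubclass([2, "b"])]

# isinstance() accepts the subclass, but an exact-type check does not
print(isinstance(data[0], list), type(data[0]) is list)  # True False

# Copy every row into a plain list before handing the rows to pandas
plain_rows = [list(row) for row in data]
df = pd.DataFrame(plain_rows, columns=["id", "name"])
print(df)
```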

Answer by amball

UPDATE: pandas.read_sql_query() is a more elegant way to read a SQL query into a dataframe, without having to work with a psycopg2 cursor directly. See the pandas docs.
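A minimal sketch of the read_sql_query route. To keep it runnable without a PostgreSQL server, it uses an in-memory SQLite database from Python's standard sqlite3 module as a stand-in; the table and its contents are made up for illustration. pandas documents SQLAlchemy connectables and sqlite3 connections as supported, so for PostgreSQL this usually means a SQLAlchemy engine, which can use psycopg2 as its driver.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the PostgreSQL table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])

# read_sql_query runs the query and names the columns from the result set
df = pd.read_sql_query("SELECT * FROM mytable", conn)
print(df)
conn.close()
```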

I've been having the same issue. The easiest way I found was to convert the list of DictRows to a numpy array.

import numpy as np
import pandas as pd
import psycopg2.extras

# conn is an open psycopg2 connection
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print(type(data))
print(pd.DataFrame(np.array(data)))

If you want the column names, you can get them as the keys of each DictRow. However, keys() is not guaranteed to return the names in the same order as the values in the numpy array (at least in older psycopg2 versions), so one (inelegant) way to keep names and values aligned is as follows:

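import pandas as pd
import psycopg2.extras

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print(type(data))
# list() so the names can be reused for every row (keys() may return an iterator in Python 3)
colNames = list(data[0].keys())
print(pd.DataFrame([[row[col] for col in colNames] for row in data], columns=colNames))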
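A less fragile way to get the column names, sketched under the assumption that any DB-API 2.0 (PEP 249) cursor is used: cursor.description holds one entry per result column, in result order, and the first item of each entry is the column name, so names and values line up without relying on keys() ordering. The sketch uses sqlite3 as a stand-in so it runs without a server; cursor_to_dataframe is a hypothetical helper name.

```python
import sqlite3

import pandas as pd


def cursor_to_dataframe(cursor):
    """Hypothetical helper: fetch all remaining rows of an executed DB-API
    cursor into a DataFrame, naming the columns from cursor.description."""
    columns = [col[0] for col in cursor.description]
    # list(row) turns each row (tuple, list subclass, ...) into a plain list of values
    rows = [list(row) for row in cursor.fetchall()]
    return pd.DataFrame(rows, columns=columns)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])

curs = conn.cursor()
curs.execute("SELECT * FROM mytable")
df = cursor_to_dataframe(curs)
print(df)
```

With psycopg2 the helper would be called as cursor_to_dataframe(curs) right after curs.execute(...). list(row) should work for plain and DictCursor rows alike, since a DictRow iterates over its values in column order; it would not work for RealDictCursor, whose rows are dicts and iterate over their keys.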

Answer by Jan Sila

You do need to convert the elements first. You might be confused because the whole result is a list, but each element is a DictRow rather than a plain list. Hence, in pandas 0.22.0, even DataFrame.from_records won't work straight away.
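A sketch of converting first and then calling DataFrame.from_records, which accepts a list of tuples together with a columns argument. The rows are a hypothetical list subclass standing in for DictRow, and the column names are made up; with psycopg2 they could come from curs.description.

```python
import pandas as pd


class ListSubclass(list):
    """Hypothetical stand-in for DictRow: a list subclass."""


data = [ListSubclass([157, 158, 83, 1]), ListSubclass([157, 159, 47, 1])]

# Turn each row into a plain tuple first, then build the frame
records = [tuple(row) for row in data]
df = pd.DataFrame.from_records(records, columns=["a", "b", "c", "d"])
print(df)
```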

This works fine with native types:

>>> import pandas as pd
>>> inp = [{'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}]
>>> pd.DataFrame(inp)
     a    b
0  1.0  NaN
1  NaN  2.0
2  1.0  NaN
3  NaN  2.0
4  1.0  NaN
5  NaN  2.0

But the way a psycopg2 query result prints is probably the source of the confusion (my own data):

[[157, 158, 83, 1], [157, 159, 47, 1], [158, 157, 53, 1], [158, 159, 38, 1], [159, 157, 76, 1], [159, 158, 24, 1]] <class 'list'>

but in fact the first element is:

[157, 158, 83, 1] <class 'psycopg2.extras.DictRow'>
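The effect is easy to reproduce without a database: a list subclass prints exactly like a list, so only type() reveals the difference. RowLike is a hypothetical stand-in for DictRow.

```python
class RowLike(list):
    """Hypothetical stand-in for psycopg2.extras.DictRow (a list subclass)."""


data = [RowLike([157, 158, 83, 1]), RowLike([157, 159, 47, 1])]

print(data, type(data))        # looks like a plain list of lists
print(data[0], type(data[0]))  # but each element is a RowLike
```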
