使用 index_col 时 Pandas read_sql 列不起作用 - 而是返回所有列

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/35937579/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 00:51:15  来源:igfitidea点击:

Pandas read_sql columns not working when using index_col - returns all columns instead

pythonpostgresqlpandas

提问by Henhuy

I'm using pandas.read_sql()command to get data from my postgresql database. The SQL query is created generically with many columns from which I only want to get specific columns using one column as index. Creating an example table test_tablelike this:

我正在使用pandas.read_sql()命令从我的 postgresql 数据库中获取数据。SQL 查询通常使用许多列创建,我只想使用一列作为索引从中获取特定列。创建这样的示例表test_table

column1 column2 column3
1       2       3
2       4       6
3       6       9

I tried to use the index_coland columnsparameter from pandas.read_sql()to get column1as index and column2as data (and neglecting column3!). But it always returns the whole table. Also when writing columns=['column1', 'column2']nothing changes...

我尝试使用index_colandcolumns参数 frompandas.read_sql()来获取column1索引和column2数据(而忽略了column3!)。但它总是返回整个表。也当写columns=['column1', 'column2']什么都没有改变......

I'm using python 2.7.6 with pandas 0.17.1 - Thanks for help!

我正在使用 python 2.7.6 和 pandas 0.17.1 - 感谢您的帮助!

Example Code:

示例代码:

import pandas
import psycopg2
import sqlalchemy


def connect():
    connString = (
        "dbname=test_db "
        "host=localhost "
        "port=5432 "
        "user=postgres "
        "password=password"
    )
    return psycopg2.connect(connString)

engine = sqlalchemy.create_engine(
            'postgresql://',
            creator=connect)
sql = (
    'SELECT '
    'column1, '
    'column2, '
    'column3 '
    'FROM test_table'
)
data = pandas.read_sql(
    sql,
    engine,
    index_col=['column1'],
    columns=['column2'])
print(data)

采纳答案by mdls

I think the argument columnsdid not work for you because you were using sql statement instead of providing it with your table name.

我认为该参数columns对您不起作用,因为您使用的是 sql 语句而不是为它提供表名。

As mentioned from pandas website:

如Pandas网站所述:

columns : list, default: None List of column names to select from sql table (only used when reading a table).

列:列表,默认值:无 要从 sql 表中选择的列名列表(仅在读取表时使用)。

Therefore, I think if you try:

因此,我认为如果您尝试:

pandas.read_sql('test_table', engine, index_col=['column1'], columns=['column2'])

columnsargument will actually work.

columns论点实际上会起作用。