pandas: Access Google BigQuery data from a local Jupyter Notebook

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/37284435/

Date: 2020-09-14 01:14:57  Source: igfitidea

Access Google BigQuery Data from local Jupyter Notebooks

Tags: pandas, google-bigquery, google-cloud-platform, google-cloud-datalab

Asked by dartdog

I have gotten a few notebooks up and running on Datalab. For a variety of reasons, I'd like to access the same data from a local Jupyter notebook on my machine.

This question suggested a few approaches, none of which I have been able to get working so far.

Specifically, the gcloud library:

from gcloud import bigquery
client = bigquery.Client()

gives me a stack trace, the last line of which is:

ContextualVersionConflict: (protobuf 2.6.1 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('protobuf!=3.0.0.b2.post1,>=3.0.0b2'), set(['gcloud']))

The Pandas library seems promising:

df=pd.io.gbq.read_gbq('SELECT CCS_Category_ICD9, Gender, Admit_Month FROM [xxxxxxxx-xxxxx:xxxx_100MB_newform.xxxxxx_100MB_newform]ORDER by CCS_Category_ICD9',
                 project_id='xxxxxxxx-xxxxx')

Also gives me a stack trace:

IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/httplib2-0.9.1.dist-info/METADATA'

Perhaps I have an auth issue with the pandas approach, although my browser is currently authenticated to the project? Or am I missing a dependency?

Any suggestions or guidance would be appreciated.

What is the best way to access a BigQuery data source from within a local Jupyter notebook?

Answered by Anthonios Partheniou

Based on the error from gbq.read_gbq(), it appears that httplib2 may not be correctly installed. The pandas installation page lists a few optional dependencies that are required for Google BigQuery support (httplib2 is one of them). To reinstall or repair the installation, try:

pip install httplib2 --ignore-installed
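
Before reinstalling, it can help to see which of the relevant packages are actually visible to the notebook's interpreter. The following is a minimal sketch using only the standard library (Python 3.8+); the package list is an assumption chosen for illustration, not the official pandas dependency list:

```python
from importlib import metadata

# Distribution names to check; this list is illustrative, adjust as needed.
PACKAGES = ["httplib2", "protobuf", "pandas", "pandas-gbq"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this in the same notebook kernel that raises the error shows whether the kernel and the `pip` on your command line are looking at the same environment.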

Once the optional dependencies for Google BigQuery support are installed, the following code should work:

from pandas.io import gbq
df = gbq.read_gbq('SELECT * FROM MyDataset.MyTable', project_id='my-project-id')
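
Note that the query in the question uses BigQuery's legacy SQL table syntax, `[project:dataset.table]`, while standard SQL writes the same reference in backticks as `project.dataset.table`. `read_gbq` accepts a `dialect` argument (`'legacy'` or `'standard'`) to choose between them. The helper below is a hypothetical convenience, not part of pandas, and it only rewrites fully qualified table references; other differences between the two dialects are not handled:

```python
import re

# Matches a legacy SQL table reference such as [my-project:my_dataset.my_table].
_LEGACY_REF = re.compile(r"\[([^\[\]:]+):([^\[\].]+)\.([^\[\]]+)\]")


def legacy_to_standard(sql):
    """Rewrite [project:dataset.table] references as `project.dataset.table`."""
    return _LEGACY_REF.sub(r"`\1.\2.\3`", sql)


# Placeholder names; substitute your own project, dataset and table.
legacy = "SELECT Gender FROM [my-project:my_dataset.my_table] ORDER BY Gender"
print(legacy_to_standard(legacy))

# To run the converted query (needs credentials and the pandas-gbq package):
# import pandas_gbq
# df = pandas_gbq.read_gbq(legacy_to_standard(legacy),
#                          project_id="my-project", dialect="standard")
```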

Answered by Graham Wheeler

If you were using Datalab-specific ways of accessing GCP, then you may want to try using https://github.com/googledatalab/datalab instead. That will give you Datalab-compatible functionality within Jupyter Notebook.

Answered by TICH

I had the same issue but managed to solve it by installing the conda version of gbq. I already had the Anaconda distribution of Python installed, so I guess some link may be missing if you use pip.

conda install pandas-gbq --channel conda-forge

This command did the trick.

Answered by hkanjih

I have one example here: https://github.com/kanjih-ciandt/docker-jupyter-gcloud/blob/master/ka.ipynb

But basically, you first need some packages installed:

!pip install google-cloud --user
!pip install --upgrade google-cloud-bigquery[pandas] --user
!pip install google-cloud-storage --user

If you already have a service account file, just execute this (replacing JSON_SERVICE_ACCOUNT_FILE):

from google.cloud import bigquery

# Path to your service account key file (replace with your own).
JSON_SERVICE_ACCOUNT_FILE = 'sa1.json'

client = bigquery.Client.from_service_account_json(JSON_SERVICE_ACCOUNT_FILE)
# Perform a query.
QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

for row in rows:
    print(row.name)
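
Since the question is about pandas, the query result can also be loaded into a DataFrame instead of being iterated row by row. `QueryJob.to_dataframe()` is available when the `google-cloud-bigquery[pandas]` extra is installed, as in the pip commands above. A minimal sketch; `query_to_dataframe` is a hypothetical wrapper name:

```python
def query_to_dataframe(client, sql):
    """Run `sql` on a BigQuery client and return the result as a DataFrame."""
    query_job = client.query(sql)  # API request
    return query_job.to_dataframe()  # Waits for the query, then downloads rows


# Usage, assuming `client` and `QUERY` are defined as in the snippet above:
# df = query_to_dataframe(client, QUERY)
# df.head()
```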

However, if you have access to a GCP project but don't know how to create a service account, you can create one directly from your Jupyter notebook:

SERVICE_ACCOUNT='jupytersa'
JSON_SERVICE_ACCOUNT_FILE = 'sa1.json'
GCP_PROJECT_ID='<GCP_PROJECT_ID>' 

import subprocess
import sys
import logging

logger = logging.getLogger('catch_all')


def run_command(parameters):
    """Run a command and return its output; log the error and return None on failure."""
    try:
        return subprocess.check_output(parameters)
    except Exception as e:
        logger.error(e)
        logger.error('ERROR: Look in the Jupyter console for more information')

run_command([
        'gcloud', 'iam', 'service-accounts',
        'create', SERVICE_ACCOUNT,
        '--display-name', "Service Account for BETA SCC API",
        '--project', GCP_PROJECT_ID
])


IAM_ROLES = [
    'roles/editor'
]

for role in IAM_ROLES:
    run_command([
        'gcloud', 'projects', 'add-iam-policy-binding',GCP_PROJECT_ID,
        '--member', 'serviceAccount:{}@{}.iam.gserviceaccount.com'.format(SERVICE_ACCOUNT, GCP_PROJECT_ID),
        '--quiet',  '--role', role
    ])


run_command([
        'gcloud', 'iam', 'service-accounts',
        'keys', 'create', JSON_SERVICE_ACCOUNT_FILE ,
        '--iam-account', 
        '{}@{}.iam.gserviceaccount.com'.format(SERVICE_ACCOUNT, GCP_PROJECT_ID)
])
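
Once the key file exists, an alternative to `from_service_account_json` is to point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at it, so that a plain `bigquery.Client()` picks up the credentials through Application Default Credentials. A minimal sketch; `use_key_file` is a hypothetical helper name:

```python
import os


def use_key_file(path):
    """Expose a service account key file via GOOGLE_APPLICATION_CREDENTIALS."""
    path = os.path.abspath(path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Service account key not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path


# Usage, assuming the key was written to sa1.json as above:
# use_key_file("sa1.json")
# from google.cloud import bigquery
# client = bigquery.Client(project="<GCP_PROJECT_ID>")
```

The variable only affects the current kernel process, so it has to be set again after a kernel restart.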

You can find the full example here: https://github.com/kanjih-ciandt/docker-jupyter-gcloud/blob/master/ka.ipynb

To conclude, if you want to execute this notebook from Docker you can use this image: https://cloud.docker.com/u/hkanjih/repository/docker/hkanjih/docker-jupyter-gcloud
