pandas: accessing Google BigQuery data from a local Jupyter Notebook
Original question: http://stackoverflow.com/questions/37284435/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Access Google BigQuery Data from local Jupyter Notebooks
Asked by dartdog
I have a few notebooks up and running on Datalab. For a variety of reasons, I'd like to access the same data from a local Jupyter notebook on my machine.
This question suggested a few approaches, none of which I have been able to get working so far.
Specifically, the gcloud library:
from gcloud import bigquery
client = bigquery.Client()
gives me a stack trace, the last line of which is:
ContextualVersionConflict: (protobuf 2.6.1 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('protobuf!=3.0.0.b2.post1,>=3.0.0b2'), set(['gcloud']))
The Pandas library seems promising:
df = pd.io.gbq.read_gbq(
    'SELECT CCS_Category_ICD9, Gender, Admit_Month '
    'FROM [xxxxxxxx-xxxxx:xxxx_100MB_newform.xxxxxx_100MB_newform] '
    'ORDER BY CCS_Category_ICD9',
    project_id='xxxxxxxx-xxxxx')
Also gives me a stack trace:
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/httplib2-0.9.1.dist-info/METADATA'
Perhaps I have an auth issue with the pandas approach, even though my browser is currently authenticated to the project? Or am I missing a dependency?
Any suggestions or guidance would be appreciated.
What is the best way to access a BigQuery data source from within a local Jupyter notebook?
Answered by Anthonios Partheniou
Based on the error from gbq.read_gbq(), it appears that httplib2 may not be correctly installed. The pandas installation page lists a few optional dependencies that are required for Google BigQuery support (httplib2 is one of them).
To re-install/repair the installation, try:
pip install httplib2 --ignore-installed
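Before or after reinstalling, it can help to see which versions the notebook's interpreter actually picks up. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the package list and the helper name installed_versions are illustrative choices, not part of the original answer.

```python
from importlib import metadata

# Distribution names as published on PyPI; adjust the list to your setup.
PACKAGES = ["httplib2", "protobuf", "pandas", "pandas-gbq", "google-cloud-bigquery"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the notebook (rather than in a separate shell) matters, because the notebook kernel may use a different Python environment than the one pip installed into.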
Once the optional dependencies for Google BigQuery support are installed, the following code should work:
from pandas.io import gbq
df = gbq.read_gbq('SELECT * FROM MyDataset.MyTable', project_id='my-project-id')
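As a rough sketch of how the same call might be wrapped for reuse, the helper below builds the query string and passes it to read_gbq from the separate pandas-gbq package, which is what pandas delegates to in recent releases. The function names build_query and read_table are hypothetical, and the identifier check is deliberately stricter than what BigQuery itself allows.

```python
import re

# Conservative pattern: letters, digits and underscores only (an assumption,
# BigQuery accepts more characters than this).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_query(dataset, table, columns=("*",), limit=None):
    """Build a simple standard-SQL SELECT over dataset.table."""
    names = [dataset, table] + [c for c in columns if c != "*"]
    for name in names:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM `{dataset}.{table}`"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def read_table(project_id, dataset, table, columns=("*",), limit=None):
    """Run the query and return a DataFrame (requires pandas-gbq and credentials)."""
    import pandas_gbq  # imported lazily so build_query works without it

    query = build_query(dataset, table, columns, limit)
    return pandas_gbq.read_gbq(query, project_id=project_id, dialect="standard")
```

For example, read_table('my-project-id', 'MyDataset', 'MyTable', limit=10) would issue SELECT * FROM `MyDataset.MyTable` LIMIT 10 against that project.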
Answered by Graham Wheeler
If you were using Datalab-specific ways of accessing GCP, then you may want to try using https://github.com/googledatalab/datalab instead. That will give you Datalab-compatible functionality within Jupyter Notebook.
Answered by TICH
I had the same issue but managed to solve it by installing the conda version of pandas-gbq. I already had the Anaconda distribution of Python installed, so I guess some link may be missing if you use pip. This command did the business:
conda install pandas-gbq --channel conda-forge
Answered by hkanjih
I have one example here: https://github.com/kanjih-ciandt/docker-jupyter-gcloud/blob/master/ka.ipynb
But basically, you first need to install some packages:
!pip install google-cloud --user
!pip install --upgrade google-cloud-bigquery[pandas] --user
!pip install google-cloud-storage --user
If you already have a service account file just execute this (replacing JSON_SERVICE_ACCOUNT_FILE):
from google.cloud import bigquery

# Replace with the path to your own service account key file.
JSON_SERVICE_ACCOUNT_FILE = 'sa1.json'

client = bigquery.Client.from_service_account_json(JSON_SERVICE_ACCOUNT_FILE)

# Perform a query.
QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish
for row in rows:
    print(row.name)
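If the client fails to load the key file, a quick sanity check on the file itself can narrow things down. This is a minimal sketch, assuming the usual layout of a downloaded service account key (a JSON object with type, project_id, private_key and client_email fields); the helper name check_service_account_file is hypothetical.

```python
import json

# Fields a downloaded service account key is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path):
    """Return (project_id, client_email) if the key file looks valid.

    Raises ValueError when the file is not a service account key.
    """
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)
    if not isinstance(info, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {info['type']!r}")
    return info["project_id"], info["client_email"]
```

Calling check_service_account_file(JSON_SERVICE_ACCOUNT_FILE) before creating the client shows which project and account the key belongs to, without contacting any Google API.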
However, if you have access to a GCP project but don't know how to create a service account, you can create one directly from your Jupyter notebook:
import logging
import subprocess

SERVICE_ACCOUNT = 'jupytersa'
JSON_SERVICE_ACCOUNT_FILE = 'sa1.json'
GCP_PROJECT_ID = '<GCP_PROJECT_ID>'

logger = logging.getLogger('catch_all')

def run_command(parameters):
    """Run a command and return its output; log the error and return None on failure."""
    try:
        return subprocess.check_output(parameters)
    except (subprocess.CalledProcessError, OSError) as e:
        logger.error(e)
        logger.error('ERROR: Look in the Jupyter console for more information')

# Create the service account.
run_command([
    'gcloud', 'iam', 'service-accounts',
    'create', SERVICE_ACCOUNT,
    '--display-name', "Service Account for BETA SCC API",
    '--project', GCP_PROJECT_ID
])

# Grant roles to the service account. Note that roles/editor is a broad role.
IAM_ROLES = [
    'roles/editor'
]
for role in IAM_ROLES:
    run_command([
        'gcloud', 'projects', 'add-iam-policy-binding', GCP_PROJECT_ID,
        '--member', 'serviceAccount:{}@{}.iam.gserviceaccount.com'.format(SERVICE_ACCOUNT, GCP_PROJECT_ID),
        '--quiet', '--role', role
    ])

# Create a JSON key file for the service account.
run_command([
    'gcloud', 'iam', 'service-accounts',
    'keys', 'create', JSON_SERVICE_ACCOUNT_FILE,
    '--iam-account',
    '{}@{}.iam.gserviceaccount.com'.format(SERVICE_ACCOUNT, GCP_PROJECT_ID)
])
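Once the key file exists, one alternative to from_service_account_json is to point Application Default Credentials at it through the GOOGLE_APPLICATION_CREDENTIALS environment variable, so that a plain bigquery.Client() picks it up. The sketch below assumes the key was written to sa1.json as above; the helper names use_key_file and make_client are hypothetical.

```python
import os


def use_key_file(path):
    """Point Application Default Credentials at a service account key file."""
    absolute = os.path.abspath(path)
    if not os.path.isfile(absolute):
        raise FileNotFoundError(absolute)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = absolute
    return absolute


def make_client(project_id=None):
    """Create a BigQuery client from the default credentials."""
    from google.cloud import bigquery  # imported lazily; needs google-cloud-bigquery

    return bigquery.Client(project=project_id)
```

A typical use would be use_key_file('sa1.json') followed by client = make_client(). The environment variable only affects the current kernel process, so it has to be set again after a kernel restart.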
You can find the full example here: https://github.com/kanjih-ciandt/docker-jupyter-gcloud/blob/master/ka.ipynb
To conclude, if you want to execute this notebook from Docker you can use this image: https://cloud.docker.com/u/hkanjih/repository/docker/hkanjih/docker-jupyter-gcloud