Download a spreadsheet from Google Docs using Python

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/3287651/

Time: 2020-08-18 09:26:18  Source: igfitidea

python google-docs google-docs-api gdata-python-client

Asked by a paid nerd

Can you produce a Python example of how to download a Google Docs spreadsheet given its key and worksheet ID (gid)? I can't.

I've scoured versions 1, 2 and 3 of the API. I'm having no luck: I can't figure out their complicated ATOM-like feeds API, the private gdata.docs.service.DocsService._DownloadFile method says that I'm unauthorized, and I don't want to write an entire Google Login authentication system myself. I'm about to stab myself in the face due to frustration.

I have a few spreadsheets and I want to access them like so:

username = '[email protected]'
password = getpass.getpass()

def get_spreadsheet(key, gid=0):
    ... (help!) ...

for row in get_spreadsheet('5a3c7f7dcee4b4f'):
    cell1, cell2, cell3 = row
    ...

Please save my face.

Update 1: I've tried the following, but no combination of Download() or Export() seems to work. (Docs for DocsService here)

import gdata.docs.service
import getpass
import os
import tempfile
import csv

def get_csv(file_path):
  # Read every line eagerly so the temp file can be deleted afterwards
  with open(file_path) as f:
    return csv.reader(f.readlines())

def get_spreadsheet(key, gid=0):
  gd_client = gdata.docs.service.DocsService()
  gd_client.email = '[email protected]'
  gd_client.password = getpass.getpass()
  gd_client.ssl = False
  gd_client.source = "My Fancy Spreadsheet Downloader"
  gd_client.ProgrammaticLogin()

  file_path = tempfile.mktemp(suffix='.csv')
  uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key
  try:
    entry = gd_client.GetDocumentListEntry(uri)

    # XXXX - The following dies with RequestError "Unauthorized"
    gd_client.Download(entry, file_path)

    return get_csv(file_path)
  finally:
    try:
      os.remove(file_path)
    except OSError:
      pass

Accepted answer by Cameron Spickert

In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:

#!/usr/bin/python

import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    email = "" # (your email here)
    password = getpass.getpass()
    spreadsheet_id = "" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)

Answer by nearlymonolith

This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engine + Python. Not having any experience in the area, I can't tell exactly which portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and plays with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.

Google AppEngine + Google Docs + Some Python = Simple CMS

Answer by tcarobruce

You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.

Get a separate login token for the spreadsheets service and substitute that for the export. Adding this to the get_spreadsheet code worked for me:

import gdata.spreadsheet.service

def get_spreadsheet(key, gid=0):
    # ...
    spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
    spreadsheets_client.email = gd_client.email
    spreadsheets_client.password = gd_client.password
    spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
    spreadsheets_client.ProgrammaticLogin()

    # ...
    entry = gd_client.GetDocumentListEntry(uri)
    docs_auth_token = gd_client.GetClientLoginToken()
    gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
    gd_client.Export(entry, file_path)
    gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token

Notice I also used Export, as Download seems to give only PDF files.

Answer by KPax

This no longer works as of gdata 2.0.1.4:

gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())

Instead, you have to do:

gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))

Answer by grin

The following code works in my case (Ubuntu 10.4, Python 2.6.5, gdata 2.0.14):

import gdata.docs.service
import gdata.spreadsheet.service
gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email,password)
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email,password)
#...
file_path = file_path.strip()+".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)  
gd_client.auth_token = docs_token

Answer by aculich

The https://github.com/burnash/gspread library is a newer, simpler way to interact with Google Spreadsheets than the gdata library suggested by the older answers, which is not only too low-level but also overly complicated.

You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey

Here's an example of how to use it:

import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"

client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    # 'wb' suits the Python 2 csv module; on Python 3 use open(filename, 'w', newline='')
    with open(filename, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(worksheet.get_all_values())
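
A common stumbling block with the service account approach above is that the spreadsheet must be shared with the service account's own email address before open_by_key can see it. Here is a minimal sketch for pulling that address out of the downloaded key. The helper names (service_account_email, load_service_account_email) are made up for illustration; the only assumption about the key file is that it carries the usual "type", "client_email" and "private_key" fields.

```python
import json

# Fields a downloaded service account key is expected to contain
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def service_account_email(key_info):
    """Return the client_email from parsed service account key data.

    Raises ValueError if the data does not look like a service account key.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in key_info]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key_info["type"] != "service_account":
        raise ValueError("not a service account key: " + repr(key_info["type"]))
    return key_info["client_email"]


def load_service_account_email(path="credentials.json"):
    """Read the JSON key file and return the address to share the sheet with."""
    with open(path) as f:
        return service_account_email(json.load(f))
```

Share the spreadsheet with the returned address (read access is enough for downloading) and the gspread example should be able to open it.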

Answer by DeltaG

Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and thankfully phased out in favor of Gspread). I think that this code is even quicker than the earlier answer to get the contents of the sheet:

import gspread

username = '[email protected]'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"

client = gspread.login(username, password)
spreadsheet = client.open(sheetname)

worksheet = spreadsheet.sheet1
contents = []
for rows in worksheet.get_all_values():
    contents.append(rows)

Answer by Michael Matthew Toomim

I've simplified @Cameron's answer even further, by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also edited the url, which might work better.

#!/usr/bin/python
import re, urllib, urllib2

def get_auth_token(email, password):
    url = "https://www.google.com/accounts/ClientLogin"
    params = {
        "Email": email, "Passwd": password,
        "service": 'wise',
        "accountType": "HOSTED_OR_GOOGLE",
        "source": 'Client'
    }
    req = urllib2.Request(url, urllib.urlencode(params))
    return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

def download(spreadsheet, worksheet, email, password, format="csv"):
    # gid must be a query parameter; a "#gid=..." fragment is never sent to the server
    url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s&gid=%s'

    headers = {
        "Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
        "GData-Version": "3.0"
    }
    req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
    return urllib2.urlopen(req)


if __name__ == "__main__":
    import getpass
    import csv

    spreadsheet_id = ""             # (spreadsheet id here)
    worksheet_id = ''               # (gid here)
    email = ""                      # (your email here)
    password = getpass.getpass()

    # Request a file-like object containing the spreadsheet's contents
    csv_file = download(spreadsheet_id, worksheet_id, email, password)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
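
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with the standard library alone. This is only an illustration under assumptions: the function names are invented here, the format and gid query parameters follow the commonly used export URL rather than anything the answers confirm, and an unauthenticated request will only succeed for a sheet that is readable by anyone with the link.

```python
import csv
import io
import urllib.parse
import urllib.request


def export_url(key, gid=0, fmt="csv"):
    """Build the export URL for one worksheet (gid) of a spreadsheet (key)."""
    query = urllib.parse.urlencode({"format": fmt, "gid": gid})
    return "https://docs.google.com/spreadsheets/d/%s/export?%s" % (key, query)


def parse_csv(text):
    """Turn CSV text into a list of rows, each row a list of cell strings."""
    return list(csv.reader(io.StringIO(text)))


def get_spreadsheet(key, gid=0, headers=None):
    """Download one worksheet as CSV and return its rows.

    Pass auth headers if the sheet is private; none are sent by default.
    """
    req = urllib.request.Request(export_url(key, gid), headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

With that in place, the loop from the question (for row in get_spreadsheet(key): ...) works as written, as long as the request is allowed to read the sheet.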

Answer by wescpy

(Jul 2016) Rephrasing with current terminology: "How do I download a Google Sheet in CSV or XLSX format from Google Drive using Python?". (Google Docs now only refers to the cloud-based word processor/text editor which doesn't provide access to Google Sheets spreadsheets.)

First, all other answers are pretty much outdated or will be, because they use the old GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.

Modern Google API access occurs using API keys (public data) or OAuth2 authorization (authorized data), primarily with the Google APIs Client Libraries, including the one for Python. (And no, you don't have to build an entire auth system just to access the APIs... see the blogpost below.)

To perform the task requested by the OP, you need authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.
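
As a rough illustration of the export step (this is not the snippet from the blogpost), the Drive API v3 files.export endpoint can be called over plain HTTPS once you hold an OAuth2 access token. The sketch below assumes the token was already obtained elsewhere and carries a Drive scope; the function names and the EXPORT_MIME_TYPES table are made up for this example.

```python
import urllib.parse
import urllib.request

# MIME types to request from Drive when exporting a Google Sheet
EXPORT_MIME_TYPES = {
    "csv": "text/csv",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def drive_export_request(file_id, access_token, fmt="csv"):
    """Build an authorized request for the Drive v3 files.export endpoint."""
    query = urllib.parse.urlencode({"mimeType": EXPORT_MIME_TYPES[fmt]})
    url = "https://www.googleapis.com/drive/v3/files/%s/export?%s" % (
        urllib.parse.quote(file_id, safe=""), query)
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + access_token})


def export_sheet(file_id, access_token, fmt="csv"):
    """Download the exported file and return its raw bytes."""
    req = drive_export_request(file_id, access_token, fmt)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

In practice the Google APIs Client Library for Python wraps both the OAuth2 flow and this request, so a hand-rolled version like this is mostly useful for seeing what goes over the wire.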

Note that there is also a newer Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based requests like exporting, where the Drive API is the correct one to use.
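
To make the distinction concrete: the Sheets API v4 hands back cell values as JSON rather than a file, so getting CSV out of it means doing the conversion yourself. A minimal sketch, assuming a response shaped like the one from spreadsheets.values.get (a "values" list of rows, with trailing empty cells left out); the function name values_to_csv is invented for this example.

```python
import csv
import io


def values_to_csv(response):
    """Convert a Sheets API v4 values response (a dict) into CSV text.

    Short rows are padded with empty strings so every row has the same width.
    """
    rows = response.get("values", [])
    width = max((len(row) for row in rows), default=0)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(list(row) + [""] * (width - len(row)))
    return buf.getvalue()
```

For a straight download this is extra work compared to a Drive export, which is the point of the paragraph above.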

To see an example of exporting a Google Sheet as CSV from Drive, check out this blog post I wrote; to learn more about using Google Sheets with Python, see this answer I wrote for a similar question. You can also download a Sheet in XLSX and other formats supported by Drive.

If you're completely new to Google APIs, then you need to take a further step back and review these videos first:

If you already have experience with G Suite APIs and want to see more videos on using both APIs:

Answer by Nithin

(Dec 16) Try another library I wrote: pygsheets. It's similar to gspread, but uses the Google API v4. It has an export method to export a spreadsheet.

import pygsheets

gc = pygsheets.authorize()

# Open spreadsheet and then worksheet
sh = gc.open('my new ssheet')
wks = sh.sheet1

#export as csv
wks.export(pygsheets.ExportType.CSV)