Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24063629/

Time: 2020-09-13 22:07:58  Source: igfitidea

Loading a generic Google Spreadsheet in Pandas

Tags: python, pandas, gdata

Asked by Amelio Vazquez-Reina

When I try to load a Google Spreadsheet in pandas

from StringIO import StringIO  # Python 2 module
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=<some_long_code>&output=csv')
data = r.content
df = pd.read_csv(StringIO(data), index_col=0)

I get the following:

CParserError: Error tokenizing data. C error: Expected 1316 fields in line 73, saw 1386

Why? I would think that one could identify the set of spreadsheet rows and columns that contain data, and use the spreadsheet's rows and columns as the dataframe index and columns respectively (with NaN for anything empty). Why does it fail?
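
For what it's worth, the error message itself says that line 73 had more fields (1386) than the parser expected (1316). As far as I understand, pandas pads short rows with NaN but refuses rows that are wider than expected. A minimal sketch for checking how many fields each row really has before handing the text to pandas; `field_count_report` and the sample text are made up for illustration and are not part of the original question:

```python
import csv
from collections import Counter
from io import StringIO


def field_count_report(csv_text):
    """Count fields per row and list the rows whose width differs from the first row.

    Hypothetical helper: returns (Counter of row widths, [(line_no, n_fields), ...]).
    """
    reader = csv.reader(StringIO(csv_text))
    counts = Counter()
    odd_rows = []
    expected = None
    for row in reader:
        n_fields = len(row)
        if expected is None:
            expected = n_fields  # treat the first row as the reference width
        counts[n_fields] += 1
        if n_fields != expected:
            odd_rows.append((reader.line_num, n_fields))
    return counts, odd_rows


# Tiny made-up sample: the third line has one field too many.
sample = "a,b,c\n1,2,3\n4,5,6,7\n"
counts, odd_rows = field_count_report(sample)
print(counts)
print(odd_rows)
```

If the widths vary wildly, or the "rows" look like markup, the download is probably not the CSV you asked for.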

Answered by dartdog

This question of mine, "Getting Google Spreadsheet CSV into A Pandas Dataframe", shows how to do it.

As one of the commenters noted, you have not asked for the data in CSV format: you have the "edit" request at the end of the URL. You can use this code and see it work on the spreadsheet (which, by the way, needs to be public). It is possible to load private sheets as well, but that is another topic.

from StringIO import StringIO  # Python 2; on Python 3 this lives in io instead

import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

In [10]: df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   
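
On the point about asking for CSV from a public sheet: my assumption is that a URL that does not request CSV, or a sheet that is not public, comes back as an HTML page, which would explain a wildly inconsistent field count. A rough heuristic sketch for checking the payload before parsing; `looks_like_html` and both sample bodies are hypothetical:

```python
def looks_like_html(body):
    """Heuristic only: True if the payload starts like an HTML page rather than CSV."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    # Ignore a leading byte-order mark and whitespace, then look at the first characters.
    head = body.lstrip("\ufeff \t\r\n")[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Made-up payloads standing in for r.content
csv_body = b"City,region\nDothan,South_Central\n"
html_body = b"<!DOCTYPE html><html><head><title>Sign in</title></head></html>"

print(looks_like_html(csv_body))
print(looks_like_html(html_body))
```

In the code above this would be called on `r.content` before `pd.read_csv`.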

The new Google Spreadsheet URL format for getting the CSV output is:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

Well, they changed the URL format slightly again. Now you need:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=0 #for the 1st sheet
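
A small sketch for assembling that URL from its two variable parts, the sheet key and the gid. The function name `build_csv_export_url` is my own; the URL shape is simply the one shown above and may change again:

```python
from urllib.parse import urlencode


def build_csv_export_url(sheet_key, gid=0):
    """Build the CSV export URL in the format quoted above (gid=0 is the 1st sheet)."""
    query = urlencode({"format": "csv", "gid": gid})
    return "https://docs.google.com/spreadsheets/d/%s/export?%s" % (sheet_key, query)


url = build_csv_export_url("177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ")
print(url)
```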

I also found that Python 3 needs a slight revision to the above:

from io import StringIO 

and to get the file:

import pandas as pd
import requests

guid = 0  # for the 1st sheet
act = requests.get('https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=%s' % guid)
dataact = act.content.decode('utf-8')  # convert bytes to str for StringIO
# DataFrame.sort() was removed from pandas; sort_index() sorts by the index
actdf = pd.read_csv(StringIO(dataact), index_col=0, parse_dates=[0], thousands=',').sort_index()

actdf is now a full pandas dataframe with headers (column names)
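
The decode-and-parse part of the Python 3 steps can be tried without touching the network. A sketch under that assumption; `frame_from_csv_bytes` and the sample bytes are invented for illustration, and `sort_index()` is used because `DataFrame.sort()` no longer exists in current pandas:

```python
from io import StringIO

import pandas as pd


def frame_from_csv_bytes(raw, encoding="utf-8"):
    """Decode downloaded bytes and parse them the same way as the snippet above."""
    text = raw.decode(encoding)  # StringIO needs str, not bytes
    frame = pd.read_csv(StringIO(text), index_col=0, parse_dates=[0], thousands=",")
    return frame.sort_index()


# Made-up bytes standing in for act.content
raw = b'date,amount\n2020-02-01,"1,234"\n2020-01-01,5\n'
sample_df = frame_from_csv_bytes(raw)
print(sample_df)
```

With a real sheet, the bytes would come from `requests.get(...).content` as shown above.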

Answered by zhukovgreen

In Google Sheets, click File > Publish to web. Then select what you need to publish and choose .csv as the export format. You'll get a link something like: https://docs.google.com/spreadsheets/d/<your sheets key here>/pub?gid=1317664180&single=true&output=csv

Then simply:

import pandas as pd
pathtoCsv = r'https://docs.google.com/spreadsheets/d/<sheets key>/pub?gid=1317664180&single=true&output=csv'
dev = pd.read_csv(pathtoCsv)
print(dev)  # print() call form works on Python 3 as well
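
The docs.google.com links in this thread all carry a sheet key in the path and a gid in the query string. A hedged sketch for pulling both out of such a link; `sheet_key_and_gid` is a hypothetical helper, it only understands URLs shaped like the ones quoted here, and the `#gid=` fragment handling is my assumption about other link forms:

```python
import re
from urllib.parse import parse_qs, urlparse


def sheet_key_and_gid(url):
    """Return (sheet_key, gid) from a /spreadsheets/d/<key>/... URL; gid is None if absent."""
    parts = urlparse(url)
    match = re.search(r"/spreadsheets/d/([^/]+)", parts.path)
    if match is None:
        raise ValueError("no /spreadsheets/d/<key> segment in %r" % url)
    # gid normally sits in the query (?gid=0); a fragment form (#gid=0) is assumed here
    gid_values = parse_qs(parts.query).get("gid") or parse_qs(parts.fragment).get("gid")
    gid = gid_values[0] if gid_values else None
    return match.group(1), gid


published = "https://docs.google.com/spreadsheets/d/KEY/pub?gid=1317664180&single=true&output=csv"
print(sheet_key_and_gid(published))
```

Here `KEY` is just a stand-in for a real sheet key.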

Answered by FrankC

The current Google Drive URL to export as csv is:

https://drive.google.com/uc?export=download&id=EnterIDHere

So:

import pandas as pd
pathtocsv = r'https://drive.google.com/uc?export=download&id=EnterIDHere'
df = pd.read_csv(pathtocsv)