Python 将 Google 电子表格 CSV 导入 Pandas 数据框

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/19611729/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 14:10:35  来源:igfitidea点击:

Getting Google Spreadsheet CSV into A Pandas Dataframe

pythonpandasgoogle-sheetsgoogle-drive-apigoogle-apps

提问by dartdog

I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)

我将文件上传到 Google 电子表格(以制作一个可公开访问的示例 IPython Notebook,带有数据)我使用的文件以其原生形式可以读入 Pandas Dataframe。所以现在我使用下面的代码来读取电子表格,工作正常,但只是作为字符串输入,而且我没有任何运气试图将它恢复到数据帧中(您可以获取数据)

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

The data ends up looking like: (1st row headers)

数据最终看起来像:(第一行标题)

',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n

The native pandas code that brings in the disk resident file looks like:

引入磁盘驻留文件的原生 Pandas 代码如下所示:

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.

“干净”的解决方案将有助于许多人提供一种简单的方法来共享 Pandas 使用的数据集!我尝试了一堆替代方案但没有成功,而且我很确定我又错过了一些明显的东西。

Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:

只是更新说明 新的 Google 电子表格具有不同的 URL 模式只需使用它代替上面示例和/或以下答案中的 URL,您应该没问题,这是一个示例:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...

请参阅下面来自@Max Ghenis 的解决方案,它只使用了 pd.read_csv,不需要 StringIO 或请求...

采纳答案by TomAugspurger

You can use read_csv()on a StringIOobject:

您可以read_csv()StringIO对象上使用:

from io import BytesIO

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

    Inventory_exp  Price_exp  Credit_exp  
0               2          3           3  
10              4          4           3  
12              2          2           3  
38              3          3           2  
44              4          4           4  

回答by Max Ghenis

Seems to work for me without the StringIO:

似乎对我有用,没有StringIO

test = pd.read_csv('https://docs.google.com/spreadsheets/d/' + 
                   '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                   '/export?gid=0&format=csv',
                   # Set first column as rownames in data frame
                   index_col=0,
                   # Parse column values to datetime
                   parse_dates=['Quradate']
                  )
test.head(5)  # Same result as @TomAugspurger

BTW, including the ?gid=enables importing different sheets, find the gid in the URL.

顺便说一句,包括?gid=允许导入不同的工作表,在 URL 中找到 gid。

回答by Abhery Guha

My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!

我的方法有点不同。我只是使用了 pandas.Dataframe() 但显然需要安装和导入 gspread。它工作得很好!

gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())

回答by Gianmario Spacagna

I have been using the following utils and it worked so far:

我一直在使用以下实用程序,到目前为止它有效:

def load_from_gspreadsheet(sheet_name, key):
    url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
        key=key, sheet_name=sheet_name.replace(' ', '%20'))

    log.info('Loading google spreadsheet from {}'.format(url))

    df = pd.read_csv(url)
    return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)

You must specify the sheet_name and the key. The key is the string you get from the url in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/.

您必须指定 sheet_name 和密钥。关键是您从以下路径中的 url 获取的字符串:https://docs.google.com/spreadsheets/d/{key}/edit/.

You can change the value of headers if you have more than one row for the column names but I am not sure if it still work with multi-headers.

如果列名称有多于一行,您可以更改标题的值,但我不确定它是否仍然适用于多标题。

It may brake if Google will change their APIs.

如果谷歌改变他们的 API,它可能会刹车。

Also please bear in mind that your spreadsheet must be public, everyone with the link can read it.

另请记住,您的电子表格必须是公开的,每个知道链接的人都可以阅读。

回答by kaza

If the csv file was shared via drive and not via spreadsheet then the below change to the url would work

如果 csv 文件是通过驱动器共享的,而不是通过电子表格共享的,那么对 url 的以下更改将起作用

#Derive the id from the google drive shareable link.
#For the file at hand the link is as below
#<https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69>
file_id='1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
link='https://drive.google.com/uc?export=download&id={FILE_ID}'
csv_url=link.format(FILE_ID=file_id)
#The final url would be as below:-
#csv_url='https://drive.google.com/uc?export=download&id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
df = pd.read_csv(csv_url)

And the dataframe would be (if you just ran the above code)

数据框将是(如果你只是运行上面的代码)

    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9   10  11
3   12  13  14  15

See working code here.

请参阅此处的工作代码。

回答by Ken Arnold

Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

在浏览器中打开所需的特定工作表。确保它至少可供知道链接的任何人查看。复制并粘贴 URL。你会得到类似的东西https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER

sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'

First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

首先,我们将其转换为 CSV 导出 URL,例如https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER

csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

Then we pass it to pd.read_csv, which can take a URL.

然后我们将它传递给pd.read_csv,它可以接受一个 URL。

df = pd.read_csv(csv_export_url)

This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.

如果 Google 更改其 API(似乎未记录),这将中断,并且如果发生网络故障,可能会出现无益的错误。