pandas: downloading and accessing data from GitHub in Python
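Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow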
Original URL: http://stackoverflow.com/questions/23464138/
Question by user3314418
Hi, I'm going through Python for Data Analysis and I'd like to analyze the data the author works with in the book. In chapter 9, he uses the data below. However, I'm having a hard time understanding how to use the data in my IPython notebook once I download it to my GitHub application on Mac.
The stock data is here: https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv
I clicked "Open", which downloaded a large file into my GitHub application. How do I get this data to open in my IPython notebook?
Looking at other Stack Overflow questions, I know I can just download the zip file, which I'm doing as well. It would be cool to know how to use the GitHub application efficiently, though.
Right-clicking and saving the CSV file seems to save the JSON/HTML file instead.
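The GitHub application clones the whole repository to disk, so the file can also be read straight from that local copy with read_csv. A minimal sketch, assuming the clone keeps the ch09/stock_px.csv layout from the URL above; the clone location (~/Documents/GitHub/pydata-book) and the helper name load_stock_px are hypothetical, so adjust them to your machine:

```python
from pathlib import Path

import pandas as pd

# HYPOTHETICAL location of the local clone made by the GitHub application;
# change it to wherever the pydata-book repository actually lives on disk.
DEFAULT_REPO_DIR = Path.home() / "Documents" / "GitHub" / "pydata-book"


def load_stock_px(repo_dir=DEFAULT_REPO_DIR):
    """Read ch09/stock_px.csv from a local clone of the pydata-book repo."""
    csv_path = Path(repo_dir) / "ch09" / "stock_px.csv"
    # First column (the dates) becomes the index and is parsed as datetimes
    return pd.read_csv(csv_path, index_col=0, parse_dates=[0])
```

In a notebook cell, df = load_stock_px() (or load_stock_px('/path/to/pydata-book')) followed by df.head() should show the same rows as reading the file from GitHub.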


Answer by Karl D.
You should be able to just use the URL of the raw version (the link to the raw version is the "Raw" button on the page you linked) and then read it directly into a DataFrame using read_csv:
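import pandas as pd

# Raw-file URL (raw.githubusercontent.com), not the github.com HTML page
url = 'https://raw.githubusercontent.com/pydata/pydata-book/master/ch09/stock_px.csv'
# Use the first column as the index and parse it as dates
df = pd.read_csv(url, index_col=0, parse_dates=[0])
print(df.head(5))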
            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93
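The raw URL follows a predictable pattern, visible by comparing the two links in this thread: github.com/<owner>/<repo>/blob/<ref>/<path> becomes raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>. A small sketch of that conversion is below; github_blob_to_raw is a hypothetical helper (not part of pandas or any GitHub API), and it assumes the ref (here master) contains no slash, so a branch name such as feature/x would be split incorrectly.

```python
from urllib.parse import urlparse


def github_blob_to_raw(url):
    """Turn a github.com/<owner>/<repo>/blob/<ref>/<path> URL into the
    matching raw.githubusercontent.com URL (assumes <ref> has no '/')."""
    parts = urlparse(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner, repo, "blob", ref, then one or more path parts
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file ('blob') URL: {url}")
    owner, repo, _, ref = segments[:4]
    path = "/".join(segments[4:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"
```

For example, passing the stock_px.csv link from the question returns the same raw URL used in the code above, which can then go straight into pd.read_csv.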
Edit: a brief explanation of the options I used to read in the file:
df = pd.read_csv(url,index_col=0,parse_dates=[0])
The first column (column 0) in the file is a column of dates, and because it has no column name it looks like it was meant to be the index; index_col=0 makes it the index, and parse_dates=[0] tells read_csv to parse column 0 (the first column) as dates.
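To see what the two options change without downloading anything, the sketch below parses a few rows laid out like stock_px.csv from an in-memory string, once without and once with index_col/parse_dates. The sample rows are copied from the head() output above; the empty first header field is an assumption based on the explanation that the date column has no name.

```python
import io

import pandas as pd

# Two rows in the same shape as stock_px.csv (assumed layout: the first
# header field is empty because that column holds the dates).
SAMPLE = """,AAPL,MSFT,XOM,SPX
2003-01-02,7.40,21.11,29.22,909.03
2003-01-03,7.45,21.14,29.24,908.59
"""

# Without the options: the dates stay an ordinary column with a default index
plain = pd.read_csv(io.StringIO(SAMPLE))

# With the options: column 0 becomes the index and is parsed as dates
df = pd.read_csv(io.StringIO(SAMPLE), index_col=0, parse_dates=[0])

print(plain.columns.tolist())   # first entry is 'Unnamed: 0'
print(type(df.index).__name__)  # DatetimeIndex
```

With the DatetimeIndex in place, date-based operations such as slicing by date strings or resampling work directly on df.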

