pandas: downloading and accessing data from GitHub in Python

Question

Asked by user3314418

Hi, I'm working through Python for Data Analysis and I'd like to analyze the data the author uses in the book. In chapter 9, he uses the data below. However, I'm having a hard time understanding how to use the data in my IPython notebook once I download it with the GitHub application on my Mac.

The stock data is here: https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv

I clicked "Open", which downloaded a large file in my GitHub application. It looks like the image below. How do I open this data in my IPython notebook?

Looking at other Stack Overflow questions, I know I can just download the zip file, which I am doing as well. Still, it would be nice to know how to use the GitHub application efficiently.

Right-clicking and saving the CSV file seems to save a JSON/HTML file instead.

[Screenshot of the downloaded file; image not included]

Answer 1

Answered by Karl D.

You should be able to just use the URL of the raw version (the page you linked has a button that leads to the raw version) and then read it directly into a DataFrame using read_csv:

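import pandas as pd

# Raw-file URL, not the github.com/.../blob/... page URL
url = 'https://raw.githubusercontent.com/pydata/pydata-book/master/ch09/stock_px.csv'
# Use the first column as the index and parse it as dates
df = pd.read_csv(url, index_col=0, parse_dates=[0])

print(df.head(5))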

            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93
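
The raw URL above differs from the page link in the question only in its host and in the missing "/blob" segment, which is probably why saving the page link produced HTML rather than CSV. As a rough sketch, that rewrite can be done in code. The function name blob_to_raw_url is hypothetical (it is not part of pandas or any GitHub tool), and it assumes the link has the github.com/<owner>/<repo>/blob/<branch>/<path> shape seen in the question:

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw_url(blob_url):
    """Hypothetical helper: map a github.com '/blob/' page URL to its raw-file URL."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip('/').split('/')
    # Assumed layout: <owner>/<repo>/blob/<branch>/<path...>
    if parts.netloc != 'github.com' or len(segments) < 5 or segments[2] != 'blob':
        raise ValueError('not a GitHub blob URL: %r' % blob_url)
    owner, repo, _, *rest = segments
    raw_path = '/' + '/'.join([owner, repo] + rest)
    return urlunsplit(('https', 'raw.githubusercontent.com', raw_path, '', ''))


page_url = 'https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv'
raw_url = blob_to_raw_url(page_url)
print(raw_url)
```

The resulting string can be passed to pd.read_csv in place of the hard-coded url.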

Edit: a brief explanation about the options I used to read in the file:

df = pd.read_csv(url,index_col=0,parse_dates=[0])

The first column (column 0) in the file is a column of dates, and because it has no column name it looked like it was meant to be the index. index_col=0 makes it the index, and parse_dates=[0] tells read_csv to parse column 0 (the first column) as dates.
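
To see what the two options change without downloading anything, here is a small sketch that reads a few rows from an in-memory string. The sample text is an assumption: it is reconstructed from the printed output above (with an empty first header field for the unnamed date column), not copied from the actual file.

```python
import io

import pandas as pd

# Assumed sample: three rows rebuilt from the output shown above.
# The header's first field is empty, like the unnamed date column.
sample = io.StringIO(
    ",AAPL,MSFT,XOM,SPX\n"
    "2003-01-02,7.40,21.11,29.22,909.03\n"
    "2003-01-03,7.45,21.14,29.24,908.59\n"
    "2003-01-06,7.45,21.52,29.96,929.01\n"
)

# Without the options, the dates are an ordinary column with a generated name.
plain = pd.read_csv(sample)

# With the options, the dates become a DatetimeIndex.
sample.seek(0)
dated = pd.read_csv(sample, index_col=0, parse_dates=[0])

print(plain.columns[0])
print(type(dated.index).__name__)
print(dated.loc[pd.Timestamp('2003-01-03'), 'AAPL'])
```

With a date index in place, rows can be looked up by timestamp, as the last line does.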
