Python: How to read a csv into a dataframe in Google Colab

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/48340341/

Time: 2020-08-19 18:39:26  Source: igfitidea

How to read csv to dataframe in Google Colab

python csv dataframe google-colaboratory

Asked by PagMax

I am trying to read a csv file that is stored locally on my machine. (For reference, it is the Titanic data from Kaggle.)

From this question and its answers I learnt that you can import data using this code, which works well for me.

from google.colab import files
uploaded = files.upload()

Where I am lost is how to convert it to a dataframe from here. The sample Google notebook page listed in the answer above does not talk about it.

I am trying to convert the dictionary uploaded to a dataframe using the from_dict command, but I am not able to make it work. There is some discussion on converting a dict to a dataframe here, but the solutions are not applicable to me (I think).

So summarizing, my question is:

How do I convert a csv file stored locally on my machine into a pandas dataframe in Google Colaboratory?

Answer by Bob Smith

Pandas read_csv should do the trick. You'll want to decode your uploaded bytes and wrap the resulting string in an io.StringIO, since read_csv expects a path or a file-like object.

Here's a full example: https://colab.research.google.com/notebook#fileId=1JmwtF5OmSghC-y3-BkvxLan0zYXqCJJf

The key snippet is:

import pandas as pd
import io

df = pd.read_csv(io.StringIO(uploaded['train.csv'].decode('utf-8')))
df
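
As a variation on the snippet above, the raw bytes can also be handed to read_csv through io.BytesIO, which leaves the decoding to pandas. The sketch below is illustrative only: outside Colab there is no files.upload(), so the uploaded dict is simulated by hand, and the file contents are made up.

```python
import io

import pandas as pd

# Stand-in for the dict returned by files.upload() in Colab:
# keys are file names, values are the raw file contents as bytes.
# The contents here are made up for illustration.
uploaded = {
    "train.csv": b"PassengerId,Survived,Name\n1,0,Braund\n2,1,Cumings\n",
}

# BytesIO wraps the bytes in a file-like object; pandas decodes them
# (UTF-8 by default, or pass encoding=... for other encodings).
df = pd.read_csv(io.BytesIO(uploaded["train.csv"]))
print(df.shape)
```

Either wrapper works; StringIO needs the explicit decode step, while BytesIO lets you pass the encoding to read_csv instead.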

Answer by Garima Jain

Step 1 - Mount your Google Drive in Colaboratory

from google.colab import drive 
drive.mount('/content/gdrive')

Step 2 - You will now see your Google Drive files in the left pane (file explorer). Right-click the file that you need to import and select Copy path. Then import as usual in pandas, using this copied path.

import pandas as pd
# Absolute path under the mount point used in step 1
df = pd.read_csv('/content/gdrive/My Drive/data.csv')

Done!
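
A mistyped or stale copied path is a common stumbling block with this approach. The sketch below is one hedged way to fail early with a clearer message; the helper name read_drive_csv is hypothetical and not part of Colab or pandas.

```python
from pathlib import Path

import pandas as pd


def read_drive_csv(path):
    """Read a CSV from a mounted Drive path, failing early with a clear message."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found; check that Drive is mounted "
            "and that the copied path is correct"
        )
    return pd.read_csv(csv_path)


# In Colab, after drive.mount('/content/gdrive'), the call might look like
# this (data.csv is the example file name used above):
# df = read_drive_csv('/content/gdrive/My Drive/data.csv')
```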

Answer by Yasser Mustafa

Colab google: uploading csv from your PC

I had the same problem with an Excel file (*.xlsx). I solved it as follows, and I think you could do the same with csv files. If you have a file on your PC drive called file.xlsx, then:

1- Upload it from your hard drive by using this simple code:

from google.colab import files
uploaded = files.upload()

Click (Choose Files) and select the file. It is uploaded to your Colab session (not to your Google Drive).

2- Then:

import io
# The key must match the uploaded file name exactly, including case
data = io.BytesIO(uploaded['file.xlsx'])

3- Finally, read your file:

import pandas as pd
# Adjust sheet_name, header, and skiprows to match your own workbook
df = pd.read_excel(data, sheet_name='1min', header=0, skiprows=2)
df.head()

4- Change the parameter values to suit your own file. I think this could be generalized to read other types of files!
Enjoy it!
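
One way the generalization mentioned above might look is to pick the pandas reader from the file extension. This is only a sketch under assumptions: the READERS mapping and the load_uploaded helper are hypothetical names, and the upload is simulated with made-up bytes so that it runs outside Colab.

```python
import io
from pathlib import Path

import pandas as pd

# Map file extensions to pandas readers. This mapping is an illustrative
# choice, not part of the answer above; extend it as needed.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs an Excel engine such as openpyxl installed
}


def load_uploaded(uploaded, **reader_kwargs):
    """Turn a files.upload()-style dict {name: bytes} into {name: DataFrame}."""
    frames = {}
    for name, content in uploaded.items():
        suffix = Path(name).suffix.lower()  # also matches 'file.XLSX'
        if suffix not in READERS:
            raise ValueError(f"No reader configured for {name!r}")
        frames[name] = READERS[suffix](io.BytesIO(content), **reader_kwargs)
    return frames


# Simulated upload (made-up contents) so the sketch runs outside Colab.
frames = load_uploaded({"prices.CSV": b"time,price\n09:30,101.5\n09:31,101.7\n"})
print(frames["prices.CSV"].head())
```

Keyword arguments such as sheet_name or skiprows are passed through to the chosen reader, so they should only be supplied when every uploaded file accepts them.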

Answer by JARS

This worked for me:

from google.colab import auth
auth.authenticate_user()

from pydrive.drive import GoogleDrive
from pydrive.auth import GoogleAuth
from oauth2client.client import GoogleCredentials
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

myfile = drive.CreateFile({'id': '!!!YOUR FILE ID!!!'})
myfile.GetContentFile('file.csv')

Replace !!!YOUR FILE ID!!! with the id of the file in Google Drive (this is the long alphanumeric string that appears when you click on "obtain link to share"). Then you can access file.csv with pandas' read_csv:

import pandas as pd
frm = pd.read_csv('file.csv', header=None)
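
If only the sharing link is at hand, the file id can be pulled out of it programmatically. The helper below is a hypothetical sketch that assumes the link has one of two common shapes (.../file/d/<ID>/... or ...?id=<ID>); other link formats are not handled.

```python
import re


def extract_drive_file_id(link):
    """Return the file ID embedded in a Google Drive sharing link, or None."""
    # Matches .../file/d/<ID>/... as well as ...?id=<ID>
    match = re.search(r"/d/([\w-]+)|[?&]id=([\w-]+)", link)
    if match is None:
        return None
    return match.group(1) or match.group(2)


file_id = extract_drive_file_id(
    "https://drive.google.com/file/d/1D6ViUx8_ledfBqcxHCrFPcqBvNZitwCs/view?usp=sharing"
)
print(file_id)
```

The returned string is what would go in place of !!!YOUR FILE ID!!! in the CreateFile call.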

Answer by Diwakar

Alternatively, you can also use GitHub to import files. You can take this as an example: https://drive.google.com/file/d/1D6ViUx8_ledfBqcxHCrFPcqBvNZitwCs/view?usp=sharing

Also, Google does not persist the file for long, so you may have to run the GitHub snippets again each time.
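
The linked example is not reproduced here, so the following is only an assumed approach: read_csv accepts a URL, and a GitHub "blob" page URL can be rewritten into its raw-file form first. The helper name github_blob_to_raw and the repository in the example are hypothetical, and branch names containing slashes are not handled.

```python
from urllib.parse import urlparse


def github_blob_to_raw(url):
    """Convert a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <user>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {url}")
    user, repo, _, branch, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *file_path])


# Hypothetical repository and file, for illustration only.
raw_url = github_blob_to_raw("https://github.com/some-user/some-repo/blob/main/data/train.csv")
print(raw_url)
# In a notebook, this URL could then be passed straight to pandas:
# df = pd.read_csv(raw_url)
```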
