Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/14120502/

Date: 2020-08-18 10:32:20  Source: igfitidea

How to download and write a file from GitHub using Requests

Tags: python, github, python-requests

Asked by Fomite

Let's say there's a file that lives in a GitHub repo:

https://github.com/someguy/brilliant/blob/master/somefile.txt

I'm trying to use requests to fetch this file and write its content to disk in the current working directory, where it can be used later. Right now, I'm using the following code:

import requests
from os import getcwd

url = "https://github.com/someguy/brilliant/blob/master/somefile.txt"
directory = getcwd()
filename = directory + 'somefile.txt'
r = requests.get(url)

f = open(filename,'w')
f.write(r.content)

Undoubtedly ugly, and more importantly, not working. Instead of the expected text, I get:

<!DOCTYPE html>
<!--

Hello future GitHubber! I bet you're here to remove those nasty inline styles,
DRY up these templates and make 'em nice and re-usable, right?

Please, don't. https://github.com/styleguide/templates/2.0

-->
<html>
  <head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
    <title>Page not found &middot; GitHub</title>
    <style type="text/css" media="screen">
      body {
        background: #f1f1f1;
        font-family: "HelveticaNeue", Helvetica, Arial, sans-serif;
        text-rendering: optimizeLegibility;
        margin: 0; }

      .container { margin: 50px auto 40px auto; width: 600px; text-align: center; }

      a { color: #4183c4; text-decoration: none; }
      a:visited { color: #4183c4 }
      a:hover { text-decoration: none; }

      h1 { letter-spacing: -1px; line-height: 60px; font-size: 60px; font-weight: 100; margin: 0px; text-shadow: 0 1px 0 #fff; }
      p { color: rgba(0, 0, 0, 0.5); margin: 20px 0 40px; }

      ul { list-style: none; margin: 25px 0; padding: 0; }
      li { display: table-cell; font-weight: bold; width: 1%; }
      #error-suggestions { font-size: 14px; }
      #next-steps { margin: 25px 0 50px 0;}
      #next-steps li { display: block; width: 100%; text-align: center; padding: 5px 0; font-weight: normal; color: rgba(0, 0, 0, 0.5); }
      #next-steps a { font-weight: bold; }
      .divider { border-top: 1px solid #d5d5d5; border-bottom: 1px solid #fafafa;}

      #parallax_wrapper {
        position: relative;
        z-index: 0;
      }
      #parallax_field {
        overflow: hidden;
        position: absolute;
        left: 0;
        top: 0;
        height: 370px;
        width: 100%;
      }

etc etc.

Content from GitHub, but not the content of the file. What am I doing wrong?

Accepted answer by Martijn Pieters

The content of the file in question is included in the returned data. You are getting the full GitHub view of that file, not just the contents.

If you want to download just the file, you need to use the Raw link at the top of the page, which will be (for your example):

https://raw.github.com/someguy/brilliant/master/somefile.txt

Note the change in domain name, and that the blob/ part of the path is gone.
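
The rewrite described here (swap the host, drop the blob/ segment) can be sketched as a small helper. This is only an illustration: the function name blob_to_raw and the raw_host parameter are made up for this example, and the sketch assumes the branch name contains no slashes. The host is left as a parameter because GitHub may serve raw files from a different host name than the one quoted in this answer.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url, raw_host="raw.github.com"):
    """Turn a github.com 'blob' page URL into the matching raw-file URL.

    Hypothetical helper: raw_host defaults to the host quoted in the answer.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    # Expected layout: owner / repo / "blob" / branch / path-to-file...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Keep owner and repo, skip "blob", keep branch and file path.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit((parts.scheme, raw_host, raw_path, "", ""))
```

For the URL in the question, blob_to_raw("https://github.com/someguy/brilliant/blob/master/somefile.txt") yields the raw URL shown in this answer.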

To demonstrate this with the requests GitHub repository itself:

>>> import requests
>>> r = requests.get('https://github.com/kennethreitz/requests/blob/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r = requests.get('https://raw.github.com/kennethreitz/requests/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/plain; charset=utf-8'
>>> print r.text
Requests: HTTP for Humans
=========================


.. image:: https://travis-ci.org/kennethreitz/requests.png?branch=master
[... etc. ...]
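
Putting the pieces together, the download-and-save step from the question might look like the sketch below. It is one possible arrangement, not the asker's code: download_to_path is a hypothetical name, and the session argument is assumed to be a requests.Session (or anything with a compatible get method). It raises on an HTTP error status instead of writing an error page to disk, and opens the file in binary mode because the response body arrives as bytes.

```python
def download_to_path(session, url, dest_path, timeout=30):
    """Fetch url with a requests-style session and write the body to dest_path.

    Hypothetical helper; session is assumed to behave like requests.Session.
    """
    response = session.get(url, timeout=timeout)
    # Raise on 4xx/5xx rather than saving an HTML error page.
    response.raise_for_status()
    # response.content is bytes, so the file is opened in binary mode.
    with open(dest_path, "wb") as f:
        f.write(response.content)
    return dest_path
```

With requests installed, a call could look like download_to_path(requests.Session(), "https://raw.github.com/someguy/brilliant/master/somefile.txt", "somefile.txt").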

Answer by Burhan Khalid

You need to request the raw version of the file, from https://raw.github.com.

See the difference:

https://raw.github.com/django/django/master/setup.py vs. https://github.com/django/django/blob/master/setup.py

Also, you should probably add a / between your directory and the filename:

>>> getcwd()+'foo.txt'
'/Users/burhanfoo.txt'
>>> import os
>>> os.path.join(getcwd(),'foo.txt')
'/Users/burhan/foo.txt'
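
As an alternative to os.path.join, the same path can be built with pathlib, which inserts the separator for you. A minimal sketch, with naive_concat and in_directory as made-up names for illustration:

```python
from pathlib import Path


def naive_concat(directory, filename):
    """Plain string concatenation: no separator is inserted."""
    return directory + filename


def in_directory(directory, filename):
    """Join with pathlib, which adds the platform's path separator."""
    return Path(directory) / filename
```

For the file in the question, in_directory(Path.cwd(), "somefile.txt") gives a path inside the current working directory.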