How do I read the contents of a URL with Python?

Question

Asked by Helen Neely

The following works when I paste it into the browser:

http://www.somesite.com/details.pl?urn=2344

But when I try reading the URL with Python, nothing happens:

 link = 'http://www.somesite.com/details.pl?urn=2344'
 f = urllib.urlopen(link)           
 myfile = f.readline()  
 print myfile

Do I need to encode the URL, or is there something I'm not seeing?

Answer 1

Accepted answer by woozyking

To answer your question:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)

You need to call read(), not readline(). readline() returns only the first line of the response, while read() returns the whole body.
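
To make the difference concrete, here is a minimal offline sketch. It assumes, purely for illustration, that the server's response starts with a blank line, which is one plausible reason the original snippet appeared to print nothing. io.BytesIO stands in for the file-like object returned by urlopen(), and the response body is made up.

```python
import io

# Hypothetical response body: note the blank first line.
# io.BytesIO stands in for the file-like object returned by urlopen().
response = io.BytesIO(b"\n<html>\n<body>hello</body>\n</html>\n")

first_line = response.readline()  # stops at the first newline
response.seek(0)                  # rewind the stand-in (a real HTTP response cannot be rewound)
whole_body = response.read()      # reads everything up to the end of the data

print(repr(first_line))
print(len(whole_body.splitlines()), "lines in total")
```

With a real response you would call read() once and keep the result, since the stream can only be consumed a single time.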

EDIT (2018-06-25): Since Python 3, the legacy urllib.urlopen() was replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you're using Python 3, see the answers by Martin Thoma or i.n.n.m within this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compat) https://stackoverflow.com/a/45886824/158111 (Python 3)

Or, just get this library: http://docs.python-requests.org/en/latest/ and seriously use it :)

import requests

link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)
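
On the "do I need to encode the URL" part of the question: the URL as given contains only characters that are safe in a query string, so encoding is probably not the issue here. For values that do contain spaces or reserved characters, a sketch along these lines (Python 3, standard library only) may help. The "title" parameter is a made-up example and not part of the original site.

```python
from urllib.parse import urlencode

base = "http://www.somesite.com/details.pl"

# Plain digits need no escaping, so this reproduces the URL from the question.
link = base + "?" + urlencode({"urn": 2344})
print(link)

# Hypothetical value containing spaces and an ampersand: both get escaped.
escaped = base + "?" + urlencode({"title": "cats & dogs"})
print(escaped)
```

urlencode() applies quote_plus() to each key and value, so spaces become "+" and "&" becomes "%26".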

Answer 2

Answer by ATOzTOA

The URL should be a string:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)           
myfile = f.readline()  
print myfile

Answer 3

Answer by Martin Thoma

A solution that works with both Python 2.X and Python 3.X makes use of the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)

Answer 4

Answer by Giorgio Giuliani

I used the following code:

import urllib

def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()

Answer 5

Answer by i.n.n.m

For Python 3 users, to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()
print(myfile)

I know there are other threads about the error NameError: name 'urlopen' is not defined, but I thought this might save time.
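
As a small illustration of where that error comes from: in Python 3 the top-level urllib package no longer has a urlopen attribute, so Python 2 style code fails until the function is imported from urllib.request. This sketch assumes Python 3 and does no network access.

```python
import urllib
from urllib.request import urlopen

# On Python 3 the package itself does not expose urlopen ...
has_legacy_urlopen = hasattr(urllib, "urlopen")
print(has_legacy_urlopen)

# ... but the name imported from urllib.request is a normal callable.
print(callable(urlopen))
```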

Answer 6

Answer by Akash K

We can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)

Answer 7

Answer by Jared

None of these answers are very good for Python 3 (tested on latest version at the time of this post).

This is how you do it...

import urllib.request

def print_some_url():
    with urllib.request.urlopen('http://mywebsiteurl') as f:
        print(f.read().decode('utf-8'))

The above is for content encoded as UTF-8. If you remove .decode('utf-8'), read() returns raw bytes rather than text; Python does not guess the encoding for you.

Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request
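
If the encoding is not known in advance, one option is to use the charset the server declares in its Content-Type header. The following is a sketch under assumptions, not a definitive implementation: fetch_text and its fallback_encoding parameter are names invented here, and falling back to UTF-8 when no charset is declared is a guess that will not suit every site.

```python
from urllib.request import urlopen

def fetch_text(url, fallback_encoding="utf-8", timeout=10):
    """Return the body at `url` decoded to str (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
    # Assumption: use the fallback when the server declares no charset.
    return raw.decode(charset or fallback_encoding)
```

Usage would look like print(fetch_text('http://mywebsiteurl')). Pages that declare their encoding only inside the HTML are not handled by this sketch.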

Answer 8

Answer by ARVIND CHAUHAN

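#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2.
# Use this when the server accepts the default urllib User-Agent.

import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is needed because the object returned by Python 2's
# urlopen() cannot be used directly in a with statement.
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()

print(data)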

# Use this when the server rejects the default User-Agent:
# send a browser-like User-Agent header instead.
# Works on Python 3.

import urllib.request

user_agent = \
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
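
Requests like the one above can still fail, for example with a 403 or because the host is unreachable. A hedged sketch of the same idea with basic error handling follows. The function name fetch_bytes, the short default User-Agent, and the choice to return None on failure are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

def fetch_bytes(url, user_agent="Mozilla/5.0", timeout=10):
    """Return the raw body at `url`, or None if the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        print("HTTP error:", err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable answer: DNS failure, refused connection, unknown scheme.
        print("Could not fetch URL:", err.reason)
    return None
```

HTTPError is caught first because it is a subclass of URLError. A call such as fetch_bytes('https://www.facebook.com/') would then return bytes on success and None otherwise.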

Answer 9

Answer by ksono

# retrieving data from url
# only for python 3

import urllib.request

def main():
  url = "http://docs.python.org"

  # retrieving data from URL
  webUrl = urllib.request.urlopen(url)
  print("Result code: " + str(webUrl.getcode()))

  # print data from URL
  print("Returned data: -----------------")
  data = webUrl.read().decode("utf-8")
  print(data)

if __name__ == "__main__":
  main()

Answer 10

Answer by u13553792

from urllib.request import urlopen

# If the page contains Chinese (or other non-ASCII) text, call decode() to get readable text
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)
