Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/15138614/
How can I read the contents of a URL with Python?
Asked by Helen Neely
The following works when I paste it on the browser:
http://www.somesite.com/details.pl?urn=2344
But when I try reading the URL with Python nothing happens:
link = 'http://www.somesite.com/details.pl?urn=2344'
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
Do I need to encode the URL, or is there something I'm not seeing?
Accepted answer by woozyking
To answer your question:
import urllib
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)
You need read(), not readline(). readline() returns only the first line of the response, which can be blank, while read() returns the whole body.
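To see why readline() can appear to return nothing, here is a small sketch that uses an in-memory stream (io.BytesIO) as a stand-in for the HTTP response. The sample body and the helper names first_line and whole_body are made up for illustration; a real response object from urlopen() exposes the same read() and readline() methods.

```python
import io

# Hypothetical response body: a blank first line followed by the real content.
SAMPLE_BODY = b"\n<html>\n<body>details for urn 2344</body>\n</html>\n"


def first_line(stream):
    """Return only the first line, as readline() does."""
    return stream.readline()


def whole_body(stream):
    """Return everything left in the stream, as read() does."""
    return stream.read()


# readline() stops at the first newline, so only the blank line comes back.
print(repr(first_line(io.BytesIO(SAMPLE_BODY))))
# read() returns the complete body.
print(len(whole_body(io.BytesIO(SAMPLE_BODY))))
```

If the page you are fetching starts with an empty line, printing the result of readline() shows what looks like no output at all.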
EDIT (2018-06-25): Since Python 3, the legacy urllib.urlopen() was replaced by urllib.request.urlopen() (see notes from https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).
If you're using Python 3, see the answers by Martin Thoma or i.n.n.m within this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compat) and https://stackoverflow.com/a/45886824/158111 (Python 3).
Or, just get this library here: http://docs.python-requests.org/en/latest/ and seriously use it :)
import requests
link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)
Answer by ATOzTOA
The URL should be a string:
import urllib
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
Answer by Martin Thoma
A solution that works with both Python 2.X and Python 3.X uses the Python 2 and 3 compatibility library six:
from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)
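If installing six is not an option, a standard-library-only alternative is to fall back on the Python 2 import when the Python 3 one is missing. This is a different technique from the one above, sketched under the assumption that only urlopen() is needed. The helper name read_url is hypothetical, and the data: URL in the usage line is just a placeholder that avoids a network call when run on Python 3.

```python
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib import urlopen


def read_url(link):
    """Return the raw body of `link` (bytes on Python 3)."""
    response = urlopen(link)
    try:
        return response.read()
    finally:
        # Close explicitly instead of using `with`, which Python 2's
        # urlopen() result does not support.
        response.close()


# A data: URL keeps the demo offline; swap in a real http(s) link as needed.
print(read_url("data:text/plain,hello"))
```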
Answer by Giorgio Giuliani
I used the following code:
import urllib

def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()
Answer by i.n.n.m
For Python 3 users, to save time, use the following code:
from urllib.request import urlopen
link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"
f = urlopen(link)
myfile = f.read()
print(myfile)
I know there are other threads about the error "NameError: urlopen is not defined", but I thought this might save time.
Answer by Akash K
We can read a website's HTML content as follows:
from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)
Answer by Jared
None of these answers are very good for Python 3 (tested on latest version at the time of this post).
This is how you do it...
import urllib.request

def print_some_url():
    with urllib.request.urlopen('http://mywebsiteurl') as f:
        print(f.read().decode('utf-8'))
The above is for content encoded as UTF-8. If you remove .decode('utf-8'), f.read() returns raw bytes rather than text; Python does not guess the encoding for you.
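When the encoding is not known in advance, one option is to use the charset the server declares in its Content-Type header and fall back to UTF-8 otherwise. The following is a minimal sketch of that idea for Python 3; the helper name fetch_text and the fallback choice are assumptions, and the data: URL is only an offline stand-in for a real page.

```python
from urllib.request import urlopen


def fetch_text(link, fallback="utf-8"):
    """Fetch `link` and decode the body using the declared charset."""
    with urlopen(link) as response:
        raw = response.read()
        # get_content_charset() returns None when no charset is declared.
        charset = response.headers.get_content_charset() or fallback
    return raw.decode(charset)


# Offline demo: a data: URL that declares its own charset.
print(fetch_text("data:text/plain;charset=utf-8,hello"))
```

A declared charset can still be wrong, so treat this as a reasonable default rather than a guarantee.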
Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request
Answer by ARVIND CHAUHAN
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Case 1: the server accepts the default User-Agent.
# Works on Python 3 and Python 2.
import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is needed because Python 2's urlopen() result is not a context manager.
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()
print(data)

# Case 2: the server rejects the default User-Agent, so send a browser-like one.
# Works on Python 3.
import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}
request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
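To check which headers a request will carry before sending it, you can build the urllib.request.Request object and inspect it. This is a small Python 3 sketch; the helper name build_request and the USER_AGENT value are hypothetical, and nothing is sent over the network until the request is passed to urlopen().

```python
import urllib.request

# Hypothetical identifier; use whatever string suits your client.
USER_AGENT = "my-script/0.1"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://www.facebook.com/")
# Request normalizes header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
# Pass the request to urllib.request.urlopen(request) to actually send it.
```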
Answer by ksono
# retrieving data from url
# only for python 3
import urllib.request

def main():
    url = "http://docs.python.org"
    # retrieving data from URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))
    # print data from URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)

if __name__ == "__main__":
    main()
Answer by u13553792
from urllib.request import urlopen
# if the page contains Chinese text, apply decode()
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)

