How to use Python requests to fake a browser visit?

Question

Asked by user1726366

I want to get the content from the website below. If I use a browser like Firefox or Chrome, I get the real page I want, but if I use the Python requests package (or the wget command) to fetch it, it returns a totally different HTML page. I assume the developer of the website has put some blocks in place for this, so the question is:

How do I fake a browser visit using Python requests or the wget command?

http://www.ichangtou.com/#company:data_000008.html

Answer 1

Accepted answer by alecxe

Provide a User-Agent header:

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)

FYI, here is a list of User-Agent strings for different browsers:

List of all Browsers

As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

fake-useragent
Up to date simple useragent faker with real world database

Demo:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'
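
If adding a dependency is not an option, a similar effect can be approximated with a small hand-maintained pool of User-Agent strings. The sketch below is only an illustration: `USER_AGENTS` and `random_headers` are hypothetical names, and the strings are the ones quoted in this answer, which are dated and would need refreshing in practice.

```python
import random

# Hypothetical pool; the strings are copied from the examples in this answer.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
]


def random_headers(pool=USER_AGENTS):
    """Return a headers dict with one User-Agent picked at random from pool."""
    return {"User-Agent": random.choice(pool)}


headers = random_headers()
```

The resulting dict can be passed as `headers=` to `requests.get`, in the same way as the first snippet in this answer.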

Answer 2

Answer by Gilles Quenot

Try this, using Firefox as the fake user agent (moreover, it is a good starter script for web scraping with cookies):

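#!/usr/bin/env python2
# -*- coding: utf8 -*-
# vim:ts=4:sw=4

import cookielib, urllib2, sys

def doIt(uri):
    # Keep cookies between requests made through this opener.
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # Set the User-Agent on the opener before opening the page;
    # assigning headers to the response afterwards has no effect.
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    page = opener.open(uri)
    print page.read()

for i in sys.argv[1:]:
    doIt(i)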

USAGE:

python script.py "http://www.ichangtou.com/#company:data_000008.html"
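
The script above targets Python 2 (`cookielib`, `urllib2`). For Python 3, a rough equivalent can be sketched with the standard-library modules `http.cookiejar` and `urllib.request`. The helper names `make_opener` and `fetch` are made up for this example, and decoding the body as UTF-8 is an assumption that will not hold for every site.

```python
#!/usr/bin/env python3
import sys
import urllib.request
from http.cookiejar import CookieJar


def make_opener(user_agent="Mozilla/5.0"):
    """Create an opener that stores cookies and sends the given User-Agent."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Headers are set on the opener so that every request made through it
    # carries them.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(uri):
    """Fetch uri and return the body as text (UTF-8 assumed)."""
    opener, _jar = make_opener()
    with opener.open(uri) as page:
        return page.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Each command-line argument is treated as a URL to download.
    for uri in sys.argv[1:]:
        print(fetch(uri))
```

It is invoked the same way as the original script, with the interpreter swapped for `python3`.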

Answer 3

Answer by Umesh Kaushik

If this question is still relevant: I used fake-useragent.

How to use:

from fake_useragent import UserAgent
import requests


ua = UserAgent()
print(ua.chrome)
header = {'User-Agent':str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)

Output:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>

Answer 4

Answer by Daniel Butler

The root of the answer is that the asker needs a JavaScript interpreter to get what they are after. What I have found is that I can often get all of the information I want from a website as JSON, before it is rendered by JavaScript. This has saved me a ton of time that would otherwise go into parsing HTML and hoping every page follows the same format.

So when you get a response from a website using requests, look closely at the HTML/text, because you might find the JSON used by the page's JavaScript in the footer, ready to be parsed.
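
As a minimal sketch of that idea, the snippet below pulls a JSON object out of an inline script tag. Everything site-specific here is invented for illustration: the sample HTML, the `window.__DATA__` variable name, and the `extract_embedded_json` helper. Real pages embed their data differently, and a regular expression like this one is fragile, so treat it as a starting point only.

```python
import json
import re

# Hypothetical response body; in practice this would be response.text.
html = """
<html><body>
<div id="app"></div>
<script>window.__DATA__ = {"company": "000008", "prices": [1.5, 2.0]};</script>
</body></html>
"""


def extract_embedded_json(page, var_name="window.__DATA__"):
    """Return the object assigned to var_name in an inline script, or None."""
    # Capture the shortest {...} that is followed by ';' and the closing tag.
    pattern = re.escape(var_name) + r"\s*=\s*(\{.*?\});\s*</script>"
    match = re.search(pattern, page, re.DOTALL)
    return json.loads(match.group(1)) if match else None


data = extract_embedded_json(html)
print(data["prices"])
```

Once the data is a plain Python dict, no HTML parsing is needed to read individual fields.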
