Notice: This page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/15798878/

Date: 2020-08-18 21:02:23  Source: igfitidea

How to search internet with Python?

python, search

Question

I want to write a program that searches through a fairly large website and extracts certain things. I've taken a couple of online Python courses, but neither covered how to access the internet with Python. I have no idea where to start.

Accepted answer by Chakib

First, read about the standard Python library urllib2 (in Python 3, its functionality lives in urllib.request).
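
As a rough illustration of what fetching a page with the standard library looks like, here is a minimal sketch for Python 3, where urllib2's role is filled by urllib.request. The helper name fetch_page and the 10-second timeout are assumptions made for this example, not part of the original answer.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Download a URL and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_page("http://example.com/") would return that page's HTML as a string, which can then be searched or parsed.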

Once you are comfortable with the basic ideas behind this library, you can try requests, which makes interacting with the web, especially APIs, much easier. I suggest using it alongside httpie to test queries quickly from the command line.

If you go a little further and build a library or an engine to crawl the web, you will need some sort of asynchronous programming. I recommend starting with Gevent.
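
To show why this matters for a crawler: most of the time is spent waiting on the network, so overlapping several downloads helps. The sketch below does not use Gevent; it swaps in a thread pool from the standard library to illustrate the same idea without extra dependencies. The names fetch_size and fetch_all and the worker count are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch_size(url, timeout=10):
    """Download one URL and return (url, number of bytes received)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return url, len(response.read())


def fetch_all(urls, max_workers=8):
    """Fetch several URLs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch_size in worker threads, so the network
        # waits overlap instead of happening one after another.
        return list(pool.map(fetch_size, urls))
```

Gevent reaches a similar result with lightweight greenlets rather than operating-system threads, which tends to scale better once there are very many simultaneous connections.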

Finally, if you want to create a crawler/bot, you can take a look at Scrapy. You should, however, start with the basic libraries before diving into this one, as it can get quite complex.

Answer by actkatiemacias

It sounds like you want a web crawler/scraper. What sorts of things do you want to pull? Images? Links? Just the job for a web crawler/scraper.

Start there; there should be lots of articles on Stack Overflow that will help you implement details such as connecting to the internet (getting a web response).

See this article.
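
For the links and images mentioned above, the extraction step of a scraper might look roughly like the following sketch, which uses only the standard library's html.parser. The class and function names are made up for illustration, and a real scraper would probably use a more forgiving third-party parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collect absolute URLs of <a href> links and <img src> images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.images.append(urljoin(self.base_url, attributes["src"]))


def extract_links_and_images(html, base_url):
    """Return (links, images) found in an HTML string."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    return parser.links, parser.images
```

A crawler would repeat two steps: download a page, then feed the links it finds back into the list of pages still to visit.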

Answer by Achim

There is much more on the internet than just websites, but I assume that you just want to crawl some HTML pages and extract data from them. You have many options for solving that problem. Here are some starting points:
