Python programmatic form submission

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/393738/

Time: 2020-11-03 20:00:23  Source: igfitidea

Programmatic Form Submit

Tags: python, forms, screen-scraping, submit

Asked by

I want to scrape the contents of a webpage. The contents are produced after a form on that site has been filled in and submitted.

I've read up on how to scrape the resulting content/webpage, but how do I programmatically submit the form?

I'm using python and have read that I might need to get the original webpage with the form, parse it, get the form parameters and then do X?

Can anyone point me in the right direction?

Answered by Cogsy

You'll need to generate an HTTP request containing the data for the form.

The form will look something like:

<form action="submit.php" method="POST"> ... </form>

This tells you that the URL to request is www.example.com/submit.php (the action is resolved relative to the page that contains the form, here assumed to be on www.example.com) and that your request should be a POST.
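As a rough illustration of how the action attribute turns into a full URL, the sketch below resolves it against the address of the page holding the form, the way a browser would. The page URLs and the helper name `form_target` are assumptions made up for this example; only `urljoin` comes from the standard library.

```python
from urllib.parse import urljoin


def form_target(page_url, action):
    """Resolve a form's action attribute against the URL of the page it is on."""
    return urljoin(page_url, action)


# Hypothetical page URLs, used only for illustration.
print(form_target("http://www.example.com/form.html", "submit.php"))
print(form_target("http://www.example.com/shop/form.html", "submit.php"))
print(form_target("http://www.example.com/shop/form.html", "/submit.php"))
```

A relative action such as submit.php is resolved against the directory of the page, while an action starting with a slash is resolved against the site root.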

The form will contain several input items, e.g.:

<input type="text" name="itemnumber"> ...

You need to create a string of all these input name=value pairs, URL-encoded and appended to the end of the requested URL, which then becomes www.example.com/submit.php?itemnumber=5234&otherinput=othervalue etc. This works fine for GET. POST is a little trickier: the encoded pairs go in the body of the request rather than in the URL.
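A minimal sketch of that encoding step, assuming Python 3 and the field names from the example above (the http:// scheme and the sample values are assumptions). `urlencode` builds the name=value string; for GET it is appended to the URL, for POST the same string is sent as bytes in the request body.

```python
from urllib.parse import urlencode

# Field names taken from the example form; the values are placeholders.
fields = {"itemnumber": "5234", "otherinput": "othervalue"}

# name=value pairs joined with "&", with special characters percent-encoded.
query = urlencode(fields)

# GET: the encoded pairs become the query string of the URL.
get_url = "http://www.example.com/submit.php?" + query

# POST: the same encoded pairs are sent as the request body (bytes).
post_body = query.encode("ascii")

print(get_url)
print(post_body)

# Characters that are unsafe in a URL are escaped automatically.
print(urlencode({"q": "a b&c"}))
```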

</motivation>

Just follow S.Lott's links for some much easier-to-use library support :P

Answered by user49117

Using Python, I think it takes the following steps:

  1. Parse the web page that contains the form, and find the form's submit address and submit method ("post" or "get").

This explains form elements in an HTML file.

  2. Use urllib2 (urllib.request in Python 3) to submit the form. You may need functions such as "urlencode" and "quote" from urllib (urllib.parse in Python 3) to generate the URL and the data for the post method. Read the library docs for details.
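The two steps above can be sketched roughly as follows, assuming Python 3, where urllib2 has become `urllib.request`. The class `FormParser`, the function `build_request`, the sample HTML, and the page URL are all hypothetical names made up for this example. It only handles the first form on the page and plain input elements, and it builds the request without sending it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Collect the action, method and input names of the first form on a page."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"  # HTML default when no method attribute is given
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep any default value the page supplies for this input.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_request(page_url, html, values):
    """Step 1: parse the form. Step 2: build the matching GET or POST request."""
    parser = FormParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(values)  # caller-supplied values override page defaults
    target = urljoin(page_url, parser.action)
    encoded = urlencode(fields)
    if parser.method == "POST":
        return Request(target, data=encoded.encode("ascii"), method="POST")
    return Request(target + "?" + encoded, method="GET")


# Hypothetical page content and URL, for illustration only.
sample_html = (
    '<form action="submit.php" method="POST">'
    '<input type="text" name="itemnumber">'
    '<input type="text" name="otherinput">'
    "</form>"
)
request = build_request(
    "http://www.example.com/form.html",
    sample_html,
    {"itemnumber": "5234", "otherinput": "othervalue"},
)
print(request.get_method(), request.full_url, request.data)
```

To actually submit the form, the resulting object could be passed to `urllib.request.urlopen(request)` and the response read from it. Real pages often need more than this, for example select and textarea elements, hidden fields, or cookies.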

Answered by gimel

From a similar question, options-for-html-scraping, you can learn that with Python you can use Beautiful Soup.

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:

  1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
  2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
  3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.

The unusual name caught the attention of our host, November 12, 2008.

Answered by Joao da Silva

You can do it with JavaScript. If the form is something like:

<form name='myform' ...

Then you can do this in JavaScript:

<script type="text/javascript">
// Submit the form whose name attribute is "myform".
function submitform()
{
    document.myform.submit();
}
</script>

You can use the "onClick" attribute of links or buttons to invoke this code. To invoke it automatically when the page is loaded, use the "onLoad" attribute of the body element:

<body onLoad="submitform()" ...>