Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/13962006/

Date: 2020-08-18 10:02:29  Source: igfitidea

Using Python to ask a web page to run a search

python search web

Asked by Uncle_Dick

I have a list of protein names in the "Uniprot" format, and I'd like to convert them all to the MGI format. If you go to www.uniprot.org and type the uniprot protein name into the "Query" bar, it will generate a page with a bunch of information about that protein, including its MGI name (albeit much further down the page).

For example, one Uniprot name is "Q9D880", and by scrolling down, you can see that its corresponding MGI name is "1913775".

I already know how to use Python's urllib to extract the MGI name from a page once I get to that page. What I don't know how to do is write Python code to get the main page to run a query of "Q9D880". My list contains 270 protein names, so it would be nice to avoid copying and pasting each protein name into the Query bar.

I saw the "Google Search from a Python App" post, and I have a firmer understanding of this concept, but I suspect that running a google search is different from running the search function on some other website, like uniprot.org.

I'm running Python 2.7.2, but I'm open to implementing solutions that use other versions of Python. Thanks for the help!

Answered by Silas Ray

Running the search appears to do a GET on

http://www.uniprot.org/?dataset=uniprot&query=Q9D880&sort=score&url=&lucky=no&random=no

Which eventually redirects you to

http://www.uniprot.org/uniprot/Q9D880

So you should be able to use urllib or an HTTP library (I use httplib2) to do a GET on that address, parameterizing the protein name in the URL so you can search for whichever protein name you want.
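
As a rough sketch of that parameterization, here is a Python 3 version using only the standard library. The helper names build_search_url and fetch_entry_page are made up for illustration, and the query parameters are copied from the URL quoted above, which may not match how the site behaves today.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.uniprot.org/"  # taken from the URL quoted above


def build_search_url(protein_name):
    """Return the search URL for one Uniprot name (hypothetical helper)."""
    params = {
        "dataset": "uniprot",
        "query": protein_name,
        "sort": "score",
        "url": "",
        "lucky": "no",
        "random": "no",
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_entry_page(protein_name):
    """GET the search URL; urlopen follows redirects automatically."""
    with urlopen(build_search_url(protein_name)) as response:
        # response.geturl() is the final URL after any redirects
        return response.geturl(), response.read().decode("utf-8", "replace")


# Only the URL is built here; call fetch_entry_page() to hit the network.
print(build_search_url("Q9D880"))
```

Calling fetch_entry_page("Q9D880") would perform the actual request and return both the final URL and the page text, so you can check where the redirect ended up.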

Answered by jdotjdot

An easier way to do this is with the requests library. My solution for you also grabs the information itself from the page using BeautifulSoup4.

All you'd have to do, given a list of your protein names, is:

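import requests
from bs4 import BeautifulSoup as BS

my_protein_list = ['Q9D880']  # replace with your full list of Uniprot names

for protein in my_protein_list:
    # Fetch the protein's entry page directly by its Uniprot name
    text = requests.get('http://www.uniprot.org/uniprot/' + protein).text
    soup = BS(text, 'html.parser')
    # The MGI cross-reference link carries this onclick attribute
    MGI = soup.find(name='a', onclick="UniProt.analytics('DR-lines', 'click', 'DR-MGI');").text
    MGI = MGI[4:]  # strip the leading "MGI:" prefix
    print(protein + ' - ' + MGI)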

Answered by Anonymous

The query is in the URL, so you can call:
http://www.uniprot.org/uniprot/?query=1913775&sort=score

I didn't have time to test this script since I don't have 2.x installed anymore, but the code in 2.x should be something like this:

import urllib
MGIName = "1913775"
print urllib.urlopen(
    "http://www.uniprot.org/uniprot/?query="+ MGIName +"&sort=score").read()

The code I ran in 3.2 was this, and it worked fine:

>>> import urllib.request
>>> MGIName = "1913775"
>>> print(urllib.request.urlopen("http://www.uniprot.org/uniprot/?query="+ MGIName +"&sort=score").read())

Just loop MGIName over the list of names.
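
A minimal sketch of that loop, written for Python 3. The names protein_names, entry_url, and fetch_pages are placeholders and not part of the answer, and the short pause between requests is an assumption about being polite to the server rather than a documented requirement.

```python
import time
import urllib.request
from urllib.parse import quote

# URL pattern copied from the answer above; it may not match the live site.
URL_TEMPLATE = "http://www.uniprot.org/uniprot/?query={query}&sort=score"


def entry_url(name):
    """Build the query URL for one name, escaping unsafe characters."""
    return URL_TEMPLATE.format(query=quote(name, safe=""))


def fetch_pages(names, delay_seconds=1.0):
    """Yield (name, page_bytes) for each name, pausing between requests."""
    for name in names:
        with urllib.request.urlopen(entry_url(name)) as response:
            yield name, response.read()
        time.sleep(delay_seconds)


protein_names = ["Q9D880"]  # placeholder; use your own list of names
for name in protein_names:
    # Only prints the URLs; iterate over fetch_pages() to download them.
    print(entry_url(name))
```

Iterating over fetch_pages(protein_names) downloads one page per name, which you can then pass to whatever extraction code you already have.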

Answered by Bryan

You can also do this with PyQuery:

>>> from pyquery import PyQuery as pq
>>> url = "http://www.uniprot.org/uniprot/{name}"
>>> name = "Q9D880"
>>> response = pq(url=url.format(name=name))
>>> print(response("a").filter(lambda e: pq(this).text().startswith("MGI:")).text())
MGI:1913775
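
If you would rather avoid third-party parsers, the same idea of finding the link whose text starts with "MGI:" can be sketched with the standard library's html.parser. The class name MGILinkFinder, the helper find_mgi_ids, and the sample HTML are invented for illustration; the real page markup may differ.

```python
from html.parser import HTMLParser


class MGILinkFinder(HTMLParser):
    """Collect the text of <a> elements whose text starts with 'MGI:'."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self._buffer = []
        self.mgi_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_anchor:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = "".join(self._buffer).strip()
            if text.startswith("MGI:"):
                self.mgi_ids.append(text[4:])  # drop the "MGI:" prefix
            self._in_anchor = False


def find_mgi_ids(html_text):
    """Return every MGI identifier found in link text (hypothetical helper)."""
    finder = MGILinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.mgi_ids


# Made-up snippet standing in for a downloaded entry page.
sample = '<p><a href="#">Other</a> <a href="#">MGI:1913775</a></p>'
print(find_mgi_ids(sample))
```

Matching on the link text rather than on a specific onclick attribute makes the sketch less sensitive to changes in the page's scripting, at the cost of possibly picking up more than one link.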