Using Python to ask a web page to run a search

Question

Asked by Uncle_Dick

I have a list of protein names in the "Uniprot" format, and I'd like to convert them all to the MGI format. If you go to www.uniprot.org and type the uniprot protein name into the "Query" bar, it will generate a page with a bunch of information about that protein, including its MGI name (albeit much further down the page).

For example, one Uniprot name is "Q9D880", and by scrolling down, you can see that its corresponding MGI name is "1913775".

I already know how to use Python's urllib to extract the MGI name from a page once I get to that page. What I don't know how to do is write Python code that gets the main page to run a query for "Q9D880". My list contains 270 protein names, so it would be nice to avoid copying and pasting each protein name into the Query bar.

I saw the "Google Search from a Python App" post, and I have a firmer understanding of this concept, but I suspect that running a google search is different from running the search function on some other website, like uniprot.org.

I'm running Python 2.7.2, but I'm open to implementing solutions that use other versions of Python. Thanks for the help!

Answer 1

Answered by Silas Ray

Running the search appears to do a GET on

http://www.uniprot.org/?dataset=uniprot&query=Q9D880&sort=score&url=&lucky=no&random=no

which eventually redirects you to

http://www.uniprot.org/uniprot/Q9D880

So you should be able to use urllib or an HTTP library (I use httplib2) to do a GET on that address, parameterizing the protein name in the URL so you can search for whichever protein name you want.
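
As a rough illustration of that parameterization, here is a minimal Python 3 sketch that only builds the two URLs quoted above and sends no request. The helper names build_search_url and build_entry_url are made up for this example, and the URL patterns are copied from this answer, so they may not match what the site serves today.

```python
from urllib.parse import quote, urlencode

# URL patterns copied from the answer above; treat them as assumptions
# about the site rather than a documented API.
SEARCH_BASE = "http://www.uniprot.org/"
ENTRY_BASE = "http://www.uniprot.org/uniprot/"


def build_search_url(protein_name):
    """Build the search URL that the site's query bar appeared to request."""
    params = {
        "dataset": "uniprot",
        "query": protein_name,
        "sort": "score",
        "url": "",
        "lucky": "no",
        "random": "no",
    }
    return SEARCH_BASE + "?" + urlencode(params)


def build_entry_url(protein_name):
    """Build the entry-page URL that the search redirected to."""
    return ENTRY_BASE + quote(protein_name)


if __name__ == "__main__":
    print(build_search_url("Q9D880"))
    print(build_entry_url("Q9D880"))
```

Either URL can then be passed to urllib.request.urlopen (or urllib.urlopen on Python 2) inside a loop over the 270 names.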

Answer 2

Answered by jdotjdot

An easier way to do this is with the requests library. My solution for you also grabs the information itself from the page using BeautifulSoup4.

All you'd have to do, given a list of your protein names (my_protein_list below), is:

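import requests
from bs4 import BeautifulSoup as BS

# my_protein_list is assumed to hold the UniProt names, e.g. ['Q9D880']
for protein in my_protein_list:
    # Fetch the entry page for this protein
    text = requests.get('http://www.uniprot.org/uniprot/' + protein).text
    soup = BS(text, 'html.parser')
    # The MGI cross-reference link carries this onclick handler
    MGI = soup.find(name='a', onclick="UniProt.analytics('DR-lines', 'click', 'DR-MGI');").text
    MGI = MGI[4:]  # drop the leading "MGI:" prefix
    print(protein + ' - ' + MGI)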
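
If BeautifulSoup is not available, a standard-library fallback might look like the sketch below. It assumes the page text contains the identifier written as "MGI:" followed by digits, which is what the [4:] slice above implies. The function name extract_mgi and the sample HTML fragment are hypothetical, and a regex over raw HTML is a swapped-in technique, not the method used in this answer.

```python
import re

# Assumption: the identifier appears in the page as "MGI:" followed by digits.
MGI_PATTERN = re.compile(r"MGI:(\d+)")


def extract_mgi(page_text):
    """Return the first numeric MGI identifier in page_text, or None if absent."""
    match = MGI_PATTERN.search(page_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical fragment standing in for a downloaded page.
    sample = '<a href="#">MGI:1913775</a>'
    print(extract_mgi(sample))
```

Returning None when nothing matches lets a loop over many proteins skip entries that have no MGI cross-reference instead of raising an exception.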

Answer 3

Answered by Anonymous

The query is in the URL, so you can call:
http://www.uniprot.org/uniprot/?query=1913775&sort=score

I didn't have time to test this script since I don't have 2.x installed anymore, but the code in 2.x should be something like this:

import urllib
MGIName = "1913775"
print urllib.urlopen(
    "http://www.uniprot.org/uniprot/?query="+ MGIName +"&sort=score").read()

The code I ran in 3.2 was this, and it worked fine:

>>> import urllib.request
>>> MGIName = "1913775"
>>> print(urllib.request.urlopen("http://www.uniprot.org/uniprot/?query="+ MGIName +"&sort=score").read())

Just loop MGIName over the list of names.
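
A minimal Python 3 sketch of that loop follows. The helper names query_url and fetch_all are hypothetical, and the optional fetch parameter is an addition made only so the loop can be exercised without network access. The URL pattern is the one used in this answer and is not guaranteed to match the current site.

```python
from urllib.parse import quote
from urllib.request import urlopen

# URL pattern taken from the answer above (an assumption about the site).
QUERY_URL = "http://www.uniprot.org/uniprot/?query={}&sort=score"


def query_url(name):
    """Return the query URL for a single name."""
    return QUERY_URL.format(quote(name))


def fetch_all(names, fetch=None):
    """Map each name to the raw page returned for its query URL.

    fetch is any callable taking a URL and returning the page body;
    by default it performs a real HTTP GET with urlopen.
    """
    if fetch is None:
        def fetch(url):
            return urlopen(url).read()
    return {name: fetch(query_url(name)) for name in names}


if __name__ == "__main__":
    # Print the URLs only; call fetch_all(names) to actually download.
    for name in ["1913775"]:
        print(query_url(name))
```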

Answer 4

Answered by Bryan

You can also do this with PyQuery:

>>> from pyquery import PyQuery as pq
>>> url = "http://www.uniprot.org/uniprot/{name}"
>>> name = "Q9D880"
>>> html = pq(url=url.format(name=name))
>>> print(html("a").filter(lambda e: pq(this).text().startswith("MGI:")).text())
MGI:1913775
