Original question: http://stackoverflow.com/questions/5506561/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Getting the URLs for the first Google search results in a shell script
Asked by Lri
It's relatively easy to parse the output of the AJAX API using a scripting language:
#!/usr/bin/env python3
import json
from urllib.parse import urlencode
from urllib.request import urlopen

base = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&'
query = urlencode({'q': 'something'})  # percent-encode the search terms
response = urlopen(base + query).read()
data = json.loads(response)
print(data['responseData']['results'][0]['url'])  # URL of the first result
But are there any better ways to do something similar with just basic shell scripting? If you just curled the API page, how should you encode the URL parameters or parse JSON?
Accepted answer by Lri
I ended up using curl's --data-urlencode option to encode the query parameter and just sed for extracting the first result.
curl -s --get --data-urlencode "q=example" "http://ajax.googleapis.com/ajax/services/search/web?v=1.0" | sed 's/"unescapedUrl":"\([^"]*\).*/\1/;s/.*GwebSearch",//'
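As a variation on the sed expression above, the first match can also be isolated with grep -o and cut. This is only a minimal sketch, not the answer's own method: the function name first_unescaped_url is made up for illustration, it assumes the JSON arrives on stdin with no whitespace between the key and its value (the compact form the sed command targets), and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper: print the first "unescapedUrl" value found on stdin.
first_unescaped_url() {
  grep -o '"unescapedUrl":"[^"]*"' |  # one match per line, in document order
    head -n 1 |                       # keep only the first match
    cut -d '"' -f 4                   # the 4th quote-delimited field is the URL
}
```

It would be used by piping the curl output into it, for example `curl -s ... | first_unescaped_url`. Because head keeps the first match, the result does not depend on sed's greedy matching.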
Answer by r-n
@Lri - Here is a script I personally use for command-line tools and scripts. It uses the command-line utility "lynx" to dump the URLs. The script can be downloaded from HERE and the code view is HERE. Here is the code for your reference:
#!/bin/bash
clear
echo ""
echo ".=========================================================."
echo "| |"
echo "| COMMAND LINE GOOGLE SEARCH |"
echo "| --------------------------------------------------- |"
echo "| |"
echo "| Version: 1.0 |"
echo "| Developed by: Rishi Narang |"
echo "| Blog: www.wtfuzz.com |"
echo "| |"
echo "| Usage: ./gocmd.sh <search strings> |"
echo "| Example: ./gocmd.sh example and test |"
echo "| |"
echo ".=========================================================."
echo ""
if [ -z "$1" ]
then
echo "ERROR: No search string supplied."
echo "USAGE: ./gocmd.sh <search string>"
echo ""
echo -n "Anyways for now, supply the search string here: "
read SEARCH
else
SEARCH="$*"
fi
URL="http://google.com/search?hl=en&safe=off&q="
STRING=`echo $SEARCH | sed 's/ /%20/g'`
URI="$URL%22$STRING%22"
lynx -dump "$URI" > gone.tmp
sed 's/http/\^http/g' gone.tmp | tr -s "^" "\n" | grep http| sed 's/\ .*//g' > gtwo.tmp
rm gone.tmp
sed '/google.com/d' gtwo.tmp > urls
rm gtwo.tmp
echo "SUCCESS: Extracted `wc -l urls` and listed them in '`pwd`/urls' file for reference."
echo ""
cat urls
echo ""
#EOF
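The script above only replaces spaces with %20, so characters such as & or ? in the search string would still break the query. A fuller percent-encoder can be sketched in plain POSIX shell. The function name urlencode is hypothetical, and the sketch assumes ASCII input: multibyte characters are not handled correctly.

```shell
# Hypothetical helper: percent-encode an ASCII string for use in a URL query.
# The body runs in a subshell, so its variables do not leak into the caller.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                  # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;          # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)
```

In the script it could stand in for the sed call, for example `STRING=$(urlencode "$SEARCH")`.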
Answer by once
Many years later, you can install googler:
googler -n 1 -c in -l en search something here --json
You can control the number of results with the -n flag.
To get only the URL, pipe the output to:
grep "\"url\""|tr -s ' ' |cut -d ' ' -f3|tr -d "\""
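The same extraction can be done in a single sed call. This is a sketch under the assumption, implied by the pipeline above, that the JSON is pretty-printed with one "url" key per line. The function name first_json_url is made up for illustration. Unlike the cut-based version, it also drops a trailing comma after the value.

```shell
# Hypothetical helper: print the first "url" value from pretty-printed JSON on stdin.
first_json_url() {
  # Keep only lines of the form   "url": "VALUE"   and print VALUE.
  sed -n 's/^[[:space:]]*"url":[[:space:]]*"\([^"]*\)".*$/\1/p' | head -n 1
}
```

It would be used as `googler ... --json | first_json_url`.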
Answer by qwerty
Untested approach, as I don't currently have access to a Unix box ...
Assuming "test" is the query string, you could use a simple wget on the following url http://www.google.co.in/#hl=en&source=hp&biw=1280&bih=705&q=test&btnI=Google+Search&aq=f&aqi=g10&aql=&oq=test&fp=3cc29334ffc8c2c
This would leverage Google's "I'm feeling lucky" functionality and wget the first url for you. You may be able to clean up the above url a bit too.
Answer by katbyte
Lri's answer only returned the last result for me and I needed the top one, so I changed it to:
JSON=$(curl -s --get --data-urlencode "q=QUERY STRING HERE" "http://ajax.googleapis.com/ajax/services/search/web?v=1.0" | python -mjson.tool)
response=$(echo "$JSON" | sed -n -e 's/^.*responseStatus\": //p')
if [ "$response" -eq 200 ] ; then
url=$(echo "$JSON" | grep "unescapedUrl" | sed -e '1!d' -e "s/^.*unescapedUrl\": \"//" -e "s/\".*$//")
echo "Success! [$url]"
wget "$url"
else
echo "FAILED! [$response]"
fi
It's not as compact as I'd like, but I was in a rush.
Answer by neverlastn
Just for reference: by November 2013, you will need to replace the ajax.googleapis.com/ajax/services/search/web calls completely.
Most likely, it has to be replaced with Custom Search Engine (CSE). The problem is that you won't be able to get "global" results from CSE. Here is a nice tip on how to do this: http://groups.google.com/a/googleproductforums.com/d/msg/customsearch/0aoS-bXgnEM/lwlZ6_IyVDQJ.