Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5506561/

Time: 2020-09-17 23:43:17  Source: igfitidea

Getting the URLs for the first Google search results in a shell script

bash

Asked by Lri

It's relatively easy to parse the output of the AJAX API using a scripting language:

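#!/usr/bin/env python3

import json
import urllib.parse
import urllib.request

# Google Web Search AJAX API endpoint (v1.0)
base = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&'
# URL-encode the search term as the q parameter
query = urllib.parse.urlencode({'q': 'something'})
response = urllib.request.urlopen(base + query).read()
data = json.loads(response)
# Print the URL of the first result
print(data['responseData']['results'][0]['url'])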

But are there any better ways to do something similar with just basic shell scripting? If you just curled the API page, how should you encode the URL parameters or parse JSON?

Accepted answer by Lri

I ended up using curl's --data-urlencode option to encode the query parameter and just sed for extracting the first result.

curl -s --get --data-urlencode "q=example" "http://ajax.googleapis.com/ajax/services/search/web?v=1.0" | sed 's/"unescapedUrl":"\([^"]*\).*/\1/;s/.*GwebSearch",//'
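
The extraction step can be tried without network access. The following is a minimal sketch, not the command above: it swaps the sed expression for grep -o, head and cut, and runs on a made-up, abridged sample that assumes the compact single-line response shape the old AJAX API used. The function name first_unescaped_url and the sample URLs are hypothetical.

```shell
#!/bin/sh
# Hypothetical, abridged sample in the shape the old AJAX API is assumed to return.
sample='{"responseData":{"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://example.com/first","url":"http://example.com/first"},{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://example.org/second","url":"http://example.org/second"}]},"responseStatus":200}'

# first_unescaped_url: read JSON on stdin, print the first unescapedUrl value.
first_unescaped_url() {
    # grep -o prints each "unescapedUrl":"..." match on its own line,
    # head keeps the first one, cut takes the text between the last pair of quotes.
    grep -o '"unescapedUrl":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

printf '%s\n' "$sample" | first_unescaped_url
```

Piping the curl output into first_unescaped_url instead of the sample would be the equivalent of the one-liner, assuming the response keeps this shape.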

Answer by r-n

@Lri - Here is a script I use for command-line tools and scripts. It uses the command-line utility "lynx" to dump the URLs. The script can be downloaded from HERE and the code can be viewed HERE. Here is the code for your reference:

#!/bin/bash

clear
echo ""
echo ".=========================================================."
echo "|                                                         |"
echo "|  COMMAND LINE GOOGLE SEARCH                             |"
echo "|  ---------------------------------------------------    |"
echo "|                                                         |"
echo "|  Version: 1.0                                           |"
echo "|  Developed by: Rishi Narang                             |"
echo "|  Blog: www.wtfuzz.com                                   |"
echo "|                                                         |"
echo "|  Usage: ./gocmd.sh <search strings>                     |"
echo "|  Example: ./gocmd.sh example and test                   |"
echo "|                                                         |"
echo ".=========================================================."
echo ""

if [ -z "$1" ]
then
 echo "ERROR: No search string supplied."
 echo "USAGE: ./gocmd.sh <search string>"
 echo ""
 echo -n "Anyway, for now, supply the search string here: "
 read -r SEARCH
else
 SEARCH="$*"
fi

URL="http://google.com/search?hl=en&safe=off&q="
STRING=`echo $SEARCH | sed 's/ /%20/g'`
URI="$URL%22$STRING%22"

lynx -dump "$URI" > gone.tmp
sed 's/http/\^http/g' gone.tmp | tr -s "^" "\n" | grep http| sed 's/\ .*//g' > gtwo.tmp
rm gone.tmp
sed '/google.com/d' gtwo.tmp > urls
rm gtwo.tmp

echo "SUCCESS: Extracted `wc -l urls` and listed them in '`pwd`/urls' file for reference."
echo ""
cat urls
echo ""

#EOF
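
The sed call in the script only turns spaces into %20, so characters such as & or + in a search string would reach Google unescaped. As a rough sketch of fuller percent-encoding in plain shell, assuming a helper named urlencode (hypothetical, not part of the script above):

```shell
#!/bin/sh
# urlencode: percent-encode one string for use in a URL query.
# Hypothetical helper, not part of the original script.
urlencode() (
    LC_ALL=C                      # byte-oriented matching for the patterns below
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}               # everything after the first character
        c=${s%"$rest"}            # the first character itself
        s=$rest
        case $c in
            # Unreserved characters are copied through unchanged.
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            # Anything else becomes %XX; printf "'c" yields the character code.
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)

urlencode 'example and test'
```

In the script, the STRING assignment could then become STRING=$(urlencode "$SEARCH"). The function body runs in a subshell, so the LC_ALL change does not leak into the caller.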

Answer by once

Many years later, you can install googler:

googler -n 1 -c in -l en search something here --json

You can control the number of results returned using the -n flag.

To get only the url, simply pipe it to:

grep "\"url\""|tr -s ' ' |cut -d ' ' -f3|tr -d "\""
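
To see what that pipeline does, here is a small sketch that runs it on a sample instead of live googler output. The sample assumes googler prints pretty-printed JSON with two-space indentation and one field per line; the values and the function name urls_from_json are made up for illustration.

```shell
#!/bin/sh
# Assumed shape of pretty-printed `googler --json` output (illustrative values).
sample='[
  {
    "abstract": "Example Domain. This domain is for use in illustrative examples.",
    "title": "Example Domain",
    "url": "https://example.com/"
  }
]'

# urls_from_json: print the value of every "url" field, one per line.
urls_from_json() {
    # Keep the "url" lines, squeeze the indentation to one space,
    # take the third space-separated field, then strip the quotes.
    grep '"url"' | tr -s ' ' | cut -d ' ' -f 3 | tr -d '"'
}

printf '%s\n' "$sample" | urls_from_json
```

The pipeline relies on that line-per-field layout, so it would need adjusting if the JSON were emitted on a single line.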

Answer by qwerty

Untested approach as I don't have access to a unix box currently ...

Assuming "test" is the query string, you could use a simple wget on the following url http://www.google.co.in/#hl=en&source=hp&biw=1280&bih=705&q=test&btnI=Google+Search&aq=f&aqi=g10&aql=&oq=test&fp=3cc29334ffc8c2c

This would leverage Google's "I'm feeling lucky" functionality and wget the first url for you. You may be able to clean up the above url a bit too.

Answer by katbyte

Lri's answer only returned the last result for me, and I needed the top one, so I changed it to:

JSON=$(curl -s --get --data-urlencode "q=QUERY STRING HERE" "http://ajax.googleapis.com/ajax/services/search/web?v=1.0" | python -m json.tool)
response=$(echo "$JSON" | sed -n -e 's/^.*responseStatus\": //p')
if [ "$response" -eq 200 ] ; then
    url=$(echo "$JSON" | grep -E "unescapedUrl" | sed -e '1!d' -e "s/^.*unescapedUrl\": \"//" -e "s/\".*$//")
    echo "Success! [$url]"
    wget "$url"
else
    echo "FAILED! [$response]"
fi

It's not as compact as I'd like, but I was in a rush.

Answer by neverlastn

Just for reference: By November 2013, you will need to replace the ajax.googleapis.com/ajax/services/search/web calls completely.

Most likely, it has to be replaced with Custom Search Engine (CSE). The problem is that you won't be able to get "global" results from CSE. Here is a nice tip on how to do this: http://groups.google.com/a/googleproductforums.com/d/msg/customsearch/0aoS-bXgnEM/lwlZ6_IyVDQJ.