bash: Get the URL of the first Google search result in a shell script

Question

Asked by Lri

It's relatively easy to parse the output of the AJAX API using a scripting language:

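#!/usr/bin/env python3

import json
import urllib.parse
import urllib.request

base = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&'
query = urllib.parse.urlencode({'q': 'something'})
# Fetch the JSON response and decode it before parsing.
response = urllib.request.urlopen(base + query).read().decode('utf-8')
data = json.loads(response)
# Print the URL of the first result.
print(data['responseData']['results'][0]['url'])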

But is there a good way to do something similar with just basic shell scripting? If you simply fetched the API page with curl, how would you encode the URL parameters or parse the JSON?

Answer 1

Accepted answer by Lri

I ended up using curl's --data-urlencode option to encode the query parameter and plain sed to extract the first result.

curl -s --get --data-urlencode "q=example" 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0' | sed 's/"unescapedUrl":"\([^"]*\).*/\1/;s/.*GwebSearch",//'
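
For anyone who finds the sed expression hard to follow, here is a minimal sketch of the same extraction step written with grep and head, so that only the first match is kept. The function name first_url and the trimmed sample response are made up for illustration; the real API response contains many more fields, and this assumes the "unescapedUrl":"..." pairs appear without spaces, as in the sed command above.

```shell
# Read a JSON response on stdin and print only the first "unescapedUrl" value.
# first_url is a hypothetical helper name, not part of curl or the API.
first_url() {
  grep -o '"unescapedUrl":"[^"]*"' | head -n 1 | sed 's/^"unescapedUrl":"//; s/"$//'
}

# Hypothetical, heavily trimmed response used only for illustration.
sample='{"responseData":{"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://example.com/first","url":"http://example.com/first"},{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://example.org/second","url":"http://example.org/second"}]},"responseStatus":200}'

printf '%s\n' "$sample" | first_url
```

In real use you would pipe the curl output into first_url instead of the sample string.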

Answer 2

Answered by r-n

@Lri - Here is a script I personally use for command-line tools and scripts. It uses the command-line utility "lynx" to dump the URLs. The script can be downloaded from HERE and the code view is HERE. Here is the code for your reference:

#!/bin/bash

clear
echo ""
echo ".=========================================================."
echo "|                                                         |"
echo "|  COMMAND LINE GOOGLE SEARCH                             |"
echo "|  ---------------------------------------------------    |"
echo "|                                                         |"
echo "|  Version: 1.0                                           |"
echo "|  Developed by: Rishi Narang                             |"
echo "|  Blog: www.wtfuzz.com                                   |"
echo "|                                                         |"
echo "|  Usage: ./gocmd.sh <search strings>                     |"
echo "|  Example: ./gocmd.sh example and test                   |"
echo "|                                                         |"
echo ".=========================================================."
echo ""

# Fall back to an interactive prompt when no arguments are given.
if [ -z "$1" ]
then
 echo "ERROR: No search string supplied."
 echo "USAGE: ./gocmd.sh <search string>"
 echo ""
 echo -n "Anyway, for now, supply the search string here: "
 read SEARCH
else
 SEARCH="$*"
fi

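URL="http://google.com/search?hl=en&safe=off&q="
# Only spaces are encoded here; other special characters are passed through as-is.
STRING=$(echo "$SEARCH" | sed 's/ /%20/g')
# %22 wraps the search string in double quotes (exact-phrase search).
URI="$URL%22$STRING%22"

lynx -dump "$URI" > gone.tmp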
sed 's/http/\^http/g' gone.tmp | tr -s "^" "\n" | grep http| sed 's/\ .*//g' > gtwo.tmp
rm gone.tmp
sed '/google.com/d' gtwo.tmp > urls
rm gtwo.tmp

echo "SUCCESS: Extracted `wc -l urls` and listed them in '`pwd`/urls' file for reference."
echo ""
cat urls
echo ""

#EOF
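
Note that the script above only replaces spaces with %20, so a search string containing characters such as & or = would change the query. A minimal sketch of a fuller percent-encoder in plain POSIX shell is shown below. The function name urlencode is hypothetical; it works byte by byte on the output of od and leaves only letters, digits, and the characters - . _ ~ unencoded.

```shell
# Percent-encode a string byte by byte (urlencode is a hypothetical helper name).
urlencode() {
  # od prints every byte of the input as a two-digit lowercase hex value.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$hex" in
      # Unreserved characters: - . _ ~ 0-9 A-Z a-z are printed literally.
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        printf "\\$(printf '%03o' "0x$hex")" ;;
      # Everything else becomes %XX with uppercase hex digits.
      *)
        printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
    esac
  done
  echo
}

urlencode 'hello world&x=1'
```

With something like this in place, the STRING assignment could call urlencode "$SEARCH" instead of the sed substitution.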

Answer 3

Answered by once

Many years later, you can install googler:

googler -n 1 -c in -l en search something here --json

You can control the number of results returned using the -n flag.

To get only the URL, simply pipe the output to:

grep "\"url\""|tr -s ' ' |cut -d ' ' -f3|tr -d "\""
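
The pipeline above relies on cut and a fixed field position. A slightly more forgiving sketch using a single sed expression is shown below. It assumes the --json output is pretty-printed with one "url": "..." pair per line, which is what the grep above implies; the sample text and the function name urls_from_json are made up for illustration and are not part of googler.

```shell
# Print the value of every "url" key found on its own line in the JSON on stdin.
# urls_from_json is a hypothetical helper name.
urls_from_json() {
  sed -n 's/^[[:space:]]*"url":[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Hypothetical sample in the assumed pretty-printed layout.
sample='[
  {
    "abstract": "First hypothetical result",
    "title": "First",
    "url": "http://example.com/first"
  },
  {
    "abstract": "Second hypothetical result",
    "title": "Second",
    "url": "http://example.org/second"
  }
]'

# Keep only the first URL.
printf '%s\n' "$sample" | urls_from_json | head -n 1
```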

Answer 4

Answered by qwerty

This is an untested approach, as I don't currently have access to a Unix box.

Assuming "test" is the query string, you could use a simple wget on the following URL: http://www.google.co.in/#hl=en&source=hp&biw=1280&bih=705&q=test&btnI=Google+Search&aq=f&aqi=g10&aql=&oq=test&fp=3cc29334ffc8c2c

This would leverage Google's "I'm Feeling Lucky" functionality and wget the first URL for you. You may be able to clean up the above URL a bit, too.

Answer 5

Answered by katbyte

Lri's answer only returned the last result for me, and I needed the top one, so I changed it to:

# Fetch the response and pretty-print it so each key sits on its own line.
JSON=$(curl -s --get --data-urlencode "q=QUERY STRING HERE" 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0' | python -mjson.tool)
response=$(echo "$JSON" | sed -n -e 's/^.*responseStatus\": //p')
if [ "$response" -eq 200 ] ; then
    # Keep only the first unescapedUrl line and strip the key and quotes.
    url=$(echo "$JSON" | grep "unescapedUrl" | sed -e '1!d' -e "s/^.*unescapedUrl\": \"//" -e "s/\".*$//")
    echo "Success! [$url]"
    wget "$url"
else
    echo "FAILED! [$response]"
fi

It's not as compact as I'd like, but I was in a rush.

Answer 6

Answered by neverlastn

Just for reference: by November 2013, you will need to replace the ajax.googleapis.com/ajax/services/search/web calls completely.

Most likely, it has to be replaced with Custom Search Engine (CSE). The problem is that you won't be able to get "global" results from CSE. Here is a nice tip on how to do this: http://groups.google.com/a/googleproductforums.com/d/msg/customsearch/0aoS-bXgnEM/lwlZ6_IyVDQJ.
