Notice: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/4332913/
Retrieve old searches from Google web history
Asked by Pratik
I want to retrieve old Google searches which I did a few years/months back and that are present in Google web history. How can I programmatically retrieve them all?
https://www.google.com/history/?output=rss only provides recent Google searches, but not all of them.
Also, this question: "How can I retrieve my Google search history?" doesn't answer my question!
Answer by BalusC
You can pass month, day and year as parameters to obtain history of a specific day.
E.g. https://www.google.com/history/lookup?month=12&day=1&yr=2010&output=rss for Dec 1, 2010.
There is no way to obtain the history for a full month or year, let alone the entire history. But knowing these parameters should at least enable you to obtain the entire history in a loop that goes one day further back in time on each iteration. Be careful not to leech too much in too short a time.
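The loop described above could be sketched roughly as follows. This is only an illustration: the class name `HistoryUrls` and its method names are made up for this example, the endpoint is simply the one quoted in the answer (it is not verified to still respond), and the actual authenticated download is left out on purpose.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original answer): builds one lookup URL per day.
class HistoryUrls {
    // Endpoint quoted in the answer; whether it still works is an assumption.
    static final String BASE = "https://www.google.com/history/lookup";

    // Build the per-day URL from the month, day and yr parameters.
    static String lookupUrl(LocalDate date) {
        return BASE + "?month=" + date.getMonthValue()
                + "&day=" + date.getDayOfMonth()
                + "&yr=" + date.getYear()
                + "&output=rss";
    }

    // Collect URLs for `days` consecutive days, starting at `from` and walking backwards.
    static List<String> urlsGoingBack(LocalDate from, int days) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            urls.add(lookupUrl(from.minusDays(i)));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : urlsGoingBack(LocalDate.of(2010, 12, 1), 3)) {
            System.out.println(url);
            // A real fetch would go here, followed by a pause (e.g. Thread.sleep)
            // so that requests are spread out instead of sent in a burst.
        }
    }
}
```

Using `LocalDate.minusDays` keeps month and year boundaries correct without any manual calendar arithmetic.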
Answer by user530477
You really need to parse the HTML page by page and then extract your data, because I don't think there is any alternative!
Answer by peter.murray.rust
I think this will be very difficult.
I know this doesn't answer your question completely, but at least the web pages may be preserved. There are organizations and tools that allow you to recreate web pages from past dates; see for example http://www.mementoweb.org/.
UPDATE: I have just learnt that Memento has won a digital preservation award (http://www.dpconline.org/newsroom)
Answer by Jake Stevens-Haas
I know you're not looking to go back through every page, but you don't really need to parse the whole page; just look for the HTML that always precedes an entry. From starting up Google Web History and doing a few simple searches, I found that on a history page each string you've searched for follows:
<td style="padding:3px 0"><table id=bkmk_view_ class=noborder ><tr><td><table class="elem noborder"><tr><td class="grey" nowrap>Searched for </td><td nowrap><a title="http://www.google.com/search?q=
and is followed by & (ampersand). This sequence of preceding HTML is unique on the page, occurring only where historical search terms are listed.
If you search for two terms, you get a + between them. There are other conventions for different search modes; I didn't go through them all.
It looks like if you use BalusC's method to pass parameters, you could retrieve the HTML, search the document for the string I mentioned (be sure to escape \" and other special characters), then copy the text that follows until you reach an & character. Then all you need to parse is your search term, not the whole page. Go through the source until you reach the end, then move on to the next iteration of the loop.
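The marker-based extraction described above might look roughly like this. It is a minimal sketch under assumptions: the class name `SearchTermExtractor` and method `extract` are hypothetical, the marker is copied verbatim from this answer (the real page markup may differ or may have changed), and URL-decoding is used to turn the + between terms back into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original answer).
class SearchTermExtractor {
    // Marker copied from the answer, with the double quotes escaped for Java.
    static final String MARKER =
        "<td style=\"padding:3px 0\"><table id=bkmk_view_ class=noborder ><tr><td>"
        + "<table class=\"elem noborder\"><tr><td class=\"grey\" nowrap>Searched for </td>"
        + "<td nowrap><a title=\"http://www.google.com/search?q=";

    // Return every search term that follows MARKER and ends at the next '&'.
    static List<String> extract(String html) {
        List<String> terms = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(MARKER, from);
            if (start < 0) break;               // no more entries on this page
            start += MARKER.length();
            int end = html.indexOf('&', start);
            if (end < 0) break;                 // no terminating ampersand found
            terms.add(decode(html.substring(start, end)));
            from = end;
        }
        return terms;
    }

    // URL-decode the raw query value ('+' becomes a space, %XX becomes a character).
    static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String sample = MARKER + "java+tutorial&hl=en\">java tutorial</a>";
        System.out.println(extract(sample));
    }
}
```

Plain `indexOf` scanning matches the answer's suggestion of not parsing the whole page, at the cost of breaking as soon as the surrounding markup changes.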
Answer by Pratik
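using System;
using System.Net;
using System.Xml;
using Google.GData.Client; // GDataCredentials, RequestSettings

static void GetGoogleWebHistory(int month, int day, int yr, string UserName, string Pass)
{
    // Per-day RSS lookup URL, as described in BalusC's answer.
    string iURL = "http://www.google.com/history/lookup?month=" + month + "&day=" + day + "&yr=" + yr + "&output=rss";
    WebClient client = new WebClient();
    // Note: as posted, these credentials are created but never attached to the WebClient.
    GDataCredentials gdc = new GDataCredentials(UserName, Pass);
    RequestSettings rs = new RequestSettings(Guid.NewGuid().ToString(), gdc);
    // Download the feed and parse it as XML.
    XmlDocument XDoc = new XmlDocument();
    XDoc.LoadXml(client.DownloadString(iURL));
}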