database 如何解析来自 Google Alerts 的数据?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/860442/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 07:18:51  来源:igfitidea点击:

How to parse the data from Google Alerts?

databaseinformation-retrievalgoogle-alerts

提问by John Scipione

Firstly, How would you get Google Alerts information into a database other than to parse the text of the email message that Google sends you?

首先,除了解析 Google 发送给您的电子邮件的文本之外,您将如何将 Google Alerts 信息输入到数据库中?

It seems that there is no Google Alerts API.

似乎没有 Google Alerts API。

If you must parse text, how would you go about parsing out the relevant pieces of the email message?

如果您必须解析文本,您将如何解析电子邮件的相关部分?

回答by Bill Karwin

When you create the alert, set the "Deliver To" to "Feed" and then you can consume the feed XML as you would any other feed. This is much easier to parse and digest into a database.

创建警报时,将“交付到”设置为“供稿”,然后您就可以像使用任何其他供稿一样使用供稿 XML。这更容易解析和消化到数据库中。

回答by user1279525

class googleAlerts{
    public function createAlert($alert){
        $USERNAME = '[email protected]';
        $PASSWORD = 'YYYYYY';
        $COOKIEFILE = 'cookies.txt';

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
        curl_setopt($ch, CURLOPT_TIMEOUT, 120);

        curl_setopt($ch, CURLOPT_URL,
            'https://accounts.google.com/ServiceLogin?hl=en&service=alerts&continue=http://www.google.com/alerts/manage');
        $data = curl_exec($ch);

        $formFields = $this->getFormFields($data);

        $formFields['Email']  = $USERNAME;
        $formFields['Passwd'] = $PASSWORD;
        unset($formFields['PersistentCookie']);

        $post_string = '';
        foreach($formFields as $key => $value) {
            $post_string .= $key . '=' . urlencode($value) . '&';
        }

        $post_string = substr($post_string, 0, -1);

        curl_setopt($ch, CURLOPT_URL, 'https://accounts.google.com/ServiceLoginAuth');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);

        $result = curl_exec($ch);

        if (strpos($result, '<title>') === false) {
            return false;

        } else {
            curl_setopt($ch, CURLOPT_URL, 'http://www.google.com/alerts');
            curl_setopt($ch, CURLOPT_POST, 0);
            curl_setopt($ch, CURLOPT_POSTFIELDS, null);

            $result = curl_exec($ch);

            curl_setopt($ch, CURLOPT_URL, 'http://www.google.com/alerts/create');
            curl_setopt($ch, CURLOPT_POST, 0);
            $result = curl_exec($ch);
            //var_dump($result);
            $result = $this->getFormFieldsCreate($result);
            $result['q'] = $alert;
            $result['t'] = '7';
            $result['f'] = '1';
            $result['l'] = '0';
            $result['e'] = 'feed';
            unset($result['PersistentCookie']);

            $post_string = '';
            foreach($result as $key => $value) {
                $post_string .= $key . '=' . urlencode($value) . '&';
            }

            $post_string = substr($post_string, 0, -1);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
            $result = curl_exec($ch);
            curl_setopt($ch, CURLOPT_URL, 'http://www.google.com/alerts/manage');
            $result = curl_exec($ch);
            if (preg_match_all('%'.$alert.'(?=</a>).*?<a href=[\'"]http://www.google.com/alerts/feeds/([^\'"]+)%i', $result, $matches)) {
                return ('http://www.google.com/alerts/feeds/'.$matches[1][0]);
            } else {
                return false;
            }


        }
    }

    private function getFormFields($data)
    {
        if (preg_match('/(<form.*?id=.?gaia_loginform.*?<\/form>)/is', $data, $matches)) {
            $inputs = $this->getInputs($matches[1]);

            return $inputs;
        } else {
            die('didnt find login form');
        }
    }
    private function getFormFieldsCreate($data)
    {
        if (preg_match('/(<form.*?name=.?.*?<\/form>)/is', $data, $matches)) {
            $inputs = $this->getInputs($matches[1]);

            return $inputs;
        } else {
            die('didnt find login form1');
        }
    }


    private function getInputs($form)
    {
        $inputs = array();

        $elements = preg_match_all('/(<input[^>]+>)/is', $form, $matches);

        if ($elements > 0) {
            for($i = 0; $i < $elements; $i++) {
                $el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);

                if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $name)) {
                    $name  = $name[1];
                    $value = '';

                    if (preg_match('/value=(?:["\'])?([^"\'\s]*)/i', $el, $value)) {
                        $value = $value[1];
                    }

                    $inputs[$name] = $value;
                }
            }
        }

        return $inputs;
    }
}
$alert = new googleAlerts;
echo $alert->createAlert('YOUR ALERT');

It will return link to rss feed of your newly created alert

它将返回指向您新创建的警报的 RSS 提要的链接

回答by Bob Ray

I found a Google Alerts API here. It's pretty minimal and I haven't tested it.

我在这里找到了一个 Google Alerts API 。它非常小,我还没有测试过。