How to Get Web-Page Content in C++

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1053099/

Date: 2020-08-27 18:34:47  Source: igfitidea

How can I get the content of a web page?

c++ qt webpage

Asked by Max Frai

I'm trying to get web-page data into a string so that I can parse it. I didn't find any suitable methods in QWebView, QUrl, or the other classes. Could you help me? Linux, C++, Qt.


EDIT:


Thanks for the help. The code is working, but some pages come back with a broken charset after downloading. I tried something like this to repair it:


QNetworkRequest request(QUrl("http://ru.wiktionary.org/wiki/bovo"));

request.setRawHeader( "User-Agent", "Mozilla/5.0 (X11; U; Linux i686 (x86_64); "
                      "en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1" );
request.setRawHeader( "Accept-Charset", "win1251,utf-8;q=0.7,*;q=0.7" );
request.setRawHeader( "charset", "utf-8" );
request.setRawHeader( "Connection", "keep-alive" );

manager->get(request);

No results. =(


Answered by Paul Dixon

Have you looked at QNetworkAccessManager? Here's a rough and ready sample illustrating usage:


#include <QObject>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QUrl>

class MyClass : public QObject
{
    Q_OBJECT

public:
    MyClass();
    void fetch();

public slots:
    void replyFinished(QNetworkReply*);

private:
    QNetworkAccessManager* m_manager;
};


MyClass::MyClass()
{
    m_manager = new QNetworkAccessManager(this);

    connect(m_manager, SIGNAL(finished(QNetworkReply*)),
         this, SLOT(replyFinished(QNetworkReply*)));

}

void MyClass::fetch()
{
    m_manager->get(QNetworkRequest(QUrl("http://stackoverflow.com")));
}

void MyClass::replyFinished(QNetworkReply* pReply)
{
    QByteArray data = pReply->readAll();
    QString str(data);

    // process str any way you like!

    pReply->deleteLater();  // the reply is owned by the caller; schedule cleanup
}

In your handler for the finished signal you will be passed a QNetworkReply object, which you can read the response from since it inherits from QIODevice. A simple way to do this is to just call readAll to get a QByteArray. You can construct a QString from that QByteArray and do whatever you want with it.


Answered by C-o-r-E

Paul Dixon's answer is probably the best approach, but Jesse's answer touches on something worth mentioning.


cURL, or more precisely libcurl, is a wonderfully powerful library. There is no need for executing shell scripts and parsing output; libcurl has bindings for C, C++, and more languages than you can shake a URL at. It might be useful if you are doing some unusual operation (like an HTTP POST over SSL?) that Qt doesn't support.


Answered by Jesse

Have you looked into lynx, curl, or wget? In the past I have needed to grab and parse info from a website without database access, and if you are trying to get dynamically formatted data, I believe this would be the quickest way. I'm not a C guy, but I assume there is a way to run shell scripts and grab the data, or at least get the script running and grab the output from a file after writing to it. Worst case, you could run a cron job and check with C for a "finished" line at the end of the written file, but I doubt that will be necessary. I suppose it depends on what you need it for, but if you just want the output HTML of a page, something as easy as wget piped to awk or grep can work wonders.

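As a sketch of that last suggestion, here is the kind of shell pipeline Jesse describes, pulling the `<title>` out of a page with sed. It runs against an inline HTML snippet here so it is self-contained, but the same pipeline works on `wget -qO- <url>` output (assuming a POSIX sed):

```shell
# Extract the <title> from HTML with a sed pipeline.
# A live fetch would replace printf with: wget -qO- http://example.com
html='<html><head><title>Example Page</title></head><body></body></html>'
printf '%s\n' "$html" | sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
# prints: Example Page
```

This sort of one-liner is fine for quick scraping, though for anything beyond trivial extraction a real HTML parser is less fragile.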