Original question: http://stackoverflow.com/questions/2616011/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Easy way to parse a url in C++ cross platform?
Asked by Andrew Bucknell
I need to parse a URL to get the protocol, host, path, and query in an application I am writing in C++. The application is intended to be cross-platform. I'm surprised I can't find anything that does this in the Boost or POCO libraries. Is it somewhere obvious that I'm not looking? Any suggestions on appropriate open source libraries? Or is this something I just have to do myself? It's not super complicated, but it seems like such a common task that I am surprised there isn't a common solution.
Accepted answer by Dean Michael
There is a library that has been proposed for inclusion in Boost and lets you parse HTTP URIs easily. It uses Boost.Spirit and is released under the Boost Software License. The library is cpp-netlib; its documentation is at http://cpp-netlib.github.com/ and you can download the latest release from http://github.com/cpp-netlib/cpp-netlib/downloads.
The relevant type you'll want to use is boost::network::http::uri, which is covered in the cpp-netlib documentation.
Answer by Tom
A wstring version of the above, with the other fields I needed added. It could definitely be refined, but it is good enough for my purposes.
#include <string>
#include <algorithm> // find
struct Uri
{
public:
    std::wstring QueryString, Path, Protocol, Host, Port;

    static Uri Parse(const std::wstring &uri)
    {
        Uri result;
        typedef std::wstring::const_iterator iterator_t;
        if (uri.empty())
            return result;
        iterator_t uriEnd = uri.end();
        // get query start
        iterator_t queryStart = std::find(uri.begin(), uriEnd, L'?');
        // protocol: only look before the query, so a ':' inside the query is ignored
        iterator_t protocolStart = uri.begin();
        iterator_t protocolEnd = std::find(protocolStart, queryStart, L':');
        if (protocolEnd != queryStart)
        {
            std::wstring prot(protocolEnd, uriEnd);
            if ((prot.length() > 3) && (prot.substr(0, 3) == L"://"))
            {
                result.Protocol = std::wstring(protocolStart, protocolEnd);
                protocolEnd += 3; // skip "://"
            }
            else
                protocolEnd = uri.begin(); // no protocol
        }
        else
            protocolEnd = uri.begin(); // no protocol
        // host
        iterator_t hostStart = protocolEnd;
        // path: only look before the query, so a '/' inside the query is ignored;
        // pathStart == queryStart means there is no path
        iterator_t pathStart = std::find(hostStart, queryStart, L'/');
        iterator_t hostEnd = std::find(hostStart, pathStart, L':'); // check for port
        result.Host = std::wstring(hostStart, hostEnd);
        // port
        if (hostEnd != pathStart) // we have a port
        {
            ++hostEnd; // skip ':'
            result.Port = std::wstring(hostEnd, pathStart);
        }
        // path
        if (pathStart != queryStart)
            result.Path = std::wstring(pathStart, queryStart);
        // query (includes the leading '?')
        if (queryStart != uriEnd)
            result.QueryString = std::wstring(queryStart, uriEnd);
        return result;
    } // Parse
}; // Uri
Tests/Usage
Uri u0 = Uri::Parse(L"http://localhost:80/foo.html?&q=1:2:3");
Uri u1 = Uri::Parse(L"https://localhost:80/foo.html?&q=1");
Uri u2 = Uri::Parse(L"localhost/foo");
Uri u3 = Uri::Parse(L"https://localhost/foo");
Uri u4 = Uri::Parse(L"localhost:8080");
Uri u5 = Uri::Parse(L"localhost?&foo=1");
Uri u6 = Uri::Parse(L"localhost?&foo=1:2:3");
// then read u0.QueryString, u0.Path, u0.Protocol, u0.Host, u0.Port, ...
Answer by wilhelmtell
Terribly sorry, couldn't help it. :s
url.hh
#ifndef URL_HH_
#define URL_HH_
#include <string>
struct url {
    url(const std::string& url_s); // omitted copy, ==, accessors, ...
private:
    void parse(const std::string& url_s);
    std::string protocol_, host_, path_, query_;
};
#endif /* URL_HH_ */
url.cc
#include "url.hh"
#include <string>
#include <algorithm>
#include <cctype>
#include <iterator> // back_inserter, distance, advance
using namespace std;
// ctors, copy, equality, ...
namespace {
// tolower() needs a value representable as unsigned char, so cast first.
// (std::ptr_fun, used originally, was removed in C++17.)
char lower(char c)
{
    return static_cast<char>(tolower(static_cast<unsigned char>(c)));
}
}
void url::parse(const string& url_s)
{
    const string prot_end("://");
    string::const_iterator prot_i = search(url_s.begin(), url_s.end(),
                                           prot_end.begin(), prot_end.end());
    protocol_.reserve(distance(url_s.begin(), prot_i));
    transform(url_s.begin(), prot_i,
              back_inserter(protocol_),
              lower); // protocol is icase
    if( prot_i == url_s.end() )
        return;
    advance(prot_i, prot_end.length());
    string::const_iterator path_i = find(prot_i, url_s.end(), '/');
    host_.reserve(distance(prot_i, path_i));
    transform(prot_i, path_i,
              back_inserter(host_),
              lower); // host is icase
    string::const_iterator query_i = find(path_i, url_s.end(), '?');
    path_.assign(path_i, query_i);
    if( query_i != url_s.end() )
        ++query_i; // skip '?'
    query_.assign(query_i, url_s.end());
}
main.cc
// ...
url u("HTTP://stackoverflow.com/questions/2616011/parse-a.py?url=1");
cout << u.protocol() << '\t' << u.host() << ...
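The same splitting approach can also be sketched as a free function that returns a plain struct, which avoids the omitted constructor and accessors. This is only an illustration under stated assumptions: ParsedUrl, to_lower_ascii and parse_url are hypothetical names that do not appear in the answer, the sketch leaves every field empty when no "://" is found, and it also ends the host at a '?' so that a URL without a path still yields a query. It does not rely on std::ptr_fun, which C++17 removed.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical plain struct standing in for the answer's `url` class.
struct ParsedUrl {
    std::string protocol, host, path, query;
};

// std::tolower requires a value representable as unsigned char, so cast first.
inline char to_lower_ascii(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Splits "protocol://host/path?query". The host keeps any ":port" suffix,
// and a "#fragment", if present, stays attached to the path or query.
inline ParsedUrl parse_url(const std::string& s) {
    ParsedUrl out;
    const std::string sep("://");
    auto prot_end = std::search(s.begin(), s.end(), sep.begin(), sep.end());
    if (prot_end == s.end())
        return out;  // assumption: input without "://" yields empty fields

    // Protocol and host are case-insensitive, so store them lower-cased.
    std::transform(s.begin(), prot_end, std::back_inserter(out.protocol),
                   to_lower_ascii);

    auto host_begin = prot_end + static_cast<std::ptrdiff_t>(sep.size());
    const std::string stops("/?");
    auto host_end = std::find_first_of(host_begin, s.end(),
                                       stops.begin(), stops.end());
    std::transform(host_begin, host_end, std::back_inserter(out.host),
                   to_lower_ascii);

    auto query_begin = std::find(host_end, s.end(), '?');
    out.path.assign(host_end, query_begin);
    if (query_begin != s.end())
        ++query_begin;  // skip '?'
    out.query.assign(query_begin, s.end());
    return out;
}
```

Calling parse_url with the URL from main.cc should give "http" as the protocol and "url=1" as the query; the path and query keep their original case.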
Answer by Elliot Cameron
For completeness, there is one written in C that you could use (with a little wrapping, no doubt): http://uriparser.sourceforge.net/
[RFC-compliant and supports Unicode]
Here's a very basic wrapper I've been using for simply grabbing the results of a parse.
#include <string>
#include <uriparser/Uri.h>
namespace uriparser
{
    class Uri
    {
    public:
        Uri(std::string uri)
            : uri_(uri)
        {
            UriParserStateA state_;
            state_.uri = &uriParse_;
            isValid_ = uriParseUriA(&state_, uri_.c_str()) == URI_SUCCESS;
        }
        ~Uri() { uriFreeUriMembersA(&uriParse_); }
        // Non-copyable: a copy would point into the other object's string
        // and free the same members twice.
        Uri(const Uri &) = delete;
        Uri & operator=(const Uri &) = delete;
        bool isValid() const { return isValid_; }
        std::string scheme() const { return fromRange(uriParse_.scheme); }
        std::string host() const { return fromRange(uriParse_.hostText); }
        std::string port() const { return fromRange(uriParse_.portText); }
        std::string path() const { return fromList(uriParse_.pathHead, "/"); }
        std::string query() const { return fromRange(uriParse_.query); }
        std::string fragment() const { return fromRange(uriParse_.fragment); }
    private:
        std::string uri_;      // owns the text that the parsed ranges point into
        UriUriA uriParse_;
        bool isValid_;
        std::string fromRange(const UriTextRangeA & rng) const
        {
            return std::string(rng.first, rng.afterLast);
        }
        // Joins the path segments, prefixing each one with delim
        std::string fromList(UriPathSegmentA * xs, const std::string & delim) const
        {
            UriPathSegmentStructA * head(xs);
            std::string accum;
            while (head)
            {
                accum += delim + fromRange(head->text);
                head = head->next;
            }
            return accum;
        }
    };
}
Answer by Michael Mc Donnell
POCO's URI class can parse URLs for you. The following example is a shortened version of the one in the POCO URI and UUID slides:
#include "Poco/URI.h"
#include <iostream>
int main(int argc, char** argv)
{
    Poco::URI uri1("http://www.appinf.com:88/sample?example-query#frag");
    std::string scheme(uri1.getScheme());   // "http"
    std::string auth(uri1.getAuthority());  // "www.appinf.com:88"
    std::string host(uri1.getHost());       // "www.appinf.com"
    unsigned short port = uri1.getPort();   // 88
    std::string path(uri1.getPath());       // "/sample"
    std::string query(uri1.getQuery());     // "example-query"
    std::string frag(uri1.getFragment());   // "frag"
    std::string pathEtc(uri1.getPathEtc()); // "/sample?example-query#frag"
    return 0;
}
Answer by velcrow
//sudo apt-get install libboost-all-dev; #install boost
//g++ urlregex.cpp -lboost_regex; #compile
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char* argv[])
{
    string url="https://www.google.com:443/webhp?gws_rd=ssl#q=cpp";
    // "\\x3f" passes the escape \x3f (a literal '?') to the regex engine.
    // With a single backslash the compiler emits a bare '?', the pattern
    // contains "??" (a lazy optional), and the path lands in the query group.
    boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
    boost::cmatch what;
    if(regex_match(url.c_str(), what, ex))
    {
        cout << "protocol: " << string(what[1].first, what[1].second) << endl;
        cout << "domain: " << string(what[2].first, what[2].second) << endl;
        cout << "port: " << string(what[3].first, what[3].second) << endl;
        cout << "path: " << string(what[4].first, what[4].second) << endl;
        cout << "query: " << string(what[5].first, what[5].second) << endl;
        cout << "fragment: " << string(what[6].first, what[6].second) << endl;
    }
    return 0;
}
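If a Boost dependency is not wanted, a similar regex-based split can be sketched with std::regex from C++11. The pattern below is the one given in RFC 3986, Appendix B, which accepts any scheme rather than only http and https. UriParts and split_uri are hypothetical names chosen for this sketch, and the authority is returned whole, so a port such as ":443" stays attached to the host.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct UriParts {
    std::string scheme, authority, path, query, fragment;
};

// Splits a URI reference into its five generic components using the
// regular expression from RFC 3986, Appendix B. Missing components
// come back as empty strings. Returns false if the match fails.
inline bool split_uri(const std::string& uri, UriParts& out) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    if (!std::regex_match(uri, m, re))
        return false;
    out.scheme    = m[2].str();  // text before the first ':'
    out.authority = m[4].str();  // host, with any userinfo and port
    out.path      = m[5].str();
    out.query     = m[7].str();  // without the leading '?'
    out.fragment  = m[9].str();  // without the leading '#'
    return true;
}
```

This pattern splits but does not validate: it accepts almost any single-line string, so checks on the individual components would still have to be layered on top.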
Answer by Tom Makin
The Poco library now has a class for dissecting URIs and returning the host, path segments, query string, and so on.
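To make "path segments" and "query string" concrete, here is a rough standard-library sketch of that kind of dissection. It is not Poco's API: split_nonempty and split_query are hypothetical helpers written for this illustration, and they do no percent-decoding.

```cpp
#include <string>
#include <utility>
#include <vector>

// Splits `text` on `delim`, skipping empty pieces,
// so "/a//b/" with '/' yields {"a", "b"}.
inline std::vector<std::string> split_nonempty(const std::string& text,
                                               char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start <= text.size()) {
        std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos)
            end = text.size();
        if (end > start)
            parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}

// Splits a query such as "a=1&b=2" into (key, value) pairs.
// A key without '=' gets an empty value.
inline std::vector<std::pair<std::string, std::string>>
split_query(const std::string& query) {
    std::vector<std::pair<std::string, std::string>> params;
    for (const std::string& item : split_nonempty(query, '&')) {
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos)
            params.emplace_back(item, std::string());
        else
            params.emplace_back(item.substr(0, eq), item.substr(eq + 1));
    }
    return params;
}
```

A library class is still the better choice when encoded characters, default ports, or relative references matter, since this sketch handles none of them.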
Answer by Sun
Facebook's Folly library can do the job for you easily. Simply use the Uri class:
#include <folly/Uri.h>
int main() {
    // named uri rather than folly, so the variable does not hide the namespace
    folly::Uri uri("https://code.facebook.com/posts/177011135812493/");
    uri.scheme(); // https
    uri.host();   // code.facebook.com
    uri.path();   // /posts/177011135812493/
}
Answer by Sergey K.
This library is very tiny and lightweight: https://github.com/corporateshark/LUrlParser
However, it is parsing only, no URL normalization/validation.
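When a parser only splits the string, any validation has to happen afterwards in your own code. As one small illustration, a port string can be checked like this; parse_port is a hypothetical helper for this sketch and is not part of LUrlParser.

```cpp
#include <cctype>
#include <string>

// Returns true and stores the number in `port` if `text` consists of
// 1 to 5 decimal digits forming a value in [0, 65535].
// On failure `port` is left unchanged.
inline bool parse_port(const std::string& text, unsigned& port) {
    if (text.empty() || text.size() > 5)
        return false;
    unsigned value = 0;
    for (char c : text) {
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;
        value = value * 10 + static_cast<unsigned>(c - '0');
    }
    if (value > 65535)
        return false;
    port = value;
    return true;
}
```

The length check keeps the accumulated value below 100000, so the arithmetic cannot overflow before the range test runs.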
Answer by Ralf
Also of interest could be http://code.google.com/p/uri-grammar/ which, like Dean Michael's cpp-netlib, uses Boost.Spirit to parse a URI. I came across it in the question "Simple expression parser example using Boost::Spirit?".