在 C++ 中编码/解码 URL

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/154536/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 13:19:18  来源:igfitidea点击:

Encode/Decode URLs in C++

c++urlencodeurldecodepercent-encoding

提问by user126593

Does anyone know of any good C++ code that does this?

有谁知道这样做的任何好的 C++ 代码?

回答by xperroni

I faced the encoding half of this problem the other day. Unhappy with the available options, and after taking a look at this C sample code, i decided to roll my own C++ url-encode function:

前几天我遇到了这个问题的编码一半。对可用选项不满意,在查看了这个 C 示例代码后,我决定推出自己的 C++ url-encode 函数:

#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

using namespace std;

string url_encode(const string &value) {
    ostringstream escaped;
    escaped.fill('0');
    escaped << hex;

    for (string::const_iterator i = value.begin(), n = value.end(); i != n; ++i) {
        string::value_type c = (*i);

        // Keep alphanumeric and other accepted characters intact
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << c;
            continue;
        }

        // Any other characters are percent-encoded
        escaped << uppercase;
        escaped << '%' << setw(2) << int((unsigned char) c);
        escaped << nouppercase;
    }

    return escaped.str();
}

The implementation of the decode function is left as an exercise to the reader. :P

decode 函数的实现留给读者作为练习。:P

回答by user126593

Answering my own question...

回答我自己的问题...

libcurl has curl_easy_escapefor encoding.

libcurl 有curl_easy_escape用于编码。

For decoding, curl_easy_unescape

解码,curl_easy_unescape

回答by user126593

string urlDecode(string &SRC) {
    string ret;
    char ch;
    int i, ii;
    for (i=0; i<SRC.length(); i++) {
        if (int(SRC[i])==37) {
            sscanf(SRC.substr(i+1,2).c_str(), "%x", &ii);
            ch=static_cast<char>(ii);
            ret+=ch;
            i=i+2;
        } else {
            ret+=SRC[i];
        }
    }
    return (ret);
}

not the best, but working fine ;-)

不是最好的,但工作正常;-)

回答by Yuriy Petrovskiy

cpp-netlibhas functions

cpp-netlib有函数

namespace boost {
  namespace network {
    namespace uri {    
      inline std::string decoded(const std::string &input);
      inline std::string encoded(const std::string &input);
    }
  }
}

they allow to encode and decode URL strings very easy.

它们允许非常容易地编码和解码 URL 字符串。

回答by tormuto

Ordinarily adding '%' to the int value of a char will not work when encoding, the value is supposed to the the hex equivalent. e.g '/' is '%2F' not '%47'.

通常在编码时将 '%' 添加到 char 的 int 值将不起作用,该值应该是等效的十六进制值。例如,“/”是“%2F”而不是“%47”。

I think this is the best and concise solutions for both url encoding and decoding (No much header dependencies).

我认为这是 url 编码和解码的最佳和简洁的解决方案(没有太多的头依赖)。

string urlEncode(string str){
    string new_str = "";
    char c;
    int ic;
    const char* chars = str.c_str();
    char bufHex[10];
    int len = strlen(chars);

    for(int i=0;i<len;i++){
        c = chars[i];
        ic = c;
        // uncomment this if you want to encode spaces with +
        /*if (c==' ') new_str += '+';   
        else */if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') new_str += c;
        else {
            sprintf(bufHex,"%X",c);
            if(ic < 16) 
                new_str += "%0"; 
            else
                new_str += "%";
            new_str += bufHex;
        }
    }
    return new_str;
 }

string urlDecode(string str){
    string ret;
    char ch;
    int i, ii, len = str.length();

    for (i=0; i < len; i++){
        if(str[i] != '%'){
            if(str[i] == '+')
                ret += ' ';
            else
                ret += str[i];
        }else{
            sscanf(str.substr(i + 1, 2).c_str(), "%x", &ii);
            ch = static_cast<char>(ii);
            ret += ch;
            i = i + 2;
        }
    }
    return ret;
}

回答by kreuzerkrieg

[Necromancer mode on]
Stumbled upon this question when was looking for fast, modern, platform independent and elegant solution. Didnt like any of above, cpp-netlib would be the winner but it has horrific memory vulnerability in "decoded" function. So I came up with boost's spirit qi/karma solution.

[死灵法师模式开启]
在寻找快速、现代、独立于平台且优雅的解决方案时偶然发现了这个问题。与上述任何一个不同,cpp-netlib 将是赢家,但它在“解码”功能中具有可怕的内存漏洞。所以我想出了boost的灵气/业力解决方案。

namespace bsq = boost::spirit::qi;
namespace bk = boost::spirit::karma;
bsq::int_parser<unsigned char, 16, 2, 2> hex_byte;
template <typename InputIterator>
struct unescaped_string
    : bsq::grammar<InputIterator, std::string(char const *)> {
  unescaped_string() : unescaped_string::base_type(unesc_str) {
    unesc_char.add("+", ' ');

    unesc_str = *(unesc_char | "%" >> hex_byte | bsq::char_);
  }

  bsq::rule<InputIterator, std::string(char const *)> unesc_str;
  bsq::symbols<char const, char const> unesc_char;
};

template <typename OutputIterator>
struct escaped_string : bk::grammar<OutputIterator, std::string(char const *)> {
  escaped_string() : escaped_string::base_type(esc_str) {

    esc_str = *(bk::char_("a-zA-Z0-9_.~-") | "%" << bk::right_align(2,0)[bk::hex]);
  }
  bk::rule<OutputIterator, std::string(char const *)> esc_str;
};

The usage of above as following:

以上用法如下:

std::string unescape(const std::string &input) {
  std::string retVal;
  retVal.reserve(input.size());
  typedef std::string::const_iterator iterator_type;

  char const *start = "";
  iterator_type beg = input.begin();
  iterator_type end = input.end();
  unescaped_string<iterator_type> p;

  if (!bsq::parse(beg, end, p(start), retVal))
    retVal = input;
  return retVal;
}

std::string escape(const std::string &input) {
  typedef std::back_insert_iterator<std::string> sink_type;
  std::string retVal;
  retVal.reserve(input.size() * 3);
  sink_type sink(retVal);
  char const *start = "";

  escaped_string<sink_type> g;
  if (!bk::generate(sink, g(start), input))
    retVal = input;
  return retVal;
}

[Necromancer mode off]

[死灵法师模式关闭]

EDIT01: fixed the zero padding stuff - special thanks to Hartmut Kaiser
EDIT02: Live on CoLiRu

EDIT01:修复零填充的东西 - 特别感谢 Hartmut Kaiser
EDIT02:Live on CoLiRu

回答by kometen

Inspired by xperroni I wrote a decoder. Thank you for the pointer.

受 xperroni 的启发,我写了一个解码器。谢谢指点。

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

char from_hex(char ch) {
    return isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
}

string url_decode(string text) {
    char h;
    ostringstream escaped;
    escaped.fill('0');

    for (auto i = text.begin(), n = text.end(); i != n; ++i) {
        string::value_type c = (*i);

        if (c == '%') {
            if (i[1] && i[2]) {
                h = from_hex(i[1]) << 4 | from_hex(i[2]);
                escaped << h;
                i += 2;
            }
        } else if (c == '+') {
            escaped << ' ';
        } else {
            escaped << c;
        }
    }

    return escaped.str();
}

int main(int argc, char** argv) {
    string msg = "J%C3%B8rn!";
    cout << msg << endl;
    string decodemsg = url_decode(msg);
    cout << decodemsg << endl;

    return 0;
}

edit: Removed unneeded cctype and iomainip includes.

编辑:删除了不需要的 cctype 和 iomainip 包含。

回答by alanc10n

CGICCincludes methods to do url encode and decode. form_urlencode and form_urldecode

CGICC包括进行 url 编码和解码的方法。form_urlencode 和 form_urldecode

回答by Bagelzone Ha'bonè

Adding a follow-up to Bill's recommendation for using libcurl: great suggestion, and to be updated:
after 3 years, the curl_escapefunction is deprecated, so for future use it's better to use curl_easy_escape.

为 Bill 使用 libcurl 的建议添加后续内容:很好的建议,并进行更新:
3 年后,不推荐使用curl_escape函数,因此为了将来使用,最好使用curl_easy_escape

回答by moonlightdock

I ended up on this question when searching for an api to decode url in a win32 c++ app. Since the question doesn't quite specify platform assuming windows isn't a bad thing.

在 win32 c++ 应用程序中搜索用于解码 url 的 api 时,我最终遇到了这个问题。由于这个问题并没有完全指定平台,假设 windows 不是一件坏事。

InternetCanonicalizeUrl is the API for windows programs. More info here

InternetCanonicalizeUrl 是 Windows 程序的 API。更多信息在这里

        LPTSTR lpOutputBuffer = new TCHAR[1];
        DWORD dwSize = 1;
        BOOL fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
        DWORD dwError = ::GetLastError();
        if (!fRes && dwError == ERROR_INSUFFICIENT_BUFFER)
        {
            delete lpOutputBuffer;
            lpOutputBuffer = new TCHAR[dwSize];
            fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
            if (fRes)
            {
                //lpOutputBuffer has decoded url
            }
            else
            {
                //failed to decode
            }
            if (lpOutputBuffer !=NULL)
            {
                delete [] lpOutputBuffer;
                lpOutputBuffer = NULL;
            }
        }
        else
        {
            //some other error OR the input string url is just 1 char and was successfully decoded
        }

InternetCrackUrl (here) also seems to have flags to specify whether to decode url

InternetCrackUrl (这里) 似乎也有标志来指定是否解码 url