Simple JSON string escape for C++?

Question

Asked by ddinchev

I have a very simple program that outputs a simple JSON string, which I concatenate manually and write to the std::cout stream (the output really is that simple). However, some of my strings could contain double quotes, curly braces and other characters that could break the JSON string. So I need a library (or, more accurately, a function) that escapes strings according to the JSON standard, as lightweight as possible, nothing more, nothing less.

I found a few libraries that encode whole objects into JSON, but since my program is a single 900-line .cpp file, I would rather not rely on a library that is several times bigger than my program just to achieve something as simple as this.

Answer 1

Answered by vog

Caveat

Whichever solution you choose, keep in mind that the JSON standard requires you to escape all control characters. Overlooking this seems to be a common mistake; many developers get it wrong.

"All control characters" means everything from '\x00' to '\x1f', not just those with a short representation such as '\x0a' (also known as '\n'). For example, you must escape the '\x02' character as \u0002.

See also: ECMA-404 The JSON Data Interchange Format, Page 10
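
One way to check any escaping routine against this rule is to scan its output for raw control bytes. The following is only a minimal sketch; the helper name has_raw_control_char is hypothetical and is not part of the original answer.

```cpp
#include <string>

// Hypothetical test helper (not from the original answer): returns true if
// `s` still contains a raw byte in the range 0x00..0x1f, which JSON does not
// allow inside a string.
bool has_raw_control_char(const std::string &s) {
    for (char ch : s) {
        // Cast to unsigned char so bytes >= 0x80 (UTF-8 continuation and
        // lead bytes) are not mistaken for small or negative values.
        if (static_cast<unsigned char>(ch) <= 0x1f) return true;
    }
    return false;
}
```

Running the output of an escape function through such a check should always yield false, whatever the input was.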

Simple solution

If you know for sure that your input string is UTF-8 encoded, you can keep things simple.

Since JSON allows you to escape everything via \uXXXX, even " and \, a simple solution is:

#include <iomanip>
#include <sstream>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        // Escape the double quote, the backslash and every control character.
        if (*c == '"' || *c == '\\' || ('\x00' <= *c && *c <= '\x1f')) {
            o << "\\u"
              << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
        } else {
            o << *c;
        }
    }
    return o.str();
}

Shortest representation

For the shortest representation you may use JSON shortcuts, such as \" instead of \u0022. The following function produces the shortest JSON representation of a UTF-8 encoded string s:

#include <iomanip>
#include <sstream>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        switch (*c) {
        // Characters that have a two-character JSON shortcut.
        case '"': o << "\\\""; break;
        case '\\': o << "\\\\"; break;
        case '\b': o << "\\b"; break;
        case '\f': o << "\\f"; break;
        case '\n': o << "\\n"; break;
        case '\r': o << "\\r"; break;
        case '\t': o << "\\t"; break;
        default:
            // Remaining control characters need the \uXXXX form.
            if ('\x00' <= *c && *c <= '\x1f') {
                o << "\\u"
                  << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
            } else {
                o << *c;
            }
        }
    }
    return o.str();
}

Pure switch statement

It is also possible to get by with a pure switch statement, that is, without if and <iomanip>. While this is quite cumbersome, it may be preferable from a "security by simplicity and purity" point of view:

#include <sstream>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        switch (*c) {
        case '\x00': o << "\\u0000"; break;
        case '\x01': o << "\\u0001"; break;
        ...
        case '\x0a': o << "\\n"; break;
        ...
        case '\x1f': o << "\\u001f"; break;
        case '\x22': o << "\\\""; break;
        case '\x5c': o << "\\\\"; break;
        default: o << *c;
        }
    }
    return o.str();
}
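
A possible middle ground between the variants above, sketched here under the same UTF-8 assumption, is to build the \u00XX form from a small table of hex digits. This avoids both <iomanip> and the 32 separate cases. The function name escape_json_lookup is hypothetical and is not taken from the original answer.

```cpp
#include <string>

// Hypothetical variant (not from the original answer): same output as the
// "shortest representation" function, but without streams or <iomanip>.
std::string escape_json_lookup(const std::string &s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    out.reserve(s.size());
    for (char ch : s) {
        const unsigned char u = static_cast<unsigned char>(ch);
        switch (ch) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b"; break;
        case '\f': out += "\\f"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (u <= 0x1f) {
                // Control characters fit in one byte, so the first two
                // hex digits of \uXXXX are always "00".
                out += "\\u00";
                out += hex[u >> 4];
                out += hex[u & 0x0f];
            } else {
                out += ch;  // ordinary characters and UTF-8 bytes pass through
            }
        }
    }
    return out;
}
```

Appending to a std::string directly also sidesteps the sticky std::hex and std::setfill state that a stream would carry along.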

Using a library

You might want to have a look at https://github.com/nlohmann/json, which is an efficient header-only C++ library (MIT License) that seems to be very well-tested.

You can either call their escape_string() method directly, or you can take their implementation of escape_string() as a starting point for your own implementation:

https://github.com/nlohmann/json/blob/ec7a1d834773f9fee90d8ae908a0c9933c5646fc/src/json.hpp#L4604-L4697

Answer 2

Answered by mariolpantunes

I have written simple JSON escape and unescape functions. The code is publicly available on GitHub. For anyone interested, here is the code:

#include <string>

enum State {ESCAPED, UNESCAPED};

// Note: only characters with a two-character shortcut are escaped here;
// other control characters (for example '\x02') are copied through unchanged.
std::string escapeJSON(const std::string& input)
{
    std::string output;
    output.reserve(input.length());

    for (std::string::size_type i = 0; i < input.length(); ++i)
    {
        switch (input[i]) {
            case '"':
                output += "\\\"";
                break;
            case '/':
                output += "\\/";
                break;
            case '\b':
                output += "\\b";
                break;
            case '\f':
                output += "\\f";
                break;
            case '\n':
                output += "\\n";
                break;
            case '\r':
                output += "\\r";
                break;
            case '\t':
                output += "\\t";
                break;
            case '\\':
                output += "\\\\";
                break;
            default:
                output += input[i];
                break;
        }

    }

    return output;
}

std::string unescapeJSON(const std::string& input)
{
    State s = UNESCAPED;
    std::string output;
    output.reserve(input.length());

    for (std::string::size_type i = 0; i < input.length(); ++i)
    {
        switch(s)
        {
            case ESCAPED:
                {
                    // The previous character was a backslash.
                    switch(input[i])
                    {
                        case '"':
                            output += '\"';
                            break;
                        case '/':
                            output += '/';
                            break;
                        case 'b':
                            output += '\b';
                            break;
                        case 'f':
                            output += '\f';
                            break;
                        case 'n':
                            output += '\n';
                            break;
                        case 'r':
                            output += '\r';
                            break;
                        case 't':
                            output += '\t';
                            break;
                        case '\\':
                            output += '\\';
                            break;
                        default:
                            // Unknown escapes, including \uXXXX, are not decoded.
                            output += input[i];
                            break;
                    }

                    s = UNESCAPED;
                    break;
                }
            case UNESCAPED:
                {
                    switch(input[i])
                    {
                        case '\\':
                            s = ESCAPED;
                            break;
                        default:
                            output += input[i];
                            break;
                    }
                }
        }
    }
    return output;
}
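
The unescapeJSON function above does not decode \uXXXX sequences; its default branch simply copies the character after the backslash, so \u0041 comes out as u0041. The following is a rough sketch of the two building blocks such decoding would need. Both helper names (parse_hex4, append_utf8_bmp) are hypothetical, and the sketch assumes code points in the Basic Multilingual Plane only; surrogate pairs are not combined.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: parses exactly four hex digits starting at `pos`.
// Returns false if fewer than four characters remain or one is not a hex digit.
bool parse_hex4(const std::string &in, std::size_t pos, unsigned &value) {
    if (pos + 4 > in.size()) return false;
    value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const char c = in[pos + i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<unsigned>(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = static_cast<unsigned>(c - 'A') + 10;
        else return false;
        value = value * 16 + digit;
    }
    return true;
}

// Hypothetical helper: appends the UTF-8 encoding of a code point below
// 0x10000. Assumption: the caller does not pass surrogate halves
// (0xD800..0xDFFF); combining surrogate pairs is left out of this sketch.
void append_utf8_bmp(std::string &out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

In the ESCAPED state, a 'u' case could call parse_hex4 on the four characters that follow, append the result with append_utf8_bmp, and advance the index by four.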

Answer 3

Answered by FeRD

You didn't say exactly where the strings you're cobbling together originally come from, so this may not be of any use. But if they all happen to live in the code, as @isnullxbh mentioned in a comment to an answer on a different question, another option is to leverage a lovely C++11 feature: raw string literals.

I won't quote cppreference's long-winded, standards-based explanation; you can read it yourself there. Basically, though, R-strings bring to C++ the same sort of programmer-delimited literals, with practically no restrictions on content, that you get from here-docs in the shell, and which languages like Perl use so effectively. (Prefixed quoting using curly braces, as below, may be Perl's single greatest invention.)

my $qstring = q{Quoted 'string'!};
my $qqstring = qq{Double "quoted" 'string'!};
my $replacedstring = q{Regexps that /totally/! get eaten by your parser.};
$replacedstring =~ s{/totally/!}{(won't!)};
# Heh. I see the syntax highlighter isn't quite up to the challenge, though.

In C++11 or later, a raw string literal is prefixed with a capital R before the opening double quote. Inside the quotes, the content is preceded by an optional free-form delimiter (zero to 16 characters) followed by an opening paren.

From there on, you can safely write literally anything other than a closing paren followed by your chosen delimiter. That sequence (followed by a closing double quote) terminates the raw literal, and then you have a string literal, usable to initialize a std::string, that you can confidently trust will remain unmolested by any escape processing.

"Raw"-ness is not lost in subsequent manipulations, either. So, borrowing from the chapter list for Crockford's How JavaScript Works, this is completely valid:

#include <iostream>
#include <string>

int main() {
    std::string ch0_to_4 = R"json(
[
    {"number": 0, "chapter": "Read Me First!"},
    {"number": 1, "chapter": "How Names Work"},
    {"number": 2, "chapter": "How Numbers Work"},
    {"number": 3, "chapter": "How Big Integers Work"},
    {"number": 4, "chapter": "How Big Floating Point Works"},)json";

    std::string ch5_and_6 = R"json(
    {"number": 5, "chapter": "How Big Rationals Work"},
    {"number": 6, "chapter": "How Booleans Work"})json";

    // Raw and ordinary literals can be concatenated freely.
    std::string chapters = ch0_to_4 + ch5_and_6 + "\n]";
    std::cout << chapters;
}

The string 'chapters' will emerge from std::cout completely intact (preceded by one empty line, because the first raw literal begins with a newline):

[
    {"number": 0, "chapter": "Read Me First!"},
    {"number": 1, "chapter": "How Names Work"},
    {"number": 2, "chapter": "How Numbers Work"},
    {"number": 3, "chapter": "How Big Integers Work"},
    {"number": 4, "chapter": "How Big Floating Point Works"},
    {"number": 5, "chapter": "How Big Rationals Work"},
    {"number": 6, "chapter": "How Booleans Work"}
]
