Ruby: remove backslash (escape character) from a string

Question

Asked by Huy

I am trying to work on my own JSON parser. I have an input string that I want to tokenize:

input = "{ \"foo\": \"bar\", \"num\": 3}"

How do I remove the escape character \ so that it is not a part of my tokens?

Currently, my solution using delete works:

tokens = input.delete('\\"').split("")

=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]

However, when I try to use gsub, it fails to find any \".

tokens = input.gsub('\\"', '').split("")

=> ["{", " ", "\"", "f", "o", "o", "\"", ":", " ", "\"", "b", "a", "r", "\"", ",", " ", "\"", "n", "u", "m", "\"", ":", " ", "3", "}"]

I have two questions:

1. Why does gsub not work in this case?

2. How do I remove the backslash (escape) character? I currently have to remove the backslash character together with the quotes to make this work.

Answer 1

Answered by Arie Xiao

When you write:

input = "{ \"foo\": \"bar\", \"num\": 3}"

The actual string stored in input is:

{ "foo": "bar", "num": 3}

The escape \" here is interpreted by the Ruby parser, so that it can distinguish between the boundaries of the string (the leftmost and rightmost ") and a normal " character inside the string (the escaped ones).

String#delete deletes the character set specified by its first parameter, rather than a pattern. All characters that are in the first parameter will be removed. So by writing

input.delete('\"')

you ask delete to treat the argument as a set of individual characters to remove, not as the two-character sequence \". (In fact, a backslash inside a delete selector escapes the character after it, so this particular call removes only the " characters.) This is wrong for your case, and it may cause unexpected behavior later.
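
The set-versus-sequence difference is easier to see with an argument that has no backslash in it. The following is a minimal sketch using the question's input; the selector '":' and the variable names are chosen here purely for illustration.

```ruby
# The string from the question; its content is { "foo": "bar", "num": 3}
input = "{ \"foo\": \"bar\", \"num\": 3}"

# delete: the argument is a character set, so every " and every : is removed.
as_set = input.delete('":')

# gsub with a String pattern: only the exact two-character sequence ": is removed.
as_sequence = input.gsub('":', '')

puts as_set       # { foo bar, num 3}
puts as_sequence  # { "foo "bar", "num 3}
```

The quotes around bar survive the gsub call because they are followed by a comma, not a colon.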

String#gsub, however, substitutes a pattern (either a regular expression or a plain string).

input.gsub('\"', '')

means find every \" (two characters in sequence) and replace it with an empty string. Since there is no \ in input, nothing gets replaced. What you need is actually:

input.gsub('"', '')
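
As a quick check of the explanation above, this sketch confirms that the string holds no backslash, that the two-character pattern never matches, and that removing the bare quote gives the tokens from the question. The variable names are illustrative only.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# The backslashes exist only in the source literal, not in the string itself.
has_backslash = input.include?("\\")

# '\"' is backslash + quote (two characters); it occurs nowhere in input.
unchanged = input.gsub('\"', '')

# '"' is the single quote character, which is what is actually in the string.
stripped = input.gsub('"', '')
tokens = stripped.split("")

puts has_backslash       # false
puts unchanged == input  # true
puts stripped            # { foo: bar, num: 3}
```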

Answer 2

Answered by Amadan

You do not have backslashes in your string. You have quotes in your string, which need to be escaped when placed in a double-quoted string. Look:

input = "{ \"foo\": \"bar\", \"num\": 3}"
puts input
# => { "foo": "bar", "num": 3}

You are removing phantoms.

input.delete('\"')

will delete any characters in its argument. Thus, you delete any non-existent backslashes, and also delete all quotes. Without quotes, the default display method (inspect) will not need to escape anything.

input.gsub('\"', '')

will try to delete the sequence \", which does not exist, so gsub ends up doing nothing.

Make sure you know the difference between the string representation (puts input.inspect) and the string content (puts input), and note that the backslashes are artifacts of the representation.
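
To make the representation/content distinction concrete, here is a small sketch that counts backslashes in both forms. The variable names are illustrative and not part of the original answer.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

puts input          # content:        { "foo": "bar", "num": 3}
puts input.inspect  # representation: "{ \"foo\": \"bar\", \"num\": 3}"

# Count backslash characters one by one in each form.
content_backslashes = input.each_char.count("\\")
repr_backslashes    = input.inspect.each_char.count("\\")

puts content_backslashes  # 0
puts repr_backslashes     # 6, one per escaped quote
```

irb displays return values with inspect, which is why the quotes show up as \" in the console even though the string itself contains none.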

That said, I have to echo emaillenin: writing a correct JSON parser is not simple, and you can't do it with regular expressions (or at least, not with regular regular expressions; it might be possible with Oniguruma). It needs a proper parser like treetop or rex/racc, since it has a lot of corner cases that are easy to miss (chief among them being, ironically, escaped characters).
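
For the tokenizing step alone, one possible sketch uses StringScanner from Ruby's standard strscan library. The tokenize method below is a hypothetical helper, not something from the answers. It only splits the text into tokens and keeps each string literal, escaped quotes included, as a single token. It does not check nesting or decode escapes, so it is not a parser; for real work the standard json library's JSON.parse is the safer choice.

```ruby
require "strscan"

# Hypothetical helper: split JSON text into tokens without validating structure.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      # Whitespace between tokens is dropped.
    elsif (tok = scanner.scan(/"(?:[^"\\]|\\.)*"/))
      tokens << tok  # string literal; a backslash escapes the next character
    elsif (tok = scanner.scan(/-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/))
      tokens << tok  # number
    elsif (tok = scanner.scan(/true|false|null/))
      tokens << tok  # keyword literal
    elsif (tok = scanner.scan(/[{}\[\]:,]/))
      tokens << tok  # structural character
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

input = "{ \"foo\": \"bar\", \"num\": 3}"
tokens = tokenize(input)
p tokens
# => ["{", "\"foo\"", ":", "\"bar\"", ",", "\"num\"", ":", "3", "}"]
```

Keeping the quotes on string tokens lets a later parsing stage tell the string "3" apart from the number 3.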

Answer 3

Answered by Lenin Raj Rajasekaran

Use a regex pattern:

> input = "{ \"foo\": \"bar\", \"num\": 3}"
> input.gsub(/"/, '').split("")
=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]

That is actually just a double quote. The backslash is only there to escape it.

Answer 4

Answered by Dan

input.gsub(/[\"]/,"") will also work.
