Python: unable to parse TAB in JSON files
Original question: http://stackoverflow.com/questions/19799006/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Unable to parse TAB in JSON files
Asked by Josh
I am running into a parsing problem when loading JSON files that seem to have the TAB character in them.
When I go to http://jsonlint.com/, and I enter the part with the TAB character:
{
"My_String": "Foo bar. Bar foo."
}
The validator complains with:
Parse error on line 2:
{ "My_String": "Foo bar. Bar foo."
------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['
This is literally a copy/paste of the offending JSON text.
I have tried loading this file with json and simplejson without success. How can I load this properly? Should I just pre-process the file and replace TAB by \t or by a space? Or is there anything that I am missing here?
Update:
Here is also a problematic example in simplejson:
foo = '{"My_string": "Foo bar.\t Bar foo."}'
simplejson.loads(foo)
JSONDecodeError: Invalid control character '\t' at: line 1 column 24 (char 23)
Accepted answer by jfs
From the JSON standard:
Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
It means that a literal tab character is not allowed inside a JSON string. You need to escape it as \t (in a .json file):
{"My_string": "Foo bar.\t Bar foo."}
In addition, if the JSON text is provided inside a Python string literal, then you need to double-escape the tab:
foo = '{"My_string": "Foo bar.\\t Bar foo."}' # in a Python source
Or use a Python raw string literal:
foo = r'{"My_string": "Foo bar.\t Bar foo."}' # in a Python source
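As a minimal sketch of the point above in Python 3: the snippet below compares a literal tab with an escaped one. It also tries strict=False, an option of the standard json module that the answers here do not mention, which lets the decoder accept control characters inside strings. The variable names are illustrative only.

```python
import json

# Python turns \t into a real tab before the JSON parser sees the text,
# so the default (strict) decoder rejects it.
bad = '{"My_string": "Foo bar.\t Bar foo."}'
try:
    json.loads(bad)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# Raw string: the parser receives backslash + 't' and decodes it to a tab.
good = r'{"My_string": "Foo bar.\t Bar foo."}'
value = json.loads(good)["My_string"]

# strict=False tells the decoder to tolerate control characters in strings.
lenient = json.loads(bad, strict=False)["My_string"]
```

For a file on disk, json.load(f, strict=False) takes the same option. Blindly replacing every tab in a file with \t is riskier, because it would also rewrite tabs used as indentation between tokens, where a backslash escape is not valid JSON.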
Answer by Mark Reed
Tabs are legal as delimiting whitespace outside of values, but not within strings. Use \t instead.
EDIT: Based on your comments, I see some confusion about what a tab actually is. The tab character is just a normal character, like 'a' or '5' or '.' or any other character that you enter by pressing a key on your keyboard. It takes up a single byte, whose numeric value is 9. There are no backslashes or lowercase 't's involved.
What puts tab in a different category from 'a' or '5' or '.' is the fact that you, as a human using your eyeballs, generally can't look at a display of text and identify or count tab characters. Visually, a sequence of tabs is identical to a sequence of (a usually larger but still visually indeterminate number of) spaces.
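To make this concrete, here is a small Python sketch (the sample text is made up for illustration) of ways to inspect tabs that the eye cannot pick out:

```python
tab = "\t"
text = "Foo bar.\t Bar foo."

code_point = ord(tab)          # numeric value of the tab character
visible = repr(text)           # repr() renders the tab as the two characters \t
tab_count = text.count("\t")   # count tabs without relying on eyesight
```

Printing visible shows the escape sequence rather than a run of blank space, which is usually the quickest way to check whether a string really contains a tab.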
In order to unambiguously represent tabs inside text meant for computer processing, we have various syntactic methods to say "Hey, some piece of software! Replace this junk with a tab character later, OK?".
In the history of programming languages there have been two main approaches; if you go back to the 1950s, you get both approaches existing side by side, one in each of two of the oldest high-level languages. Lisp had named character literals like #\Tab; these were converted as soon as they were read from the program source. Fortran only had the CHAR function, which was called at runtime and returned the character whose number matched the argument: CHAR(9) returned a tab. (Of course, if it were really CHAR(9) and not CHAR(some expression that works out to 9), an optimizing compiler might notice that and replace the function call with a tab at compile time, putting us back over in the other camp.)
In general, with both solution types, if you wanted to stick the special character inside a larger string, you had to do the concatenation yourself; for instance, a kid hacking BASIC in the 80's might write something like this:
10 PRINT "This is a tab ->"; CHR$(9); "<- That was a tab"
But some languages, most notably the family that began with BCPL, introduced the ability to include these characters directly inside a string literal. In B, for example:
printf("This is a tab -> *t <- That was a tab");
B retained BCPL's * syntax, but the next language in the series, C, replaced it with the backslash, probably because its authors needed to read and write literal asterisks a lot more often than literal backslashes.
Anyway, a whole host of languages, including both Python and JavaScript, have borrowed or inherited C's conventions here. So in both languages, the two expressions "\t" and '\t' each result in a one-character string where that one character is a tab.
JSON is based on JavaScript's syntax, but it only allows a restricted subset of it. For example, strings have to be enclosed in double quotation marks (") instead of single ones ('), and literal tabs are not allowed inside them.
That means that this Python string from your update:
foo = '{"My_string": "Foo bar.\t Bar foo."}'
is not valid JSON. The Python interpreter turns the \t sequence into an actual tab character as soon as it reads the string, long before the JSON processor ever sees it.
You can tell Python to put a literal \t in the string instead of a tab character by doubling the backslash:
foo = '{"My_string": "Foo bar.\\t Bar foo."}'
Or you can use the "raw" string syntax, which doesn't interpret the special backslash sequences at all:
foo = r'{"My_string": "Foo bar.\t Bar foo."}'
Either way, the JSON processor will see a string containing a backslash followed by a 't', rather than a string containing a tab.
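The difference between the three spellings can be checked directly in Python 3. This is only an illustrative sketch; the key "k" and the short values are invented for the example:

```python
import json

plain = '{"k": "a\tb"}'      # Python converts \t to a real tab: invalid JSON
doubled = '{"k": "a\\tb"}'   # backslash + 't' reach the JSON parser
raw = r'{"k": "a\tb"}'       # raw string: same text as `doubled`

same_text = (doubled == raw)
has_real_tab = "\t" in plain
has_real_tab_in_raw = "\t" in raw
decoded = json.loads(raw)["k"]   # the JSON parser itself produces the tab
```

The plain string is one character shorter than the other two, because the two-character escape has already collapsed into a single tab by the time Python has read the literal.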
Answer by mdml
You can include tabs within values (instead of as whitespace) in JSON files by escaping them. Here's a working example with the json module in Python 2.7:
>>> import json
>>> obj = json.loads('{"MY_STRING": "Foo\\tBar"}')
>>> obj['MY_STRING']
u'Foo\tBar'
>>> print obj['MY_STRING']
Foo Bar
While not escaping the '\t' causes an error:
>>> json.loads('{"MY_STRING": "Foo\tBar"}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 19 (char 18)
Answer by Kemin Zhou
Just to share my experience:
I am using snakemake with a config file written in JSON. The JSON file uses tabs for indentation, which is legal. But I am getting this error message: snakemake.exceptions.WorkflowError: Config file is not valid JSON or YAML. I believe this is a bug in snakemake, but I could be wrong. Please comment. After replacing all tabs with spaces, the error message is gone.
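As a quick check that tab indentation is acceptable to a JSON parser, here is a sketch using Python 3's standard json module. The config content is hypothetical, and this says nothing about how snakemake parses its config files; it only shows that tabs between tokens are accepted.

```python
import json

# Tabs used purely as indentation between tokens (hypothetical config content).
config_text = '{\n\t"samples": [\n\t\t"a",\n\t\t"b"\n\t]\n}'
config = json.loads(config_text)

# Python 3 can also write tab-indented JSON by passing a string as indent.
round_trip = json.dumps(config, indent="\t")
```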
Answer by KARTHIKEYAN.A
In a Node-RED flow I faced the same type of problem:
flow.set("delimiter",'"\t"');
error:
{ "status": "ERROR", "result": "Cannot parse config: String: 1: in value for key 'delimiter': JSON does not allow unescaped tab in quoted strings, use a backslash escape" }
solution:
I just used \\t in the code:
flow.set("delimiter",'"\\t"');