C strtok() question: if tokens are delimited by delimiters, why is the last token between a delimiter and the null '\0'?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/16571060/

Date: 2020-09-02 06:20:57  Source: igfitidea

Tags: c, token, delimiter, strtok

Question by Rüppell's Vulture

In the following program, strtok() works as expected for the most part, but I can't comprehend the reason behind one finding. I have read this about strtok():

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.

Source: http://www.cplusplus.com/reference/cstring/strtok/

And as we know, strtok() places a \0 at the end of each token. But in the following program, the last delimiter is a dot (.), and after it comes Toad, between that dot and the closing quotation mark ("). The dot is a delimiter in my program, but there is no delimiter after Toad, not even a space (which is also a delimiter in my program). Please clear up the following confusion arising from this premise:

Why does strtok() consider Toad a token even though it is not between two delimiters? This is what I read about strtok() when it encounters a null character (\0):

Once the terminating null character of str has been found in a call to strtok, all subsequent calls to this function with a null pointer as the first argument return a null pointer.

Source: http://www.cplusplus.com/reference/cstring/strtok/

Nowhere does it say that once a null character is encountered, a pointer to the beginning of the token is returned. In fact, we don't even have a token here: no end of the token was found, because the scan that began at the start of the token (the 'T' of Toad) found no delimiter character, only a null character. So why does strtok() consider the part between the last delimiter and the closing quotation mark of the string argument a token? Please explain this.

Code:

#include <stdio.h>
#include <string.h>

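int main(void)
{
  /* strtok() modifies its argument, so it must be a writable array. */
  char str[] = " Falcon,eagle-hawk..;buzzard,gull..pigeon sparrow,hen;owl.Toad";
  /* The first call passes the string; later calls pass NULL to continue. */
  char *pch = strtok(str, " ;,.-");

  while (pch != NULL)
  {
    printf("%s\n", pch);
    pch = strtok(NULL, " ;,.-");
  }

  return 0;
}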

Output:

Falcon
eagle
hawk
buzzard
gull
pigeon
sparrow
hen
owl
Toad

Answer by Daniel Fischer

The standard's specification of strtok (C11 7.24.5.8) is pretty clear. In particular, paragraph 4 (emphasis added by me) is directly relevant to the question, if I understand it correctly:

3 The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.

In a call

char *where = strtok(string_or_NULL, delimiters);

the token (a pointer to which is) returned, if any, extends from the first non-delimiter character found from the starting position (inclusive) until the next delimiter character (exclusive), if one exists, or until the end of the string, if no later delimiter character exists.

The linked description doesn't explicitly mention the case of a token extending until the end of the string, as opposed to the standard, so it is incomplete in that respect.

Answer by Jonathan Leffler

The POSIX description of strtok() says:

char *strtok(char *restrict s1, const char *restrict s2);

A sequence of calls to strtok() breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a byte from the string pointed to by s2. The first call in the sequence has s1 as its first argument, and is followed by calls with a null pointer as their first argument. The separator string pointed to by s2 may be different from call to call.

The first call in the sequence searches the string pointed to by s1 for the first byte that is not contained in the current separator string pointed to by s2. If no such byte is found, then there are no tokens in the string pointed to by s1 and strtok() shall return a null pointer. If such a byte is found, it is the start of the first token.

The strtok() function then searches from there for a byte that is contained in the current separator string. If no such byte is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token shall return a null pointer. If such a byte is found, it is overwritten by a NUL character, which terminates the current token. The strtok() function saves a pointer to the following byte, from which the next search for a token shall start.

Note the second sentence of the third paragraph:

If no such byte is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token shall return a null pointer.

This clearly states that in the example in the question, Toad is indeed a token. One way to think of it is that the list of delimiters always includes the NUL '\0' at the end of the delimiter string.

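Having diagnosed that, note that strtok() is not a good function to use: it is neither thread-safe nor reentrant. On Windows, you can use strtok_s() instead; on Unix, you can usually use strtok_r(). These are better functions because they don't store the pointer at which the search is to resume internally; the caller passes it in as an extra argument. (Microsoft's strtok_s() takes the same three arguments as strtok_r(); the optional C11 Annex K strtok_s() has a different, four-argument signature.)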

Because strtok() is not reentrant, you cannot call a function that uses strtok() from inside a function that is itself in the middle of a strtok() sequence. Also, any library function that uses strtok() must be clearly documented as doing so, because it cannot be called from a function that is using strtok(). So using strtok() makes life hard.

The other problem with the strtok() family of functions (and with the related strsep()) is that they overwrite the delimiter; you can't find out what the delimiter was after the tokenizer has tokenized the string. This can matter in some applications, such as parsing shell command lines, where it matters whether the delimiter is a pipe or a semicolon or an ampersand (or ...). So shell parsers usually don't use strtok(), despite the number of questions on SO about shells whose parsers do use strtok().

Generally, you should steer clear of plain strtok(), and it is up to you to decide whether strtok_r() or strtok_s() is appropriate for your purposes.

Answer by Oktalist

Because cplusplus.com isn't telling you the whole story. Cppreference.com has a better description.

Cplusplus.com also fails to mention that strtok is not thread-safe, and only documents the strtok function of the C++ programming language, whereas cppreference.com does mention the thread-safety issue and documents the strtok functions of both the C and the C++ programming languages.

Answer by gkovacs90

strtok breaks a string into a sequence of tokens, separated by the given delimiters. Delimiters only separate tokens; they do not necessarily terminate them on both sides.

Answer by Matt Phillips

Are you perhaps just mis-reading the description?

Once the terminating null character of str has been found in a call to strtok, all subsequent calls to this function with a null pointer as the first argument return a null pointer.

Given 'subsequent', I'm reading this as every call to strtok after the one that discovered \0, not necessarily the current one itself. So the definition is consistent with the behavior (and with what you would expect from strtok).
