Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow. Original question: http://stackoverflow.com/questions/7218625/

Date: 2020-09-02 09:30:37  Source: igfitidea

What are the differences between strtok and strsep in C

Tags: c, strtok, strsep

Asked by mizuki

Could someone explain to me what the differences are between strtok() and strsep()? What are their advantages and disadvantages? And why would I pick one over the other?

Accepted answer by George Gaál

From The GNU C Library manual - Finding Tokens in a String:

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.
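
The check the manual recommends can be sketched as follows. This is only an illustration: print_nonempty_tokens is a hypothetical helper name, and the _DEFAULT_SOURCE line assumes glibc, where it exposes strsep() under strict -std modes.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes strsep() under strict -std modes */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every non-empty token of `line` and returns
 * the number of empty tokens that were skipped. `line` is modified in place. */
static int print_nonempty_tokens(char *line, const char *delims)
{
    int skipped = 0;
    char *cursor = line;
    char *token;

    while ((token = strsep(&cursor, delims)) != NULL) {
        if (*token == '\0') {   /* empty token: two delimiters were adjacent */
            skipped++;
            continue;
        }
        printf("%s\n", token);
    }
    return skipped;
}
```

Called on a writable copy of "aaa-bbb --ccc" with the delimiters " -", this prints aaa, bbb and ccc and reports two skipped empty tokens.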

Answer by Jonathan Leffler

One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().

Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.
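
As a rough sketch of what interleaving looks like with strtok_r(), the nested loops below each keep their own state pointer. The helper name print_nested_fields and the ';' and ',' separators are assumptions made for the example, and the _POSIX_C_SOURCE line assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: a POSIX system; exposes strtok_r() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits `text` into ';'-separated records and each
 * record into ','-separated fields, printing every field. Returns the total
 * number of fields. `text` is modified in place. */
static int print_nested_fields(char *text)
{
    int count = 0;
    char *outer_state;
    char *inner_state;

    for (char *record = strtok_r(text, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        /* The inner loop has its own state, so it does not disturb the outer one. */
        for (char *field = strtok_r(record, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            printf("field: %s\n", field);
            count++;
        }
    }
    return count;
}
```

With plain strtok() the inner loop would overwrite the single hidden state, and the outer loop would stop after the first record.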

The manual page for strsep() at kernel.org says:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer; strtok() permits multiple delimiters between tokens, whereas strsep() expects a single delimiter between tokens, and interprets adjacent delimiters as an empty token.

Both strsep() and strtok() modify their input strings and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).
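
If the input must stay intact, or the delimiter that ended each token matters, one alternative is to scan with strspn() and strcspn() instead. This is a minimal sketch, not something the answer itself proposes; print_tokens_with_delims is a hypothetical name, and it skips runs of delimiters the way strtok() does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans `s` without modifying it, prints each token
 * together with the delimiter that ended it, and returns the token count.
 * Runs of delimiters are skipped, as strtok() would do. */
static int print_tokens_with_delims(const char *s, const char *delims)
{
    int count = 0;

    while (*s != '\0') {
        s += strspn(s, delims);            /* skip leading delimiters */
        if (*s == '\0')
            break;
        size_t len = strcspn(s, delims);   /* length of the token */
        char end = s[len];                 /* delimiter, or '\0' at end of input */
        if (end != '\0')
            printf("token \"%.*s\" ended by '%c'\n", (int)len, s, end);
        else
            printf("token \"%.*s\" ended by end of string\n", (int)len, s);
        s += len;
        count++;
    }
    return count;
}
```

Because nothing is written to the string, this also works on string literals and const data.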

When to use them?

  • You would use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
  • You would use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
  • You would only use strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().

Why is strtok() poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.

That's because:

  1. If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
  2. If your function calls any function that calls strtok(), that will break your function's use of strtok().
  3. If your program is multithreaded, at most one thread can be using strtok() at any given time, across a whole sequence of strtok() calls.

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".

  • You can use strsep() if it is available.
  • You can use POSIX's strtok_r() if it is available.
  • You can use Microsoft's strtok_s() if it is available.
  • Nominally, you could use the ISO/IEC 9899:2011 Annex K.3.7.3.1 function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().
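
Where none of these is available, a replacement is small enough to write by hand. The following is a minimal sketch built on ISO C strpbrk(); my_strsep is a hypothetical name, and it is meant to mirror the documented behavior of strsep(), not to reproduce any particular library's source.

```c
#include <stddef.h>
#include <string.h>

/* my_strsep: hypothetical stand-in for BSD strsep(), using only ISO C. */
static char *my_strsep(char **stringp, const char *delim)
{
    char *start = *stringp;
    if (start == NULL)
        return NULL;                    /* nothing left to tokenize */

    char *end = strpbrk(start, delim);  /* first delimiter, if any */
    if (end != NULL) {
        *end = '\0';                    /* terminate the token */
        *stringp = end + 1;             /* resume just past the delimiter */
    } else {
        *stringp = NULL;                /* this was the last token */
    }
    return start;
}
```

All state lives in the caller's pointer, so calls on different strings can be interleaved freely.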

BSD strsep():

char *strsep(char **stringp, const char *delim);

POSIX strtok_r():

char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);

Microsoft strtok_s():

char *strtok_s(char *strToken, const char *strDelimit, char **context);

Annex K strtok_s():

char *strtok_s(char * restrict s1, rsize_t * restrict s1max,
               const char * restrict s2, char ** restrict ptr);

Note that this has 4 arguments, not 3 as in the other two variants on strtok().
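
Since POSIX strtok_r() and Microsoft's strtok_s() take the same three arguments, a thin wrapper can cover both. This is a sketch under assumptions: PORTABLE_STRTOK and count_tokens are hypothetical names, and checking _MSC_VER is just one way to detect the Microsoft toolchain.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes strtok_r() on POSIX systems */
#include <string.h>

/* PORTABLE_STRTOK is a hypothetical name for this wrapper. */
#if defined(_MSC_VER)
#  define PORTABLE_STRTOK(s, delims, state) strtok_s((s), (delims), (state))
#else
#  define PORTABLE_STRTOK(s, delims, state) strtok_r((s), (delims), (state))
#endif

/* Counts the tokens in `s`, which is modified in place. */
static int count_tokens(char *s, const char *delims)
{
    int count = 0;
    char *state;

    for (char *tok = PORTABLE_STRTOK(s, delims, &state);
         tok != NULL;
         tok = PORTABLE_STRTOK(NULL, delims, &state))
        count++;
    return count;
}
```

The four-argument Annex K strtok_s() does not fit this wrapper, which is the interface mismatch noted above.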

Answer by H.S.

The first difference between strtok() and strsep() is the way they handle contiguous delimiter characters in the input string.

Handling of contiguous delimiter characters by strtok():

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; //Contiguous delimiters between bbb and ccc sub-string
    const char* delims = " -";  // delimiters - space and hyphen character
    char* token;
    char* ptr = strdup(teststr);

    if (ptr == NULL) {
        fprintf(stderr, "strdup failed");
        exit(EXIT_FAILURE);
    }

    printf ("Original String: %s\n", ptr);

    token = strtok (ptr, delims);
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok (NULL, delims);
    }

    printf ("Original String: %s\n", ptr);
    free (ptr);
    return 0;
}

Output:

# ./example1_strtok
Original String: aaa-bbb --ccc-ddd
aaa
bbb
ccc
ddd
Original String: aaa

In the output, you can see the tokens "bbb" and "ccc" one after another. strtok() does not indicate the occurrence of contiguous delimiter characters. Also, strtok() modifies the input string.

Handling of contiguous delimiter characters by strsep():

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; // Contiguous delimiters between the bbb and ccc substrings
    const char* delims = " -";  // delimiters: space and hyphen
    char* token;
    char* ptr1;
    char* ptr = strdup(teststr);

    if (ptr == NULL) {
        fprintf(stderr, "strdup failed");
        exit(EXIT_FAILURE);
    }

    ptr1 = ptr;

    printf ("Original String: %s\n", ptr);
    while ((token = strsep(&ptr1, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }

    if (ptr1 == NULL) // This is just to show that strsep() modifies the pointer passed to it
        printf ("ptr1 is NULL\n");
    printf ("Original String: %s\n", ptr);
    free (ptr);
    return 0;
}

Output:

# ./example1_strsep
Original String: aaa-bbb --ccc-ddd
aaa
bbb
<empty>             <==============
<empty>             <==============
ccc
ddd
ptr1 is NULL
Original String: aaa

In the output, you can see the two empty strings (indicated by <empty>) between bbb and ccc. Those two empty strings are for the "--" between "bbb" and "ccc". When strsep() found the delimiter character ' ' after "bbb", it replaced the delimiter with a '\0' character and returned "bbb". After this, strsep() found another delimiter character, '-'. It replaced that delimiter with a '\0' character and returned the empty string. The same applies to the next delimiter character.

Contiguous delimiter characters are indicated when strsep() returns a pointer to a null character (that is, a character with the value '\0').

strsep() modifies the input string as well as the pointer whose address is passed as its first argument.

The second difference is that strtok() relies on a static variable to keep track of the current parse location within a string. This implementation requires one string to be parsed completely before a second string is begun. This is not the case with strsep().

Calling strtok() when another strtok() is not finished:

#include <stdio.h>
#include <string.h>

void another_function_callng_strtok(void)
{
    char str[] ="ttt -vvvv";
    char* delims = " -";
    char* token;

    printf ("Original String: %s\n", str);
    token = strtok (str, delims);
    while (token != NULL) {
        printf ("%s\n", token);
        token = strtok (NULL, delims);
    }
    printf ("another_function_callng_strtok: I am done.\n");
}

void function_callng_strtok ()
{
    char str[] ="aaa --bbb-ccc";
    char* delims = " -";
    char* token;

    printf ("Original String: %s\n", str);
    token = strtok (str, delims);
    while (token != NULL)
    {
        printf ("%s\n",token);
        another_function_callng_strtok();
        token = strtok (NULL, delims);
    }
}

int main(void) {
    function_callng_strtok();
    return 0;
}

Output:

# ./example2_strtok
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
vvvv
another_function_callng_strtok: I am done.

The function function_callng_strtok() prints only the token "aaa" and none of the remaining tokens of its input string, because it calls another_function_callng_strtok(), which in turn calls strtok() and leaves strtok()'s static pointer set to NULL once it has extracted all of its tokens. When control returns to the while loop in function_callng_strtok(), strtok() returns NULL because of that static pointer, which makes the loop condition false, and the loop exits.

Calling strsep() when another strsep() is not finished:

#include <stdio.h>
#include <string.h>

void another_function_callng_strsep(void)
{
    char str[] ="ttt -vvvv";
    const char* delims = " -";
    char* token;
    char* ptr = str;

    printf ("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }
    printf ("another_function_callng_strsep: I am done.\n");
}

void function_callng_strsep ()
{
    char str[] ="aaa --bbb-ccc";
    const char* delims = " -";
    char* token;
    char* ptr = str;

    printf ("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
        another_function_callng_strsep();
    }
}

int main(void) {
    function_callng_strsep();
    return 0;
}

Output:

# ./example2_strsep
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
bbb
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
ccc
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.

Here you can see that calling strsep() on a second string before the first has been completely parsed makes no difference.

So, the disadvantage of both strtok() and strsep() is that they modify the input string, but strsep() has a couple of advantages over strtok(), as illustrated above.

From the strsep() manual page:

The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 (``ISO C90'')) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.

For reference: