How does strtok() split the string into tokens in C?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/3889992/

Time: 2020-09-02 06:38:01  Source: igfitidea

Tags: c, string, split, token, strtok

Question by fuddin

Please explain to me how the strtok() function works. The manual says it breaks the string into tokens, but I am unable to understand from the manual what it actually does.

I added watches on str and *pch to check how it works. When the first while loop ran, the contents of str were only "this". How did the output shown below get printed on the screen?

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

Output:

Splitting string "- This, a sample string." into tokens:
This
a
sample
string

Accepted answer by Sachin Shanbhag

strtok() divides the string into tokens, i.e. the characters between one delimiter and the next form one token. In your case, the string begins with "-" followed by a space; both are delimiters, so no (empty) token is produced there. The first token starts after that space and ends at ",", which is why you get "This" as output. Similarly, the rest of the string is split into tokens from space to space, and the last token ends at ".".

Answer by AndersK

The strtok runtime function works like this:

The first time you call strtok, you provide the string that you want to tokenize:

char s[] = "this is a string";

In the above string, a space seems to be a good delimiter between words, so let's use that:

char* p = strtok(s, " ");

What happens now is that s is searched until the space character is found; that space is overwritten with '\0', the first token ('this') is returned, and p points to that token (string).

In order to get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to your previously passed string:

p = strtok(NULL," ");

p now points to 'is'

And so on until no more spaces can be found; then the rest of the string is returned as the last token, 'string'.

More conveniently, you could write it like this to print out all tokens:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
  puts(p);
}

EDIT:

If you want to store the values returned from strtok, you need to copy each token to another buffer, e.g. with strdup(p);, since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.

Answer by John Bode

strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.

This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.

Answer by Mat

strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance the pointer it has kept however it needs to in order to perform its operations.)

From the POSIX strtok page:

This function uses static storage to keep track of the current string position between calls.

There is a thread-safe variant (strtok_r) that doesn't do this type of magic.

Answer by tibur

The first time you call it, you provide the string to tokenize to strtok. Then, to get the following tokens, you just pass NULL to that function, as long as it returns a non-NULL pointer.

The strtok function records the string you first provided when you call it. (This is really dangerous for multithreaded applications.)

Answer by Ziffusion

strtok will tokenize a string i.e. convert it into a series of substrings.

It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.

The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, at which point it returns NULL. Another rule is that you pass the string in only the first time, and NULL on subsequent calls. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from a previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is neither reentrant nor thread safe (you should be using strtok_r instead). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiter that ends each token.

One way to invoke strtok, succinctly, is as follows:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "this, is the string - I want to parse";
    char delim[] = " ,-";
    char* token;

    for (token = strtok(str, delim); token; token = strtok(NULL, delim))
    {
        printf("token=%s\n", token);
    }
    return 0;
}

Result:

token=this
token=is
token=the
token=string
token=I
token=want
token=to
token=parse

Answer by xpmatteo

strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.

Answer by fnisi

To understand how strtok() works, one first needs to know what a static variable is. This link explains it quite well.

The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the very original string that was passed to it when it is invoked with a null pointer in successive calls).

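Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality from the one provided by strtok() (it only uses the first character of delim, and it returns the delimiter itself for consecutive delimiters):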

char *zStrtok(char *str, const char *delim) {
    static char *static_str=0;      /* var to store last address */
    int index=0, strlength=0;       /* integers for indexes */
    int found = 0;                  /* check if delim is found */

    /* delimiter cannot be NULL
    * if no more char left, return NULL as well
    */
    if (delim==0 || (str == 0 && static_str == 0))
        return 0;

    if (str == 0)
        str = static_str;

    /* get length of string */
    while(str[strlength])
        strlength++;

    /* find the first occurrence of delim (only delim[0] is used) */
    for (index=0;index<strlength;index++)
        if (str[index]==delim[0]) {
            found=1;
            break;
        }

    /* if delim is not contained in str, return str */
    if (!found) {
        static_str = 0;
        return str;
    }

    /* check for consecutive delimiters
    * if first char is delim, return delim
    */
    if (str[0]==delim[0]) {
        static_str = (str + 1);
        return (char *)delim;
    }

    /* terminate the string
    * this assignment requires a writable char[], so str has to
    * be a char[] rather than a char * to a string literal
    */
    str[index] = '\0';

    /* save the rest of the string
    * (str + index + 1 is never a null pointer, so the else branch never runs)
    */
    if ((str + index + 1)!=0)
        static_str = (str + index + 1);
    else
        static_str = 0;

    return str;
}

And here is an example usage

  Example Usage
      char str[] = "A,B,,,C";
      printf("1 %s\n",zStrtok(str,","));
      printf("2 %s\n",zStrtok(NULL,","));
      printf("3 %s\n",zStrtok(NULL,","));
      printf("4 %s\n",zStrtok(NULL,","));
      printf("5 %s\n",zStrtok(NULL,","));
      printf("6 %s\n",zStrtok(NULL,","));

  Example Output
      1 A
      2 B
      3 ,
      4 ,
      5 C
      6 (null)

The code is from a string processing library I maintain on GitHub, called zString. Have a look at the code, or even contribute :) https://github.com/fnoyanisi/zString

Answer by Dipak

This is how I implemented strtok. It's not that great, but after working on it for 2 hours I finally got it to work. It does support multiple delimiters.

#include <cstddef>
#include <iostream>
using namespace std;

char* mystrtok(char str[], const char filter[])
{
    if(filter == NULL) {
        return str;
    }
    static char *ptr = str;   // initialized on the first call only
    static int flag = 0;      // set once the last token has been returned
    if(flag == 1) {
        return NULL;
    }
    char* ptrReturn = ptr;
    for(int j = 0; ; j++) {
        if(ptr[j] == '\0') {          // end of string: this is the last token
            flag = 1;
            return ptrReturn;
        }
        for(int i=0 ; filter[i] != '\0' ; i++) {
            if( ptr[j] == filter[i]) {  // delimiter: end the token here
                ptr[j] = '\0';
                ptr+=j+1;
                return ptrReturn;
            }
        }
    }
}

int main()
{
    char str[200] = "This,is my,string.test";
    char *ppt = mystrtok(str,", .");
    while(ppt != NULL ) {
        cout<< ppt << endl;
        ppt = mystrtok(NULL,", .");
    }
    return 0;
}

Answer by Vaibhav

strtok() stores, in a static variable, the pointer to where you last left off, so on its second call, when we pass NULL, strtok() gets the pointer from the static variable.

If you pass the string again (instead of NULL), it starts again from the beginning.

Moreover, strtok() is destructive, i.e. it makes changes to the original string, so make sure you always have a copy of the original one.

One more problem with strtok() is that, because it stores the address in a static variable, calling it from more than one thread at the same time in a multithreaded program can corrupt that state and produce wrong tokens. For this, use strtok_r().
