C++: Using strtok with a std::string

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under a CC BY-SA license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/289347/

Date: 2020-08-27 14:22:23  Source: igfitidea

Using strtok with a std::string

Tags: c++, strtok

Asked by Chris Blackwell

I have a string that I would like to tokenize. But the C strtok() function requires my string to be a char*. How can I do this simply?

I tried:

token = strtok(str.c_str(), " "); 

which fails because c_str() returns a const char*, not a char*.

Answered by Chris Blackwell

#include <iostream>
#include <string>
#include <sstream>
int main(){
    std::string myText("some-text-to-tokenize");
    std::istringstream iss(myText);
    std::string token;
    while (std::getline(iss, token, '-'))
    {
        std::cout << token << std::endl;
    }
    return 0;
}

Or, as mentioned, use boost for more flexibility.
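The getline() loop above can be wrapped in a small reusable function. This is only a sketch: the name split_on is hypothetical and not part of the original answer. One difference from strtok() is worth noting: std::getline keeps the empty token between two adjacent delimiters, whereas strtok() skips runs of delimiters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): split `text` on a single
// delimiter character using std::getline and collect the pieces in a vector.
std::vector<std::string> split_on(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream iss(text);
    std::string token;
    while (std::getline(iss, token, delim)) {
        tokens.push_back(token);  // empty tokens between adjacent delimiters are kept
    }
    return tokens;
}
```

For example, split_on("a--b", '-') yields three pieces: "a", an empty string, and "b".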

Answered by Todd Gamblin

  1. If Boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.

  2. If not, then a quick Google search turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.

  3. And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:

    void split(const string& str, const string& delim, vector<string>& parts) {
      size_t start, end = 0;
      while (end < str.size()) {
        start = end;
        while (start < str.size() && (delim.find(str[start]) != string::npos)) {
          start++;  // skip leading delimiter characters
        }
        end = start;
        while (end < str.size() && (delim.find(str[end]) == string::npos)) {
          end++; // skip to end of word
        }
        if (end-start != 0) {  // just ignore zero-length strings.
          parts.push_back(string(str, start, end-start));
        }
      }
    }
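As a usage sketch for the split() function above: the function is repeated here, with explicit using-declarations, so the example is self-contained. The wrapper split_to_vector is a hypothetical convenience that is not part of the original answer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using std::string;
using std::vector;

// split() as given in the answer: any character in `delim` acts as a separator.
void split(const string& str, const string& delim, vector<string>& parts) {
    std::size_t start, end = 0;
    while (end < str.size()) {
        start = end;
        while (start < str.size() && (delim.find(str[start]) != string::npos)) {
            start++;  // skip leading delimiter characters
        }
        end = start;
        while (end < str.size() && (delim.find(str[end]) == string::npos)) {
            end++;  // advance to the end of the current piece
        }
        if (end - start != 0) {  // ignore zero-length pieces
            parts.push_back(string(str, start, end - start));
        }
    }
}

// Hypothetical wrapper (an assumption, not in the answer): return the pieces
// by value instead of appending to an output parameter.
vector<string> split_to_vector(const string& str, const string& delim) {
    vector<string> parts;
    split(str, delim, parts);
    return parts;
}
```

Because empty pieces are ignored, split_to_vector("one-two---three--four", "-") produces exactly four pieces, which matches how strtok() treats runs of separators.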
    
Answered by DocMax

Duplicate the string, tokenize it, then free it.

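char *dup = strdup(str.c_str());  // strdup() is POSIX, declared in <string.h>
token = strtok(dup, " ");
// ... use token (and any further strtok(NULL, " ") results) here ...
free(dup);  // tokens point into dup, so free it only after you are done with them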
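A fuller sketch of the duplicate-tokenize-free approach is shown below. The function name tokenize_copy is hypothetical, and the code assumes a platform that provides the POSIX strdup() function. Each token is copied into a std::string before the duplicate is freed, because the pointers returned by strtok() point into the duplicated buffer.

```cpp
#include <cstdlib>   // std::free
#include <cstring>   // std::strtok
#include <string.h>  // strdup (POSIX, not ISO C++; an assumption about the platform)
#include <string>
#include <vector>

// Hypothetical helper: tokenize a copy of `str`, leaving `str` itself untouched.
std::vector<std::string> tokenize_copy(const std::string& str, const char* delims) {
    std::vector<std::string> tokens;
    char* dup = strdup(str.c_str());
    if (dup == nullptr) {
        return tokens;  // allocation failed; return no tokens
    }
    for (char* tok = std::strtok(dup, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy the token while dup is still alive
    }
    std::free(dup);  // safe now: the vector owns its own copies
    return tokens;
}
```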

Answered by Martin Dimitrov

There is a more elegant solution.

With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.

At this point many fine folks will jump and yell at the screen. But this is a fact. About 2 years ago the library working group decided (meeting at Lillehammer) that, just like std::vector, std::string should also formally, not just in practice, have a guaranteed contiguous buffer.

The other concern is whether strtok() increases the size of the string. The MSDN documentation says:

Each call to strtok modifies strToken by inserting a null character after the token returned by that call.

But this is not quite accurate: nothing is inserted. The function actually replaces the first occurrence of a separator character after each token with \0, so the size of the string does not change. If we have this string:

one-two---three--four

we will end up with

one\0two\0--three\0-four
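The in-place replacement can be checked with a small sketch. The helper name show_after_strtok is hypothetical. It runs strtok() over a copy of the text and then renders every embedded null character as the two characters \0, so the buffer can be compared with the line above.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: tokenize a copy of `text` with strtok() and return the
// resulting buffer, with each embedded '\0' rendered as the text "\0".
std::string show_after_strtok(std::string text, const char* seps) {
    for (char* tok = std::strtok(&text[0], seps); tok != nullptr;
         tok = std::strtok(nullptr, seps)) {
        // The tokens are not needed; only the side effect on the buffer matters.
    }
    std::string shown;
    for (char c : text) {  // text.size() is unchanged, so every byte is visited
        if (c == '\0') {
            shown += "\\0";
        } else {
            shown += c;
        }
    }
    return shown;
}
```

Only the first separator after each token is overwritten; the remaining separators in a run are left in place and skipped by the next call.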

So my solution is very simple:

std::string str("some-text-to-split");
char seps[] = "-";
char *token;

token = strtok( &str[0], seps );
while( token != NULL )
{
   /* Do your thing */
   token = strtok( NULL, seps );
}

Read the discussion on http://www.archivum.info/comp.lang.c++/2008-05/02889/does_std::string_have_something_like_CString::GetBuffer

Answered by philant

EDIT: the const cast is only used to demonstrate the effect of strtok() when applied to a pointer returned by string::c_str().

You should not use strtok() on that pointer, since it modifies the tokenized string, which may lead to undesired, if not undefined, behaviour, as the C string "belongs" to the string instance.

#include <cstring>
#include <string>
#include <iostream>

int main(int ac, char **av)
{
    std::string theString("hello world");
    std::cout << theString << " - " << theString.size() << std::endl;

    //--- this cast *only* to illustrate the effect of strtok() on std::string 
    char *token = strtok(const_cast<char  *>(theString.c_str()), " ");

    std::cout << theString << " - " << theString.size() << std::endl;

    return 0;
}

After the call to strtok(), the space has been "removed" from the string, or rather replaced by a non-printable character (\0), but the length remains unchanged.

>./a.out
hello world - 11
helloworld - 11

Therefore you have to resort to a native C++ mechanism, a duplicate of the string, or a third-party library, as previously mentioned.

Answered by PhiLho

I suppose the language is C, or C++...

strtok, IIRC, replaces separators with \0. That's why it cannot be used on a const string. To work around that "quickly", if the string isn't huge, you can just strdup() it, which is wise if you need to keep the string unaltered (which is what the const suggests...).

On the other hand, you might want to use another tokenizer, perhaps a hand-rolled one, that is less destructive to its argument.
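One possible shape for such a non-destructive tokenizer is sketched below. It assumes C++17 (for std::string_view), and the name tokenize_view is hypothetical. The input is never modified or copied; the returned views point into the original string, so they are only valid while that string is alive and unchanged.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (C++17): return views of the tokens in `text`, treating
// any character in `delims` as a separator. The input is left untouched.
std::vector<std::string_view> tokenize_view(std::string_view text,
                                            std::string_view delims) {
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    // Find the start of the next token, skipping any run of separators.
    while ((pos = text.find_first_not_of(delims, pos)) != std::string_view::npos) {
        std::size_t end = text.find_first_of(delims, pos);
        if (end == std::string_view::npos) {
            end = text.size();  // last token runs to the end of the text
        }
        tokens.push_back(text.substr(pos, end - pos));
        pos = end;
    }
    return tokens;
}
```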

Answered by Sherm Pendley

Assuming that by "string" you're talking about std::string in C++, you might have a look at the Tokenizer package in Boost.

Answered by Martin York

First off I would say use boost tokenizer.
Alternatively if your data is space separated then the string stream library is very useful.

But both the above have already been covered.
So as a third C-Like alternative I propose copying the std::string into a buffer for modification.

std::string   data("The data I want to tokenize");

// Create a buffer of the correct length:
std::vector<char>  buffer(data.size()+1);

// copy the string into the buffer
strcpy(&buffer[0],data.c_str());

// Tokenize
strtok(&buffer[0]," ");
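The snippet above only extracts the first token. A complete sketch of the same buffer-copy idea might look like the following; the function name tokenize_buffer is hypothetical. The std::vector<char> releases its memory automatically, so no explicit free() is needed.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy `data` into a writable buffer and tokenize the copy.
std::vector<std::string> tokenize_buffer(const std::string& data, const char* delims) {
    // One extra element for the terminating null character.
    std::vector<char> buffer(data.size() + 1);
    std::strcpy(&buffer[0], data.c_str());

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buffer[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy each token out of the buffer
    }
    return tokens;
}
```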

Answered by Scott Yeager

If you don't mind open source, you could use the subbuffer and subparser classes from https://github.com/EdgeCast/json_parser. The original string is left intact, there is no allocation and no copying of data. I have not compiled the following so there may be errors.

std::string input_string("hello world");
subbuffer input(input_string);
subparser flds(input, ' ', subparser::SKIP_EMPTY);
while (!flds.empty())
{
    subbuffer fld = flds.next();
    // do something with fld
}

// or if you know it is only two fields
subbuffer fld1 = input.before(' ');
subbuffer fld2 = input.sub(fld1.length() + 1).ltrim(' ');

Answered by user7860670

With C++17, std::string receives a data() overload that returns a pointer to a modifiable buffer, so the string can be used with strtok directly, without any hacks:

#include <string>
#include <iostream>
#include <cstring>
#include <cstdlib>

int main()
{
    ::std::string text{"pop dop rop"};
    char const * const psz_delimiter{" "};
    char * psz_token{::std::strtok(text.data(), psz_delimiter)};
    while(nullptr != psz_token)
    {
        ::std::cout << psz_token << ::std::endl;
        psz_token = std::strtok(nullptr, psz_delimiter);
    }
    return EXIT_SUCCESS;
}

Output:

pop
dop
rop
