C: Good C string library

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/4688041/

Date: 2020-09-02 07:31:38  Source: igfitidea

Good C string library

Tags: c, string, search, io

Asked by chamakits

I recently got inspired to start a project I've been wanting to code for a while. I want to do it in C, because memory handling is key to this application. I was searching around for a good implementation of strings in C, since I know that doing it myself could lead to some messy buffer overflows, and I expect to be dealing with a fairly large number of strings.

I found this article, which gives details on each, but they each seem to have a good number of cons (don't get me wrong, this article is EXTREMELY helpful, but it still worries me that even if I were to choose one of those, I wouldn't be using the best I can get). I also don't know how up to date the article is, hence my current plea.

What I'm looking for is something that can hold a large number of characters and simplifies the process of searching through the string. If it allows me to tokenize the string in any way, even better. Also, it should have pretty good I/O performance. Printing and formatted printing aren't quite a top priority. I know I shouldn't expect a library to do all the work for me, but I was just wondering if there is a well-documented string library out there that could save me some time and some work.

Any help is greatly appreciated. Thanks in advance!

EDIT: I was asked about the license I prefer. Any sort of open source license will do, but preferably GPL (v2 or v3).

EDIT2: I found the betterString (bstring) library and it looks pretty good. Good documentation, a small yet versatile set of functions, and easy to mix with C strings. Does anyone have any good or bad stories about it? The only downside I've read about is that it lacks Unicode (again, I've only read about this, I haven't seen it face to face just yet), but everything else seems pretty good.

EDIT3: Also, preferably it should be pure C.

Answered by Steinway Wu

It's an old question, and I hope you have already found a useful library. In case you didn't, please check out the Simple Dynamic String library on GitHub. I copy and paste the author's description here:

SDS is a string library for C designed to augment the limited libc string handling functionalities by adding heap allocated strings that are:

  • Simpler to use.
  • Binary safe.
  • Computationally more efficient.
  • But yet... Compatible with normal C string functions.

This is achieved using an alternative design in which instead of using a C structure to represent a string, we use a binary prefix that is stored before the actual pointer to the string that is returned by SDS to the user.

+--------+-------------------------------+-----------+
| Header | Binary safe C alike string... | Null term |
+--------+-------------------------------+-----------+
         |
         `-> Pointer returned to the user.

Because the metadata is stored as a prefix before the returned pointer, and because every SDS string implicitly gets a null terminator at its end regardless of its actual content, SDS strings work well together with C strings, and the user is free to pass them to functions that only read the string.

Answered by R.. GitHub STOP HELPING ICE

I would suggest not using any library aside from malloc, free, strlen, memcpy, and snprintf. These functions give you all of the tools for powerful, safe, and efficient string processing in C. Just stay away from strcpy, strcat, strncpy, and strncat, all of which tend to lead to inefficiency and exploitable bugs.

Since you mentioned searching, whatever choice of library you make, strchr and strstr are almost certainly going to be what you want to use. strspn and strcspn can also be useful.

Answered by archimedes

Please check milkstrings.
Sample code:

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atoi */
/* The milkstrings header must also be included; its name is not given here. */

int main(int argc, char *argv[]) {
  tXt s = "123,456,789";
  s = txtReplace(s, "123", "321");             // replace 123 with 321
  int num = atoi(txtEat(&s, ','));             // pick the first number
  printf("num = %d s = %s \n", num, s);
  s = txtPrintf("%s,%d", s, num);              // printf into a new string
  printf("num = %d s = %s \n", num, s);
  s = txtConcat(s, "<-->", txtFlip(s), NULL);  // concatenate some strings
  num = txtPos(s, "987");                      // find the position of a substring
  printf("num = %d s = %s \n", num, s);
  if (txtAnyError()) {                         // check for errors
    printf("%s\n", txtLastError());
    return 1;
  }
  return 0;
}

Answered by DevSolar

If you really want to get it right from the beginning, you should look at ICU, i.e. Unicode support, unless you are sure your strings will never hold anything but plain 7-bit ASCII. Searching, regular expressions, and tokenization are all in there.

Of course, going C++ would make things much easier, but even then my recommendation of ICU would stand.

Answered by Pedro Vicente

I faced this problem recently: I needed to append to a string holding millions of characters. I ended up writing my own.

It is simply a C array of characters, encapsulated in a class that keeps track of array size and number of allocated bytes.

In the benchmark below, it is roughly 10 times faster than SDS and std::string. The code is at

https://github.com/pedro-vicente/table-string

Benchmarks

For Visual Studio 2015, x86 debug build:

| API                   | Seconds |
| --------------------- | ------- |
| SDS                   | 19      |
| std::string           | 11      |
| std::string (reserve) | 9       |
| table_str_t           | 1       |

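// Append a short string 10 million times to a fully pre-allocated table.
// table_str_t is the author's class; clock_gettime_t is the timer helper
// used by the benchmark.
clock_gettime_t timer;
const size_t nbr = 1000 * 1000 * 10;
const char* s = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
size_t len = strlen(s);
timer.start();
table_str_t table(nbr * len);  // allocate the full size up front
for (size_t idx = 0; idx < nbr; ++idx)
{
  table.add(s, len);
}
timer.now("end table");
timer.stop();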

EDIT: Maximum performance is achieved by allocating the whole string up front (the constructor's size parameter). If only a fraction of the total size is pre-allocated, performance drops because the buffer has to be reallocated as it grows. Example with 100 allocations:

std::string benchmark append string of size 33, 10000000 times
end str:        11.0 seconds    11.0 total
std::string reserve benchmark append string of size 33, 10000000 times
end str reserve:        10.0 seconds    10.0 total
table string benchmark with pre-allocation of 330000000 elements
end table:      1.0 seconds     1.0 total
table string benchmark with pre-allocation of ONLY 3300000 elements, allocation is MADE 100 times...patience...
end table:      9.0 seconds     9.0 total

Answered by SomethingSomething

I also found a need for an external C string library, as I find the <string.h> functions very inefficient, for example:

  • strcat() can be very expensive, as it has to find the '\0' char each time you concatenate a string
  • strlen() is expensive, as again, it has to find the '\0' char instead of just reading a maintained length variable
  • The char array is of course not dynamic and can cause very dangerous bugs (a crash from a segmentation fault is the good scenario when you overflow your buffer)

The solution should be a library that contains not only functions but also a struct that wraps the string and stores important fields such as length and buffer size.

I looked for such libraries over the web and found the following:

  1. GLib String library (should be best standard solution) - https://developer.gnome.org/glib/stable/glib-Strings.html
  2. http://locklessinc.com/articles/dynamic_cstrings/
  3. http://bstring.sourceforge.net/

Enjoy
