C: How to calculate the length of a string in C efficiently?

Question

Asked by Carla Álvarez

How to calculate the length of a string in C efficiently (in time)?

Right now I'm doing:

int calculate_length(char *string) {
    int length = 0;
    while (string[length] != '\0') {
        length++;
    }
    return length;
}

But it's very slow compared to strlen(), for example. Is there any other way to do it?

Thanks.

EDIT: I'm working in a freestanding environment, so I'm not allowed to use any external library, including "string.h".

Answer 1

Answered by Andomar

From the FreeBSD source code:

size_t
strlen(const char *str)
{
    const char *s;
    for (s = str; *s; ++s);
    return(s - str);
}

Compared to your code, this probably maps very nicely to an assembler instruction, which can explain a big performance difference.

Answer 2

Answered by Sudhanshu

Take a look at the source code of strlen in the standard libc. Functions in standard libraries are generally highly optimized. Check it out here (coded in assembly); this is from the GNU libc:

size_t
DEFUN(strlen, (str), CONST char *str)
{
  int cnt;

  asm("cld\n"                   /* Search forward.  */
      /* Some old versions of gas need `repne' instead of `repnz'.  */
      "repnz\n"                 /* Look for a zero byte.  */
      "scasb" /* %0, %1, %3 */ :
      "=c" (cnt) : "D" (str), "0" (-1), "a" (0));

  return -2 - cnt;
}

Answer 3

Answered by aib

strlen(). Odds are, if somebody had found a better, faster generic method, strlen would have been replaced with that.

Answer 4

Answered by Michael Burr

Take a look at the GNU C library's strlen() source.

It uses a number of non-obvious tricks to gain speed without dropping to assembly, including:

getting to a character that's properly aligned
reading those aligned parts of the string into an int (or some larger datatype) to read several chars at a time
using bit twiddling tricks to check if one of the chars embedded in that block of chars is zero

etc.
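The tricks listed above can be sketched roughly as follows. This is only an illustration of the idea, not the actual glibc source: the name strlen_wordwise is made up, and using unsigned long as the "larger datatype" is an assumption. It needs only <stddef.h> and <stdint.h>, which freestanding implementations normally provide. Be aware that it reads whole aligned words, so it can touch a few bytes after the terminator, and it reads chars through an unsigned long pointer; ISO C guarantees neither, even though implementations of this kind rely on it working on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch of the word-at-a-time technique. */
size_t strlen_wordwise(const char *str)
{
    const char *p = str;

    /* 1. Walk byte by byte until p sits on a word boundary. */
    while ((uintptr_t)p % sizeof(unsigned long) != 0) {
        if (*p == '\0')
            return (size_t)(p - str);
        p++;
    }

    /* 2. Read one aligned word (several chars) per iteration. */
    const unsigned long ones  = ~0UL / 0xFF;   /* 0x0101...01 */
    const unsigned long highs = ones * 0x80;   /* 0x8080...80 */
    const unsigned long *w = (const unsigned long *)p;
    for (;;) {
        unsigned long v = *w;
        /* 3. Bit trick: the result is nonzero iff some byte of v is zero. */
        if ((v - ones) & ~v & highs)
            break;
        w++;
    }

    /* The terminator is somewhere in this word; find the exact byte. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - str);
}
```

Called as strlen_wordwise("Hello World"), it returns the same value as strlen() would. Whether it is actually faster than the simple loop depends on the compiler and the target, so it is worth measuring.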

Answer 5

Answered by Clifford

C strings are intrinsically inefficient. There are two reasons for using the ASCIZ convention:

The standard C library uses it
The compiler uses it for literal string constants

The first of these is academic in this instance, since you are not using the standard library. The second is easily overcome by creating functions or macros that provide conversions from C strings to a more efficient convention, such as Pascal strings, which store the length alongside the characters. The point is you need not be a slave to the C convention if you are not using the C library.
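One way to picture such a conversion is the minimal sketch below. All names in it (struct lstring, LSTR, lstring_from_cstr, lstring_length) are invented for illustration, and storing the length in a size_t next to a pointer is just one possible layout. It assumes a C99 compiler for the compound literals.

```c
#include <stddef.h>

/* Hypothetical type: a string that carries its own length. */
struct lstring {
    size_t len;        /* number of characters, terminator not counted */
    const char *data;  /* the characters themselves */
};

/* Wrap a string literal at compile time. sizeof counts the '\0',
   hence the -1. The "" lit "" form only compiles for literals. */
#define LSTR(lit) ((struct lstring){ sizeof("" lit "") - 1, "" lit "" })

/* Convert an arbitrary C string: the O(n) scan is paid once, here. */
static struct lstring lstring_from_cstr(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (struct lstring){ (size_t)(p - s), s };
}

/* Afterwards the length is available in constant time. */
static size_t lstring_length(struct lstring s)
{
    return s.len;
}
```

With this, LSTR("Hello World") yields a length of 11 without any loop at run time, and a string converted once with lstring_from_cstr never has to be scanned again.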

Answer 6

Answered by unwind

The easiest way is to call strlen(). Seriously. It's already optimized by your compiler and/or library vendors, to be as fast as possible for your architecture.

One common optimization is to remove the need to increase a counter, and compute the length from the pointer:

size_t my_strlen(const char *s)
{
  const char *anchor = s;

  while(*s)
   s++;

  return s - anchor;
}

Answer 7

Answered by qbitty

Yet another way to speed up char counting is to use vectorization!

Here's an example of how to do this with respect to UTF8-encoded strings:

Even faster UTF-8 character counting:

http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
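To give a rough idea of what counting several UTF-8 bytes at once can look like, here is a small word-at-a-time sketch. It is not the code from the linked post: the name utf8_count is hypothetical, the byte length n is assumed to be known already, and the input is assumed to be valid UTF-8. It counts continuation bytes (those of the form 10xxxxxx) eight at a time and subtracts them from the byte count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-8 code points in the first n bytes of s. */
size_t utf8_count(const char *s, size_t n)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);
    size_t continuation = 0;
    size_t i = 0;

    /* Main loop: handle 8 bytes per iteration. */
    for (; i + 8 <= n; i += 8) {
        uint64_t v = 0;
        /* Build the word byte by byte, which sidesteps alignment and
           aliasing concerns; optimizing compilers can often merge this. */
        for (int k = 0; k < 8; k++)
            v |= (uint64_t)(unsigned char)s[i + k] << (8 * k);

        /* Continuation byte: bit 7 set and bit 6 clear.
           c gets a 1 in the lowest bit of every such byte lane. */
        uint64_t c = (v >> 7) & (~v >> 6) & ones;

        /* Multiplying by 0x0101...01 sums the eight lanes into the top byte. */
        continuation += (size_t)((c * ones) >> 56);
    }

    /* Tail: fewer than 8 bytes left, check them one at a time. */
    for (; i < n; i++)
        continuation += ((unsigned char)s[i] & 0xC0) == 0x80;

    return n - continuation;
}
```

For plain ASCII input the result equals n, since ASCII contains no continuation bytes. Real SIMD instructions would follow the same pattern with wider registers.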

Answer 8

Answered by bortzmeyer

On i386 processors, libc implementations often use an ultra-optimized version of strlen, usually written in assembly language. The paper "String Length" explains how they work.

Here is one optimized version for OpenBSD. (They also have a portable version.) Here is the version for the GNU libc.

Answer 9

Answered by Binayaka Chakraborty

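Some of the above answers are very good, and this is my take. There is a keyword known as "register":

#include <stdio.h>
size_t strlenNew(const char *s);

int main(void)
{
    printf("Size of \"Hello World\" is ::\t%zu\n", strlenNew("Hello World"));
    return 0;
}

size_t strlenNew(const char *s)
{
    register size_t i = 0;   /* hint to keep the counter in a register */
    while (s[i] != '\0') i++;
    return i;
}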

Read here: http://gustedt.wordpress.com/2010/08/17/a-common-misconsception-the-register-keyword/ and http://msdn.microsoft.com/en-us/library/482s4fy9(v=vs.80).aspx

From the first link:

This can be particularly useful for array variables. An array variable is easily confounded with a pointer variable. Unless it is followed by a [expr] or with a sizeof it evaluates to the address of the first element. If you declare the array register all these uses are forbidden; we only access individual elements or ask for the total size. Such an register-array then may be much easier used as if it just were a set of variable by the optimizer. No aliasing (accessing the same variable through different pointers) may occur.

Thus, sometimes, there may be performance fluctuations. Personally, this is one of my favorite implementations, but Sudhanshu and Andomar also provide good implementations :)

Answer 10

Answered by Victor26567

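I had the same problem, and I resolved it. The key is the 2nd condition of the for loop:

int longitud(char cad[]){

    int i, cont;

    cont = 0;

    for(i = 0; i < 30 && cad[i] != '\0'; i++){
        if(cad[i] != '\0'){
            if(cad[i] != ' '){
                cont++;
            }
        }
    }
    cont--;
    return cont;
}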
