Efficiently convert between hex, binary, and decimal in C/C++

Question

Question by Igor Oks

I have 3 base representations for positive integer numbers:

Decimal, in an unsigned long variable (e.g. unsigned long int NumDec = 200).
Hex, in a string variable (e.g. string NumHex = "C8").
Binary, in a string variable (e.g. string NumBin = "11001000").

I want to be able to convert between numbers in all 3 representations in the most efficient way. I.e. to implement the following 6 functions:

unsigned long int Binary2Dec(const string & Bin) {}
unsigned long int Hex2Dec(const string & Hex) {}
string Dec2Hex(unsigned long int Dec) {}
string Binary2Hex(const string & Bin) {}
string Dec2Binary(unsigned long int Dec) {}
string Hex2Binary(const string & Hex) {}

What is the most efficient approach for each of them? I can use C and C++, but not boost.

Edit: By "efficiency" I mean time efficiency: Shortest execution time.

Answer 1

Accepted answer by coryan

As others have pointed out, I would start with sscanf(), printf() and/or strtoul(). They are fast enough for most applications, and they are less likely to have bugs. I will say, however, that these functions are more generic than you might expect, as they have to deal with non-ASCII character sets, with numbers represented in any base, and so forth. For some domains it is possible to beat the library functions.

So, measure first, and if the performance of these conversions really is an issue, then:

1) In some applications/domains certain numbers, for example 0, 100, 200 or 19.95, may appear so often that it makes sense to optimize your functions to convert such numbers with a bunch of if() statements, and then fall back to the generic library functions.
2) Use a table lookup for the 100 most common numbers, and then fall back on a library function. Remember that large tables may not fit in your cache and may require multiple indirections for shared libraries, so measure these things carefully to make sure you are not decreasing performance.
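
A minimal C++ sketch of both suggestions combined, for the question's Dec2Hex. It assumes, purely for illustration, that the values 0 to 99 are the "most common" ones; in a real application that set would come from profiling. Values in that range are served from a table built on first use (the if() fast path), and everything else falls back to snprintf.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical "most common values": 0..99 stands in for whatever an
// application's profile shows. The table is filled once, on first use.
static const std::array<std::string, 100>& CommonHexTable() {
    static const std::array<std::string, 100> table = [] {
        std::array<std::string, 100> t;
        for (std::size_t v = 0; v < t.size(); ++v) {
            char buf[8];
            std::snprintf(buf, sizeof buf, "%lX", static_cast<unsigned long>(v));
            t[v] = buf;
        }
        return t;
    }();
    return table;
}

std::string Dec2Hex(unsigned long dec) {
    if (dec < 100) return CommonHexTable()[dec];          // fast path: table hit
    char buf[2 * sizeof(unsigned long) + 1];               // max hex digits + NUL
    int n = std::snprintf(buf, sizeof buf, "%lX", dec);    // generic fallback
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Whether this actually beats calling snprintf every time depends on the workload and on cache behavior, so it needs to be benchmarked as described above.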

You may also want to look at the Boost lexical_cast functions, though in my experience they are relatively slow compared to the good old C functions.

Though many have said it, it is worth repeating over and over: do not optimize these conversions until you have evidence that they are a problem. If you do optimize, measure your new implementation to make sure it is faster, and make sure you have a ton of unit tests for your own version, because you will introduce bugs :-(
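
One way to get that safety net is a randomized comparison against the library. The sketch below assumes a hypothetical hand-optimized FastHex2Dec that only handles valid hex text, and checks it against strtoul on pseudo-random 32-bit values printed in both lower and upper case. The fixed seed keeps any failure reproducible.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <random>
#include <string>

// Hypothetical hand-optimized parser under test (assumes valid hex input).
unsigned long FastHex2Dec(const std::string& hex) {
    unsigned long val = 0;
    for (char c : hex) {
        unsigned digit = (c >= '0' && c <= '9') ? c - '0'
                       : (c >= 'a' && c <= 'f') ? c - 'a' + 10
                       : c - 'A' + 10;
        val = (val << 4) | digit;
    }
    return val;
}

// Formats `iterations` random values with %lx and %lX, parses each string with
// both FastHex2Dec and strtoul, and returns how many results disagree.
int CountMismatches(int iterations, unsigned seed) {
    assert(iterations >= 0);
    std::mt19937 rng(seed);                                // fixed seed => reproducible
    std::uniform_int_distribution<unsigned long> dist(0, 0xFFFFFFFFul);
    int mismatches = 0;
    for (int i = 0; i < iterations; ++i) {
        unsigned long v = dist(rng);
        for (const char* fmt : {"%lx", "%lX"}) {
            char buf[2 * sizeof(unsigned long) + 1];
            std::snprintf(buf, sizeof buf, fmt, v);
            unsigned long ref = std::strtoul(buf, nullptr, 16);  // library reference
            if (FastHex2Dec(buf) != ref || ref != v) ++mismatches;
        }
    }
    return mismatches;
}
```

Timing the two versions is a separate step; this only guards correctness.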

Answer 2

Answer by Robert S. Barnes

I would suggest just using sprintf and sscanf.
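
A minimal sketch of that suggestion for the hex cases, using snprintf (the bounded variant of sprintf) and sscanf with the %lX/%lx conversions for unsigned long. Note that sscanf skips leading whitespace, accepts an optional 0x prefix and stops at the first non-hex character, so "C8zz" still parses as 200; its behavior on values that do not fit in unsigned long is undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hex text -> unsigned long with sscanf; returns false if no hex digits were read.
bool Hex2Dec(const std::string& hex, unsigned long& out) {
    return std::sscanf(hex.c_str(), "%lx", &out) == 1;
}

// unsigned long -> upper-case hex text with snprintf.
std::string Dec2Hex(unsigned long dec) {
    char buf[2 * sizeof(unsigned long) + 1];   // max hex digits + terminating NUL
    int n = std::snprintf(buf, sizeof buf, "%lX", dec);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```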

Also, if you're interested in how it's implemented, you can take a look at the source code for glibc, the GNU C Library.

Answer 3

Answer by David Thornley

Why do these routines have to be so time-efficient? That sort of claim always makes me wonder. Are you sure the obvious conversion methods like strtol() are too slow, or that you can do better? System functions are usually pretty efficient. They are sometimes slower because they support generality and error checking, but you need to consider what to do with errors. If a bin argument has characters other than '0' and '1', what then? Abort? Propagate massive errors?
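
To make the error question concrete, here is one possible strict Binary2Dec in C++17. The name Binary2DecChecked and the choice of returning std::optional are illustrative assumptions; throwing or returning an error code would be equally valid policies. strtoul still does the conversion, but the input is validated first, because strtoul on its own skips leading whitespace, accepts a sign and stops silently at the first invalid character.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Rejects empty input, any character other than '0'/'1', and values that do
// not fit in unsigned long. One possible error policy among several.
std::optional<unsigned long> Binary2DecChecked(const std::string& bin) {
    if (bin.empty() || bin.find_first_not_of("01") != std::string::npos)
        return std::nullopt;                   // strtoul alone would accept " +1" or "10x"
    errno = 0;
    char* end = nullptr;
    unsigned long value = std::strtoul(bin.c_str(), &end, 2);
    assert(end == bin.c_str() + bin.size());   // every character was consumed
    if (errno == ERANGE) return std::nullopt;  // too many significant bits
    return value;
}
```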

Why are you using "Dec" to represent the internal representation? Dec, Hex, and Bin should be used to refer to the string representations. There's nothing decimal about an unsigned long. Are you dealing with strings showing the number in decimal? If not, you're confusing people here and are going to confuse many more.

The transformation between binary and hex text formats can be done quickly and efficiently, with lookup tables, but anything involving decimal text format will be more complicated.
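
A sketch of that table-driven text-to-text conversion, assuming valid input and upper-case hex output (to match the question's "C8"). No integer is involved, so it also works for strings longer than an unsigned long can hold. Binary input whose length is not a multiple of four is treated as if it were left-padded with zeros, and Hex2Binary always emits four bits per hex digit, so leading zeros are kept.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Binary text -> hex text, 4 bits at a time. Assumes only '0'/'1' characters.
std::string Binary2Hex(const std::string& bin) {
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string hex;
    hex.reserve(bin.size() / 4 + 1);
    unsigned nibble = 0;
    // Start partway into the first group so a length that is not a multiple
    // of 4 behaves as if the input were left-padded with zeros.
    int bits = static_cast<int>((4 - bin.size() % 4) % 4);
    for (char c : bin) {
        nibble = (nibble << 1) | static_cast<unsigned>(c - '0');
        if (++bits == 4) {
            hex += kHexDigits[nibble];      // table lookup: 4 bits -> 1 hex char
            nibble = 0;
            bits = 0;
        }
    }
    return hex;
}

// Hex text -> binary text, one table entry per hex digit. Assumes valid hex.
std::string Hex2Binary(const std::string& hex) {
    static const char* const kNibbleBits[16] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"};
    std::string bin;
    bin.reserve(hex.size() * 4);
    for (char c : hex) {
        int v = (c >= '0' && c <= '9') ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : c - 'A' + 10;              // otherwise assumed to be 'A'..'F'
        assert(v >= 0 && v < 16);          // catches bad input in debug builds only
        bin += kNibbleBits[v];             // table lookup: 1 hex char -> 4 bits
    }
    return bin;
}
```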

Answer 4

Answer by unwind

That depends on what you're optimizing for. What do you mean by "efficient"? Is it important that the conversions be fast, use little memory, take little programmer time, cause fewer WTFs from other programmers reading the code, or what?

For readability and ease of implementation, you should at least implement both Hex2Dec() and Binary2Dec() by just calling strtoul(). That makes them into one-liners, which is very efficient for at least some of the above interpretations of the word.
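
With the question's signatures, the strtoul one-liners could look like this. There is no error checking: text with no valid digits gives 0, trailing garbage is silently ignored, and out-of-range values give ULONG_MAX. For base 16, strtoul also accepts an optional 0x prefix.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// One-line parsers built on strtoul (no error checking).
unsigned long Hex2Dec(const std::string& hex) { return std::strtoul(hex.c_str(), nullptr, 16); }
unsigned long Binary2Dec(const std::string& bin) { return std::strtoul(bin.c_str(), nullptr, 2); }
```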

Answer 5

Answer by Dima

Sounds very much like a homework problem, but what the heck...

The short answer: to convert from a long int to your strings, use two lookup tables. Each table should have 256 entries. One maps a byte to a hex string: 0 -> "00", 1 -> "01", etc. The other maps a byte to a bit string: 0 -> "00000000", 1 -> "00000001".

Then for each byte in your long int you just have to look up the correct string, and concatenate them.
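
A minimal C++ sketch of the table approach for Dec2Hex and Dec2Binary, assuming 8-bit bytes. The helper names (ByteTables, Tables, Convert) are illustrative. Because each byte always contributes two hex characters or eight bits, the concatenated string is zero-padded to the full width of unsigned long; the sketch strips those leading zeros to match the question's "C8" and "11001000".

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

static_assert(CHAR_BIT == 8, "the 256-entry tables assume 8-bit bytes");

namespace {
struct ByteTables {
    std::array<std::string, 256> hex;   // 0 -> "00", 1 -> "01", ..., 255 -> "FF"
    std::array<std::string, 256> bin;   // 0 -> "00000000", ..., 255 -> "11111111"
    ByteTables() {
        const char* digits = "0123456789ABCDEF";
        for (int b = 0; b < 256; ++b) {
            hex[b] = {digits[b >> 4], digits[b & 0xF]};
            for (int bit = 7; bit >= 0; --bit) bin[b] += ((b >> bit) & 1) ? '1' : '0';
        }
    }
};

const ByteTables& Tables() {
    static const ByteTables tables;     // built once, on first use
    return tables;
}

// Looks up every byte, most significant first, concatenates the pieces and
// drops leading zeros (keeping at least one digit).
std::string Convert(unsigned long value, const std::array<std::string, 256>& table) {
    std::string out;
    for (int shift = static_cast<int>(sizeof value - 1) * CHAR_BIT; shift >= 0; shift -= CHAR_BIT)
        out += table[(value >> shift) & 0xFF];
    std::size_t first = out.find_first_not_of('0');
    return first == std::string::npos ? std::string("0") : out.substr(first);
}
}  // namespace

std::string Dec2Hex(unsigned long dec) { return Convert(dec, Tables().hex); }
std::string Dec2Binary(unsigned long dec) { return Convert(dec, Tables().bin); }
```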

To convert from strings back to long you can simply convert the hex string and the bit string back to a decimal number by multiplying the numeric value of each character by the appropriate power of 16 or 2, and summing up the results.
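
The back conversion described above, sketched literally in C++: walk the string from its last character, multiply each digit by the current power of the base and add. It assumes valid input (0-9, a-f, A-F) and is restricted to bases 2 and 16 here; the name StringToUlong is made up for the example.

```cpp
#include <cassert>
#include <string>

// Sum of digit * base^position, starting from the rightmost character.
unsigned long StringToUlong(const std::string& text, unsigned base) {
    assert(base == 2 || base == 16);
    unsigned long value = 0;
    unsigned long power = 1;                       // base^0 for the last character
    for (auto it = text.rbegin(); it != text.rend(); ++it) {
        char c = *it;
        unsigned long digit = (c >= '0' && c <= '9') ? c - '0'
                            : (c >= 'a' && c <= 'f') ? c - 'a' + 10
                            : c - 'A' + 10;
        value += digit * power;
        power *= base;                             // next position to the left
    }
    return value;
}
```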

EDIT: You can also use the same lookup tables for the backwards conversion by doing a binary search to find the right string. This would take log2(256) = 8 string comparisons per byte. Unfortunately I don't have time to analyze whether comparing strings would be much faster than multiplying and adding integers.
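
The binary-search variant can be sketched with std::lower_bound. It relies on the fact that the 256 fixed-width, lower-case strings "00" to "ff" are already in lexicographic order (in ASCII, '0'-'9' sort before 'a'-'f'), so the position of a match equals the byte value. The sketch assumes an even-length, lower-case hex string, and the function names are hypothetical. Whether it is faster than arithmetic is exactly the open question above and would need measuring.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// "00", "01", ..., "ff": already sorted, so index == byte value.
const std::array<std::string, 256>& HexByteTable() {
    static const std::array<std::string, 256> table = [] {
        std::array<std::string, 256> t;
        const char* digits = "0123456789abcdef";
        for (int b = 0; b < 256; ++b) t[b] = {digits[b >> 4], digits[b & 0xF]};
        return t;
    }();
    return table;
}

// Converts two characters (one byte) at a time via binary search in the table.
unsigned long HexToUlongBySearch(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    const auto& table = HexByteTable();
    unsigned long value = 0;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        std::string chunk = hex.substr(i, 2);
        auto it = std::lower_bound(table.begin(), table.end(), chunk);
        assert(it != table.end() && *it == chunk);   // chunk must be valid lower-case hex
        value = (value << 8) | static_cast<unsigned long>(it - table.begin());
    }
    return value;
}
```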

Answer 6

Answer by plinth

Let's think about half of the task for a moment: converting from a string-ized base n to unsigned long, where n is a power of 2 (base 2 for binary and base 16 for hex).

If your input is sane, then this work is nothing more than a compare, a subtract, a shift and an OR per digit. If your input is not sane, well, that's where it gets ugly, doesn't it? Doing the conversion superfast is not hard. Doing it well under all circumstances is the challenge.

So let's assume that your input is sane. Then the heart of your conversion is this:

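#include <ctype.h>

/* Assumes sane input: only digits valid for base 2^shift, no prefix, no overflow. */
unsigned long PowerOfTwoFromString(const char *input, int shift)
{
    unsigned long val = 0;
    /* Letters are digits only above base 10: 'a'..'f' for hex, none for binary. */
    char upperLimit = (char)('a' + (1 << shift) - 10);
    while (*input) {
        char c = (char)tolower((unsigned char)*input++);
        unsigned long digit = (c >= 'a' && c < upperLimit) ? (unsigned long)(c - 'a' + 10)
                                                           : (unsigned long)(c - '0');
        val = (val << shift) | digit;
    }
    return val;
}

#define UlongFromBinaryString(str) PowerOfTwoFromString(str, 1)
#define UlongFromHexString(str) PowerOfTwoFromString(str, 4)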

See how easy that is? And it will fail on non-sane inputs. Most of your work is going to go into making your input sane, not into performance.

Now, this code takes advantage of the base being a power of two, so each digit can be shifted in. It's easy to extend to base 4, base 8, base 32, etc., but it won't work for bases that aren't powers of two. For those, your math has to change. You get

val = (val * base) + digit

which is conceptually the same set of operations: for a power-of-two base, the multiplication by the base is equivalent to the shift. So I'd be just as likely to use a fully general routine instead, and sanitize the code while sanitizing the inputs. At that point, strtoul is probably your best bet. Here's a link to a version of strtoul. Nearly all the work is handling edge conditions, which should clue you in on where your energies should be focused: correct, resilient code. The savings from using bit shifts are going to be minimal compared to the savings from, say, not crashing on bad input.
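
To illustrate where that effort goes, here is a sketch of a strict general-base parser built on val = (val * base) + digit. It is not strtoul: it deliberately rejects whitespace, signs, 0x prefixes, empty strings, digits that are invalid for the base, and values that would overflow unsigned long. The name ParseUlong and the bool-plus-out-parameter interface are assumptions for the example, and the letter handling assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Strict parser for bases 2..36; returns false on any malformed or too-large input.
bool ParseUlong(const std::string& text, unsigned base, unsigned long& out) {
    assert(base >= 2 && base <= 36);
    if (text.empty()) return false;
    unsigned long val = 0;
    for (char c : text) {
        unsigned digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'z') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'Z') digit = c - 'A' + 10;
        else return false;                                   // not a digit in any base
        if (digit >= base) return false;                     // e.g. '2' in a binary string
        if (val > (ULONG_MAX - digit) / base) return false;  // val * base + digit would overflow
        val = val * base + digit;
    }
    out = val;
    return true;
}
```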

Answer 7

Answer by eaanon01

Why not just use a macro that also takes the format as an input, at least if you are in C?

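#define TO_STRING( string, format, data) \
sprintf( string, #format, data)  /* #format stringizes the argument: %d -> "%d" */
// Int
TO_STRING(buf,%d,i);
// Hex (zero-padded to at least two characters)
TO_STRING(buf,%02x,i);
// Binary (%b is only standard since C23; many C libraries reject it)
TO_STRING(buf,%b,i);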

Or you can use sprintf directly, or have multiple macros:

#define INT_STRING( buf, data) \
sprintf( buf, "%d", data)
#define HEX_STRING( buf, data) \
sprintf( buf, "%x", data)  /* use "%lx" if data is an unsigned long */
#define BIN_TO_STRING( buf, data) \
sprintf( buf, "%b", data)  /* %b is only standard since C23 */

BIN_TO_STRING( loc_buf, my_bin );
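
Since %b is not portable to older C libraries, one C++ alternative for the binary case is std::bitset, which formats the bits of an unsigned long without any printf support. A minimal sketch, with leading zeros stripped to match the question's "11001000":

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// unsigned long -> binary text via std::bitset, without leading zeros.
std::string Dec2Binary(unsigned long dec) {
    std::string bits = std::bitset<sizeof(unsigned long) * CHAR_BIT>(dec).to_string();
    std::size_t first = bits.find('1');
    return first == std::string::npos ? std::string("0") : bits.substr(first);
}
```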
