Disclaimer: This page is an English-Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1250795/
Very poor boost::lexical_cast performance
Asked by Naveen
Windows XP SP3, Core 2 Duo 2.0 GHz. I'm finding boost::lexical_cast performance to be extremely slow and wanted to find ways to speed up the code. Using /O2 optimizations on Visual C++ 2008 and comparing with Java 1.6 and Python 2.6.2, I see the following results.
Integer casting:
c++:
std::string s;
for (int i = 0; i < 10000000; ++i)
{
    s = boost::lexical_cast<std::string>(i);
}
java:
String s = new String();
for (int i = 0; i < 10000000; ++i)
{
    s = new Integer(i).toString();
}
python:
for i in xrange(1, 10000000):
    s = str(i)
The times I'm seeing are
c++: 6700 milliseconds
java: 1178 milliseconds
python: 6702 milliseconds
c++ is as slow as python and 6 times slower than java.
Double casting:
c++:
std::string s;
for (int i = 0; i < 10000000; ++i)
{
    double d = i * 1.0;
    s = boost::lexical_cast<std::string>(d);
}
java:
String s = new String();
for (int i = 0; i < 10000000; ++i)
{
    double d = i * 1.0;
    s = new Double(d).toString();
}
python:
for i in xrange(1, 10000000):
    d = i * 1.0
    s = str(d)
The times I'm seeing are
c++: 56129 milliseconds
java: 2852 milliseconds
python: 30780 milliseconds
So for doubles, C++ is actually half the speed of Python and 20 times slower than the Java solution! Any ideas on improving boost::lexical_cast performance? Does this stem from a poor stringstream implementation, or should we expect a general 10x decrease in performance from using the boost libraries?
Answered by paercebal
Edit 2012-04-11
rve quite rightly commented about lexical_cast's performance, providing a link:

http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html
I don't have access right now to Boost 1.49, but I do remember making my code faster on an older version. So I guess:
- the following answer is still valid (if only for learning purposes)
- there was probably an optimization introduced somewhere between the two versions (I'll search for that)
- which means that boost is still getting better and better
Original answer
Just to add info on Barry's and Motti's excellent answers:
Some background
Please remember Boost is written by the best C++ developers on this planet, and reviewed by those same best developers. If lexical_cast were so wrong, someone would have hacked the library, either with criticism or with code.
I guess you missed the point of lexical_cast's real value...
Comparing apples and oranges.
In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.
In Python, you are more or less doing the same.
As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).
In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.
Comparing apples.
If you want to compare the Java Integer.toString method, then you should compare it with either the C sprintf or the C++ ostream facilities.
The C++ stream solution would be 6 times faster (on my g++) than lexical_cast, and quite a bit less extensible:
inline void toString(const int value, std::string & output)
{
    // The largest 32-bit integer is 4294967295, that is 10 chars.
    // To be safe, add 1 for the sign and 1 for the terminating null.
    char buffer[12];
    sprintf(buffer, "%i", value);
    output = buffer;
}
The C sprintf solution would be 8 times faster (on my g++) than lexical_cast, but a lot less safe:
inline void toString(const int value, char * output)
{
    sprintf(output, "%i", value);
}
Both solutions are either as fast or faster than your Java solution (according to your data).
Comparing oranges.
If you want to compare a C++ lexical_cast, then you should compare it with this Java pseudo-code:
Source s ;
Target t = Target.fromString(Source(s).toString()) ;
Source and Target being of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.
Extensibility? Is that a dirty word?
No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.
In the current case, in a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B.
This means that as long as your object can be output into a stream, and input from a stream, you'll be able to use lexical_cast on it, without touching any single line of code.
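That mechanism can be sketched with a plain std::stringstream (a minimal illustration of the principle, not Boost's actual implementation; the function name lexical_cast_sketch is mine):

```cpp
#include <sstream>
#include <string>

// A minimal lexical_cast-style conversion: anything streamable
// out of a Source and into a Target can be converted, with no
// per-type code at all.
template <typename Target, typename Source>
Target lexical_cast_sketch(const Source& arg)
{
    std::stringstream ss;
    ss << arg;        // Source only needs an operator<<
    Target result;
    ss >> result;     // Target only needs an operator>>
    return result;
}
```

For example, lexical_cast_sketch<int>(std::string("42")) and lexical_cast_sketch<std::string>(123) both work without writing any conversion code for those types.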
So, what are the uses of lexical_cast?
The main uses of lexical casting are:
- Ease of use (hey, a C++ cast that works for everything that is a value!)
- Combining it with template-heavy code, where your types are parametrized, and as such you don't want to deal with specifics and you don't want to know the types.
- Still potentially relatively efficient, if you have basic template knowledge, as I will demonstrate below
Point 2 is very, very important here, because it means we have one and only one interface/function to cast a value of a type into an equal or similar value of another type.
This is the real point you missed, and this is the point that costs in performance terms.
But it's so slooooooowwww!
If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently while still keeping the lexical_cast ease-of-use feature.
It took me some minutes to look at the lexical_cast source and come up with a viable solution. Add the following code to your C++ code:
#ifdef SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT
namespace boost
{
    template<>
    std::string lexical_cast<std::string, int>(const int &arg)
    {
        // The largest 32-bit integer is 4294967295, that is 10 chars.
        // To be safe, add 1 for the sign and 1 for the terminating null.
        char buffer[12];
        sprintf(buffer, "%i", arg);
        return buffer;
    }
}
#endif
By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 times faster on my g++ compiler, which means, according to your data, its performance should be similar to Java's.
And it took me only 10 minutes of looking at the boost code to write a remotely efficient and correct 32-bit version. And with some work, it could probably go faster and safer (if we had direct write access to the std::string internal buffer, we could avoid a temporary external buffer, for example).
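As a sketch of that buffer idea: since C++11 guarantees that std::string storage is contiguous, one can let snprintf format straight into the string's own buffer and then trim it, avoiding the temporary char array (my illustration, not part of the original answer):

```cpp
#include <cstdio>
#include <string>

// Convert an int to std::string by formatting directly into the
// string's own buffer, avoiding a separate temporary char array.
// Requires C++11 (contiguous, writable string storage).
inline std::string toStringDirect(int value)
{
    std::string s(12, '\0');  // worst case: sign + 10 digits + null
    int written = std::snprintf(&s[0], s.size(), "%i", value);
    s.resize(written > 0 ? written : 0);  // drop the unused tail
    return s;
}
```

In C++17 and later, s.data() can be used instead of &s[0] for the mutable pointer.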
Answered by Kirill V. Lyadvinsky
You could specialize lexical_cast for the int and double types. Use strtod and strtol in your specializations.
namespace boost {
    template<>
    inline int lexical_cast(const std::string& arg)
    {
        char* stop;
        int res = strtol(arg.c_str(), &stop, 10);
        if (*stop != 0) throw_exception(bad_lexical_cast(typeid(std::string), typeid(int)));
        return res;
    }

    template<>
    inline std::string lexical_cast(const int& arg)
    {
        char buffer[65]; // more than enough for any long, sign included
        ltoa(arg, buffer, 10); // note: ltoa is non-standard; sprintf(buffer, "%d", arg) is portable
        return std::string(buffer); // RVO will take place here
    }
} // namespace boost
int main(int argc, char* argv[])
{
    std::string str = "22"; // SOME STRING
    int str_int = boost::lexical_cast<int>(str);
    std::string str2 = boost::lexical_cast<std::string>(str_int);
    return 0;
}
This variant will be faster than the default implementation, because the default implementation constructs heavy stream objects. And it should be a little faster than printf, because printf has to parse the format string.
Answered by Barry Kelly
lexical_cast is more general than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming out to, then back in from, a temporary stream) ends up being slower than specific routines.
(BTW, you may get better performance out of Java using the static version, Integer.toString(int). [1])
Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be calculated as each digit is scanned.
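To illustrate that last point, a compiler-style scanner computes the integer's value while scanning the digits, with no intermediate string or stream at all (a minimal sketch; the function name scanInt is my own):

```cpp
#include <cctype>

// Parse a non-negative decimal integer from a character range,
// accumulating the value digit by digit as a lexer/scanner would.
inline int scanInt(const char* p, const char* end)
{
    int value = 0;
    while (p != end && std::isdigit(static_cast<unsigned char>(*p)))
    {
        value = value * 10 + (*p - '0');  // shift previous digits left, add new one
        ++p;
    }
    return value;
}
```

A real scanner would also handle signs and overflow; this only shows the digit-accumulation idea.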
[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:
public class Test
{
    static int instanceCall(int i)
    {
        String s = new Integer(i).toString();
        return s == null ? 0 : 1;
    }

    static int staticCall(int i)
    {
        String s = Integer.toString(i);
        return s == null ? 0 : 1;
    }

    public static void main(String[] args)
    {
        // count used to avoid dead code elimination
        int count = 0;

        // *** instance
        // Warmup calls
        for (int i = 0; i < 100; ++i)
            count += instanceCall(i);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; ++i)
            count += instanceCall(i);
        long finish = System.currentTimeMillis();
        System.out.printf("10MM Time taken: %d ms\n", finish - start);

        // *** static
        // Warmup calls
        for (int i = 0; i < 100; ++i)
            count += staticCall(i);
        start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; ++i)
            count += staticCall(i);
        finish = System.currentTimeMillis();
        System.out.printf("10MM Time taken: %d ms\n", finish - start);

        if (count == 42)
            System.out.println("bad result"); // prevent elimination of count
    }
}
The runtimes, using JDK 1.6.0-14, server VM:
10MM Time taken: 688 ms
10MM Time taken: 547 ms
And in client VM:
10MM Time taken: 687 ms
10MM Time taken: 610 ms
Even though, theoretically, escape analysis may permit allocation on the stack, and inlining may introduce all code (including copying) into the local method, permitting elimination of redundant copying, such analysis may take quite a lot of time and result in quite a bit of code space, which has other costs in the code cache that don't justify themselves in real code, as opposed to microbenchmarks like the one seen here.
Answered by Barry Kelly
What lexical cast is doing in your code can be simplified to this:
string Cast( int i ) {
    ostringstream os;
    os << i;
    return os.str();
}
There is unfortunately a lot going on every time you call Cast():
- a string stream is created, possibly allocating memory
- operator << for the integer i is called
- the result is stored in the stream, possibly allocating memory
- a string copy is taken from the stream
- a copy of that string is (possibly) created to be returned
- memory is deallocated
Then, in your own code:
s = Cast( i );
the assignment involves further allocations, and deallocations are performed. You may be able to reduce this slightly by using:
string s = Cast( i );
instead.
However, if performance is really important to you, you should consider using a different mechanism. You could write your own version of Cast() which (for example) uses a static stringstream. Such a version would not be thread-safe, but that might not matter for your specific needs.
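A minimal sketch of that static-stringstream idea (my illustration; as warned above, the static stream makes it non-thread-safe):

```cpp
#include <sstream>
#include <string>

// Reuse one stream across calls instead of constructing a new
// ostringstream (and its internal buffer) on every conversion.
// NOT thread-safe: all callers share the same static stream.
std::string Cast(int i)
{
    static std::ostringstream os;
    os.str("");   // clear previous contents, keeping the allocated buffer
    os.clear();   // reset any error flags
    os << i;
    return os.str();
}
```

The os.clear() call matters: without it, a stream left in a failed state would silently produce empty results on every later call.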
To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.
Answered by dhardy
Unfortunately I don't have enough rep yet to comment...
lexical_cast is not primarily slow because it's generic (template lookups happen at compile-time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:
- a stream object has to be created and destroyed
- in the string output case above, note that C++ compilers have a hard time avoiding buffer copies (an alternative is to format directly to the output buffer, like sprintf does, though sprintf won't safely handle buffer overruns)
- lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures
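That error handling can be sketched as follows (a stringstream-based conversion with the kind of fail() check described above; my illustration, throwing std::runtime_error instead of Boost's bad_lexical_cast):

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Convert a string to int the way a lexical cast must: stream it in,
// then inspect the stream's state to detect a failed conversion.
int castToInt(const std::string& s)
{
    std::istringstream ss(s);
    int value = 0;
    ss >> value;
    // fail() catches non-numeric input; the peek()/eof test catches
    // trailing junk such as "12ab".
    if (ss.fail() || ss.peek() != std::char_traits<char>::eof())
        throw std::runtime_error("bad lexical cast: " + s);
    return value;
}
```

These state checks run on every conversion, which is part of the per-call overhead the bullet list describes.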
lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort, and because it has a uniform prototype. I don't personally see why either of these properties necessitates slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).
Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.
Answered by jeff slesinger
lexical_cast may or may not be as slow relative to Java and Python as your benchmarks indicate, because your benchmark measurements may have a subtle problem. Any workspace allocations/deallocations done by lexical cast or the iostream methods it uses are measured by your benchmarks, because C++ doesn't defer these operations. However, in the case of Java and Python, the associated deallocations may in fact simply have been deferred to a future garbage collection cycle and missed by the benchmark measurements. (Unless a GC cycle happens to occur while the benchmark is in progress, in which case you'd be measuring too much.) So, without examining the specifics of the Java and Python implementations, it's hard to know for sure how much "cost" should be attributed to the deferred GC burden that may (or may not) eventually be imposed.
This kind of issue obviously may apply to many other C++ vs. garbage-collected language benchmarks.
Answered by Motti
Answered by t.g.
If speed is a concern, or you are just interested in how fast such casts can be in C++, there's an interesting thread regarding it.
Boost.Spirit 2.1 (which is to be released with Boost 1.40) seems to be very fast, even faster than the C equivalents (strtol(), atoi(), etc.).
Answered by David Larner
I use this very fast solution for POD types...
namespace DATATYPES {
    typedef std::string   TString;
    typedef char*         TCString;
    typedef double        TDouble;
    typedef long          THuge;
    typedef unsigned long TUHuge;
};

namespace boost {

template<typename TYPE>
inline const DATATYPES::TString lexical_castNumericToString(
                                    const TYPE& arg,
                                    const DATATYPES::TCString fmt) {
    // digits10 + 1 digits, plus sign, plus null; note that "%f" output
    // for large doubles can exceed this, which snprintf guards against
    enum { MAX_SIZE = (std::numeric_limits<TYPE>::digits10 + 1) // digits
                      + 1   // sign
                      + 1 };// null

    char buffer[MAX_SIZE] = { 0 };
    // snprintf rather than sprintf: it cannot overflow the buffer,
    // and a truncated (or failed) conversion is reported as an error
    const int written = snprintf(buffer, MAX_SIZE, fmt, arg);
    if (written < 0 || written >= MAX_SIZE) {
        throw_exception(bad_lexical_cast(typeid(TYPE),
                                         typeid(DATATYPES::TString)));
    }
    return DATATYPES::TString(buffer);
}

template<typename TYPE>
inline const TYPE lexical_castStringToNumeric(const DATATYPES::TString& arg) {
    DATATYPES::TCString end = 0;
    DATATYPES::TDouble result = std::strtod(arg.c_str(), &end);

    if (not end or *end not_eq 0) {
        throw_exception(bad_lexical_cast(typeid(DATATYPES::TString),
                                         typeid(TYPE)));
    }
    return TYPE(result);
}

template<>
inline DATATYPES::THuge lexical_cast(const DATATYPES::TString& arg) {
    return lexical_castStringToNumeric<DATATYPES::THuge>(arg);
}

template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::THuge& arg) {
    return lexical_castNumericToString<DATATYPES::THuge>(arg, "%li");
}

template<>
inline DATATYPES::TUHuge lexical_cast(const DATATYPES::TString& arg) {
    return lexical_castStringToNumeric<DATATYPES::TUHuge>(arg);
}

template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::TUHuge& arg) {
    return lexical_castNumericToString<DATATYPES::TUHuge>(arg, "%lu");
}

template<>
inline DATATYPES::TDouble lexical_cast(const DATATYPES::TString& arg) {
    return lexical_castStringToNumeric<DATATYPES::TDouble>(arg);
}

template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::TDouble& arg) {
    return lexical_castNumericToString<DATATYPES::TDouble>(arg, "%f");
}

} // end namespace boost