Converting C source code to C++

Question

Asked by Barry Kelly

How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?

The kind of C I have in mind is split into files roughly corresponding to modules (i.e. less granular than a typical OO class-based decomposition), using internal linkage in lieu of private functions and data, and external linkage for public functions and data. Global variables are used extensively for communication between the modules. There is a very extensive integration test suite available, but no unit (i.e. module) level tests.

I have in mind a general strategy:

Compile everything in C++'s C subset and get that working.
Convert modules into huge classes, so that all the cross-references are scoped by a class name, but leaving all functions and data as static members, and get that working.
Convert huge classes into instances with appropriate constructors and initialized cross-references; replace static member accesses with indirect accesses as appropriate; and get that working.
Now, approach the project as an ill-factored OO application, and write unit tests where dependencies are tractable, and decompose into separate classes where they are not; the goal here would be to move from one working program to another at each transformation.
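
As a rough illustration of the second and third steps, here is a minimal sketch of one module at two stages. The module name `Symtab` and its single function are hypothetical and not taken from the code base in question; the point is only that the function body stays identical while the state moves from static members to per-instance members.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Step 2 (hypothetical module): the former .c file becomes one class and
// everything stays static, so callers only gain a class-name qualifier.
class SymtabStatic {
public:
    static int define(const std::string& name) {
        auto result = table_.insert(std::make_pair(name, next_id_));
        if (result.second) ++next_id_;   // new name: consume an id
        return result.first->second;     // id of the (possibly existing) name
    }
private:                                  // was: internal linkage in the .c file
    static std::unordered_map<std::string, int> table_;
    static int next_id_;
};
std::unordered_map<std::string, int> SymtabStatic::table_;
int SymtabStatic::next_id_ = 0;

// Step 3: the same logic as an instance. State is owned per object, which is
// what later makes the module testable in isolation.
class Symtab {
public:
    int define(const std::string& name) {
        auto result = table_.insert(std::make_pair(name, next_id_));
        if (result.second) ++next_id_;
        return result.first->second;
    }
private:
    std::unordered_map<std::string, int> table_;
    int next_id_ = 0;
};
```

Two `Symtab` objects do not share ids, whereas every caller of `SymtabStatic::define` sees the same table, just as with the original globals.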

Obviously, this would be quite a bit of work. Are there any case studies / war stories out there on this kind of translation? Alternative strategies? Other useful advice?

Note 1: the program is a compiler, and probably millions of other programs rely on its behaviour not changing, so wholesale rewriting is pretty much not an option.

Note 2: the source is nearly 20 years old, and has perhaps 30% code churn (lines modified + added / previous total lines) per year. It is heavily maintained and extended, in other words. Thus, one of the goals would be to increase maintainability.

[For the sake of the question, assume that translation into C++ is mandatory, and that leaving it in C is not an option. The point of adding this condition is to weed out the "leave it in C" answers.]

Answer 1

Accepted answer by Head Geek

Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart structs" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-)

As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start -- once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.

We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.

The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)

Answer 2

Answer by Federico A. Ramponi

What about:

Compiling everything in C++'s C subset and getting that working, and
Implementing a set of facades, leaving the C code unaltered?

Why is "translation into C++ mandatory"? You can wrap the C code without the pain of converting it into huge classes and so on.
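
A minimal sketch of what such a facade might look like, assuming a legacy module that exposes one global counter and one function. The names `emit_byte`, `emit_bytes_written` and `Emitter` are made up for illustration; in a real project the `extern "C"` part would be the existing header, included unchanged.

```cpp
#include <cstddef>

// Stand-in for the unaltered C module (hypothetical names). Normally these
// would be declared in the existing C header and defined in the .c file.
extern "C" {
    unsigned long emit_bytes_written = 0;   // public global of the C module
    void emit_byte(unsigned char b) {
        (void)b;                            // a real module would write the byte
        ++emit_bytes_written;
    }
}

// C++ facade: new code talks to this class and never touches the global
// directly, so the module's internals can later change behind it.
class Emitter {
public:
    void byte(unsigned char b) { emit_byte(b); }
    void bytes(const unsigned char* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) emit_byte(data[i]);
    }
    unsigned long written() const { return emit_bytes_written; }
};
```

The facade adds no behaviour of its own; its only job is to become the single place through which the C state is reached.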

Answer 3

Answer by Ira Baxter

Your application has lots of folks working on it, and a need to not-be-broken. If you are serious about large scale conversion to an OO style, what you need is massive transformation tools to automate the work.

The basic idea is to designate groups of data as classes, and then get the tool to refactor the code to move that data into classes, move functions on just that data into those classes, and revise all accesses to that data to calls on the classes.

You can do an automated preanalysis to form statistical clusters to get some ideas, but you'll still need an application-aware engineer to decide what data elements should be grouped.

A tool that is capable of doing this task is our DMS Software Reengineering Toolkit. DMS has strong C parsers for reading your code, captures the C code as compiler abstract syntax trees, and (unlike a conventional compiler) can compute flow analyses across your entire 300K SLOC. DMS has a C++ front end that can be used as the "back" end; one writes transformations that map C syntax to C++ syntax.

A major C++ reengineering task on a large avionics system gives some idea of what using DMS for this kind of activity is like. See the technical papers at www.semdesigns.com/Products/DMS/DMSToolkit.html, specifically "Re-engineering C++ Component Models Via Automatic Program Transformation".

This process is not for the faint of heart. But then, anybody who would consider manually refactoring a large application is already not afraid of hard work.

Yes, I'm associated with the company, being its chief architect.

Answer 4

Answer by Ira Baxter

I would write C++ classes over the C interface. Not touching the C code will decrease the chance of messing up and quicken the process significantly.

Once you have your C++ interface up, it is a trivial task to copy and paste the code into your classes. As you mentioned, it is vital to do unit testing during this step.
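
To make "C++ classes over the C interface" concrete, here is a small sketch that wraps a handle-style C API in an RAII class. The `cbuf_*` functions are a hypothetical stand-in for an existing C module and are defined inline only so the example is self-contained; the wrapper itself does not modify them.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>

// Stand-in for an existing C API (hypothetical); left exactly as C would have it.
extern "C" {
    struct cbuf { char* data; std::size_t len; };

    cbuf* cbuf_create(void) {
        cbuf* b = static_cast<cbuf*>(std::malloc(sizeof(cbuf)));
        if (b) { b->data = nullptr; b->len = 0; }
        return b;
    }
    void cbuf_append(cbuf* b, const char* s) {
        std::size_t n = std::strlen(s);
        char* p = static_cast<char*>(std::realloc(b->data, b->len + n + 1));
        if (!p) return;                      // keep the old contents on failure
        std::memcpy(p + b->len, s, n + 1);   // copies the terminating '\0' too
        b->data = p;
        b->len += n;
    }
    void cbuf_destroy(cbuf* b) {
        if (b) { std::free(b->data); std::free(b); }
    }
}

// C++ class over the C interface: the constructor/destructor pair replaces
// manual create/destroy calls, and copying is disabled to avoid double frees.
class Buffer {
public:
    Buffer() : buf_(cbuf_create()) { if (!buf_) throw std::bad_alloc(); }
    ~Buffer() { cbuf_destroy(buf_); }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    void append(const std::string& s) { cbuf_append(buf_, s.c_str()); }
    std::size_t size() const { return buf_->len; }
    std::string str() const {
        return buf_->data ? std::string(buf_->data, buf_->len) : std::string();
    }
private:
    cbuf* buf_;
};
```

Once callers use `Buffer` only, the C functions can be folded into the class as private members without touching those callers.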

Answer 5

Answer by Paul Biggar

GCC is currently in mid-transition from C to C++. They started by moving everything into the common subset of C and C++, obviously. As they did so, they added warnings to GCC for everything they found, available under -Wc++-compat. That should get you through the first part of your journey.
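
For a feel of what "moving into the common subset" involves, the sketch below shows a few typical constructs that C accepts but C++ rejects, already rewritten so the file compiles as both. The struct and function are invented for illustration, and the comments describe the original C form; which of these a given GCC version reports under -Wc++-compat is best checked against its documentation.

```cpp
#include <stdlib.h>   /* the .h form is valid in both C and C++ */

enum color { RED, GREEN };

struct node {
    int klass;          /* was: int class;  ('class' is a C++ keyword) */
    enum color tint;
};

struct node* make_node(int kind) {
    /* C allows the implicit void* -> struct node* conversion;
       C++ requires the explicit cast. */
    struct node* n = (struct node*)malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->klass = kind;
    /* C allows assigning an int to an enum; C++ requires the cast. */
    n->tint = (enum color)(kind % 2);
    return n;
}
```

None of these changes alter behaviour when the file is still built as C, which is what makes this first step comparatively low-risk.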

For the latter parts, once you actually have everything compiling with a C++ compiler, I would focus on replacing things that have idiomatic C++ counterparts. For example, if you're using lists, maps, sets, bitvectors, hashtables, etc., which are defined using C macros, you will likely gain a lot by moving these to C++. Likewise with OO, you'll likely find benefits where you are already using a C OO idiom (like struct inheritance), and where C++ will afford greater clarity and better type checking on your code.
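
A small sketch of both replacements, using invented types: a tagged "struct inheritance" hierarchy expressed as a real class hierarchy, and a macro-generated hash table replaced by a standard container. Nothing here is taken from GCC or from the code base in the question.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Replaces the C idiom of embedding a base struct as the first member and
// switching on a kind tag: dispatch is now done by the virtual call.
struct Expr {
    virtual ~Expr() {}
    virtual int eval() const = 0;
};

struct Literal : Expr {
    explicit Literal(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};

struct Add : Expr {
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    int eval() const override { return lhs->eval() + rhs->eval(); }
    std::unique_ptr<Expr> lhs, rhs;   // children are freed automatically
};

// Replaces a macro-generated hash table keyed by name: key and value types
// are checked by the compiler instead of being cast from void*.
int count_uses(const std::vector<std::string>& names, const std::string& key) {
    std::unordered_map<std::string, int> uses;
    for (const std::string& n : names) ++uses[n];
    auto it = uses.find(key);
    return it == uses.end() ? 0 : it->second;
}
```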

Answer 6

Answer by andreas buykx

Besides how you want to start, there are probably two things to consider: what you want to focus on, and where you want to stop.

You state that there is a large code churn; this may be a key to focusing your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed. The mature/stable parts are apparently working well enough, so it is better to leave them as they are, except perhaps for some window dressing with facades etc.

Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.

The software I work on is a huge, old code base which was 'converted' from C to C++ years ago. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes, has never really taken off, I think for the following reasons:

There is no need to change code that is working and that does not need to be enhanced. Doing so introduces new bugs without adding functionality, and end users don't appreciate that;
It is very, very hard to refactor reliably. Many pieces of code are so large and also so vital that people hardly dare touch them. We have a fairly extensive suite of functional tests, but sufficient code coverage information is hard to get. As a result, it is difficult to establish whether there are already sufficient tests in place to detect problems during refactoring;
The ROI is difficult to establish. The end user will not benefit from refactoring, so it must be in reduced maintenance cost, which will increase initially because by refactoring you introduce new bugs in mature, i.e. fairly bug-free code. And the refactoring itself will be costly as well ...

NB. I suppose you know the book "Working Effectively with Legacy Code"?

Answer 7

Answer by Paul Nathan

Your list looks okay except I would suggest reviewing the test suite first and trying to get that as tight as possible before doing any coding.

Answer 8

Answer by Federico A. Ramponi

Let's throw out another stupid idea:

Compile everything in C++'s C subset and get that working.
Start with a module, convert it into a huge class, then into an instance, and build a C interface (identical to the one you started from) out of that instance. Let the remaining C code work with that C interface.
Refactor as needed, growing the OO subsystem out of C code one module at a time, and drop parts of the C interface when they become useless.
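
The second step above could look roughly like the following, assuming a hypothetical options module whose original C interface was `opt_set` and `opt_get`. The class holds the state; the `extern "C"` functions keep the old signatures so the remaining C code links against them unchanged.

```cpp
#include <string>
#include <unordered_map>

// New C++ implementation of a (hypothetical) options module.
class Options {
public:
    void set(const std::string& key, int value) { values_[key] = value; }
    int get(const std::string& key, int fallback) const {
        auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }
private:
    std::unordered_map<std::string, int> values_;
};

namespace {
// The single instance standing behind the legacy interface.
Options& options_instance() {
    static Options instance;
    return instance;
}
}  // namespace

// Same names and signatures as the original C functions, now thin forwarders.
extern "C" void opt_set(const char* key, int value) {
    options_instance().set(key, value);
}
extern "C" int opt_get(const char* key, int fallback) {
    return options_instance().get(key, fallback);
}
```

In a real conversion the forwarders should also stop C++ exceptions from propagating into C callers; that handling is left out here to keep the sketch short.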

Answer 9

Answer by Paul Biggar

You mention that your tool is a compiler, and that: "Actually, pattern matching, not just type matching, in the multiple dispatch would be even better".

You might want to take a look at maketea. It provides pattern matching for ASTs, as well as the AST definition from an abstract grammar, and visitors, transformers, etc.

Answer 10

Answer by Sridhar Iyer

Here's what I would do:

Since the code is 20 years old, scrap the parser/syntax analyzer and replace it with newer C++ code based on lex/yacc/bison (or anything similar), which is much more maintainable and easier to understand. It is also faster to develop if you have a BNF handy.
Once this is retrofitted to the old code, start wrapping modules into classes. Replace global/shared variables with interfaces.
Now what you have will be a compiler in C++ (not quite though).
Draw a class diagram of all the classes in your system, and see how they are communicating.
Draw another one using the same classes and see how they ought to communicate.
Refactor the code to transform the first diagram to the second. (this might be messy and tricky)
Remember to use C++ code for all new code added.
If you have some time left, try replacing data structures one by one to use the more standardized STL or Boost.
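
As one possible reading of "replace global/shared variables with interfaces", the sketch below swaps a shared warning counter for an abstract interface that is passed to the code that needs it. All class names are hypothetical. A side benefit is that a recording implementation can stand in for the real one in a unit test.

```cpp
#include <string>
#include <vector>

// Before (C, shown for context): modules incremented a shared
// `extern int g_warning_count;` directly.

// After: the dependency is an explicit interface.
class DiagnosticSink {
public:
    virtual ~DiagnosticSink() {}
    virtual void warn(const std::string& msg) = 0;
};

// Test double that simply records what was reported.
class RecordingSink : public DiagnosticSink {
public:
    void warn(const std::string& msg) override { messages.push_back(msg); }
    std::vector<std::string> messages;
};

// A module that used to touch the global now receives the sink it reports to.
class ConstantFolder {
public:
    explicit ConstantFolder(DiagnosticSink& sink) : sink_(sink) {}
    int divide(int a, int b) {
        if (b == 0) {
            sink_.warn("division by zero");
            return 0;   // illustrative fallback value only
        }
        return a / b;
    }
private:
    DiagnosticSink& sink_;
};
```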
