C++ 家谱软件中的循环
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6163683/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Cycles in family tree software
提问by Partick H?se
I am the developer of some family tree software (written in C++ and Qt). I had no problems until one of my customers mailed me a bug report. The problem is that the customer has two children with their own daughter, and, as a result, he can't use my software because of errors.
我是一些家谱软件(用 C++ 和 Qt 编写)的开发人员。在我的一位客户寄给我一份错误报告之前,我没有遇到任何问题。问题是客户有两个孩子和他们自己的女儿,结果他因为错误而无法使用我的软件。
Those errors are the result of my various assertions and invariants about the family graph being processed (for example, after walking a cycle, the program states that X can't be both father and grandfather of Y).
这些错误是我关于正在处理的家庭图的各种断言和不变量的结果(例如,在走了一个周期后,程序声明 X 不能同时是 Y 的父亲和祖父)。
How can I resolve those errors without removing all data assertions?
如何在不删除所有数据断言的情况下解决这些错误?
采纳答案by Bert Goethals
It seems you (and/or your company) have a fundamental misunderstanding of what a family tree is supposed to be.
您(和/或您的公司)似乎对家谱应该是什么有根本的误解。
Let me clarify, I also work for a company that has (as one of its products) a family tree in its portfolio, and we have been struggling with similar problems.
让我澄清一下,我也在一家公司工作,该公司的产品组合中有(作为其产品之一)家谱,我们一直在努力解决类似的问题。
The problem, in our case, and I assume your case as well, comes from the GEDCOMformat that is extremely opinionated about what a family should be. However this format contains some severe misconceptions about what a family tree really looks like.
问题,在我们的情况下,我也假设你的情况,来自GEDCOM格式,该格式对家庭应该是什么非常固执己见。然而,这种格式包含一些对家谱真正是什么样子的严重误解。
GEDCOM has many issues, such as incompatibility with same sex relations, incest, etc... Which in real life happens more often than you'd imagine (especially when going back in time to the 1700-1800).
GEDCOM 有很多问题,例如与同性关系不相容、乱伦等……在现实生活中,这种情况比您想象的更频繁(尤其是回到 1700-1800 年时)。
We have modeled our family tree to what happens in the real world: Events (for example, births, weddings, engagement, unions, deaths, adoptions, etc.). We do not put any restrictions on these, except for logically impossible ones (for example, one can't be one's own parent, relations need two individuals, etc...)
我们已经将我们的家谱建模为现实世界中发生的事情:事件(例如,出生、婚礼、订婚、结合、死亡、收养等)。我们对这些没有任何限制,除了逻辑上不可能的那些(例如,一个人不能是自己的父母,关系需要两个人,等等......)
The lack of validations gives us a more "real world", simpler and more flexible solution.
缺乏验证为我们提供了一个更“真实的世界”、更简单和更灵活的解决方案。
As for this specific case, I would suggest removing the assertions as they do not hold universally.
至于这个特定案例,我建议删除断言,因为它们并不普遍适用。
For displaying issues (that will arise) I would suggest drawing the same node as many times as needed, hinting at the duplication by lighting up all the copies on selecting one of them.
为了显示问题(会出现),我建议根据需要多次绘制相同的节点,通过在选择其中一个时点亮所有副本来暗示重复。
回答by Ben Voigt
Relax your assertions.
放松你的主张。
Not by changing the rules, which are mostly likely very helpful to 99.9% of your customers in catching mistakes in entering their data.
不是通过更改规则,这很可能对 99.9% 的客户在输入数据时发现错误非常有帮助。
Instead, change it from an error "can't add relationship" to a warning with an "add anyway".
相反,将其从错误“无法添加关系”更改为带有“仍然添加”的警告。
回答by exDM69
Here's the problem with family trees: they are not trees. They are directed acyclic graphs or DAGs. If I understand the principles of the biology of human reproduction correctly, there will not be any cycles.
这是家谱的问题:它们不是树。它们是有向无环图或 DAG。如果我正确理解了人类生殖生物学的原理,就不会有任何循环。
As far as I know, even the Christians accept marriages (and thus children) between cousins, which will turn the family tree into a family DAG.
据我所知,即使是基督徒也接受表亲之间的婚姻(以及孩子),这将把家谱变成家庭 DAG。
The moral of the story is: choose the right data structures.
这个故事的寓意是:选择正确的数据结构。
回答by Eduard Thamm
I guess that you have some value that uniquely identifies a person on which you can base your checks.
我猜您有一些值可以唯一标识一个人,您可以根据该值进行检查。
This is a tricky one. Assuming you want to keep the structure a tree, I suggest this:
这是一个棘手的问题。假设您想将结构保持为一棵树,我建议这样做:
Assume this: A
has kids with his own daughter.
假设:A
有孩子和他自己的女儿。
A
adds himself to the program as A
and as B
. Once in the role of father, let's call it boyfriend.
A
将自己添加到程序A
中B
。一旦扮演父亲的角色,我们就称其为男朋友。
Add a is_same_for_out()
function which tells the output generating part of your program that all links going to B
internally should be going to A
on presentation of data.
添加一个is_same_for_out()
函数,它告诉程序的输出生成部分,所有指向B
内部的链接都应该指向A
数据的呈现。
This will make some extra work for the user, but I guess IT would be relatively easy to implement and maintain.
这将为用户带来一些额外的工作,但我想 IT 会相对容易实现和维护。
Building from that, you could work on code synching A
and B
to avoid inconsistencies.
以此为基础,您可以进行代码同步A
并B
避免不一致。
This solution is surely not perfect, but is a first approach.
这个解决方案肯定不是完美的,但它是第一种方法。
回答by christopheml
You should focus on what really makes value for your software. Is the time spent on making it work for ONE consumer worth the price of the license ? Likely not.
您应该专注于对您的软件真正有价值的东西。花在让它为一个消费者服务上所花费的时间是否值得许可证的价格?可能不是。
I advise you to apologize to this customer, tell him that his situation is out of scope for your software and issue him a refund.
我建议你向这个客户道歉,告诉他他的情况超出了你的软件的范围,并给他退款。
回答by user779752
回答by Tim Post
This is one of the reasons why languages like "Go" do not have assertions. They are used to handle cases that you probably didn't think about, all too often. You should only assert the impossible, not simply the unlikely. Doing the latter is what gives assertions a bad reputation. Every time you type assert(
, walk away for ten minutes and reallythink about it.
这就是像“Go”这样的语言没有断言的原因之一。它们用于处理您可能没有经常考虑的情况。你应该只断言不可能的,而不是简单的不可能的。做后者是使断言名声不佳的原因。每次你打字assert(
,走开十分钟,好好想想。
In your particularly disturbing case, it is both conceivable and appalling that such an assertion would be bogus under rare but possible circumstances. Hence, handle it in your app, if only to say "This software was not designed to handle the scenario that you presented".
在你特别令人不安的情况下,这种断言在极少数但可能的情况下是虚假的,这是可以想象的和令人震惊的。因此,请在您的应用程序中处理它,如果只是说“该软件不是为处理您所呈现的场景而设计的”。
Asserting that your great, great, great grandfather being your father as impossible is a reasonable thing to do.
断言你的曾、曾、曾祖父不可能成为你的父亲是一件合理的事情。
If I was working for a testing company that was hired to test your software, of course I would have presented that scenario. Why? Every juvenile yet intelligent 'user' is going to do the exact same thingand relish in the resulting 'bug report'.
如果我在受雇来测试您的软件的测试公司工作,我当然会提出这种情况。为什么?每个幼稚但聪明的“用户”都会做完全相同的事情,并津津乐道于由此产生的“错误报告”。
回答by Sean
I hate commenting on such a screwed up situation, but the easiest way to not rejigger all of your invariants is to create a phantom vertex in your graph that acts as a proxy back to the incestuous dad.
我讨厌评论这种搞砸了的情况,但不重新调整所有不变量的最简单方法是在您的图中创建一个幻影顶点,作为回到乱伦父亲的代理。
回答by tfinniga
So, I've done some work on family tree software. I think the problem you're trying to solve is that you need to be able to walk the tree without getting in infinite loops - in other words, the tree needs to be acyclical.
所以,我已经做了一些关于家谱软件的工作。我认为您要解决的问题是您需要能够在不陷入无限循环的情况下遍历树 - 换句话说,树需要是非循环的。
However, it looks like you're asserting that there is only one path between a person and one of their ancestors. That will guarantee that there are no cycles, but is too strict. Biologically speaking, descendancy is a directed acyclic graph(DAG). The case you have is certainly a degenerate case, but that type of thing happens all the time on larger trees.
但是,您似乎在断言一个人和他们的一位祖先之间只有一条路。这将保证没有循环,但过于严格。从生物学上讲,后代是一个有向无环图(DAG)。你的情况肯定是一个退化的情况,但这种事情总是发生在较大的树上。
For example, if you look at the 2^n ancestors you have at generation n, if there was no overlap, then you'd have more ancestors in 1000 AD than there were people alive. So, there's got to be overlap.
例如,如果您查看第 n 代的 2^n 个祖先,如果没有重叠,那么公元 1000 年您的祖先将比活着的人还多。所以,必须有重叠。
However, you also do tend to get cycles that are invalid, just bad data. If you're traversing the tree, then cycles must be dealt with. You can do this in each individual algorithm, or on load. I did it on load.
但是,您也确实会得到无效的周期,只是坏数据。如果您正在遍历树,则必须处理循环。您可以在每个单独的算法中或在加载时执行此操作。我是带负载做的。
Finding true cycles in a tree can be done in a few ways. The wrong way is to mark every ancestor from a given individual, and when traversing, if the person you're going to step to next is already marked, then cut the link. This will sever potentially accurate relationships. The correct way to do it is to start from each individual, and mark each ancestor with the path to that individual. If the new path contains the current path as a subpath, then it's a cycle, and should be broken. You can store paths as vector<bool> (MFMF, MFFFMF, etc.) which makes the comparison and storage very fast.
可以通过几种方式在树中找到真正的循环。错误的方法是标记来自给定个体的每个祖先,并且在遍历时,如果您要下一个步骤的人已经被标记,则切断链接。这将切断潜在的准确关系。正确的做法是从每个人开始,用通向那个人的路径标记每个祖先。如果新路径包含当前路径作为子路径,那么它是一个循环,应该被打破。您可以将路径存储为 vector<bool>(MFMF、MFFFMF 等),这使得比较和存储非常快。
There are a few other ways to detect cycles, such as sending out two iterators and seeing if they ever collide with the subset test, but I ended up using the local storage method.
还有其他几种检测循环的方法,例如发送两个迭代器并查看它们是否与子集测试发生冲突,但我最终使用了本地存储方法。
Also note that you don't need to actually sever the link, you can just change it from a normal link to a 'weak' link, which isn't followed by some of your algorithms. You will also want to take care when choosing which link to mark as weak; sometimes you can figure out where the cycle should be broken by looking at birthdate information, but often you can't figure out anything because so much data is missing.
另请注意,您不需要实际切断链接,您只需将其从正常链接更改为“弱”链接,您的某些算法不会遵循该链接。在选择将哪个链接标记为弱链接时,您还需要注意;有时您可以通过查看出生日期信息来找出应该在哪里打破循环,但通常您无法弄清楚任何事情,因为缺少太多数据。
回答by clvrmnky
Another mock serious answer for a silly question:
另一个愚蠢的问题的模拟严肃回答:
The real answer is, use an appropriate data structure. Human genealogy cannot fully be expressed using a pure tree with no cycles. You should use some sort of graph. Also, talk to an anthropologist before going any further with this, because there are plenty of other places similar errors could be made trying to model genealogy, even in the most simple case of "Western patriarchal monogamous marriage."
真正的答案是,使用适当的数据结构。使用没有循环的纯树无法完全表达人类谱系。你应该使用某种图表。此外,在进一步讨论这个问题之前,请先咨询人类学家,因为在尝试建立谱系模型时,还有很多其他地方可能会犯类似的错误,即使在最简单的“西方父权制一夫一妻制婚姻”的情况下也是如此。
Even if we want to ignore locally taboo relationships as discussed here, there are plenty of perfectly legal and completely unexpected ways to introduce cycles into a family tree.
即使我们想忽略这里讨论的局部禁忌关系,也有很多完全合法且完全出乎意料的方法可以将循环引入家谱。
For example: http://en.wikipedia.org/wiki/Cousin_marriage
例如:http: //en.wikipedia.org/wiki/Cousin_marriage
Basically, cousin marriage is not only common and expected, it is the reason humans have gone from thousands of small family groups to a worldwide population of 6 billion. It can't work any other way.
基本上,表亲结婚不仅是普遍的和意料之中的,这也是人类从数千个小家庭群体发展到全球 60 亿人口的原因。它不能以任何其他方式工作。
There really are very few universals when it comes to genealogy, family and lineage. Almost any strict assumption about norms suggesting who an aunt can be, or who can marry who, or how children are legitimized for the purpose of inheritance, can be upset by some exception somewhere in the world or history.
当涉及到家谱、家庭和血统时,真的很少有共通之处。几乎任何关于规范的严格假设,暗示谁可以成为阿姨,或者谁可以嫁给谁,或者为了继承的目的如何使孩子合法化,都可能被世界某个地方或历史上的某些例外所打乱。