C: Splitting an alphabetically sorted list into equal chunks

Question

Asked by mforrest

I've been tasked with splitting a directory of names into four (approximately) equal chunks. The directory is effectively a phonebook that is already alphabetised. The solution has to be generic and work for any directory, not just one specific one. If it helps, the directory is an array of strings.

For example the four chunks for one directory could be:

A-E, F-L, M-S and T-Z

Whilst another could be:

A-B, C-D, E-F and G-Z

I've already considered dividing the size of the directory by 4, then counting upwards until reaching that number and noting the letter that entry starts with, but that isn't particularly elegant.

What I mean by this is: take the directory to be 100 entries. I could divide this by 4 to get 25 (how many entries should be in each chunk). Going through the entries to the 25th and then taking that entry should give me the last entry in the first chunk. However, this doesn't work when the number of entries for each letter of the alphabet varies greatly. A-J could each have one entry and K could have 32 entries, which would make my process useless.
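
To make the counting approach concrete, here is a minimal C sketch of it. It assumes the directory is a sorted array of non-empty strings with at least four entries; the function name `naive_cut_letters` is made up for this illustration and is not from any library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: for each of the first three quarters of a sorted
 * directory, record the initial letter of the last entry in that quarter. */
void naive_cut_letters(const char *const *names, size_t n, char cuts[3])
{
    assert(n >= 4);
    for (size_t k = 1; k <= 3; k++) {
        size_t last = k * n / 4 - 1;   /* index of the last entry in quarter k */
        cuts[k - 1] = (char)toupper((unsigned char)names[last][0]);
    }
}
```

For the eight names Adams, Baker, Kane, Keen, Kent, Kerr, King, Kirk this gives the cut letters B, K, K. Two of the three boundaries land inside K, which is exactly the failure described above.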

Pseudocode would be more helpful than a specific C implementation, but really any pointer in the right direction would be a great help.

Answer 1

Answered by ecatmur

This is an optimization problem in three variables: the boundaries between the four chunks. If we denote the boundaries x, y, z, with the chunks being the half-open intervals A-x, x-y, y-z, z-Z, then the only further constraint is that x <= y <= z, giving 3276 possibilities for x, y, z (for a 26-letter alphabet), which is trivial to search exhaustively.

Then all you need is a way to score one configuration as better or worse than another. I'd suggest using the sum of squared errors, e.g. for chunk lengths 20, 26, 32, 24 and a target of 25 the squared error would be (20-25)^2 + (26-25)^2 + (32-25)^2 + (24-25)^2 = 76.
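
As a small sketch of that scoring rule in C (the function name `squared_error` and its signature are assumptions for this example, and the target chunk length is passed in by the caller):

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared differences between each chunk length and the target
 * length. Lower is better; 0 means every chunk hits the target exactly. */
long squared_error(const int lengths[], size_t n_chunks, int target)
{
    assert(n_chunks > 0);
    long error = 0;
    for (size_t i = 0; i < n_chunks; i++) {
        long d = (long)lengths[i] - target;
        error += d * d;
    }
    return error;
}
```

Calling it with the lengths 20, 26, 32, 24 and a target of 25 reproduces the value 76 from the example.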

Putting this together, you can write the exhaustive search with nested loops:

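# lengths[i] = number of entries starting with letter i
target = total_entries / 4        # 25 for a 100-entry directory
best, best_error = Nothing, +Inf
for A <= x <= Z:
    for x <= y <= Z:
        for y <= z <= Z:
            # the first chunk includes A itself
            error = (sum(lengths[i] for A <= i <= x) - target)^2
                  + (sum(lengths[i] for x < i <= y) - target)^2
                  + (sum(lengths[i] for y < i <= z) - target)^2
                  + (sum(lengths[i] for z < i <= Z) - target)^2
            if error < best_error:
                best, best_error = (x, y, z), error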
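
One way the nested-loop search could look in C is sketched below. It assumes the per-letter counts have already been tallied into a 26-element array (index 0 for A, 25 for Z); `best_split` and `LETTERS` are names invented for this example. Prefix sums replace the repeated `sum(...)` calls, and the error is computed as the sum of (4 * length - total)^2, which is 16 times the squared error against total/4, so it ranks splits identically while staying in integers.

```c
#include <assert.h>
#include <limits.h>

#define LETTERS 26

/* Exhaustively tries every x <= y <= z and stores in cuts[] the index of
 * the last letter of chunks 1, 2 and 3 for the most even split.
 * Returns the (scaled by 16) squared error of that split. */
long long best_split(const int counts[LETTERS], int cuts[3])
{
    /* prefix[i] = number of entries whose initial is among the first i letters */
    long long prefix[LETTERS + 1] = {0};
    for (int i = 0; i < LETTERS; i++) {
        assert(counts[i] >= 0);
        prefix[i + 1] = prefix[i] + counts[i];
    }
    long long total = prefix[LETTERS];
    long long best_error = LLONG_MAX;

    for (int x = 0; x < LETTERS; x++) {
        for (int y = x; y < LETTERS; y++) {
            for (int z = y; z < LETTERS; z++) {
                long long len[4] = {
                    prefix[x + 1],                  /* A..x      */
                    prefix[y + 1] - prefix[x + 1],  /* x+1..y    */
                    prefix[z + 1] - prefix[y + 1],  /* y+1..z    */
                    total - prefix[z + 1]           /* z+1..Z    */
                };
                long long error = 0;
                for (int k = 0; k < 4; k++) {
                    long long d = 4 * len[k] - total;
                    error += d * d;
                }
                if (error < best_error) {
                    best_error = error;
                    cuts[0] = x;
                    cuts[1] = y;
                    cuts[2] = z;
                }
            }
        }
    }
    return best_error;
}
```

On ties the first split found is kept, so a directory whose entries all start with one letter still returns a valid (if useless) set of cuts rather than failing.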

Answer 2

Answered by Tamil Maran

The directory is already sorted, so you can easily split it into four if you allow extra letters in the keys (A-Ae, Af-Az, etc.). The basic idea is:

Store your dictionary in some data structure (say, an array) in sorted order.
Now divide the length of the array by four to get four indices.
At each index, compare the letters of the word there with the word at the previous index.
- For example, if the first index word is "Abandon" and the second index word is "Ascension", then the first key would be "Ab" and the second would be "As".
- So all the words within the range (Ab-As) will lie between those keys.
- If the first two letters are the same, as you would expect when the dictionary is unevenly distributed, use an additional letter for the key (e.g. Aba - Abs).
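
The key comparison in the steps above can be sketched in C roughly as follows. This is only an illustration: `key_length` and `make_key` are hypothetical helper names, and the code assumes both words use the same letter case. The key length is taken to be the common prefix of the two index words plus one more letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the shortest prefix that tells a and b apart:
 * the length of their common prefix plus one more letter. */
size_t key_length(const char *a, const char *b)
{
    size_t len = 0;
    while (a[len] != '\0' && a[len] == b[len])
        len++;
    return len + 1;
}

/* Copies at most len characters of word into key, truncating to fit
 * key_size; the result is always NUL-terminated. */
void make_key(const char *word, size_t len, char *key, size_t key_size)
{
    assert(key_size > 0);
    size_t word_len = strlen(word);
    if (len > word_len)
        len = word_len;
    if (len > key_size - 1)
        len = key_size - 1;
    memcpy(key, word, len);
    key[len] = '\0';
}
```

For "Abandon" and "Ascension" the key length is 2, giving the keys "Ab" and "As"; for "Abandon" and "Absent" it is 3, giving "Aba" and "Abs". In a full solution the two words would be the entries at consecutive quarter indices of the sorted array. If both index words are identical, no prefix can separate them, so that case would need separate handling.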
