java 哈夫曼树编码

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3311432/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-30 01:14:52  来源:igfitidea点击:

Huffman Tree Encoding

javatraversalhuffman-code

提问by Zieklecknerizer

My Huffman tree which I had asked about earlier has another problem! Here is the code:

我之前问过的霍夫曼树有另一个问题!这是代码:

package huffman;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;

public class Huffman {

    public ArrayList<Frequency> fileReader(String file)
    {
        ArrayList<Frequency> al = new ArrayList<Frequency>();
        Scanner s;
        try {

            s = new Scanner(new FileReader(file)).useDelimiter("");
            while (s.hasNext())
            {
                boolean found = false;
                int i = 0;
                String temp = s.next();
                while(!found)
                {


                    if(al.size() == i && !found)
                    {
                        found = true;
                        al.add(new Frequency(temp, 1));
                    }
                    else if(temp.equals(al.get(i).getString()))
                    {
                        int tempNum = al.get(i).getFreq() + 1;
                        al.get(i).setFreq(tempNum);
                        found = true;
                    }
                    i++;

                }



            }
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return al;
    }
    public Frequency buildTree(ArrayList<Frequency> al)
    {
        Frequency r = al.get(1);
        PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
        for(int i = 0; i < al.size(); i++)
        {
            pq.add(al.get(i));          
        }
        /*while(pq.size() > 0)
        {
            System.out.println(pq.remove().getString());
        }*/

        for(int i = 0; i < al.size() - 1; i++)
        {   
           Frequency p = pq.remove(); 
           Frequency q = pq.remove();
           int temp = p.getFreq() + q.getFreq();
           r = new Frequency(null, temp);
           r.left = p; 
           r.right = q;
           pq.add(r);  // put in the correct place in the priority queue
        }
        pq.remove(); // leave the priority queue empty
        return(r); // this is the root of the tree built

    }

    public void inOrder(Frequency top)
    {
        if(top == null)
        {
            return;
        }
        else
        {
            inOrder(top.left);
            System.out.print(top.getString() +", ");
            inOrder(top.right);
            return;
        }
    }
    public void printFreq(ArrayList<Frequency> al)
    {
        for(int i = 0; i < al.size(); i++)
        {
            System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
        }
    }

}

What needs to be done now is I need to create a method that will search through the tree to find the binary code (011001 etc) to the specific character. What is the best way to do this? I thought maybe I would do a normal search through the tree as if it were an AVL tree going to the right if its bigger or left if it's smaller.

现在需要做的是我需要创建一个方法来搜索树以找到特定字符的二进制代码(011001 等)。做这个的最好方式是什么?我想也许我会在树中进行正常搜索,就好像它是一棵 AVL 树,如果它变大就向右,如果变小则向左。

But because the nodes don't use ints doubles etc. but only using objects that contain characters as strings or null to signify its not a leaf but only a root. The other option would be to do an in-order run through to find the leaf that I'm looking for but at the same time how would I determine if I went right so many times or left so many times to get the character.

但是因为节点不使用 ints doubles 等,而只使用包含字符作为字符串或 null 的对象来表示它不是叶子而是根。另一种选择是按顺序运行以找到我正在寻找的叶子,但同时我将如何确定我是向右走了多少次还是离开了这么多次来获得角色。

package huffman;

public class Frequency implements Comparable  {
    private String s;
    private int n;
    public Frequency left;
    public Frequency right;

    Frequency(String s, int n)
    {
        this.s = s;
        this.n = n;
    }
    public String getString()
    {
        return s;
    }
    public int getFreq()
    {
        return n;
    }
    public void setFreq(int n)
    {
        this.n = n;
    }
    @Override
    public int compareTo(Object arg0) {
        Frequency other = (Frequency)arg0;

        return n < other.n ? -1 : (n == other.n ? 0 : 1);
    }
}

What I'm trying to do is find the binary code to actually get to each character. So if I were trying to encode aabbbcccchow would I create a string holding the binary code for a going left is 0 and going right is 1.

我想要做的是找到二进制代码来实际到达每个字符。因此,如果我尝试编码,aabbbcccc我将如何创建一个字符串,其中包含向左移动的二进制代码为 0,向右移动为 1。

What has me confused is because you can't determine where anything is because the tree is obviously unbalanced and there is no determining if a character is right or left of where you are. So you have to search through the whole tree but if you get to a node that isn't what you are looking for, you have backtrack to another root to get to the other leaves.

让我感到困惑的是因为你无法确定任何东西在哪里,因为树显然是不平衡的,并且无法确定角色是在你所在位置的右边还是左边。因此,您必须搜索整个树,但是如果您到达的节点不是您要查找的节点,则必须回溯到另一个根以到达其他叶子。

回答by corsiKa

Remember, if you have 1001, you will never have a 10010 or 10011. So your basic method looks like this (in pseudocode):

记住,如果你有 1001,你永远不会有 10010 或 10011。所以你的基本方法看起来像这样(伪代码):

if(input == thisNode.key) return thisNode.value
if(input.endsWith(1)) return search(thisNode.left)
else return search(thisNode.right)

I didn't read your program to figure out how to integrate it, but that's a key element of huffman encoding in a nutshell

我没有阅读你的程序来弄清楚如何集成它,但简而言之,这是霍夫曼编码的一个关键元素

Try something like this - you're trying to find token. So if you wanted to find the String for "10010", you'd do search(root,"10010")

尝试这样的事情 - 你试图找到令牌。所以如果你想找到“10010”的字符串,你会做 search(root,"10010")

String search(Frequency top, String token) {
    return search(top,token,0);
}

// depending on your tree, you may have to switch top.left and top.right
String search(Frequency top, String token, int depth) {
    if(token.length() == depth) return "NOT FOUND";
    if(token.length() == depth - 1) return top.getString();
    if(token.charAt(depth) == '0') return search(top.left,token,depth+1);
    else return search(top.right,token,depth+1);

}

回答by Cheery

Traverse through the huffman tree nodes to get a map like {'a': "1001", 'b': "10001"} etc. You can use this map to get the binary code to a specific character.

遍历霍夫曼树节点,得到一个像 {'a': "1001", 'b': "10001"} 等的映射。你可以使用这个映射来获取特定字符的二进制代码。

If you need to do in reverse, just handle it as a state machine:

如果需要反向操作,只需将其作为状态机处理即可:

state = huffman_root
for each bit
    if (state.type == 'leaf')
        output(state.data);
        state = huffman_root
    state = state.leaves[bit]

Honestly said, I didn't look into your code. It ought be pretty obvious what to do with the fancy tree.

老实说,我没有仔细研究你的代码。如何处理这棵花式树应该很明显了。

回答by quiet_penguin

I considered two options when I was having a go at Huffman coding encoding tree.

当我尝试使用 Huffman 编码编码树时,我考虑了两种选择。

option 1: use pointer based binary tree. I coded most of this and then felt that, to trace up the tree from the leaf to find an encoding, I needed parent pointers. other wise, like mentioned in this post, you do a search of the tree which is not a solution to finding the encoding straight away. The disadvantage of the pointer based tree is that, I have to have 3 pointers for every node in the tree which I thought was too much. The code to follow the pointers is simple but more complicated that in option 2.

选项 1:使用基于指针的二叉树。我编写了大部分代码,然后觉得要从叶子跟踪树以找到编码,我需要父指针。否则,就像这篇文章中提到的那样,您搜索树并不是立即找到编码的解决方案。基于指针的树的缺点是,我认为树中的每个节点都必须有 3 个指针,我认为这太多了。跟随指针的代码很简单,但比选项 2 中的更复杂。

option 2: use an array based tree to represent the encoding tree that you will use on the run to encode and decode. so if you want the encoding of a character, you find the character in the array. Pretty straight forward, I use a table so smack right and there I get the leaf. now I trace up to the root which is at index 1 in the array. I do a (current_index / 2) for the parent. if child index is parent /2 it is a left and otherwise right.

选项 2:使用基于数组的树来表示您将在运行中用于编码和解码的编码树。所以如果你想要一个字符的编码,你可以在数组中找到这个字符。很直接,我用了一张桌子,所以我得到了叶子。现在我追踪到数组中索引 1 处的根。我为父母做了一个 (current_index / 2)。如果子索引是父/2,则为左索引,否则为右索引。

option 2 was pretty easy to code up and although the array can have a empty spaces. I thought it was better in performance than a pointer based tree. Besides identifying the root and leaf now is a matter of indices rather than object type. ;) This will also be very usefull if you have to send your tree!?

选项 2 很容易编码,尽管数组可以有空格。我认为它的性能比基于指针的树更好。除了识别根和叶现在是索引而不是对象类型的问题。;) 如果您必须发送您的树,这也将非常有用!?

also, you dont search (root, 10110) while decoding the Huffman code. You just walk the tree through the stream of encoded bitstream, take a left or right based on your bit and when you reach the leaf, you output the character.

此外,您在解码霍夫曼代码时不搜索 (root, 10110)。您只需在编码的比特流流中遍历树,根据您的位向左或向右走,当您到达叶子时,您输出字符。

Hope this was helpful.

希望这是有帮助的。

Harisankar Krishna Swamy (example)

Harisankar Krishna Swamy(示例

回答by Kevin Coulombe

I guess your homework is either done or very late by now, but maybe this will help someone else.

我想你的作业现在要么完成要么很晚,但也许这对其他人有帮助。

It's actually pretty simple. You create a tree where 0 goes right and 1 goes left. Reading the stream will navigate you through the tree. When you hit a leaf, you found a letter and start over from the beginning. Like glowcoder said, you will never have a letter on a non-leaf node. The tree also covers every possible sequence of bits. So navigating in this way always works no matter the encoded input.

其实很简单。您创建一棵树,其中 0 向右,1 向左。阅读流将引导您浏览树。当你击中一片叶子时,你找到了一封信,然后从头开始。就像glowcoder 说的,你永远不会在非叶节点上有一个字母。该树还涵盖了所有可能的位序列。因此,无论编码输入如何,以这种方式导航始终有效。

I had an assignment to write an huffman encoder/decoder just like you a while ago and I wrote a blog post with the code in Java and a longer explanation : http://www.byteauthor.com/2010/09/huffman-coding-in-java/

不久前我有一个任务要像你一样编写一个霍夫曼编码器/解码器,我写了一篇博客文章,其中包含 Java 代码和更长的解释:http: //www.byteauthor.com/2010/09/huffman-coding -in-java/

PS. Most of the explanation is on serializing the huffman tree with the least possible number of bits but the encoding/decoding algorithms are pretty straightforward in the sources.

附注。大多数解释是用尽可能少的位数序列化霍夫曼树,但编码/解码算法在源代码中非常简单。

回答by Wilfred Springer