Hello friends, welcome to Gate Smashers. In this video, we are going to discuss about Huffman coding. And in this video, we are going to discuss
all the important points related to Huffman coding which will be very beneficial for your competitive
exams, even college or university level exams. So guys, like the video quickly, subscribe the
channel if you haven't done it yet. And please press the bell button so
that you get all the latest notifications. So let's start with Huffman coding. The first important point is, it is a greedy technique. And we use this to encode data and compress the data.
Now actually, what is data encode or data compression? How do we do it? Why do we use Huffman coding? I want to explain all these points with a simple example. So here, what I have in the example is, a message M which is made of 100 characters. Now in these 100 characters, we have some alphabets A, B, C, D, E, F. And we have a frequency that is
A is 50, B is 10, C is 30, D is 5, E is 3, F is 2. We have this frequency, so you will
total how many characters will be made? 100. Now what I have to do is, encode it. Encoding means I have to digitize it, write it in the form of 0 and 1. Obviously, our computer system works in digital form. Even if I have to communicate, means if I have to send it to someone, obviously, then what I will do is, I will
convert it into encoding and send it.
So here, the most basic encoding technique is, ASCII encoding. Now in ASCII encoding, the range we have is 0 to 127. Means in total, we have 128 characters. And if I have to represent 0 to 127 numbers in binary, then how many bits do we use? 7 bits. How did you know? In the maximum range, in this whole
range, what is the maximum element? 127. And what is 127? 1111111. Means 7 times 1. So what will be 0? 7 times 0. So here if I have to represent A, then how many bits will I need to represent A? 7. For B, C, D, how many bits will it take for all? 7. Means to represent one character. To represent one character, A or B, to represent anyone, if it is 7 bits, then to represent 100 characters, then my total will be 700 bits.
So means, you put 700 bits in total, encode this whole message in digital format. If we talk about the second technique, now we are not talking about Huffman coding, from this example, you will be clear
about the benefit of Huffman coding. Let's say, if we know that there are only 6 characters in my message, A, B, C, D, E, F. 6 characters means 6 types of characters. So how can I represent it here? A is 000, B is 001, C is 010, D is 011, E is 100, F is 101. Because if I know in the message that there are 6 types of alphabets, then I can represent them in 3 bits only. Because my motive here is to uniquely identify each alphabet. So see, we have represented it uniquely, so to represent each one, I took 3 bits,
to represent each one, I took 3 bits. Means, if one character is taking 3 bits to represent, then to represent 100 characters,
300 bits will be consumed here. So means, 300 bits will be consumed,
then your message will be encoded.
So means, we have reduced its comparatively, but now comes Huffman encoding. Huffman coding says that these bits are more, you have to represent them in a minimum way. So how does Huffman encoding work? In all the frequencies given to you, in that whole frequency, you have to pick up the 2 minimum elements. So which are the 2 minimum elements? First of all, we have F. So we have written F. How many Fs are there? 2 Fs. And how many E's are there? E is 3.
Pick up the minimum 2 of them. So here we saw 50, 10, 13, all of them. So pick up minimum 2, F and E. So we have written the frequency of these two here. Now what I have to do is, now if I combine these two, means how many characters will come in total? Obviously 5 will come. Here you have to use a simple formula, that the smaller element, keep it on the left side. Keep the larger or bigger element on the right side. Now see, how does it work? Pick up the minimum 2 elements, picked up 2 and 3, and combined it, and what comes after that? 5. Now you add this 5 to this list, means add it to all these characters again, and then pick up 2 minimum elements. Now see, these two are cancelled in a way. Now see, the remaining 4 of these, 5 and 5 obviously, who is the minimum? These two are 5. 1 is already written, D is also 5, so here we write D.
If both are equal, then you can write B on the left side, you can write B on the right side too, but if you follow only one pattern, then you will benefit. So here we represented D, means 5. If I merge these two, if I combine these two, then in total, how many characters will be there? 10 characters. Now what you do, what you do to this 10, represent it.
So this 5 is also cancelled, this 5 is also cancelled. So out of 50, 10, 30, 10, which minimum 2 is there? 10, which is already written, which is the second 10? B. So B is also equal, so I write B here, 10. So see, how many will be there in total? Total will be 20 here. So what we are doing to this 20, we write this 20 here again. If we write this 20 again, then see, out of 50, 30, and 20, which are the two minimums? 20 and 30. 20 is already written, write 30 here. See 30, the element is greater than 20, so what we wrote on the right side, C that is 30. So what we did to these two, we merged them, so in total, there will be 50 elements. Now next we put 50 here, so see all the elements are already consumed, which are left here? 2, 50 and 50 are left here, 50 is already written, A we write here, so we wrote A here, how many A? 50. So when we merged these two, how many characters will be there in total? There will be 100 characters here.
How many were we having in total? There were only 100, so see we have 100 in total. This is the Huffman Tree, or you can call it the Huffman Graph, we have this. Now what we have to do to this Huffman Tree, this is where my answer will come. What you do is, to the left side, means the left tree, you give it 0 number, give the right one 1 number, the edge on the left, we gave it 0 number, the right one 1, 0, 1. The left one 0, the right one 1. Corresponding to each node, if we talk about this node, then 0 in the left, 1 in the right. If we talk about this node, then 0 in the left, 1 in the right. You can also do the opposite, 1 0, that is up to you.
But you follow the same pattern, so that whatever question you get, in the competitive exam, even in your college or university, you can easily solve it. So as soon as we represented the left with 0 and the right, the first thing you got, to represent A, how many bits will you need, and which coding will be used here. So see, we have to start with the root node, which is the root, we have to go from 100 to A, how many bits will be used? Only one bit will be used, that is 0. What is the value of that bit? 0. From here, go from 100 to A, this whole edge, or the name of this whole distance, is 0. Similarly, let's say B, you have to reach B, here it is.
To reach B, from the root node, 1, 0, 0. See the distance, or see which path is it, what is the path name? 1, 0, 0. So write it here. C, you have to reach C, where is C? Here it is. To reach C, 1, 1. Write 1, 1 here, simply. D, next D, D is here, so how will you reach here from root? Which path will it be? 1, 0, 1, 0. 1, 0, 1, 0. E, E is here, to reach E, 1, 0, 1, 1, 1. So write it here. To reach F, 1, 0, 1, 1, 0. 1, 0, 1, 1, 0. So here you will get the answer of the first question, that to represent A, which code will be used? Which code will be used to represent D, C, D, E, F? And see here all the codes are different. And here the second part you will get to know, how many bits did it take to represent A? 1 bit. The value of that bit is 0, but how many bits did it take? 1.
So that means 50 x 1. Next, how many Bs do you have? B is 10, and to represent 10, means how many bits did it take to represent B? Total bits took 3, that is, 3rd. Next we have C. How many Cs do we have? C is 30. So 30 x, how many bits did it take to represent C? 2, so that means 60. Next D. How many Ds do we have? 5. 5 x 4. Don't convert this into decimal by mistake. This is just a value telling that to represent D, you can use code 1, 0, 1, 0. But how many total bits? 4. So 5 x 4, 20. See how many E's are there? E is 3. 3 x, to represent E, 1, 2, 3, 4, 5 bits we will have, so 15. And F is 2. 2 x, how many bits did it take? 1, 2, 3, 4, 5, so total is 10.
So you will total this 80, 140, 160, 70, 180, 5. That means your 185 bits total will be taken to convert this whole message into Huffman encoding. So see, in the first method there were 700 bits, 300 bits, 185 bits were taken here. And what is the logic here? The number of bits, means the one with the highest frequency, we wrote that above all, so that I can represent it in less number of bits. And the one whose frequency is the lowest, see E and F were the lowest, 2 of them, we represented that with more bits, so that overall answer is composite, we will have the number of bits, less than less. The second point you can ask here, that how many bits I took to represent each character. Or we can ask you average bits, average bits required to represent each character. So see, you know that how many bits you took in total, 185, and how many total characters you have, total characters are 100 in this question. So the answer to this question is, 1.85 bits per character. You can also ask this point, like here, simple means, total bits you took were 700, how many characters were there? 100, so means 7 bits per character were taken here, so we gave that here, because in the sky code, it takes 7 bits, but here, our 7, or even here, how many are there, 3, because if you do 300 divided by 100, so 3 bits per character will be taken here.
But here, it took only 1.85 bits per character, which is the minimum number of bits required to represent the whole message. So what is the benefit here? Now if you are compressing the data, then obviously, if you are representing the data in a small number of bits, then your storage will be utilized in an efficient way. Second point, if you are transmitting the data, to transmit the data, we know what is the transmission time, message divided by bandwidth, message size divided by bandwidth. If you do this whole message, then you will get 700 bits, 700 bits divided by whatever bandwidth you get. But in this case, what will come? 185 divided by bandwidth. So see, the transmission time directly proportional to what? Message size.
That means if your message is bigger, means if the size is bigger, then you will take more time for transmission. If your message is less, then transmission time will be less. So obviously, we want that we take less transmission time, so that means you use Huffman encoding, and code the whole message, using the Huffman encoding. Thank You..