Compression

Homepage

Why Data Files Are Compressed:

  • Files will take up less storage space on a device.
  • Streaming and downloading files from the internet is faster because they take up less bandwidth.
  • Web pages load faster in web browsers.
  • Many messaging services have limits to the size of a file being sent therefore, compressing a file will allow you to send the same file.

Lossy Compression:

Description:

  • Lossy compression Permanently removes data from a file.
  • This permanent removal of data limits the number of bits the file needs reducing the size of the file.

Advantages:

  • Reduces the file size immensely so more files can be stored on that same device.
  • Lossy compressed files can be downloaded more quickly because they take up less bandwidth.

Disadvantages:

  • Lossy compression PERMANENTLY loses data that can never be restored.
  • Lossy compression only does not work on files that cannot lose data permanently like text or software files.
  • Lossy files are worse quality to the original but tis quality change is usually unoticeable.

Examples:

  • JPEG (Images)
  • MP3 (Audio)

lossless Compression:

Description:

  • Lossless compression makes a file smaller by temporarily removing data.
  • This temporary removal of data occurs when the file is being stored but is then restored to the original file when it is opened.

Advantages:

  • Data is only temporarily removed so there will be no quality loss when the file is opened.
  • Lossless files can be decompressed unlike lossy file.
  • Lossless compression can take place on all types of files including text and software files unlike lossy compression.

Disadvantages:

  • There is only a slight reduction in file size so files that were lossy compressed will take up less space than lossless compressed files.

Examples:

  • PNG (Image)
  • FLAC (Audio)

Run Length Encoding:

Description:

  • RLE is a form of lossless compression so it reduces a file's size without losing any data or quality.
  • It works by carrying out Runs. A run is where it looks for consecutive repeating data in a file.
  • Instead of storing each piece of data seperately, it stores the number of times a piece of data is repeated.

For Example:

Run length encoding Bitmap
  • For example, if the image above uses an 8-bit colour depth, the file size would be 480 bits. This can be reduced by using RLE.
  • First of all, count the number of times the same data is repeated in each run and store this:
  • Run length encoding RLE
  • Next, store this information is pairs in the format: (Number of pixels, Colour)
  • In this example, there are 20 pairs to store. Each pair contains a number that is stored as an 8-bit binary number and an 8-bit colour (8-bit colour depth) - so 8+8 = 16 bits in total
  • The final compressed image would be 20 X 16 = 320 Bits
  • So,140 bits would be saved by compressing the image using RLE

Disadvantages:

  • Sometimes however, RLE can increase the file size of an image, this can happen when the data does not have many repeated runs of data.

Huffman Coding:

Description:

  • Data in regular files take up the same amount of space which is often inefficient.
  • Huffman coding stops this by giving each data value in a file a unique binary code but with varied lengths where values that appear less frequently will have shorter binary codes.

For Example:

  • If we wanted to encode "He sells seashells".
  • We would start off by counting the number of times each character occurs starting with most to least frequent and put this information into a table:
  • Run length encoding RLE
  • Next, take the 2 least frequent characters and put them into an ordered list where the smaller frequency is ALWAYS on the lEFT. Add the frequencies and place the sum on top of them.
  • Repeat this process with the remaining characters. At the end, label the final huffman tree branches by labeling all left branches "0" and all the right branches "1":
  • Run length encoding RLE
  • So, "He sells seashells" would be encoded as:
  • 1011001001100010 11110011001010111 01100010111 = 44 bits