How to compress tar archives using gzip and bzip2


In our last post we discussed how to bundle multiple files and directories into a single tar archive. In most cases these archives need to be compressed, and tar relies on separate compression programs to do that. In this post we discuss two such programs, gzip and bzip2.


tar file compression/decompression with gzip

gzip is a program that compresses and decompresses a single file, and it is usually used together with tar to compress an archive. The tar command uses the -z flag to filter the archive through gzip. A tar file compressed with gzip has the filename extension .tar.gz or .tgz. The syntax for using tar with gzip is:
tar -z [options] -f archive_name.tar.gz source_files
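Because gzip only ever compresses one file (or one stream), the -z flag is equivalent in effect to writing a plain tar archive and passing it through gzip. The sketch below, assuming GNU tar and gzip are installed, builds the same archive both ways; file1, file2 and archive_piped.tar.gz are hypothetical names used only for illustration.

```shell
# Hypothetical sample files, named after the examples in this post.
printf 'first file\n' > file1
printf 'second file\n' > file2

# Option A: let tar apply gzip itself via -z.
tar -czf archive.tar.gz file1 file2

# Option B: write an uncompressed archive to stdout (-f -) and pipe it
# through gzip, which reads stdin and writes the compressed stream to stdout.
# archive_piped.tar.gz is a hypothetical output name.
tar -cf - file1 file2 | gzip > archive_piped.tar.gz

# Both archives contain the same members.
tar -tzf archive.tar.gz
tar -tzf archive_piped.tar.gz
```

The compressed bytes may differ slightly between the two (for example in the gzip header), but the archive contents are the same.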

 

Examples:

#1) To create a gzip-compressed tar archive, execute:
tar -cvzf archive.tar.gz file1 file2

#2) To extract the contents of the gzip-compressed tar archive, execute:
tar -xvzf archive.tar.gz

#3) To extract the contents of the gzip-compressed tar archive to a specific directory, /home/calypso/data/, execute:
tar -xvzf archive.tar.gz -C /home/calypso/data/
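Examples #1 and #3 can be combined into a quick round trip that checks nothing was lost. This is a sketch assuming GNU tar; the directory name restore is a hypothetical stand-in for /home/calypso/data/. Note that -C changes into the target directory before extracting, so the directory has to exist first.

```shell
# Hypothetical sample files.
printf 'alpha\n' > file1
printf 'beta\n' > file2

# Example #1: create the gzip-compressed archive.
tar -cvzf archive.tar.gz file1 file2

# Example #3: tar will not create the -C target, so make it first.
# "restore" is a hypothetical stand-in for /home/calypso/data/.
mkdir -p restore
tar -xvzf archive.tar.gz -C restore

# Compare the extracted copies with the originals byte for byte.
cmp file1 restore/file1 && cmp file2 restore/file2 && echo "round trip OK"
```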

tar file compression/decompression with bzip2

Like gzip, bzip2 compresses a single file. The tar command uses the -j flag to filter the archive through bzip2 for compression and decompression. A tar file compressed with bzip2 has the filename extension .tar.bz2 or .tbz2. The syntax for using tar with bzip2 is:
tar -j [options] -f archive_name.tar.bz2 source_files
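Since bzip2 also works on a single file, the same result as -j can be reached in two steps: build a plain tar archive, then compress that one file with the standalone bzip2 program. A minimal sketch, assuming GNU tar and bzip2 are available; file1 and file2 are hypothetical sample files.

```shell
# Hypothetical sample files.
printf 'alpha\n' > file1
printf 'beta\n' > file2

# Step 1: create a plain, uncompressed tar archive.
tar -cf archive.tar file1 file2

# Step 2: compress that single file with bzip2.
# -k keeps archive.tar (by default bzip2 replaces it with archive.tar.bz2);
# -f overwrites an existing archive.tar.bz2 if the script is re-run.
bzip2 -kf archive.tar

# The result can be read with tar -j just like the output of tar -cjf.
tar -tjf archive.tar.bz2
```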

Examples:

#4) To create a bzip2-compressed tar archive, execute:
tar -cvjf archive.tar.bz2 file1 file2

#5) To extract the contents of the bzip2-compressed tar archive, execute:
tar -xvjf archive.tar.bz2

#6) To extract the contents of the bzip2-compressed tar archive to a specific directory, /home/calypso/data/, execute:
tar -xvjf archive.tar.bz2 -C /home/calypso/data/
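The same round trip works for examples #4 and #6. The sketch below, assuming GNU tar and bzip2, also runs bzip2 -t, which checks the integrity of the compressed file without extracting it. The directory restore_bz2 is a hypothetical stand-in for /home/calypso/data/.

```shell
# Hypothetical sample files.
printf 'alpha\n' > file1
printf 'beta\n' > file2

# Example #4: create the bzip2-compressed archive.
tar -cvjf archive.tar.bz2 file1 file2

# Verify the compressed data; a non-zero exit status means it is damaged.
bzip2 -t archive.tar.bz2

# Example #6: the -C target must already exist.
# "restore_bz2" is a hypothetical stand-in for /home/calypso/data/.
mkdir -p restore_bz2
tar -xvjf archive.tar.bz2 -C restore_bz2

# Compare the extracted copies with the originals byte for byte.
cmp file1 restore_bz2/file1 && cmp file2 restore_bz2/file2 && echo "round trip OK"
```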

Comparison between tar, tar.gz and tar.bz2

Let's create three archives, one each in the tar, tar.gz and tar.bz2 formats. The source files used for this example are stdout.log and stderr.log.

In the output above you can see that stderr.log and stdout.log are 14 MB and 276 MB respectively, for a combined size of 290 MB. Now let's create the archive files and, while doing so, measure how long each format takes to complete.
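The exact commands used for this comparison are not shown, so here is one possible way to reproduce a similar measurement. It is a sketch under these assumptions: a POSIX shell, GNU or BSD tar, and a date that supports +%s. The helper timed is hypothetical, and if stdout.log and stderr.log do not exist the script creates small hypothetical stand-in files with seq. date +%s only has one-second resolution, so for short runs bash's time keyword gives finer numbers.

```shell
# Hypothetical stand-in data, created only if the real logs are absent.
[ -f stdout.log ] || seq 1 300000 > stdout.log
[ -f stderr.log ] || seq 1 20000 > stderr.log

# Show the size of each source file.
du -h stdout.log stderr.log

# Hypothetical helper: run a command and print elapsed wall-clock seconds.
timed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "$*: $((end - start)) s"
}

timed tar -cf  archive.tar     stdout.log stderr.log   # no compression
timed tar -czf archive.tar.gz  stdout.log stderr.log   # gzip
timed tar -cjf archive.tar.bz2 stdout.log stderr.log   # bzip2
```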

The output above shows that creating archive.tar, archive.tar.gz and archive.tar.bz2 took 3.66 seconds, 6.55 seconds and 1 minute 14 seconds respectively. Now let's check the sizes of the archive files that were created.

The sizes of archive.tar, archive.tar.gz and archive.tar.bz2 are 290 MB, 21 MB and 14 MB respectively. The size of archive.tar is about the same as the combined size of stderr.log and stdout.log, which means a plain tar archive applies no compression; it is also the fastest of the three methods.
Compressing with gzip is slower than creating a plain tar archive but faster than bzip2, and its compression ratio is higher than plain tar's but lower than bzip2's.
bzip2 achieves the highest compression ratio but is the slowest of the three methods.
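The compression ratio here is simply the original size divided by the archive size. With the sizes reported above that is roughly 290 / 21 ≈ 13.8:1 for gzip and 290 / 14 ≈ 20.7:1 for bzip2. The sketch below computes the same figure for your own files; it assumes a POSIX shell plus awk, uses wc -c for byte counts so it works on both GNU and BSD systems, and (re)creates the three archives so it can run on its own. As before, seq-generated logs are hypothetical stand-ins used only if stdout.log and stderr.log are missing.

```shell
# Hypothetical stand-in data, created only if the real logs are absent.
[ -f stdout.log ] || seq 1 300000 > stdout.log
[ -f stderr.log ] || seq 1 20000 > stderr.log

# Build the three archives from the same sources.
tar -cf  archive.tar     stdout.log stderr.log
tar -czf archive.tar.gz  stdout.log stderr.log
tar -cjf archive.tar.bz2 stdout.log stderr.log

# Combined size of the uncompressed source files, in bytes.
orig=$(cat stdout.log stderr.log | wc -c)

for f in archive.tar archive.tar.gz archive.tar.bz2; do
    size=$(wc -c < "$f")
    # awk does the floating-point division that shell arithmetic cannot.
    awk -v o="$orig" -v s="$size" -v f="$f" \
        'BEGIN { printf "%-16s %12d bytes  ratio %.1f:1\n", f, s, o / s }'
done
```

A plain tar archive comes out at about 1.0:1, or slightly below, because tar adds headers and padding without compressing anything.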
