What Do .tar.gz and .tar.bz2 Mean?
Files that have a .tar.gz or a .tar.bz2 extension are compressed archive files. A file with just a .tar extension is an uncompressed archive, but you will rarely come across one.
The .tar portion of the file extension stands for tape archive, and is the reason that both of these file types are called tar files. Tar files date all the way back to 1979, when the tar command was created to allow system administrators to archive files onto tape. Forty years later we are still using the tar command to extract tar files onto our hard drives. Someone somewhere is probably still using tar with tape.
The .gz or .bz2 extension suffix indicates that the archive has been compressed, using either the gzip or bzip2 compression algorithm. The tar command works happily with both types of file, so it doesn’t matter which compression method was used, and tar should be available everywhere you have a Bash shell. You just need to use the appropriate tar command line options.
It’s worth noting that everything in this article also works on the Windows Subsystem for Linux, which allows you to install the Bash shell inside of Windows 10 or Windows 11, although there are other ways to open tar.gz files on Windows as well.
Extracting Files from Tar Files
Let’s say you’ve downloaded two files of sheet music. One file is called ukulele_songs.tar.gz, the other is called guitar_songs.tar.bz2. These files are in the Downloads directory.
Let’s extract the ukulele songs:
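A sketch of the command, built from the options described in this section. The first few lines only create a throwaway sample archive in a temporary directory so the example can be run anywhere; the file and directory names inside the archive are made up for illustration.

```shell
# Setup (hypothetical sample data): build a small ukulele_songs.tar.gz
# in a scratch directory standing in for ~/Downloads.
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Ramones" "$work/Downloads"
echo "sample" > "$work/src/Ukulele Songs/Ramones/song.txt"
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# Extract the gzip-compressed archive into the current directory.
tar -xvzf ukulele_songs.tar.gz
```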
As the files are extracted, they are listed in the terminal window.
The command line options we used are:
-x: Extract, retrieve the files from the tar file.
-v: Verbose, list the files as they are being extracted.
-z: Gzip, use gzip to decompress the tar file.
-f: File, the name of the tar file we want tar to work with. This option must be followed by the name of the tar file.
List the files in the directory with ls and you’ll see that a directory has been created called Ukulele Songs. The extracted files are in that directory. Where did this directory come from? It was contained in the tar file, and was extracted along with the files.
Now let’s extract the guitar songs. To do this we’ll use almost exactly the same command as before but with one important difference. The .bz2 extension suffix tells us it has been compressed using the bzip2 command. Instead of using the -z (gzip) option, we will use the -j (bzip2) option.
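A sketch of that command. As before, the setup lines only create a small sample archive with made-up contents; building it assumes the bzip2 program is installed.

```shell
# Setup (hypothetical sample data): build a small guitar_songs.tar.bz2
# in a scratch directory standing in for ~/Downloads.
work=$(mktemp -d)
mkdir -p "$work/src/Guitar Songs" "$work/Downloads"
echo "sample" > "$work/src/Guitar Songs/song.txt"
tar -cjf "$work/Downloads/guitar_songs.tar.bz2" -C "$work/src" "Guitar Songs"
cd "$work/Downloads"

# Same command as for the .tar.gz file, but with -j in place of -z.
tar -xvjf guitar_songs.tar.bz2
```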
Once again, the files are listed to the terminal as they are extracted. To be clear, the command line options we used with tar for the .tar.bz2 file were:
-x: Extract, retrieve the files from the tar file.
-v: Verbose, list the files as they are being extracted.
-j: Bzip2, use bzip2 to decompress the tar file.
-f: File, the name of the tar file we want tar to work with.
If we list the files in the Downloads directory, we will see that another directory called Guitar Songs has been created.
Choosing Where to Extract the Files To
If we want to extract the files to a location other than the current directory, we can specify a target directory using the -C (specified directory) option.
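A sketch of the -C option in use. The scratch directory below stands in for the home directory, so "$work/Documents/Songs" plays the role of ~/Documents/Songs/; the archive contents are made up for illustration.

```shell
# Setup (hypothetical sample data): a small guitar_songs.tar.bz2 and an
# existing target directory.
work=$(mktemp -d)
mkdir -p "$work/src/Guitar Songs" "$work/Downloads" "$work/Documents/Songs"
echo "sample" > "$work/src/Guitar Songs/song.txt"
tar -cjf "$work/Downloads/guitar_songs.tar.bz2" -C "$work/src" "Guitar Songs"
cd "$work/Downloads"

# Extract into the target directory instead of the current directory.
tar -xvjf guitar_songs.tar.bz2 -C "$work/Documents/Songs"
```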
Looking in our Documents/Songs directory we’ll see the Guitar Songs directory has been created.
Note that the target directory must already exist; tar will not create it if it is not present. If you need to create a directory and have tar extract the files into it all in one command, you can do that as follows:
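A sketch that chains mkdir and tar with &&, so tar only runs if the directory was created successfully. The target directory name Downloaded is hypothetical, and the scratch directory again stands in for the home directory.

```shell
# Setup (hypothetical sample data): a small guitar_songs.tar.bz2.
work=$(mktemp -d)
mkdir -p "$work/src/Guitar Songs" "$work/Downloads"
echo "sample" > "$work/src/Guitar Songs/song.txt"
tar -cjf "$work/Downloads/guitar_songs.tar.bz2" -C "$work/src" "Guitar Songs"
cd "$work/Downloads"

# Create the target directory (and any missing parents), then extract into it.
mkdir -p "$work/Documents/Songs/Downloaded" &&
  tar -xvjf guitar_songs.tar.bz2 -C "$work/Documents/Songs/Downloaded"
```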
The -p (parents) option causes mkdir to create any parent directories that are required, ensuring the target directory is created.
Looking Inside Tar Files Before Extracting Them
So far we’ve just taken a leap of faith and extracted the files sight unseen. You might like to look before you leap. You can review the contents of a tar file before you extract it by using the -t (list) option. It is usually convenient to pipe the output through the less command.
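A sketch of listing an archive. The sample archive and its contents are made up for illustration. The listing is printed directly here so the example also runs without a terminal; the paged form is shown in the closing comment.

```shell
# Setup (hypothetical sample data): a small ukulele_songs.tar.gz.
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Ramones" "$work/Downloads"
echo "sample" > "$work/src/Ukulele Songs/Ramones/song.txt"
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# List the contents of the archive without extracting anything.
tar -tf ukulele_songs.tar.gz

# In an interactive terminal, page through a long listing with less:
#   tar -tf ukulele_songs.tar.gz | less
```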
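Notice that we don’t need to use the -z option to list the files, because tar detects the compression type itself when it reads an archive. For the same reason, we don’t need the -j option to list the files in a .tar.bz2 file. Modern versions of GNU tar apply the same detection when extracting, so -z and -j are optional there too, although including them does no harm and keeps the command working with older versions of tar.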
Scrolling through the output we can see that everything in the tar file is held within a directory called Ukulele Songs, and within that directory, there are files and other directories.
We can see that the Ukulele Songs directory contains directories called Random Songs, Ramones and Possibles.
To extract all the files from a directory within a tar file use the following command. Note that the path is wrapped in quotation marks because there are spaces in the path.
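A sketch that extracts only the Ramones directory. The sample archive is made up for illustration, and the path is written exactly as it appears in the archive listing.

```shell
# Setup (hypothetical sample data): an archive with two subdirectories.
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Ramones" "$work/src/Ukulele Songs/Possibles" "$work/Downloads"
echo "sample" > "$work/src/Ukulele Songs/Ramones/song.txt"
echo "sample" > "$work/src/Ukulele Songs/Possibles/maybe.txt"
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# Extract one directory and everything beneath it; the quotes protect the space.
tar -xvzf ukulele_songs.tar.gz "Ukulele Songs/Ramones"
```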
To extract a single file, provide the path and the name of the file.
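A sketch that extracts a single file. The file names song.txt and other.txt are hypothetical; substitute a path copied from the archive listing.

```shell
# Setup (hypothetical sample data): an archive with two files in one directory.
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Ramones" "$work/Downloads"
echo "sample" > "$work/src/Ukulele Songs/Ramones/song.txt"
echo "sample" > "$work/src/Ukulele Songs/Ramones/other.txt"
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# Extract just one file, giving its full path inside the archive.
tar -xvzf ukulele_songs.tar.gz "Ukulele Songs/Ramones/song.txt"
```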
You can extract a selection of files by using wildcards, where * represents any string of characters and ? represents any single character. Using wildcards requires the use of the --wildcards option.
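A sketch using GNU tar's --wildcards option to extract every file in the Possibles directory whose name starts with B. The file names are hypothetical. The pattern is quoted so that the shell passes it to tar unexpanded, and --wildcards is placed before the pattern it applies to.

```shell
# Setup (hypothetical sample data): three files, two of which start with "B".
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Possibles" "$work/Downloads"
for name in Blue.txt Bird.txt Apple.txt; do
  echo "sample" > "$work/src/Ukulele Songs/Possibles/$name"
done
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# Extract only the members that match the quoted pattern.
tar -xvzf ukulele_songs.tar.gz --wildcards "Ukulele Songs/Possibles/B*"
```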
Extracting Files Without Extracting Directories
If you don’t want the directory structure in the tar file to be recreated on your hard drive, use the --strip-components option. The --strip-components option requires a numerical parameter. The number represents how many levels of directories to ignore. Files from the ignored directories are still extracted, but the directory structure is not replicated on your hard drive.
If we specify --strip-components=1 with our example tar file, the Ukulele Songs top-most directory within the tar file is not created on the hard drive. The files and directories that would have been extracted to that directory are extracted into the target directory.
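A sketch of stripping one directory level. The sample archive is made up for illustration; after extraction the Ramones directory appears directly in the current directory, with no Ukulele Songs directory above it.

```shell
# Setup (hypothetical sample data): a small ukulele_songs.tar.gz.
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Ramones" "$work/Downloads"
echo "sample" > "$work/src/Ukulele Songs/Ramones/song.txt"
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# Drop the leading "Ukulele Songs/" from every path while extracting.
tar -xvzf ukulele_songs.tar.gz --strip-components=1
```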
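There are only two levels of directory nesting within our example tar file. So if we use --strip-components=2, the files inside the second-level directories are extracted straight into the target directory, and no other directories are created. Two caveats apply: files that sit directly in Ukulele Songs have fewer than two leading directories to strip, so GNU tar skips them, and files from different directories that share a name will overwrite one another.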
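A sketch of stripping two directory levels. The sample archive is made up for illustration; both files land directly in the current directory and no subdirectories are created.

```shell
# Setup (hypothetical sample data): files in two second-level directories.
work=$(mktemp -d)
mkdir -p "$work/src/Ukulele Songs/Ramones" "$work/src/Ukulele Songs/Possibles" "$work/Downloads"
echo "sample" > "$work/src/Ukulele Songs/Ramones/song.txt"
echo "sample" > "$work/src/Ukulele Songs/Possibles/maybe.txt"
tar -czf "$work/Downloads/ukulele_songs.tar.gz" -C "$work/src" "Ukulele Songs"
cd "$work/Downloads"

# Drop the two leading directory names from every path while extracting.
tar -xvzf ukulele_songs.tar.gz --strip-components=2
```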
If you look at the Linux man page you’ll see that tar has got to be a good candidate for the title of “command having the most command line options.” Thankfully, to allow us to extract files from .tar.gz and .tar.bz2 files with a good degree of granular control, we only need to remember a handful of these options.