Archiving and Compressing Files with GNU Tar and GNU Zip
Updated by Linode Written by Linode
tar and gzip provide a standard interface for creating archives and compressing files on Linux systems. Together, these utilities take a large number of files, save them together in an archive (i.e. as a single file), and compress the archive to save space. However, tar and gzip provide a multitude of features and options that can lead to hard-to-read commands and make even the simplest operations confusing.
This document provides an overview of tar and gzip usage, accompanied by a number of practical applications of these utilities. If you find this guide helpful, please consider our guide to basic administration practices or the rest of the Tools & Reference series.
Using Tar and Gzip
In this guide, tar and gzip refer to recent versions of “GNU tar” and “GNU gzip”, which are included by default in all images provided by Linode.
The tar Command
The complexity of tar does not derive from its basic form, but rather from the number of options and settings that you can use to create and interact with archives. Given the tar file ~/backup-archive.tar, the following command extracts the contents of this file into the current directory:
tar -xf ~/backup-archive.tar
This will extract (-x) the archive stored in the file (-f) named on the command line. This archive is not compressed. To create an archive of all the files in the directory ~/backup, use the following command:
tar -c ~/backup > backup-archive.tar
When no -f option is given, the GNU tar builds shipped with most Linux distributions write the archive to standard output, which you can redirect to a file or pipe to another program for further processing. To write the archive directly to a file instead, use the -f option. The following command is equivalent to the previous command:
tar -cf backup-archive.tar ~/backup
When using the -f option, always specify the file name of the archive you want to create before specifying the contents of the archive. You may also add the -v option to increase the verbosity of some commands. For instance, the following command will output a list of files as they are added to the archive:
tar -cvf backup-archive.tar ~/backup
The order of options is sometimes important. When options are bundled together, the -f option needs to be the last one, so that it is immediately followed by the name of the file that it specifies. Therefore -cvf will perform as expected, while -cfv will fail because tar treats the "v" as the name of the archive file. Many common tasks using the tar command are explained below.
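The create and extract commands above can be tried safely with a round trip in a scratch directory. This is a minimal sketch: the temporary directory, the sample files, and the restore directory are hypothetical stand-ins for ~/backup and your own data.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing in your home directory is touched.
work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for ~/backup.
mkdir backup
echo "first file" > backup/a.txt
echo "second file" > backup/b.txt

# Create the archive; -v lists each member as it is added.
tar -cvf backup-archive.tar backup

# Extract into a separate directory and compare with the original.
mkdir restore
cd restore
tar -xf ../backup-archive.tar
cd ..
diff -r backup restore/backup && echo "round trip OK"
```

Because tar extracts into the current directory, changing into a separate directory before running tar -xf keeps the restored copy apart from the original.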
The gzip Command
gzip and the accompanying gunzip command provide a simple and standard method of compressing individual files. Just as tar on its own only bundles files into an archive without compressing them, the gzip tools only compress individual files and do not bundle them together. The following command compresses the file full-text.txt, replacing it with the file full-text.txt.gz:
gzip full-text.txt
You can then use either of the following commands to decompress this file:
gunzip full-text.txt.gz
gzip -d full-text.txt.gz
You can add the -v flag to increase verbosity and output statistics regarding the rate of compression:
gzip -v full-text.txt
gzip accepts standard input and thus can be used to compress the output of a stream of text:
cat full-text.txt | gzip > full-text.txt.gz
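One practical consequence of reading from standard input is that the original file is left in place, whereas running gzip on a file name replaces it. The sketch below illustrates this with a hypothetical sample file in a scratch directory; it also uses the -c (--stdout) option, which is not covered above, as an alternative way to keep the original.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical five-line sample file.
printf 'line %s\n' 1 2 3 4 5 > full-text.txt

# Reading from standard input leaves full-text.txt untouched.
gzip < full-text.txt > from-stdin.gz

# -c writes the compressed data to standard output instead of replacing the file.
gzip -c full-text.txt > from-flag.gz

# Both compressed files decompress back to the original contents.
gzip -dc from-stdin.gz | cmp - full-text.txt
gzip -dc from-flag.gz | cmp - full-text.txt
ls full-text.txt from-stdin.gz from-flag.gz
```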
gzip can trade speed for compression: a higher compression level produces smaller output at the expense of time. The level is set with a numeric option from -1 (fastest) to -9 (smallest output). The default is -6. gzip also accepts --fast (equivalent to -1) and --best (equivalent to -9) as helpful mnemonics:
gzip --best -v full-text.txt
gzip --fast -v full-text.txt
gzip -3 -v full-text.txt
gzip -8 -v full-text.txt
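To see what the levels mean for your own data, you can compress the same file at several levels and compare the resulting sizes. The following sketch generates a hypothetical, highly repetitive text file, so the numbers it prints are illustrative only; real files will compress differently.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Generate repetitive sample text (a hypothetical stand-in for full-text.txt).
i=0
while [ "$i" -lt 20000 ]; do
  echo "entry $i: the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > full-text.txt

printf 'original: %s bytes\n' "$(wc -c < full-text.txt | tr -d ' ')"

# Compress at the fastest, default, and strongest levels, keeping the original.
for level in 1 6 9; do
  gzip -c "-$level" full-text.txt > "level-$level.gz"
  printf 'level %s: %s bytes\n' "$level" "$(wc -c < "level-$level.gz" | tr -d ' ')"
done
```

Prefixing each gzip call with the shell's time keyword would show the other side of the trade-off.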
Creating An Archive
You can create a tar archive of the ~/backup directory with the following command:
tar -c ~/backup > backup-archive.tar
To create a file without using the standard output redirection, you may consider the following equivalent form:
tar -cf backup-archive.tar ~/backup
The order of the options (e.g. -cf) is important: the -f option must be followed directly by the name of the archive file that tar will create. The remaining arguments name the directories or files to be included in the archive.
Compressing Archives
Compress an Archive using Gzip
In conventional usage, tar is combined with a compression utility to not only archive files for more efficient backup but also compress them. Some alternative compression and archiving tools with which you may be familiar include both functions in a single procedure. However, modern versions of tar are able to interface with common compression libraries and tools like gzip to create a compressed archive in a single step:
tar -czf ~/backup-archive.tar.gz ~/backup/
The -z option in this command compresses the archive using gzip, which is a common practice when creating “tar files”.
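The single-step form is shorthand for piping an uncompressed archive through gzip yourself. The sketch below builds the same archive both ways in a scratch directory and checks that each result is a valid gzip stream with the same member list. The directory and file names are hypothetical, and "-f -" tells tar to write the archive to standard output.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for ~/backup.
mkdir backup
echo "some data" > backup/data.txt

# One step: tar compresses the archive with gzip itself.
tar -czf one-step.tar.gz backup

# Two steps: write the archive to standard output and pipe it to gzip.
tar -cf - backup | gzip > two-step.tar.gz

# gzip -t verifies the integrity of a compressed file.
gzip -t one-step.tar.gz
gzip -t two-step.tar.gz

# Both archives list the same members.
tar -tzf one-step.tar.gz
tar -tzf two-step.tar.gz
```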
Compress an Archive using Bzip2 and XZ Compression
tar also supports other compression tools, which may offer better compression rates at the expense of processor time. To compress using bzip2, issue the following command:
tar -cjf ~/backup-archive.tar.bz2 ~/backup/
To use the xz tool for compression, use the -J option as follows:
tar -cJf ~/backup-archive.tar.xz ~/backup/
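Whether the extra processor time pays off depends on the data, so it can be worth comparing the tools on a sample. The following sketch archives a hypothetical, repetitive text file with each compressor and prints the resulting sizes. Because bzip2 and xz may not be installed everywhere, it assumes nothing and skips any tool that is missing.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for ~/backup.
mkdir backup
i=0
while [ "$i" -lt 5000 ]; do
  echo "record $i: sample payload"
  i=$((i + 1))
done > backup/data.txt

tar -czf backup-archive.tar.gz backup

# Only use the compressors that are present on this system.
if command -v bzip2 >/dev/null 2>&1; then
  tar -cjf backup-archive.tar.bz2 backup
fi
if command -v xz >/dev/null 2>&1; then
  tar -cJf backup-archive.tar.xz backup
fi

# Print the size of each archive that was created.
for f in backup-archive.tar.*; do
  printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
done
```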
Automatically Determining Compression Based on File Extension
To remove the necessity of remembering the corresponding file extensions and tar options, you can use the -a option, which allows tar to select the compression tool based on the file extension of the archive. Therefore, the following commands will all create a tar archive compressed with gzip:
tar -czf ~/backup-archive.tar.gz ~/backup/
tar -caf ~/backup-archive.tar.gz ~/backup/
tar -caf ~/backup-archive.tgz ~/backup/
Similarly, the following commands will all create tar archives compressed with bzip2:
tar -cjf ~/backup-archive.tar.bz2 ~/backup/
tar -caf ~/backup-archive.tar.bz2 ~/backup/
tar -caf ~/backup-archive.tbz2 ~/backup/
tar -caf ~/backup-archive.tbz ~/backup/
As above, tar will automatically select xz compression given the extensions .tar.xz and .txz.
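You can confirm which compression -a actually applied by testing the result with gzip -t, which exits with an error when its input is not a gzip stream. This is a small sketch with hypothetical file names in a scratch directory. It also shows that an archive named with a plain .tar extension is left uncompressed.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for ~/backup.
mkdir backup
echo "some data" > backup/data.txt

# Let tar pick the compression from the archive's file extension.
tar -caf by-suffix.tar.gz backup
tar -caf by-suffix.tgz backup
tar -caf plain.tar backup

gzip -t by-suffix.tar.gz && echo ".tar.gz is gzip-compressed"
gzip -t by-suffix.tgz && echo ".tgz is gzip-compressed"

# A plain .tar name selects no compression, so gzip -t rejects it.
if gzip -t plain.tar 2>/dev/null; then
  echo ".tar is gzip-compressed"
else
  echo ".tar was left uncompressed"
fi
```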
Discover the Contents of an Archive
While you can always extract the contents of an archive to review the manifest of files, this can be inefficient. tar provides the ability to view the manifest of files in an archive without extracting them:
tar -tf ~/backup-archive.tar
This will produce a list of files contained within the archive. This command works with both compressed and uncompressed tar archives, because GNU tar detects the compression format when it reads an archive.
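The listing is plain text, so it combines well with other tools. The sketch below lists a compressed archive in short form, in long form with -v (which adds permissions, owner, size, and modification time), and then filters the list with grep. The archive and its contents are hypothetical sample data created in a scratch directory.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for ~/backup.
mkdir -p backup/notes
echo "alpha" > backup/a.txt
echo "beta" > backup/notes/b.txt
tar -czf backup-archive.tar.gz backup

# Member names only.
tar -tf backup-archive.tar.gz

# Long listing with permissions, owner, size, and modification time.
tar -tvf backup-archive.tar.gz

# Check whether a particular file is in the archive.
tar -tf backup-archive.tar.gz | grep 'notes/b.txt'
```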
Extracting Files from a tar Archive
To extract files from a tar archive, issue the following command:
tar -xf ~/backup-archive.tar
This command simply extracts the content of an uncompressed tar archive into the current directory. Here is a more practical example of how tar may be used to extract files from a compressed archive:
tar -xzvf ~/backup-archive.tar.gz
The options specified have the following effects: -x extracts the contents of the archive, -z filters the archive through the gzip compression tool, -v enables verbose output which prints a list of files as they are extracted from the archive, and -f specifies that tar will read input from the subsequently specified file ~/backup-archive.tar.gz.
When an archive is compressed with one of the other compatible compression tools, replace the -z option with the appropriate option for the compression type used. For example, to unpack an archive compressed with the bzip2 tool:
tar -xjvf ~/backup-archive.tar.bz2
The -a option only applies when creating an archive. When reading an archive file, recent versions of GNU tar detect the compression format automatically, so tar -xf ~/backup-archive.tar.bz2 also works without naming a compression option. Additionally, tar provides a -k option (--keep-old-files) that prevents an existing file from being replaced by a file of the same name from a tar archive.
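The effect of -k is easiest to see by extracting the same archive twice. In this sketch, which uses hypothetical sample data in a scratch directory, the restored file is edited and the archive is then extracted again with -k. It also uses the -C option, which is not covered above, to extract into a directory other than the current one. Depending on the tar version, -k either skips existing files silently or reports them as errors, so the sketch ignores the exit status of that command.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for ~/backup.
mkdir backup
echo "archived version" > backup/config.txt
tar -czf backup-archive.tar.gz backup

# -C changes into the given directory before extracting.
mkdir restore
tar -xzf backup-archive.tar.gz -C restore

# Edit the restored copy, then extract again with -k.
echo "local edits" > restore/backup/config.txt
tar -xzkf backup-archive.tar.gz -C restore || true

# The existing file was not replaced by the archived copy.
cat restore/backup/config.txt
```

Without -k, the second extraction would overwrite the edited file with the archived version.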
Compressing Log Files
Some files, particularly log files created by long-running daemons like web and email servers, can grow unmanageably large in a short time, yet deleting them is usually not a viable option. Since they are plain text, compression is effective; however, because log files tend to be distinct and independent of each other, it doesn’t make sense to bundle them with a tool like tar. In these cases it makes sense to use gzip directly:
gzip /var/log/mail.log
This will replace the original /var/log/mail.log with a file named mail.log.gz in the same directory. To decompress the file and restore the original:
gunzip /var/log/mail.log.gz
However, in most cases you do not need to fully uncompress a file in order to access its contents. The gzip package includes utilities that work on “gzipped” files in the same way as conventional Unix tools: zcat (equivalent to cat), zgrep (equivalent to grep) and zless (equivalent to less).
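As a brief illustration of these utilities, the sketch below compresses a small log file and then reads and searches it without decompressing it on disk. The log file and its contents are hypothetical sample data created in a scratch directory, and the example assumes the GNU versions of zcat and zgrep found on Linux systems.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical three-line mail log.
printf '%s\n' \
  'Jan 01 00:00:01 host postfix: connect from a.example' \
  'Jan 01 00:00:02 host postfix: warning: timeout' \
  'Jan 01 00:00:03 host postfix: disconnect from a.example' > mail.log

# Compress the log; mail.log is replaced by mail.log.gz.
gzip mail.log

# Stream the decompressed contents into another tool.
zcat mail.log.gz | wc -l

# Search the compressed file directly.
zgrep 'warning' mail.log.gz
```

zless works the same way for interactive paging, which is why it is left out of this non-interactive example.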
This guide is published under a CC BY-ND 4.0 license.