Simple, Secure Backups for Linux with rsync
rsync is a UNIX tool for transferring files and synchronizing data. Unlike other file transfer tools (like FTP or SCP) rsync examines the files on both the sender and the receiver and efficiently transfers only what is required to synchronize them.
If this is the first run of rsync, it will transfer each file in its entirety. On the other hand, if this rsync job has been run before, it will transfer only the changes. If there have been no changes, it will transfer nothing.
The benefits of rsync are that it is already installed on most, if not all, linux distributions - so there is nothing to install or configure - and it is secure, as it runs over SSH, which is an encrypted transport.
When You Should Use rsync
rsync should be considered when there is a file, or a set of files and directories, that will need to be kept in sync with one another.
Each time you run the same rsync command, you will synchronize the two locations - the destination will be updated to match the source. This will be done efficiently - if it is a 10 gigabyte file but only 100 megabytes have changed, only those 100 megabytes will be sent - not the entire 10 gigabytes.
You can also specify a directory and rsync will recursively synchronize the entire contents of that directory and all of its subdirectories to the destination. This is a very convenient way to synchronize two filesystems, for instance, or two home directories.
rsync has Simple Requirements
As we mentioned, rsync is almost universally available on all linux systems - it is unlikely that you will need to install or configure any software. You can make sure rsync is installed by running this command:
# which rsync
The output should be something like this:
You should also make sure that you have the ssh client installed:
# which ssh
Both systems - the source and the destination - need to have rsync installed. Further, while the source system simply needs an ssh client (which we just tested for) the destination system needs to have the SSH Server running on port 22. You can test this by running this ssh command on the source system:
# ssh username@destination
So, if your destination host is 192.168.0.1 and the username you use to log into the destination is "username", you would test SSH with this command:
# ssh email@example.com
You may see a message like this:
The authenticity of host '192.168.0.1' can't be established.
DSA key fingerprint is 18:e3:aa:5d:4f:00:73:6d:67:af:6e:c9:10:6b:8d:23.
Are you sure you want to continue connecting (yes/no)?
type "yes" and hit enter(*), and then enter your password. You should now be logged into the destination server. Simply type "exit" and hit enter to log out. You have successfully tested the SSH connection between the source machine and the destination machine.
* If you are connecting to an rsync.net server, or any other server out on the Internet, you should confirm that the "key fingerprint" matches before typing "yes". rsync.net key fingerprints can be found here.
Synchronizing a Directory with rsync
We have confirmed that both systems (source and destination) have rsync installed and we have tested the SSH connection between them. Now you need to choose a directory to synchronize. Let's choose:
To synchronize the /etc directory, and all of the files in it, as well as all of the subdirectories underneath it, recursively, to /backup on the destination, run this command on the source system:
rsync -avH /etc firstname.lastname@example.org:backup
You will be asked for your password, as the rsync tool establishes an SSH connection between the two systems. You will then see a list of all the files it is transferring as they are transferred.
Once you have a new command prompt, the synchronization has completed.
Keeping in sync with rsync
After this first successful test, I encourage you to re-run the exact same rsync command immediately. You will see that it completes almost instantly, since very little (perhaps nothing) has changed in the source files since you ran it the first time - so it synchronizes the source and destination almost instantly.
If you make some changes to one or more files in /etc (the source directory we used in the test, above) and then re-run the same rsync command again, you will see those files retransferred, but the job will complete very quickly as most of the files have not changed.
It is worth noting that if you interrupt an rsync job - perhaps by typing CTRL-C while it is running, or perhaps because of a network failure, you can simply re-run the exact same command and it will pick up right where it left off. There are no problems that arise from a broken or aborted rsync job - simply re-run the exact same command again.
If you delete files from the source, they will remain in place on the destination - even though they no longer exist on the source. If you would like files that you delete on the source to also be deleted on the destination, add the --delete option to your command line:
rsync -avH --delete /etc email@example.com:backup
But be careful - if you are using rsync as part of a backup process, this can be dangerous - rsync doesn't know that you accidently deleted some files on the source, and with the --delete option in place, it will happily delete them on the destination ... and then you will not have them anywhere.
Automated, Scheduled Backups with rsync
rsync is a very simple, very reliable backup tool that is uniquely suited to the problem of backing up one linux system to another. Presumably you would like to do this automatically, on a schedule.
The problem is that, as we saw above, SSH asks us for a password. How will we type in a password every night when our backups run ? The answer is to use an SSH key to log in rather than a password.
On the source system, log in as the user that will be performing the backup.
As that user, on the source system, run this command:
ssh-keygen -t rsa
Accept the defaults - do not change the filenames or file locations It is very important that the resultant private and public keys reside in your home directories .ssh directory, or ~/.ssh (which is the default)
DO NOT enter a passphrase - just hit enter twice, leaving an empty passphrase.
Upload your newly created public key to the destination server using this command:
scp ~/.ssh/id_rsa.pub firstname.lastname@example.org:.ssh/authorized_keys
Now you have a key in place on the destination server. You should test it by logging in with SSH, just like we did with our original test, above:
# ssh email@example.com
... except this time, the system should simply log you in, without asking for a password. Remember, this passwordless login will only work for the source user that ran the ssh-keygen command, above - it will not work for any other users on the source system.
Now you may edit your crontab to schedule your rsync job to run. Enter the crontab with:
# crontab -e
and put in this job, which is an example that will run at midnight:
0 0 * * * /usr/bin/rsync -avH /etc firstname.lastname@example.org:backup
(note that we specified the full path to the rsync command, which you should always do when putting commands into the crontab)
(NOTE - your rsync might not be installed in /usr/bin - perhaps it is in /usr/local/bin - check the 'which rsync' command that we ran in the very beginning of this article for the location of your rsync command)
Now the /etc directory will be synchronized to the destination every night at midnight. If you want to sync more directories, you can add more rsync commands to the crontab:
0 0 * * * /usr/bin/rsync -avH /etc email@example.com:backup 10 0 * * * /usr/bin/rsync -avH /var firstname.lastname@example.org:backup 20 0 * * * /usr/bin/rsync -avH /home email@example.com:backup
The second command syncs /var, and runs at 12:10, and the third command sync /home and runs at 12:20. The string "10 0 * * *" means "the 10th minute, of the 0th hour (midnight), every day, every week, every month. If you wanted a job to run at 2:30 am, it would be "30 2 * * *".
Robust Backups with rsync
Simply synchronizing a source with a destination does NOT mean the source is properly backed up - but it's a start.
What you need next is to maintain multiple versions of your data going back in time such that you can restore a file or directory as it existed on a particular date.
This is very easy if your destination is an rsync.net offsite filesystem.
The rsync.net cloud storage platform creates and maintains daily and weekly snapshots of your entire account, you don't need to do anything - just keep doing a simple sync of your source to our destination. It works just like "Time Machine" in a Mac OSX system - at any time you can enter your rsync.net account and browse into a date in the past and see your entire account as it existed on that date.
However, if your destination is your own linux system, you'll need to use an old fashioned (but very elegant) system called "rsync snapshots".
rsync snapshots are detailed here. Remember, there is no reason to use the rsync snapshot method with an rsync.net account because our cloud storage platform is maintaining these snapshots for you already.
rsync.net offers secure cloud storage on an open standards platform for offsite backup and disaster recovery.
With five global locations, rsync.net has provided data safety and industry specific regulatory compliance since 2001.
Customers have the flexibility of using whatever tools they see fit and the backing of real engineers for support.
Let us help your firm protect data on any platform while reducing cost and complexity - Call us today at +1-619-819-9156 or email firstname.lastname@example.org.