Introduction
Everyone should be backing up their data. This doesn't just go for sysadmins, but also for people at home who never even think about it. Just like everything else in I.T., hard drives were built to fail. If you do not make efficient backups, you are at your own mercy when your drive no longer spins up and all that data is gone.
I try to make as many of my tasks as easy as possible simply by using Bash scripting. It's about the most portable language there is, meaning 99.9% of the time you aren't going to need to install anything to run the script...and even if Bash itself is missing, most of the code I write can work with other shells as well.
I'm going to copy/paste parts of my script, and give details about each block of code. At the end, I'll also provide a link to the full script.
What Will This (Not) Do?
This script will do the following:
Let's Begin
DAY=`date +%d`
MONTH=`date +%m`
YEAR=`date +%Y`
Here we grab the day, month and year, which are used to name the backup directory and the log file. Pretty straightforward, if I do say so myself.
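One small wrinkle: calling date three times means a run that straddles midnight could mix fields from two different days. Below is a minimal sketch that calls date once and splits the result with parameter expansion. NOW is a helper variable introduced here for illustration and is not part of the original script.

```bash
# Capture the date once so all three fields come from the same instant.
# NOW is a hypothetical helper variable, not part of the original script.
NOW=$(date +%Y-%m-%d)

YEAR=${NOW%%-*}      # everything before the first "-"
DAY=${NOW##*-}       # everything after the last "-"
MONTH=${NOW#*-}      # drop the year and its "-"...
MONTH=${MONTH%-*}    # ...then drop the "-" and the day

echo "$YEAR/$MONTH/$DAY"
```

The three variables end up with the same zero-padded values as the three separate date calls, so the rest of the script would not need to change.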
# Backup directory to use (2011/08/31 for 08.31.2011)
BKDIR="/backups/$YEAR/$MONTH/$DAY"
if [ ! -d "$BKDIR" ]; then
    mkdir -p "$BKDIR"
fi
We make the directory where the backup files will be stored (-p ensures any missing directories will be made). The script must have write permissions to the BKDIR (in this case /backups/), or else it will fail.
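Since mkdir -p succeeds quietly when the directory already exists, the existence test is optional. Here is a hedged sketch that also stops early when the directory cannot be created or written to. BKROOT is a hypothetical stand-in for /backups (a temporary directory) so the snippet is safe to try anywhere, and the date is the example date used elsewhere in this article.

```bash
# BKROOT is an assumption for this sketch: a throwaway directory instead of /backups.
BKROOT=$(mktemp -d)
BKDIR="$BKROOT/2012/01/26"

# mkdir -p creates missing parents and is a no-op if the path already exists.
if ! mkdir -p "$BKDIR"; then
    echo "Cannot create $BKDIR" >&2
    exit 1
fi

# Fail early instead of letting tar fail later.
if [ ! -w "$BKDIR" ]; then
    echo "No write permission on $BKDIR" >&2
    exit 1
fi

echo "Backups will be written to $BKDIR"
```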
BKLOG="/backups/$YEAR/$MONTH/$DAY.log"
This is the log file for the run (it will make more sense later). It is named after the day, just like the backup directory, because I like things being consistent.
ARRPOS=0
I'll be honest, this one is hard to explain on its own: it is a counter for the current position in the arrays below, and it should make more sense when you actually see it in use. It's like trying to explain to someone new to computers how a keyboard makes a letter appear...you just tell them "just press the key to see its power" if you want to keep them interested.
DRIVE=('sda' 'sdb')
This backup solution is device-based, and my server has two devices (one main, another with misc. data). The end result will basically be $BKDIR/${DRIVE[$ARRPOS]}.tar.gz (e.g. /backups/2012/01/26/sda.tar.gz).
BACKUP=('/' '/pub')
What to back up on each device (for me, this is backing up everything).
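DRIVE and BACKUP are parallel arrays: the entry at a given position in one belongs with the entry at the same position in the other, and that position is what ARRPOS keeps track of. The sketch below prints the pairing without running tar. Note that it swaps the separate counter for "${!DRIVE[@]}", which expands to the list of array indices, and that list_targets is a hypothetical helper name.

```bash
DRIVE=('sda' 'sdb')
BACKUP=('/' '/pub')
BKDIR="/backups/2012/01/26"   # example date used in the text above

# list_targets is a hypothetical helper that only prints the plan; it runs no tar.
list_targets() {
    local i
    for i in "${!DRIVE[@]}"; do
        echo "${BACKUP[$i]} -> $BKDIR/${DRIVE[$i]}.tar.gz"
    done
}

list_targets
```

Keeping the two arrays the same length and in the same order is the one rule to remember when adding a third device.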
SDAEX=('/media' '/tmp' '/dev' '/proc' '/sys' '/mnt' '/pub' '/var/cache' '/backups')
These are the paths to skip on sda. A lot of them aren't needed in a backup, /pub is handled separately as the second device, and we also do not want to back up our backups by default.
touch $BKLOG
Create an empty file for the log file, making sure it can be made.
echo "To: someone@somewhere.com" > $BKLOG
echo "From:Backups" >> $BKLOG
echo -e "Subject: Generated backup report for `hostname` on $YEAR.$MONTH.$DAY\n" >> $BKLOG
echo -e ">> Backup for: $YEAR.$MONTH.$DAY started @ `date +%H:%M:%S`\n" >> $BKLOG
The purpose of $BKLOG is to log the status of the backup process. When it is done, we will be e-mailing the report out (see the "To:" field). You can change this however you want; this is how I did it for myself.
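Because the log doubles as an e-mail message, its layout matters: the headers end at the first blank line, which is why the Subject line carries an extra newline. Here is a hedged sketch of the same header written with printf, whose escape handling does not depend on which shell runs the script. The temporary file, the fixed date in STAMP, and the use of uname -n in place of hostname are all assumptions made so the snippet runs anywhere.

```bash
# Assumptions for this sketch: a temporary file instead of the real $BKLOG,
# a fixed example date, and uname -n in place of hostname.
BKLOG=$(mktemp)
STAMP="2012.01.26"

{
    printf 'To: %s\n' "someone@somewhere.com"
    printf 'From: %s\n' "Backups"
    # The extra \n leaves a blank line, which ends the mail headers.
    printf 'Subject: Generated backup report for %s on %s\n\n' "$(uname -n)" "$STAMP"
    printf '>> Backup for: %s started @ %s\n\n' "$STAMP" "$(date +%H:%M:%S)"
} > "$BKLOG"

cat "$BKLOG"
```

Grouping the printf calls in braces opens the log once and truncates it once, so there is no need to mix > and >>.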
# Checks to see if day = 1, and if so, backs up the last month's backups
if [ "$DAY" == "01" ]; then
    M=`echo -n $MONTH | awk '{printf substr($1,2)}'`
    let OLD=M-1
    echo "- New month detected. Backing up previous month's ($OLD) backups." >> $BKLOG
    echo " + Backup file: /backups/$YEAR/$OLD.tar.gz" >> $BKLOG
    SD=$( { time tar -cpPzf /backups/$YEAR/$OLD.tar.gz /backups/$YEAR/$OLD/; } 2>&1 )
    # Got stats, delete folder
    rm -rf /backups/$YEAR/$OLD
    SD=`echo -n "$SD" | grep real`
    MIN=`echo -n "$SD" | awk '{printf substr($2,0,2)}'`
    SEC=`echo -n "$SD" | awk '{printf substr($2,3)}'`
    echo -e "- done [ $MIN $SEC ].\n" >> $BKLOG
fi
As the comment states, this is a monthly roll-up: on the first day of the month it archives the previous month's backups before starting the backup for the current day. This, combined with other routines on the system, keeps backups around for a lengthy amount of time. This is also why we excluded /backups from our routine...WAY too many backups of backups if you ask me.
One thing I want to talk about, since it's the meat of the actual backup routine, is this line:
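As far as I can tell, the month arithmetic in that block deserves some care. substr($1,2) keeps only the second digit of the month, so 10, 11 and 12 become 0, 1 and 2. The result is also not zero-padded, while the directories created earlier are (07, not 7). And in January the previous month is December of the previous year. Below is a minimal sketch of one way to handle all three cases; prev_month is a hypothetical helper, not something from the original script.

```bash
# prev_month is a hypothetical helper: given a year and a zero-padded month,
# it prints "YYYY/MM" for the month before it.
prev_month() {
    year=$1
    month=${2#0}              # strip one leading zero so "08" is not read as octal
    month=$((month - 1))
    if [ "$month" -eq 0 ]; then
        month=12              # January rolls back to December...
        year=$((year - 1))    # ...of the previous year
    fi
    printf '%04d/%02d\n' "$year" "$month"
}

prev_month 2012 01
prev_month 2011 10
```

The output matches the directory layout used for BKDIR, so it could be dropped straight after /backups/ when building the paths.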
tar -cpPzf /backups/$YEAR/$OLD.tar.gz /backups/$YEAR/$OLD/
This tells tar to create (-c) a gzip-compressed (-z) archive file (-f) named /backups/$YEAR/$OLD.tar.gz containing the data found in the /backups/$YEAR/$OLD/ directory, preserving permissions (-p) and using absolute file names (-P), that is, not stripping the leading "/" from each file name. The -P switch is used because, without it, tar strips the leading "/" and prints a message about doing so, which makes the captured output ugly and, to my mind, can lead to a broken backup.
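To see what the flags do without touching real data, here is a small sketch that archives a throwaway directory and lists the result. SRC, OUT and demo.tar.gz are names made up for this example.

```bash
# SRC, OUT and demo.tar.gz are made-up names; both directories are temporary.
SRC=$(mktemp -d)
OUT=$(mktemp -d)
echo "hello" > "$SRC/file.txt"

# c = create, p = preserve permissions, P = keep the leading "/",
# z = gzip, f = archive file name
tar -cpPzf "$OUT/demo.tar.gz" "$SRC"

# t = list contents; -P again so the names are shown exactly as stored
tar -tPzf "$OUT/demo.tar.gz"
```

The listing shows full paths starting with "/". That is worth remembering at restore time, since extracting with -P puts the files back at those absolute paths.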
Continuing on...
# Cycle through each drive and back up each
for d in "${DRIVE[@]}"; do
    echo "- Backing up drive $d" >> $BKLOG
    # By default, at least don't backup lost+found directories
    EX="--exclude=lost+found"
    # If we are backing up drive 1 (/dev/sda), there's to exclude
    if [ $d == "sda" ]; then
        for e in "${SDAEX[@]}"; do
            EX="`echo -n $EX` --exclude=$e"
        done
    fi
    # Do the magic work and display some cool info
    SD=$( { time tar -cpPzf $BKDIR/$d.tar.gz $EX ${BACKUP[$ARRPOS]}; } 2>&1 )
    SD=`echo -n "$SD" | grep real`
    MIN=`echo -n "$SD" | awk '{printf substr($2,0,2)}'`
    SEC=`echo -n "$SD" | awk '{printf substr($2,3)}'`
    SD=$(ls -liha $BKDIR/$d.tar.gz)
    SIZE=`echo -n $SD | awk '{printf $6}'`
    let ARRPOS++
done
This is the code that backs up the current data, and it is also where we need ARRPOS. It is all pretty much self-explanatory, to be honest. The biggest change, besides the loop over the array, is that we also get the file size of the created backup file. So, let's rewind a little bit and look at the for block...
Another block of code I didn't discuss earlier (since it's in a couple of spots) is this:
SD=$( { time tar -cpPzf $BKDIR/$d.tar.gz $EX ${BACKUP[$ARRPOS]}; } 2>&1 )
SD=`echo -n "$SD" | grep real`
MIN=`echo -n "$SD" | awk '{printf substr($2,0,2)}'`
SEC=`echo -n "$SD" | awk '{printf substr($2,3)}'`
If you run the time command, you'll get an output like this:
[ehansen@sfu ~]$ time
real 0m0.000s
user 0m0.000s
sys 0m0.000s
What the block of code does is measure how long it takes to create the backup file (the tar command), and then keep only the "real" time. The reason is that even in a multitasking environment like Linux, a process may have to pause for a moment to let another user's program (or a system action like signal handling) run. The "real" time is the actual wall-clock time it took for the command to finish, whereas "user" and "sys" count only the CPU time spent in the process itself and in the kernel on its behalf. We use my best friend, Mr. awk, to pull the minutes and seconds out of that line.
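The two substr calls cut the value at fixed character positions, so they depend on how many digits the minutes field has. A hedged alternative is to split the value at the "m" with parameter expansion, which works for any number of minutes. split_real is a hypothetical helper, and the sample line stands in for real output from time.

```bash
# split_real is a hypothetical helper: it takes a line such as "real 2m13.481s"
# and sets MIN and SEC.
split_real() {
    # Intentionally unquoted so the line is split on whitespace (space or tab).
    set -- $1
    value=$2            # e.g. 2m13.481s
    MIN=${value%%m*}    # text before the "m"
    SEC=${value#*m}     # text after the "m"...
    SEC=${SEC%s}        # ...minus the trailing "s"
}

split_real "real 2m13.481s"
echo "- done [ ${MIN}m ${SEC}s ]."
```

In the script itself, the argument would be the line already stored in SD after the grep for "real".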
for d in "${DRIVE[@]}"; do
d will be whatever value is currently at ${DRIVE[$ARRPOS]}. For example, the first time around d will be sda, and the second time it will be sdb.
# By default, at least don't backup lost+found directories
EX="--exclude=lost+found"
# If we are backing up drive 1 (/dev/sda), there's to exclude
if [ $d == "sda" ]; then
    for e in "${SDAEX[@]}"; do
        EX="`echo -n $EX` --exclude=$e"
    done
fi
ext3 and ext4 file systems create this wonderful directory called lost+found. Personally, I'm not a fan of it, because every time I try to restore corrupted data from it, I just get my bottom handed to me, but besides this, it's a folder that really should be pointless to include in a routine backup. If we're working on the first drive (where /boot, /home, /var, etc. are stored), we also exclude directories that mean nothing once the system is shut down. The line EX="`echo -n $EX` --exclude=$e" is plain string concatenation, basically the same as EX .= " --exclude=$e" in PHP or Perl. Bash is less obvious about strings and concatenation, but EX="$EX --exclude=$e" or EX+=" --exclude=$e" would do the same job.
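Because the exclude options end up unquoted on the tar command line, a path containing a space would be split into separate words. Below is a sketch of the same loop using a Bash array instead of a string, which keeps each option as exactly one word. Turning EX into an array is a change from the original script, made here only for illustration.

```bash
SDAEX=('/media' '/tmp' '/dev' '/proc' '/sys' '/mnt' '/pub' '/var/cache' '/backups')
d="sda"

# Start with the default exclusion, one array element per option.
EX=(--exclude=lost+found)

if [ "$d" == "sda" ]; then
    for e in "${SDAEX[@]}"; do
        EX+=("--exclude=$e")
    done
fi

# Print one option per line; in the real script "${EX[@]}" would be passed to tar.
printf '%s\n' "${EX[@]}"
```

With this form, the tar call would use "${EX[@]}" where it currently has $EX.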
# Mail this script out...ssmtp for GMail accounts, otherwise change for
# appropriate MTA
/usr/sbin/ssmtp -t < $BKLOG
This is the last of the backup script: it uses ssmtp to send out the report stored in $BKLOG. Nothing to really discuss here.
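If the mailer is missing, the redirect above simply fails and the report never leaves the machine. Here is a small hedged sketch that checks for the mailer first and says where the report was left. send_report is a hypothetical wrapper, and the optional second argument exists only so a different MTA path can be tried.

```bash
# send_report is a hypothetical wrapper; the mailer path defaults to the one
# used in the script and can be overridden by the second argument.
send_report() {
    log=$1
    mta=${2:-/usr/sbin/ssmtp}
    if [ ! -x "$mta" ]; then
        echo "No mailer found at $mta; report kept in $log" >&2
        return 1
    fi
    # Same call as the original: -t takes the recipient from the To: header.
    "$mta" -t < "$log"
}
```

Usage would be something like send_report "$BKLOG" as the last line of the script.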