You are inside a passionate computer engineer's lab...
Proceed at your own risk...

Home-made Linux Backup Tool

I know, I know... Linux has many full-featured backup tools able to back up your system, configuration and user files at different levels and to different media. But in my situation, I had some rather special requirements for the backup tool I wanted to use:
  • It should back up the whole system as is, at the raw level: partitions, not files. I wanted to be independent of file system support and to avoid the various problems that arise when backing up individual files.
  • It should compress the data. I went as far as asking for 7zip compression levels ;-)
  • It should be smart enough to avoid backing up the free space on the partition. This is where some problems began, which I will discuss later.
  • It should strike a balance between performance and backup safety.
  • It should offer as many configuration options as possible, with defaults that suit most cases.
  • It should depend on as few tools as possible, so that I can back up my Linux OS from a Live CD, for example, without installing rare or bloated utilities.
  • It should give some feedback on the progress of lengthy operations.
  • It should not be too hard to code either!
So my point should be clear now. Let's move to the heart of it!
I created a shell script called full-backup-1.0.sh that does just what I wanted. I present it here so that the community can take advantage of it and help me improve it or fix bugs, if any.

Usage

full-backup-1.0.sh input_device [--no-fs-check] [--no-zero-free-space] [--no-compress] [--read-only] [--safe-rbs] [--rbs=RBS]
This will perform a full raw backup of the partition data and create a single file holding the data of the partition in the current directory.
  • input_device: Source partition device node.
  • --no-fs-check: Disable file system check before starting the backup. This is not recommended for safety reasons. By default, file system check is enabled.
  • --no-zero-free-space: Disable filling the empty space in the partition with zeros. This could be useful to be able to recover deleted files from the partition later, but can reduce the compression ratio of the backup. By default, filling the empty space in the partition with zeros is enabled.
  • --no-compress: Disable compression. This implies the option --no-zero-free-space. The resulting file has the same size as the partition. This is useful when backup time must be as short as possible and backup size is not an issue.
  • --read-only: Disable any operations that can modify the partition. This implies the options --no-fs-check and --no-zero-free-space.
  • --safe-rbs: Use a safe read buffer size of 512 bytes. The read buffer size is the number of bytes read at once from the source partition. Safe here means that if the partition medium has damaged sectors, a bad sector only spoils its own 512-byte read and does not hide the readable sectors around it. Use this if you are unsure about the health of the source partition medium or when performance is not an issue.
  • --rbs=RBS: Set the read buffer size to the specified RBS size. By default RBS is 64 KB (65536 bytes), which should be a good trade-off in most cases where the partition medium is healthy.
Example:
full-backup-1.0.sh /dev/sda2

Personal experience

Using this command, I backed up my Debian Linux partition (total size: 32 GB, used size: 8 GB) into a 2 GB compressed file in 2h25min. The backup was done from a shell running on a Knoppix Linux Live DVD. For me, this is exactly what I needed.

Download

You can download the script with 7zip binaries for Intel i386 architecture here. The file is "tar.gz"-compressed.

Implementation details

Here I present the shell script I wrote with details on each section of it.

Simple BSD License

As usual, I license this code under the terms of the Simple BSD License. Here is the text of the license:
Copyright (c) 2011 Koutheir Attouchi.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Internal state initialization

Here I initialize the global variables used in the script. Their meaning is as follows:
  • readonly: this will be a read-only operation.
  • checkfs: check the file system for errors before the backup.
  • zerofs: fill the empty space of the partition with zeros.
  • imp: input mount point for the source partition.
  • zf: the zero-filled file that will be created on the source partition.
  • of: output file name.
  • nocompress: do not compress the output file.
  • compressor: file name of the executable file which will compress the data.
  • pidfile: a file that will hold a process identifier to allow for inter-process communication between child processes launched by this script.
  • statusfile: a FIFO pipe file that will enable progression display.
  • tempfile: a FIFO pipe file that will enable input data size control.
  • uiupdatedelay: delay between progress display updates.
  • rbs: reading buffer size.
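As an illustration only (this is not the script's actual code), the variables above could be initialized along these lines. The default read buffer size and the enabled-by-default checks come from the Usage section; every path, the update delay, the `workdir` variable and the `./7za` name are assumptions.

```shell
#!/bin/sh
# Sketch only: values and paths below are assumptions, except rbs and the
# enabled-by-default flags, which follow the Usage section.
init_state() {
    readonly=0          # 1 = do not modify the source partition
    checkfs=1           # file system check enabled by default
    zerofs=1            # zero-filling of free space enabled by default
    nocompress=0        # compression enabled by default
    rbs=65536           # default read buffer size (64 KB)
    uiupdatedelay=5     # seconds between progress updates (assumed value)
    compressor="./7za"  # assumed name of the 7zip binary beside the script
    of=""               # output file name, decided once the device is known
    workdir=$(mktemp -d) || return 1   # hypothetical scratch directory
    imp="$workdir/mnt"                 # input mount point
    zf="$imp/zero.fill"                # zero-filled file on the partition
    pidfile="$workdir/backup.pid"
    statusfile="$workdir/status.fifo"
    tempfile="$workdir/size.fifo"
}
```

Keeping all temporary files under one scratch directory makes the final cleanup a single, predictable step.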

Command line parsing

This is ordinary command line parsing; there is nothing special to say about it.
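For readers who want something concrete, here is a minimal sketch of how the options listed in the Usage section could be parsed. The function name `parse_args` and the `dev` variable are hypothetical; the option names and the 512-byte and 65536-byte sizes come from the Usage section.

```shell
#!/bin/sh
# Sketch of option parsing; parse_args and dev are illustrative names.
parse_args() {
    dev=""; checkfs=1; zerofs=1; nocompress=0; readonly=0; rbs=65536
    for arg in "$@"; do
        case "$arg" in
            --no-fs-check)        checkfs=0 ;;
            --no-zero-free-space) zerofs=0 ;;
            --no-compress)        nocompress=1 ;;
            --read-only)          readonly=1 ;;
            --safe-rbs)           rbs=512 ;;
            --rbs=*)              rbs="${arg#--rbs=}" ;;
            -*)                   echo "Unknown option: $arg" >&2; return 2 ;;
            *)                    dev="$arg" ;;
        esac
    done
    # The device node is the only mandatory argument.
    if [ -z "$dev" ]; then
        echo "Missing input_device" >&2
        return 2
    fi
}
```

Calling `parse_args "$@"` at the top of a script would then leave the flags in plain variables for the later stages to consult.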

Preliminary checks

For the moment, this is a simple check that the source partition device node exists.
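One possible way to write such a check is shown below. It assumes a slightly stricter test than mere existence: `-b` succeeds only if the path exists and is a block device. The function name `check_device` is hypothetical.

```shell
#!/bin/sh
# Sketch: refuse anything that is not an existing block device node.
check_device() {
    if [ ! -b "$1" ]; then
        echo "Error: '$1' is not a block device node." >&2
        return 1
    fi
}
```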

Internal state adjustments

  • The extension of the backup output file is adjusted to reflect the nature of the file (compressed or not).
  • Read-only mode disables the file system check and filling the free space with zeros.
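The two adjustments above, plus the rule from the Usage section that --no-compress implies --no-zero-free-space, might look like this in code. The output naming scheme (device base name plus `.img` or `.img.7z`) and the names `adjust_state` and `dev` are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: derive dependent settings from the parsed flags.
adjust_state() {
    # Assumed naming scheme: /dev/sda2 -> sda2.img or sda2.img.7z
    of="$(basename "$dev").img"
    if [ "$nocompress" -eq 0 ]; then
        of="$of.7z"
    else
        zerofs=0        # --no-compress implies --no-zero-free-space
    fi
    # --read-only implies --no-fs-check and --no-zero-free-space.
    if [ "$readonly" -eq 1 ]; then
        checkfs=0
        zerofs=0
    fi
}
```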

Reading source partition information

I assume that the partitions I back up always hold an ext2 file system or one of its successors (ext3, ext4). This lets me use the dumpe2fs utility to gather some file system information.
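As a rough sketch of how this information can be gathered, `dumpe2fs -h` prints the superblock fields as "Name: value" lines, which are easy to pick apart with sed. The helper names `fs_field` and `read_fs_info`, and the choice of fields, are assumptions rather than the script's actual code.

```shell
#!/bin/sh
# Extract one numeric field from "dumpe2fs -h" style text read on stdin.
fs_field() {
    sed -n "s/^$1:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Gather block size, block count and free blocks for device $1.
read_fs_info() {
    info=$(dumpe2fs -h "$1" 2>/dev/null) || return 1
    block_size=$(printf '%s\n' "$info" | fs_field "Block size")
    block_count=$(printf '%s\n' "$info" | fs_field "Block count")
    free_blocks=$(printf '%s\n' "$info" | fs_field "Free blocks")
    # Exact size of the file system in bytes.
    fs_bytes=$((block_count * block_size))
}
```

The pattern is anchored at the start of the line, so "Block count" does not accidentally match "Reserved block count".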

Listing current backup configuration

The script prints what it is about to do, so that the user knows in advance whether the tool is configured as desired.

Unmounting partition

This is done because it is unreliable to back up a partition that is mounted, unless more advanced technologies are used, which we avoid here.
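A minimal sketch of this step, assuming the device appears under the same name in /proc/mounts (which is not always true, for example with /dev/root or symlinked device names). The function names are hypothetical, and the optional second argument of `is_mounted` exists only so that the lookup can be tried against a sample mount table.

```shell
#!/bin/sh
# Succeeds if device $1 is listed in the mount table ($2, default /proc/mounts).
is_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Unmount the device only when it is currently mounted.
unmount_if_mounted() {
    if is_mounted "$1"; then
        umount "$1"
    fi
}
```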

Checking partition file system for errors

This is a safety measure, and it is highly recommended.
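For ext2/ext3/ext4 partitions, such a check could be sketched with e2fsck as follows. The exact invocation used by the script is not shown here, so the `-f -p` flags (force a check, repair automatically what is safe to repair) are an assumption. e2fsck returns 0 when the file system is clean and 1 when errors were found and corrected; higher values indicate problems that should stop the backup.

```shell
#!/bin/sh
# Interpret an e2fsck exit status: 0 = clean, 1 = errors corrected.
fsck_status_ok() {
    [ "$1" -eq 0 ] || [ "$1" -eq 1 ]
}

# Check the (unmounted) file system on device $1.
check_fs() {
    status=0
    e2fsck -f -p "$1" || status=$?
    fsck_status_ok "$status"
}
```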

Filling partition free space with zeros

This is a simple technique that gives the best possible compression ratio for the partition, so that the free space costs almost nothing in the backup.
It fills the empty space on the source partition with a huge file made of zeros, then removes the file once the partition is full. This leaves most of the space previously allocated to the file filled with zeros.
Once done, this raises the frequency of zero blocks and allows the compressor to use very little space to represent the free space on the partition.
Progress is shown during the filling operation, which may take a long time. Once the operation is done, the partition is unmounted again.
One drawback of this method is that it overwrites all the free space on the partition, which makes it impossible to recover previously deleted files. If that is not an issue, this technique should work fine.
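The core of the technique fits in a few lines. The sketch below assumes the partition is already mounted at the given mount point; the file name `zero.fill` and the function name are hypothetical, and the optional third argument (a block count limit) is added only so that the function can be tried without filling a disk. Progress display is left out.

```shell
#!/bin/sh
# $1 = mount point, $2 = block size (default 65536), $3 = optional block limit.
zero_free_space() {
    zf="$1/zero.fill"       # hypothetical name for the zero-filled file
    # Without a limit, dd stops with "No space left on device" once the
    # partition is full; that failure is expected, so it is ignored.
    dd if=/dev/zero of="$zf" bs="${2:-65536}" ${3:+count=$3} 2>/dev/null || true
    sync                    # make sure the zeros reach the medium
    rm -f "$zf"             # free the space again; the zeros stay on disk
}
```

For a dry run, `zero_free_space /tmp/somedir 4096 4` writes only 16 KB before removing the file.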

Backing up partition data

In uncompressed form

This uses the dd utility to back up the partition, then removes any padding from the end of the output once the operation completes. Padding may be added because of the block size given to dd.
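A sketch of this step, under the assumption that the padding comes from dd's `conv=noerror,sync` (continue after read errors and pad short or failed reads with zeros up to the block size) and that the exact size in bytes is known from the file system information. Trimming with `truncate` from GNU coreutils is my choice for the example, not necessarily what the script does.

```shell
#!/bin/sh
# $1 = source device, $2 = output file, $3 = read buffer size,
# $4 = exact number of bytes the image must contain.
raw_backup() {
    # conv=sync pads the last partial block, so the output may be too long.
    dd if="$1" of="$2" bs="$3" conv=noerror,sync 2>/dev/null || return 1
    # Cut the image back to the real size, dropping the padding.
    truncate -s "$4" "$2"
}
```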

In 7zip-compressed form

This uses two instances of dd: one reads the partition data, and the other limits the output to the data actually read, so that the padding is not compressed.
The compression uses the "Ultra" profile of the 7zip compressor, which should produce the best possible compression ratio but consumes a lot of memory and CPU power.
As 7zip is not a standard program, I put it beside the shell script so that I can use it from a live medium like Knoppix.
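The pipeline can be sketched as follows. Note one deliberate substitution: instead of a second dd, this example uses `head -c` to cut the stream at the exact byte count, which is simpler to get right when reading from a pipe. The `./7za` path and the function names are assumptions; `a -si` tells 7zip to add data read from standard input, and `-mx=9` selects the Ultra level.

```shell
#!/bin/sh
# Emit exactly $3 bytes of $1 on stdout, reading in blocks of $2 bytes.
read_exact() {
    dd if="$1" bs="$2" conv=noerror,sync 2>/dev/null | head -c "$3"
}

# $1 = source device, $2 = read buffer size, $3 = exact size, $4 = archive.
compressed_backup() {
    # compressor defaults to a 7za binary beside the script (assumed name).
    read_exact "$1" "$2" "$3" | "${compressor:-./7za}" a -si -mx=9 "$4" >/dev/null
}
```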

Finalization

This cleans up the temporary files and reports the status of the backup operation.
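A minimal sketch of such a finalization step, assuming the temporary files are the ones named by pidfile, statusfile and tempfile. The function names and messages are illustrative only.

```shell
#!/bin/sh
# Remove the temporary files used for inter-process communication.
cleanup() {
    rm -f "$pidfile" "$statusfile" "$tempfile"
}

# Print the outcome for exit status $1 and pass the status on.
report() {
    if [ "$1" -eq 0 ]; then
        echo "Backup completed: $of"
    else
        echo "Backup failed with status $1" >&2
    fi
    return "$1"
}
```

Registering the cleanup with `trap cleanup EXIT` early in a script would also cover the case where the backup is interrupted.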
