- 02-22-2009 #1 Just Joined!
- Join Date
- Nov 2006
- Posts
- 9
tar compression issue
I'm running a nightly backup script of my home directory using tar. The directory size is 1.4G, but regardless of the compression method (z, j, Z), I'm only compressing down to about 1.2G.
What can I do to get further compression if possible? I thought I could compress down to a third or half the original size.
Thanks.
- 02-22-2009 #2 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,695
Data Compression
Compression is only as effective as the data being compressed. Things like text files compress well because they have a lot of repetitive data in them. Files that have *already* been compressed or have mostly random data don't compress much.
JPEGs and MPEGs are examples of file types that have already been compressed; binaries also tend to compress less well than text.
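To make the point above concrete, here is a minimal sketch that compresses two made-up 1 MB test files with gzip: one of repetitive text, and one of random bytes standing in for already-compressed data. The file names and sizes are assumptions chosen only for the demo.

```shell
#!/bin/sh
# Demo only: compare how well gzip shrinks repetitive text vs. random bytes.
workdir=$(mktemp -d)

# About 1 MB of highly repetitive text.
yes "the quick brown fox jumps over the lazy dog" | head -c 1000000 > "$workdir/text.dat"
# About 1 MB of random bytes, which behave much like already-compressed data.
head -c 1000000 /dev/urandom > "$workdir/random.dat"

gzip -c "$workdir/text.dat" > "$workdir/text.dat.gz"
gzip -c "$workdir/random.dat" > "$workdir/random.dat.gz"

# tr strips the padding some wc implementations add.
text_size=$(wc -c < "$workdir/text.dat.gz" | tr -d ' ')
random_size=$(wc -c < "$workdir/random.dat.gz" | tr -d ' ')

echo "text:   1000000 -> $text_size bytes"
echo "random: 1000000 -> $random_size bytes"

rm -rf "$workdir"
```

The text file should shrink to a tiny fraction of its size, while the random file should stay at roughly 1 MB. A home directory dominated by photos, music, or video will behave more like the second case.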
- 02-22-2009 #3
You might have more luck using something like rsync and archiving only the differences: although you have 1.4GB, it is unlikely that all of it changes each day. Something like backintime may also be worth a look.
- 02-23-2009 #4 Just Joined!
- Join Date
- Nov 2006
- Posts
- 9
The script is one of those scripts that takes an incremental backup daily and a full backup weekly. I chose to use tar rather than rsync specifically because I needed to compress for disk space. I would think that the data could be compressed more, but I guess everything is working fine. I was just curious if I was missing something.
Thanks for your time.
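One way to check whether further compression is realistic is to see which file types account for most of the space. The sketch below is only an illustration: `size_by_extension` is a hypothetical helper name, and it treats whatever follows the last dot in a file name as the extension (so a dotfile such as `.bashrc` is counted under `bashrc`).

```shell
#!/bin/sh
# Hypothetical helper: print total bytes per file extension, largest first.
# Usage: size_by_extension /path/to/directory
size_by_extension() {
    find "$1" -type f -exec sh -c '
        for f do
            # Emit "<size in bytes> <base name>" for each file.
            printf "%s %s\n" "$(wc -c < "$f" | tr -d " ")" "${f##*/}"
        done' sh {} + |
    awk '{
        size = $1
        name = substr($0, index($0, " ") + 1)
        ext = name
        # Strip everything up to the last dot; no dot means no extension.
        if (sub(/.*\./, "", ext) == 0) ext = "(none)"
        total[ext] += size
    }
    END { for (e in total) printf "%d %s\n", total[e], e }' |
    sort -rn
}

# Example (not run here): size_by_extension "$HOME"
```

If the top of the list is mostly jpg, mp3, avi, zip and similar, then a result like 1.4G down to 1.2G is about what should be expected.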
- 02-26-2009 #5
I was thinking you could use something like rsync to determine the differences and then tar the differences; backintime was just an example using rsync ... but that may have done what you needed.
- 02-26-2009 #6 Linux User
- Join Date
- Jun 2007
- Posts
- 318
You may be able to improve compression by splitting the compression into its own process and using (in the case of gzip) the --best or -9 option.
Code:
tar -cf - /path/to/home | gzip --best > backup.tgz
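As a rough way to see how much the compression level matters, the sketch below archives a throwaway sample directory twice, once with `gzip --fast` and once with `gzip --best`, and prints both sizes. The sample directory and its contents are made up for the demo; real gains depend entirely on the data.

```shell
#!/bin/sh
# Demo only: tar writes the archive to stdout (-f -) and gzip compresses the stream.
workdir=$(mktemp -d)
mkdir "$workdir/home"
# Sample data: a text file of consecutive numbers.
seq 1 200000 > "$workdir/home/numbers.txt"

tar -cf - -C "$workdir" home | gzip --fast > "$workdir/fast.tgz"
tar -cf - -C "$workdir" home | gzip --best > "$workdir/best.tgz"

fast_size=$(wc -c < "$workdir/fast.tgz" | tr -d ' ')
best_size=$(wc -c < "$workdir/best.tgz" | tr -d ' ')

echo "gzip --fast: $fast_size bytes"
echo "gzip --best: $best_size bytes"

rm -rf "$workdir"
```

Note that gzip's default level is 6, so moving to `--best` usually yields only a modest improvement, and none worth mentioning on data that is already compressed.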
- 02-26-2009 #7 Just Joined!
- Join Date
- Nov 2006
- Posts
- 9
Jonathan183 - I believe the script already looks for changes and only runs a backup of those. As a matter of fact, I'm sure of it. I've posted the script below:
Code:
#!/bin/sh
# this program creates a "daily" tar incremental backup of the data specified
# in SRC it puts the result in DEST
# NOTE: Here "daily" doesn't mean anything - it just corresponds to
# each time the program is run (this batch script is written to be run
# each night, but that's not necessary)

# everything must be locally mounted
SRC="/home/stirling/work/"
# don't forget the trailing "/" in DEST
DEST="/disks/sdd1/backup/work/"

# These are the number of days since the last full backup to wait
# in order to make a new full backup
DAYSINWEEK=7

# These are the files which store the days since last backup
DAYSSINCELASTWEEKFILE=${DEST}days_since_last_week_backup
DEST_TEMP=${DEST}temp.tgz
DEST_DAY_PART=${DEST}day
DEST_WEEK=${DEST}week.tgz
DEST_FILELIST=${DEST}filelist
DEST_FILELIST_TEMP=${DEST}filelist.temp

# check to see if the source even exists
if [ -d $SRC ]
then
    # check to see if the destination exists... if not, then create it
    if [ ! -d $DEST ]
    then
        mkdir $DEST
    fi

    # still check to see if the destination directory exists, and if so, make
    # it writeable
    if [ -d $DEST ]
    then
        chmod u+w $DEST
        cd $DEST
        chmod u+w *
    else
        echo "Cannot create destination directory"
        exit 1
    fi

    # We are going to do the weekly full backup, if necessary
    # first see if it is necessary
    if [ -f $DAYSSINCELASTWEEKFILE ]
    then
        DAYSSINCELASTWEEK=`cat $DAYSSINCELASTWEEKFILE`
    else
        DAYSSINCELASTWEEK=$DAYSINWEEK
    fi
    DAYSSINCELASTWEEK=`expr $DAYSSINCELASTWEEK + 1`

    if [ $DAYSSINCELASTWEEK -gt $DAYSINWEEK ]
    then
        # we are going to rotate backups... reset the count variable
        DAYSSINCELASTWEEK=0

        # we need to make sure to delete any partial old backups
        if [ -f $DEST_TEMP ]
        then
            rm $DEST_TEMP
        fi
        if [ -f $DEST_FILELIST_TEMP ]
        then
            rm $DEST_FILELIST_TEMP
        fi

        # now make the temporary full backup (complete with a filelist)
        tar zcvpf $DEST_TEMP -g $DEST_FILELIST_TEMP $SRC >> /dev/null
        if [ ! $? -eq 0 ]
        then
            echo "backup failed"
            exit 1
        fi

        # we have a successful full backup - we need to remove the old backups
        if [ -f $DEST_WEEK ]
        then
            rm $DEST_WEEK
        fi
        if [ -f $DEST_FILELIST ]
        then
            rm $DEST_FILELIST
        fi
        mv $DEST_TEMP $DEST_WEEK
        mv $DEST_FILELIST_TEMP $DEST_FILELIST
        if [ -f "${DEST_DAY_PART}1.tgz" ]
        then
            rm ${DEST_DAY_PART}*
        fi

        # put a timestamp into the log file
        echo -n "Weekly backup: " >> ${DEST}log
        date >> ${DEST}log
    fi

    # write the day count out
    echo $DAYSSINCELASTWEEK > $DAYSSINCELASTWEEKFILE

    # We are going to do the daily backups, if necessary
    # first see if it is necessary
    if [ $DAYSSINCELASTWEEK -gt 0 ]
    then
        DEST_DAY="${DEST_DAY_PART}${DAYSSINCELASTWEEK}.tgz"

        # we need to make sure to delete any partial old backups
        if [ -f $DEST_TEMP ]
        then
            rm $DEST_TEMP
        fi
        if [ -f $DEST_FILELIST_TEMP ]
        then
            rm $DEST_FILELIST_TEMP
        fi

        # now make the daily incremental backup (using the filelist)
        cp $DEST_FILELIST $DEST_FILELIST_TEMP
        tar zcvpf $DEST_TEMP -g $DEST_FILELIST_TEMP $SRC >> /dev/null
        if [ ! $? -eq 0 ]
        then
            echo "backup failed"
            exit 1
        fi
        if [ -f $DEST_FILELIST ]
        then
            rm $DEST_FILELIST
        fi
        mv $DEST_TEMP $DEST_DAY
        mv $DEST_FILELIST_TEMP $DEST_FILELIST
        touch $DEST_DAY
        touch $DEST_FILELIST

        # put a timestamp on the backup
        date >> ${DEST}log
    fi

    # change the backup directory back to nonwriteable
    cd $DEST
    chmod a-w *
    chmod a-w $DEST
else
    echo "error: directory $SRC does not exist"
    exit 1
fi

echo "Success!!!"
exit 0
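For readers unfamiliar with the `-g` option the script relies on, here is a minimal sketch of how it behaves, assuming GNU tar (`-g` is short for `--listed-incremental`). The directory and file names are throwaway examples, not the paths from the script above.

```shell
#!/bin/sh
# Demo only, assuming GNU tar: -g keeps a snapshot file describing what was archived.
workdir=$(mktemp -d)
mkdir "$workdir/work"
echo "first" > "$workdir/work/old.txt"

# Full backup: the snapshot file does not exist yet, so everything is archived.
tar -czpf "$workdir/week.tgz" -g "$workdir/filelist" -C "$workdir" work

# Something changes after the full backup.
sleep 1
echo "second" > "$workdir/work/new.txt"

# Incremental backup: work on a copy of the snapshot, as the script does,
# so a failed run cannot damage the last good snapshot.
cp "$workdir/filelist" "$workdir/filelist.temp"
tar -czpf "$workdir/day1.tgz" -g "$workdir/filelist.temp" -C "$workdir" work

full_list=$(tar -tzf "$workdir/week.tgz")
incr_list=$(tar -tzf "$workdir/day1.tgz")

echo "full backup contains:"
echo "$full_list"
echo "incremental backup contains:"
echo "$incr_list"

rm -rf "$workdir"
```

The incremental archive should list `new.txt` but not `old.txt`, which is why the daily archives stay small even when the weekly one does not.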
vsemaska - I appreciate the response and will try that out myself to see if it works better.
Thanks all.

