[Box Backup] Block Sizes and Diffing

Johann Glaser boxbackup at fluffy.co.uk
Tue Sep 11 10:09:31 BST 2007


Hi Chris!

> Ouch! 1216 different block sizes in the same file!

That's a lot, yes. The file is created by
  tar cvf mysql-InnoDB.tar /var/lib/mysql/ib*
That means it contains the InnoDB files from MySQL. It is deliberately
left uncompressed so that BoxBackup can find differences more easily.

> Ben, I think we need to fix the diffing algorithm. This doesn't seem 
> reasonable to me.

At the very least, comparing the file and finding the diffs should be
possible by reading the file just once. The memory usage of the bbackupd
process will increase when it holds several checksums at a time, but I
guess that won't be much.
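
A rough Python sketch of such a single pass, purely as an illustration
and not how bbackupd actually works: one rolling window is kept per
block size, so the extra memory is about one block per size. The names
RollingWindow, scan_once and weak_checksum are made up for this example,
the checksum is the simple two-part sum from the rsync report, and a
hit is only a candidate that a strong checksum would still have to
confirm.

```python
import io
from collections import deque

MOD = 1 << 16  # both checksum halves are kept modulo 2^16


def weak_checksum(block):
    """Two-part weak checksum (a, b) of a whole block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


class RollingWindow:
    """Weak checksum over the last `size` bytes pushed so far."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.a = 0
        self.b = 0

    def push(self, byte):
        # Drop the oldest byte once the window is full, then add the new one.
        if len(self.window) == self.size:
            out = self.window.popleft()
            self.a = (self.a - out) % MOD
            self.b = (self.b - self.size * out) % MOD
        self.window.append(byte)
        self.a = (self.a + byte) % MOD
        self.b = (self.b + self.a) % MOD

    def full(self):
        return len(self.window) == self.size


def scan_once(stream, known, chunk_size=4096):
    """Read `stream` once; `known` maps block size -> set of (a, b).

    Returns (offset, size) pairs where a window matches a known checksum.
    """
    windows = [RollingWindow(size) for size in known]
    hits = []
    consumed = 0  # number of bytes read so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:
            consumed += 1
            for w in windows:
                w.push(byte)
                if w.full() and (w.a, w.b) in known[w.size]:
                    hits.append((consumed - w.size, w.size))
    return hits


# Example: two block sizes, file shifted by one inserted byte.
old = bytes((i * 37 + i // 7) % 256 for i in range(64))
known = {8: {weak_checksum(old[8:16])}, 16: {weak_checksum(old[16:32])}}
hits = scan_once(io.BytesIO(b"X" + old), known)
```

The cost is one checksum update per byte and per block size, so with
over a thousand sizes this trades CPU time for the single read.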

> > There is still a problem: insertions or deletions in the file can't be 
> > identified this way. Imagine a single byte insertion at the very 
> > beginning of the file. Then every 4k-aligned block will have changed and 
> > the whole file needs to be updated. This problem has already been 
> > addressed by rsync and is described at 
> > http://rsync.samba.org/tech_report/ ("The rsync algorithm" and "Rolling 
> > Checksum").
> 
> I believe that we already implement this, albeit modified to work with 
> encrypted data. See
> 
>    http://bbdev.fluffy.co.uk/svn/box/trunk/docs/backup/encrypt_rsync.txt
> 
> for details.

Ah, thanks, that was very instructive.

But I'm still concerned about block sizes and alignment. How can the
blocks be matched if the whole file is read in blocks aligned to
their own size?
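
To make the alignment concern concrete, here is a minimal Python sketch
of the rolling weak checksum from the rsync report. It is only an
illustration under my own assumptions, not Box Backup's code: the
function names are invented, the block size is tiny, and SHA-256 stands
in for whatever strong checksum a real implementation would use. The
checksum is rolled forward one byte at a time, so old blocks are still
found after a one-byte insertion, which an aligned comparison misses.

```python
import hashlib

MOD = 1 << 16  # both checksum halves are kept modulo 2^16


def weak_checksum(block):
    """Two-part weak checksum (a, b) of a whole block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward by one byte in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_matches(old, new, n):
    """Return {offset in new: block index in old} for old blocks found in new."""
    # Index the old file's aligned blocks by their weak checksum.
    index = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        strong = hashlib.sha256(block).digest()
        index.setdefault(weak_checksum(block), []).append((strong, start // n))

    matches = {}
    if len(new) < n:
        return matches

    # Check every byte offset of the new file, not only aligned ones.
    a, b = weak_checksum(new[:n])
    pos = 0
    while True:
        for strong, block_index in index.get((a, b), ()):
            # A weak match is only a candidate; confirm it with the strong hash.
            if hashlib.sha256(new[pos:pos + n]).digest() == strong:
                matches[pos] = block_index
                break
        if pos + n >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + n], n)
        pos += 1
    return matches


# Example: a single byte inserted at the very beginning of the file.
old = bytes((i * 37 + i // 7) % 256 for i in range(64))
new = b"X" + old
matches = find_matches(old, new, 16)
# Comparing only aligned blocks finds nothing in common.
aligned_equal = [i for i in range(0, 64, 16) if new[i:i + 16] == old[i:i + 16]]
```

In this example all four old blocks turn up again, each shifted by one
byte, while the aligned comparison reports every block as changed.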

> Sorry, the easiest way is to configure, then cd bin/bbackupobjdump; make; 
> cd ../..; debug/bin/bbackupobjdump ... .

Ok, that was my trick too. :-)

Bye
  Hansi

-- 
Johann Glaser                          <glaser at ict.tuwien.ac.at>
             Institute of Computer Technology, E384
Vienna University of Technology, Gusshausstr. 27-29, A-1040 Wien
Phone: ++43/1/58801-38444                Fax: ++43/1/58801-38499



