[Box Backup-dev] Re: [Box Backup] Re: Block Sizes and Diffing (was: Re: [Box Backup] error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry)

Chris Wilson boxbackup-dev at fluffy.co.uk
Mon Sep 24 22:03:24 BST 2007


Hi Ben,

On Mon, 10 Sep 2007, Chris Wilson wrote:

> Hi Johann,
>
> On Mon, 10 Sep 2007, Johann Glaser wrote:
>
>>  The output consists of 1216 lines and starts with:
>>     642     1069 this  s=      49
>>     407      107 this  s=     177
>>     336     1065 this  s=    1633
>>     296     1363 this  s=    1649
>>     277     1159 this  s=    1617
>>     265     1424 this  s=    1681
>>     254     2706 this  s=    1601
>>     248     1011 this  s=    1665
>>     248     1005 this  s=      33
>>     246     1015 this  s=    1745
>>     219     1027 this  s=    1729
>>     205     1226 this  s=    1697
>>     200     1006 this  s=    1713
>>     194     1012 this  s=    1585
>>     156     1303 this  s=    1569
>>     155     1014 this  s=    1761
>>     141     1016 this  s=    2017
>>     114     1020 this  s=    2033
>>     111     1246 this  s=    1553
>>     103     1025 this  s=    1777
>>  (and all following are <100 for the first column)
>
> Ouch! 1216 different block sizes in the same file!
>
> Ben, I think we need to fix the diffing algorithm. This doesn't seem 
> reasonable to me.
>
>>  Unfortunately I don't understand BoxBackup's diffing and block-size
>>  algorithms, so I don't know what to conclude from my above listing. :-)
>>
>>  Do I understand correctly, that BoxBackup tries to find the smallest
>>  possible block size to transmit (and store) changes?
>
> No, it picks an "appropriate" block size for each chunk that it detects has
> changed. Personally I don't think this is particularly smart; I think we
> should keep the same block size for the whole file.

What do you think about the idea of reducing the number of possible block 
sizes in a file? Please could you explain to this bear of little brain 
how the block-size selection algorithm is supposed to work for patches, 
and why we allow so many different block sizes in the same file?
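
To make the question concrete, here is a rough Python sketch of one way
the number of distinct block sizes could be cut down: round every size up
to the next power of two and merge the counts. This is only an
illustration, not what Box Backup's diffing code does, and rounding to
powers of two is an arbitrary choice for the example. It also assumes that
the first column of Johann's listing is a block count and the second is a
block size; the listing does not label its columns, so that reading may be
wrong. The names SAMPLE, next_power_of_two and bucket_sizes are made up
for this sketch.

```python
from collections import Counter

# (count, size) pairs copied from the first rows of Johann's listing.
# ASSUMPTION: column 1 is the number of blocks and column 2 is the block
# size. The listing does not label its columns.
SAMPLE = [
    (642, 1069), (407, 107), (336, 1065), (296, 1363),
    (277, 1159), (265, 1424), (254, 2706), (248, 1011),
]


def next_power_of_two(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def bucket_sizes(rows):
    """Round each size up to a power of two and merge the block counts."""
    buckets = Counter()
    for count, size in rows:
        buckets[next_power_of_two(size)] += count
    return buckets


if __name__ == "__main__":
    buckets = bucket_sizes(SAMPLE)
    print(len(SAMPLE), "distinct sizes ->", len(buckets), "buckets")
    for size, count in sorted(buckets.items()):
        print(f"{size:6d}: {count} blocks")
```

On these eight sample rows the eight distinct sizes collapse into four
buckets (128, 1024, 2048 and 4096). Whether anything like this would be
acceptable depends on how the per-chunk sizes are chosen for patches,
which is exactly what I don't understand yet.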

Is this related to chromi's diffing optimisation? Is he around to help us?

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |


