[Box Backup] Bundled TDB fails to build

Chris Wilson chris at qwirx.com
Wed Oct 20 19:59:06 BST 2010


Hi Charles,

On Tue, 5 Oct 2010, Charles Lecklider wrote:
> On 05/10/2010 11:33, Achim wrote:
>> I would love to test the recent developments, especially 2756 (Improve
>> handling of directories with many files).
>
> That'd be the stuff I'm working on:
> http://www.boxbackup.org/svn/box/invisnet/perfhack1/

Thanks for the great work you've been doing on this! It's really nice to 
have someone else working on Box Backup with me :) It's been a while since 
I had a co-developer :)

> That branch is from 0.11rc8 so there's no QDBM vs. TDB issue.

I've reverted the switch to TDB, so the current trunk should use QDBM as 
your branch does. I hope that will make it easier to merge your 
improvements back into the trunk. Please let me know if you foresee any 
problems.

> However, in order to fix the problem the store directory format has 
> changed; this branch will work with old stores, but once you've switched 
> you cannot go back. There is no conversion process as such - directories 
> are simply written in the new format when they change.

I'd like to know a bit more about the new directory format. I've been 
trying to work through the commits you've made, but so far I haven't 
found what distinguishes the old format from the new one (e.g. a new type 
ID in the header). Please could you enlighten me?

The reason that I ask is that I've decided that an object's references 
need to be stored permanently in the object itself, for security. 
Otherwise, if both the reference count database and a changed parent 
object are lost, the newly-added children would be left with no 
references and no way to reconnect them, which makes them liable for 
immediate deletion.

For this reason I've decided to change the store format for files and 
directories to be able to include any number of references, not just one 
(as currently supported).
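A minimal sketch of what "any number of references" could look like in an object header, written in Python for brevity. The layout (a magic number, a reference count, then 64-bit referrer IDs) and the names `MAGIC`, `encode_refs` and `decode_refs` are hypothetical illustrations, not the actual Box Backup store format:

```python
import struct

# Hypothetical header layout: 4-byte magic, 4-byte reference count, then
# that many 64-bit big-endian IDs of the objects that refer to this one.
MAGIC = 0x52454653  # arbitrary value for this sketch ("REFS" in ASCII)
HEADER = struct.Struct(">II")
REF = struct.Struct(">q")


def encode_refs(refs):
    """Serialise a list of referencing object IDs into the header."""
    out = bytearray(HEADER.pack(MAGIC, len(refs)))
    for ref in refs:
        out += REF.pack(ref)
    return bytes(out)


def decode_refs(data):
    """Parse the header back into a list of referencing object IDs."""
    magic, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a multi-reference object header")
    offset = HEADER.size
    refs = []
    for _ in range(count):
        (ref,) = REF.unpack_from(data, offset)
        refs.append(ref)
        offset += REF.size
    return refs
```

The single-reference objects supported today would simply be the case where the count is 1.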

This also makes a permanent reference database unnecessary. If only a 
temporary one is required (to allow fast updates during housekeeping and 
bbstoreaccounts check), then concurrency is no longer a problem, as only 
one of these can be running on a given account at a time. So I no longer 
need to use TDB instead of QDBM, which sidesteps the problem of making TDB 
build on Windows.
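If every object carries its own reference list, a temporary reference-count table can be rebuilt from scratch by scanning the objects. The Python sketch below assumes a hypothetical in-memory `store` mapping each object ID to its stored referrer IDs (standing in for reading every object header); `rebuild_refcounts` is an illustrative name, not existing Box Backup code. References whose referrer has vanished are reported separately, because those objects are candidates for reconnection rather than deletion:

```python
from collections import Counter


def rebuild_refcounts(store):
    """Rebuild a temporary reference-count table from per-object references.

    `store` is a hypothetical stand-in for the account: it maps each object
    ID to the list of IDs of the objects that (according to its own header)
    refer to it.

    Returns (counts, dangling): counts[obj] is the number of stored
    references whose referrer still exists, and dangling[obj] lists the
    referrer IDs that are missing from the store.
    """
    counts = Counter()
    dangling = {}
    for obj_id, referrers in store.items():
        counts[obj_id] = 0  # keep unreferenced objects visible with count 0
        for ref in referrers:
            if ref in store:
                counts[obj_id] += 1
            else:
                dangling.setdefault(obj_id, []).append(ref)
    return counts, dangling
```

Because only one housekeeping or check run can touch an account at a time, a plain in-process dictionary is enough here, with no locking. A real implementation would also need to treat the root directory, which has no referrer, specially.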

However, it does mean that I need a fast way to append a new reference to 
file and directory objects. Currently I think that means rewriting each 
affected object in its entirety, which would negate the benefits of the 
new directory format. So I was thinking about something like having a 
different kind of directory entry, which only contains additional 
references.
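A rough Python sketch of the append-only idea: adding a reference writes one small record at the end of the object instead of rewriting it, and readers collect references by scanning all records. The record layout, `REC_EXTRA_REF`, `append_reference` and `read_references` are all hypothetical; the object's normal contents are omitted, and the sketch assumes the underlying file can be appended to in place:

```python
import io
import struct

# Hypothetical record layout: a 1-byte record type followed by a 64-bit
# big-endian object ID. Only the "additional reference" type is modelled.
REC_EXTRA_REF = 0x01
RECORD = struct.Struct(">Bq")


def append_reference(f, referrer_id):
    """Append one additional-reference record without rewriting the object."""
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(REC_EXTRA_REF, referrer_id))


def read_references(f):
    """Collect every referrer ID by scanning all records from the start."""
    f.seek(0)
    data = f.read()
    if len(data) % RECORD.size:
        raise ValueError("truncated record")
    refs = []
    for rec_type, ref in RECORD.iter_unpack(data):
        if rec_type == REC_EXTRA_REF:
            refs.append(ref)
    return refs
```

The cost of adding a reference is then one fixed-size write, at the price of a full scan when the references are read back.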

Since most changes (such as adding a new file to an existing directory) 
would require modifying every file in the directory to append the new 
referencer's ID to the end of each file, I've also been thinking about 
having "template references" in the directory, which apply to every file 
that doesn't have its own individual reference count (that's most of them).

While this is a performance optimisation, I think I should be prepared to 
eventually have two new kinds of directory records: directory references 
(additional refs of the directory itself) and template references 
(additional refs of all files within the directory that don't have their 
own reference counts).
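The two proposed record kinds can be modelled in a few lines of Python to show how lookups would resolve. `Directory` and its fields are a hypothetical in-memory view, not the real directory class: each file is referenced by its containing directory plus either its own extra references or, if it has none, the directory's shared template references:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Directory:
    """Hypothetical in-memory view of a directory with the proposed records."""
    object_id: int
    parent_id: int
    # file ID -> that file's own extra references, or None to use the template
    entries: Dict[int, Optional[List[int]]] = field(default_factory=dict)
    dir_refs: List[int] = field(default_factory=list)       # extra refs of this directory
    template_refs: List[int] = field(default_factory=list)  # extra refs shared by files

    def add_directory_reference(self, referrer_id: int) -> None:
        """Record an additional reference to the directory itself."""
        self.dir_refs.append(referrer_id)

    def add_template_reference(self, referrer_id: int) -> None:
        """Add a referrer to every file without its own list: one record, not N."""
        self.template_refs.append(referrer_id)

    def directory_references(self) -> List[int]:
        return [self.parent_id] + self.dir_refs

    def file_references(self, file_id: int) -> List[int]:
        extras = self.entries[file_id]
        if extras is None:
            extras = self.template_refs
        return [self.object_id] + list(extras)
```

Adding a referrer to all template files is a single append, while files with individual reference lists keep their own history untouched.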

How do you think this might interact with the changes you've been making 
to the directory store format? Would it be a problem?

Also, while reviewing the code I suddenly realised that RaidFile doesn't 
support appending to an existing file, as far as I know. Files can only be 
completely rewritten, being converted to raidfile format in the process. 
Does that mean that your incremental directory writing code is 
incompatible with the user-space software RAID implemented by RaidFile?

Thanks in advance for your help,

Cheers, Chris.
-- 
_ ___ __     _
  / __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |


