From boxbackup-dev at boxbackup.org Sat Jul 4 12:00:00 2009
From: boxbackup-dev at boxbackup.org (boxbackup-dev at boxbackup.org)
Date: Sat, 4 Jul 2009 12:00:00 +0100 (BST)
Subject: [Box Backup-dev] Current open tickets
Message-ID: <20090704110001.09709326029@www.boxbackup.org>

Note: to view an individual ticket, use:
https://www.boxbackup.org/trac/ticket/(number)

The following is a listing of current problems submitted by Box Backup
users. These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Ticket Owner  Component     Summary
- ------ ------ ------------- ------------------------------------------------------------
n      4 martin box libraries Port Box Backup to AIX
n      6        box libraries Contribute code: SMTP client, HTTP server, Database drivers,
n      7        box libraries Improve restore speed on local repositories
n      8 chris  box libraries Improve handling of directories with many files
n     13 chris  bbackupd      Fix file locking on Windows
n     14 chris  bbackupd      Fix large file issues on Windows
n     16 chris  bbackupquery  Restore deleted directories may fail
a     17 chris  bbackupquery  List files using wildcards
a     20 chris  bbackupctl    bbackupctl reload reports prior settings
n     45 ben    bbackupd      File diff performance patch (reduced disk IO and wall time
n     46 chris  bbackupd      bbackupd only ever saves reverse diffs, corrupted files on s
n     47 chris  bbackupd      Account numbers greater than 2^31 (0x7fffffff) do not work c
n     48 chris  bbackupd      Locations that don't exist on first run are never tried agai
n     49 chris  bbackupd      ID map (rename tracking) broken since [288]
n     50 chris  bbackupquery  No way to capture stderr under Windows
n     51 chris  bbackupd      No way to force bbackupd to re-upload files under Windows
n     52 chris  bbackupd      Unable to control the maintenance of old vs. deleted files
n     53 chris  bbackupd      Comparing root directory locations does not work under Windo
n     54 chris  bbackupd      Locations not found on disk (e.g. unmounted filesystems) can
n     55 chris  bbackupd      Should store and preserve directory timestamps

20 tickets total.

From boxbackup-dev at boxbackup.org Sun Jul 12 22:45:20 2009
From: boxbackup-dev at boxbackup.org (Chris Wilson)
Date: Sun, 12 Jul 2009 22:45:20 +0100 (BST)
Subject: [Box Backup-dev] Soft-RAID support
In-Reply-To: <4A2A5B6B.3050501@sommerseths.net>
References: <4A23A6FF.3080401@topphemmelig.net> <4A2A5B6B.3050501@sommerseths.net>
Message-ID:

Hi David,

On Sat, 6 Jun 2009, David Sommerseth wrote:

> Chris Wilson wrote:
>>> "The server currently supports a kind of RAID 5 in userland for extra
>>> reliability... This is deprecated and will be removed in a future
>>> version."
>>>
>>> Are there any reasons this will be changed?
>>
>> Support for it was never finished (no recovery procedure), it is pretty
>> limited (only supports RAID 5 and three devices) and it was written at a
>> time when OS/software and hardware RAID were not as ubiquitous or well
>> supported as they are now.
>
> I would be willing, with some guidance to look into such a tool, if that
> is the main criteria for dropping this support.

That would definitely be very helpful, thanks in advance. You can read the
encrypted objects (which are reconstructed successfully) and then rewrite
them, which will reestablish the redundant copies.

> The soft-raid solution itself seems to work flawlessly and seems to only
> need this recovery tool. Or are there any other issues which are not too
> well known with the soft-raid which should make me worried? Are there
> any critical bugs related to the current implementation?

No, I don't think so. All of our tests actually run in RAID mode, hence
the "more tested" aspect.
However it does impose significant performance limitations which may
prevent me from making some optimisations to reduce disk I/O in future, and
the new refcount database will not be mirrored, but it can be reconstructed
by housekeeping in any case, so it's more of a cache than a database.

>> I can see your point about the usefulness of this for distributed
>> encrypted backup. However I'm not convinced about the overall merits of
>> storing the data in three separate locations.
>
> Regarding encryption, yes, that is one key element. But if the
> organisation loses one remote storage with the complete backup directory,
> it got all the needed information needed to begin to crack the encryption.
> If you need minimum 2 sets to be able to crack the encryption, you have
> another layer of security. And it was this combination which caught my
> attention. When you add locally encrypted disks, you have the third layer
> of security.

That's a good point. I don't believe this idea has been proposed before,
and I guess Ben didn't have it in mind when he implemented BB RAID or when
he proposed to remove it.

>>> I evaluated BoxBackup and set it up before this part of the
>>> documentation changed. Anyhow, there's also a contradictory sentence
>>> later on in the same URL:
>>>
>>> "NOTE Running the server in non-RAID mode has not been tested as
>>> extensively as in RAID file mode."
>>
>> Strictly speaking, in my mind, this is not contradictory as it doesn't
>> say that userland RAID is better or recommended, just more tested.
>
> Yes, exactly. And that was also why I chose to set up the soft-raid
> solution. Increased possibilities for security, and better tested.

I don't think the "better tested" part is particularly true any more.
While all the unit tests do use RAID, I don't think that any users use it
in production.

>> However I think it may no longer be true. I suspect that few people are
>> using the userland RAID feature in production.
>> If anyone except David is, please speak up!
>
> I would also be interested in hearing others experiences as well! If
> I'm the only one, I agree, it's not much point in continuing this
> support in BoxBackup. Then I would need to figure out another way how to
> solve this. I will not continue on this path if soft-raid disappears
> for sure in BoxBackup.

As you have a good use case for it, I am not planning to remove it in the
near future. However I would be interested in thinking about better ways
to implement this, such as at the OS level. I do think it would be more
efficient, not less, to implement this at the block level in the OS rather
than in Box.

I'm also planning to implement S3 client support in Box Backup fairly
soon, and I expect that most users will move to that as it frees them from
the need to ever buy more disks or take their systems offline for disk
upgrades. Unless we can find a good way to support userland RAID on top of
S3, I expect that these code paths will diverge significantly and you may
find that fewer users use libraidfile at all.

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From boxbackup-dev at boxbackup.org Thu Jul 23 22:53:54 2009
From: boxbackup-dev at boxbackup.org (Stewart Adam)
Date: Thu, 23 Jul 2009 17:53:54 -0400
Subject: [Box Backup-dev] Self-introduction
Message-ID: <4A68DBF2.2070304@diffingo.com>

Hi,

First of all, I wanted to mention how impressed I am with Box Backup. Keep
up the good work!

My name is Stewart Adam and I'm the developer of fwbackups, a user backup
program.
So far fwbackups has been a wrapper to other utilities (rsync+tar) with a
GUI built around it, but I have been slowly rewriting fwbackups into C++ to
add some more features and make it more efficient than my current
implementation in Python. One of my main goals for fwbackups was to make a
user backup program that was very easy to use, but not to sacrifice any
features or flexibility in the process of doing so.

I came across Box Backup the other day, and it already does much of what I
wanted to accomplish [1] with the C++ rewrite of fwbackups. There are a few
things I'm looking at extending in order to make box backup satisfy my
project goals:

* A cross-platform GUI interface that could administrate backup
  configurations and jobs or perform file restores
* Support for backups to local folders (and therefore any mounted drives) &
  optical media
* Support for multiple users on a single machine (still need to investigate
  the possibility of running multiple non-root daemons)
* Support for multiple backup configurations per user (data can be grouped
  and can have different settings, or be uploaded to different box backup
  servers)

Would this be something that the development team is interested in
collaborating on?

Regards,
Stewart

[1] http://www.diffingo.com/oss/fwbackups/features

From boxbackup-dev at boxbackup.org Thu Jul 23 22:57:52 2009
From: boxbackup-dev at boxbackup.org (Chris Wilson)
Date: Thu, 23 Jul 2009 22:57:52 +0100 (BST)
Subject: [Box Backup-dev] Self-introduction
In-Reply-To: <4A68DBF2.2070304@diffingo.com>
References: <4A68DBF2.2070304@diffingo.com>
Message-ID:

Hi Stewart,

On Thu, 23 Jul 2009, Stewart Adam wrote:

> First of all, I wanted to mention how impressed I am with Box Backup.
> Keep up the good work!

Thanks, and welcome to the group :)

> * A cross-platform GUI interface that could administrate backup
> configurations and jobs or perform file restores

Have a look at Boxi.
> * Support for backups to local folders (and therefore any mounted drives) &
> optical media

Has been discussed many times, not really hard to implement, but you can
also just run bbstored on the same machine as bbackupd.

> * Support for multiple users on a single machine (still need to investigate
> the possibility of running multiple non-root daemons)

Already works. They just need different ports, configuration files and
data directories.

> * Support for multiple backup configurations per user (data can be grouped
> and can have different settings, or be uploaded to different box backup
> servers)

You can have as many locations and/or configuration files per user as you
want.

> Would this be something that the development team is interested
> in collaborating on?

Yes.

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From boxbackup-dev at boxbackup.org Fri Jul 24 00:08:48 2009
From: boxbackup-dev at boxbackup.org (Stewart Adam)
Date: Thu, 23 Jul 2009 19:08:48 -0400
Subject: [Box Backup-dev] Self-introduction
In-Reply-To:
References: <4A68DBF2.2070304@diffingo.com>
Message-ID: <4A68ED80.9090608@diffingo.com>

On 7/23/09 5:57 PM, Chris Wilson wrote:

Hi Chris,

Thanks for the fast response!

>> * A cross-platform GUI interface that could administrate backup
>> configurations and jobs or perform file restores
>
> Have a look at Boxi.
>
>> * Support for backups to local folders (and therefore any mounted
>> drives) & optical media
> Has been discussed many times, not really hard to implement, but you can
> also just run bbstored on the same machine as bbackupd.

This is what I was hoping could be worked around...
For example, by avoiding the server altogether and adding support for
local destinations into the client the average user will be able to backup
data to a USB key or another folder much easier than setting up a server,
client account and managing certificates just to get some files copied.

>> * Support for multiple users on a single machine (still need to
>> investigate the possibility of running multiple non-root daemons)
>
> Already works. They just need different ports, configuration files and
> data directories.

I'm not sure how large a change this would be, but it would be good to
have it working on the same port. A typical use case would be I want to
backup data on my 3 machines to the same server using my account, or say
my brother, sister and I all have different accounts on the server but use
the same computer (so 3 daemons need to share 1 port).

>> * Support for multiple backup configurations per user (data can be
>> grouped and can have different settings, or be uploaded to different
>> box backup servers)
>
> You can have as many locations and/or configuration files per user as
> you want.

If I understand correctly, a daemon needs to be started for each
configuration file?

>> Would this be something that the development team is interested
>> in collaborating on?
>
> Yes.
>
> Cheers, Chris.

Excellent! I'm looking forward to helping work on box backup. In order to
get myself familiar with the code are there any small tasks I can start by
working on?
Regards,
Stewart

From boxbackup-dev at boxbackup.org Fri Jul 24 13:54:23 2009
From: boxbackup-dev at boxbackup.org (David Sommerseth)
Date: Fri, 24 Jul 2009 14:54:23 +0200
Subject: [Box Backup-dev] Soft-RAID support
In-Reply-To:
References: <4A23A6FF.3080401@topphemmelig.net> <4A2A5B6B.3050501@sommerseths.net>
Message-ID: <4A69AEFF.10804@topphemmelig.net>

Chris Wilson wrote:
> Hi David,
>
> On Sat, 6 Jun 2009, David Sommerseth wrote:
>
>> Chris Wilson wrote:
>>> Support for it was never finished (no recovery procedure), it is pretty
>>> limited (only supports RAID 5 and three devices) and it was written at a
>>> time when OS/software and hardware RAID were not as ubiquitous or well
>>> supported as they are now.
>>
>> I would be willing, with some guidance to look into such a tool, if that
>> is the main criteria for dropping this support.
>
> That would definitely be very helpful, thanks in advance. You can read
> the encrypted objects (which are reconstructed successfully) and then
> rewrite them, which will reestablish the redundant copies.

I'll grab the code soon after the holiday season is over, and poke into
this. I'd consider the program as a stand alone program somehow, which
will do the recreation in a way which you suggest. As a brief quick idea
of how I could imagine it:

  root at host # bbackrecover --source-dir1 /path/to/origdata_part1 \
                             --source-dir2 /path/to/origdata_part2 \
                             --recover-dir /path/to/recovered_part3

Only the missing part would then be recovered to the given directory in
--recover-dir. Not sure though, if it would be needed to write data to
part1 and part2 directories in addition to the already mentioned part3.
Does this approach seem sensible?

Any special parts in the code you'd recommend me to dig into before I
begin to ask more questions? And any of the Box Backup developers
available on IRC channels?

Is the source code available via a public SCM URL?
(git, svn, cvs)

>> The soft-raid solution itself seems to work flawlessly and seems to
>> only need this recovery tool. Or are there any other issues which are
>> not too well known with the soft-raid which should make me worried?
>> Are there any critical bugs related to the current implementation?
>
> No, I don't think so. All of our tests actually run in RAID mode, hence
> the "more tested" aspect. However it does impose significant performance
> limitations which may prevent me from making some optimisations to
> reduce disk I/O in future, and the new refcount database will not be
> mirrored, but it can be reconstructed by housekeeping in any case, so
> it's more of a cache than a database.

That sounds good. Of course I/O requests and performance are more complex
when needing to keep control over three streams vs just one. But this
optimisation is also depending on how clever the OS is able to spread the
tasks. Of course, I do recognise that if all data is on the same device,
an OS optimisation should probably be ignored. It could also be that with
some syscalls, it's possible to do, at least some of, this optimisation
inside BoxBackup (usually done by sorting by inodes of the files being
read/written to/from, afaik - considering the inodes for all of these
three streams). But in the case of using 3 different devices, the OS is
the one which should do the optimisation.

>>> I can see your point about the usefulness of this for distributed
>>> encrypted backup. However I'm not convinced about the overall merits of
>>> storing the data in three separate locations.
>>
>> Regarding encryption, yes, that is one key element. But if the
>> organisation loses one remote storage with the complete backup
>> directory, it got all the needed information needed to begin to crack
>> the encryption. If you need minimum 2 sets to be able to crack the
>> encryption, you have another layer of security. And it was this
>> combination which caught my attention.
>> When you add locally encrypted disks, you have the third layer of
>> security.
>
> That's a good point. I don't believe this idea has been proposed before,
> and I guess Ben didn't have it in mind when he implemented BB RAID or
> when he proposed to remove it.

And I who thought this feature was implemented for such a use-case ;-)

>>> However I think it may no longer be true. I suspect that few people
>>> are using the userland RAID feature in production. If anyone except
>>> David is, please speak up!
>>
>> I would also be interested in hearing others experiences as well! If
>> I'm the only one, I agree, it's not much point in continuing this
>> support in BoxBackup. Then I would need to figure out another way how to
>> solve this. I will not continue on this path if soft-raid disappears
>> for sure in BoxBackup.
>
> As you have a good use case for it, I am not planning to remove it in
> the near future. However I would be interested in thinking about better
> ways to implement this, such as at the OS level. I do think it would be
> more efficient, not less, to implement this at the block level in the OS
> rather than in Box.

Thanks! This sounds good. Yeah, it would be possible to move it to
kernel-space. But I'm not sure this would gain too much interest, as you
have dmraid and mdraid in kernel already (thinking Linux primarily). To
have a file-based soft-raid in addition, might be considered waste of time
- and a more difficult case to optimise. Anyway, I'll try to mention it
for some kernel fs developers at work.

Another reason why not to depend on the OS here, is that this might not be
possible or very difficult to implement such feature in all OS supported
by Box Backup.

> I'm also planning to implement S3 client support in Box Backup fairly
> soon, and I expect that most users will move to that as it frees them
> from the need to ever buy more disks or take their systems offline for
> disk upgrades.
> Unless we can find a good way to support userland RAID on
> top of S3, I expect that these code paths will diverge significantly and
> you may find that fewer users use libraidfile at all.

I haven't studied the S3 client in general much, and thus I have no idea
how the Box Backup implementation for S3 will be. But if the soft-raid
code is kept inside Box Backup, it might be easier to setup three remote
destinations as well, or one local and two remote.

Have you thought about supporting other remote protocols in addition?
Like ssh or webdav? For me it sounds like you might plan such a remote
layer is located in the bbstored, instead of assigning a local directory
you assign a remote - or am I completely wrong?

Thanks anyway for a good response! :)

kind regards,

David Sommerseth

From boxbackup-dev at boxbackup.org Fri Jul 24 23:16:04 2009
From: boxbackup-dev at boxbackup.org (Chris Wilson)
Date: Fri, 24 Jul 2009 23:16:04 +0100 (BST)
Subject: [Box Backup-dev] Self-introduction
In-Reply-To: <4A68ED80.9090608@diffingo.com>
References: <4A68DBF2.2070304@diffingo.com> <4A68ED80.9090608@diffingo.com>
Message-ID:

Hi Stewart,

On Thu, 23 Jul 2009, Stewart Adam wrote:

>>> * Support for backups to local folders (and therefore any mounted
>>> drives) & optical media
>>
>> Has been discussed many times, not really hard to implement, but you
>> can also just run bbstored on the same machine as bbackupd.
>
> This is what I was hoping could be worked around... For example, by
> avoiding the server altogether and adding support for local destinations
> into the client the average user will be able to backup data to a USB
> key or another folder much easier than setting up a server, client
> account and managing certificates just to get some files copied.

I agree that it would be easier. In theory the client communicates with
the server by calling methods on the BackupProtocolClient class
(lib/backupclient/autogen_BackupProtocolClient.cpp) which communicates
over the network.
I think you could modify the Protocol auto-generate script
(lib/server/makeprotocol.pl.in) so that it generates a new client class,
which for each method instantiates the appropriate server message class,
stores the right information in its rProtocol object for it to read,
executes it, and reads the response out. You would then arrange a simple
switch to create an instance of this new BackupProtocolClient class
instead of the usual one, depending on some client configuration value.
That would achieve what you want to do.

>>> * Support for multiple users on a single machine (still need to
>>> investigate the possibility of running multiple non-root daemons)
>>
>> Already works. They just need different ports, configuration files and
>> data directories.
>
> I'm not sure how large a change this would be, but it would be good to
> have it working on the same port. A typical use case would be I want to
> backup data on my 3 machines to the same server using my account, or say
> my brother, sister and I all have different accounts on the server but
> use the same computer (so 3 daemons need to share 1 port).

If you have the same account then it would make sense to use different
locations in the same configuration file. If they use different accounts
then they could better be written using separate configuration files. They
should all be able to talk to the same server daemon on the same port, so
different ports shouldn't be necessary, sorry for the false information.

>>> * Support for multiple backup configurations per user (data can be
>>> grouped and can have different settings, or be uploaded to
>>> different box backup servers)
>>
>> You can have as many locations and/or configuration files per user as
>> you want.
>
> If I understand correctly, a daemon needs to be started for each
> configuration file?

Yes, that's correct, is that a big problem?
(Given that these are different users in each case, so the daemon should
probably be run as the user who controls it in any case).

> Excellent! I'm looking forward to helping work on box backup. In order
> to get myself familiar with the code are there any small tasks I can
> start by working on?

Have you had a look at our Trac bug tracker, accessible via the Wiki?
There are plenty of tasks there which need doing, some smaller than
others.

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From boxbackup-dev at boxbackup.org Fri Jul 24 23:29:26 2009
From: boxbackup-dev at boxbackup.org (Chris Wilson)
Date: Fri, 24 Jul 2009 23:29:26 +0100 (BST)
Subject: [Box Backup-dev] Soft-RAID support
In-Reply-To: <4A69AEFF.10804@topphemmelig.net>
References: <4A23A6FF.3080401@topphemmelig.net> <4A2A5B6B.3050501@sommerseths.net> <4A69AEFF.10804@topphemmelig.net>
Message-ID:

Hi David,

On Fri, 24 Jul 2009, David Sommerseth wrote:

>>>> Support for [software RAID] was never finished (no recovery
>>>> procedure), it is pretty limited (only supports RAID 5 and three
>>>> devices) and it was written at a time when OS/software and hardware
>>>> RAID were not as ubiquitous or well supported as they are now.
>>>
>>> I would be willing, with some guidance to look into such a tool, if
>>> that is the main criteria for dropping this support.
>>
>> That would definitely be very helpful, thanks in advance. You can read
>> the encrypted objects (which are reconstructed successfully) and then
>> rewrite them, which will reestablish the redundant copies.
>
> I'll grab the code soon after the holiday season is over, and poke into
> this. I'd consider the program as a stand alone program somehow, which
> will do the recreation in a way which you suggest.
> As a brief quick
> idea of how I could imagine it:
>
>   root at host # bbackrecover --source-dir1 /path/to/origdata_part1 \
>                              --source-dir2 /path/to/origdata_part2 \
>                              --recover-dir /path/to/recovered_part3
>
> Only the missing part would then be recovered to the given directory in
> --recover-dir. Not sure though, if it would be needed to write data to
> part1 and part2 directories in addition to the already mentioned part3.
> Does this approach seem sensible?

I'd slightly prefer it if this was integrated with the main
bbstoreaccounts utility, perhaps with the existing "check" command. I
don't have a very strong objection to creating a new utility, but it seems
to naturally belong there.

The most obvious implementation would be to completely rewrite each
object, which would require touching all three files, even though only one
is strictly necessary. The alternative would be to write a utility which
requires a deeper understanding of the RAID file format. You might like to
consider that as an optimisation for later work, once you have the basic
RAID recovery working.

> Any special parts in the code you'd recommend me to dig into before I
> begin to ask more questions? And any of the Box Backup developers
> available on IRC channels?

Sorry, I don't do IRC in general, I simply don't have time for it. However
if you wanted to have a focused introduction or Q&A session to the code at
a specific time and place, I think I could do that for you.

> Is the source code available via a public SCM URL? (git, svn, cvs)

It's all in Subversion at https://www.boxbackup.org/svn/box/trunk/.

>> > The soft-raid solution itself seems to work flawlessly and seems to only
>> > need this recovery tool. Or are there any other issues which are not too
>> > well known with the soft-raid which should make me worried? Are there
>> > any critical bugs related to the current implementation?
>>
>> No, I don't think so. All of our tests actually run in RAID mode, hence
>> the "more tested" aspect.
>> However it does impose significant performance
>> limitations which may prevent me from making some optimisations to reduce
>> disk I/O in future, and the new refcount database will not be mirrored,
>> but it can be reconstructed by housekeeping in any case, so it's more of a
>> cache than a database.
>
> That sounds good. Of course I/O requests and performance are more complex
> when needing to keep control over three streams vs just one. But this
> optimisation is also depending on how clever the OS is able to spread the
> tasks. Of course, I do recognise that if all data is on the same device, an
> OS optimisation should probably be ignored. It could also be that with some
> syscalls, it's possible to do, at least some of, this optimisation inside
> BoxBackup (usually done by sorting by inodes of the files being read/written
> to/from, afaik - considering the inodes for all of these three streams). But
> in the case of using 3 different devices, the OS is the one which should do
> the optimisation.

The largest problem that I'm aware of is that a RAID file can't be
modified in place, it has to be completely rewritten. This is needlessly
intensive on time, disk space and I/O operations, and a good reason to
consider soft RAID as a candidate for being replaced by a faster
filesystem in some cases. However, I don't know whether or when I will
actually do so. Amazon S3 suffers from the same problem (in-place partial
updates are not possible).

>> As you have a good use case for it, I am not planning to remove it in
>> the near future. However I would be interested in thinking about
>> better ways to implement this, such as at the OS level. I do think it
>> would be more efficient, not less, to implement this at the block
>> level in the OS rather than in Box.
>
> Thanks! This sounds good. Yeah, it would be possible to move it to
> kernel-space.
> But I'm not sure this would gain too much interest, as
> you have dmraid and mdraid in kernel already (thinking Linux primarily).
> To have a file-based soft-raid in addition, might be considered waste of
> time - and a more difficult case to optimise. Anyway, I'll try to
> mention it for some kernel fs developers at work.

I wasn't actually proposing adding a file-based RAID system to the kernel,
although I have considered it in the past as it has many potential
advantages. But I don't see why it shouldn't be possible to use block-level
kernel RAID to implement what you are intending to do, without requiring
any code in Box Backup to support it.

> Another reason why not to depend on the OS here, is that this might not
> be possible or very difficult to implement such feature in all OS
> supported by Box Backup.

I think we can, and possibly should, delegate the RAID support to the OS,
where the code is heavily tested, used by many other applications, and
supports more interesting combinations such as generic block devices (e.g.
iSCSI and ATAoE) as backends.

>> I'm also planning to implement S3 client support in Box Backup fairly
>> soon, and I expect that most users will move to that as it frees them
>> from the need to ever buy more disks or take their systems offline for
>> disk upgrades. Unless we can find a good way to support userland RAID
>> on top of S3, I expect that these code paths will diverge
>> significantly and you may find that fewer users use libraidfile at
>> all.
>
> I haven't studied the S3 client in general much, and thus I have no idea
> how the Box Backup implementation for S3 will be. But if the soft-raid
> code is kept inside Box Backup, it might be easier to setup three remote
> destinations as well, or one local and two remote.
>
> Have you thought about supporting other remote protocols in addition?
> Like ssh or webdav?
> For me it sounds like you might plan such a remote
> layer is located in the bbstored, instead of assigning a local directory
> you assign a remote - or am I completely wrong?

You could already use ssh/webdav as a backend, probably without problems,
although it may violate POSIX guarantees and thereby lose some of the
safety that Box Backup supposedly guarantees by relying on them (e.g.
atomic rename over existing files to replace them).

When I consider alternative backends, I'm more thinking about other remote
filesystem protocols which don't map well onto POSIX semantics. S3
positively sucks in that regard, which makes the implementation very
challenging :)

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From boxbackup-dev at boxbackup.org Sat Jul 25 12:00:01 2009
From: boxbackup-dev at boxbackup.org (boxbackup-dev at boxbackup.org)
Date: Sat, 25 Jul 2009 12:00:01 +0100 (BST)
Subject: [Box Backup-dev] Current open tickets
Message-ID: <20090725110001.62AC2326026@www.boxbackup.org>

Note: to view an individual ticket, use:
https://www.boxbackup.org/trac/ticket/(number)

The following is a listing of current problems submitted by Box Backup
users. These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Ticket Owner  Component     Summary
- ------ ------ ------------- ------------------------------------------------------------
n      4 martin box libraries Port Box Backup to AIX
n      6        box libraries Contribute code: SMTP client, HTTP server, Database drivers,
n      7        box libraries Improve restore speed on local repositories
n      8 chris  box libraries Improve handling of directories with many files
n     13 chris  bbackupd      Fix file locking on Windows
n     14 chris  bbackupd      Fix large file issues on Windows
n     16 chris  bbackupquery  Restore deleted directories may fail
a     17 chris  bbackupquery  List files using wildcards
a     20 chris  bbackupctl    bbackupctl reload reports prior settings
n     45 ben    bbackupd      File diff performance patch (reduced disk IO and wall time
n     46 chris  bbackupd      bbackupd only ever saves reverse diffs, corrupted files on s
n     47 chris  bbackupd      Account numbers greater than 2^31 (0x7fffffff) do not work c
n     48 chris  bbackupd      Locations that don't exist on first run are never tried agai
n     49 chris  bbackupd      ID map (rename tracking) broken since [288]
n     50 chris  bbackupquery  No way to capture stderr under Windows
n     51 chris  bbackupd      No way to force bbackupd to re-upload files under Windows
n     52 chris  bbackupd      Unable to control the maintenance of old vs. deleted files
n     53 chris  bbackupd      Comparing root directory locations does not work under Windo
n     54 chris  bbackupd      Locations not found on disk (e.g. unmounted filesystems) can
n     55 chris  bbackupd      Should store and preserve directory timestamps

20 tickets total.

From boxbackup-dev at boxbackup.org Sat Jul 25 12:36:27 2009
From: boxbackup-dev at boxbackup.org (Ben Summers)
Date: Sat, 25 Jul 2009 12:36:27 +0100
Subject: [Box Backup-dev] Re: Soft-RAID support
Message-ID: <1482F3FB-3B2A-4C7F-AB4B-46E98DD92539@fluffy.co.uk>

David Sommerseth wrote:
> Chris Wilson wrote:
>> Hi David,
>>
>> On Sat, 6 Jun 2009, David Sommerseth wrote:
>>
>>> Chris Wilson wrote:
>>>> Support for it was never finished (no recovery procedure), it is
>>>> pretty limited (only supports RAID 5 and three devices) and it was
>>>> written at a time when OS/software and hardware RAID were not as
>>>> ubiquitous or well supported as they are now.
>>>
>>> I would be willing, with some guidance to look into such a tool, if
>>> that is the main criteria for dropping this support.
>>
>> That would definitely be very helpful, thanks in advance. You can read
>> the encrypted objects (which are reconstructed successfully) and then
>> rewrite them, which will reestablish the redundant copies.
>
> I'll grab the code soon after the holiday season is over, and poke into
> this. I'd consider the program as a stand alone program somehow, which
> will do the recreation in a way which you suggest. As a brief quick idea
> of how I could imagine it:
>
> root at host # bbackrecover --source-dir1 /path/to/origdata_part1 \
>                          --source-dir2 /path/to/origdata_part2 \
>                          --recover-dir /path/to/recovered_part3
>
> Only the missing part would then be recovered to the given directory in
> --recover-dir. Not sure though, if it would be needed to write data to
> part1 and part2 directories in addition to the already mentioned part3.
> Does this approach seem sensible?

Box Backup is written as a set of interdependent libraries. There's lots of
generic code for building UNIX client/server applications, which make it
easy to deal with sockets, SSL, communication protocols and so on. And it
avoids depending on anything other than OpenSSL.
So I had imagined that the raidfile library would be used by other things.
(That's why there's a separate raidfile.conf file.)

In this scheme, there was going to be another daemon, raidfiled. Stuff
which used the raidfile library would just write the file, then offload the
conversion to the three stripes to raidfiled. You'll note the sequence of
events in writing a raidfile lends itself nicely to this scheme.

I had then imagined that errors would be notified to raidfiled, which would
do the necessary fixing and alerting. Plus, of course, automatic rebuild if
a stripe is lost.

Rebuilding a raidfile should be as easy as opening it in read mode, then
copying it to a raidfile of the same name in write mode. The difficult bit
will be determining whether or not it needs recovery. A trivial approach
will be to read everything and see if errors are reported, and this may
indeed be the best way.

Personally, I don't use the raidfile stuff to stripe any more, but store
everything on a ZFS volume with mirroring or RAIDZ. Feels much safer to me.
As Chris says, this kind of thing wasn't available when I started the
project.

Ben
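
[Editorial illustration, not part of the original message.] The rebuild described above (read the file back with the lost stripe reconstructed, then write all three stripes out again) can be sketched with generic RAID 5 style XOR parity over three stripes. This is only a rough sketch under stated assumptions: it is not Box Backup's raidfile code and does not follow its on-disk format, and the function names, the block size and the zero-padding of the last block are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_stripes(data: bytes, block_size: int) -> Tuple[bytes, bytes, bytes]:
    """Deal blocks alternately onto two data stripes and compute a parity stripe.

    Assumption: the final blocks are zero-padded to block_size, so the
    caller must remember the original length separately.
    """
    stripe1, stripe2, parity = bytearray(), bytearray(), bytearray()
    for offset in range(0, len(data), 2 * block_size):
        b1 = data[offset:offset + block_size].ljust(block_size, b"\0")
        b2 = data[offset + block_size:offset + 2 * block_size].ljust(block_size, b"\0")
        stripe1 += b1
        stripe2 += b2
        parity += xor_bytes(b1, b2)
    return bytes(stripe1), bytes(stripe2), bytes(parity)


def rebuild_missing(stripes: List[Optional[bytes]]) -> List[bytes]:
    """Given [stripe1, stripe2, parity] with exactly one entry None, rebuild it.

    Because parity = stripe1 XOR stripe2, any one of the three is the XOR
    of the other two.
    """
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(stripes) != 3 or len(missing) != 1:
        raise ValueError("expected three stripes with exactly one missing")
    a, b = [s for s in stripes if s is not None]
    if len(a) != len(b):
        raise ValueError("surviving stripes differ in length")
    rebuilt = list(stripes)
    rebuilt[missing[0]] = xor_bytes(a, b)
    return rebuilt


def join_stripes(stripe1: bytes, stripe2: bytes, block_size: int, length: int) -> bytes:
    """Interleave the two data stripes and trim the padding."""
    out = bytearray()
    for offset in range(0, len(stripe1), block_size):
        out += stripe1[offset:offset + block_size]
        out += stripe2[offset:offset + block_size]
    return bytes(out[:length])
```

In this toy model, "does it need recovery?" reduces to checking whether one stripe failed to load; a real tool would, as suggested above, more likely read every file through the library and act on the errors reported.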