From boxbackup-dev at fluffy.co.uk Tue Apr 10 22:30:11 2007
From: boxbackup-dev at fluffy.co.uk (Martin Ebourne)
Date: Tue, 10 Apr 2007 22:30:11 +0100
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
In-Reply-To:
References: <232035.70740.qm@web36708.mail.mud.yahoo.com>
Message-ID: <1176240611.5579.18.camel@avenin.ebourne.me.uk>

On Tue, 2007-04-10 at 20:29 +0100, Chris Wilson wrote:
> Ben, I'd be happy to implement this (especially if Gary can provide his
> quick and dirty code). What do you say, should we have it for the next
> version (post 0.11)? Maybe as an optional feature, off by default? We
> could store the MD5 checksum as an optional extended attribute, so it
> would be backwards compatible, or else have a special magic value of the
> MD5 field (maybe all zeroes) to indicate that no checksum has been stored.

FWIW I'm undecided on this. I don't know what data-set Gary has run it
on, but 15% overhead doesn't sound very realistic to me. Bearing in mind
it will have to read the full contents of every file on every scan
(unless you got funky and just checksummed a portion of the filesystem
on each scan, which has its own risks), rather than just the
directories, I think the overhead will be a lot higher. On a fairly full
50GB disk that would be 50GB to read on every scan, rather than probably
500MB.

Sure we can make it optional and all these other things, but every
option has its cost, and especially in backup software you want to be
really conservative: an optional code path by definition will get a lot
less testing than a mandatory one and hence be more risky.

Clearly file notification is the way to go, and any platform of any
interest supports it these days. Having said that we'll still need a
full scan recovery mode for cases where the file notification was not
running, and checksumming the data would give ultimate peace of mind.
Given that it would only be a fallback, the cost wouldn't be so
important in this case.
(I am envisaging e.g. if bbackupd is started as part of the bootup
sequence and shut down similarly there'd be no need for a scan, rather
like a clean mount/unmount, but if the machine or bbackupd crashed a
scan would be necessary.)

Maybe have a normal full scan and a checksum full scan and an option
that says every N scans is a checksum scan. This should encourage
everyone to run the code so it gets lots of testing. Paranoid people and
file notification people could set N=1 and others could set it higher to
avoid the overhead. Don't have a way of switching it off, but people
could always set N very high. There's no overhead on the initial backup
because the data all has to be read and encrypted, etc. anyway.

> If necessary, we can run the entire bbackupd test suite twice, once with
> MD5 checksums enabled and once with them disabled, to make sure that
> nothing gets broken either way. Otherwise, I can just write a test that
> uses utimes().

If we have checksums they should be mandatory as I mentioned, to help
improve reliability. I think a specific test of the checksum should be
enough to check it works.

Cheers, Martin.

From boxbackup-dev at fluffy.co.uk Tue Apr 10 23:48:08 2007
From: boxbackup-dev at fluffy.co.uk (G.)
Date: Tue, 10 Apr 2007 15:48:08 -0700 (PDT)
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
Message-ID: <964035.23851.qm@web36703.mail.mud.yahoo.com>

Martin,

> FWIW I'm undecided on this. I don't know what data-set Gary has run it
> on, but 15% overhead doesn't sound very realistic to me. Bearing in mind

A quad-core QX6700 test server, striping on 10K Raptor drives, around
500GB backup set with 10% to ~25% typical change cycle. You are
certainly right that this is a "your mileage will vary" type of
situation here, and I was taking only approximate measurements, but
that's what I was seeing: no overly significant slow-down (maybe it's
because I spend a lot of time in large file diffs anyway).
Having checksum information already available for comparison in the
bbackupd in-memory cache, and used for folder-level checking, made
things tolerable. It's easy to check with a simple ad-hoc piece of code
before committing to an implementation.

> Clearly file notification is the way to go, and any platform of any
> interest supports it these days. Having said that we'll still need a

> Maybe have a normal full scan and a checksum full scan and an option
> that says every N scans is a checksum scan. This should encourage

I don't think that's the way to go. MD5s not only provide enhanced
change detection, but also remote store content verification on every
backup cycle. Depending on media rotation strategy, bbstored soft limit
settings, etc. one might have to treat the very last backup as the only
usable backup. At any rate, there are situations when either a backup is
100% verified, or counts for no backup at all...

Gary

From boxbackup-dev at fluffy.co.uk Wed Apr 11 00:53:06 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Wed, 11 Apr 2007 00:53:06 +0100 (BST)
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
In-Reply-To: <964035.23851.qm@web36703.mail.mud.yahoo.com>
References: <964035.23851.qm@web36703.mail.mud.yahoo.com>
Message-ID:

Hi Gary,

On Tue, 10 Apr 2007, G. wrote:

> I don't think that's the way to go. MD5s not only provide enhanced
> change detection, but also remote store content verification on every
> backup cycle.

I'm afraid not, that's a different problem. We would store checksums of
the raw data, not the encrypted data, and so it would have to be
downloaded and decrypted by the client (just like compare -a) to verify
it properly.
Since this is what compare -a already does, there would be no benefit
here.

Checksums of the encrypted data have been discussed, but that would
require new commands on the server to return the IV and checksum of each
block, which is more complex again.

Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

From boxbackup-dev at fluffy.co.uk Wed Apr 11 01:54:57 2007
From: boxbackup-dev at fluffy.co.uk (G.)
Date: Tue, 10 Apr 2007 17:54:57 -0700 (PDT)
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
Message-ID: <53750.11172.qm@web36713.mail.mud.yahoo.com>

Chris,

>> I don't think that's the way to go. MD5s not only provide enhanced
>> change detection, but also remote store content verification on every
>> backup cycle.

>I'm afraid not, that's a different problem. We would store checksums of
>the raw data, not the encrypted data, and so it would have to be
>downloaded and decrypted by the client (just like compare -a) to verify it
>properly. Since this is what compare -a already does, there would be no
>benefit here.

The plain-text MD5 checksum in question (I mean one MD5 checksum per
file, not one MD5 checksum per block) could become a part of a file's
attribute stream, which is already decrypted and used for comparison of
file attributes. It could also be used to generate/compare stronger
folder-level checksum. So, you can just use such an MD5 directly to
verify whether last-known file or folder content on the server matches
file or folder content locally.
Much less work than compare -aq, since there is no need to compare
checksums on block-by-block basis, and there is no need to re-download
MD5s in the first place (they are all already in-memory after a backup
cycle, and could be preserved by StoreObjectInfoFile; if not, bbackupd
already downloads all file attribute stream objects anyway the first
time around). (... do I remember this correctly, or did I screw up
again...?) ;)

> Checksums of the encrypted data have been discussed, but that would
> require new commands on the server to return the IV and checksum of each
> block, which is more complex again.

Yup, you are absolutely right. I should have been more clear; I meant
"remote checksum to local checksum" compare, not "remote checksum to
remote disk content" (bbstored RAID-style integrity check) compare.

Gary

From boxbackup-dev at fluffy.co.uk Thu Apr 12 21:38:02 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Thu, 12 Apr 2007 21:38:02 +0100 (BST)
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
In-Reply-To: <53750.11172.qm@web36713.mail.mud.yahoo.com>
References: <53750.11172.qm@web36713.mail.mud.yahoo.com>
Message-ID:

Hi Gary,

On Tue, 10 Apr 2007, G. wrote:

> The plain-text MD5 checksum in question (I mean one MD5 checksum per
> file, not one MD5 checksum per block) could become a part of a file's
> attribute stream, which is already decrypted and used for comparison of
> file attributes. It could also be used to generate/compare stronger
> folder-level checksum. So, you can just use such an MD5 directly to
> verify whether last-known file or folder content on the server matches
> file or folder content locally.
> Much less work than compare -aq, since
> there is no need to compare checksums on block-by-block basis, and there
> is no need to re-download MD5s in the first place (they are all already
> in-memory after a backup cycle, and could be preserved by
> StoreObjectInfoFile; if not, bbackupd already downloads all file
> attribute stream objects anyway the first time around).

I'd say it uses less Internet bandwidth than compare -a, but not less
CPU or disk activity. We can't cache the checksums of local files on
disk, otherwise we'd have the same problem that we do now :-(

compare -aq does not compare checksums of anything, as far as I know, it
just checks the file attributes. compare -a checks the file contents. A
new mode might be compare -ac, which checks the checksums of remote
blocks against their re-encrypted local block checksums.

>> Checksums of the encrypted data have been discussed, but that would
>> require new commands on the server to return the IV and checksum of each
>> block, which is more complex again.
>
> Yup, you are absolutely right. I should have been more clear; I meant
> "remote checksum to local checksum" compare, not "remote checksum to
> remote disk content" (bbstored RAID-style integrity check) compare.

Actually I didn't mean "remote checksum to remote disk content", but
rather "local disk checksums to remote disk checksums", which is almost
exactly as strong as "local disk data to remote disk data". If the
encrypted block checksums are stored on the server rather than
recomputed when necessary, this would be weaker (local disk checksums to
remote saved checksums) but require fewer resources on the server.

I think that the mode you describe, "remote checksum to remote disk
content", would be better achieved by the client uploading the
unencrypted checksum of the encrypted data, which is saved by the server
as an unencrypted attribute, which bbstoreaccounts check can reverify at
any time.

Cheers, Chris.
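
The "stored checksum against local content" comparison discussed in this
message can be sketched roughly as follows. This is only an illustration
under stated assumptions, not Box Backup code: DigestStream,
ContentMatchesStored and DemoDetectsSilentChange are hypothetical names,
and 64-bit FNV-1a stands in for MD5 purely so that the sketch needs no
crypto library.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>

// Stand-in digest (assumption): 64-bit FNV-1a, chosen only so the sketch
// has no dependencies. The thread proposes MD5; use a real MD5 in practice.
std::uint64_t DigestStream(std::istream &rContent)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV-1a offset basis
    char buffer[4096];
    // read() reports failure on the final short block, so check gcount() too.
    while (rContent.read(buffer, sizeof(buffer)) || rContent.gcount() > 0)
    {
        const std::streamsize got = rContent.gcount();
        for (std::streamsize i = 0; i < got; ++i)
        {
            hash ^= static_cast<unsigned char>(buffer[i]);
            hash *= 1099511628211ULL; // FNV-1a prime
        }
    }
    return hash;
}

// Hypothetical helper: true if the local content still matches the digest
// saved at backup time (for example in the file's attribute stream).
bool ContentMatchesStored(std::istream &rLocalContent, std::uint64_t storedDigest)
{
    return DigestStream(rLocalContent) == storedDigest;
}

// Demonstration: same size, one byte changed, as if the timestamp had
// been reset after the write.
void DemoDetectsSilentChange()
{
    std::istringstream atBackupTime("database page v1");
    const std::uint64_t stored = DigestStream(atBackupTime);

    std::istringstream unchanged("database page v1");
    std::istringstream modified("database page v2");
    assert(ContentMatchesStored(unchanged, stored));
    assert(!ContentMatchesStored(modified, stored));
}
```

Note that the sketch still reads every byte of the local file on each
check, which is the point made above: a whole-file digest saves
bandwidth, not CPU or disk activity.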

From boxbackup-dev at fluffy.co.uk Thu Apr 12 23:56:55 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Thu, 12 Apr 2007 23:56:55 +0100 (BST)
Subject: [Box Backup-dev] Re: [Box Backup-commit] #20: bbackupctl reload reports prior settings, 0.10 Win vchris_general_1280
In-Reply-To: <051.5677634ad5b36fd056c5b684f5f13b55@fluffy.co.uk>
References: <051.5677634ad5b36fd056c5b684f5f13b55@fluffy.co.uk>
Message-ID:

Hi all,

On Thu, 12 Apr 2007, trac at fluffy.co.uk wrote:

> Change bbackupd.conf timing settings.
> Run bbackupctl reload.
> The "Daemon configuration summary" incorrectly reports the settings from
> before the changes. Running it again reports the correct settings.
>
> Used notepad to make changes to bbackupd.conf to return timers from short
> test settings to longer default settings. Then ran bbackupctl reload to
> set them:
>
> C:\Program Files\Box Backup>bbackupctl.exe reload
>
> Using configuration file C:\Program Files\Box Backup\bbackupd.conf
> Daemon configuration summary:
> AutomaticBackup = true
> UpdateStoreInterval = 3 seconds
> MinimumFileAge = 4 seconds
> MaxUploadWait = 24 seconds
> Succeeded.
>
> C:\Program Files\Box Backup>bbackupctl.exe reload
>
> Using configuration file C:\Program Files\Box Backup\bbackupd.conf
> Daemon configuration summary:
> AutomaticBackup = true
> UpdateStoreInterval = 3600 seconds
> MinimumFileAge = 21600 seconds
> MaxUploadWait = 86400 seconds
> Succeeded.

This is an interesting one. The daemon's settings are sent in the
summary line, which is sent before the reload takes effect.

Why are the settings sent on each command? Couldn't we have an "info"
command which returns them, which is sent by "bbackupctl reload" after
the reload command?

Cheers, Chris.
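
The ordering problem and the suggested fix can be illustrated with a
small mock. This is a sketch under assumptions, not the real bbackupd
control protocol: MockDaemon, Settings, ReloadAndReport and
DemoReloadOrdering are hypothetical names, and "info" is the proposed
command, which is only assumed to exist here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the daemon's timing settings.
struct Settings
{
    int updateStoreInterval; // seconds
};

// Mock of the daemon side of the control socket (not the real bbackupd).
class MockDaemon
{
public:
    MockDaemon(Settings active, Settings onDisk)
        : mActive(active), mOnDisk(onDisk) {}

    // Summary sent when the client connects, i.e. before any command runs.
    std::string Greeting() const { return Summary(); }

    std::string HandleCommand(const std::string &rCommand)
    {
        if (rCommand == "reload")
        {
            mActive = mOnDisk; // pick up the edited configuration
            return "ok";
        }
        if (rCommand == "info") // proposed command, assumed here
        {
            return Summary();
        }
        return "error";
    }

private:
    std::string Summary() const
    {
        return "UpdateStoreInterval = "
            + std::to_string(mActive.updateStoreInterval) + " seconds";
    }

    Settings mActive; // what the daemon is currently using
    Settings mOnDisk; // what bbackupd.conf now contains
};

// Client side: reload first, then ask for the settings.
std::string ReloadAndReport(MockDaemon &rDaemon)
{
    if (rDaemon.HandleCommand("reload") != "ok")
    {
        return "reload failed";
    }
    return rDaemon.HandleCommand("info");
}

// Values taken from the ticket: 3 seconds before the edit, 3600 after.
void DemoReloadOrdering()
{
    MockDaemon daemon(Settings{3}, Settings{3600});
    assert(daemon.Greeting() == "UpdateStoreInterval = 3 seconds");
    assert(ReloadAndReport(daemon) == "UpdateStoreInterval = 3600 seconds");
}
```

In the mock, the greeting reproduces the stale summary from the ticket,
while asking for the settings after the reload returns the new values on
the first attempt.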

From boxbackup-dev at fluffy.co.uk Fri Apr 13 12:18:24 2007
From: boxbackup-dev at fluffy.co.uk (Ben Summers)
Date: Fri, 13 Apr 2007 12:18:24 +0100
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
Message-ID:

On Thu, 12 Apr 2007 21:38:02, Chris Wilson wrote:

> compare -aq does not compare checksums of anything, as far as I know, it
> just checks the file attributes. compare -a checks the file contents. A
> new mode might be compare -ac, which checks the checksums of remote blocks
> against their re-encrypted local block checksums.

Wrong. compare -q does check the checksums. Here's the code, run for
each file:

    // Compare file -- fetch it
    mrConnection.QueryGetBlockIndexByID(i->second->GetObjectID());

    // Stream containing block index
    std::auto_ptr<IOStream> blockIndexStream(mrConnection.ReceiveStream());

    // Compare
    equal = BackupStoreFile::CompareFileContentsAgainstBlockIndex(
        localName.c_str(), *blockIndexStream, mrConnection.GetTimeout());

Ben

From boxbackup-dev at fluffy.co.uk Fri Apr 13 13:47:03 2007
From: boxbackup-dev at fluffy.co.uk (G.)
Date: Fri, 13 Apr 2007 05:47:03 -0700 (PDT)
Subject: [Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
Message-ID: <20070413124703.43763.qmail@web36702.mail.mud.yahoo.com>

Chris,

> I'd say it uses less Internet bandwidth than compare -a, but not less
> CPU or disk activity.

Eliminating the "compare -aq" block-level checksum download requirement
speeds up the entire verification process by an order of magnitude. If I
recall correctly, calculating one MD5 hash value for an entire file and
using it for a comparison is also significantly faster than calculating
multiple block-by-block hash values and making multiple block-by-block
comparisons.
There is also the half-way option of pre-downloading, caching, and
persisting (StoreObjectInfoFile) all remote block-level checksum
information, instead of generating somewhat redundant MD5s. Not too
elegant, though.

> We can't cache the checksums of local files on disk, otherwise we'd have
> the same problem that we do now :-(

I didn't catch that one... We already cache file attribute information
locally (in-memory, and preserved by StoreObjectInfoFile) to be able to
use it for change detection as well (folder-level checksum algorithm
takes it into consideration).

> compare -aq does not compare checksums of anything, as far as I know, it

Beg to differ here, Chris...

> I think that the mode you describe, "remote checksum to remote disk
> content" would be better achieved by the client uploading the unencrypted
> checksum of the encrypted data, which is saved by the server as an
> unencrypted attribute, which bbstoreaccounts check can reverify at any
> time.

Ok, let's forget my remote content verification idea for the moment
(since I'm getting confused here ;)).

---

So, it's the plaintext MD5 as a part of a file attribute stream vs.
pre-caching block-level checksum information vs. inode notification.
However, I think we do need an option to not only 100% guarantee change
detection, but also remote content verification during each backup
cycle. I would personally accept a sacrifice of even 50% of performance
(who cares, the thing runs in the wee hours of the morning anyway and
takes hours already) to be absolutely sure that once a backup cycle
completes successfully, remote content matches local content, "beyond
reasonable doubt" :).

Gary

From boxbackup-dev at fluffy.co.uk Tue Apr 17 12:10:16 2007
From: boxbackup-dev at fluffy.co.uk (Ben Summers)
Date: Tue, 17 Apr 2007 12:10:16 +0100
Subject: [Box Backup-dev] Trac spam
Message-ID: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk>

So, it's happened.

http://bbdev.fluffy.co.uk/trac/ticket/2

How do we delete the crap?

Ben

From boxbackup-dev at fluffy.co.uk Tue Apr 17 12:50:57 2007
From: boxbackup-dev at fluffy.co.uk (Stuart Hickinbottom)
Date: Tue, 17 Apr 2007 12:50:57 +0100
Subject: [Box Backup-dev] Trac spam
In-Reply-To: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk>
References: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk>
Message-ID: <4624B4A1.5010106@hickinbottom.demon.co.uk>

http://trac-hacks.org/wiki/TicketDeletePlugin

I've used an older version of it (which allows deletion of whole tickets
only, not individual comments), and it was fine. The current version
claims to support individual comment deletion.

I've been doing it through SQL myself, but I wouldn't recommend it!

Stuart

Ben Summers wrote:
>
> So, it's happened.
>
> http://bbdev.fluffy.co.uk/trac/ticket/2
>
> How do we delete the crap?
>
> Ben

From boxbackup-dev at fluffy.co.uk Tue Apr 17 20:12:11 2007
From: boxbackup-dev at fluffy.co.uk (James O'Gorman)
Date: Tue, 17 Apr 2007 20:12:11 +0100
Subject: [Box Backup-dev] Trac spam
In-Reply-To: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk>
References: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk>
Message-ID: <20070417191211.GF38629@netinertia.co.uk>

On Tue, Apr 17, 2007 at 12:10:16PM +0100, Ben Summers wrote:
>
> So, it's happened.
>
> http://bbdev.fluffy.co.uk/trac/ticket/2
>
> How do we delete the crap?

I've deleted it directly from the database.

I've also been deleting the accounts that spambots are registering for
themselves - I suppose this brings up the question of whether I should
disable user self-registration? (i.e an admin would have to create users
upon request)

James

From boxbackup-dev at fluffy.co.uk Tue Apr 17 21:19:41 2007
From: boxbackup-dev at fluffy.co.uk (Martin Ebourne)
Date: Tue, 17 Apr 2007 21:19:41 +0100
Subject: [Box Backup-dev] Trac spam
In-Reply-To: <20070417191211.GF38629@netinertia.co.uk>
References: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk> <20070417191211.GF38629@netinertia.co.uk>
Message-ID: <1176841181.3539.4.camel@avenin.ebourne.me.uk>

On Tue, 2007-04-17 at 20:12 +0100, James O'Gorman wrote:
> On Tue, Apr 17, 2007 at 12:10:16PM +0100, Ben Summers wrote:
> >
> > So, it's happened.
> >
> > http://bbdev.fluffy.co.uk/trac/ticket/2
> >
> > How do we delete the crap?
>
> I've deleted it directly from the database.
>
> I've also been deleting the accounts that spambots are registering for
> themselves - I suppose this brings up the question of whether I should
> disable user self-registration? (i.e an admin would have to create users
> upon request)

Is it possible to have admin confirmation on new user accounts? Or maybe
one of those captcha things, but I generally hate those.

Cheers, Martin.

From boxbackup-dev at fluffy.co.uk Tue Apr 17 22:34:00 2007
From: boxbackup-dev at fluffy.co.uk (James O'Gorman)
Date: Tue, 17 Apr 2007 22:34:00 +0100
Subject: [Box Backup-dev] Trac spam
In-Reply-To: <1176841181.3539.4.camel@avenin.ebourne.me.uk>
References: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk> <20070417191211.GF38629@netinertia.co.uk> <1176841181.3539.4.camel@avenin.ebourne.me.uk>
Message-ID: <20070417213359.GG38629@netinertia.co.uk>

On Tue, Apr 17, 2007 at 09:19:41PM +0100, Martin Ebourne wrote:
> On Tue, 2007-04-17 at 20:12 +0100, James O'Gorman wrote:
> > On Tue, Apr 17, 2007 at 12:10:16PM +0100, Ben Summers wrote:
> > >
> > > So, it's happened.
> > >
> > > http://bbdev.fluffy.co.uk/trac/ticket/2
> > >
> > > How do we delete the crap?
> >
> > I've deleted it directly from the database.
> >
> > I've also been deleting the accounts that spambots are registering for
> > themselves - I suppose this brings up the question of whether I should
> > disable user self-registration? (i.e an admin would have to create users
> > upon request)
>
> Is it possible to have admin confirmation on new user accounts? Or maybe
> one of those captcha things, but I generally hate those.

I checked, and the TracAccountManager plugin doesn't have this function
yet but there do seem to be tickets open against it requesting such
features.

Anyone have any thoughts as to what we should do in the mean time?

James

From boxbackup-dev at fluffy.co.uk Tue Apr 17 22:40:51 2007
From: boxbackup-dev at fluffy.co.uk (Martin Ebourne)
Date: Tue, 17 Apr 2007 22:40:51 +0100
Subject: [Box Backup-dev] Trac spam
In-Reply-To: <20070417213359.GG38629@netinertia.co.uk>
References: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk> <20070417191211.GF38629@netinertia.co.uk> <1176841181.3539.4.camel@avenin.ebourne.me.uk> <20070417213359.GG38629@netinertia.co.uk>
Message-ID: <1176846051.3539.9.camel@avenin.ebourne.me.uk>

On Tue, 2007-04-17 at 22:34 +0100, James O'Gorman wrote:
> I checked, and the TracAccountManager plugin doesn't have this function
> yet but there do seem to be tickets open against it requesting such
> features.
>
> Anyone have any thoughts as to what we should do in the mean time?

Since you're the one currently having to fix it I guess it's down to
whether you feel the current erasing graffiti approach is too high an
overhead or ok at the moment.

If it's wasting too much time then the options appear to be add the
plugin that lets other people delete the spam so we can spread the load,
or add a big notice saying email the admins/group asking for an account.

Cheers, Martin.

From boxbackup-dev at fluffy.co.uk Tue Apr 17 22:52:15 2007
From: boxbackup-dev at fluffy.co.uk (James O'Gorman)
Date: Tue, 17 Apr 2007 22:52:15 +0100
Subject: [Box Backup-dev] Trac spam
In-Reply-To: <1176846051.3539.9.camel@avenin.ebourne.me.uk>
References: <4E349D8B-2F95-4774-AA36-05B9A65F98D8@fluffy.co.uk> <20070417191211.GF38629@netinertia.co.uk> <1176841181.3539.4.camel@avenin.ebourne.me.uk> <20070417213359.GG38629@netinertia.co.uk> <1176846051.3539.9.camel@avenin.ebourne.me.uk>
Message-ID: <20070417215215.GH38629@netinertia.co.uk>

On Tue, Apr 17, 2007 at 10:40:51PM +0100, Martin Ebourne wrote:
> Since you're the one currently having to fix it I guess it's down to
> whether you feel the current erasing graffiti approach is too high an
> overhead or ok at the moment.

There hasn't been too much, so it's manageable, but it does rely on
either a) me regularly checking or b) someone prodding me (either
privately or on-list). I've been pretty busy recently so a) is less
likely but it's only a quick thing to delete a row from the database so
I don't mind people dropping me a note.

> If it's wasting too much time then the options appear to be add the
> plugin that lets other people delete the spam so we can spread the load,
> or add a big notice saying email the admins/group asking for an account.

It would probably be better if, in the long run other admins (i.e. you
and Ben) and possibly people marked as "developers" within Trac (i.e.
Chris, Per) had the ability to remove the spam too.

I'll leave this one up to the "real" users (I'm happy to do either) -
would people prefer to have an extra plugin to remove the spam or
disable user self-registration?

James

From boxbackup-dev at fluffy.co.uk Wed Apr 18 12:19:17 2007
From: boxbackup-dev at fluffy.co.uk (Ben Summers)
Date: Wed, 18 Apr 2007 12:19:17 +0100
Subject: [Box Backup-dev] Trac spam
Message-ID: <8C5336E1-A461-4162-8AB6-B6EB96C579ED@fluffy.co.uk>

On Tue, 17 Apr 2007 22:52:15 +0100, James O'Gorman wrote:
> I'll leave this one up to the "real" users (I'm happy to do either) -
> would people prefer to have an extra plugin to remove the spam or
> disable user self-registration?

I vote for disabling user self-registration, with a big notice
explaining why and promising to create accounts promptly. This solves
the problem and doesn't require people to do extra work because of
spammers.

I don't think we get enough new users for this to be a problem. When we
do, let's revisit the issue.

Ben