From chris at qwirx.com Thu Apr 5 23:00:27 2012
From: chris at qwirx.com (Chris Wilson)
Date: Thu, 5 Apr 2012 23:00:27 +0100 (BST)
Subject: [Box Backup] No housekeeping?
In-Reply-To:
References: <1327339595.6225.44.camel@crusty.backed-up.net>
Message-ID:

Hi Peter,

Sorry for the long delay in replying, I somehow missed your email.

Your email client doesn't seem to know how to quote messages properly,
which means that I have to manually edit them when replying to make them
readable to everyone else.

On Wed, 1 Feb 2012, Peter Hall wrote:

>> It might be. But I did notice that housekeeping was running for three
>> days without finishing in your log. The block counts are not updated
>> while housekeeping is running, only when it finishes. So is it possible
>> that housekeeping did actually finish and remove enough files to bring
>> the store back under the soft limit, and you didn't notice that the
>> block counts have finally been reduced?
>
> That sounds plausible. I just checked the server and housekeeping is
> still running. Same kind of messages in the log as the one i previously
> attached. One file is removed every couple of minutes, which could take
> a very long while if there are tens or even hundreds of thousands of
> files to be deleted. Unfortunately I am now stuck (admittedly self
> inflicted) with a full store and unable to upload new backups until
> housekeeping finishes.
>
> Would it be possible to change future versions to update the block count
> while running housekeeping?

I could do that. If it updated all the time it would be very slow, but I
could make it update every five minutes or something like that.

> I see very high memory usage as well which I guess is due to boxbackup
> having to keep track of deleted files, or something similar?

Yes, probably either deleted files (pending full deletion) or the
reference count database. About how many files do you actually have, and
how much memory does the housekeeping process use?

Cheers, Chris.
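
As a rough illustration of the "update every five minutes" idea in the message above, here is a minimal Python sketch of time-based throttling. It is not Box Backup code: the class name `ThrottledSaver`, the `save` callback and the injectable `clock` are all hypothetical, and the 300-second default simply mirrors the figure floated in the message.

```python
import time


class ThrottledSaver:
    """Call an expensive save() at most once per interval during a long loop."""

    def __init__(self, save, interval_seconds=300.0, clock=time.monotonic):
        self._save = save                  # hypothetical callback that persists counts
        self._interval = interval_seconds  # five minutes by default (assumed figure)
        self._clock = clock                # injectable so the logic can be tested
        self._last = clock()

    def maybe_save(self):
        """Save only if the interval has elapsed; return True if a save happened."""
        now = self._clock()
        if now - self._last >= self._interval:
            self._save()
            self._last = now
            return True
        return False
```

A long-running deletion loop would call `maybe_save()` after each file, so the cost of writing the counts is paid only once per interval rather than once per deleted file.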
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From peter at preacher.se Thu Apr 5 23:14:35 2012
From: peter at preacher.se (Peter Hall)
Date: Fri, 6 Apr 2012 00:14:35 +0200
Subject: [Box Backup] No housekeeping?
In-Reply-To:
References: <1327339595.6225.44.camel@crusty.backed-up.net>
Message-ID:

Hi Chris, thanks for the reply. I'll try replying and hopefully the gmail
webclient won't make too much of a mess of things.

On 6 April 2012 00:00, Chris Wilson wrote:

> Hi Peter,
>
> Sorry for the long delay in replying, I somehow missed your email.
>
> Your email client doesn't seem to know how to quote messages properly,
> which means that I have to manually edit them when replying to make them
> readable to everyone else.
>
> On Wed, 1 Feb 2012, Peter Hall wrote:
>
>>> It might be. But I did notice that housekeeping was running for three
>>> days without finishing in your log. The block counts are not updated while
>>> housekeeping is running, only when it finishes. So is it possible that
>>> housekeeping did actually finish and remove enough files to bring the store
>>> back under the soft limit, and you didn't notice that the block counts have
>>> finally been reduced?
>>
>> That sounds plausible. I just checked the server and housekeeping is
>> still running. Same kind of messages in the log as the one i previously
>> attached. One file is removed every couple of minutes, which could take a
>> very long while if there are tens or even hundreds of thousands of files to
>> be deleted. Unfortunately I am now stuck (admittedly self inflicted) with
>> a full store and unable to upload new backups until housekeeping finishes.
>>
>> Would it be possible to change future versions to update the block count
>> while running housekeeping?
>
> I could do that.
If it updated all the time it would be very slow, but I
> could make it update every five minutes or something like that.

That would be very appreciated. I ran housekeeping for several weeks
without it ever finishing. I finally aborted, ran check and fix and it
recovered a few hundred megabytes after that.

>> I see very high memory usage as well which I guess is due to boxbackup
>> having to keep track of deleted files, or something similar?
>
> Yes, probably either deleted files (pending full deletion) or the
> reference count database. About how many files do you actually have, and
> how much memory does the housekeeping process use?

I can't say for certain about the number of files; this is a client's
file server that I have not checked through, but there could be a few
directories with tens or even hundreds of thousands of files in them.
According to top, bbstored now uses 65% of available memory. This is the
output of 'free':

# free -m
             total       used       free     shared    buffers     cached
Mem:           436        431          5          0         29          5
-/+ buffers/cache:        396         40
Swap:         1961        980        980

A new version that continually

> Cheers, Chris.
>
> _______________________________________________
> Boxbackup mailing list
> Boxbackup at boxbackup.org
> http://lists.boxbackup.org/cgi-bin/mailman/listinfo/boxbackup

From peter at preacher.se Thu Apr 5 23:16:22 2012
From: peter at preacher.se (Peter Hall)
Date: Fri, 6 Apr 2012 00:16:22 +0200
Subject: [Box Backup] No housekeeping?
In-Reply-To:
References: <1327339595.6225.44.camel@crusty.backed-up.net>
Message-ID:

Sensitive trackpad sent the message a bit early.

On 6 April 2012 00:14, Peter Hall wrote:

> Hi Chris, thanks for the reply.
I'll try replying and hopefully the gmail
> webclient won't make too much of a mess of things.
>
> On 6 April 2012 00:00, Chris Wilson wrote:
>
>> Hi Peter,
>>
>> Sorry for the long delay in replying, I somehow missed your email.
>>
>> Your email client doesn't seem to know how to quote messages properly,
>> which means that I have to manually edit them when replying to make them
>> readable to everyone else.
>>
>> On Wed, 1 Feb 2012, Peter Hall wrote:
>>
>>>> It might be. But I did notice that housekeeping was running for three
>>>> days without finishing in your log. The block counts are not updated while
>>>> housekeeping is running, only when it finishes. So is it possible that
>>>> housekeeping did actually finish and remove enough files to bring the store
>>>> back under the soft limit, and you didn't notice that the block counts have
>>>> finally been reduced?
>>>
>>> That sounds plausible. I just checked the server and housekeeping is
>>> still running. Same kind of messages in the log as the one i previously
>>> attached. One file is removed every couple of minutes, which could take a
>>> very long while if there are tens or even hundreds of thousands of files to
>>> be deleted. Unfortunately I am now stuck (admittedly self inflicted) with
>>> a full store and unable to upload new backups until housekeeping finishes.
>>>
>>> Would it be possible to change future versions to update the block count
>>> while running housekeeping?
>>
>> I could do that. If it updated all the time it would be very slow, but I
>> could make it update every five minutes or something like that.
>
> That would be very appreciated. I ran housekeeping for several weeks
> without it ever finishing. I finally aborted, ran check and fix and it
> recovered a few hundred megabytes after that.
>
>>> I see very high memory usage as well which I guess is due to boxbackup
>>> having to keep track of deleted files, or something similar?
>>>
>>
>> Yes, probably either deleted files (pending full deletion) or the
>> reference count database. About how many files do you actually have, and
>> how much memory does the housekeeping process use?
>
> I can't say for certain about the number of files; this is a client's
> file server that I have not checked through, but there could be a few
> directories with tens or even hundreds of thousands of files in them.
> According to top, bbstored now uses 65% of available memory. This is the
> output of 'free':
>
> # free -m
>              total       used       free     shared    buffers     cached
> Mem:           436        431          5          0         29          5
> -/+ buffers/cache:        396         40
> Swap:         1961        980        980
>
> A new version that continually
>
deletes objects while running housekeeping would be great; it would
probably help, and if not, it would at least help in debugging the issue
I am having.

Thanks and best regards,
Peter

>> Cheers, Chris.

From maddog at mir.com Mon Apr 9 09:29:24 2012
From: maddog at mir.com (Matto Marjanovic)
Date: Mon, 09 Apr 2012 01:29:24 -0700
Subject: [Box Backup] out-of-the-blue CipherException?
Message-ID: <4F829DE4.9090602@mir.com>

Hiya,

Do the attached error logs look familiar to anyone?

A few days ago, this client started tossing errors into the logfile,
and stopped backing up. Neither the client nor the server have had
*any* code updates/changes for at least a couple of months (not to
boxbackup nor any other code/configuration/libraries).
The log snippet below shows the last connection that had transferred any
data; all subsequent attempts are "uploaded 0", for the last 6 days.

There's nothing exciting in the server log; the server just times out,
after getting bored waiting for the vanished client.

I'm flummoxed,
-m

Apr 3 10:37:22 b05s11le Box Backup (bbackupd)[21506]: NOTICE: Beginning scan of local files
Apr 3 10:37:22 b05s11le Box Backup (bbackupd)[21506]: NOTICE: About to notify administrator about event backup-start, running script '/etc/boxbackup/bbackupd/notifyadmin backup-start'
Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: CipherException(EVPFinalFailure) at CipherContext.cpp(503)
Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: ConnectionException(Conn_TLSReadFailed) at SocketStreamTLS.cpp(377)
Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: WARNING: Suppressing duplicate notification about backup-error
Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: ERROR: Exception caught (Cipher EVPFinalFailure 5/6), reset state and waiting to retry...
Apr 3 10:37:57 b05s11le Box Backup (bbackupd)[21506]: NOTICE: File statistics: total file size uploaded 1136352, bytes already on server 951592, encoded size 94055
Apr 3 10:39:38 b05s11le Box Backup (bbackupd)[21506]: NOTICE: Beginning scan of local files
Apr 3 10:39:39 b05s11le Box Backup (bbackupd)[21506]: NOTICE: About to notify administrator about event backup-start, running script '/etc/boxbackup/bbackupd/notifyadmin backup-start'
Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: CipherException(EVPFinalFailure) at CipherContext.cpp(503)
Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: ConnectionException(Conn_TLSReadFailed) at SocketStreamTLS.cpp(377)
Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: WARNING: Suppressing duplicate notification about backup-error
Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: ERROR: Exception caught (Cipher EVPFinalFailure 5/6), reset state and waiting to retry...
Apr 3 10:40:43 b05s11le Box Backup (bbackupd)[21506]: NOTICE: File statistics: total file size uploaded 0, bytes already on server 0, encoded size 0

From maddog at mir.com Mon Apr 9 16:34:17 2012
From: maddog at mir.com (Matto Marjanovic)
Date: Mon, 09 Apr 2012 08:34:17 -0700
Subject: [Box Backup] out-of-the-blue CipherException?
In-Reply-To: <4F829DE4.9090602@mir.com>
References: <4F829DE4.9090602@mir.com>
Message-ID: <4F830179.1030201@mir.com>

On 04/09/12 01:29, Matto Marjanovic wrote:
> Hiya,
>
> Do the attached error logs look familiar to anyone?
>
> A few days ago, this client started tossing errors into the logfile,
> and stopped backing up.
Neither the client nor the server have had
> *any* code updates/changes for at least a couple of months (not to
> boxbackup nor any other code/configuration/libraries). The log snippet

Sorry I left this out before, but code in question is (debian):

  boxbackup-server 0.11~rc8~r2714-1~bpo50+1
  boxbackup-client 0.11~rc2-7squeeze1

I suppose I can try updating the client (and server). I kinda wanted to
avoid that until I understood what changed on April 3rd.

-m

> below shows the last connection that had transferred any data; all
> subsequent attempts are "uploaded 0", for the last 6 days.
>
> There's nothing exciting in the server log; the server just times out,
> after getting bored waiting for the vanished client.
>
> I'm flummoxed,
> -m
>
> Apr 3 10:37:22 b05s11le Box Backup (bbackupd)[21506]: NOTICE: Beginning scan of local files
> Apr 3 10:37:22 b05s11le Box Backup (bbackupd)[21506]: NOTICE: About to notify administrator about event backup-start, running script '/etc/boxbackup/bbackupd/notifyadmin backup-start'
> Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: CipherException(EVPFinalFailure) at CipherContext.cpp(503)
> Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
> Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
> Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: ConnectionException(Conn_TLSReadFailed) at SocketStreamTLS.cpp(377)
> Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: WARNING: Suppressing duplicate notification about backup-error
> Apr 3 10:37:47 b05s11le Box Backup (bbackupd)[21506]: ERROR: Exception caught (Cipher EVPFinalFailure 5/6), reset state and waiting to retry...
> Apr 3 10:37:57 b05s11le Box Backup (bbackupd)[21506]: NOTICE: File statistics: total file size uploaded 1136352, bytes already on server 951592, encoded size 94055
> Apr 3 10:39:38 b05s11le Box Backup (bbackupd)[21506]: NOTICE: Beginning scan of local files
> Apr 3 10:39:39 b05s11le Box Backup (bbackupd)[21506]: NOTICE: About to notify administrator about event backup-start, running script '/etc/boxbackup/bbackupd/notifyadmin backup-start'
> Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: CipherException(EVPFinalFailure) at CipherContext.cpp(503)
> Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
> Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: ERROR: SSL error during Read: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
> Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: WARNING: Exception thrown: ConnectionException(Conn_TLSReadFailed) at SocketStreamTLS.cpp(377)
> Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: WARNING: Suppressing duplicate notification about backup-error
> Apr 3 10:40:33 b05s11le Box Backup (bbackupd)[21506]: ERROR: Exception caught (Cipher EVPFinalFailure 5/6), reset state and waiting to retry...
> Apr 3 10:40:43 b05s11le Box Backup (bbackupd)[21506]: NOTICE: File statistics: total file size uploaded 0, bytes already on server 0, encoded size 0

From chris at qwirx.com Mon Apr 9 23:09:30 2012
From: chris at qwirx.com (Chris Wilson)
Date: Mon, 9 Apr 2012 23:09:30 +0100 (BST)
Subject: [Box Backup] out-of-the-blue CipherException?
In-Reply-To: <4F829DE4.9090602@mir.com>
References: <4F829DE4.9090602@mir.com>
Message-ID:

Hi Matto,

On Mon, 9 Apr 2012, Matto Marjanovic wrote:

> Do the attached error logs look familiar to anyone?
>
> A few days ago, this client started tossing errors into the logfile, and
> stopped backing up. Neither the client nor the server have had *any*
> code updates/changes for at least a couple of months (not to boxbackup
> nor any other code/configuration/libraries). The log snippet below
> shows the last connection that had transferred any data; all subsequent
> attempts are "uploaded 0", for the last 6 days.
>
> There's nothing exciting in the server log; the server just times out,
> after getting bored waiting for the vanished client.

There was a discussion about a similar error in the mailing list
archives from 2007, where it appears that the cause (in that case) might
have been a corrupted filename entry in the root directory, which
couldn't be decrypted.

To test this, could you try three things:

* run bbackupquery on the client, and use the "ls" command to list
  files in the root directory, and if it fails, report the full output;

* run "bbstoreaccounts check fix" on the account on the store server;

* run bbackupd on the client from the command line with the -DV option,
  which will hopefully produce a full stack trace when it fails, which
  would be very helpful to know for investigating further.

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From maddog at mir.com Tue Apr 10 05:25:37 2012
From: maddog at mir.com (Matto Marjanovic)
Date: Mon, 09 Apr 2012 21:25:37 -0700
Subject: [Box Backup] out-of-the-blue CipherException?
In-Reply-To:
References: <4F829DE4.9090602@mir.com>
Message-ID: <4F83B641.9000106@mir.com>

On 04/09/12 15:09, Chris Wilson wrote:
> Hi Matto,
>
> On Mon, 9 Apr 2012, Matto Marjanovic wrote:
>
>> Do the attached error logs look familiar to anyone?
...
> There was a discussion about a similar error in the mailing list
> archives from 2007, where it appears that the cause (in that case)
...
> To test this, could you try three things:
>
> * run bbackupquery on the client, and use the "ls" command to list
>   files in the root directory, and if it fails, report the full
>   output;

No problems with "ls" at the root, nor with "ls xxx" for every xxx in
the root directory.

> * run "bbstoreaccounts check fix" on the account on the store
>   server;

I ran it without the 'fix', so as to not inadvertently 'fix' a problem
that I/we do not yet understand. The output is basically this:

WARNING: Spurious file backup/00000002/refcount.db found
WARNING: File ID 0x18f8d has different container ID, probably moved
...[same warning with various File ID's, repeated around 3800 times]...
WARNING: File ID 0x68926 has different container ID, probably moved
WARNING: Finished checking store account ID 0x00000002: 1 errors found
WARNING: No changes to the store account have been made.
WARNING: Run again with fix option to fix these errors
INFO: Checking store account ID 0x00000002...
INFO: Phase 1, check objects...
TRACE: Max dir starting ID is 0x6e400
INFO: Phase 2, check directories...
INFO: Phase 3, check root...
INFO: Phase 4, fix unattached objects...
INFO: Phase 5, fix unrecovered inconsistencies...
INFO: Phase 6, regenerate store info...

I have no idea what the "1 errors found" refers to....

> * run bbackupd on the client from the command line with the -DV
>   option, which will hopefully produce a full stack trace when it
>   fails, which would be very helpful to know for investigating
>   further.

Here it is (two of them; the second in a destructor, so it looks like
secondary damage):

...
TRACE: Upload decision: /[REDACTED]/8.: will not upload (not modified sinWARNING: Exception thrown: CipherException(EVPFinalFailure) at CipherContext.cpp(467)
ERROR: SSL error while reading: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
ERROR: SSL error while reading: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
WARNING: Exception thrown: ConnectionException(Conn_TLSReadFailed) at SocketStreamTLS.cpp(339)
ERROR: Exception caught (Cipher EVPFinalFailure 5/6), reset state and waiting to retry...
ce last upload)
TRACE: Upload decision: /[REDACTED]/180.: will not upload (not modified since last upload)
TRACE: Upload decision: /[REDACTED]/172.: will not upload (not modified since last upload)
TRACE: Upload decision: /[REDACTED]/83.: will not upload (not modified since last upload)
TRACE: Receiving stream, size 345041 bytes
TRACE: Obtained 10 stack frames.
TRACE: Stack frame 0: bbackupd(DumpStackBacktrace()+0x22) [0x8119b92]
TRACE: Stack frame 1: bbackupd(CipherContext::TransformBlock(void*, int, void const*, int)+0x6e8) [0x80eeae8]
TRACE: Stack frame 2: bbackupd(BackupStoreFilenameClear::DecryptEncoded(CipherContext&) const+0x62) [0x80b20d2]
TRACE: Stack frame 3: bbackupd(BackupStoreFilenameClear::MakeClearAvailable() const+0x1a8) [0x80b2298]
TRACE: Stack frame 4: bbackupd(BackupStoreFilenameClear::GetClearFilename() const+0x12) [0x80b2492]
TRACE: Stack frame 5: bbackupd(BackupClientDirectoryRecord::UpdateItems(BackupClientDirectoryRecord::SyncParams&, std::string const&, std::string const&, BackupStoreDirectory*, std::vector >&, std::vector >&, std::vector > const&)+0xc6) [0x8071d66]
TRACE: Stack frame 6: bbackupd(BackupClientDirectoryRecord::SyncDirectory(BackupClientDirectoryRecord::SyncParams&, long long, std::string const&, std::string const&, bool)+0xec6) [0x8076c06]
TRACE: Stack frame 7: bbackupd(BackupClientDirectoryRecord::UpdateItems(BackupClientDirectoryRecord::SyncParams&, std::string const&,
std::string const&, BackupStoreDirectory*, std::vector >&, std::vector >&, std::vector > const&)+0x20ac) [0x8073d4c]
TRACE: Stack frame 8: bbackupd(BackupClientDirectoryRecord::SyncDirectory(BackupClientDirectoryRecord::SyncParams&, long long, std::string const&, std::string const&, bool)+0xec6) [0x8076c06]
TRACE: Stack frame 9: bbackupd(BackupClientDirectoryRecord::UpdateItems(BackupClientDirectoryRecord::SyncParams&, std::string const&, std::string const&, BackupStoreDirectory*, std::vector >&, std::vector >&, std::vector > const&)+0x20ac) [0x8073d4c]
TRACE: Obtained 10 stack frames.
TRACE: Stack frame 0: bbackupd(DumpStackBacktrace()+0x22) [0x8119b92]
TRACE: Stack frame 1: bbackupd(SocketStreamTLS::Read(void*, int, int)+0xac) [0x80e757c]
TRACE: Stack frame 2: bbackupd(IOStream::ReadFullBuffer(void*, int, int*, int)+0x50) [0x81062c0]
TRACE: Stack frame 3: bbackupd(Protocol::CheckAndReadHdr(void*)+0x1af) [0x80d759f]
TRACE: Stack frame 4: bbackupd(Protocol::Receive()+0x1e) [0x80d786e]
TRACE: Stack frame 5: bbackupd(BackupProtocolClient::Receive()+0x1d) [0x80b72fd]
TRACE: Stack frame 6: bbackupd(BackupProtocolClient::Query(BackupProtocolClientSetClientStoreMarker const&)+0x30) [0x80be860]
TRACE: Stack frame 7: bbackupd(BackupClientContext::CloseAnyOpenConnection()+0x12b) [0x8069bdb]
TRACE: Stack frame 8: bbackupd(BackupClientContext::~BackupClientContext()+0x19) [0x8069f39]
TRACE: Stack frame 9: bbackupd(BackupDaemon::RunSyncNow()+0x1271) [0x8083c21]
TRACE: timer: no more events, going to sleep.
TRACE: BackupDaemon::NotifySysadmin() called, event = backup-error
INFO: About to notify administrator about event backup-error, running script '/etc/boxbackup/bbackupd/notifyadmin backup-error "/etc/boxbackup/bbackupd.conf"'
NOTICE: Finished scan of local files
NOTICE: File statistics: total file size uploaded 2385022, bytes already on server 2275880, encoded size 89408
TRACE: BackupDaemon::NotifySysadmin() called, event = backup-finish
INFO: About to notify administrator about event backup-finish, running script '/etc/boxbackup/bbackupd/notifyadmin backup-finish "/etc/boxbackup/bbackupd.conf"'

(Hmm... "BackupStoreFilenameClear::DecryptEncoded(CipherContext&)" sounds
kind of like what you were talking about.)

Any suggestions on where to probe from here? (I guess I should try and
dig up the thread from 2007....)

-m

From dave at bdisystems.co.uk Tue Apr 10 21:50:36 2012
From: dave at bdisystems.co.uk (dave bamford)
Date: Tue, 10 Apr 2012 21:50:36 +0100
Subject: [Box Backup] No housekeeping?
In-Reply-To: <1327339595.6225.44.camel@crusty.backed-up.net>
References: <1327339595.6225.44.camel@crusty.backed-up.net>
Message-ID: <1334091036.3448.312.camel@crusty.backed-up.net>

Still getting a problem with the housekeeping process dying and having
to restart it manually.

here is the extract from running in verbose mode from the log when it
falls over

> Mar 29 11:02:29 bart bbstored/hk[29681]: WARNING: Exception thrown: RaidFileException(OSError) at RaidFileWrite.cpp(385)
> Mar 29 11:02:29 bart bbstored/hk[29681]: ERROR: Failed to delete file: /backups/box/backup/00001004/fe/05/oe6.rfwX: No such file or directory (2)
> Mar 29 11:02:29 bart bbstored/hk[29681]: WARNING: Exception thrown: RaidFileException(OSError) at RaidFileWrite.cpp(440)

I now have to run a check fix on that account and restart box to get
housekeeping back.
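
For readers following the log above: the failure is reported while deleting a file that no longer exists (error 2, "No such file or directory"). One general way to make a cleanup pass tolerant of that case is to treat "already gone" as success rather than as a fatal error. The Python sketch below only illustrates that idea; `delete_if_present` is a hypothetical helper, and nothing here describes how bbstored actually handles the exception.

```python
import os


def delete_if_present(path: str) -> bool:
    """Delete path; return True if it was removed, False if it was already gone."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # errno 2 (ENOENT): nothing left to delete, so carry on instead of aborting.
        return False
```

Other errors, such as permission problems, still propagate, so a genuinely broken store is not silently ignored.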

Regards
Dave Bamford

On Mon, 2012-01-23 at 17:26 +0000, dave bamford wrote:
> Hi Peter
>
> I had a similar problem and it turned out Housekeeping was aborting on a
> corrupt account and never restarting until I restarted bbstored.
> Try starting bbstored -v and check the logs
>
> Regards
>
> Dave Bamford
>
> On Mon, 2012-01-23 at 17:43 +0100, Peter Hall wrote:
> > Hi fellow boxbackup users,
> >
> > My store is full, and it seems housekeeping isn't running!
> >
> > # bbstoreaccounts info 1
> > Account ID: 0x00000001
> > Last object ID: 0x1106496
> > Used:          243199923 blocks, 927.73 GB,  99% |*************** |
> > Old files:       3849882 blocks,  14.69 GB,   1% |                |
> > Deleted files: 127698533 blocks, 487.13 GB,  52% |********        |
> > Directories:      451091 blocks,   1.72 GB,   0% |                |
> > Soft limit:    230400000 blocks, 878.91 GB,  94% |*************** |
> > Hard limit:    243200000 blocks, 927.73 GB, 100% |****************|
> > Client store marker: 17851542
> >
> > Plenty of deleted files to remove and make space for new ones, but
> > it's been like this for days now.
> >
> > I've tried restarting the server, and drastically lowering the
> > housekeeping limit:
> > # grep -i keep /etc/boxbackup/bbstored.conf
> > TimeBetweenHousekeeping = 120
> >
> > But still no cleaning of the old files.
> >
> > All the references to housekeeping I can find in the logs are:
> > client=0x00000001[29150]: WARNING: Reference count database is missing
> > or corrupted, creating a new one, expect housekeeping to find and fix
> > problems with reference counts later.
> >
> > I have run a fix of the account with "bbstoreaccounts check 1 fix". It
> > found and fixed three errors on the first run, subsequent runs find no
> > errors.
> >
> > bbstored seems to be doing something though, it takes around 45%
> > memory and 50% cpu if I consult 'top'.
> >
> > Is it normal that it can take several days before I see any
> > housekeeping progress on a store of this size?
> >
> > Thanks in advance,
> > Peter

From chris at qwirx.com Wed Apr 11 23:08:04 2012
From: chris at qwirx.com (Chris Wilson)
Date: Wed, 11 Apr 2012 23:08:04 +0100 (BST)
Subject: [Box Backup] No housekeeping?
In-Reply-To: <1334091036.3448.312.camel@crusty.backed-up.net>
References: <1327339595.6225.44.camel@crusty.backed-up.net> <1334091036.3448.312.camel@crusty.backed-up.net>
Message-ID:

Hi Dave,

On Tue, 10 Apr 2012, dave bamford wrote:

> Still getting a problem with the housekeeping process dying and having
> to restart it manually.
>
> here is the extract from running in verbose mode from the log when it
> falls over
>
>> Mar 29 11:02:29 bart bbstored/hk[29681]: WARNING: Exception thrown: RaidFileException(OSError) at RaidFileWrite.cpp(385)
>> Mar 29 11:02:29 bart bbstored/hk[29681]: ERROR: Failed to delete file: /backups/box/backup/00001004/fe/05/oe6.rfwX: No such file or directory (2)
>> Mar 29 11:02:29 bart bbstored/hk[29681]: WARNING: Exception thrown: RaidFileException(OSError) at RaidFileWrite.cpp(440)
>
> I now have to run a check fix on that account and restart box to get
> housekeeping back.

Sorry about that. I'll try to fix it. Which version of Box Backup are
you running on the store server? For a long time now, it should have
been impossible for this exception to kill the bbstored housekeeping
process.

Also, could you try running bbstored with the -V option until
housekeeping dies, and send me the stack traces that it outputs from
this exception?

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From chris at qwirx.com Thu Apr 12 00:05:14 2012
From: chris at qwirx.com (Chris Wilson)
Date: Thu, 12 Apr 2012 00:05:14 +0100 (BST)
Subject: [Box Backup] out-of-the-blue CipherException?
In-Reply-To: <4F83B641.9000106@mir.com>
References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com>
Message-ID:

Hi Matto,

On Mon, 9 Apr 2012, Matto Marjanovic wrote:

> I ran it without the 'fix', so as to not inadvertantly 'fix' a problem
> that I/we do not yet understand.

Sorry, yes, good idea.

> The output is basically this:
>
> WARNING: Spurious file backup/00000002/refcount.db found
[...]
> I have no idea what the "1 errors found" refers to....

I think it's the one above. I hope that I've fixed that one already in
newer versions of bbstored.

> Here it is (two of them; the second in a destructor, so it looks like
> secondary damage):

Agreed.

> Any suggestions on where to probe from here? (I guess I should try and
> dig up the thread from 2007....)

I don't think the thread from 2007 will help you. In that case, the
corrupt filename was at the root level. In your case it's three levels
down.

I have added some extra debugging code to the trunk, which will
hopefully help to identify the faulty filename and its container. Please
could you try downloading and building the latest code from trunk, and
run a test backup with it, using the -DV option, to see if it does find
anything useful?

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

From maddog at mir.com Thu Apr 12 04:18:27 2012
From: maddog at mir.com (Matto Marjanovic)
Date: Wed, 11 Apr 2012 20:18:27 -0700
Subject: [Box Backup] out-of-the-blue CipherException?
In-Reply-To:
References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com>
Message-ID: <4F864983.7050309@mir.com>

On 04/11/12 16:05, Chris Wilson wrote:
> Hi Matto,
>
> On Mon, 9 Apr 2012, Matto Marjanovic wrote:
>
>> I ran it without the 'fix', so as to not inadvertantly 'fix' a
>> problem that I/we do not yet understand.
>
> Sorry, yes, good idea.
>
>> The output is basically this:
>>
>> WARNING: Spurious file backup/00000002/refcount.db found
> [...]
>> I have no idea what the "1 errors found" refers to....
>
> I think it's the one above. I hope that I've fixed that one already
> in newer versions of bbstored.

...and last night, I ran "check fix", figuring I had nothing to lose.
It did its business, but the Cipher exception remained. The "check fix"
seems to have removed the 'spurious' refcount.db file, which caused a
complaint when bbstored was started up again --- but it sounds like
this may be old news to you. (I must admit I have not looked through
the trunk changelogs recently.)

...
>> Any suggestions on where to probe from here? (I guess I should try
>> and dig up the thread from 2007....)
>
> I don't think the thread from 2007 will help you. In that case, the
> corrupt filename was at the root level. In your case it's three
> levels down.
>
> I have added some extra debugging code to the trunk, which will
> hopefully help to identify the faulty filename and its container.
> Please could you try downloading and building the latest code from
> trunk, and run a test backup with it, using the -DV option, to see if
> it does find anything useful?

I'll try that out -- it may take a couple of days to report back.

In the meantime, could you perhaps elaborate on the nature of the
problem a bit more? I got as far into the code as figuring out that
some sequence of encrypted filename records are sent from the server,
and then decrypted individually, and the decryption is exploding down in
openssl.
I had not yet figured out how the record stream is sent, nor if the problem is truly in the client or if it could be bad data spit out by the server. -m From chris at qwirx.com Thu Apr 12 09:12:08 2012 From: chris at qwirx.com (Chris Wilson) Date: Thu, 12 Apr 2012 09:12:08 +0100 (BST) Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: <4F864983.7050309@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> Message-ID: Hi Matto, On Wed, 11 Apr 2012, Matto Marjanovic wrote: > ...and last night, I ran "check fix", figuring I had nothing to lose. > It did its business, but the Cipher exception remained. The "check fix" > seems to have removed the 'spurious' refcount.db file, which caused a > complaint when bbstored was started up again --- but it sounds like > this may be old news to you. (I must admit I have not looked through > the trunk changelogs recently.) Yeah, it's not supposed to be removed, bbstored will just recreate it next time it runs. > I'll try that out -- it may take a couple of days to report back. Thanks! > In the meantime, could you perhaps elaborate on the nature of the > problem a bit more? I got as far into the code as figuring out that > some sequence of encrypted filename records are sent from the server, > and then decrypted individually, and the decryption is exploding down in > openssl. I had not yet figured out how the record stream is sent, nor > if the problem is truly in the client or if it could be bad data spit > out by the server. The server has a filename stored on its disk, in one of the directory files, which the client can't decrypt. The client needs to decrypt all filenames in a directory on the server to know whether it needs to create a new one for a local file that didn't exist before, update an existing one keeping history, or delete one that no longer exists on the client. 
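[Editor's sketch, not part of the original message.] The three-way decision described above can be illustrated roughly as follows. This is a minimal sketch in Python, not Box Backup's actual C++ logic; the function name `plan_sync` and the result keys are hypothetical, and the sample filenames are only illustrative. It assumes every server-side name has already been decrypted successfully, which is exactly the step that fails when one stored name is corrupt.

```python
def plan_sync(local_names, server_names):
    """Classify filenames by comparing the local directory listing with the
    (already decrypted) listing held on the server.

    Hypothetical helper for illustration only.
    """
    local = set(local_names)
    server = set(server_names)
    return {
        # Present locally, unknown to the server: a new entry must be created.
        "create": sorted(local - server),
        # Present on both sides: candidate for an update that keeps history.
        "existing": sorted(local & server),
        # Known to the server but gone locally: mark as deleted.
        "delete": sorted(server - local),
    }


# Illustrative listings; "9999." stands in for a file removed on the client.
plan = plan_sync(["1945.", "3294.", "7203."], ["1945.", "9999."])
print(plan)
```

If even one server-side name cannot be decrypted, the client cannot tell which of the three buckets that entry belongs in, which is why the whole directory comparison stalls.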
The corruption might originally have happened on either the server or the client, probably while the directory was in memory a neutrino struck a RAM chip and flipped one of the bits, and the corrupted name was written back out to disk on the server. Cheers, Chris. -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From maddog at mir.com Sat Apr 14 23:14:35 2012 From: maddog at mir.com (Matto Marjanovic) Date: Sat, 14 Apr 2012 15:14:35 -0700 Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> Message-ID: <4F89F6CB.2050908@mir.com> On 04/12/12 01:12, Chris Wilson wrote: > On Wed, 11 Apr 2012, Matto Marjanovic wrote: ... > I have added some extra debugging code to the trunk, which will > hopefully help to identify the faulty filename and its > container. Please could you try downloading and building the latest > code from trunk, and run a test backup with it, using the -DV > option, to see if it does find anything useful? ... Done, and yes, it helped to identify the subdirectory, though not so much the particular file. The directory is in a cyrus mailspool, many levels deep, with hundreds of files. Log output is appended at the end. I added some additional output of my own: the binary block which failed to be decrypted, and the preceding block which was successfully decrypted (in case that would help predicting what comes next). Anyhow: armed with this information, what now? Also: where do I find the blowfish key? I figure I should be able to run those little 8-byte blocks through openssl and see how they fail or succeed, right? >> In the meantime, could you perhaps elaborate on the nature of the >> problem a bit more? I got as far into the code as figuring out that ... 
> The server has a filename stored on its disk, in one of the directory > files, which the client can't decrypt. The client needs to decrypt > all filenames in a directory on the server to know whether it needs > to create a new one for a local file that didn't exist before, update > an existing one keeping history, or delete one that no longer exists > on the client. > > The corruption might originally have happened on either the server or > the client, probably while the directory was in memory a neutrino > struck a RAM chip and flipped one of the bits, and the corrupted name > was written back out to disk on the server. Sure, blame the hapless (and almost massless) neutrinos. Even if not due to a bug (and I'm not convinced it's the act of elementary particles just yet), the failure mode is pretty heinous. There's got to be a reasonable way to recover from such an error. Heh... in fact: if it is indeed a one-bit cosmic-ray induced error, I should be able to flip the 64 bits in the binary block one at a time until I get something that decodes successfully! I'll try that out (once I get a pointer on how/where to extract the key). -m ps: The tail of the log output: TRACE: Obtained 10 stack frames. 
TRACE: Stack frame 0: DumpStackBacktrace()+0x22
TRACE: Stack frame 1: CipherContext::TransformBlock(void*, int, void const*, int)+0x9bb
TRACE: Stack frame 2: BackupStoreFilenameClear::DecryptEncoded(CipherContext&) const+0x7c
TRACE: Stack frame 3: BackupStoreFilenameClear::MakeClearAvailable() const+0x1d0
TRACE: Stack frame 4: BackupStoreFilenameClear::GetClearFilename() const+0x12
TRACE: Stack frame 5: BackupClientDirectoryRecord::DecryptFilename(BackupStoreFilenameClear, long long, std::string const&)+0x2c
TRACE: Stack frame 6: BackupClientDirectoryRecord::DecryptFilename(BackupStoreDirectory::Entry*, std::string const&)+0x67
TRACE: Stack frame 7: BackupClientDirectoryRecord::UpdateItems(BackupClientDirectoryRecord::SyncParams&, std::string const&, std::string const&, Location const&, BackupStoreDirectory*, std::vector >&, std::vector >&, std::vector > const&)+0xc3
TRACE: Stack frame 8: BackupClientDirectoryRecord::SyncDirectory(BackupClientDirectoryRecord::SyncParams&, long long, std::string const&, std::string const&, Location const&, bool)+0x1933
TRACE: Stack frame 9: BackupClientDirectoryRecord::UpdateItems(BackupClientDirectoryRecord::SyncParams&, std::string const&, std::string const&, Location const&, BackupStoreDirectory*, std::vector >&, std::vector >&, std::vector > const&)+0x2077
WARNING: Exception thrown: CipherException(EVPFinalFailure) at CipherContext.cpp(467)
ERROR: TransformBlock failed on rEncoded: (size: 10) HEAD: 2a 0| BODY: ed 44 f6 5d db 84 2d cc
ERROR: TransformBlock lastEncoded: (size: 10) HEAD: 2a 0| BODY: 68 ee 47 6a fb 11 9f cb
ERROR: TransformBlock lastDecoded '1945.'
ERROR: Failed to decrypt filename for object 0x3f33 in directory 0x3438 (/local-a7/cyrus-spool/mail/domain/[REDACTED]) ERROR: SSL error while reading: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt ERROR: SSL error while reading: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt From chris at qwirx.com Sun Apr 15 14:13:56 2012 From: chris at qwirx.com (Chris Wilson) Date: Sun, 15 Apr 2012 16:13:56 +0300 (EAT) Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: <4F89F6CB.2050908@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> Message-ID: Hi Matto, On Sat, 14 Apr 2012, Matto Marjanovic wrote: > Done, and yes, it helped to identify the subdirectory, though not so > much the particular file. The directory is in a cyrus mailspool, many > levels deep, with hundreds of files. Log output is appended at the end. Yeah, there's not a lot we can do to identify the filename. I've changed the code to treat the corrupted file as nonexistent, and if you run a bbackupd with -DV, then you'll see all the files it decides to upload in that directory with "Upload decision: will upload: not on server". One of those files *was* on the server, but we can't tell which one because the filename is corrupt. > I added some additional output of my own: the binary block which failed > to be decrypted, and the preceding block which was successfully > decrypted (in case that would help predicting what comes next). > > Anyhow: armed with this information, what now? > > Also: where do I find the blowfish key? I figure I should be able to > run those little 8-byte blocks through openssl and see how they fail or > succeed, right? lib/backupclient/BackupClientCryptoKeys.cpp calls BackupStoreFilenameClear::SetBlowfishKey with part of the KeysFile data. The filename encryption key is the first 56 bytes of the KeysFile, and the IV is the next 8 bytes. 
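[Editor's sketch, not part of the original message.] Assuming the layout just described (filename key in the first 56 bytes of the KeysFile, IV in the following 8 bytes), the key material can be split out as below, and the 64 single-bit variants of the failing 8-byte block from the log can be enumerated for the experiment Matto proposed. The function names, and the bit numbering (bit 0 is the least significant bit of the first byte), are assumptions made for illustration. The actual Blowfish decryption step is deliberately left out, since it needs a crypto library or the `openssl` tool.

```python
KEY_LEN = 56  # filename encryption key: first 56 bytes of the KeysFile
IV_LEN = 8    # IV: the next 8 bytes


def split_filename_key(keys_data: bytes):
    """Return (key, iv) taken from the start of the raw KeysFile contents."""
    if len(keys_data) < KEY_LEN + IV_LEN:
        raise ValueError("keys file too short to hold filename key and IV")
    return keys_data[:KEY_LEN], keys_data[KEY_LEN:KEY_LEN + IV_LEN]


def single_bit_flips(block: bytes):
    """Yield (bit_index, candidate) for every one-bit variation of block.

    Bit numbering here is an arbitrary choice for this sketch.
    """
    for bit in range(len(block) * 8):
        candidate = bytearray(block)
        candidate[bit // 8] ^= 1 << (bit % 8)
        yield bit, bytes(candidate)


# The BODY bytes of the block that failed to decrypt, copied from the log.
failed = bytes.fromhex("ed44f65ddb842dcc")
candidates = list(single_bit_flips(failed))
print(len(candidates), "candidate blocks to try decrypting")
```

Each candidate would then be decrypted with the extracted key and IV, keeping only results that both decrypt without a padding error and look like a plausible filename.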
> Sure, blame the hapless (and almost massless) neutrinos. Looks like I should have said "cosmic rays" instead :) "research[1] has shown that the majority of one-off ("soft") errors in DRAM chips occur as a result of background radiation, chiefly neutrons from cosmic ray secondaries, which may change the contents of one or more memory cells or interfere with the circuitry used to read/write them." "Most primary cosmic rays (those that enter the atmosphere from deep space) are composed of familiar stable subatomic particles that normally occur on Earth, such as protons, atomic nuclei, or electrons." > Even if not due to a bug (and I'm not convinced it's the act of > elementary particles just yet), the failure mode is pretty heinous. > There's got to be a reasonable way to recover from such an error. Sure, there are a practically infinite number of possible errors that have reasonable recovery strategies but which have never been encountered or predicted and therefore never coded around. If we tried to write software so that every possible error condition was handled in the optimum way, we would either never finish anything useful, or cost $10K per line like the code NASA writes for its shuttles. I'm grateful that Ben's code was kind enough to detect this error condition and throw an exception that prevented anything really bad from happening, and helped us to narrow down the cause, rather than silently corrupting data. > Heh... in fact: if it is indeed a one-bit cosmic-ray induced error, I > should be able to flip the 64 bits in the binary block one at a time > until I get something that decodes successfully! I'll try that out > (once I get a pointer on how/where to extract the key). Yes, you can try that. However be aware that: "Although the decryption operation can produce an error if padding is enabled, it is not a strong test that the input data or key is correct. 
A random block has better than 1 in 256 chance of being of the correct format and problems with the input data earlier on will not produce a final decrypt error." -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From maddog at mir.com Mon Apr 16 05:10:21 2012 From: maddog at mir.com (Matto Marjanovic) Date: Sun, 15 Apr 2012 21:10:21 -0700 Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> Message-ID: <4F8B9BAD.5010203@mir.com> On 04/15/12 06:13, Chris Wilson wrote: > Hi Matto, > > On Sat, 14 Apr 2012, Matto Marjanovic wrote: > >> Done, and yes, it helped to identify the subdirectory, though not >> so much the particular file. The directory is in a cyrus mailspool, >> many levels deep, with hundreds of files. Log output is appended at >> the end. > > Yeah, there's not a lot we can do to identify the filename. I've > changed the code to treat the corrupted file as nonexistent, and if ... Awesome. I will give the latest trunk a go. ... >> Sure, blame the hapless (and almost massless) neutrinos. > > Looks like I should have said "cosmic rays" instead :) > > "research[1] has shown that the majority of one-off ("soft") errors > in DRAM chips occur as a result of background radiation, chiefly > neutrons from cosmic ray secondaries, which may change the contents ... (Heh, I'm really surprised that it's neutrons and not protons....) ... >> Even if not due to a bug (and I'm not convinced it's the act of >> elementary particles just yet), the failure mode is pretty heinous. >> There's got to be a reasonable way to recover from such an error. 
> > Sure, there are a practically infinite number of possible errors that > have reasonable recovery strategies but which have never been > encountered or predicted and therefore never coded around. If we ... B-b-but, this is the one error that hit *me*! > I'm grateful that Ben's code was kind enough to detect this error > condition and throw an exception that prevented anything really bad > from happening, and helped us to narrow down the cause, rather than > silently corrupting data. I am grateful for that as well. By "pretty heinous", I was thinking that a single-bit error in many gigabytes of data managed to take down the entire backup operation for days. However, the fix you have added does the trick of recovering from this situation --- and that's what I was hinting at. >> Heh... in fact: if it is indeed a one-bit cosmic-ray induced error, >> I should be able to flip the 64 bits in the binary block one at a >> time until I get something that decodes successfully! I'll try that >> out (once I get a pointer on how/where to extract the key). > > Yes, you can try that. However be aware that: > > "Although the decryption operation can produce an error if padding is > enabled, it is not a strong test that the input data or key is > correct. A random block has better than 1 in 256 chance of being of > the correct format and problems with the input data earlier on will > not produce a final decrypt error." ...but the cleartext is not random at all: I know that it should look something like "NNNN.". And, lo and behold, it *was* a one-bit error! Bit 6 to be exact, and the original file name was "3294.". -m From maddog at mir.com Mon Apr 16 21:40:09 2012 From: maddog at mir.com (Matto Marjanovic) Date: Mon, 16 Apr 2012 13:40:09 -0700 Subject: [Box Backup] out-of-the-blue CipherException? 
In-Reply-To: <4F8B9BAD.5010203@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> Message-ID: <4F8C83A9.6010104@mir.com> On 04/15/12 21:10, Matto Marjanovic wrote: > On 04/15/12 06:13, Chris Wilson wrote: >> Hi Matto, >> >> On Sat, 14 Apr 2012, Matto Marjanovic wrote: >> >>> Done, and yes, it helped to identify the subdirectory, though not >>> so much the particular file. The directory is in a cyrus mailspool, >>> many levels deep, with hundreds of files. Log output is appended at >>> the end. >> >> Yeah, there's not a lot we can do to identify the filename. I've >> changed the code to treat the corrupted file as nonexistent, and if > ... > > Awesome. I will give the latest trunk a go. Hmm... no go: ... TRACE: Upload decision: /local-a7/[REDACTED]/7203.: will upload (not on server) TRACE: Read 4096 bytes at 4096, 4182 remain, eta 0s TRACE: Read 4182 bytes at 8278, 0 remain, eta 0s TRACE: Sending header byte 153 plus 4608 bytes to stream TRACE: Sent 4609 bytes to stream TRACE: Sending header byte 33 plus 33 bytes to stream TRACE: Sent 34 bytes to stream TRACE: Sending end of stream byte TRACE: Sent end of stream byte ERROR: SSL error while reading: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt ERROR: SSL error while reading: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt TRACE: Obtained 10 stack frames. 
TRACE: Stack frame 0: DumpStackBacktrace()+0x22 TRACE: Stack frame 1: SocketStreamTLS::Read(void*, int, int)+0xbd TRACE: Stack frame 2: IOStream::ReadFullBuffer(void*, int, int*, int)+0x50 TRACE: Stack frame 3: Protocol::CheckAndReadHdr(void*)+0x2df TRACE: Stack frame 4: Protocol::ReceiveInternal()+0x1e TRACE: Stack frame 5: BackupProtocolClient::Receive()+0x20 TRACE: Stack frame 6: BackupProtocolClient::Query(BackupProtocolStoreFile const&, IOStream&)+0x3f TRACE: Stack frame 7: BackupClientDirectoryRecord::UploadFile(BackupClientDirectoryRecord::SyncParams&, std::string const&, std::string const&, BackupStoreFilename const&, long long, long long, long long, bool)+0x1dd TRACE: Stack frame 8: BackupClientDirectoryRecord::UpdateItems(BackupClientDirectoryRecord::SyncParams&, std::string const&, std::string const&, Location const&, BackupStoreDirectory*, std::vector >&, std::vector >&, std::vector > const&)+0x1220 TRACE: Stack frame 9: BackupClientDirectoryRecord::SyncDirectory(BackupClientDirectoryRecord::SyncParams&, long long, std::string const&, std::string const&, Location const&, bool)+0x1933 WARNING: Exception thrown: ConnectionException(Conn_TLSReadFailed) at SocketStreamTLS.cpp(339) ERROR: Failed to upload file: /local-a7/[REDACTED]/7203.: caught exception: Connection TLSReadFailed (Probably a network issue between client and server, or a problem with the server.) (7/34) ... This is the first "will upload" decision made after the undecryptable-filename entry was skipped over (by your latest trunk changes). Another spurious-bit-error stored on the server? Dirty state left in libssl? -m From chris at qwirx.com Tue Apr 17 06:36:00 2012 From: chris at qwirx.com (Chris Wilson) Date: Tue, 17 Apr 2012 08:36:00 +0300 (EAT) Subject: [Box Backup] out-of-the-blue CipherException? 
In-Reply-To: <4F8C83A9.6010104@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> Message-ID: Hi Matto, On Mon, 16 Apr 2012, Matto Marjanovic wrote: > TRACE: Upload decision: /local-a7/[REDACTED]/7203.: will upload (not on > server) > TRACE: Read 4096 bytes at 4096, 4182 remain, eta 0s > TRACE: Read 4182 bytes at 8278, 0 remain, eta 0s > TRACE: Sending header byte 153 plus 4608 bytes to stream > TRACE: Sent 4609 bytes to stream > TRACE: Sending header byte 33 plus 33 bytes to stream > TRACE: Sent 34 bytes to stream > TRACE: Sending end of stream byte > TRACE: Sent end of stream byte > ERROR: SSL error while reading: error:06065064:digital envelope > routines:EVP_DecryptFinal_ex:bad decrypt > ERROR: SSL error while reading: error:06065064:digital envelope > routines:EVP_DecryptFinal_ex:bad decrypt > TRACE: Obtained 10 stack frames. > TRACE: Stack frame 0: DumpStackBacktrace()+0x22 > TRACE: Stack frame 1: SocketStreamTLS::Read(void*, int, int)+0xbd > TRACE: Stack frame 2: IOStream::ReadFullBuffer(void*, int, int*, int)+0x50 > TRACE: Stack frame 3: Protocol::CheckAndReadHdr(void*)+0x2df > TRACE: Stack frame 4: Protocol::ReceiveInternal()+0x1e > TRACE: Stack frame 5: BackupProtocolClient::Receive()+0x20 > TRACE: Stack frame 6: BackupProtocolClient::Query(BackupProtocolStoreFile > const&, IOStream&)+0x3f > TRACE: Stack frame 7: > BackupClientDirectoryRecord::UploadFile(BackupClientDirectoryRecord::SyncParams&, > std::string const&, std::string const&, BackupStoreFilename const&, long > long, long long, long long, bool)+0x1dd This means that the server sent bad data to the client, which it couldn't decrypt. It's not a corrupt filename this time, it's in the SSL stream of protocol communications. It's possible that the server threw an exception in the middle of writing data to the client, in which case you should find it in the server logs.
If not, how confident are you about the server and its RAM? Would you run a memory test on it? > Dirty state left in libssl? That's also possible, I'll investigate. Cheers, Chris. -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From chris at qwirx.com Wed Apr 18 07:09:30 2012 From: chris at qwirx.com (Chris Wilson) Date: Wed, 18 Apr 2012 09:09:30 +0300 (EAT) Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> Message-ID: Hi Matto, On Tue, 17 Apr 2012, Chris Wilson wrote: > This means that the server sent bad data to the client, which it > couldn't decrypt. It's not a corrupt filename this time, it's in the SSL > stream of protocol communications. > > It's possible that the server threw an exception in the middle of > writing data to the client, in which case you should find it in the > server logs. > > If not, how confident are you about the server and its RAM? Would you > run a memory test on it? > >> Dirty state left in libssl? > > That's also possible, I'll investigate. Having thought about it some more, I think it's an incredible coincidence that the protocol stream would get corrupted by a random error just after a failed decrypt, so my working hypothesis is dirty state in libssl. It will take me some time to write up proper tests for this. Perhaps you can exclude that directory from your backup in the meantime, to get the backups going again? Or delete the corrupted object from the directory using bbackupquery? Cheers, Chris.
-- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From dave at logical-progress.com Mon Apr 23 09:04:54 2012 From: dave at logical-progress.com (dave bamford) Date: Mon, 23 Apr 2012 09:04:54 +0100 Subject: [Box Backup] Just testing as not receiving mails from list Message-ID: <1335168294.5087.259.camel@millhouse.backed-up.net> Also www.boxbackup.org is not working error message is "Connection terminated unexpectedly" Tried Chrome and epiphany. Dave Bamford From james at netinertia.co.uk Mon Apr 23 12:35:45 2012 From: james at netinertia.co.uk (James O'Gorman) Date: Mon, 23 Apr 2012 12:35:45 +0100 Subject: [Box Backup] Just testing as not receiving mails from list In-Reply-To: <1335168294.5087.259.camel@millhouse.backed-up.net> References: <1335168294.5087.259.camel@millhouse.backed-up.net> Message-ID: <20120423113544.GA2523@netinertia.co.uk> Hi Dave, On Mon, Apr 23, 2012 at 09:04:54AM +0100, dave bamford wrote: > Also www.boxbackup.org is not working > error message is "Connection terminated unexpectedly" Tried Chrome and > epiphany. The web server crashed overnight. I kicked it about 30 mins ago (I was aware of it at 06:30 today thanks to Pingdom but didn't have an opportunity to do anything about it until now). The mailing list is on a different server so if you're not receiving messages it could be for a different reason - might be worth logging in to the Mailman interface and checking if the system has marked you 'nomail' (link is at the bottom of each post). I'd do this for you but I don't have the admin password with me at the moment! James From chris at qwirx.com Sat Apr 28 19:28:01 2012 From: chris at qwirx.com (Chris Wilson) Date: Sat, 28 Apr 2012 19:28:01 +0100 (BST) Subject: [Box Backup] out-of-the-blue CipherException? 
In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> Message-ID: Hi Matto, On Wed, 18 Apr 2012, Chris Wilson wrote: > Having thought about it some more, I think it's an incredible coincidence > that the protocol stream would get corrupted by a random error just after a > failed decrypt, so my working hypothesis is dirty state in libssl. It will > take me some time to write up proper tests for this. I think I've fixed this error now, although I haven't finished repairing the damage that this has done to the tests. Anyway it would be great if you could try out the latest trunk and let me know if it fixes the problem for you. Cheers, Chris. -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From maddog at mir.com Sun Apr 29 04:43:36 2012 From: maddog at mir.com (Matto Marjanovic) Date: Sat, 28 Apr 2012 20:43:36 -0700 Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> Message-ID: <4F9CB8E8.8010901@mir.com> On 04/28/12 11:28, Chris Wilson wrote: > Hi Matto, > > On Wed, 18 Apr 2012, Chris Wilson wrote: > >> Having thought about it some more, I think it's an incredible >> coincidence that the protocol stream would get corrupted by a >> random error just after a failed decrypt, so my working hypothesis >> is dirty state in libssl. It will take me some time to write up >> proper tests for this. > > I think I've fixed this error now, although I haven't finished > repairing the damage that this has done to the tests. 
Anyway it would > be great if you could try out the latest trunk and let me know if it > fixes the problem for you. Trying to compile trunk now... and getting errors like this: [CXX] BackupClientContext.cpp In file included from BackupClientContext.cpp:23: ../../lib/server/SocketStreamTLS.h:19: error: using typedef-name 'SSL' after 'class' /usr/include/openssl/ossl_typ.h:145: error: 'SSL' has a previous declaration here ../../lib/server/SocketStreamTLS.h:20: error: using typedef-name 'BIO' after 'class' /usr/include/openssl/bio.h:203: error: 'BIO' has a previous declaration here In file included from BackupDaemon.h:25, from BackupClientContext.cpp:27: ../../lib/server/TLSContext.h:14: error: using typedef-name 'SSL_CTX' after 'class' /usr/include/openssl/ossl_typ.h:146: error: 'SSL_CTX' has a previous declaration here make[1]: *** [../../release/bin/bbackupd/BackupClientContext.o] Error 1 I already tried a 'make clean', './configure', etc., though, have not tried to build in a pristine checkout (this is all after a simple 'svn up' to the checkout that I've been working in for the last week+). I'll continue to try to ferret this out myself, but if any bells are ringing, I am listening. (It's got to have something to do with changes to config headers and a missing 'TLS_CLASS_IMPLEMENTATION_CPP'....) -m From maddog at mir.com Sun Apr 29 05:12:57 2012 From: maddog at mir.com (Matto Marjanovic) Date: Sat, 28 Apr 2012 21:12:57 -0700 Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: <4F9CB8E8.8010901@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> <4F9CB8E8.8010901@mir.com> Message-ID: <4F9CBFC9.2040509@mir.com> On 04/28/12 20:43, Matto Marjanovic wrote: > On 04/28/12 11:28, Chris Wilson wrote: ... >> be great if you could try out the latest trunk and let me know if it >> fixes the problem for you. 
> > Trying to compile trunk now... and getting errors like this: > > [CXX] BackupClientContext.cpp > In file included from BackupClientContext.cpp:23: > ../../lib/server/SocketStreamTLS.h:19: error: using typedef-name 'SSL' after 'class' > /usr/include/openssl/ossl_typ.h:145: error: 'SSL' has a previous declaration here ... > I'll continue to try to ferret this out myself, but if any bells are ringing, > I am listening. (It's got to have something to do with changes to config > headers and a missing 'TLS_CLASS_IMPLEMENTATION_CPP'....) The breakage is due to r3106 (last commit to trunk). Test run of the client with '-DV' now underway, -m From maddog at mir.com Sun Apr 29 08:31:02 2012 From: maddog at mir.com (Matto Marjanovic) Date: Sun, 29 Apr 2012 00:31:02 -0700 Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> Message-ID: <4F9CEE36.2010801@mir.com> On 04/28/12 11:28, Chris Wilson wrote: > Hi Matto, > > On Wed, 18 Apr 2012, Chris Wilson wrote: > >> Having thought about it some more, I think it's an incredible >> coincidence that the protocol stream would get corrupted by a >> random error just after a failed decrypt, so my working hypothesis >> is dirty state in libssl. It will take me some time to write up >> proper tests for this. > > I think I've fixed this error now, although I haven't finished > repairing the damage that this has done to the tests. Anyway it would > be great if you could try out the latest trunk and let me know if it > fixes the problem for you. Success! A backup ran to completion --- that took a while. Was the fix indeed the one-line change in r3105? That must have been a pain to find (esp. since it doesn't look like it has anything to do with libssl state). 
Thank you, -m From chris at qwirx.com Sun Apr 29 20:35:00 2012 From: chris at qwirx.com (Chris Wilson) Date: Sun, 29 Apr 2012 20:35:00 +0100 (BST) Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: <4F9CEE36.2010801@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> <4F9CEE36.2010801@mir.com> Message-ID: Hi Matto, On Sun, 29 Apr 2012, Matto Marjanovic wrote: >> I think I've fixed this error now, although I haven't finished >> repairing the damage that this has done to the tests. Anyway it would >> be great if you could try out the latest trunk and let me know if it >> fixes the problem for you. > > Success! A backup ran to completion --- that took a while. Great, thanks for testing and letting me know :) > Was the fix indeed the one-line change in r3105? That must have been a > pain to find (esp. since it doesn't look like it has anything to do with > libssl state). No, 3105 was incidental, I don't think we actually use ZeroStream anywhere critical. 3097 and 3100 were the critical fixes. Cheers, Chris. -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From chris at qwirx.com Sun Apr 29 20:39:54 2012 From: chris at qwirx.com (Chris Wilson) Date: Sun, 29 Apr 2012 20:39:54 +0100 (BST) Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: <4F9CBFC9.2040509@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> <4F9CB8E8.8010901@mir.com> <4F9CBFC9.2040509@mir.com> Message-ID: Hi Matto, On Sat, 28 Apr 2012, Matto Marjanovic wrote: > On 04/28/12 20:43, Matto Marjanovic wrote: >> On 04/28/12 11:28, Chris Wilson wrote: >> Trying to compile trunk now... 
and getting errors like this: >> >> [CXX] BackupClientContext.cpp >> In file included from BackupClientContext.cpp:23: >> ../../lib/server/SocketStreamTLS.h:19: error: using typedef-name 'SSL' >> after 'class' >> /usr/include/openssl/ossl_typ.h:145: error: 'SSL' has a previous >> declaration here > ... >> I'll continue to try to ferret this out myself, but if any bells are >> ringing, >> I am listening. (It's got to have something to do with changes to config >> headers and a missing 'TLS_CLASS_IMPLEMENTATION_CPP'....) > > The breakage is due to r3106 (last commit to trunk). Thanks for that, did you revert r3106 to compile and test? Cheers, Chris. -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software | From maddog at mir.com Sun Apr 29 21:41:22 2012 From: maddog at mir.com (Matto Marjanovic) Date: Sun, 29 Apr 2012 13:41:22 -0700 Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> <4F9CEE36.2010801@mir.com> Message-ID: <4F9DA772.1000206@mir.com> On 04/29/12 12:35, Chris Wilson wrote: > On Sun, 29 Apr 2012, Matto Marjanovic wrote: > >>> I think I've fixed this error now, although I haven't finished ... >> Success! A backup ran to completion --- that took a while. > > Great, thanks for testing and letting me know :) Hi, Chris, >> Was the fix indeed the one-line change in r3105? That must have >> been a pain to find (esp. since it doesn't look like it has >> anything to do with libssl state). > > No, 3105 was incidental, I don't think we actually use ZeroStream > anywhere critical. 3097 and 3100 were the critical fixes. Ah-ha... that still must have been a pain to find. Thanks again. And, yes, I did revert r3106 to successfully compile and test. 
-m From chris at qwirx.com Mon Apr 30 09:15:46 2012 From: chris at qwirx.com (Chris Wilson) Date: Mon, 30 Apr 2012 09:15:46 +0100 (BST) Subject: [Box Backup] out-of-the-blue CipherException? In-Reply-To: <4F9DA772.1000206@mir.com> References: <4F829DE4.9090602@mir.com> <4F83B641.9000106@mir.com> <4F864983.7050309@mir.com> <4F89F6CB.2050908@mir.com> <4F8B9BAD.5010203@mir.com> <4F8C83A9.6010104@mir.com> <4F9CEE36.2010801@mir.com> <4F9DA772.1000206@mir.com> Message-ID: Hi Matto, On Sun, 29 Apr 2012, Matto Marjanovic wrote: >>> Was the fix indeed the one-line change in r3105? That must have been a >>> pain to find (esp. since it doesn't look like it has anything to do >>> with libssl state). >> >> No, 3105 was incidental, I don't think we actually use ZeroStream >> anywhere critical. 3097 and 3100 were the critical fixes. > > Ah-ha... that still must have been a pain to find. Thanks again. Yes, it took about 8 hours of debugging to write a test case and step through OpenSSL to work out why it was reporting a connection error that didn't exist, and how to fix it. Cheers, Chris. -- _____ __ _ \ __/ / ,__(_)_ | Chris Wilson Cambs UK | / (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer | \__/_/_/_//_/___/ | We are GNU : free your mind & your software |