[Box Backup-dev] ProtocolUncertainStream error

Chris Wilson boxbackup-dev at fluffy.co.uk
Thu Aug 7 16:47:22 BST 2008


Hi Pete,

On Thu, 7 Aug 2008, Peter Jalajas, GigaLock Backup Services wrote:

> Aug  7 14:54:34 (none) bbstored[10004]: WARNING: Failed to open file:
> /mnt/bu/backup/10002xxxx/47/02/o2e.difftemp: No such file or directory
> (2)
> 
> /mnt/bu is the jungledisk s3 fuse mount point.

This definitely looks like delayed propagation. We open the same file 
twice in close succession: the first open creates it, and the second 
requires it to already exist. Normal filesystem semantics guarantee that 
this will work, but S3 is not a normal filesystem. In particular:

> Updates to a single key are atomic. For example, if you PUT to an 
> existing key, a subsequent read might return the old data or the updated 
> data, but it will never write corrupted or partial data...
>
> Amazon S3 achieves high availability by replicating data across multiple 
> servers within Amazon's data centers. After a "success" is returned, 
> your data is safely stored. However, information about the changes might not 
> immediately replicate across Amazon S3 and you might observe the 
> following behaviors:
>
> * A process writes a new object to Amazon S3 and immediately attempts to 
>   read it. Until the change is fully propagated, Amazon S3 might report 
>   "key does not exist."
>
> * A process writes a new object to Amazon S3 and immediately lists 
>   keys within its bucket. Until the change is fully propagated, the 
>   object might not appear in the list.

[http://docs.amazonwebservices.com/AmazonS3/2006-03-01/ConsistencyModel.html]

I believe that what is happening is related to this behaviour. What to do 
about it depends on whether the s3fs developers consider this to be a bug 
or a feature of s3fs :)

In my view, s3fs should try hard to provide semantics as close to those 
of a normal filesystem as possible. So if an access shortly after 
creating a file finds that the file does not exist, s3fs should keep 
retrying until it does (or else silently pretend that it exists). This 
behaviour would increase application compatibility with s3fs.
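
To make the retry idea concrete, here is a minimal sketch in Python. It is 
not code from s3fs or Box Backup: the helper name open_with_retry, the 
attempt count and the delays are all hypothetical, chosen only to 
illustrate "keep retrying while the file appears not to exist".

```python
import os
import time


def open_with_retry(path, flags, attempts=5, delay=0.1):
    """Open path, retrying while the filesystem reports ENOENT.

    Hypothetical helper: the name and the default attempts/delay
    values are illustrative assumptions, not taken from any project.
    """
    for attempt in range(attempts):
        try:
            return os.open(path, flags)
        except FileNotFoundError:
            if attempt == attempts - 1:
                raise  # still missing after the final attempt
            # Exponential backoff: delay, 2*delay, 4*delay, ...
            time.sleep(delay * (2 ** attempt))
```

With the defaults this waits at most 0.1 + 0.2 + 0.4 + 0.8 = 1.5 seconds 
before giving up and re-raising the error. Whether that is long enough for 
a new key to propagate is not something the S3 documentation quoted above 
promises, so any real limit would have to be tuned. The same loop would 
also delay the error for a file that genuinely never existed, which is 
part of why this kind of workaround is unattractive in application code.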

On the other hand, if they only want to provide a lightweight layer over 
S3, and don't want to implement complex workarounds for S3's behaviour, 
then each application would have to fix it itself. 

It is possible to do the same workarounds in Box Backup, but it's really 
ugly and I'd be reluctant to either work on it or to merge that code into 
the mainline, unless other important filesystems have similar properties 
(which I doubt). 

My feeling is that the real place for such code in Box Backup would be in 
an alternative storage backend that implements the RaidFile interface and 
stores data in S3. Unfortunately, as we've discussed before and I'm sure 
you know, quite a bit of work is involved in doing that, especially in 
unit-testing it.

> I took the liberty of updating
> https://www.boxbackup.org/trac/wiki/SourceCodeRepository re how to do
> svn up.   You might check my work.

Thanks, looks fine to me!

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\ _/_/_/_//_/___/ | We are GNU : free your mind & your software |


