[Box Backup] BadBackupStoreFile

Chris Wilson boxbackup at fluffy.co.uk
Thu Aug 30 20:34:54 BST 2007


Hi Johann,

On Wed, 29 Aug 2007, Johann Glaser wrote:

>>> We have two >2GB files in the backup store. See the bottom of a sorted
>>> listing:
>>> [...]
>>> 748651521 ./cc/02/o1f.rfw
>>> 748679825 ./72/03/o8e.rfw
>>> 748705025 ./3c/03/oa7.rfw
>>> 759350104 ./20/04/o10.rfw
>>> 759356296 ./23/06/o12.rfw
>>> 759552073 ./9e/06/o8f.rfw
>>> 759633897 ./1a/07/obc.rfw
>>> 1529736909 ./ba/o55.rfw
>>> 1539937214 ./69/01/o05.rfw
>>> 2744679666 ./16/01/o5f.rfw
>>> 5133609317 ./f2/o44.rfw
>>
>> Do you have any errors restoring or comparing the other large file?
>
> Yes, there are errors:
> query > compare -E . .
> Local file './__db.002/__db.002' has different contents to store file './__db.002'.
> Local file './__db.003/__db.003' has different contents to store file './__db.003'.
> Local file './__db.004/__db.004' has different contents to store file './__db.004'.
> Local file './__db.005/__db.005' has different contents to store file './__db.005'.
> Local file './__db.006/__db.006' has different contents to store file './__db.006'.
...
> [ 0 (of 5) differences probably due to file modifications after the last upload ]
> Differences: 5 (0 dirs excluded, 0 files excluded)

Are these errors expected, i.e. did those files change since the last 
backup? The message seems to indicate that they did not, which would point 
to another possible bug in 0.10. But it's also possible that Subversion or 
BDB manually changes the timestamps on these files, rendering Box Backup's 
timestamp comparison useless.

> ERROR: (4/48) during file fetch and comparsion for './strings'
> ERROR: (7/41) during file fetch and comparsion for './transactions'
> ERROR: (7/41) during file fetch and comparsion for './uuids'

The 7/41 errors are a symptom of a broken connection (loss of 
synchronisation) after the comparison for ./strings failed; unfortunately, 
that is expected. Could you please try to identify the other large file 
and compare it separately, to see if you get a 4/48 error? (I'd expect 
so).

> In the directory there are some more files which haven't been mentioned
> in the output above. "strings" is the only large file (7.4GB). All other
> files in this directory are <55MB.

Any idea, then, what the other file over 2GB is? (./f2/o44.rfw)

> PID 28114 was still running with nearly 100% CPU when already at the 
> bbackupquery prompt. Typing "ls" just hung. I had to kill it; just 
> restarting the boxbackup-server didn't stop this task.

That's really bad, sorry. Can you reproduce this?

> Yes, indeed. But I want to state that there are cases where an exception 
> is not an internal bug but another problem, e.g. that a (single) backup 
> store file was deleted or its permissions changed by somebody playing 
> around. Therefore there should be some fault tolerance or graceful error 
> recovery to not endanger the rest of the backup.

The cases that you mention should not cause an exception to be thrown, but 
rather a recoverable error condition. If you think that they are aborting 
the backup, then I'd really appreciate your help to find out why.

>> I agree partly, but I think that we shouldn't have to write "backup 
>> continues" after every error message. It should be safe to say that if 
>> you see a message saying that it stopped because of an exception, then 
>> it did, otherwise it didn't. Perhaps we should document that better.
>
> Good idea.

What Box Backup documentation have you read so far? Do you have an idea 
where the best place to document this would be, so that you would have 
found it if it existed?

>> The path name is converted to ID by taking the two hex digits from each 
>> component, reversing the order (most significant byte is the last one, 
>> before ".rfw") and padding with zeroes on the left. So, for example, 
>> ./cc/02/o1f.rfw is 001f02cc. (I think that's right anyway).
>
> I found it's a bit more complicated. For two levels, the above-mentioned 
> file ./f2/o44.rfw belongs to the ID 0000f244. For three levels the 
> translation is ./xx/yy/ozz.rfw -> ID=00yyxxzz.

OK, sorry, you learn something every day :-)
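
The translation Johann describes can be sketched in a few lines of Python. 
This is only an illustration built from the two examples above, not code 
from Box Backup itself: the function name store_path_to_id is made up, the 
padding to eight hex digits is inferred from the IDs quoted here, and the 
handling of paths nested deeper than three levels is an extrapolation of 
the same pattern.

```python
import re

# Each directory component is two hex digits; the leaf is "o" + two hex digits + ".rfw".
_DIR_RE = re.compile(r"[0-9a-fA-F]{2}")
_LEAF_RE = re.compile(r"o([0-9a-fA-F]{2})\.rfw")


def store_path_to_id(path):
    """Translate a store path such as ./f2/o44.rfw into an 8-digit hex ID.

    Assumed rule (from the examples in this thread): the directory
    components are taken in reverse order, the two digits from the file
    name come last, and the result is zero-padded on the left.
    """
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if not parts:
        raise ValueError("empty path")
    *dirs, leaf = parts

    leaf_match = _LEAF_RE.fullmatch(leaf)
    if leaf_match is None:
        raise ValueError("unexpected file name: %r" % leaf)
    for d in dirs:
        if _DIR_RE.fullmatch(d) is None:
            raise ValueError("unexpected directory name: %r" % d)

    digits = "".join(reversed(dirs)) + leaf_match.group(1)
    if len(digits) > 8:
        raise ValueError("path is too deep for an 8-digit ID: %r" % path)
    return digits.lower().rjust(8, "0")


if __name__ == "__main__":
    for p in ("./f2/o44.rfw", "./cc/02/o1f.rfw"):
        print(p, "->", store_path_to_id(p))
```

Under this rule ./f2/o44.rfw gives 0000f244, and ./cc/02/o1f.rfw gives 
0002cc1f rather than the 001f02cc that I suggested earlier.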

>> You can compare those IDs to the ones given in the remote directory 
>> listings in bbackupquery, but unfortunately there isn't a global 
>> reverse mapping so you need to manually hunt through directories to 
>> find them, sorry.
>
> Hehe, that's a good point to mention a feature request. :-)

OK, added to http://bbdev.fluffy.co.uk/trac/wiki/FeatureRequests. Feel 
free to add your own feature requests there too.

> For some nearly-equally-sized files in the backup store I found that
> they belong to the very same file on the client, so they represent
> different versions. When looking with bbackupquery at old versions, all
> of them have similar large size.
...
> Unfortunately we back up to an external storage device (Iomega StorCenter
> 150D) connected with NFS over 100 Mbit/s, which is _extremely_ slow,
> especially for directory listings. So such timeouts might well be the
> problem.
>
> I found this option in bbackupd.conf. Which unit is used for the time?
> Seconds? Milliseconds?

Units are seconds. Where did you look for this information, i.e. where 
should we improve the documentation?

> Another feature request: The backup server should (additionally) store 
> checksums across large blocks of files, e.g. 1MB blocks. Then only these 
> checksums need to be read from disk instead of the whole backup file.

I have a feeling that we already do, but I'm not 100% sure. Ben, do you 
know?
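
To make sure we are talking about the same thing, here is a rough Python 
sketch of the idea as I understand the request. It is not how Box Backup 
stores or compares files: the function names, the use of SHA-256 and the 
1MB default block size are all assumptions chosen for illustration.

```python
import hashlib
import io
from itertools import zip_longest

BLOCK_SIZE = 1024 * 1024  # 1MB blocks, as suggested in the request


def block_checksums(stream, block_size=BLOCK_SIZE):
    """Return one hex digest per block read from a binary stream."""
    sums = []
    while True:
        block = stream.read(block_size)
        if not block:
            break
        sums.append(hashlib.sha256(block).hexdigest())
    return sums


def differing_blocks(sums_a, sums_b):
    """Return the indices of blocks whose checksums differ.

    A block that exists on only one side counts as different.
    """
    return [
        index
        for index, (a, b) in enumerate(zip_longest(sums_a, sums_b))
        if a != b
    ]


if __name__ == "__main__":
    # Tiny in-memory example with 4-byte blocks instead of 1MB ones.
    stored = block_checksums(io.BytesIO(b"aaaabbbbcc"), block_size=4)
    local = block_checksums(io.BytesIO(b"aaaaXbbbcc"), block_size=4)
    print(differing_blocks(stored, local))
```

With checksums like these kept next to each store file, a compare would 
only need to read the short checksum list from disk, and then fetch just 
the blocks whose checksums do not match.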

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |


