From boxbackup-dev at fluffy.co.uk Mon May 7 13:05:11 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Mon, 7 May 2007 13:05:11 +0100 (BST)
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
Message-ID:

Hi all,

Ben and I have been discussing how we could possibly use Sun's Solaris ZFS
to provide most of the functionality of Box Backup. We could get the rest
with a couple of layers above and below ZFS, which would make Box much
simpler and give us cool features like versioning and snapshots.

We think we should move this to the -dev list so that we can all discuss it.

We're discussing two possible approaches, not necessarily incompatible or
either-or, just talking points:

Option 1. Any filesystem on client, sync to encrypted filesystem (like
encfs) on top of zfs on top of iscsi block device (which is mounted from
the server, over the net).

Advantages: supports any filesystem on client. Very simple server
(effectively a network block device) which holds ZFS images which are
managed entirely by the client. The server could in principle mount these
images locally, but would see only encrypted filenames and data.

Disadvantages: more complex and less efficient than 2.

Option 2. Client runs encfs on top of local zfs, synchronises this to
remote ZFS periodically, as Ben describes below.

Advantages: more efficient (we think), simpler Box code (encrypted
filesystem only)

Disadvantages: requires kernel support, especially to run ZFS in the kernel
in most cases. Requires client to run zfs filesystems.

The message below is part of our discussion. If anyone is lost or needs
more introduction to the issue, please ask. I'll reply to Ben's email
below, shortly.

Cheers, Chris.
--
_ ___ __ _
/ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

---------- Forwarded message ----------
Date: Mon, 7 May 2007 12:29:46 +0100
From: Ben Summers
To: Chris Wilson
Subject: Re: Learning from ZFS

On 7 May 2007, at 12:01, Chris Wilson wrote:

> Hi Ben,
>
> On Mon, 7 May 2007, Ben Summers wrote:
>
>> Well amusingly, I've just realised that you can almost implement a
>> wonderful Box Backup system without any of the annoying bits using Solaris:
>> a combination of zfs and iSCSI can do it quite nicely... but there are two
>> missing bits, the encryption and the compression. The encryption should be
>> easily solvable with a small bit of kernel code, but the compression is more
>> tricky.
>
> Let me see if I understand this: instead of a bbstored server, you run an
> iscsi server? (which i guess you could do as a bit of software running on a
> hosted *nix box, although I don't know whether it exists, whether it's secure
> or deployable over the Internet).

You can always use a tunnel.

>
> Then, as a client you'd use a mounted zfs filesystem?

Yes.

>
> On Linux you could mount this on a cryptoloop filesystem, or alternatively
> run an encrypted filesystem like encfs over the top of zfs. I'm not sure how
> possible this is on BSD.

I'm sure this is a very solvable problem.

>
>> So there could be a good model there, or perhaps just make a userland port
>> of zfs.
>
> A userland port of zfs sounds like a great idea. zfs is CDDL and apparently
> mixes with BSD licenses without problems, it's just been ported to a BSD
> kernel.
>
> This could simplify the server down to block read/write operations,

I think an ideal server should just read/write blocks, but with snapshots
so it can remove old stuff.

> or enable us to replace it with a commodity or custom iscsi implementation
> (or Linux NBD).
>
> Would it make sense to just encrypt filenames and contents, store them in ZFS
> and use its API (including snapshots, versioning, deltas), building all of
> that into the bbackupd client?
>
> Does this layering make sense?
>
> BackupDaemon
> ------------
> BackupClientContext
> ------------
> BackupClientDirectoryRecord
> ------------
> File name and data encryption
> ------------
> ZFS
> ------------
> iScsi client
> ------------
> Network
> ------------
> iScsi server

That's certainly one way of doing it, and one which is nice and cross
platform with everything in userland. But if we're modifying ZFS we might
as well get it to use our block driver to avoid the dependency on iSCSI and
all that entails.

The biggest costs of bbackupd are:

* scanning directories (various APIs can make this easier, but you'll still
  need to scan if the daemon is ever not running or misses anything)
* rsync
* tracking of objects for efficient renames, and not tracking everything to
  avoid huge costs of doing so.

So using ZFS on the client for its backed up FS avoids the need for any of
that, because you do a snapshot and ask for the differences between the
snapshots.

Here's how it would work:

Server exposes a zvol within a ZFS pool as an iSCSI target.

Client has all the data on ZFS filesystems.

To do a backup (everything runs on the client):

* Snapshot FS (atomic, quick and cheap)
* Mount iSCSI target
* zfs send | zfs receive to transfer the incremental changes to the server
* Dismount the iSCSI target

To protect against bugs on the client wiping data from the server, do
snapshots of the zvol on the server to an appropriate schedule.

Housekeeping on both ends is done by deleting redundant snapshots.

Now this doesn't give encryption, so the iSCSI target needs to have some
crypto on top. Easy one to solve if your kernel can't do it anyway.

Then you have Box Backup! The only thing you don't get is the compression,
which is more difficult to just patch on top.
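
The backup steps listed above can be sketched as a small Python helper that
only builds the commands rather than running them. This is a sketch under
assumptions, not part of Box Backup: the dataset name tank/home, the
snapshot names and the receiving dataset backup/home are hypothetical, and
attaching and detaching the iSCSI target is left as comments because those
commands differ between platforms.

```python
import shlex


def incremental_backup_plan(dataset, prev_snap, new_snap, target):
    """Return the shell commands for one snapshot-based incremental backup.

    All names passed in are hypothetical examples; nothing is executed here.
    """
    q = shlex.quote
    prev = f"{dataset}@{prev_snap}"
    new = f"{dataset}@{new_snap}"
    return [
        # Step 1: take a snapshot of the client filesystem.
        f"zfs snapshot {q(new)}",
        # Step 2 (platform-specific, not shown): attach the iSCSI target
        # so that the receiving dataset is visible on the client.
        # Step 3: send only the changes between the two snapshots.
        f"zfs send -i {q(prev)} {q(new)} | zfs receive {q(target)}",
        # Step 4 (platform-specific, not shown): detach the iSCSI target.
    ]


if __name__ == "__main__":
    for command in incremental_backup_plan(
        "tank/home", "monday", "tuesday", "backup/home"
    ):
        print(command)
```

Keeping the plan as plain strings makes it easy to review or log before
anything touches real pools.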
ZFS does do compression, but to take advantage of it you'd have to use it
on the source FS, which I guess many won't want to do.

Since I'm probably going to move to Solaris properly for servers and
things, I'm going to try this out. Then see what can be learned and how we
can get this into Box Backup for systems which don't use ZFS as their
filesystem.

We should move this to -dev, I think. You have my permission to post
anything in this email. (how old-school netiquette I can be sometimes)

Ben

From boxbackup-dev at fluffy.co.uk Mon May 7 14:07:02 2007
From: boxbackup-dev at fluffy.co.uk (Stuart Hickinbottom)
Date: Mon, 07 May 2007 14:07:02 +0100
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To:
References:
Message-ID: <463F2476.5050607@hickinbottom.com>

That's very interesting. I'd read about ZFS previously and seen it had many
of the features that we rely on BB to provide at the moment.

What I'd be most interested in is some roadmap process that would get us
here, with some (I suspect) FAQs about that process.

In particular:

1. Will existing stores be portable to the new store format? I'm pretty
certain that it's 'no', but that would be a significant issue for existing
users who have a large amount of legacy remote data and slow data links (eg
the internet). Perhaps it would at least be possible to convert an existing
store in a one-way process?

2. What clients and server OS's would be supported? I see there's
work-in-progress for ZFS over FUSE for Linux, but would this be supported
for both client and server, and what about other OS's?

3. The current network protocol, whilst it has had its share of problems
with things like timeouts and large files, does work quite well over
unreliable links such as the internet.

Would a reliance on ZFS mounted remotely over such links cause problems
with reliability and recovery when those links drop connections?
Perhaps there are no problems, but I've tried tunnelling network
filesystems before and have had such issues (principally, /cringe/ when I
was trying to do that with Samba).

4. Is this independent of any work to release 0.11 in the near future?

What is the driver for such a significant departure, I wonder? Whilst, of
course, the development is going to go in whatever direction the developers
would like, why is the current architecture considered in need of
replacement?

There has now been a lot of testing and usage of BB over a number of years
and that would be effectively cancelled for a new architecture based on ZFS
(and, indeed, ports of ZFS will come with their own difficulties since
they're quite new and not widely used).

Don't get me wrong - I'm not against such a departure at all and it would
give me a chance to play with some new toys, but I'm interested in whether
there's a reason other than that it will be a nice change for us all.

Stuart

From boxbackup-dev at fluffy.co.uk Mon May 7 19:42:39 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Mon, 7 May 2007 19:42:39 +0100 (BST)
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To: <463F2476.5050607@hickinbottom.com>
References: <463F2476.5050607@hickinbottom.com>
Message-ID:

Hi Stuart,

> That's very interesting. I'd read about ZFS previously and seen it had
> many of the features that we rely on BB to provide at the moment.
>
> What I'd be most interested in is some roadmap process that would get us
> here, with some (I suspect) FAQs about that process.

Right now might be a bit early for a roadmap, given that we don't know
which road we're taking :-) but I think it's a good idea and we should
produce one (and a FAQ).

> In particular:
>
> 1. Will existing stores be portable to the new store format? I'm pretty
> certain that it's 'no', but that would be a significant issue for existing
> users who have a large amount of legacy remote data and slow data links (eg
> the internet). Perhaps it would at least be possible to convert an existing
> store in a one-way process?

I think the answer would be "No" as well. If we shift to a ZFS filesystem
on the server, we could provide an upgrade path, something like creating a
new account (with a new certificate) and a client program to copy the old
account's contents to the new one.
That would probably not be too hard to test and be confident about.

Sharing both data formats in the same store, or upgrading on the server
without the client's keys, would probably be much harder to test and to get
right.

> 2. What clients and server OS's would be supported? I see there's
> work-in-progress for ZFS over FUSE for Linux, but would this be
> supported for both client and server, and what about other OS's?

I'd like to support a cross-platform solution. My proposal would involve
replacing Box Backup's filesystem code with the ZFS code, running in user
space and tightly integrated, so that no kernel support would be required,
nor ZFS. This is not ideal in many ways, but it is portable and doesn't
require the client to change all their filesystems to ZFS.

> 3. The current network protocol, whilst it has had its share of problems
> with things like timeouts and large files, does work quite well over
> unreliable links such as the internet.
>
> Would a reliance on ZFS mounted remotely over such links cause problems
> with reliability and recovery when those links drop connections? Perhaps
> there are no problems, but I've tried tunnelling network filesystems
> before and have had such issues (principally, /cringe/ when I was trying
> to do that with Samba).

That's a good point, and very important for us to investigate. It would be
particularly interesting (for me) to look at the assumptions that ZFS makes
about the underlying block devices, and see how accurately we can preserve
and guarantee those semantics over an unreliable network.

> 4. Is this independent of any work to release 0.11 in the near future?

Yes, this is definitely independent, it evolved from some discussions we
were having about what 0.20 should be. It will definitely not affect the
upcoming release of 0.11 :-)

> What is the driver for such a significant departure, I wonder?
> Whilst, of course, the development is going to go in whatever direction
> the developers would like, why is the current architecture considered in
> need of replacement?

The store management of the current system is not particularly good. In
particular, we don't have point-in-time filesets which would allow
point-in-time store browsing and restore; the server manages the client's
old and deleted versions of files in a rather cumbersome and intrusive way
(imho); housekeeping is too slow; we want to deprecate and replace raidfile.

These would all be solved by using a good, reliable, robust versioning
filesystem over a robust network protocol. But such filesystems are
difficult to find. Box implements one, with a lot of code, but it's not
ideal. ZFS implements one too, possibly better, but with even more code!

ZFS is an option that looks particularly interesting at the moment, which
prompted our discussion.

> There has now been a lot of testing and usage of BB over a number of
> years and that would be effectively cancelled for a new architecture
> based on ZFS

Not all of it. We still have good tests that we will continue to run. And
ZFS has hopefully had a lot of testing as well, probably more than Box.

> Don't get me wrong - I'm not against such a departure at all and it
> would give me a chance to play with some new toys, but I'm interested in
> whether there's a reason other than that it will be a nice change for us
> all.

I think so, i.e. it brings useful benefits and simplifies

--
_ ___ __ _
/ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

From boxbackup-dev at fluffy.co.uk Wed May 9 15:58:55 2007
From: boxbackup-dev at fluffy.co.uk (Wout Mertens)
Date: Wed, 9 May 2007 16:58:55 +0200
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To:
References:
Message-ID:

Just some general comments on this discussion.

1.
ZFS does do compression, and you can even send from an uncompressed ZFS to
a compressed ZFS and it will work. See
http://blogs.sun.com/mmusante/entry/zfs_compression_and_you for an example
of this. I was surprised to discover this myself, I assumed that
compression was part of the highest ZFS layer but it seems that it isn't.

2. ZFS will have encryption at some point in the near future. Perhaps the
encryption could be ignored for the moment.

3. Perhaps you can reuse the http://zfs-on-fuse.blogspot.com zfs on fuse
efforts, that code should be easier to port

4. One thing that would really rock if extra code were added that only
stores blocks with the same checksum once. ZFS currently doesn't have that,
but I think it's technically feasible to do something like that.

Apart from that, great ideas in this thread, keep em coming ;-)

Wout.

From boxbackup-dev at fluffy.co.uk Wed May 9 17:25:17 2007
From: boxbackup-dev at fluffy.co.uk (Martin Ebourne)
Date: Wed, 09 May 2007 17:25:17 +0100
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To:
References:
Message-ID: <1178727918.27737.17.camel@avenin.ebourne.me.uk>

On Wed, 2007-05-09 at 16:58 +0200, Wout Mertens wrote:
> 4. One thing that would really rock if extra code were added that
> only stores blocks with the same checksum once. ZFS currently doesn't
> have that, but I think it's technically feasible to do something like
> that.

Er no, that wouldn't rock at all!

Cheers,

Martin.

From boxbackup-dev at fluffy.co.uk Thu May 10 07:43:28 2007
From: boxbackup-dev at fluffy.co.uk (Stuart Hickinbottom)
Date: Thu, 10 May 2007 07:43:28 +0100
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To:
References: <463F2476.5050607@hickinbottom.com>
Message-ID: <4642BF10.10800@hickinbottom.com>

Thanks for the clarifications, Chris, that's cleared a few things up for
me. I'll watch with great interest how things develop!

On the side issue of 0.11 - once an alpha package is produced I'll give it
a good test on my machines (Linux server and clients, plus a Win32 client,
server remote over internet). Presumably a compiled Win32 package will be
available at that time as well?

Stuart

From boxbackup-dev at fluffy.co.uk Thu May 10 10:38:47 2007
From: boxbackup-dev at fluffy.co.uk (Wout Mertens)
Date: Thu, 10 May 2007 11:38:47 +0200
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To: <1178727918.27737.17.camel@avenin.ebourne.me.uk>
References: <1178727918.27737.17.camel@avenin.ebourne.me.uk>
Message-ID: <68CEB586-992E-4308-BB90-212C91565DFC@cisco.com>

On 09 May 2007, at 18:25, Martin Ebourne wrote:

> On Wed, 2007-05-09 at 16:58 +0200, Wout Mertens wrote:
>
>> 4. One thing that would really rock if extra code were added that
>> only stores blocks with the same checksum once.
>> ZFS currently doesn't have that, but I think it's technically feasible
>> to do something like that.
>
> Er no, that wouldn't rock at all!

Care to elaborate?

Wout.

From boxbackup-dev at fluffy.co.uk Thu May 10 11:28:33 2007
From: boxbackup-dev at fluffy.co.uk (Martin Ebourne)
Date: Thu, 10 May 2007 11:28:33 +0100
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To: <68CEB586-992E-4308-BB90-212C91565DFC@cisco.com>
References: <1178727918.27737.17.camel@avenin.ebourne.me.uk> <68CEB586-992E-4308-BB90-212C91565DFC@cisco.com>
Message-ID: <20070510112833.e60ywwzi84gso84w@ebourne.me.uk>

Wout Mertens wrote:
> On 09 May 2007, at 18:25, Martin Ebourne wrote:
>
>> On Wed, 2007-05-09 at 16:58 +0200, Wout Mertens wrote:
>>
>>> 4. One thing that would really rock if extra code were added that
>>> only stores blocks with the same checksum once. ZFS currently doesn't
>>> have that, but I think it's technically feasible to do something like
>>> that.
>>
>> Er no, that wouldn't rock at all!
>
> Care to elaborate?

Well obviously given two arbitrary blocks that have the same checksum (or
sha hash or whatever), it's very unlikely that the blocks are actually the
same. It's pretty important for a backup system to give you back the actual
data you stored, not just some data that happens to have the same checksum!

Cheers,

Martin.

From boxbackup-dev at fluffy.co.uk Thu May 10 12:46:23 2007
From: boxbackup-dev at fluffy.co.uk (Ben Summers)
Date: Thu, 10 May 2007 12:46:23 +0100
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
Message-ID: <79972BB1-7B29-4783-A8E9-2C5AB05B265D@fluffy.co.uk>

On Thu, 10 May 2007 11:28, Martin Ebourne wrote:

> Wout Mertens wrote:
>
>> On 09 May 2007, at 18:25, Martin Ebourne wrote:
>>
>>> On Wed, 2007-05-09 at 16:58 +0200, Wout Mertens wrote:
>>>
>>>> 4. One thing that would really rock if extra code were added that
>>>> only stores blocks with the same checksum once.
>>>> ZFS currently doesn't have that, but I think it's technically
>>>> feasible to do something like that.
>>>
>>> Er no, that wouldn't rock at all!
>>
>> Care to elaborate?
>
> Well obviously given two arbitrary blocks that have the same checksum
> (or sha hash or whatever), it's very unlikely that the blocks are
> actually the same. It's pretty important for a backup system to give
> you back the actual data you stored, not just some data that happens
> to have the same checksum!

Hmmm. Yes and no.

Yes, you want your original data back. 100% guaranteed.

No, in that if you stick to this rule absolutely you can't use rsync or Box
Backup's rsync-like algorithm.

Maybe, in that there's a lower chance of it being a problem in the rsync
case.

Maybe we should add an option to turn off bandwidth efficiency?

Ben

From boxbackup-dev at fluffy.co.uk Thu May 10 13:56:46 2007
From: boxbackup-dev at fluffy.co.uk (Martin Ebourne)
Date: Thu, 10 May 2007 13:56:46 +0100
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To: <79972BB1-7B29-4783-A8E9-2C5AB05B265D@fluffy.co.uk>
References: <79972BB1-7B29-4783-A8E9-2C5AB05B265D@fluffy.co.uk>
Message-ID: <20070510135646.63nitixori8wwooo@ebourne.me.uk>

Ben Summers wrote:
> Hmmm. Yes and no.
>
> Yes, you want your original data back. 100% guaranteed.
>
> No, in that if you stick to this rule absolutely you can't use rsync
> or Box Backup's rsync-like algorithm.
>
> Maybe, in that there's a lower chance of it being a problem in the
> rsync case.

There's a difference here in use between the original suggestion, which was
to use checksums as a compression system (which doesn't work) and the way
rsync etc use checksums to detect changes (checksums were originally
designed to detect errors of course and this works well).

Given two random blocks that have the same checksum, it is very unlikely
that they contain the same data, hence no good for compression.
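
To make the argument so far concrete, here is a small Python sketch. It is
not code from ZFS, rsync or Box Backup; the toy 16-bit checksum and the
VerifyingBlockStore class are invented for illustration. It shows two
different blocks sharing a checksum, and a store that keeps equal blocks
only once but compares the actual bytes before sharing, so a collision
never hands back the wrong data.

```python
import hashlib


def weak_checksum(block: bytes) -> int:
    """Toy 16-bit checksum (sum of byte values mod 65536); collides easily."""
    return sum(block) % 65536


def sha256_digest(block: bytes) -> bytes:
    """Stronger digest, used as the default key for the store."""
    return hashlib.sha256(block).digest()


class VerifyingBlockStore:
    """Hypothetical store: one copy per distinct block, verified by content."""

    def __init__(self, digest=sha256_digest):
        self._digest = digest
        self._buckets = {}  # digest value -> list of distinct blocks

    def put(self, block: bytes):
        """Store a block and return a reference that get() accepts."""
        key = self._digest(block)
        bucket = self._buckets.setdefault(key, [])
        for index, existing in enumerate(bucket):
            if existing == block:  # byte-for-byte check, not just the digest
                return key, index
        bucket.append(block)  # same digest but different data: keep both
        return key, len(bucket) - 1

    def get(self, ref) -> bytes:
        key, index = ref
        return self._buckets[key][index]

    def stored_blocks(self) -> int:
        return sum(len(bucket) for bucket in self._buckets.values())


if __name__ == "__main__":
    store = VerifyingBlockStore(digest=weak_checksum)
    first = store.put(b"ab")
    second = store.put(b"ba")  # collides with b"ab" under weak_checksum
    again = store.put(b"ab")   # genuine duplicate, shared with the first
    print(store.stored_blocks(), store.get(first), store.get(second))
```

The byte comparison costs a read of the existing block on every digest
match, which is the price of never trusting the checksum alone.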
On the other side if you have the checksum for a block and the data in the
block is subsequently changed, it is very unlikely that they will have the
same checksum, which makes it work for the rsync etc case.

Put simply, checksums are very good at detecting differences, but very bad
at proving similarity. (And it is counter-intuitive that these are not
reciprocal.)

Cheers,

Martin.

From boxbackup-dev at fluffy.co.uk Sat May 12 01:02:49 2007
From: boxbackup-dev at fluffy.co.uk (Charles Lecklider)
Date: Sat, 12 May 2007 01:02:49 +0100
Subject: [Box Backup-dev] (no subject)
Message-ID: <46450429.7040403@invis.net>

From boxbackup-dev at fluffy.co.uk Sat May 12 18:34:03 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Sat, 12 May 2007 18:34:03 +0100 (BST)
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To: <4642BF10.10800@hickinbottom.com>
References: <463F2476.5050607@hickinbottom.com> <4642BF10.10800@hickinbottom.com>
Message-ID:

Hi Stuart,

> On the side issue of 0.11 - once an alpha package is produced I'll give
> it a good test on my machines (Linux server and clients, plus a Win32
> client, server remote over internet). Presumably a compiled Win32
> package will be available at that time as well?

That's great, thanks! Of course I will produce a Win32 binary package built
from exactly the same source for you and others to test.

Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

From boxbackup-dev at fluffy.co.uk Thu May 17 21:36:18 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Thu, 17 May 2007 21:36:18 +0100 (BST)
Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?
In-Reply-To: <060.aa2866c1dcbc6460bee8a594cb78eef6@fluffy.co.uk>
References: <051.5e4c21ba25db2b0ee4bba2dac48ad3ba@fluffy.co.uk>
 <060.aa2866c1dcbc6460bee8a594cb78eef6@fluffy.co.uk>
Message-ID:

Hi Pete,

On Sat, 5 May 2007, trac at fluffy.co.uk wrote:

> #24: Extraneous files in Windows clients?
> ----------------------+-----------------------------------------------------
>   Reporter:  petej    |       Owner:
>       Type:  defect   |      Status:  new
>   Priority:  trivial  |   Milestone:
>  Component:  scripts  |     Version:  0.10
> Resolution:           |    Keywords:
> ----------------------+-----------------------------------------------------
> Comment (by petej):
>
> Replying to [ticket:24 petej]:
> boxbackup-chris_general_1569-backup-client-mingw32.zip doesn't contain
> pcreposix.dll. Should it?

Does it need to? I thought I had statically linked the pcreposix library
on win32. Does it run without it?

Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

From boxbackup-dev at fluffy.co.uk Thu May 17 22:49:23 2007
From: boxbackup-dev at fluffy.co.uk (E.W. Peter Jalajas)
Date: Thu, 17 May 2007 14:49:23 -0700 (PDT)
Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?
Message-ID: <843149.64195.qm@web60611.mail.yahoo.com>

Hi Chris,

I just updated a client, that's been running great for a year on 564 (or
538), with 1662. I first stopped the service, removed the service, and
renamed or deleted all the .exe's and .dll's, before extracting your
.zip. It just completed a 22 minute upload cycle without the
pcreposix.dll with no reported problems. Is there a good specific test I
should run?

Note that I had to rename bbackupd.exe and bbackupquery.exe because I
got "Access is denied" when I tried to delete them. Eventually
bbackupd.exe deleted, but I'm still waiting for bbackupquery.exe to be
able to be deleted.
Weird. Also weird is the error message has the filename as it was before
I renamed it, not matching what I'm looking at in Windows Explorer.

I just noticed ticket 29 is essentially a duplicate of 24. I'm an idiot
and/or I need to stop working on this stuff so late at night. Feel free
to kill 29.

Thanks,
Pete

----- Original Message ----
From: Chris Wilson
To: boxbackup-dev at fluffy.co.uk
Sent: Thursday, May 17, 2007 4:36:18 PM
Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?

Hi Pete,

On Sat, 5 May 2007, trac at fluffy.co.uk wrote:

> #24: Extraneous files in Windows clients?
> ----------------------+-----------------------------------------------------
>   Reporter:  petej    |       Owner:
>       Type:  defect   |      Status:  new
>   Priority:  trivial  |   Milestone:
>  Component:  scripts  |     Version:  0.10
> Resolution:           |    Keywords:
> ----------------------+-----------------------------------------------------
> Comment (by petej):
>
> Replying to [ticket:24 petej]:
> boxbackup-chris_general_1569-backup-client-mingw32.zip doesn't contain
> pcreposix.dll. Should it?

Does it need to? I thought I had statically linked the pcreposix library
on win32. Does it run without it?

Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

_______________________________________________
Boxbackup-dev mailing list
Boxbackup-dev at fluffy.co.uk
http://lists.warhead.org.uk/mailman/listinfo/boxbackup-dev

From boxbackup-dev at fluffy.co.uk Fri May 18 00:06:51 2007
From: boxbackup-dev at fluffy.co.uk (Chris Wilson)
Date: Fri, 18 May 2007 00:06:51 +0100 (BST)
Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?
In-Reply-To: <843149.64195.qm@web60611.mail.yahoo.com>
References: <843149.64195.qm@web60611.mail.yahoo.com>
Message-ID:

Hi Pete,

> I just updated a client, that's been running great for a year on 564 (or
> 538), with 1662. I first stopped the service, removed the service, and
> renamed or deleted all the .exe's and .dll's, before extracting your
> .zip. It just completed a 22 minute upload cycle without the
> pcreposix.dll with no reported problems. Is there a good specific test
> I should run?

No, it wouldn't even start if the DLL was missing. Can you find it
anywhere else on your system?

> Note that I had to rename bbackupd.exe and bbackupquery.exe because I
> got "Access is denied" when I tried to delete them. Eventually
> bbackupd.exe deleted, but I'm still waiting for bbackupquery.exe to be
> able to be deleted. Weird. Also weird is the error message has the
> filename as it was before I renamed it, not matching what I'm looking at
> in Windows Explorer.

Are bbackupd/bbackupquery processes still running on your system? Could
you kill them and try again to replace the binaries?

> I just noticed ticket 29 is essentially a duplicate of 24. I'm an idiot
> and/or I need to stop working on this stuff so late at night. Feel free
> to kill 29.

Thanks for the notification.

Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |

From boxbackup-dev at fluffy.co.uk Fri May 18 05:29:53 2007
From: boxbackup-dev at fluffy.co.uk (E.W. Peter Jalajas)
Date: Thu, 17 May 2007 21:29:53 -0700 (PDT)
Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?
Message-ID: <995213.84644.qm@web60622.mail.yahoo.com>

> No, it wouldn't even start if the DLL was missing. Can you find it
> anywhere else on your system?
I did a Windows search and found pcreposix.dll in various subdirectories
of Program Files\Box Backup and in various zip files ("compressed
(zipped) folders") under there, left over from prior updates.

Not sure if this helps (Box Backup is installed in C:\Program Files\Box
Backup\):

P:\>path
PATH=C:\Program Files\Support Tools\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\Microsoft SQL Server\80\Tools\Binn\;D:\Program Files\;D:\cygwin\bin;

P:\>echo %ProgramFiles%
C:\Program Files

> Are bbackupd/bbackupquery processes still running on your system? Could
> you kill them and try again to replace the binaries?

No, sorry, I neglected to mention that in my post. They were not shown
in Task Manager. After logging in and out of that server a few times
today, the bbackupquery file, now renamed to just "x" still can't be
deleted, with Access denied. bbackupd.exe is running, but no
bbackupquery (or "x") process.

Maybe weirder still, I can move it (cut and paste) to another directory,
but still can't delete it from that new location. If I copy and paste
it, I can delete the copy.

... Got it...

I had a flashback that maybe cygwin was involved a year ago on this
server. I opened a cygwin terminal and did this, slightly redacted:

petej at HM /cygdrive/c/Program Files/Box Backup/INET
$ ls -al
total 5033
drwx------+ 2 Administrators Domain Users       0 May 18 00:13 .
drwx------+ 5 Administrators Domain Users       0 May 18 00:11 ..
-rwx------+ 1 Administrators Domain Users 5148900 May 18 00:08 x

petej at HM /cygdrive/c/Program Files/Box Backup/INET
$ rm x

petej at HM /cygdrive/c/Program Files/Box Backup/INET
$ ls -al
total 1
drwx------+ 2 Administrators Domain Users 0 May 18 00:17 .
drwx------+ 5 Administrators Domain Users 0 May 18 00:11 ..

Thanks, Chris.
Pete

From boxbackup-dev at fluffy.co.uk Fri May 18 08:42:55 2007
From: boxbackup-dev at fluffy.co.uk (Dave Bamford)
Date: Fri, 18 May 2007 08:42:55 +0100
Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?
In-Reply-To: <843149.64195.qm@web60611.mail.yahoo.com>
References: <843149.64195.qm@web60611.mail.yahoo.com>
Message-ID: <464D58FF.30807@logical-progress.com>

Hi Peter

I had problems deleting the old .exe files too. But especially on 2003
server, I eventually had to rename the folder containing the files
before I could delete them. Thanks for the tip of using cygwin, I will
try that next time.

Dave Bamford.

E.W. Peter Jalajas wrote:
> Hi Chris,
>
> I just updated a client, that's been running great for a year on 564
> (or 538), with 1662. I first stopped the service, removed the service,
> and renamed or deleted all the .exe's and .dll's, before extracting
> your .zip. It just completed a 22 minute upload cycle without the
> pcreposix.dll with no reported problems. Is there a good specific test
> I should run?
>
> Note that I had to rename bbackupd.exe and bbackupquery.exe because I
> got "Access is denied" when I tried to delete them. Eventually
> bbackupd.exe deleted, but I'm still waiting for bbackupquery.exe to be
> able to be deleted. Weird. Also weird is the error message has the
> filename as it was before I renamed it, not matching what I'm looking
> at in Windows Explorer.
>
> I just noticed ticket 29 is essentially a duplicate of 24. I'm an
> idiot and/or I need to stop working on this stuff so late at night.
> Feel free to kill 29.
>
>
> Thanks,
> Pete
>
> ----- Original Message ----
> From: Chris Wilson
> To: boxbackup-dev at fluffy.co.uk
> Sent: Thursday, May 17, 2007 4:36:18 PM
> Subject: [Box Backup-dev] Re: [Box Backup-commit] Re: #24: Extraneous files in Windows clients?
>
> Hi Pete,
>
> On Sat, 5 May 2007, trac at fluffy.co.uk wrote:
>
>
>> #24: Extraneous files in Windows clients?
>> ----------------------+-----------------------------------------------------
>>   Reporter:  petej    |       Owner:
>>       Type:  defect   |      Status:  new
>>   Priority:  trivial  |   Milestone:
>>  Component:  scripts  |     Version:  0.10
>> Resolution:           |    Keywords:
>> ----------------------+-----------------------------------------------------
>> Comment (by petej):
>>
>> Replying to [ticket:24 petej]:
>> boxbackup-chris_general_1569-backup-client-mingw32.zip doesn't contain
>> pcreposix.dll. Should it?
>>
>
> Does it need to? I thought I had statically linked the pcreposix library
> on win32. Does it run without it?
>
> Cheers, Chris.
>

From boxbackup-dev at fluffy.co.uk Tue May 22 16:23:48 2007
From: boxbackup-dev at fluffy.co.uk (Wout Mertens)
Date: Tue, 22 May 2007 17:23:48 +0200
Subject: [Box Backup-dev] Re: Learning from ZFS (fwd)
In-Reply-To: <20070510135646.63nitixori8wwooo@ebourne.me.uk>
References: <79972BB1-7B29-4783-A8E9-2C5AB05B265D@fluffy.co.uk>
 <20070510135646.63nitixori8wwooo@ebourne.me.uk>
Message-ID: <56F85A7B-6E32-400F-BAEC-3F329C69EF0C@cisco.com>

On 10 May 2007, at 14:56, Martin Ebourne wrote:

> Put simply, checksums are very good at detecting differences, but
> very bad at proving similarity. (And it is counter-intuitive that
> these are not reciprocal.)

Well put, and I completely agree. I also never said that when two
checksums match, zfs should throw away a block without comparing the
contents ;-)

That said, I have it on good authority that a large company that sells
Content-Addressable Storage does not do the compare phase, under the
assumption that two random data blocks that have the same checksum _and
also make sense_ are way too rare to support taking that performance
hit. The odds of two data blocks of backed up data (ie non-random)
having the same checksum are pretty low.

That said, I'd prefer it if the hypothetical zfs-block-deduper would
check before coalescing blocks. ;-)

Wout.
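
The "check before coalescing" idea from this thread can be illustrated
with a small sketch. This is not ZFS or Box Backup code: the DedupStore
class, its method names and the deliberately weak weak_checksum function
are all hypothetical, chosen only to make collisions easy to trigger. It
shows a block store that treats a matching checksum as a hint and, when
verify is enabled, compares the bytes before reusing an existing block.

```python
class DedupStore:
    """Toy block store that coalesces identical blocks (hypothetical sketch)."""

    def __init__(self, checksum, verify=True):
        self.checksum = checksum  # function mapping bytes to a hashable key
        self.verify = verify      # compare contents before coalescing?
        self.blocks = []          # stored block contents
        self.index = {}           # checksum key -> positions in self.blocks

    def put(self, block):
        """Store a block and return the position that now represents it."""
        key = self.checksum(block)
        for position in self.index.get(key, []):
            # A matching checksum is only a hint that the blocks may be equal.
            if not self.verify or self.blocks[position] == block:
                return position
        self.blocks.append(block)
        self.index.setdefault(key, []).append(len(self.blocks) - 1)
        return len(self.blocks) - 1

    def get(self, position):
        """Return the stored contents for a position returned by put()."""
        return self.blocks[position]


def weak_checksum(block):
    """Deliberately weak checksum (byte sum modulo 256) so collisions are easy."""
    return sum(block) % 256
```

With weak_checksum, the blocks b"ab" and b"ba" collide. A store created
with verify=True keeps them as two separate blocks, while one created
with verify=False hands back the position of b"ab" for both, so reading
the second block returns the wrong data. A strong hash makes such a
collision far less likely, but only the byte comparison rules it out.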
From boxbackup-dev at fluffy.co.uk Tue May 29 18:04:17 2007
From: boxbackup-dev at fluffy.co.uk (G.)
Date: Tue, 29 May 2007 10:04:17 -0700 (PDT)
Subject: [Box Backup-dev] Proposal: strong file content checksum for BoxBackup file change detection.
Message-ID: <20070529170417.65554.qmail@web36714.mail.mud.yahoo.com>

Hi everyone,

Proposal:
------------------
Strong file content checksum for BoxBackup file change detection.

Problem:
------------------
Under certain circumstances a file can change its content without
changing its size or timestamp. An example might be a TrueCrypt secure,
encrypted fixed-size volume with "plausible deniability" feature, which
keeps a volume file timestamp constant, in order to hide usage patterns.

Current BoxBackup folder-level and file-level change detection
algorithms are not capable of detecting such modifications, resulting in
inconsistent backups.

Solution:
------------------
Include per-file MD5 signatures in folder-level checksum generation
algorithm to detect changes at folder scan-level.

Re-use per-file MD5 signatures for file-level checksum generation
algorithm to detect changes at file scan-level.

Persist file-level MD5 checksums in dynamically-sized file attribute
stream (?) to avoid backup store upgrade and migration problems.

Applicability:
------------------
The feature should be optional, but available for those who want to be
sure of 100% consistent backup snapshots and are willing to sacrifice
scanning cycle performance.

Side-Effects:
------------------
The ability to run a very fast bbackupquery compare cycle, utilizing MD5
signatures, as opposed to the current "compare -aq", which needs to
download all block-level checksums for all files from a remote server.

Prototype:
------------------
A simple simulation that generates folder-level MD5 signatures for all
files in all folders to detect content change.
Evaluates raw performance degradation for a file scanning cycle and
overall backup performance degradation for a file scanning cycle along
with minimal, but significant, network traffic.

Assumes low-end network connectivity: an over-the-Internet backup to a
server half a world away.

It should be noted that I define "true performance penalty" as a
percentage of total backup time, thus the percentage of time one really
sacrifices in respect to an entire backup cycle due to MD5 signature
generation.

Hardware:
------------------
* QX6700 quad-core
* 10K Raptors
* RAID1
* Windows Vista 32-bit
* NTFS

Sample Backup:
------------------
* ~2.5GB
* ~15,000 files
* ~3,000 folders

Summary:
------------------
1.) Raw performance degradation for a file scanning cycle.

* no content changes
* Vanilla: weak folder-level checksums examined (attributes, mod time,
  etc.)
* MD5: strong file-level checksums examined to calculate strong
  folder-level checksums (attributes, mod time, etc. + content)
* no network traffic

a.) Vanilla:
    * runtime: 8 seconds total

b.) MD5:
    * runtime: 1 minute, 45 seconds >> 105 seconds total
    * penalty: 97 seconds, ~825%

2.) Overall backup performance degradation for a file scanning cycle and
minimal, but significant, network traffic.

* changes reported artificially for all folder-level checksums
* Vanilla: weak file-level checksums examined (attributes, mod time,
  etc.)
* MD5: strong file-level checksums examined (attributes, mod time, etc.
  + content)
* client/server ListDirectory commands executed for all folders, thus
  simulating ~20% potential delta
* no file content sent

a.) Vanilla:
    * runtime: 5 minutes, 56 seconds >> 356 seconds total

b.) MD5:
    * runtime: 8 minutes, 29 seconds >> 509 seconds total
    * penalty: 153 seconds, ~42%

Conclusions:
------------------
1.) For small-delta, frequent real-time scanning environments, MD5
checking would incur large penalty of ~500% - ~800%.
However, it is important to note that the 800% degradation in question
applies to the total scanning time of mere seconds or a couple of
minutes, resulting in an overall loss of a few additional minutes of
scanning time.

2.) For infrequent scanning environments with excellent network
connectivity, MD5 checking would incur a raw penalty of ~30% - ~40%.

It should be noted that even without any content change, BoxBackup
client/server ListDirectory commands alone have decreased the
significance of MD5 signature generation penalty by an order of
magnitude.

3.) For massive delta, large file diffing time and content upload
infrequent scanning environments, network traffic seems to be the most
limiting factor. Overall MD5 checking penalty would probably be in the
~10% - ~15% range, or inconsequential.

This is a guesstimate, but consistent logic-wise and consistent with my
earlier prototype experiments.

As an example, the ~800% scanning-only penalty was 1.5 minutes. I have a
single large-file diffing time-out set to 10 minutes, otherwise network
traffic overwhelms the overall backup cycle.

4.) Potential scanning speed improvements could involve weaker, but
still reliable signature generation algorithms (MD4, CRC-32, etc.) and
parallelism for multi-core systems.

---

Thoughts? Anyone else interested in such a feature?

Gary
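
The folder-level signature idea in the proposal above can be sketched as
follows. This is only an illustration under stated assumptions, not the
prototype described in the message and not Box Backup code: file_md5,
folder_signature and the include_content flag are hypothetical names,
and the choice of metadata (name, size, modification time) is a guess at
what the "weak" checksum covers.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file's content, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.digest()


def folder_signature(folder, include_content=True):
    """Hypothetical folder-level signature over the files directly in folder.

    With include_content=False only name, size and modification time are
    hashed (the "weak" variant). With include_content=True each file's
    content MD5 is mixed in as well (the "strong" variant).
    """
    combined = hashlib.md5()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # subfolders would get their own signature
        info = os.stat(path)
        combined.update(os.fsencode(name) + b"\0")
        combined.update(str(info.st_size).encode() + b"\0")
        combined.update(str(info.st_mtime_ns).encode() + b"\0")
        if include_content:
            combined.update(file_md5(path))
    return combined.hexdigest()
```

If a file is rewritten with different bytes of the same length and its
modification time is then reset, the weak signature stays the same while
the strong one changes, which is the case the proposal is about. The
cost is that every file has to be read in full on every scan.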