[Box Backup] Server redundancy and backup servers

boxbackup at fluffy.co.uk
Fri Sep 24 14:08:44 BST 2004


I think it would be a good idea to have the client choose the initial and
failover servers from the list in a random fashion, and then prefer the
last server it talked to.

So the algorithm would use the same code whenever it was unable to contact
the last used server; in the case of the first run, there would simply be
no last used server yet.

This way you don't have to configure the client to start with a particular
server; it just works itself out.
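
Something like this C++ sketch would do it (ChooseServer() and its
parameters are hypothetical names used for illustration, not anything in
the Box Backup source; it assumes the server list is non-empty):

    #include <stdlib.h>
    #include <time.h>
    #include <string>
    #include <vector>

    // Prefer the server we last talked to; on a first run (no history)
    // pick one of the configured servers at random.
    std::string ChooseServer(const std::vector<std::string> &rServers,
        const std::string &rLastUsedServer)
    {
        if(!rLastUsedServer.empty())
        {
            // Prefer the last server successfully contacted
            return rLastUsedServer;
        }

        // First run: no last used server, so start somewhere random.
        // (Seeding would normally be done once at program startup.)
        ::srand((unsigned int)::time(0));
        return rServers[::rand() % rServers.size()];
    }

For failover, the failed server would be removed from the candidate list
and the function called again with an empty last-used value.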

Rick


On Fri, 24 Sep 2004, Ben Summers wrote:

>
> I have been discussing redundancy for servers off-list, and have come
> up with some plans and preliminary design notes. A copy is below for
> your comments.
>
> Ben
>
>
>
> Design objectives
>
> * Failure means the server cannot be contacted by the client. If a
> server can be contacted by another server but not by the client, then
> that server must still be considered down.
>
> * No central server. The objective above means server choice must be
> made by the client.
>
> * A misbehaving client should not cause the stores to lose
> synchronisation.
>
> * Assume that all servers have the same amount of disc space, and
> identical disc configuration.
>
> * Allow choice of primary and secondary on a per account basis.
>
> * Any connection can be dropped at any time, and the stores should be
> in a workable, if non-optimal, state.
>
> * As simple as possible. Avoid keeping large amounts of state about the
> accounts on another server.
>
>
> Server groups.
>
> * The client store marker is defined to change at the end of every sync
> from the client (if and only if data changed). The client store marker
> should increase each time the store is updated. This allows the servers
> in a group to determine easily whether they are in sync, and which holds
> the latest version.
>
> * Stores are grouped. Each server is a peer within the group.
>
> * On login, the server returns a list of all other servers in the
> group. The client records this list on disc.
>
> * When the client needs to obtain a connection to a store, it uses the
> following algorithm:
>
> Let S = last server successfully connected
> Let P = primary server
> Do
> {
> 	Attempt to connect to S
> 	If(S == P and S is not connected)
> 	{
> 		Pause;
> 		Try connecting to P again.
> 	}
> 	If(S is not connected)
> 	{
> 		Let S = next untried server in the recorded list
> 	}
>
> } While(S is not connected and not all servers have been tried)
>
> If(S is not connected)
> {
> 	Pause
> 	Start process again
> }
>
> Let CSM_S = client store marker from S
>
> If(S != P)
> {
> 	Attempt to connect to P again, but with a short timeout this time
> 	If(P is connected)
> 	{
> 		Let CSM_P = client store marker from P
> 		If(CSM_P == expected client store marker)
> 		{
> 			Disconnect S
> 			S = P
> 		}
> 		else
> 		{
> 			Disconnect P
> 		}
> 	}
> }
>
> This algorithm ensures that the client prefers to connect to the
> primary server, but will keep talking to the secondary server for as
> long as it's available and is at a later state than the primary store.
> (This gives time for the data to be transferred from the secondary to
> the primary, and avoids repeat uploads of data.)
>
> * Servers within a group use the fast sync protocol to update accounts
> on a regular basis.
>
>
> Observations
>
> * The servers are simply peers. The primary server for an account is
> chosen merely by configuring the client.
>
> * If the servers simply use best efforts to keep each other up to date,
> the client will automatically choose the best server to contact.
>
> * Using the existing methods of handling unexpected changes to the
> client store marker, it doesn't matter whether a server is out of date
> or not. The existing code handles this occurrence perfectly.
>
> * The servers do not need to check whether other servers are down. This
> fact is actually irrelevant, because it's the client's view of upness
> which is important.
>
>
> Accounts
>
> The accounts database must be identical on each machine.
> bbstoreaccounts will need to push changes to all servers. It will
> probably be necessary to change the account database, and store the
> limits within the database rather than in the data stores themselves.
> This is desirable anyway.
>
> Note: If another server is down, it won't be possible to update the
> account database.
>
> Alternatively, servers could update each other with changes to the
> accounts database on a lazy basis. This might cause issues with
> housekeeping unnecessarily deleting files which have to be replaced.
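
As a purely illustrative sketch of the "limits in the accounts database"
idea, a replicated per-account record might look something like this (the
names are hypothetical, not Box Backup's real accounts database format):

    #include <stdint.h>
    #include <map>

    // Hypothetical per-account record, held identically on every server
    // in the group.  Storing the limits here, rather than in the store
    // itself, means bbstoreaccounts only has to push this record around.
    struct AccountRecord
    {
        int32_t mAccountID;
        int32_t mDiscSet;           // which disc set the store lives on
        int64_t mBlocksSoftLimit;   // soft quota, in blocks
        int64_t mBlocksHardLimit;   // hard quota, in blocks
    };

    // Account ID -> record; an identical copy is held on every server
    typedef std::map<int32_t, AccountRecord> AccountDatabase;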
>
>
> Fast sync protocol.
>
> * Compare client store markers. End if they are the same. Otherwise,
> the server with the greater marker becomes the source, and the one with
> the lesser marker the target.
>
> * Zero client store marker on target
>
> * Send stream of deleted (by housekeeping) object IDs from source to
> target. Target deletes the objects immediately.
>
> * Send stream of object ID + hash of directories on source server to
> the target.
>
> * For each directory which doesn't exist on the target server, or
> doesn't have the right hash...
> 	- check that the objects exist, and transfer them
> 	- write the directory, but only if all the objects are correct
> 	- check for patches; attempt to transfer by patch if a new version exists
>
> * Each server records the client store marker it expects on the remote
> server. If that marker is not as expected, then the contents of the
> directories are checked as well, sending MD5 hashes across. This allows
> recovery from partial syncs. [This should probably be optimised for the
> case where there's an empty store at one end.]
>
> * When an object is uploaded, the "last object ID used" value for that
> account should be kept within the acceptable range to allow recovery
> when syncing with the client.
>
> * Write new client store marker on target
>
> If a client connects during a fast sync, then that fast sync will be
> aborted to give the client the lock on the account.
>
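
To make the sequence above concrete, here is a minimal C++ sketch of one
fast sync pass. The Peer interface and its methods are hypothetical names
invented for the sketch, not Box Backup's real classes or protocol
commands; streaming, error handling and the MD5 recovery path are left
out:

    #include <stdint.h>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical view of a peer store server, for illustration only
    class Peer
    {
    public:
        virtual ~Peer() { }
        virtual int64_t GetClientStoreMarker(int32_t AccountID) = 0;
        virtual void SetClientStoreMarker(int32_t AccountID, int64_t Marker) = 0;
        virtual std::vector<int64_t> GetHousekeepingDeletions(int32_t AccountID) = 0;
        virtual void DeleteObject(int32_t AccountID, int64_t ObjectID) = 0;
        // Directory object ID -> hash of the directory's contents
        virtual std::map<int64_t, std::string> GetDirectoryHashes(int32_t AccountID) = 0;
        // Transfers the objects first, then the directory (by patch if possible)
        virtual void SendDirectory(int32_t AccountID, int64_t DirID, Peer &rTarget) = 0;
    };

    // One pass of the fast sync between two peers, following the steps above
    void FastSync(Peer &rA, Peer &rB, int32_t AccountID)
    {
        int64_t csmA = rA.GetClientStoreMarker(AccountID);
        int64_t csmB = rB.GetClientStoreMarker(AccountID);
        if(csmA == csmB) return;        // already in sync

        // The server with the greater marker is the source, the other the target
        Peer &rSource = (csmA > csmB)?rA:rB;
        Peer &rTarget = (csmA > csmB)?rB:rA;
        int64_t sourceMarker = (csmA > csmB)?csmA:csmB;

        // Zero the target's marker so an interrupted sync is detectable later
        rTarget.SetClientStoreMarker(AccountID, 0);

        // Propagate housekeeping deletions first
        std::vector<int64_t> deleted(rSource.GetHousekeepingDeletions(AccountID));
        for(std::vector<int64_t>::const_iterator i(deleted.begin());
            i != deleted.end(); ++i)
        {
            rTarget.DeleteObject(AccountID, *i);
        }

        // Transfer every directory which is missing or has a different hash
        std::map<int64_t, std::string> src(rSource.GetDirectoryHashes(AccountID));
        std::map<int64_t, std::string> tgt(rTarget.GetDirectoryHashes(AccountID));
        for(std::map<int64_t, std::string>::const_iterator i(src.begin());
            i != src.end(); ++i)
        {
            std::map<int64_t, std::string>::const_iterator t(tgt.find(i->first));
            if(t == tgt.end() || t->second != i->second)
            {
                rSource.SendDirectory(AccountID, i->first, rTarget);
            }
        }

        // Only now mark the target as fully up to date
        rTarget.SetClientStoreMarker(AccountID, sourceMarker);
    }

Because the target's marker stays at zero until the very end, a sync that
is aborted (for example, because a client connected and took the lock)
just looks like an out-of-date target on the next pass.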
>
>
>
> Optimised fast sync.
>
> It's undesirable for the fast sync to check every directory when it
> doesn't have to. During a sync with a client, a store:
>
> * Keeps a list of changed directories by writing to disc (and flushing)
> every time a directory is saved back to disc.
>
> * Keeps patches from previous versions to send to the remote store
>
> * Connects to the remote stores after a backup, and uses the fast sync
> to send the changes over.
>
> This will allow short-cuts to be taken when syncing, and changes sent
> by patch.
>
> The cache of patches will need to be managed, deleting them when they
> are transferred to a peer or are too old.
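
A minimal sketch of the changed-directory journal described above, with a
hypothetical helper name and error handling omitted:

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>

    // Hypothetical journal of directories changed during a client sync.
    // Appending and flushing on every directory save means the list
    // survives a crash, and the next fast sync only has to look at the
    // directories named here rather than scanning the whole store.
    void RecordChangedDirectory(const char *JournalFilename, int64_t DirectoryID)
    {
        FILE *f = ::fopen(JournalFilename, "a");
        if(f == NULL) return;               // error handling omitted
        ::fprintf(f, "%lld\n", (long long)DirectoryID);
        ::fflush(f);                        // flush stdio buffers...
        ::fsync(::fileno(f));               // ...and get the OS to write it to disc
        ::fclose(f);
    }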
>
>
> Housekeeping
>
> Deleted objects need to be kept in sync too. Housekeeping takes place
> independently on each server. Since housekeeping is a deterministic
> process, it should not delete different files on different servers.
>
> A list of deleted objects is kept on each server during the
> housekeeping process.
>
> In the unlikely event that a server deletes an object that the source
> server doesn't, this object will be retrieved in the next fast sync.
> This is unlikely to happen because clients only add data.
>
> Typically, housekeeping on non-primary servers will never delete an
> object in an account.
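
As a small illustration of the deleted-object list, housekeeping could
fill in something like the following per-account log, which the next fast
sync then streams to the peers (hypothetical names, not real Box Backup
code):

    #include <stdint.h>
    #include <set>

    // Hypothetical per-account record of what housekeeping removed
    class HousekeepingDeletionLog
    {
    public:
        void ObjectDeleted(int64_t ObjectID) { mDeleted.insert(ObjectID); }
        const std::set<int64_t> &GetDeletedObjects() const { return mDeleted; }
        void Clear() { mDeleted.clear(); }  // once the peers have received the list
    private:
        std::set<int64_t> mDeleted;
    };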
>
>
>
>