[Box Backup] Boxbackup for Win32 and accentuated characters

Nick Knight boxbackup at fluffy.co.uk
Thu Feb 3 09:28:49 GMT 2005


Hello Pascal,

The win32 client should be able to cope with Unicode filenames. I
suspect there may be issues with one of my functions: I only ever tested
it with file names. Directory names didn't cross my mind, so I never
tested them, and they are handled by different functions. I will
look into it...

Nick

-----Original Message-----
From: boxbackup-admin at fluffy.co.uk [mailto:boxbackup-admin at fluffy.co.uk] On Behalf Of Pascal Lalonde
Sent: 01 February 2005 19:07
To: boxbackup at fluffy.co.uk
Subject: [Box Backup] Boxbackup for Win32 and accentuated characters

Hi,

I've been having some problems with file/directory names containing
accented characters (which are quite common in the French version of
Windows XP).

Here is what happens:
First of all, I must mention that BoxBackup does not descend into
directories with accented characters in their names. Only the directory
itself is backed up, and anything under it is ignored. I get the
following message in the event viewer for each such directory:
Backup object failed, error when reading L:\\profiles\pascal\Menu
D??marrer
(Replace the two ?? with the ISO8859-1 characters C3 and A9 respectively:
capital A with a tilde above it and the copyright symbol)
It should really be "Menu Démarrer".

Windows XP seems to store file/directory names in UTF-8. Thus,
accented characters are encoded on 2 bytes. For example, the letter é
(e with acute) in UTF-8 is encoded as "C3 A9" in hex. When browsing
files in boxquery, such a name shows up with different characters, as
the bytes are interpreted using CP850 or something like that (Windows
cmd.exe's default codepage, it seems). Instead of an e acute, you get
two symbols: the first is one of those border symbols used in old DOS
dialog-based apps, and the second is the "Registered" symbol (C3 and A9
in CP850 respectively). It is still possible though to restore such
files by restoring the parent directory (which is always the case, since
otherwise boxbackup would not descend into it). Now here's the strangest
part. Upon restore, here is what happens:

C3 becomes a capital A with the tilde above it (ISO8859-1's C3
character)
A9 becomes the Copyright symbol.
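
The byte values above can be checked with a short Python sketch. It is
not part of Box Backup; it only assumes, as described in this message,
that the name reaches the client as the UTF-8 bytes C3 A9, and it uses
Python's standard "cp850" and "latin-1" (ISO8859-1) codecs to show what
each code page makes of those two bytes.

```python
# Bytes of "é" in UTF-8, as described above.
raw = "\u00e9".encode("utf-8")
print(raw.hex())  # c3a9

# The same two bytes read with the wrong single-byte code pages.
as_cp850 = raw.decode("cp850")     # what a CP850 console would display
as_latin1 = raw.decode("latin-1")  # what the restored directory name shows

# Print code points rather than glyphs so the output is console-independent.
print([hex(ord(c)) for c in as_cp850])   # box-drawing char, then "Registered"
print([hex(ord(c)) for c in as_latin1])  # A-tilde, then Copyright
```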

But if Windows stores filenames in UTF-8, this means that the individual
bytes were first interpreted as ISO8859-1 characters, then translated to
their UTF-8 equivalent. In fact, if you let Boxbackup take a backup of
your restored folder (now containing A-tilde and Copyright), these two
characters take 2 bytes each in the UTF-8 encoding. Upon restoring one
more time, the new folder now has 4 special characters instead of the
"é" in the first version of the folder.
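
The growth from 1 to 2 to 4 characters can be modelled in a few lines
of Python. This is only a model of the symptom under the assumption
stated above (UTF-8 bytes read back as ISO8859-1 on each cycle); the
function name backup_restore_cycle is made up for illustration and says
nothing about how Box Backup is actually implemented.

```python
def backup_restore_cycle(name: str) -> str:
    """Model one cycle: name stored as UTF-8, bytes read back as ISO8859-1."""
    return name.encode("utf-8").decode("latin-1")


name = "\u00e9"  # the original "é", one character
for cycle in range(1, 3):
    name = backup_restore_cycle(name)
    # Show the length and code points of the name after each cycle.
    print(cycle, len(name), [hex(ord(c)) for c in name])
```

Each cycle re-encodes every non-ASCII character as two bytes, so the
number of characters standing in for the original "é" doubles each time.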

Now, rereading this whole e-mail, I find it a little confusing. I think
the best way would be to try it yourself. Just create a folder with an
accented character in its name (é for example), and let boxbackup back it
up. Then restore it. The results should be:

1) Files with accented characters are OK
2) Directories are restored with special characters instead of the
original accented character
3) Nothing below the accented directory is backed up
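
For anyone who wants to try this, a small Python script along these
lines could create the test layout. The file names and the helper
make_test_tree are made up for illustration; the resulting directory
still has to be placed under a location that boxbackup is configured to
back up.

```python
import os
import tempfile


def make_test_tree(root: str) -> str:
    """Create an accented directory with a file below it; return its path."""
    accented_dir = os.path.join(root, "Menu D\u00e9marrer")
    os.makedirs(accented_dir, exist_ok=True)
    # A file below the accented directory (case 3 above).
    with open(os.path.join(accented_dir, "test.txt"), "w", encoding="utf-8") as f:
        f.write("file below an accented directory\n")
    # A file whose own name is accented (case 1 above).
    with open(os.path.join(root, "r\u00e9sum\u00e9.txt"), "w", encoding="utf-8") as f:
        f.write("accented file name\n")
    return accented_dir


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="boxbackup-accent-test-")
    print(ascii(make_test_tree(root)))
```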

Could anyone confirm this behavior?

I'm using boxwin0.09b on XP Pro French with a 0.09 server on OpenBSD
3.4.

Thanks,
Pascal

_______________________________________________
boxbackup mailing list
boxbackup at fluffy.co.uk
http://lists.warhead.org.uk/mailman/listinfo/boxbackup




