[Box Backup] Regex exclusions

Peter Jalajas, GigaLock Backup Services pjalajas at gigalock.com
Thu Sep 29 17:50:00 BST 2011


Hi Achim,

I'm interested in collaborating on this. Thanks for raising the issue.

A few planning or philosophical topics we should consider:
1.  Windows Exclusions are not case-sensitive, so we no longer have to
add both upper and lower case versions of each Exclusion--yeah!
2.  Different use-cases will need different Exclusion sets, so we
should be prepared to offer at least a couple of different versions.
For example, my (intended) strategy is to Exclude _everything_, then
explicitly Include important files.  I do that because I've seen a lot
of junk, with lots of versions of each, being backed up.  Do we prepare
for bare-metal restore?  Do we back up apps for which a CD is likely
handy?  Etc.
3. How should we merge together our different sets of Exclusion
suggestions?  Should we move this to a trac wiki page on the website?
I tried to do so before sending this, but, alas, I've locked myself
out of the trac wiki (I've asked Chris offline to help me out).  Here
is an old page that could be resurrected for this purpose:
https://www.boxbackup.org/trac/wiki/Win32Regex
4. Should we use regex alternation (pipes) to combine options, or keep
one pattern per line?
5.  We should maybe alphabetize the Exclusions as much as possible.
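
To make point 4 concrete, here is a rough Python sketch that checks whether one combined pattern behaves the same as the separate lines it would replace. The three rules come from the list further down; the sample paths are made up, and Python's `re` module only stands in for whatever regex engine bbackupd is built against, so treat the result as an approximation.

```python
import re

# Three separate rules, as they would appear on individual ExcludeDirsRegex lines.
SEPARATE = [r".+\\My Music$", r".+\\My Pictures$", r".+\\My Videos$"]

# The same rules folded into one line with alternation (the "pipe").
COMBINED = r".+\\My (Music|Pictures|Videos)$"

# Hypothetical sample paths, for illustration only.
SAMPLES = [
    r"C:\Documents and Settings\pete\My Documents\My Music",
    r"C:\Documents and Settings\pete\My Documents\My Videos",
    r"C:\Documents and Settings\pete\My Documents\My Musical Scores",
    r"C:\Documents and Settings\pete\My Documents",
]


def matches_any(patterns, path):
    """True if any pattern matches; IGNORECASE mimics the Windows behaviour."""
    return any(re.search(p, path, re.IGNORECASE) for p in patterns)


def same_behaviour(paths):
    """True if the combined rule agrees with the separate rules on every path."""
    return all(
        matches_any(SEPARATE, p) == matches_any([COMBINED], p) for p in paths
    )


if __name__ == "__main__":
    for path in SAMPLES:
        print(matches_any([COMBINED], path), path)
    print("equivalent on samples:", same_behaviour(SAMPLES))
```

Combined lines are shorter, but a typo in one alternative then affects the whole rule, so separate lines may still be easier to review and alphabetize.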

(I wish we could push some common Exclusion strings into variables of
some sort that different BackupLocations could "import".  Exclusions
are fraught with typographic errors.)
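
Until something like that exists, the shared strings could be kept in one place outside Box Backup and the config lines generated from them. A minimal sketch in Python, assuming directory names are matched literally after the last backslash; `COMMON_DIR_NAMES`, `escape_name` and `render_exclude_dirs` are hypothetical names, not part of Box Backup.

```python
import re

# Hypothetical shared list of directory names, taken from lists in this thread.
COMMON_DIR_NAMES = ["Cookies", "Cache", "Temporary Internet Files", "$RECYCLE.BIN"]


def escape_name(name):
    """Backslash-escape regex metacharacters so the name is matched literally."""
    return re.sub(r"([.$^*+?()\[\]{}|\\])", r"\\\1", name)


def render_exclude_dirs(names):
    """Return one ExcludeDirsRegex line per name, sorted case-insensitively."""
    return [
        "ExcludeDirsRegex = .+\\\\" + escape_name(n) + "$"
        for n in sorted(names, key=str.lower)
    ]


if __name__ == "__main__":
    # Paste the printed lines into each BackupLocation that should share them.
    print("\n".join(render_exclude_dirs(COMMON_DIR_NAMES)))
```

Generating the lines also takes care of the alphabetizing and of the escaping, which is where most of the typos creep in.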

For the record, below are my current default Exclusions--suggestions welcome.

Thanks,
Pete

Pete's Annotated Default Exclusion List.  Accumulated over years of
seeing junk files being backed up for no reason; I went through a very
long bbackupquery listing from one or two clients to create this list.
I believe I am Excluding system files that would be useless during a
bare-metal restore, but I'm open to correction on that.  I don't
really believe in bare-metal restores: too many things change too
quickly, and a major OS crash is a good time to clean out cruft.  Just
reinstall the OS and Apps from CD, add your current users, and then
restore their data.  Customer directory structures frequently change
dramatically after a hard-drive crash, sometimes because an OS upgrade
is required.

ExcludeFilesRegex = .*   #Exclude everything, because there is a lot
#of junk out there, then explicitly Include important files.

ExcludeFilesRegex = \.(dbx|mdb|pst|qbw|rar)$  #Being explicit that we
#are not backing up these frequently business-critical files because of
#file-locking issues.  Until VSS is working, users should instead use
#application-specific backup tools.  .rar files tend to be large and
#serve special use cases; the customer can add them back in if desired.

AlwaysIncludeFilesRegex = .+\.(7z|accd[abeprtu]|do[ct][mx]?|x?html?|lnk|mde|o[dpt][bgfmstp]|pdf|p[op][ats][mx]?|qbb|rdp|rtf|txt|url|wbk|wpd|x[al][abclmrst][bmx]?|zip)$
#A long list of common business-critical files, including shortcuts,
#bookmarks, app-specific backup files.

AlwaysIncludeFilesRegex = .*backup.*\.pst$  # Perhaps the most
#business-critical files (Outlook email backup files).
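
The exclude-everything strategy only works if an AlwaysInclude match overrides an Exclude match. The sketch below models that precedence in Python with the .pst-related rules from above. The precedence, the matching against the full path, and the sample paths are all assumptions made for illustration, not a copy of bbackupd's logic.

```python
import re

# Rules copied from this message (only the .pst-related ones, to keep it short).
EXCLUDE_FILES = [r".*", r"\.(dbx|mdb|pst|qbw|rar)$"]
ALWAYS_INCLUDE_FILES = [r".*backup.*\.pst$"]


def _any(patterns, path):
    """True if any pattern matches the path, ignoring case."""
    return any(re.search(p, path, re.IGNORECASE) for p in patterns)


def would_back_up(path):
    """Assumed precedence: AlwaysInclude wins, else any Exclude match drops the file."""
    if _any(ALWAYS_INCLUDE_FILES, path):
        return True
    return not _any(EXCLUDE_FILES, path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in [r"C:\Mail\Outlook.pst",
                 r"C:\Mail\Outlook-backup-2011.pst",
                 r"C:\Backup\Outlook.pst",
                 r"C:\Docs\scratch.tmp"]:
        print(would_back_up(path), path)
```

Note that under these assumptions `.*backup.*\.pst$` also catches a live .pst file that merely sits in a directory with "backup" in its name.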

ExcludeFilesRegex = pagefile\.sys$   #Windows memory swap file.
ExcludeDirsRegex  = .+\\pagefile\.sys$  #Years ago I think I had a
#problem with Box Backup seeing this apparent file as a Directory.
ExcludeFilesRegex = hiberfil\.sys$  #Windows Hibernate file.
ExcludeDirsRegex  = .+\\hiberfil\.sys$  #Or was it this one that had
#the File vs Directory problem?

ExcludeFilesRegex = boot\.ini$
ExcludeFilesRegex = ntldr$
ExcludeFilesRegex = thumbs\.db$
ExcludeFilesRegex = Perflib.*
ExcludeFilesRegex = NTDETECT\.COM$
ExcludeFilesRegex = ntuser\.dat$
ExcludeFilesRegex = ntuser\.dat\.log$
ExcludeFilesRegex = UsrClass\.dat\.LOG$
ExcludeFilesRegex = UsrClass\.dat$
ExcludeFilesRegex = .+\\\.[^\\]*$  #Files whose names begin with a dot.
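
A rule for dot files has to tie the dot to the last path separator. An unanchored pattern such as `\..*$` matches any path that contains a dot at all, which means nearly every file with an extension. A quick Python comparison, with hypothetical paths and Python's `re` standing in for the real engine:

```python
import re

UNANCHORED = r"\..*$"                    # a dot anywhere in the path
NAME_STARTS_WITH_DOT = r".+\\\.[^\\]*$"  # dot directly after the last backslash


def hits(pattern, path):
    """True if the pattern matches somewhere in the path."""
    return re.search(pattern, path) is not None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in [r"C:\Users\pete\.bashrc",
                 r"C:\Users\pete\report.doc",
                 r"C:\Users\pete\.ssh\config"]:
        print(hits(UNANCHORED, path), hits(NAME_STARTS_WITH_DOT, path), path)
```

With ExcludeFilesRegex = .* already in place the difference does not matter, but it would for anyone who copies the dot-file rule on its own.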

ExcludeDirsRegex  = .+\\\..*$  #Directories that begin with a dot.
ExcludeDirsRegex  = .+\\Application Data$  #NOTE:  Maybe some
#important info under this directory?
ExcludeDirsRegex  = .+\.cab$  #Another one that is possibly confused
#as a File or Directory...
ExcludeDirsRegex  = .+\\Cache$
ExcludeDirsRegex  = .+\\Common Files$
ExcludeDirsRegex  = .+\\Cookies$
ExcludeDirsRegex  = .+\\Default User$
ExcludeDirsRegex  = .+\\Downloads$  #I use my Downloads directory as a
#kind of deletable-anytime area.
ExcludeDirsRegex  = .+\\Drivers$
ExcludeDirsRegex  = .+\\Installer$
ExcludeDirsRegex  = .+\\I386$
ExcludeDirsRegex  = .+\\IBMTools$
ExcludeDirsRegex  = .+\\IECompatCache$
ExcludeDirsRegex  = .+\\IETLDCache$
ExcludeDirsRegex  = .+\\InstallAnywhere$
ExcludeDirsRegex  = .+\\Local Settings$
ExcludeDirsRegex  = .+\\LocalService$
ExcludeDirsRegex  = .+\\MS.*Cache$
ExcludeDirsRegex  = .+\\My ebooks$
ExcludeDirsRegex  = .+\\My Media$
ExcludeDirsRegex  = .+\\My Music$
ExcludeDirsRegex  = .+\\My Pictures$
ExcludeDirsRegex  = .+\\My Received Files$
ExcludeDirsRegex  = .+\\My Videos$
ExcludeDirsRegex  = .+\\NetworkService$
ExcludeDirsRegex  = .+\\NetHood$
ExcludeDirsRegex  = .+\\PrintHood$
ExcludeDirsRegex  = .+\\PrivacIE$
ExcludeDirsRegex  = .+\\Recycl.*$
ExcludeDirsRegex  = .+\\Support$
ExcludeDirsRegex  = .+\\System Volume Information$
ExcludeDirsRegex  = .+\\Templates$
ExcludeDirsRegex  = .+\\Temporary Internet Files$
ExcludeDirsRegex  = .+\\Thumbs$
ExcludeDirsRegex  = .+\\UserData$
ExcludeDirsRegex  = .+\\Windows$
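
Since lists like this are fraught with typographic errors, a small lint pass before deploying a config may help. The Python sketch below is only a rough check: it compiles each pattern with Python's `re` (which is not the engine bbackupd uses, so a clean result proves little) and flags one common slip. The `lint` function and its messages are hypothetical.

```python
import re

# Directive names that take a regex, as used in this message.
REGEX_KEYS = {"ExcludeFilesRegex", "ExcludeDirsRegex", "AlwaysIncludeFilesRegex"}

# key = pattern, optionally followed by whitespace and a trailing "#comment".
LINE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)(?:\s+#.*)?$")


def lint(lines):
    """Return (line_number, message) pairs for lines that look suspicious."""
    problems = []
    for number, line in enumerate(lines, start=1):
        m = LINE.match(line)
        if not m or m.group(1) not in REGEX_KEYS:
            continue  # comment, blank line, or a directive without a regex
        pattern = m.group(2)
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append((number, "does not compile: %s" % exc))
            continue
        # A single backslash before a letter is usually a missing escape,
        # e.g. Microsoft\Search where Microsoft\\Search was intended.
        if re.search(r"(?<!\\)\\[A-Za-z]", pattern):
            problems.append((number, "single backslash before a letter"))
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python lint_exclusions.py bbackupd.conf
    with open(sys.argv[1]) as handle:
        for number, message in lint(handle.read().splitlines()):
            print("line %d: %s" % (number, message))
```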


On Thu, Sep 29, 2011 at 9:07 AM, Achim J. Latz <achim+box at qustodium.net> wrote:
> Just to make sure: Nobody interested in collaborating on this?
>
> On 14/09/2011 22:34, Achim wrote:
>>
>> Hello list:
>>
>> I have a question and a proposal about exclusions:
>>
>> 1) I seem to remember that I read somewhere that the exclusions are
>> specific to every platform: is the syntax really different, even though
>> we use PCRE on all platforms? It would seem like a great idea to have a
>> universal set of exclusions that come by default with bbackupd.conf, and
>> can then be built upon by the user.
>>
>> 2) Having said that, let's compare notes on the exclusion lists that
>> might (hopefully) become part of the web page documentation [1, 2] or
>> even the "standard distribution". What do you think about the following
>> ones for Windows. Please note that some are commented, some might be
>> wrong, certainly not all have been tested:
>>
>> DIRS
>> ====
>> ExcludeDirsRegex = .*\\Temp$
>> ExcludeDirsRegex = .+\.cab$
>> ExcludeDirsRegex = .+\\\..*$
>> ExcludeDirsRegex = .+\\\.cvs$
>> ExcludeDirsRegex = .+\\\.fseventsd$
>> ExcludeDirsRegex = .+\\\.Spotlight-V100$
>> ExcludeDirsRegex = .+\\\.svn$
>> ExcludeDirsRegex = .+\\\.TemporaryItems$
>> ExcludeDirsRegex = .+\\\.Trashes$
>> # ExcludeDirsRegex = .+\\Application Data$
>> ExcludeDirsRegex = .+\\Cache$
>> ExcludeDirsRegex = .+\\Common Files$
>> ExcludeDirsRegex = .+\\Cookies$
>> ExcludeDirsRegex = .+\\Default User$
>> ExcludeDirsRegex = .+\\Downloads$
>> ExcludeDirsRegex = .+\\Drivers$
>> ExcludeDirsRegex = .+\\Google Desktop Search$
>> ExcludeDirsRegex = .+\\I386$
>> ExcludeDirsRegex = .+\\IBMTools$
>> ExcludeDirsRegex = .+\\IECompatCache$
>> ExcludeDirsRegex = .+\\IETLDCache$
>> ExcludeDirsRegex = .+\\InstallAnywhere$
>> ExcludeDirsRegex = .+\\Installer$
>> #ExcludeDirsRegex = .+\\Local Settings$
>> ExcludeDirsRegex = .+\\Local Settings\\.*\\Cache$
>> ExcludeDirsRegex = .+\\LocalService$
>> ExcludeDirsRegex = .+\\Microsoft\\Search\\Data$
>> ExcludeDirsRegex = .+\\MS.*Cache$
>> #ExcludeDirsRegex = .+\\My ebooks$
>> #ExcludeDirsRegex = .+\\My Media$
>> #ExcludeDirsRegex = .+\\My Music$
>> #ExcludeDirsRegex = .+\\My Pictures$
>> #ExcludeDirsRegex = .+\\My Received Files$
>> #ExcludeDirsRegex = .+\\My Videos$
>> ExcludeDirsRegex = .+\\NetHood$
>> ExcludeDirsRegex = .+\\NetworkService$
>> ExcludeDirsRegex = .+\\PrintHood$
>> ExcludeDirsRegex = .+\\PrivacIE$
>> ExcludeDirsRegex = .+\\RECYCLED$
>> ExcludeDirsRegex = .+\\RECYCLER$
>> ExcludeDirsRegex = .+\\\$RECYCLE\.BIN$
>> #ExcludeDirsRegex = .+\\Support$
>> ExcludeDirsRegex = .+\\System Volume Information$
>> #ExcludeDirsRegex = .+\\Templates$
>> ExcludeDirsRegex = .+\\Temporary Internet Files$
>> #ExcludeDirsRegex = .+\\Thumbs$
>> ExcludeDirsRegex = .+\\UserData$
>> #ExcludeDirsRegex = .+\\Windows$
>> ExcludeDirsRegex = .+\\Windows\\Prefetch$
>> ExcludeDirsRegex = .+\\WINDOWS\\system32\\spool\\PRINTERS$
>>
>> FILES
>> =====
>> #ExcludeFilesRegex = .*
>> #ExcludeFilesRegex = \.(dbx|mdb|pst|qbw|rar)$
>> #ExcludeFilesRegex = \..*$
>> #ExcludeFilesRegex = boot\.ini$
>> ExcludeFilesRegex = hiberfil\.sys$
>> #ExcludeFilesRegex = NTDETECT\.COM$
>> #ExcludeFilesRegex = ntldr$
>> #ExcludeFilesRegex = ntuser\.dat$
>> #ExcludeFilesRegex = ntuser\.dat\.log$
>> ExcludeFilesRegex = pagefile\.sys$
>> ExcludeFilesRegex = Perflib.*
>> ExcludeFilesRegex = thumbs\.db$
>> #ExcludeFilesRegex = UsrClass\.dat$
>> #ExcludeFilesRegex = UsrClass\.dat\.LOG$
>> ExcludeFilesRegex = .+\\\.DS_Store$
>> ExcludeFilesRegex = .+\\\.Spotlight-V100$
>> # ExcludeFilesRegex = .+\.([tT][mM][pP]|[bB][aAcC][kK]|[dD][bB][kK]|[bB][kK][~!1-9]|[mMtT][bB][kK]|[oO][lL][dD]|[sS][aA][vV]|[sS][wW][pP]|[cC][sS][mM]|[oO][bB][jJ])$
>>
>> ExcludeFilesRegex = .+\\~.*
>> ExcludeFilesRegex = .*\\\._.*
>>
>> [1] <http://www.boxbackup.org/trac/wiki/ConfiguringAClient#ExcludingFilesandDirectoriesfromtheBackup>
>>
>> [2] <http://www.boxbackup.org/trac/wiki/Win32Regex>
>>
>> _______________________________________________
>> Boxbackup mailing list
>> Boxbackup at boxbackup.org
>> http://lists.boxbackup.org/cgi-bin/mailman/listinfo/boxbackup
>
>
> --
> Achim J. Latz, Qustodium Internet Security
> achim.latz at qustodium.net · http://www.qustodium.net
> Data Encryption · Backup Automatisation · E-Mail Protection
>
