
Interwoven Best Practices

Backup White Paper

Version 1.0

John Zheng
Center of Excellence, 2001
Interwoven, Inc.
1195 Fremont Avenue
Sunnyvale, CA 94087

Tel: 408-774-2000
Fax: 408-774-2002
www.interwoven.com

© 2001 Interwoven, Inc. All rights reserved. Interwoven, TeamSite, OpenDeploy, SmartContext and the logo are
registered trademarks of Interwoven, Inc., which may be registered in certain jurisdictions. DataDeploy, the tagline and
service mark are trademarks of Interwoven, Inc. All other trademarks are owned by their respective owners.
CONSULTING SERVICES

Table of Contents
1 INTRODUCTION
2 BACKGROUND
3 GENERAL PROCEDURE AND ISSUES
3.1.1 Procedure
3.1.2 Issues
4 CALCULATING AND MANAGING BACKING STORE GROWTH
4.1.1 File System Configuration
4.1.2 File Addition and Modification Rate
4.1.3 Editions and Number of Items per Directory
5 HEURISTIC APPROACH TO BACKING STORE GROWTH
6 RECOMMENDATIONS FOR THE WINDOWS PLATFORMS
7 RECOMMENDATIONS FOR SOLARIS
8 RECOMMENDATIONS FOR VERITAS
9 NETAPP HARDWARE RECOMMENDATIONS
10 WRAP UP

Interwoven Client Services, 2/20/2002 Page 2 of 12



1 INTRODUCTION
This document describes the concepts and issues behind an effective backup system
for TeamSite. We will look at traditional backup methods and their limitations, and
examine some specialized solutions required for larger implementations.

2 BACKGROUND
Most TeamSite customers can use conventional backup methods to back up their
TeamSite system within a limited time window. However, a growing number of
customers have very large backing stores, or have high availability requirements that
make conventional backup techniques ineffective. While Interwoven is not in the
business of providing a backup solution to its customers, this document provides some
guidance on what customers should look for when considering an enterprise-wide
solution.

3 GENERAL PROCEDURE AND ISSUES


3.1.1 PROCEDURE
Backing up the TeamSite server generally involves the following:

1. Freeze the backing store with:

iwfreeze +<seconds>

where <seconds> is the number of seconds it takes to back up <iw-store>.

2. Back up <iw-store>.

3. Unfreeze the backing store with:

iwfreeze --

4. Back up <iw-home> (this can occur concurrently with backing up <iw-store>).

On Solaris, also back up /etc.
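
The freeze, backup, and unfreeze steps above can be wrapped in a small script so that the backing store is unfrozen even when the backup fails. The following is only a sketch under stated assumptions: it assumes iwfreeze is on the PATH, and the backup command is a placeholder supplied by the caller, not something this paper prescribes. The run parameter exists purely so the sequence can be exercised without a TeamSite server.

```python
import subprocess
from typing import Callable, Sequence


def frozen_backup(
    freeze_seconds: int,
    backup_cmd: Sequence[str],
    iwfreeze: str = "iwfreeze",  # assumption: iwfreeze is reachable via PATH
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Freeze the backing store, run one backup command, then unfreeze."""
    if freeze_seconds <= 0:
        raise ValueError("freeze_seconds must be positive")
    # Step 1: iwfreeze +<seconds>
    run([iwfreeze, f"+{freeze_seconds}"], check=True)
    try:
        # Step 2: back up <iw-store> with whatever tool the site uses
        # (backup_cmd is a hypothetical, caller-supplied command line).
        run(list(backup_cmd), check=True)
    finally:
        # Step 3: iwfreeze -- (runs even if the backup command failed)
        run([iwfreeze, "--"], check=True)
```

A scheduler entry would call frozen_backup with a freeze time slightly longer than the expected backup duration. Backing up <iw-home> does not need the freeze and can be scheduled separately.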

3.1.2 ISSUES
There can be various issues with this procedure depending on a system's
requirements:


1. iwfreeze can take dozens of minutes:

a. This is unavoidable if the system was fairly busy right before the
backup starts. The system may have dirty cache points that need to be
flushed out to disk before the backing store can be frozen. The status
of the cache points can be determined with <iw-home>/bin/iwstat -c,
for example:

[c:\]e:\iw-home\bin\iwstat -c
Status: Server running

ID    Thread  User       Duration  Operation
0xf   0xb94   aoeu\yien  0.000     GetArchiveStatus

Cache     Active  Available  Dirty  To Purge
Main      25000   5000       10000  5000
Workflow  0       1000       0      0

This output shows a system with 10,000 dirty points and 5,000 to purge,
quite a busy system indeed. Systems with a large cachesize setting in iw.cfg
can take a long time to flush all the cache points when the system is
extremely busy.

b. Fortunately, this should not be a major issue as long as
the backup is done during idle hours, which is the usual procedure.

2. Backing up millions of files/inodes can take days.

a. Sites with a large number of files or a rich version history can
take days to back up all files in <iw-store>.

3. Users are shut out from TeamSite while the backup occurs.

a. When the backing store is frozen, the server operates in read-only
mode, and users are unable to edit files in TeamSite.

4. The backing store grows constantly, so backups can take longer and longer
each week.

5. Customers often have a 6-8 hour time window in which to do backups.

a. For many backing stores, it would take much longer than 6-8 hours to
back up all files in <iw-store>. For some customers, there is concern
that the allowable time window will quickly be exceeded as the backing
store grows each month.


6. Conventional backup methods are useful only for backing stores smaller than
10-100 GB.

a. Because of the above issues, conventional methods of backup will
work only for backing stores of a limited size.

b. The size limit can be anywhere from 10 GB to 100 GB, depending on the
speed of the backup subsystem and the number of files in the backing
store.

7. Large sites with hundreds of gigabytes of data require specialized solutions.
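
The cache check described under issue 1 can be scripted, for instance to delay a backup until the dirty count is low. The sketch below parses the cache table from text shaped like the iwstat sample shown earlier. The column layout is taken from that sample only; other TeamSite versions may print it differently, so treat the parser as an assumption to verify against real output.

```python
from typing import Dict

# Sample text copied from the iwstat example in this paper.
SAMPLE = r"""
Status: Server running

ID Thread User Duration Operation
0xf 0xb94 aoeu\yien 0.000 GetArchiveStatus

Cache Active Available Dirty To Purge
Main 25000 5000 10000 5000
Workflow 0 1000 0 0
"""


def parse_cache_table(iwstat_output: str) -> Dict[str, Dict[str, int]]:
    """Return per-cache counters from the 'Cache ...' table of iwstat output."""
    caches: Dict[str, Dict[str, int]] = {}
    in_table = False
    for line in iwstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Cache":
            in_table = True  # rows after this header belong to the table
            continue
        # A data row is a cache name followed by four integer counters.
        if in_table and len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            active, available, dirty, to_purge = (int(f) for f in fields[1:])
            caches[fields[0]] = {
                "active": active,
                "available": available,
                "dirty": dirty,
                "to_purge": to_purge,
            }
    return caches


def total_dirty(iwstat_output: str) -> int:
    """Sum the Dirty column across all caches."""
    return sum(c["dirty"] for c in parse_cache_table(iwstat_output).values())
```

For the sample above, total_dirty(SAMPLE) is 10000, which matches the busy system described in issue 1.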

4 CALCULATING AND MANAGING BACKING STORE GROWTH
Several factors have a significant effect on the rate of backing store
growth:

1. The allocation unit size, inode/fragment size, or cluster size of the file system.

2. The rate at which files are added or edited in the backing store.

3. The number of editions published in a branch.

4. The number of items in a directory.

4.1.1 FILE SYSTEM CONFIGURATION


The first factor can easily be the most critical one in managing backing store growth,
because of the large number of metadata files kept in the backing store for versioning
purposes. The guidelines for managing this factor vary slightly depending on the file
system.

On NT, be sure to format NTFS using allocation units of either 512 or 1024 bytes. This
is done from the command line with format d: /FS:NTFS /A:512, or format d: /FS:NTFS /A:1024.
You can also use Windows Explorer or the Disk Management module in the Computer
Management MMC, and choose an allocation unit size of either 512 or
1024 bytes.


On Solaris, use 512 or 1024 bytes for both the bytes-per-inode and fragment size
values when you create a UFS partition using newfs. For example, newfs -i 512 -f 512
<device>, or newfs -i 1024 -f 1024 <device>. These two settings address two different
problems on UFS partitions. In a UFS partition, each unique file takes up an inode, so
the backing store partition should be configured to have a large number of inodes by
telling newfs to use a small number of bytes per inode with the -i command line option.
The fragment size is the smallest amount of disk space a file will occupy. A fragment
size that is too big means that a large number of small files will waste a lot of disk space.

With Veritas, make sure the file system uses 1K clusters instead of the default 8K
clusters.

Throughout the rest of this paper, the terms allocation unit size, fragment size, and
cluster size are used interchangeably.
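
To see why a small fragment size matters, the space lost to rounding can be estimated directly. The sketch below assumes the simplest possible model, where every file is rounded up to a whole number of fragments. Real file systems differ in detail, so the figures are indicative only. The small file sizes used in the usage note (200, 250, and 600 bytes) are the approximate metadata file sizes quoted in section 4.1.3.

```python
from typing import Iterable


def allocated_size(file_size: int, fragment_size: int) -> int:
    """Bytes occupied on disk when a file is rounded up to whole fragments."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    fragments = -(-file_size // fragment_size)  # ceiling division
    return fragments * fragment_size


def wasted_space(file_sizes: Iterable[int], fragment_size: int) -> int:
    """Total bytes lost to rounding for a collection of file sizes."""
    return sum(allocated_size(size, fragment_size) - size for size in file_sizes)
```

Under this model, one 200-byte, one 250-byte, and one 600-byte file together waste 998 bytes with 512-byte fragments, but 23,526 bytes with 8K fragments. Multiplied across millions of metadata files, the difference dominates backing store size.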

4.1.2 FILE ADDITION AND MODIFICATION RATE


When you create a new file in TeamSite and submit it, 6-10 associated metadata files
are created along with that file. When a file is modified, a copy of the original file is
created, and the new file takes up additional disk space in the backing store. The rate
at which files are added or changed has a dramatic effect on the growth of the backing
store, and can be approximated with a formula:

Growth = (n * 10 * f) + (n * nSize) + (m * mSize)

where Growth is how much the backing store will grow, n is the number of files added, f
is the fragment size of the file system, nSize is the average size of new files, m is the
number of files modified, and mSize is the average size of modified files. The factor of
10 is the upper end of the 6-10 metadata files per new file, each assumed to occupy one
fragment. For example, a backing store with a 1K fragment size that has 30,000 new files
and 10,000 modified files, all 64K on average, would grow by:

(30,000 * 10 * 1K) + (30,000 * 64K) + (10,000 * 64K) = 2,860,000K =~ 2.8GB


in a month. Please note that these calculations are applicable only up to TeamSite
5.0X. Future versions of TeamSite will have a different backing store structure that will
require a different formula.
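
The growth formula above is easy to turn into a small calculator for trying different fragment sizes and file counts. The sketch below is a direct transcription of the formula, with all sizes in kilobytes. The function and parameter names are illustrative and not part of TeamSite.

```python
def backing_store_growth_kb(
    new_files: int,
    modified_files: int,
    fragment_kb: float,
    avg_new_kb: float,
    avg_modified_kb: float,
    metadata_files_per_new_file: int = 10,  # upper end of the 6-10 range
) -> float:
    """Growth = (n * 10 * f) + (n * nSize) + (m * mSize), in KB."""
    metadata = new_files * metadata_files_per_new_file * fragment_kb
    new_content = new_files * avg_new_kb
    modified_content = modified_files * avg_modified_kb
    return metadata + new_content + modified_content
```

With the numbers from the example (30,000 new files, 10,000 modified files, 1K fragments, 64K average size), the function returns 2,860,000 KB. Rerunning it with an 8K fragment size shows the metadata term alone growing from 300,000 KB to 2,400,000 KB.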

4.1.3 EDITIONS AND NUMBER OF ITEMS PER DIRECTORY


Publishing a new edition will increase the size of the backing store slightly. The increase
is rather complicated to calculate, and depends on several factors. For every file that is
submitted and published, 2 metadata files are created (each about 600 bytes),
along with 1 standin file (about 200 bytes). 2 more files are created to represent the
submit event, and 2 more to represent the edition (each of these 4
is about 250 bytes). If files in different directories are touched, the
number of directories touched becomes another factor. As you can see, it can be
quite complicated to calculate how much the backing store will grow, and it is
nearly impossible to account for factors such as how many times you submit a file between
publishing editions. Assuming you publish after every submit, a rough
formula looks something like this:

Increase = n*(nSize + 3*f) + 4*f + (d * dSize)

where Increase is the increase in size of the backing store, n is the number of files
submitted and published, nSize is the average size of those files, f is the fragment size,
d is the number of directories touched, and dSize is the average size of directory
metadata files. For example, submitting and publishing 1000 new files that are 64K on
average, and assuming these files span 10 directories where dSize is 2K on average,
your backing store (with a 1K fragment size) would increase by:

1000*(64K + 3*1K) + 4*1K + (10 * 2K) = 67,024K =~ 67MB

The data files account for 64MB of the growth, so the metadata files account for only
3MB of growth in this example. From the example, we can see that publishing a lot of
editions in a branch will result in moderate growth of the backing store. A related
factor is that a directory with a large number of items will have a bigger directory
metadata size, and each time that directory is touched, the backing store will grow a bit
more than in this example. Furthermore, both of these factors, publishing a large
number of editions and having a large number of items in a directory, can cause
severe performance problems over time. As a result, Interwoven recommends keeping
fewer than 1000 editions per branch and fewer than 500 files per directory. If a directory
holds as many as 1000 files, keep the edition count under 500. Directories with more
than 1000 items should be avoided.

Again, the above calculations apply only to TeamSite 5.0X and earlier. Future
versions of TeamSite will have a different backing store format that will require a different
formula.
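
The edition formula in this section can be checked the same way. The sketch below transcribes Increase = n*(nSize + 3*f) + 4*f + (d * dSize) with all sizes in kilobytes, and splits out the metadata share so it can be compared with the data share. The names are illustrative only.

```python
def edition_increase_kb(
    files: int,
    avg_file_kb: float,
    fragment_kb: float,
    dirs_touched: int,
    avg_dir_metadata_kb: float,
) -> float:
    """Increase = n*(nSize + 3*f) + 4*f + (d * dSize), in KB."""
    per_file = files * (avg_file_kb + 3 * fragment_kb)  # data + 3 small files each
    per_event = 4 * fragment_kb                         # submit and edition records
    per_directory = dirs_touched * avg_dir_metadata_kb
    return per_file + per_event + per_directory


def metadata_share_kb(
    files: int,
    avg_file_kb: float,
    fragment_kb: float,
    dirs_touched: int,
    avg_dir_metadata_kb: float,
) -> float:
    """Portion of the increase that is not file content."""
    total = edition_increase_kb(
        files, avg_file_kb, fragment_kb, dirs_touched, avg_dir_metadata_kb
    )
    return total - files * avg_file_kb
```

For the worked example (1000 files of 64K, 1K fragments, 10 directories at 2K each) the total is 67,024 KB, of which 3,024 KB is metadata. That agrees with the roughly 3MB figure quoted above.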


5 HEURISTIC APPROACH TO BACKING STORE GROWTH


As we can see from the previous section, calculating backing store growth accurately is
extremely complicated, and the formulas are only approximations of what really goes on
in the backing store. As a result, it may be easier, and more accurate, to estimate a
server's anticipated backing store growth using a heuristic approach. With this
approach, you measure the size of the backing store on a weekly basis for several
weeks, and use those numbers to get a fairly accurate idea of how it will
grow in future weeks. Furthermore, these statistics should also tell you exactly which
resources are being consumed, so you can make sure the backing store file system is
optimally configured for the contents of that particular server.

On Windows NT and Windows 2000, the numbers you want to keep track of are as
follows:

1. Run chkdsk on the backing store drive to find out how many allocation units
are being used versus how many are available on the disk. It will also tell
you how big the allocation unit is on the drive.

2. Right-click the folder that contains the backing store, and click Properties.
After the tally completes, compare the Size and Size on disk values.
The difference between these two numbers is the disk space
wasted by files smaller than an allocation unit taking up a
whole allocation unit.

On Solaris, the numbers you want to keep track of are:

1. Run df -F ufs -o i to find out the inodes used versus inodes free. This will let
you know if you are in danger of running out of inodes on your UFS partition.

2. Run df -k on the backing store partition to get statistics about disk space
used.

3. Run df -g on the backing store partition to find the fragment size.

After several weeks, you should be able to get a pretty good idea whether allocation
units are running out on Windows, or inodes are running out on Solaris, or the backing
store is simply growing due to a lot of new or modified files in the backing store.
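
Once several weekly measurements exist, the trend can be projected with a simple straight-line fit. The sketch below is one possible way to do this and is not an Interwoven tool. It assumes roughly linear growth, takes one measurement per week in any consistent unit (disk space, inodes, or allocation units), and estimates how many weeks remain before a given capacity is reached.

```python
from typing import Optional, Sequence


def weekly_growth(samples: Sequence[float]) -> float:
    """Least-squares slope of the weekly samples (units per week)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two weekly samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def weeks_until_full(samples: Sequence[float], capacity: float) -> Optional[float]:
    """Weeks from the last sample until capacity is reached, or None if not growing."""
    rate = weekly_growth(samples)
    if rate <= 0:
        return None
    return max(0.0, (capacity - samples[-1]) / rate)
```

For example, hypothetical readings of 40, 42, 44, and 46 GB used on a 100 GB partition give a growth rate of 2 GB per week and about 27 weeks of headroom. Feeding in inode counts instead of disk space answers the same question for inodes.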


6 RECOMMENDATIONS FOR THE WINDOWS PLATFORMS


On Windows NT and Windows 2000, there are many commercial vendors to choose
from for conventional backup solutions. These same backup tools can be used in
conjunction with specialized solutions that will be discussed later in this paper. The
main requirements of these tools are:

1. Be able to recover state as of specific time

2. Allow the ability to freeze the backing store prior to a backup and to unfreeze
the backing store after a backup is finished

3. Make multiple copies of backups so that if one tape/copy were to go bad,
another copy would be available

4. Full and Level (preferred) or Incremental backups

5. Tape management, i.e., ability to file and record tapes

6. Schedule management, i.e., schedule times when backups are made

You can freeze the backing store by running a batch file using at or a Windows
Scheduled Task just prior to a scheduled backup, and running another batch file just
after a scheduled backup. The following is a list of commercial software to look into
that should satisfy the above requirements:

- Veritas NetBackup

- Veritas Backup Exec

- Legato

- CA Arcserve

- Windows 2000 built-in backup

Most of these commercial backup tools should work fairly well with TeamSite. Windows
2000’s built-in backup tool should work but may not be as feature rich as other
commercial backup tools.


7 RECOMMENDATIONS FOR SOLARIS


There are many commercial software vendors for Solaris as well. The requirements for
Solaris backup tools are the same as for NT. Similarly, you can freeze the backing
store by running a cron or at script just prior to a scheduled backup, and running
another script just after a scheduled backup.

Commercial software that should work with Solaris includes Legato and Veritas. Also,
the built-in tools ufsdump and volcopy can be used to back up or make a copy of the
backing store. Beware that tools like tar, which copy file by file through the file system,
can take especially long to read the backing store. Sector-by-sector tools like ufsdump and
volcopy don't need to read files through the file system, and can provide a fixed,
predictable amount of time to back up a file system regardless of the number of files in
the file system.

8 RECOMMENDATIONS FOR VERITAS


With the Veritas file system, be sure it uses a 1K cluster size instead of the default, which
is 4K or 8K. Veritas Volume Manager supports a 3-way mirror, which allows
you to break off the third mirror and perform a backup from it. The procedure for
backing up from a third mirror follows:

1. Freeze the backing store with iwfreeze +<seconds>

2. Break off the third mirror, keeping the other two mirrors for fault tolerance

3. Unfreeze the backing store with iwfreeze -- (users can start using TeamSite in
r/w mode again)

4. Back up the broken-off mirror.

5. Reconnect the third mirror when the backup has completed so that it resyncs with the
active mirrors.

The third-mirror capability is described on the Veritas web site at:

- http://www.veritas.com/us/products/volumemanagernt/prodinfo.html (NT)

- http://www.veritas.com/us/products/volumemanager2000/prodinfo.html (Windows 2000)

Please consult with Veritas to confirm whether they support 3-way mirrors with Volume
Manager for Solaris.


On Solaris, Veritas Volume Manager supports data snapshots. The procedure for
backing up a snapshot follows:

1. Freeze the backing store with iwfreeze +<seconds>

2. Create a data snapshot of the backing store

3. Unfreeze the backing store with iwfreeze -- (users can start using TeamSite in
r/w mode again)

4. Back up the snapshot

5. Delete the snapshot when backup completes
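
Both the third-mirror and the snapshot procedures share one ordering: the backing store stays frozen only while the copy is split off, and the slow backup runs after users are back in read/write mode. The sketch below captures just that ordering. Each step is a caller-supplied function, because the actual mirror and snapshot commands depend on the Veritas product and version and are not given in this paper.

```python
from typing import Callable

Step = Callable[[], None]


def snapshot_backup(
    freeze: Step,
    take_copy: Step,   # break off the third mirror, or create the snapshot
    unfreeze: Step,
    backup: Step,      # back up from the mirror or snapshot
    release: Step,     # reconnect the mirror, or delete the snapshot
) -> None:
    """Run the five steps so that the freeze covers only the copy step."""
    freeze()
    try:
        take_copy()
    finally:
        # Unfreeze even if the copy step failed, so users are not locked out.
        unfreeze()
    try:
        backup()
    finally:
        # Always resync the mirror or remove the snapshot afterwards.
        release()
```

In practice freeze and unfreeze would invoke iwfreeze +<seconds> and iwfreeze --, and the freeze time only needs to cover the copy step rather than the whole backup.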

Veritas describes the snapshot capability of Volume Manager for Solaris and Veritas File
System for Solaris at:

- http://www.veritas.com/us/products/volumemanager/prodinfo.html

- http://www.veritas.com/us/products/filesystem/prodinfo.html

There is a potential bug with the Veritas file system’s snapshot capability which causes
the backing store to grow abnormally fast. This bug is supposed to be addressed with
VxFS 3.4.

9 NETAPP HARDWARE RECOMMENDATIONS


NetApp filers have a snapshot capability which takes a point-in-time snapshot of the
filesystem. It takes anywhere from a few seconds to a few minutes to create this
snapshot, depending on how big your file system is. By default, NetApp filers are
configured to create these snapshots automatically throughout the day. This auto-snapshot
capability should be turned off, as it impacts end users during the day as they are trying
to use TeamSite. The snapshot capability reduces downtime to
minutes while still enabling you to do a full backup.

The procedure to back up a NetApp filer hosting TeamSite's backing store follows:

1. Freeze the backing store with iwfreeze +<seconds>


2. Create a snapshot using the snap command (e.g., snap create
/mnt/vol0/interwoven backup)

3. Unfreeze the backing store with iwfreeze -- (users can resume editing files in
TeamSite)

4. Back up the snapshot image

5. Delete the snapshot image with snap delete /mnt/vol0/interwoven backup

10 WRAP UP
We have seen the proper procedure to back up TeamSite, and the effectiveness of
conventional backup methods for smaller sites. Sites that need near 24x7 operation, or
have a backing store that is too large to back up in the allowable time window, need to
use specialized solutions. These include options such as Veritas 3-way mirroring,
Veritas data snapshots, and NetApp snapshots. O'Reilly has a good book on Unix
Backup and Recovery, which can be found at their website,
http://www.oreilly.com/catalog/unixbr/
