
Author: Ankit Verma

TOPICS

1. Introduction to File Systems
2. More About FAT (FAT32)
3. New Technology File System (NTFS)
4. File Auditing
5. Self-Healing NTFS

What is a File System?


First, let's understand what a file system is. A file system can be
thought of as the way your computer goes about managing the files
that get stored on your hard drive. Your computer has thousands
upon thousands of files. If there were no organized way of managing
them, your system would be painfully slow, if it worked at all. This is
easy to appreciate if you just consider how much stuff you have piled
in your office, and how much time is wasted finding things buried
under a ton of paper. Now take that mess, and multiply it by a
thousand. That is what your computer would be going through if an
efficient file system didn't exist. And just as there are all kinds of
people in the world who organize their offices differently, there are
many file systems out there with varying features. However, there
are several key functions that no file system should be without:

• Efficiently use the space available on your hard drive to store the
necessary data.
• Catalog all the files on your hard drive so that retrieval is fast
and reliable.
• Provide methods for performing basic file operations, such as
delete, rename, copy, and move.
• Provide some kind of data structure that allows a computer to
boot off the file system.

There are, of course, file systems that go beyond these basic
requirements by providing additional functionality, such as
compression, encryption, passwords/permissions, filestreams, etc.
Later in this article, I will discuss some of these extra features in
relation to Windows NT's NTFS.

1 byte (B) = 8 bits
1 KB = 1,024 bytes (2^10), or roughly 10^3
1 MB = 1,048,576 bytes (2^20), or roughly 10^6
1 GB = 1,073,741,824 bytes (2^30), or roughly 10^9
1 TB = 1,099,511,627,776 bytes (2^40), or roughly 10^12

FAT in Detail
Note: This section is more technical in nature than the rest of the
article. Feel free to skip it if you'd like. But be warned that you'll miss
some interesting tidbits about FAT that you probably never knew.

So what is FAT, and how do file systems work? The answer is quite
simple, in fact. The space on your hard drive, at its most basic level, is
divided into units called sectors. Each sector is 512 bytes. So if your
hard drive had 10 kilobytes (10 * 1024 = 10,240 bytes) of total disk
space, it would be divided into 20 sectors (10,240 / 512 = 20). But the
file system doesn't deal with the hard drive directly on a
sector-by-sector basis. Instead, it groups a bunch of sectors together
into a cluster, and it deals with the cluster. These clusters are also
called allocation units by DOS. So another way of thinking about this
is to suppose that each sector on your hard disk is a person carrying
a bag, where each bag can store 512 bytes of information. Now
instead of numbering each person 1, 2, 3, etc., the file system first
takes several people, puts them into a group, and calls that group 1.
So if you had 400 people, and the file system decided to put 4 people
in each group, then you'd have 100 groups (400 / 4 = 100). In other
words, on a drive with 400 sectors (roughly 200K of space, since
512 * 400 = 204,800 bytes = 200K), and with an allocation unit of 4
sectors, or 2K (4 * 512 = 2,048 bytes = 2K), there would be 100
clusters (groups). So when the file system needs to access a
particular sector, it first finds the cluster number of that sector, and
then within the cluster, it accesses the particular sector by its sector
index. This is akin to saying: to find a person, say Jon, I would find
Jon's group number first, and then go to his group and look for him.
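
If you'd like to check those numbers yourself, here is a minimal
Python sketch of the toy drive just described (the constants come
straight from the example above):

SECTOR = 512                         # bytes per sector
sectors = 400                        # sectors on our toy drive
sectors_per_cluster = 4              # the "allocation unit" size

disk_bytes = sectors * SECTOR                  # 204,800 bytes = 200K
clusters = sectors // sectors_per_cluster      # 100 clusters (groups)
cluster_bytes = sectors_per_cluster * SECTOR   # 2,048 bytes = 2K
print(disk_bytes, clusters, cluster_bytes)     # 204800 100 2048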

All three of the file systems (FAT16, FAT32 and NTFS) work like this.
So what is the difference between FAT16 and FAT32? The major
difference lies in how much space each file system can handle, and
how efficiently it does so. The efficiency problem arises because each
cluster on a hard disk can only store data from one file! That means
each group can only be made to handle one item. To illustrate my
point, consider the following situation:

The file system decides to divide all the people into groups of 8 (we'll
get into how this number is chosen later). Each of these 8 people has
a bag that can store stuff.

Now the file system hands the first group a huge box of pencils and
says "store this." The eight people start to put the pencils in their
bags, and after one fills up, they move on to the next. The box of
pencils fills 7 bags.

The file system tries to hand the group another small item to put into
the eighth bag, which is empty. But the group says, "Sorry, we can
only handle one thing, and you gave us one already." The file system
says, "Fine, but you are wasting 12.5% of your space (1/8 = 0.125)."
The group tells the file system, "Sorry, we can't help it." The file
system moves on.

Now the file system gives the next group of 8 only a single pencil to
store. The group stores it and refuses to take anything else. The file
system informs the group that they are wasting almost 100% of their
storage space. But there is nothing they can do.

These stories may seem silly, but they get the point across: as the
size of the clusters increases, the amount of space you waste
increases too. It is true that if you could make all your files precisely
the same size as your cluster, you'd have 0% waste. But that is not
possible. Most typical files are not very big, and if the cluster size
gets huge, the waste can be quite alarming.
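
To put a number on that waste, here is a small, hypothetical Python
helper that computes the slack for a given file size and cluster size
(the sample figures below are illustrative, not from the article):

import math

def slack(file_size, cluster_size):
    # a file always occupies a whole number of clusters, so the unused
    # tail of its last cluster is pure waste
    used = math.ceil(file_size / cluster_size) * cluster_size
    return used - file_size

print(slack(3500, 4096))   # a 3.5K file in 4K clusters wastes 596 bytes
print(slack(1, 32768))     # a 1-byte file in a 32K cluster wastes 32,767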

So now the question becomes: how does my computer figure out the
size of each cluster? The answer is simple: take the size of your hard
drive, and divide it by the number of clusters the file system can
count. FAT16 numbers its clusters with a 16-bit value, so there can
be at most 65,535 clusters, and each cluster can span at most 128
sectors. So what I am saying is this:

Max FAT16 size = 65,535 clusters * 128 sectors * 512 bytes
= 4,294,901,760 bytes, or roughly 4 GB
"Wait a second, that's not right! I thought the limit was 2 GB?"
(Indeed, FAT16 supports partitions no larger than 2 GB.)

And I thought each cluster in FAT16 could only be 32K, not 64K! And
you would be right. The problem is that 128 sectors * 512 bytes per
sector is 65,536 bytes, which is one more than a 16-bit number can
handle (the maximum is 2^16 - 1 = 65,535). So again, we decrement
to 64 sectors per cluster, which yields 32K per cluster. And 32K per
cluster * 65,535 clusters is roughly 2 GB.
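
Here is that arithmetic again as a short Python sketch, showing both
the naive 4 GB figure and the corrected 2 GB ceiling:

SECTOR = 512
max_clusters = 2**16 - 1              # 65,535: clusters numbered in 16 bits

naive = max_clusters * 128 * SECTOR   # 128 sectors/cluster gives ~4 GB...
# ...but 128 * 512 = 65,536 bytes per cluster, one more than 16 bits can
# express, so FAT16 actually tops out at 64 sectors (32K) per cluster:
actual = max_clusters * 64 * SECTOR
print(naive)    # 4294901760  (~4 GB, not achievable)
print(actual)   # 2147450880  (~2 GB, the familiar FAT16 ceiling)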

File Systems: FAT, FAT16, FAT32


Now FAT32 solves this problem by removing the 65,535-clusters-per-disk
limitation. FAT32 uses a 32-bit number, that is, a number with 32
binary digits, which allows it to count much higher. And since it can
handle a larger number of clusters, its cluster size is much smaller
than FAT16's for bigger disks. In fact, FAT32's maximum disk size is 2
terabytes.

(Thus FAT32 can support a partition of up to 2 terabytes.)

To get this number, you take the total number of addressable sectors
(and I do mean sectors), which is 2^32 - 1 = 4,294,967,295, and
multiply that by 512 bytes per sector: (2^32 - 1) * 512 ≈ 2.199 * 10^12
bytes. (Compare FAT16, which could only count 2^16 - 1 = 65,535
clusters.) That's a whopping 2,048 gigabytes, or 2 terabytes.
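
And the corresponding FAT32 arithmetic, as a sketch:

SECTOR = 512
max_sectors = 2**32 - 1                # sectors addressed with 32 bits
print(max_sectors * SECTOR)            # 2199023255040 bytes
print(max_sectors * SECTOR / 2**40)    # ~2.0 (terabytes)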

At this point, some of you may be scratching your heads trying to
figure out the inconsistencies in my explanation. The first item to
address is that even though the file system accesses sectors by a
cluster count first, that still doesn't alleviate the need to number the
sectors individually. Even in FAT16, the sectors are numbered. And
that leads to the second concern some of you may have: since FAT16
uses 16-bit numbers, doesn't that mean there can be only 2^16 - 1
sectors? Wouldn't that translate into 32 megs? Yes, you are right. But
unknown to most is the fact that since DOS 4.0, the underlying sector
numbering has been a 32-bit value! The limit placed on the disk size
was purely due to the 16-bit numbering of the clusters, and the limit
on the numbering of the sectors within each cluster, as discussed
above.

OK, so we know what sectors and clusters are. But how does that get
translated into files? That is where the File Allocation Table comes in.

The FAT is a huge database that contains a record of where each file
is on the disk. In fact, it would not be too much of a stretch to think
of the FAT as a table with several columns that each record something
about the files on the drive. Each record inside the FAT takes up 32
bytes of space. In other words, if I had 100 files on the computer, it
would take the system roughly 3,200 bytes to record all of that
information in the FAT. Just for fun, let's take a look at what is stored
in these 32 bytes:

Byte Range    Info Stored
1 to 8        Filename
9 to 11       Extension
12            Attributes (i.e. read-only, archive, hidden)
13 to 22      Reserved bits for later features
23 to 24      Time Written
25 to 26      Starting cluster
29 to 32      File Size

Interesting list, isn't it? Some of the entries are self-explanatory, but
two are rather interesting. The first thing to look at is the Starting
Cluster field. Some of you may have been wondering how the system
translates cluster and sector indices into files. The answer is that for
each file, there is a field in the FAT that indicates the first cluster of
the file. The system reads that FAT entry, finds the starting cluster,
and reads the file. Now the question is: how does the system know
when to stop reading? And even before that, how does the system
know where to read next after this cluster? The answer is that for
each cluster, the table records the address of the next cluster that
contains information from this file. So the computer reads the current
cluster and checks whether any cluster comes after it. If there is one,
it moves to that cluster, reads it, and checks for the next one. This
process repeats until it finds a cluster with no pointer to a next one.
The CS majors reading this will recognize this as a linked list
implementation.
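
Here is a minimal Python sketch of that linked-list traversal. The
table, cluster numbers, and end-of-chain sentinel are all made up for
illustration (real FAT uses a reserved marker value rather than -1):

EOC = -1   # end-of-chain sentinel (illustrative)

fat = {5: 9, 9: 10, 10: 14, 14: EOC}   # one file spanning four clusters

def read_file(start_cluster, read_cluster):
    data, cluster = b"", start_cluster
    while cluster != EOC:
        data += read_cluster(cluster)   # fetch this cluster's contents
        cluster = fat[cluster]          # follow the link to the next one
    return data

print(read_file(5, lambda c: b"[%d]" % c))   # b'[5][9][10][14]'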

The other interesting feature of this table is that each directory entry
(record in the FAT) uses 4 bytes to store the size of the file. This may
not seem like much at first, but it actually tells you the maximum size
possible for any single file. The fact that we use 4 bytes to store a file
size means the largest representable number is a 32-bit value:
2^32 - 1 = 4,294,967,295. So a file can have a maximum of 2^32 - 1
bytes, or just under 4 gigabytes (this is the largest file that can be
stored on a disk formatted with FAT32). This calculation is obviously
done under the assumption that we are using FAT32.

The last two fields I'd like to take a look at are the filename field and
the reserved bytes field. The interesting thing about the filename field
is that DOS uses it to perform undelete. When you erase a file in DOS,
you aren't actually erasing the file. All you are doing is changing the
first letter of the filename field into a special character. As far as the
file system is concerned, the file isn't there, and the next time data is
written to its clusters, the old file is truly erased. The way DOS
performs an undelete is simply to change that first letter back to
something else. That is why when you used undelete in DOS, it
always asked for the first letter of the filename before it could restore
the file. Mystery solved.
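
As a toy illustration, here is the delete/undelete trick in a few lines of
Python. The 0xE5 marker is, to the best of my knowledge, the byte
DOS actually uses; the directory entry itself is invented:

DELETED = 0xE5   # DOS overwrites the first byte of the filename field

entry = bytearray(b"REPORT  ")      # the 8-byte filename field

def delete(entry):
    entry[0] = DELETED              # the file's data is left untouched

def undelete(entry, first_letter):
    entry[0] = ord(first_letter)    # the letter undelete asks you for

delete(entry)
print(bytes(entry))                 # b'\xe5EPORT  ' -- invisible to DOS
undelete(entry, "R")
print(bytes(entry))                 # b'REPORT  ' -- the file is back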

Now let me just make quick mention of the reserved fields. The
reserved fields didn't do much in FAT16, but they became rather
useful in FAT32 and NTFS. Since FAT32's cluster numbering uses
32-bit numbers instead of FAT16's 16-bit numbers, the system
needed two extra bytes to accommodate the added digits. Those two
bytes were taken out of the reserved field. And in NTFS, compression
attributes and some security information were also written into the
reserved field.

Before I move on, I'd like to point out a few of the other differences
between FAT16 and FAT32. In FAT32, the root directory is unlimited
in size. What this means is that you can have as many files and
directories in C:\ as you'd like. In the days of FAT16, you could have
a maximum of 255 directory entries in the root. With normal
filenames of 8 letters plus a 3-letter extension, that meant a
maximum of 255 directories and files combined. That may seem like
more than you'd ever need to put in the root directory, and it
probably is, if you had 8.3 filenames. But in Win95, the system can
support long filenames, and the trick is that Win95 combines multiple
directory entries to store them. So consider a file named "My English
Paper". That is 16 letters long, so it takes at least 2 directory entries.
Actually, it takes 3: 2 for the long filename, and another for the short
8.3 filename kept for compatibility with DOS and Win3.1 (see the
sketch below). As you can see, long filenames can quickly deplete
directory entries.
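
To make the bookkeeping concrete, here is a tiny, hypothetical
Python helper that counts the directory entries a long filename
consumes. It assumes each extra entry holds 13 characters of the
long name, which is my recollection of how VFAT packs them:

import math

def directory_entries(name, chars_per_entry=13):
    # one 8.3 entry for DOS/Win3.1 compatibility, plus enough extra
    # entries to hold the long name in fixed-size chunks
    return 1 + math.ceil(len(name) / chars_per_entry)

print(directory_entries("My English Paper"))   # 16 chars -> 3 entries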

Another nice feature is that FAT32 has better FAT redundancy. Both
FAT32 and FAT16 store two copies of the file allocation table on disk,
but traditionally the file system only read from one of them. In
FAT32, the system can choose to read from either one, which
provides a better failsafe against freak accidents involving corrupt file
tables.

It is apparent that FAT32 is a superior file system to FAT16.
Unfortunately, FAT32 is not supported by every operating system.
The original version of Windows 95 couldn't read FAT32; it wasn't
until version B (OSR2) that Win95 gained that ability. And all versions
of WinNT before 5.0 (named Windows 2000, or Win2K for short)
could not read FAT32 drives either.

New Technology File System: NTFS

Now that I've covered FAT16 and FAT32 both in excruciating detail,
let's turn our attention to NTFS, the proprietary file system for WinNT.
While FAT32 was a decent system, it lacked some of the more
advanced features that many businesses need to run a network. Chief
among them are file level security, encryption, event logging, error
recovery and compression. NTFS (5.0) provides all of these features in
a nicely optimized package.

Permissions:
The feature that NT is probably best known for is its file-level
security. With NTFS permissions, you can control which users have
what kind of access to which files. This is a stark contrast to the
"security" in Windows 9x, where the system policy editor affords the
only measure of protection. Once a knowledgeable user gets past the
policy protections, which are only skin deep (or interface deep), every
file on the system is his for the taking. In Windows NT, even if you
get past the interface lockouts, you'll still have a hell of a time
accessing other people's files, because they are locked at the file
level.

Before I discuss how to set file permissions, we need to take a step
back and look quickly at how permissions in general work on Windows
NT. Windows NT's security model is an entire topic unto itself, so I
will not cover it in detail here. However, a general overview will prove
beneficial.

With Windows NT, you can assign security at two different levels - on
a per-user basis, and on a group basis. So, if there is a user called
Jane who belongs to the Marketing user group, you can affect Jane's
access permissions by either assigning permissions to her account, or
to her group. So what happens if Jane's group has Modify access to a
document, but Jane is only assigned Read access? Surprisingly
enough, the least restrictive of the two permission sets takes
precedence - in this case, the Modify access. The one glaring
exception is the No Access permission. If No Access is assigned at
any level, the user has no access, regardless of any other permissions
assigned. So if the Marketing group is assigned No Access, Jane
would have no access even if her account is assigned Full Control.
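
The resolution rule is simple enough to sketch in a few lines of
Python. The permission names and their ordering here are
illustrative, not the real NT ACL model:

LEVELS = ["Read", "Modify", "Full Control"]   # least to most permissive

def effective_access(assigned):
    """assigned: permissions from the user's account and all her groups."""
    if "No Access" in assigned:
        return "No Access"                    # No Access trumps everything
    granted = [p for p in assigned if p in LEVELS]
    if not granted:
        return None                           # nothing assigned at all
    return max(granted, key=LEVELS.index)     # least restrictive wins

print(effective_access(["Read", "Modify"]))             # Modify
print(effective_access(["Full Control", "No Access"]))  # No Access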

So there you have it: NT file-level security at a glance. There is so
much more to it, but as this is an intro article, a more in-depth
exploration of NT file security seems more appropriate for a separate
article. With that said, let's take a look at some of the other features
of NTFS.

Compression:
Another useful feature is compression. It works transparently (like
DriveSpace), and can be assigned to individual files (unlike
DriveSpace). To turn on compression for a file, right-click on it and
choose Properties. From the Properties menu, you can check the
compressed attribute. The same can be done on a directory.

Encryption:
But what is even more useful is the Encrypting File System (EFS)
included in NTFS 5.0. With EFS, you can actually encrypt a file, rather
than just protect it via permissions. This is a long overdue feature
since there are other operating systems on the market which will read
files on an NTFS volume while bypassing the NT security. BeOS is one
example, one which I have used. Various flavors of Linux might also
provide similar functionality, though I have yet to personally encounter
one. However, if a file is encrypted, then such dangers are drastically
mitigated. NT5's EFS is a system level service, which means it runs
even when all users are logged off. This also prevents hackers from
easily disabling the program, as is the case with user mode encryption
programs. Moreover, the encryption system works transparently with
respect to the user. What that means is that if a user has permissions
to decrypt the file, then when the file is being accessed, it will be
decrypted seamlessly, without any user intervention. On the other
hand, if the user does not have appropriate access, then an "Access
Denied" error will pop up.

In principle, EFS works on a public/private key system (via CryptoAPI,
if you are interested). When a file is encrypted, a file encryption key
(FEK) is automatically generated. That randomly generated FEK is
used to encrypt the file (or folder). The FEK is then, itself, encrypted
using the user's public key. A list of encrypted FEKs is stored as a
part of the file. When the user tries to access the file, the system
attempts to decrypt the FEK with the user's private key. If it
succeeds, the decrypted FEK is then used to decrypt the actual file.
However, if a file is copied to a non-NTFS partition, a plaintext
version of the file is created.
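
Here is a rough sketch of that FEK scheme in Python, using the
third-party cryptography package (my choice, not EFS's; EFS itself
goes through CryptoAPI). It just mirrors the idea: a random key
encrypts the file, and the user's public key encrypts that key:

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

fek = Fernet.generate_key()                         # the random FEK
ciphertext = Fernet(fek).encrypt(b"file contents")  # file encrypted with FEK
stored_fek = user_key.public_key().encrypt(fek, oaep)   # FEK encrypted too

# On access: decrypt the FEK with the private key, then the file with it.
plaintext = Fernet(user_key.decrypt(stored_fek, oaep)).decrypt(ciphertext)
assert plaintext == b"file contents"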

To activate encryption, simply right-click on a folder, choose
Properties from the popup, and check the Encrypt checkbox. By
default, Windows Explorer will only allow folders to be encrypted
(which is the recommended method). However, the CIPHER command
can be used to encrypt on a per-file basis. To encrypt:

CIPHER /e myfile.txt

To decrypt:

CIPHER /d myfile.txt

The other nice thing about EFS is that it offers a data recovery
mechanism. A data recovery agent is automatically configured; in a
Windows 2000 domain, that defaults to the domain admin. The
assigned recovery agent can then decrypt any file that is under his
scope. It is important to note that when recovery occurs, only the
file's FEK is revealed, and NOT the user's private key. This prevents
the recovery agent from accessing files that are not under his scope.
As always, the domain admin has the power to delegate recovery
rights to other user groups, providing both flexibility and redundancy.

File Auditing

However, just protecting files against possible intruders is not
enough. There must be a way for an admin to know if a file hack has
been attempted. This is where file auditing (event logging, if you will)
comes in handy. With NTFS, you can keep track of who has tried to
access which files, and whether they succeeded. To enable file
auditing, use the following steps:

1. First, make sure that File access auditing is turned on via User
Manager.
2. Then simply go into the Security tab of any file you wish to audit
and click on the Audit button.
3. Now simply add the users whom you wish to audit for the given
file, and then click OK.
4. Now select the events you wish to audit. Click on OK.
5. To view the audited events, go through Event Viewer and look at
the security logs.

Data Recovery:
But what good is protecting your data if it simply gets corrupted when
the system crashes? Here too, NTFS has a solution. NTFS has superior
data recovery capabilities (compared to FAT and FAT32). Each I/O
operation that modifies a file on the NTFS volume is viewed by the file
system as a transaction and can be managed as an atomic unit. When
a user updates a file, the Log File Service tracks all redo and undo
information for the transactions. If every step of the I/O process
succeeds, then the changes are committed to disk. Otherwise, NT uses
the Undo log to roll back the activity and restore the disk to a state
before the changes were made. When Windows NT crashes, NTFS
performs three passes upon reboot. First, it performs an analysis
phase where it determines exactly which clusters must now be
updated, per the information in the log file. Then it performs the redo
phase where it performs all transaction steps logged from the last
checkpoint. Lastly, it performs the undo phase, where it backs out all
incomplete transactions. Together, these steps ensure that data
corruption is kept to a minimum.
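
A toy version of this redo/undo logging can be sketched in Python.
This is nothing like the real Log File Service's on-disk format, and the
analysis phase is folded into the recovery scan; it only shows the idea
that every change is logged before it lands, so recovery can replay or
roll back:

log = []     # the transaction log: one record per change
disk = {}    # our "volume": cluster number -> contents

def update(cluster, new_value):
    record = {"cluster": cluster, "undo": disk.get(cluster),
              "redo": new_value, "committed": False}
    log.append(record)                # log redo/undo info *before* writing
    disk[cluster] = new_value         # the actual write
    record["committed"] = True        # mark the transaction complete

def recover():
    for record in log:                # redo phase: replay committed changes
        if record["committed"]:
            disk[record["cluster"]] = record["redo"]
    for record in reversed(log):      # undo phase: back out incomplete ones
        if not record["committed"]:
            if record["undo"] is None:
                disk.pop(record["cluster"], None)
            else:
                disk[record["cluster"]] = record["undo"]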

Yet Another Cool (But Scary) Feature:

At this point, you probably think NTFS is pretty cool. But there is one
other cool feature in NTFS that is documented, but not very well
publicized (for obvious reasons, as you will see). What I am referring
to is filestreams (Unix users will be familiar with this feature). To
illustrate the concept of filestreams, let's first picture any file
(whether it be a document, an exe, or a jpeg) as a garden hose.
When you access the data in the file, that data flows through the file
in a continuous stream, like water flows through a garden hose. In a
typical file, there is only a single data stream, the default stream. All
data written to and read from the file goes through that stream.
When Explorer (or the command interpreter) reads the size of the
file, it is reading the data stored in that stream. In FAT and FAT32,
this fact was of little concern, since any file could only be given a
single stream (the default). However, this all changes in NTFS, which
allows any given file to have multiple data streams. This is akin to a
garden hose that has within it multiple smaller hoses, each with its
own stream of water flowing. In fact, each stream can contain a
different type of data: one stream could hold a text document, while
another contains WAV data, another executable code, and yet
another jpeg data. You can almost think of files with multiple data
streams as a special kind of folder with multiple files stored within it.

To illustrate my point, let's create a text file with multiple
filestreams:

1. Go to Windows NT's command interpreter (type cmd at the Run
prompt).
2. Switch to a partition that is NTFS.
3. Type the following:
echo This is what you'll see >> stream.txt [Press Enter]
echo This is what you won't see >> stream.txt:hiddenStream [Press Enter]
4. Now open the file in Notepad.

What you'll see is the text "This is what you'll see." The other string of
text "This is what you won't see" is in the file, but it is stored in a
separate file stream called hiddenStream. And since most programs do
not read data from any stream other than the default stream, that
data is hidden from the user. To view the contents of the hidden
stream, do the following:

1. Go to the NT command interpreter.
2. Type the following:
more < stream.txt:hiddenStream
3. And voila! There is your hidden stream.
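
Incidentally, alternate streams are reachable from ordinary programs
too, simply by putting the stream name after a colon. Here is the
same experiment in Python (run it on an NTFS volume under
Windows; the filenames are just the ones from the example above):

import os

with open("stream.txt", "w") as f:
    f.write("This is what you'll see\n")
with open("stream.txt:hiddenStream", "w") as f:
    f.write("This is what you won't see\n")

print(os.path.getsize("stream.txt"))    # counts only the default stream
with open("stream.txt:hiddenStream") as f:
    print(f.read())                     # the hidden data is still there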

At this point, you should be getting chills, because filestreams bring
up some very disturbing possibilities for writing viruses and such. A
virus writer could conceivably write the executable code for his virus
into a hidden stream of a text file! This way, normal virus scanners
would not find the harmful code. To activate the virus, the malicious
programmer need only write a catalyst program that performs a
seemingly innocuous read operation on the text file. The worst part
of all of this is that hidden streams are difficult to detect, because
data written into an alternate stream is NOT counted as part of the
file's size. So you could have a text file that contains 20 bytes of text
and 2 megs of executable code, and it would show up as 20 bytes.
Even worse, any user can create files with hidden streams, even your
guest account users (assuming they can write to a directory).
Thankfully, the situation is not hopeless. For one, hidden file streams
can be detected via the Windows APIs. Secondly, all hidden streams
are lost when a file is copied to a non-NTFS partition. So conceivably,
antivirus firms can write scanners that scan for hidden streams. To
the best of my knowledge, there haven't been any serious viruses
written to take advantage of this particular feature of NTFS. For now,
you can rest easy knowing that the end isn't quite here yet. But
definitely keep filestreams in mind, for if there is a security
weakness, somebody will find it sometime.

Conclusion:

There you have it - the three most common file systems in a nutshell.
I hope this article has been at least mildly entertaining for some of
you.

Self-Healing NTFS

Updated: January 21, 2008

Traditionally, you have had to use the Chkdsk.exe tool to fix
corruptions of NTFS file system volumes on a disk. This process is
intrusive and disrupts the availability of Windows systems. In the
Windows Server® 2008 operating system, you can now use
self-healing NTFS to protect your entire file system efficiently and
reliably, without having to be concerned about the details of file
system technology. Because much of the self-healing process is
enabled by default, you can focus more on productivity, and less on
the state of your file systems. In the event of a major file system
issue, you will be notified about the problem and will be provided
with possible solutions.

What does self-healing NTFS do?

Self-healing NTFS attempts to correct corruptions of the NTFS file
system online, without requiring Chkdsk.exe to be run. The
enhancements to the NTFS kernel code base help to correct disk
inconsistencies and allow this feature to function without negative
impact on the system.
Who will be interested in this feature?

Self-healing NTFS is intended for use by all users.

What new functionality does this feature provide?

Self-healing NTFS provides the following functionality:

• Helps provide continuous availability. The file system is always
available; NTFS corrects all detected problems while the system is
running, and Chkdsk.exe does not have to run in its exclusive mode
except in extreme conditions.
• Preserves data. Self-healing NTFS preserves as much data as
possible, based on the type of corruption detected.
• Reduces failed file system mounting requests that occur
because of inconsistencies during restart or for an online
volume. Self-healing NTFS accepts the mount request, but if the
volume is known to have some form of corruption, a repair is
initiated immediately. The exception to this would be a
catastrophic failure that requires an offline recovery method—
such as manual recovery—to minimize the loss of data.
• Provides better reporting. Self-healing NTFS reports changes
made to the volume during repair through existing Chkdsk.exe
mechanisms, directory notifications, and update sequence
number (USN) journal entries.
• Allows authorized users to administer and monitor repair
operations. This includes initiating on-disk verification, waiting
for repair completion, and receiving progress status.
• Recovers a volume if the boot sector is readable but does
not identify an NTFS volume. In this case, the user needs to
run an offline tool that repairs the boot sector. Self-healing NTFS
can then initiate whatever scan is necessary to recover the
volume.
• Validates and preserves data within critical system files.
For example, NTFS will not consider Win32k.sys to be a special
file. If it repairs corruption in this file, it might leave the system
in a state where the system cannot run. The user might be
required to use system restore and repair tools.
