
Control of File Types in Ubuntu

What is a file type?


Let's not go as deep as "What is a file?", but before we start, let's take a look at file types. A file's
type describes the format of its contents, and knowing it allows the program that opens the file to be
chosen wisely. In Microsoft Windows, file extension globbing is the sole method of identifying file
types: users must end each file name with a recognised suffix (the extension), and files are matched
by that name to choose the correct icon and program.

Things are a little different in Ubuntu. Of course, globbing (the most basic method) is present for
file types, but Ubuntu has a few other tricks up its sleeve. One of these is magic numbers. The
magic number of a binary file is the first few bytes, which identify the file type. The definition of a
"magic" number has somewhat loosened in recent years; it can now mean any piece of data,
generally near the beginning of a file, that can be used to uniquely identify the type.
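You can look at a magic number yourself without a hex editor by dumping the first few bytes of a
file with od. As an illustration, the sketch below fakes the 8-byte signature that every PNG image
starts with (89 50 4E 47 0D 0A 1A 0A) in a temporary file; point od at a real file to see its own magic.

```shell
#!/bin/sh
# Fake a file that begins with the 8-byte PNG signature. Octal escapes are
# used because not every sh's printf understands \x escapes.
sample=$(mktemp)
printf '\211PNG\r\n\032\n' > "$sample"

# Dump the first 8 bytes as hex (-An drops the offset column), then strip
# the spacing so the signature reads as one string.
magic=$(od -An -tx1 -N 8 "$sample" | tr -d ' \n')
echo "first 8 bytes: $magic"

rm -f "$sample"
```

A real PNG file prints the same string, whatever its name or extension.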

Another powerful, but too rarely used, feature is XML namespace matching. Without it, XML-based
formats could only be told apart from generic XML files by extension globbing. Namespace matching
allows quick detection of an XML-based format using not only the namespace, but also the root
element. For example, XHTML files (application/xhtml+xml) can be matched not only by an .xhtml
file extension, but also by their namespace URI (http://www.w3.org/1999/xhtml) and root element
(html).
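To make the idea concrete, here is a deliberately crude shell sketch that pulls the root element name
and default namespace out of a simple XHTML file saved with no extension. root_info is a hypothetical
helper for illustration only: it assumes the root start tag sits on one line and declares xmlns="...",
whereas shared-mime-info uses a proper XML parser.

```shell
#!/bin/sh
# Crude illustration only: print the root element name and the default
# namespace of a small XML file. Assumes the root start tag sits on one
# line and declares xmlns="..."; shared-mime-info uses a real XML parser.
root_info() {
  # Skip <?xml ...?> and <!DOCTYPE ...> lines, take the first start tag,
  # then pull out the element name and the xmlns value.
  grep -v '^[[:space:]]*<[?!]' "$1" |
    grep -o '<[A-Za-z][^>]*>' |
    head -n 1 |
    sed -n 's|^<\([^ />]*\).*xmlns="\([^"]*\)".*|\1 \2|p'
}

# An XHTML page saved without any file extension.
page=$(mktemp)
cat > "$page" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Hi</title></head></html>
EOF

info=$(root_info "$page")
echo "$info"    # root element, then namespace URI
rm -f "$page"
```

The pair it prints (html and the XHTML namespace) is exactly the information a namespace rule keys on.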

How are file types detected?


In Ubuntu, programs such as Nautilus use the shared MIME database (from the shared-mime-info
package) as the sole source of file type information. Unfortunately, other Gnome facilities such as the
file open and save dialogs only use extension globbing, and are independent of the MIME database.
Much as programs can be located in four tiers (/bin, /usr/bin, /usr/local/bin and ~/bin), the MIME
database is split into tiers, found in the following directories:

/usr/share/mime
/usr/local/share/mime
~/.local/share/mime

As with the program tiers, it is generally agreed that only MIME types installed from Ubuntu
packages belong in the first tier. System-wide changes made by the user, or by programs installed
via make install, belong in the second tier, while changes local to one user go in the third.
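These three locations are really the defaults of the XDG Base Directory specification, which
shared-mime-info follows: the user database lives under $XDG_DATA_HOME (default ~/.local/share) and
the system databases under each entry of $XDG_DATA_DIRS (default /usr/local/share:/usr/share), with
the user's database taking priority. The sketch below, using a hypothetical mime_dirs helper, prints
them highest priority first.

```shell
#!/bin/sh
# Print the MIME database directories, highest priority first, falling
# back to the XDG Base Directory defaults when a variable is unset or empty.
mime_dirs() {
  data_home=${XDG_DATA_HOME:-$HOME/.local/share}
  data_dirs=${XDG_DATA_DIRS:-/usr/local/share:/usr/share}
  echo "${data_home%/}/mime"
  # Split the colon-separated list on ":" and print one database per entry.
  old_ifs=$IFS
  IFS=:
  for dir in $data_dirs; do
    echo "${dir%/}/mime"
  done
  IFS=$old_ifs
}

mime_dirs
```

With neither variable set, this prints ~/.local/share/mime, /usr/local/share/mime and /usr/share/mime.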

The directories inside these MIME databases represent MIME groups, for example ./video for
video/* MIME types, and ./application for application/* types. Not every group directory will exist;
they are created on demand as file types are added. Each one holds one XML file per type, named
after the MIME subtype (for example, video/x-matroska.xml). These files contain nodes with
information about magic numbers, extension globs, parent types, child and alias types, and the file
type description (often in multiple languages).

The update-mime-database command, invoked manually or by a dpkg trigger whenever packages are
changed, draws upon the XML source files in each database's packages subdirectory (the per-type
XML files above are generated from them too) and compiles them into formats that are fast to search
but aren't as friendly as XML. These real databases are in the following files:

aliases: alternate names for MIME types
generic-icons: generic icon names to fall back on for each type
globs: extension globbing without priority values (deprecated)
globs2: extension globbing with priority values (current)
icons: custom icons for odd file types
magic: magic number database
mime.cache: master cache with the entire database
subclasses: sub-class-of relationships (each type and its parent)
treemagic: detection of directory structures
types: a list of MIME types
XMLnamespaces: detection through XML namespaces and elements

A long time ago, when I was learning about the MIME database, I used Bless to edit these files
directly, and was always confused when my changes disappeared. This is because the conversion is
one way: the cache files are regenerated from the XML files every time update-mime-database runs,
overwriting any direct edits.

The structure of the XML files

Before we use programs to modify the MIME database for convenience, here's a quick breakdown
of the format of the XML files in the database. The root element is mime-info, with the shared
MIME info namespace:

<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info"/>

This root element contains any number of mime-type nodes, each providing detection information
about the file type named in its type attribute. You could even have an empty mime-info node, but
that isn't productive at all. The following is a selection of the most important elements that can be
found in mime-type nodes:

glob nodes with a simple wildcard glob in a pattern attribute. A weight attribute from 0 to 100 is optional, and defaults to 50:

<glob pattern="*.mkv" weight="55"/>

glob-deleteall and magic-deleteall nodes, which discard any globs or magic numbers cascaded from previously parsed databases and start afresh
magic nodes with an optional priority attribute from 0 to 100 (again defaulting to 50). These contain match nodes, which define rules for matching using magic numbers. These are the attributes used with match elements:

type: one of string, host16, host32, big16, big32, little16, little32 or byte
offset: where to check for the magic, using a single numeric offset or a range notated start:end
value: the value to match with (numeric for any type other than string)
mask: an optional attribute that allows more detailed matches by running a bitwise AND on the data in the file before comparing it with value. For numeric types the mask is a number; for strings it must be a hexadecimal value starting with 0x

<magic priority="60">
  <match type="string" offset="0" value="DVDVIDEO"/>
</magic>

alias nodes, with a type attribute specifying alternate or deprecated MIME types that are equivalent

<alias type="video/x-matroska-mkv"/>

sub-class-of nodes, with a type attribute specifying the parent MIME type
comment, acronym and expanded-acronym nodes that help describe the file type to people; xml:lang attributes can be used to distinguish languages
root-XML elements, which determine types using XML namespaces, with namespaceURI and localName (root element) attributes
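Here's a rough shell sketch of how a string match with a start:end offset could be evaluated: the
value is simply tried at every offset in the inclusive range. match_string is a hypothetical helper
written for illustration, not part of shared-mime-info, and it ignores masks and the numeric types.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the string VALUE appears in FILE at any
# offset in RANGE, where RANGE is a single offset "N" or an inclusive
# "start:end". Masks and the numeric match types are not handled.
match_string() {
  file=$1 range=$2 value=$3
  start=${range%%:*}
  end=${range#*:}                 # same as start when there is no colon
  len=${#value}
  off=$start
  while [ "$off" -le "$end" ]; do
    # Read len bytes at this offset and compare them with the value.
    chunk=$(dd if="$file" bs=1 skip="$off" count="$len" 2>/dev/null)
    [ "$chunk" = "$value" ] && return 0
    off=$((off + 1))
  done
  return 1
}

# Demo: a fabricated file with "DVDVIDEO" starting 4 bytes in.
sample=$(mktemp)
printf 'xxxxDVDVIDEO-VMG' > "$sample"
for r in 0 4 0:8; do
  if match_string "$sample" "$r" DVDVIDEO; then
    echo "offset $r: match"
  else
    echo "offset $r: no match"
  fi
done
rm -f "$sample"
```

The demo should report no match at offset 0 but a match at 4 and within 0:8, which is why a range is
handy when a signature's position varies slightly between files.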

Here's an example XML source file that uses a couple of these features (this file type is bogus, I
just created it for the example; the application/x-dml type name is equally made up):

<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="application/x-dml">
    <comment xml:lang="en-AU">DML source document</comment>
    <acronym>DML</acronym>
    <expanded-acronym xml:lang="en-AU">Delan's Markup Language</expanded-acronym>
    <sub-class-of type="application/xml"/>
    <glob-deleteall/>
    <glob pattern="*.dml"/>
    <root-XML namespaceURI="http://azabani.com/dml" localName="dml"/>
  </mime-type>
</mime-info>
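If you'd rather skip the GUI tools, a source file like this can be installed by hand: drop it into the
packages directory of a MIME database and run update-mime-database on that database. The sketch
below does exactly that, but into a throwaway directory by default so it can't disturb your real setup;
set DB to ~/.local/share/mime to install for your account. The mime-type element needs a type
attribute; application/x-dml is a made-up name for the bogus format, and dml.xml is an arbitrary file
name.

```shell
#!/bin/sh
# Install a package file into a MIME database and rebuild its caches.
# DB defaults to a throwaway directory so nothing real is touched; set
# DB="$HOME/.local/share/mime" to install for your own account instead.
DB=${DB:-$(mktemp -d)}
mkdir -p "$DB/packages"

# application/x-dml is a made-up type name for the bogus DML format.
cat > "$DB/packages/dml.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="application/x-dml">
    <comment xml:lang="en-AU">DML source document</comment>
    <sub-class-of type="application/xml"/>
    <glob pattern="*.dml"/>
    <root-XML namespaceURI="http://azabani.com/dml" localName="dml"/>
  </mime-type>
</mime-info>
EOF

# update-mime-database (from shared-mime-info) regenerates mime.cache,
# globs2, magic and the per-type XML files from everything in packages/.
if command -v update-mime-database >/dev/null 2>&1; then
  update-mime-database "$DB"
  ls "$DB"
else
  echo "update-mime-database not found; install shared-mime-info" >&2
fi
```

Listing the directory afterwards shows the generated files from the list above alongside packages/.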

Assogiate: a GUI editor for the Gnome MIME database


Assogiate is a neat little program that lets you create and modify file types in the database in a
very user-friendly and quick way. It can access the user database, ~/.local/share/mime, or the
system override database, /usr/local/share/mime. Changes are not, however, placed in XML files
named after the MIME type; instead they go into ./packages/Override.xml, which keeps a record of
the user-changed file types.

Assogiate can be found in the Ubuntu universe repository:

sudo apt-get install assogiate

If it is not, you can download and compile it:

curl http://azabani.com/files/apps/assogiate-0.2.1.tar.gz | tar xvz
cd assogiate-0.2.1 && ./configure && make && sudo make install

You aren't allowed to change the system override database without running the program as a
privileged user, so always run it as root:

gksu assogiate

In the Assogiate window, you can use the toolbar buttons to add and modify selected file types,
remove and revert changes, or search for file types. The left pane allows you to narrow your view
to groups of MIME types, or to user-modified types.

Adding and editing file types

The process for these two actions is very similar. When you are in the Edit Type dialog, you can
edit canonical information, alias and parent types, globbing, magic numbers and XML namespace
matching each in its own tab.

Bless: a useful GUI-based hex editor


Bless is a very powerful hex editor for inspecting and modifying binary files. One very important
use of this program is helping to find magic numbers for file types. It can also be found in the
Ubuntu repositories as bless:

sudo apt-get install bless

To inspect a file, simply pass the file as the first argument:

bless myfile

For example, in the image above (a QuickTime .mov container video), a common candidate for a
magic number is located at offset 4: the string "ftypqt". Always check multiple files of a known type
to make sure that the magic number is consistent. Once a magic number has been identified using
Bless, Assogiate can be used to add it.

Once information about a file type is altered, the changes take effect immediately. If you are adding
a type, you can reload a directory containing a file of that type in Nautilus and watch the icon update,
or right-click the file and choose Properties to confirm the detected file type.

Example: adding a magic number for Matroska video


This is a good example because, by default, Ubuntu only matches Matroska video with extension
globbing and a magic number that no longer works. In this example, we'll research the magic
number for this file type, and use Assogiate to allow Ubuntu to detect Matroska video without an
extension on the file.

Firstly, let's open two Matroska files in Bless:

bless Owl\ City:\ Fireflies
bless Remo\ Giazotto\'s\ Adagio\ in\ G\ minor

Looking at the lower half of the window, you can see that the two Matroska video files share the
first four bytes, which is a convenient place for a magic number. Note that the "Show little endian
decoding" checkbox is not checked; we will be using a 32-bit big endian value.

Open Assogiate, search for Matroska and open the editor for the video/x-matroska file type:

gksu assogiate

In the File contents tab, click Add. The priority should be higher than 50, as this is a more reliable
way of determining the file type, though changing it is essentially optional. The value to add is a
big32 value of 440786851 at the very start of the file (offset 0):

That's it! Ubuntu (and programs that use the MIME database, such as Nautilus) will now detect
Matroska videos using their magic number, without the need for an extension.
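Where does 440786851 come from? The four bytes shared by the Matroska files are 1A 45 DF A3 (the
EBML header that Matroska files start with), and read as a single big-endian 32-bit integer that is
0x1A45DFA3, or 440786851 in decimal. The sketch below does the conversion with plain shell tools;
big32_at_start is a hypothetical helper and the sample file is fabricated.

```shell
#!/bin/sh
# Hex to decimal: printf accepts C-style 0x constants as numeric arguments.
printf '%d\n' 0x1A45DFA3

# Read the first four bytes of a file and combine them big-endian style,
# which is what a big32 match at offset 0 compares against.
big32_at_start() {
  # od prints each byte as an unsigned decimal; "set --" splits them up.
  set -- $(od -An -tu1 -N 4 "$1")
  [ $# -eq 4 ] || return 1               # file shorter than 4 bytes
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# A fabricated file starting with the bytes 1A 45 DF A3 (octal escapes).
sample=$(mktemp)
printf '\032\105\337\243rest of file' > "$sample"
big32_at_start "$sample"
rm -f "$sample"
```

Both commands print 440786851, the number entered in Assogiate.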

Example: adding a magic number for .IFO files


It has always annoyed me that Ubuntu didn't have the magic number for the DVD structure files
(IFO files); I couldn't open them automatically in SMPlayer without using the "Open with Other
Application" dialog. Even then, it seemed messy, as the files were matched only by their extension,
so the identical .BUP files weren't matched at all. In this example, I will create a new file type with
the magic number, so that both .IFO and .BUP files are detected.

To start off, open Assogiate:

gksu assogiate

Create a new file type. We must choose a MIME type name; there is no formal name for this format,
so I chose video/x-dvd-information:

In Bless, I observed that these files all start with the string "DVDVIDEO", which makes a good magic
number. In the File contents tab, add it:

The changes will take effect immediately and that's all that's needed to allow these files to be
recognised. However, upon opening the file, Ubuntu won't know what program to use. You can
choose the application to open with, or use Ubuntu Tweak to associate groups of MIME types
easily in one place.
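Before relying on the new type, it can be worth confirming the observation across a whole disc: every
.IFO file should start with DVDVIDEO, and its .BUP backup should be identical. The sketch below
checks both; check_video_ts is a hypothetical helper, and it assumes the usual upper-case VIDEO_TS
file names.

```shell
#!/bin/sh
# For every .IFO file in a VIDEO_TS directory, check that it starts with
# "DVDVIDEO" and that its .BUP backup is byte-for-byte identical to it.
check_video_ts() {
  for ifo in "$1"/*.IFO; do
    [ -e "$ifo" ] || continue            # no .IFO files at all
    bup=${ifo%.IFO}.BUP
    head8=$(dd if="$ifo" bs=1 count=8 2>/dev/null)
    if [ "$head8" = "DVDVIDEO" ] && cmp -s "$ifo" "$bup"; then
      echo "ok:      ${ifo##*/}"
    else
      echo "suspect: ${ifo##*/}"
    fi
  done
}

# Fabricated stand-ins for a real disc's VIDEO_TS directory.
dir=$(mktemp -d)
printf 'DVDVIDEO-VMG...' > "$dir/VIDEO_TS.IFO"
cp "$dir/VIDEO_TS.IFO" "$dir/VIDEO_TS.BUP"
printf 'DVDVIDEO-VTS...' > "$dir/VTS_01_0.IFO"   # backup deliberately missing
check_video_ts "$dir"
rm -r "$dir"
```

On a real disc, run it against the mounted VIDEO_TS directory; every line should read "ok".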

Enter Ubuntu Tweak


Ubuntu Tweak is an all-in-one solution for modifying various settings about your computer. It
handles everything from software sources to package cleaning, session control to theme settings,
power management to templates and security settings, even file type associations. Note that
Ubuntu Tweak cannot be used to add or modify file type identification info; that can be done with
Assogiate. Ubuntu Tweak allows you to associate a default program to a MIME type.

To install Ubuntu Tweak, download the package file or add the Ubuntu Tweak PPA for automatic
updates. For the PPA, first import its signing key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys e260f5b0

Then add the following line to /etc/apt/sources.list, replacing RELEASE with your distribution
codename (e.g. lucid):

deb http://ppa.launchpad.net/ubuntu-tweak-testing/ppa/ubuntu RELEASE main
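If you'd rather not look up the codename by hand, lsb_release -cs prints it, and the line can be built
from that. A small sketch: ppa_line is a hypothetical helper, it falls back to lucid (the example
codename above) when lsb_release is missing, and the actual append is left as a comment since it
needs root.

```shell
#!/bin/sh
# Hypothetical helper: build the PPA line for a given release codename.
ppa_line() {
  echo "deb http://ppa.launchpad.net/ubuntu-tweak-testing/ppa/ubuntu $1 main"
}

# lsb_release -cs prints the codename of the release you are running;
# fall back to "lucid", the example codename, if it is not installed.
if command -v lsb_release >/dev/null 2>&1; then
  ppa_line "$(lsb_release -cs)"
else
  ppa_line lucid
fi

# To append it to sources.list for real (needs root), you could run:
#   ppa_line "$(lsb_release -cs)" | sudo tee -a /etc/apt/sources.list
```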

Finally, install the program:

sudo apt-get update && sudo apt-get install ubuntu-tweak

The file/program associations are per-user; run Ubuntu Tweak as yourself:

ubuntu-tweak

All that needs to be done to associate an application for your account with a MIME type is to
visit the File Type Manager page, and associate the program with the file type just as you would in
Nautilus. Again, the changes take effect immediately, though if they don't, your local association
database may be overriding the changes.

Summary

Congratulations! You now know how to add, modify and associate file types, understand how the
MIME database works, and have the power of Ubuntu at your fingertips. The instructions here apply
unchanged to many Linux distributions that share similar software, so this tutorial applies to Linux
on the whole as well.
