TagFS, tracking progress in the field of semantic file systems

Category: Linux,WIP   — Published by tengo on October 4, 2009 at 5:38 am

Keeping an eye on recent developments in storage, data archival and file systems, I think one of the more interesting approaches to saving data on disk is to flatten folder hierarchies and introduce a file system based on tags instead. Each file can be "saved" under several "categories" of metadata, so you never have to search for files in deep directory hierarchies again.

Why using directories to organize files and data is (or can be) a bad idea is nicely illustrated on John's blog code@work here. The basic idea is a number of files, each belonging to one or more tags:

My Test Doc, belonging to Document
John’s Report, belonging to Document, Report
Yearly Report, belonging to Document, Report
My Recruiting Report, belonging to HR, Document, Report
Victor’s Resume, belonging to HR, Document, Resume
HR Evaluation, belonging to HR, Document, Report

Each of these tags, like "Document", is in turn its own directory; entering it lists all the files belonging to the "Document" group. The concept becomes clearer when reading questor's take on it. He also provided a Dokan-based prototype implementation there. Other sources discuss the topic, one being an Ubuntu idea, but as far as I know, few projects have got beyond a proof-of-concept or early development stage. Another thread discussing actual implementations points out that "database oriented filesystems are the future".
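
To make the "tag as directory" idea concrete, here is a minimal Python sketch using the example files from the list above. It is not taken from any of the projects mentioned here; the names `build_index` and `list_dir` are made up for illustration, and treating a nested path such as `HR/Report` as "files carrying both tags" is an assumption about how such a file system might behave.

```python
from collections import defaultdict

# Files and tags from the example list above.
FILES = {
    "My Test Doc": {"Document"},
    "John's Report": {"Document", "Report"},
    "Yearly Report": {"Document", "Report"},
    "My Recruiting Report": {"HR", "Document", "Report"},
    "Victor's Resume": {"HR", "Document", "Resume"},
    "HR Evaluation": {"HR", "Document", "Report"},
}

def build_index(files):
    """Invert the file -> tags mapping into tag -> files (hypothetical helper)."""
    index = defaultdict(set)
    for name, tags in files.items():
        for tag in tags:
            index[tag].add(name)
    return index

def list_dir(index, path):
    """List the files carrying every tag named in a path such as 'HR/Report'.

    Assumption: nested tag directories mean intersection of tags.
    An empty path lists the available tags, like a root directory would.
    """
    tags = [part for part in path.split("/") if part]
    if not tags:
        return sorted(index)
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)

index = build_index(FILES)
print(list_dir(index, "Document"))   # every file tagged "Document"
print(list_dir(index, "HR/Report"))  # files tagged both "HR" and "Report"
```

Entering "Document" lists all six files, while "HR/Report" narrows the listing to the two HR reports, without any of the files living in a fixed place in a hierarchy.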

Further reading:

Below is a list of current projects, beta implementations and the like, in no particular order:

  • SemFS, formerly TagFS (University of Karlsruhe)
    "SemFS is a semantic file system based on "RDF":http://www.w3.org/TR/rdf-primer/. On Windows, it can be mounted as a WebDAV drive. For Linux, SemFS can be used as a user level file system via FUSE. SemFs currently supports tagging of files and browsing according to various ontologies.", described here and here in slides - untested, Linux, Win32 in development
  • DBFS or kdbfs (University of Twente)
    "It is a new type of file system that does away with places where you store your files. Actually do not think of it as a file system, instead think of it as a document system.", described in slides here, includes a custom browser application, untested, Linux only
  • Tagsistant
    "A reasoning semantic filesystem for Linux and BSD", also see the Wikipedia page about it, appears to be quite mature, is now based on SQLite - untested
  • Stratus
    "Stratus is a filesystem overlay that allows you to organize your files by tags, in the style of Flickr, YouTube, Danbooru, etc.", Linux+FUSE+SQLite, (SF page) - untested
  • TaggedFS, works without a central db but with sidecar files - untested
  • NHFS
    "NHFS allows you to file any file into any number of directories. Likewise, you may place any directory into as many directories as you like. NHFS therefore allows you to create a nonhierarchical directory structure with polyhierarchically connected files." Linux+FUSE, untested
  • TagLayer (github)
    A working implementation of a read-only tag-filesystem, which can be mounted alongside the traditional hierarchical dir-tree - FUSE+Perl+SQLite
  • xtagfs
    "XTagFS is a FUSE filesystem that organizes files/folders in Mac OS X using 'Spotlight Comment' tags. Tags are represented as folders in XTagfs and tagged files are stored as links within them." - untested
  • LCARS Deskworks
    A tag-based approach under the GUI of the Star Trek LCARS desktop - untested
  • Nascent's TagFS
    "This is an attempt at a proof of concept. The idea is to show that browsing files via tags is a viable option. This is in alpha and as it is only a sim, it is likely to stay that way." early stage, Linux - untested
  • tag-fs
    "This is a small, userspace filesystem based on tags, rather than on directories." - untested
  • MetaFS
    "MetaFS is an enhanced filesystem layer for Linux. It provides additional information about files (such as MP3 tags) to the user through standard filesystem interfaces (such as extended attributes). It also provides more flexible file management, by building search and other such services into the filesystem itself." - untested
  • MetaFS-Prototype
    From the developer: "I've implemented tags, but the tags can contain also values. For example the title-tag of an mp3-file can contain the title of the music-file. With this enhancement it's much easier to filter for tags because you can specifiy two things(tag and value)." - untested
  • tagfs
    "An AI that will take the tags associated with files to create a hierarchy of tags that would then be browsed" - untested, work has halted in unusable state
  • labelfs
    "...in addition to files (uris), labels can also be labeled with other labels, creating a Directed Graph of labels and uris. So you can create a label hierarchy (tag hierarchy)." Screenshots of the GUI here. -- added on Feb 24, 2012
  • dhtfs
    "Tagging based filesystem, providing dynamic directory hierarchies based on tags associated with files" a usable implementation, last release 2007, based on Python and FUSE, hosted on code.google.com
  • Leaftag
    "Tagging for the Linux desktop" another implementation, last release 2006 http://www.chipx86.com/w/index.php/Leaftag (offline!)
  • Windows Vista and Windows 7 allow tagging of files (and folders?). I don't know how it is implemented, as Microsoft pulled the plug on Windows Future Storage, WinFS. Read.
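
Several of the projects above keep their tags in SQLite, and the MetaFS-Prototype adds values to tags. As a rough illustration of how the two ideas combine, here is a small Python sketch using the standard `sqlite3` module. The schema and the function names (`open_store`, `tag_file`, `find`) are hypothetical and not taken from any of the listed projects.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a tag database. The schema is an illustrative assumption."""
    con = sqlite3.connect(path)
    con.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            id   INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            name    TEXT NOT NULL,
            value   TEXT NOT NULL DEFAULT '',
            PRIMARY KEY (file_id, name, value)
        );
    """)
    return con

def tag_file(con, path, name, value=""):
    """Attach a tag, optionally with a value, to a file path."""
    con.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    (file_id,) = con.execute(
        "SELECT id FROM files WHERE path = ?", (path,)
    ).fetchone()
    con.execute(
        "INSERT OR IGNORE INTO tags (file_id, name, value) VALUES (?, ?, ?)",
        (file_id, name, value),
    )

def find(con, name, value=None):
    """Return paths carrying the tag; filter on the value too if one is given."""
    sql = ("SELECT DISTINCT f.path FROM files f "
           "JOIN tags t ON t.file_id = f.id WHERE t.name = ?")
    params = [name]
    if value is not None:
        sql += " AND t.value = ?"
        params.append(value)
    return [row[0] for row in con.execute(sql + " ORDER BY f.path", params)]

con = open_store()
tag_file(con, "a.mp3", "title", "Song A")
tag_file(con, "b.mp3", "title", "Song B")
tag_file(con, "a.mp3", "music")
print(find(con, "title"))            # all files that have a title tag
print(find(con, "title", "Song B"))  # filter on tag and value
```

Filtering on two things (tag and value) is then a single extra condition in the query, which is the advantage the MetaFS-Prototype developer describes.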

Related:

  • Disco Distributed Filesystem: probably overkill for desktop usage. DDFS is the storage component of Nokia's actively maintained map/reduce framework. It is designed to run on clusters and relies on tags alone to locate blobs of data. Exposes only an API. Mostly in Erlang.
  • WinFS (short for Windows Future Storage, see Wikipedia page)
  • Apple Spotlight
  • Nepomuk
  • Facetmap
  • iTunes music library browsing
  • Comment "...stores comments on a per directory basis"
  • Xapian
  • Amazon S3 files are accessed by a key-value approach, leaving the organisation to metadata

This page is sort of a work-in-progress page, so it might receive updates in the future.
Comments welcome!