Filesets are an important part of Moa: they define the input and output files of Moa jobs. In principle, a fileset is little more than a collection of files. There are three different types:


Type “set”

A “set” fileset takes a filesystem glob, checks the filesystem, and returns the list of files that match the glob pattern. “set” filesets are typically used to define the input of a Moa job. A “set” fileset can (currently) contain only one * wildcard. A correct example would be:


This glob does exactly what you expect. Let's assume that there are three sequences in this directory; the set would then contain three filenames:


More complex patterns, and wildcards other than *, are not supported (yet). Each Moa job can have at most one “set” fileset.
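A minimal Python sketch of how a “set” fileset could be resolved, applying the restrictions above literally. This is not Moa's code: `is_supported_set_pattern` and `expand_set` are hypothetical names. The sketch rejects patterns with more than one * or with other glob wildcards, then expands the glob with the standard `glob` module.

```python
import glob
import os
import tempfile


def is_supported_set_pattern(pattern: str) -> bool:
    """True if the glob uses at most one '*' and no other wildcard characters."""
    return pattern.count("*") <= 1 and not any(ch in pattern for ch in "?[]")


def expand_set(pattern: str) -> list[str]:
    """Return the files matching a 'set' glob, sorted for a stable order."""
    if not is_supported_set_pattern(pattern):
        raise ValueError(f"unsupported 'set' pattern: {pattern!r}")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


if __name__ == "__main__":
    # Hypothetical directory holding three FASTA files and one unrelated file.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("seq1.fasta", "seq2.fasta", "seq3.fasta", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        matches = expand_set(os.path.join(tmp, "*.fasta"))
        print([os.path.basename(p) for p in matches])
        # ['seq1.fasta', 'seq2.fasta', 'seq3.fasta']
```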

Type “map”

A “map” fileset converts a “set” fileset (the source) into a related fileset, typically to derive the output of a Moa job. A “map” fileset must be linked to a “set” fileset and uses a glob-like pattern to convert the input “set” fileset into the resulting fileset. For example, if we take the fileset defined above and apply the following pattern:


we would end up with the following “map” fileset:


A potential pitfall is the following situation, where we have a “set” fileset defined as follows:


This would result in exactly the same fileset as above. But if we now apply the same “map” pattern, the resulting output fileset would be:


This is because the * in the “set” glob maps to the * in the “map” pattern; the rest of the original filename is dropped. This can be useful: in a BLAST job, for example, you could specify the following “map” pattern:


which would result in the following output:


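The substitution rule described above can be sketched in Python. This is not Moa's implementation; `set_glob_to_regex` and `map_fileset` are hypothetical names, and the sketch assumes that * matches within a single path component, as in a shell glob. It captures whatever the “set” glob's * matched and substitutes it into the “map” pattern, which also reproduces the pitfall: a bare * in the “set” glob captures the extension as well.

```python
import re


def set_glob_to_regex(set_glob: str) -> re.Pattern:
    """Compile a one-'*' glob into a regex whose only group captures the '*' part."""
    before, star, after = set_glob.partition("*")
    if not star:
        raise ValueError(f"expected one '*' in {set_glob!r}")
    # Like a shell glob, '*' is assumed not to cross directory separators.
    return re.compile(re.escape(before) + r"([^/]*)" + re.escape(after) + r"\Z")


def map_fileset(set_glob: str, map_pattern: str, files: list[str]) -> list[str]:
    """Map each file of a 'set' fileset onto a one-'*' 'map' pattern."""
    regex = set_glob_to_regex(set_glob)
    mapped = []
    for path in files:
        match = regex.match(path)
        if match is None:
            raise ValueError(f"{path!r} does not match {set_glob!r}")
        # Only the text captured by '*' survives; directory and suffix are dropped.
        mapped.append(map_pattern.replace("*", match.group(1)))
    return mapped


if __name__ == "__main__":
    files = ["input/seq1.fasta", "input/seq2.fasta"]
    print(map_fileset("input/*.fasta", "out/*.out", files))
    # ['out/seq1.out', 'out/seq2.out']
    print(map_fileset("input/*", "out/*.out", files))
    # ['out/seq1.fasta.out', 'out/seq2.fasta.out']
```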
A “map” pattern may also contain a second wildcard, for example:


in which case the first wildcard is replaced with the original path. In the above example this would result in:


(Note: you might not want to do this.)
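A small Python sketch of the two-wildcard case. The phrase “the original path” is interpreted here as the directory of the source file; that reading is an assumption, as is the hypothetical function name `fill_two_wildcards`. The `stem` argument is the text matched by the * of the “set” glob.

```python
import os


def fill_two_wildcards(map_pattern: str, source_path: str, stem: str) -> str:
    """Fill a two-'*' map pattern: first the source file's directory, then the stem."""
    parts = map_pattern.split("*")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '*' in {map_pattern!r}")
    head, middle, tail = parts
    # Assumption: "the original path" means the directory of the source file.
    return head + os.path.dirname(source_path) + middle + stem + tail


if __name__ == "__main__":
    print(fill_two_wildcards("*/*.out", "input/seq1.fasta", "seq1"))
    # input/seq1.out
```

Under this reading, output files end up in the input directory, which is one reason you might want to avoid this pattern.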

Type “single”

A “single” fileset is the simplest type: it points to a single file. No wildcards are allowed.


Moa has to keep track (using Ruffus) of the input and output of a job, and it does this by tracking filesets. The category defines whether a fileset is considered “input”, “output”, or a “prerequisite”. Input and output speak for themselves. A prerequisite is also treated as input (i.e. if it changes, the job is repeated), but it is typically kept out of the one-to-one file mapping that takes place between input and output files.
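The difference between mapped inputs and prerequisites can be illustrated with a small Python sketch. It is not how Moa or Ruffus are implemented: `Job`, `plan_jobs`, and `needs_rerun` are hypothetical names, and change detection is assumed to be based on modification times (passed in as a dict so the example needs no real files). Each input file gets its own job, while every prerequisite is attached to all jobs, so a changed prerequisite triggers a rerun of every job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    inputs: list[str]  # the one mapped input file plus every prerequisite
    output: str


def plan_jobs(pairs: list[tuple[str, str]], prerequisites: list[str]) -> list[Job]:
    """One job per input/output pair; prerequisites are shared by all jobs."""
    return [Job(inputs=[src] + prerequisites, output=dst) for src, dst in pairs]


def needs_rerun(job: Job, mtime: dict[str, float]) -> bool:
    """Rerun if the output is missing or older than any input, prerequisites included."""
    if job.output not in mtime:
        return True
    return any(mtime[f] > mtime[job.output] for f in job.inputs)


if __name__ == "__main__":
    jobs = plan_jobs([("in/a.fasta", "out/a.out"), ("in/b.fasta", "out/b.out")], ["db/nt"])
    times = {"in/a.fasta": 1, "in/b.fasta": 1, "db/nt": 3, "out/a.out": 2, "out/b.out": 2}
    print([needs_rerun(j, times) for j in jobs])
    # [True, True]  (the prerequisite is newer than both outputs)
```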

Defining filesets

If you are developing a template, there is a whole section devoted to filesets. The following example is taken from the Moa BLAST template and contains almost everything you will come across:

        category: prerequisite
        help: Blast database
        optional: false
        pattern: '*/*'
        type: single
        category: input
        help: Directory with the input files for BLAST, in Fasta format
        optional: false
        pattern: '*/*.fasta'
        type: set
        category: output
        help: GFF output files
        optional: true
        pattern: gff/*.gff
        source: input
        type: map
        help: XML blast output files
        category: output
        optional: true
        pattern: out/*.out
        source: input
        type: map

Most of this speaks for itself. A few things to note are:

  • Both “outgff” and “output” are “map” filesets of category “output”, and both map from the same “set” fileset (“input”). This is common practice. The map22 template even contains an example of a “map” fileset with category “input”.
  • If a fileset has a reasonable default pattern (as is typical for output filesets), it can be made optional.
  • Please provide a good help text.
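The consistency rules stated in this document can be checked mechanically. The Python sketch below is only an illustration with a hypothetical `check_filesets` function: it verifies known categories and types, at most one “set” fileset, that every “map” fileset is sourced from a “set” fileset, and that each fileset has a help text. The BLAST example is written as a plain dict to avoid a YAML dependency, and the name `db` for the first fileset is an assumption, since its key is not shown above.

```python
VALID_CATEGORIES = {"input", "output", "prerequisite"}
VALID_TYPES = {"single", "set", "map"}

# The BLAST example as a Python dict; the key "db" is a hypothetical name.
BLAST_FILESETS = {
    "db": {"category": "prerequisite", "help": "Blast database",
           "optional": False, "pattern": "*/*", "type": "single"},
    "input": {"category": "input",
              "help": "Directory with the input files for BLAST, in Fasta format",
              "optional": False, "pattern": "*/*.fasta", "type": "set"},
    "outgff": {"category": "output", "help": "GFF output files", "optional": True,
               "pattern": "gff/*.gff", "source": "input", "type": "map"},
    "output": {"category": "output", "help": "XML blast output files", "optional": True,
               "pattern": "out/*.out", "source": "input", "type": "map"},
}


def check_filesets(filesets: dict[str, dict]) -> list[str]:
    """Return a list of problems found in a template's fileset definitions."""
    problems = []
    sets = [name for name, fs in filesets.items() if fs.get("type") == "set"]
    if len(sets) > 1:
        problems.append(f"more than one 'set' fileset: {sorted(sets)}")
    for name, fs in filesets.items():
        if fs.get("category") not in VALID_CATEGORIES:
            problems.append(f"{name}: unknown category {fs.get('category')!r}")
        if fs.get("type") not in VALID_TYPES:
            problems.append(f"{name}: unknown type {fs.get('type')!r}")
        if not fs.get("help"):
            problems.append(f"{name}: missing help text")
        if fs.get("type") == "map":
            # A 'map' fileset must name an existing 'set' fileset as its source.
            source = filesets.get(fs.get("source"), {})
            if source.get("type") != "set":
                problems.append(f"{name}: source {fs.get('source')!r} is not a 'set' fileset")
    return problems


if __name__ == "__main__":
    print(check_filesets(BLAST_FILESETS))
    # []
```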