Welcome to Moa!¶
Command line workflows in bioinformatics
Moa aims to assist a bioinformatician to organize, document, share, inspect, execute and repeat workflows in a command line environment - without losing any of the flexibility of the command line (see Goals).
Download and installation instructions can be found in: Installation.
*NOTE: The software (and manual) are under development. Things might still change.*
Introduction¶
These days, generating massive amounts of data is an everyday element of biological research, and almost all projects have a computational biology, or bioinformatics, component. Such embedded work commonly consists of chaining a number of third-party tools together, often with some data manipulation between the steps. It is important to have such projects properly organized, particularly when a project grows bigger.
There are many different ways to organize a bioinformatics project. Many bioinformaticians use the command line, scripts or Makefiles to organize and automate their work. This approach has obvious advantages, most importantly flexibility. With almost any approach, meticulous care needs to be taken to keep a project well organized and documented. If this is not done, it is easy to lose track, certainly when others have to try to make sense of your project.
Moa hopes to make meticulous organization of a command line project much less of a burden - leaving you to focus on the fun parts.
Thoughts on workflow organization¶
Most (bioinformatics?) projects start small and grow over time. From that perspective it is advisable to give the organization of your project some thought beforehand.
When using Moa, a workflow resides in a directory tree, with each directory containing a separate analysis step. A Moa job is linked to a directory, and one directory can contain only one Moa job. The in- and output data of each analysis typically reside in the same directory structure. Having both structure and data as regular files on your file system makes a workflow extremely accessible. It is, however, important that the directory structure represents the workflow in a logical manner.
There are likely multiple ways of achieving a healthy organization of a bioinformatics (Moa) project; we propose the following:
- On the highest levels organize your project according to fundamental divisions in the project or data source. For example, if you work with data from multiple organisms, that might be a good top level division.
- On lower levels, start organizing your annotation pipeline.
Table of contents:¶
Goals¶
The objective of Moa is to assist a bioinformatics project by keeping it:
Organized:
Moa facilitates project organization in many ways, small and large, by providing a uniform way to capture commands as Moa jobs. Each Moa job is linked to a specific directory, which contains all configuration, templates, data and intermediate data as files within the directory structure.
Documented:
Moa provides the possibility to add a title, description and changelogs to each job.
Reproducible:
By having all templates and configuration copied into a workflow, the workflow never changes (unless the user wants it to), even if the templates in the repository change. Moreover, all templates are easy to find and inspect, so it is always clear what happened.
Reusable & Shareable:
Moa provides reusable templates. New templates are easy to create, adapt and share. Workflows can be archived and reused with different data.
Flexible:
Moa provides a good number of hooks to insert custom code into a workflow, making that code part of the workflow. This ensures maximum flexibility.
Installation¶
Prerequisites¶
Moa is developed and tested on Ubuntu and RHEL and is expected to operate without problems on most modern Linux distributions. Moa requires Python (2.6 or 2.7); it will not work with earlier versions or with Python 3.
Recommended prerequisites are:
python-dev: the Python development libraries. A number of prerequisites installed by pip or easy_install will try to compile C libraries and need this. Although all of them have a pure-Python fallback, performance will suffer without the C versions. On a Debian-based distribution, call:
sudo apt-get install python-dev
While on RHEL-flavoured distributions, users might run:
sudo yum install -y python-devel
- python-yaml: this installs a faster YAML parser, as opposed to the pure-Python YAML parser you would probably get when installing through pip or easy_install. On a Debian-based distro:
sudo apt-get install python-yaml
While on RHEL-flavoured distributions, users will find this in the EPEL repository and might want to run:
sudo yum install -y pyyaml
Git integration¶
One feature of Moa is the ability to integrate with Git to keep track of your workflow. If you want to use this, you (obviously) need Git installed. For most applications the package manager's version is fine. However, Moa is also able to pull templates from git repositories; if you want to use that feature, you must install git subtree. This application comes bundled with recent versions of Git (certainly with 1.8) but still needs to be installed separately. Otherwise, it can be downloaded from the “apenwarr” repository.
Blue Ringed Octopus¶
Blue Ringed Octopus is a (randomly named) repository with a number of helper scripts, used by a number of templates. You might want to install this - just check out the repository and either add it to your PATH, or copy the scripts to a location in the PATH. The repository can be found here:
https://github.com/mfiers/Blue-Ringed-Octopus
Installation of Moa¶
It is most convenient to install Moa from the Python package index:
pip install Moa
(You might need root rights to do this. pip is a replacement for easy_install; if you prefer, easy_install Moa works as well.) You will obviously need pip installed to run the command above.
Note that it is possible, and even recommended, to install Moa within a virtual environment.
Moa should now work; try moa --help.
Bash integration¶
Moa comes with a number of functions to improve integration with Bash. To turn these on, execute the following command (or add this to your ~/.bashrc):
source $(moainit)
This does a number of things:
adds an alias msp for moa set process
adds tab completion
records a bash history for each separate Moa job: equivalent to your regular bash history, but stored with the job.
Note that this is a potential privacy concern: commands that are not related to your workflow will be recorded (and possibly shared) as well. If you want to remove the history of a job, delete .moa/local_bash_history. For a complete workflow, run (in the root of that workflow):
find . -name local_bash_history | xargs rm
The local_bash_history file is, however, not tracked by the Git module (unless specified explicitly).
Manual installation (from Github)¶
When installing manually, you’ll need the following prerequisites:
Once these are installed, you can get Moa from Github. Run the following command (in an appropriate location):
git clone git://github.com/mfiers/Moa.git
To install Moa, run:
cd Moa
python setup.py install
If this is for a global installation, you’ll need to be root, or use sudo.
Moa should now work; try moa --help.
Troubleshooting¶
A potential problem could be that your default Python version is not Python 2.6 or 2.7. There are a few options you can pursue:
- Make sure python2.6 or 2.7 is installed.
- define an alias in your ~/.bashrc: alias moa='python2.7 moa'
- create a symlink to python2.7 in your ~/bin directory and make sure that directory is first in your PATH - but note that this will change the Python version for all your user scripts.
Tutorial 1¶
(note, to fully use the blast template - you will need the blastReport script from the Blue Ringed Octopus repository).
This quick start aims to help you understand how Moa can help to organize a command line bioinformatics project.
Each Moa workflow consists of separate Moa jobs. An important feature of Moa is that each Moa job resides in a directory, and each directory can hold only one Moa job. A workflow is organised as a directory tree, where the structure of the directory tree reflects the structure of the project. This will (hopefully) stimulate a user to break a workflow down into atomic parts, which will be beneficial to the organization and coherence of a workflow. So, starting a Moa project starts with creating a directory to hold the workflow:
$ mkdir test.project
$ cd test.project
$ mkdir 00.proteins
## copy some protein sequences in 00.proteins
$ mkdir 10.blast
$ cd 10.blast
The steps can be ordered by prefixing directory names with a number; note that this is not enforced by Moa. Once a directory is created, a Moa job can be created (see moa new):
$ moa new blast -t "demo run"
All interaction with Moa is done through a single command: moa. It is, at all times, possible to get help on the use of the moa command by invoking moa -h or moa --help. The command above creates a BLAST job titled “demo run” in the current directory. All Moa related files are stored in a (hidden) sub-directory named .moa (go and have a look!). A Moa job consists, amongst other things, of a configuration file (.moa/config) and a number of template files (.moa/template and/or .moa/template.d/*). All template files are copied into the .moa directory. This ensures that a workflow remains the same over time, even if the templates are updated. If you want to copy the latest version of a template to a Moa job, use moa refresh.
Moa also tries to assist in embedding documentation. In the command line above, the -t parameter sets a mandatory project title (a job won’t execute without a title). Moa also automatically records a changelog (in .moa/doc/change). You can add your own changelog messages by using the -m argument (before the command!) or by using moa change. Additionally, you can keep a “blog” (moa blog) for a higher level record of the development of the work, and a “readme” (moa readme) to create a document for each job.
Back to the blast job - it is obviously not enough to tell Moa to do a BLAST analysis. Some extra information is necessary (see moa set):
$ moa set db=/data/blast/db/nr
A few points are important to note: do not use spaces around the = sign; if you want to define a parameter with spaces, use quotes (key="value with spaces") and be very aware of bash expansion. A safer way to enter parameters is by running moa set db; Moa will then query you for the value (note that in both cases you can use tab completion).
If you want to check what the parameters are, you can use moa show, which will give you a list of parameters known to Moa:
$ moa show
db l.M /data/blast/db/nr
input d.M */*.fasta
jobid s.o blast
title l.M demo run
...
Note the variables db and title, which were set earlier. If you run moa show -a, more parameters will be revealed, amongst which is program. The flags between the variable key and value are explained in: moa show.
We will now set two more variables:
$ moa set program=blastp
$ moa set input=../00.proteins/*.fasta
The last statement defines the input files to blast. Once all is set you can actually run (see moa run) the BLAST analysis with:
$ moa run
Moa now performs the BLAST analysis on each of the input files. The output can be found in the out sub-directory. As an extra, the Moa blast template generates a blast_report file with a simple one-line report for the best five hits of each query sequence.
To illustrate how easy it is to embed extra command lines into your workflow, we will check for the presence of any dicer genes in the query set by employing grep:
$ grep -i dicer blast_report
To embed this in the workflow, execute:
$ moa set postcommand
and, at the prompt enter:
postcommand:
> grep -i dicer blast_report > dicer.out
If you now rerun moa run, the BLAST job will not be repeated, but the postcommand will be executed and a dicer.out file will be generated. (Note: there is also a precommand.)
If this is all clear, continue with Tutorial 2.
Tutorial 2¶
Note, you will need to have git and git subtree installed for this tutorial.
Configuring Moa¶
Moa is configured using the command line tool moa. For example, if you create a simple job somewhere:
$ moa simple -t 'test job' -- echo "Hello"
and you would like to change the title, you can do this with moa set:
$ moa set title='I mean: Hello!!'
When setting parameters on the command line you need to consider the fact that bash might try to expand or interpret the command line. For example, if you would like to set the process parameter to echo “it’s complicated”, you would need the following command line:
$ moa set process='echo "it'\''s complicated"'
only to be able to use a single quote. Similar care needs to be taken with, for example, the $ character, as that will be expanded by bash unless placed between single quotes or properly escaped. An alternative way of setting variables is by running:
$ moa set process
which will prompt you for the value of process, without trying to expand any variables.
It is at all times possible to check what the current configuration is by running:
$ moa show
which will give you (possibly with more color):
postcommand o (undefined)
precommand o (undefined)
process L echo "it's complicated"
project o (undefined)
title L I mean: Hello!!
The first column has the parameter name, followed by a single-letter flag and the value of the parameter, or (undefined) if no value is specified. The letters in the second column signify the state of the parameter:
- o: Undefined, but Optional.
- L: Locally defined
- R: Recursively defined
- E: Error - undefined and not optional.
Execution¶
What is executed upon moa run can be defined either by a plugin or by a template. Most Moa commands (such as moa show) are defined by plugins; only moa prepare, moa run and moa finish call code defined in a template. The execution steps are:
Main invocation¶
The complete Moa invocation is embedded in a try/except block.
On an error, Moa tries to execute a post_error hook and then attempts to fail quietly. If you are interested in the actual error, run moa with the -v flag.
Upon a keyboard interrupt, Moa executes the post_interrupt hook and exits with a return code of -2.
Background execution¶
The first thing Moa does is check whether --bg is specified on the command line. If so, it forks: the child process continues execution and the parent process exits.
Before exiting, the parent process executes the background_exit hook; the child process executes the post_background hook before continuing.
Recursive execution¶
Moa used to have the -r flag for all operations, allowing recursive operation of Moa. This was rather confusing and has been removed. Some commands still define -r (such as moa cp), but for the majority of commands you will need to use bash (find, xargs, etc.), or the new, stand-alone helper script moar. Using moar is very simple:
moar -- moa run
runs moa run in this directory and all (non-hidden) sub-directories. If you would like to limit execution to a certain depth, for example only first-level sub-directories, you can run:
moar -d 1 -- moa run
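To make the recursion concrete, here is a rough Python sketch of what a moar-like walker does: visit every non-hidden directory up to an optional depth and run a command in each. The function names are hypothetical illustrations, not Moa's actual code.

```python
import os
import subprocess

def walk_dirs(root, max_depth=None):
    """Yield root and its non-hidden subdirectories, optionally
    limited to max_depth levels below root."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, _ in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        # prune hidden directories such as .moa or .git from the walk
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # do not descend any further
        yield dirpath

def run_everywhere(command, root='.', max_depth=None):
    """Run a command (a list, e.g. ['moa', 'run']) in every directory."""
    for d in walk_dirs(root, max_depth):
        subprocess.call(command, cwd=d)
```

With max_depth=1 this mimics moar -d 1: only the starting directory and its immediate sub-directories are visited.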
Filesets¶
Filesets are an important part of Moa - they are used to define the input and output files of Moa jobs. In principle, a fileset is not much more than a collection of files. There are three different types:
Types¶
Type “set”¶
A “set” fileset is given a filesystem glob; Moa checks the filesystem and returns the list of files that conform to the glob pattern. Type “set” filesets are typically used to define the input of a Moa job. A “set” fileset can (currently) contain only one * wildcard. A correct example would be:
/data/sequences/*.fasta
This glob does exactly what you expect. Let's assume that there are three sequences in this directory; the set would then contain three filenames:
/data/sequences/input_01.fasta
/data/sequences/input_02.fasta
/data/sequences/input_03.fasta
More complex patterns, and wildcards other than * are not supported (yet). Each Moa job can have at most one “set” fileset.
Type “map”¶
A “map” fileset converts a “set” fileset (the source) to a related fileset, typically to calculate the output of a Moa job. A “map” fileset must be linked to a “set” fileset and uses a glob-like pattern to convert the input “set” fileset to the resulting fileset. For example, if we take the example fileset defined above, and apply the following pattern:
./*.output
we would end up with the following “map” fileset:
./input_01.output
./input_02.output
./input_03.output
A potential pitfall is the following situation, where we have a “set” fileset defined as follows:
/data/sequences/input_*.fasta
This would result in exactly the same fileset as above. But if we now apply the same “map” pattern, the resulting output fileset would be:
./01.output
./02.output
./03.output
This is because the * from the “set” glob maps to the * in the “map” pattern; the rest is omitted. This can be useful: for example, in a Blast job you could specify the following “map” pattern:
./blast_*.out
which would result in the following output:
./blast_01.out
./blast_02.out
./blast_03.out
In the case of a “map” fileset it is allowed to use a second wildcard in the pattern, for example:
*/blast_*.out
in which case the first wildcard is replaced with the original path. In the above example this would result in:
/data/sequences//blast_01.out
/data/sequences//blast_02.out
/data/sequences//blast_03.out
(note: you might not want to do this)
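The mapping rules above can be captured in a few lines of Python. This is an illustrative re-implementation of the described behaviour, not Moa's own code; it handles only the one- and two-wildcard cases discussed here, and it joins the directory path with a single slash rather than the double slash shown above.

```python
import os

def map_fileset(set_glob, files, map_pattern):
    """Translate files matching a one-wildcard "set" glob into a
    "map" fileset, as described above."""
    prefix, suffix = set_glob.split('*', 1)  # exactly one * allowed
    mapped = []
    for f in files:
        # the text matched by the * in the "set" glob
        captured = f[len(prefix):len(f) - len(suffix)]
        if map_pattern.startswith('*/'):
            # a leading */ stands for the original input directory
            out = os.path.dirname(f) + '/' + map_pattern[2:].replace('*', captured)
        else:
            out = map_pattern.replace('*', captured)
        mapped.append(out)
    return mapped
```

Note how the second example glob (input_*.fasta) captures only the numbers, so the same files produce blast_01.out rather than blast_input_01.out.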
Type “single”¶
A “single” fileset is very simple: it points to a single file. No wildcards are allowed.
Categories¶
Moa has to keep track (using Ruffus) of the in- and output of a job; it does this by tracking filesets. The category defines whether a file(set) is considered “input”, “output” or a “prerequisite”. In- and output speak for themselves; a prerequisite is also considered input (i.e. if it changes, the job will be repeated), but it is typically kept out of the one-on-one file mapping that takes place for in- and output files.
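The re-run rule implied here (inputs and prerequisites both trigger execution, but only inputs take part in the file mapping) can be sketched with a simple timestamp check. This is a hypothetical helper illustrating the idea, not how Ruffus or Moa actually implements it:

```python
import os

def needs_run(inputs, outputs, prerequisites=()):
    """A job needs (re)running when any output is missing, or when any
    input or prerequisite file is newer than the oldest output."""
    if not all(os.path.exists(o) for o in outputs):
        return True
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    triggers = list(inputs) + list(prerequisites)
    return any(os.path.getmtime(t) > oldest_output for t in triggers)
```

In BLAST terms: touching the database (a prerequisite) re-runs the job just as touching a FASTA input file would, but no output file is derived from the database's name.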
Defining filesets¶
If you are developing a template, there is a whole section devoted to filesets. The following example is taken from the Moa BLAST template and contains almost everything that you will come across:
filesets:
  db:
    category: prerequisite
    help: Blast database
    optional: false
    pattern: '*/*'
    type: single
  input:
    category: input
    help: Directory with the input files for BLAST, in Fasta format
    optional: false
    pattern: '*/*.fasta'
    type: set
  outgff:
    category: output
    help: GFF output files
    optional: true
    pattern: gff/*.gff
    source: input
    type: map
  output:
    help: XML blast output files
    category: output
    optional: true
    pattern: out/*.out
    source: input
    type: map
Most of this speaks for itself. A few things to note are:
- Both “outgff” and “output” are category “output”, type “map” filesets mapping to the same type “set” input fileset. This is common practice. If you have a look at the map22 template, you can even see an example of a category “input”, type “map” fileset.
- If a fileset has a reasonable default pattern (this typically applies to output filesets), it is possible to make it optional.
- Please specify a good help text.
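As a concrete check of the rules just listed, the following sketch validates a fileset definition dictionary. The field names follow the BLAST template above; the validator itself is hypothetical (Moa's real template loader is more involved):

```python
def check_fileset(name, fs):
    """Return a list of problems with a fileset definition dict."""
    errors = []
    if fs.get('category') not in ('input', 'output', 'prerequisite'):
        errors.append('%s: invalid category' % name)
    if fs.get('type') not in ('set', 'map', 'single'):
        errors.append('%s: invalid type' % name)
    if fs.get('type') == 'map' and 'source' not in fs:
        errors.append('%s: a "map" fileset needs a source' % name)
    if not fs.get('help'):
        errors.append('%s: please provide a help text' % name)
    return errors
```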
Three core templates¶
Moa comes with a list of templates (see templates). The three most important and most flexible of these, which allow you to embed custom code (called process) in your project, are:
simple:
Simply executes process as a bash one-liner
map:
Takes a set of in- and output files and executes the custom commands for each in- and output file (using the Jinja2 template language).
reduce:
Takes a set of input files and a single output file and executes the custom commands with all input files, generating the output file.
Since simple, map and reduce have proven to be quite central to how Moa operates, they come with their own shortcut commands (moa simple, moa map and moa reduce). These commands query the user directly for the parameters, instead of you having to define them manually.
For example, a simple job:
$ mkdir simple_test && cd simple_test
$ moa simple -t 'Generate some files'
process:
> for x in `seq 1 5`; do touch test.$x; done
$ moa run
$ ls
test.1 test.2 test.3 test.4 test.5
Note that you can make your process as complicated as you like. Alternatively, you can write a script that you call from process.
A map job would work like this:
$ mkdir ../map_test && cd ../map_test
$ moa map -t 'Map some files'
process:
> echo {{ input }} ; echo {{ input }} > {{ output }}
input:
> ../simple_test/test.*
output:
> ./out.*
$ moa run
../simple_test/test.3
../simple_test/test.1
../simple_test/test.5
../simple_test/test.2
../simple_test/test.
Moa: Success executing "run" (<1 sec)
$ ls
out.1 out.2 out.3 out.4 out.5
$ cat out.1
../simple_test/test.1
Moa tracks which input file generates which output file. So, if you would like to repeat one of the jobs, you'll need to delete the output file and rerun moa:
$ rm out.3
$ moa run
../simple_test/test.3
Moa: Success executing "run" (<1 sec)
And a reduce example:
$ mkdir ../reduce_test && cd ../reduce_test
$ moa reduce -t 'Reduce some files'
process:
> echo {{ input|join(" ") }} >> {{ output }}
input:
> ../map_test/out.*
output:
> ./reduce_out
$ moa run
Moa: Success executing "run" (<1 sec)
$ ls
reduce_out
$ cat reduce_out
../map_test/out.1 ../map_test/out.3 ../map_test/out.4 ../map_test/out.5 ../map_test/out.2
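The relation between the three templates can be summarised in a few lines of Python that turn a process string plus filesets into concrete shell command lines: "map" produces one command per input/output pair, "reduce" one command over all inputs. The substitution below is a simplified, hypothetical stand-in for the Jinja2 rendering Moa actually performs:

```python
def map_commands(process, inputs, outputs):
    """A "map" job: one command per input/output pair."""
    return [process.replace('{{ input }}', i).replace('{{ output }}', o)
            for i, o in zip(inputs, outputs)]

def reduce_command(process, inputs, output):
    """A "reduce" job: one command over all inputs, a single output."""
    return (process.replace('{{ input|join(" ") }}', ' '.join(inputs))
                   .replace('{{ output }}', output))
```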
Synchronizing jobs¶
It is quite often useful to repeat a job on a number of different input files. For simple, one-liner operations this can be accomplished using moa map. More complex operations, or those requiring a template other than map, can be replicated using job synchronization. Assume you have a set of fastq libraries, each in its own directory:
./fq/set1/set1_1.fq
./fq/set1/set1_2.fq
./fq/set2/set2_1.fq
./fq/set2/set2_2.fq
./fq/set3/set3_1.fq
./fq/set3/set3_2.fq
And you want to run a bowtie alignment for each separately. The approach to take is to create a directory containing all alignments:
mkdir bowtie
cd bowtie
and, in that directory, create one job running bowtie, in a directory named exactly as the input directories:
mkdir set1
cd set1
moa new bowtie -t 'run bowtie for {{_}}'
Note the magic variable {{_}}. This variable is replaced by the name of the current directory. So when running moa show, the title would show up as “run bowtie for set1”. This magic variable can be used in all variables, and we’ll use it here to set this job up in such a way that it can be reused for the other datasets:
moa set fq_forward_input='../../fq/{{_}}/*_1.fq'
# .. configure the remaining variables
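Resolving {{_}} amounts to substituting the name of the job's directory into the configured value. A minimal sketch (the helper name is hypothetical, not part of Moa's API):

```python
import os

def expand_magic(value, job_dir):
    """Replace every {{_}} with the name of the job's directory."""
    return value.replace('{{_}}', os.path.basename(os.path.abspath(job_dir)))
```

For a job living in .../bowtie/set1, both the title and the fq_forward_input value above would have {{_}} expanded to set1; a verbatim copy of the job in a set2 directory expands to set2 without any reconfiguration.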
Now - we replicate this directory in the following manner. We’ll move one directory up, to the bowtie directory, and create a sync job:
cd ..
moa new sync -t 'run bowtie for all fq datasets'
moa set source=../fq/
The sync template keeps directories synchronized, based on the source directory. If you now run moa run in the bowtie directory, two more directories will be created, set2 and set3, each containing a verbatim copy of the originally created bowtie job.
If, at a certain moment you obtain more fastq datasets:
./fq/set4/set4_1.fq
./fq/set4/set4_2.fq
you can repeat moa run in the ./bowtie sync directory and a new directory will be created. Note that the sync template will not remove directories. Also, if you want to update the configuration of the synchronized bowtie jobs, you only need to change the configuration in one directory, run moa run again in the ./bowtie directory, and the configuration is synchronized across all jobs.
Git integration¶
Note:
- Integration with Git is a relatively new feature - there might be dragons.
- Moa/Git will try to keep track of the structure of your workflow, not of the data you are processing.
Moa integrates with Git for a number of reasons:
- to automatically keep your workflow under version control. Having your work under version control means that at all times you can find out what your workflow looked like at a certain date.
- to allow you to share your workflow using Git. For example, you could publish your workflow to Github, and allow other researchers to clone and improve on your work (which you can then import into your own workflow again).
- to pull templates from remote git repositories. This allows anybody to create, maintain and distribute templates for you to use.
To make this work, you must make sure that git and git subtree are installed and that the moaGit plugin is enabled. Also make sure that your workflow is under Git version control. If you create “Moa projects”, Moa will try to create a new repository for you; otherwise, you must run git init to create one.
A workflow under Git control¶
If you’ve created a new Moa project and made sure it is under git control, Moa will try to automatically commit all changes. One important thing to note here is that if you make manual changes to the workflow, you will need to commit them yourself. If you fail to do so, they will likely be automatically committed by the next Moa operation; in that case they are under version control, but the commit message will not make any sense.
Sharing a workflow using Git¶
Your workflow is a normal git repository. See the excellent documentation of Git & Github on how to share git repositories.
Getting templates from a remote git repository¶
If you want to install a template from a remote git repository, Moa will merge the template repository with the workflow’s repository using git subtree. This approach has a number of nice properties:
- The template is integrated in the local workflow and can be copied around and changed (as one normally would do within a git repository).
- The template code does not change unless requested (as with regular templates). Even when sharing or duplicating your repository - the template remains unchanged.
- If required, it is possible to update to the latest version of the remote template repository (using a regular moa refresh)
- It is possible to upload template changes upstream.
Note: the git submodule approach was another candidate for implementing this, but submodules are difficult to copy once they are checked out. Additionally, the git subtree approach has the advantage over the git subtree merge strategy that it is easier to upload changes upstream.
Define a template provider¶
To set up Moa to work with git templates you need to define a template provider, for example:
$ moa set -s template.providers.gtp.class=gitmodule
$ moa set -s template.providers.gtp.base='https://github.com/mfiers/moa_template_%s.git'
$ moa set -s template.providers.gtp.enabled=true
This defines a provider (called gtp) that pulls templates from Github (but any other git server can be used). Note that it is probably advisable to set *.enabled=true last, to prevent an incomplete import.
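The base pattern is an ordinary %s substitution: the template name is dropped into the configured base to form the repository URL. A one-line sketch (the function name is hypothetical):

```python
def resolve_template_url(base, template_name):
    # the provider base contains a %s placeholder for the template name
    return base % template_name
```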
After doing this you can run:
$ mkdir 30.run_bowtie
$ cd 30.run_bowtie
$ moa new gtp:bowtie -t 'a sensible title'
which would expect and merge a repository from the following location: https://github.com/mfiers/moa_template_bowtie.git
Moa requires that the template name (bowtie in the example above) combined with the configured base resolves to a valid git repository URL. This provides a user-friendly syntax and the ability to use any git repository required.
Contribute¶
Any contribution is more than welcome!
Documentation¶
You can find the documentation source in the Moa repository on Github, in the Sphinx subdirectory. The documentation is formatted in reStructuredText and generated using Sphinx.
If you would like to work on the documentation, you can clone the repository and send pull requests for any change you make. However, there is an easier way. Make sure you have a Github account first. Once you’re logged into Github, go to the Moa sphinx subdirectory, click on the *.rst file you would like to edit and subsequently click the edit button. You will get a notice that Github is forking the repository (which is fine) and an edit window that allows you to edit the text. Once you’re done editing, please write a short commit message describing what you changed and, if necessary, a more elaborate comment, and click “Propose File Change”. On the next page you get the option to “Send a pull request”, which notifies me that you’ve made changes. (Note that you have now forked Moa.)
Acknowledgements¶
The following people have contributed to Moa by writing code, using and testing, submitting bug reports and feature requests, writing documentation or engaging in general discussion.
John McCallumn, Roy Storey, Marcus Davy, Susan Thomson, Ashley Lu, Cecilia Deng, Helge Dzierzon, Yogini Idnani, William Demchick and Mark Fiers
Plant & Food Research New Zealand has provided financial support for development.
How to write a template¶
A Moa template is made up of a .moa file and a .jinja2 (or .mk) file.
The .moa file mainly contains input/output filesets and parameter options used by the bash command(s). Some of these options have default values, which the user can change while constructing the job.
The .jinja2 file includes the information to structure the command(s). It is written in Jinja, a templating language for Python that is simple to write and easy to understand.
These files are used by the backend (currently Ruffus), which manages fileset and parameter dependencies to build pipelines and render commands to the bash prompt. Initially, GNU make was the backend; it is very powerful, but some of its limitations and its complexity led to the inclusion of Ruffus as an alternative backend.
The easiest way to write a moa template is to edit an existing template to suit your requirements. This involves understanding the parts of an existing template.
The bwa_aln template is used as an example below. As background: the bwa aln command takes a FASTQ file as input and aligns it to a reference genome that was previously indexed. The output is a .sai file with the alignments.
The bwa_aln.moa file has some main components:
Backend
backend: ruff
This is ‘ruff’, which means that Ruffus is used by the underlying Python code to read the template’s .moa and .jinja2 files and render the corresponding commands to the bash prompt.
Commands
commands:
  run:
    mode: map
    help: run bwa aln
  clean:
    mode: simple
    help: Remove all job data, not the Moa job itself; note that this must be implemented by the template.
This indicates the function names that you will later define. In the example above there are two commands, run and clean; typing moa run or moa clean at the command prompt in the job directory executes these functions.
Filesets
filesets:
  input:
    category: input
    extension: fq
    help: Fastq input files
    glob: '*'
    optional: false
    type: set
  output:
    category: output
    dir: .
    extension: sai
    glob: '{{ input_glob }}'
    source: input
    type: map
As the name suggests, each fileset refers to a set of files in a single directory. The bwa_aln template shows two filesets: input and output.
Category: essentially used to separate input from output.
Extension: refers to the type of file(s) required or generated.
Glob: searches for files matching a specified pattern. By default (glob: '*') Moa automatically processes all files with the specified input extension in the given directory. By specifying a glob, Moa will only process those files whose names match the pattern.
Type: refers to the data type of the fileset or parameter.
A fileset can be of either set or map type. The type set refers to a simple set of files in a directory. The type map refers to a set of files that are linked to their source fileset. In the code above, the output fileset is mapped to the input fileset.
Dir: the directory of the output fileset is '.', which means that the output files will be placed in the current working directory.
Parameter category order
parameter_category_order:
  - ''
  - input
  - system
  - advanced
Parameters
mismatch_penalty:
  category: ''
  default: 3
  help: mismatch penalty
  optional: true
  type: integer
Parameters are the variables/options that specify a command.
- Category: the category under which the parameter is grouped (see parameter_category_order above).
- Default: the value that is used if not changed by the user.
- Optional: specifies whether the user must fill in a value for the variable. If optional is false, the user has to provide a value for the parameter in order to execute the job.
- Type: specifies the data type of the variable, e.g. integer, string, boolean.
Moa_id
moa_id: bwa_aln
The moa_id is supposed to be the same as the filename, ideally something descriptive (e.g. bwa_aln). It is later used to link to the other template file.
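Taken together, a heavily abridged .moa file for this template would look like this (only the one parameter discussed above is shown; the real bwa_aln template defines many more):

```yaml
backend: ruff
moa_id: bwa_aln
commands:
  run:
    mode: map
    help: run bwa aln
  clean:
    mode: simple
    help: Remove all job data
filesets:
  input:
    category: input
    extension: fq
    glob: '*'
    type: set
  output:
    category: output
    dir: .
    extension: sai
    source: input
    type: map
parameters:
  mismatch_penalty:
    default: 3
    help: mismatch penalty
    optional: true
    type: integer
```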
The other template file is ‘bwa_aln.jinja2’, which is written in Jinja2, a templating language for Python. Note that the jinja2 file has the same base name as the .moa file.
Important features of the bwa_aln.jinja2 file are:
Three hashes (###) mark the start of a function and are followed by the function name. In our bwa_aln example, we have defined two functions: run and clean.
### run
This definition is followed by the set of commands that you want executed when you type
moa run
or
moa clean
in the bwa_aln job directory. The commands in our example file look the same as what you would type at the command prompt, except that parameter values are brought in from the .moa file: each value is replaced by its parameter name in Jinja2 syntax.
bwa aln {{ db }} \
    -n {{ edit_dist_missing_prob }} \
    ...
    {{ input }} \
    -f {{ output }}
It is also possible to add if-else statements or other control structures supported by the templating language.
{% if color_space %} -c {% endif %}
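Putting these pieces together, a bwa_aln.jinja2 file could look roughly like this. This is an abridged sketch: only two of the parameters appear, and the body of clean is an assumption (removing the generated .sai files):

```jinja
### run
bwa aln {{ db }} \
    -n {{ edit_dist_missing_prob }} \
    {% if color_space %} -c {% endif %} \
    {{ input }} \
    -f {{ output }}

### clean
# assumption: clean removes the generated alignment files
rm -f *.sai
```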
Command reference¶
moa run¶
usage: moa run [-h] [-v] [--bg] [--profile] [-j THREADS] [--ol]
               [--olq OPENLAVAQUEUE] [--olx OPENLAVAEXTRA] [--oln OPENLAVAPROCS]
               [--oldummy] [--olm OPENLAVAHOST]
execute the template ‘run’ command. Execution depends on the template. This command can only be executed from within a Moa job.
- optional arguments:
-h, --help           show this help message and exit
-v, --verbose        Show debugging output
--bg                 Run moa in the background (implies -s)
--profile            Run the profiler
-j THREADS           No threads to use when running Ruffus
--ol                 Use OpenLava as actor
--olq OPENLAVAQUEUE  The Openlava queue to submit this job to
--olx OPENLAVAEXTRA  Extra arguments for bsub
--oln OPENLAVAPROCS  The number of processors the jobs requires
--oldummy            Do not execute - just create a script to run
--olm OPENLAVAHOST   The host to use for openlava
moa archive¶
Archive a job, or tree with jobs for later reuse.
This command stores only those files that are necessary for execution of this job, that is: templates & configuration. In- and output files and any other files are ignored. An exception are all files whose names start with ‘moa.’. If the name is omitted, it is derived from the jobid parameter.
It is possible to run this command recursively with the -r parameter - in which case all (moa job containing) subdirectories are included in the archive.
- positional arguments:
- name archive name (default: None)
- optional arguments:
-h, --help      show this help message and exit
-v, --verbose   Show debugging output (default: False)
--profile       Run the profiler (default: False)
-f, --force     Force this action (default: False)
-s, --sync      Alternative approach to deal with sync type jobs - only include _ref directories (default: False)
-t, --template  Store this archive as a template (default: False)
moa archive_excl¶
Toggle a directory to be excluded from a moa archive.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa archive_incl¶
Toggle a directory to be included in a moa archive.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa cp¶
Copy a moa job, or a tree with jobs (with -r).
moa cp copies only those files defining a job: the template files and the job configuration. Additionally, all files in the moa directory that start with moa. (for example moa.description) are copied as well. Data and log files are not copied! If used in conjunction with the -r (recursive) flag, the complete tree is copied.
- positional arguments:
from  copy from
to    copy to (default: None)
- optional arguments:
-h, --help       show this help message and exit
-v, --verbose    Show debugging output (default: False)
--profile        Run the profiler (default: False)
-r, --recursive  copy recursively - including all subdirectories (default: False)
-o, --overwrite  if the target dir exists, overwrite it (instead of copying into that dir) (default: False)
moa dumpTemplate¶
moa template_dump - Show raw template information
Usage:
moa template_dump [TEMPLATE_NAME]
Show the raw template sysConf.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa err¶
Show the stderr of the most recently executed moa job
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa files¶
Show in and output files for this job
Display a list of all files discovered (for input & prerequisite type filesets) and inferred from these for map type filesets.
- optional arguments:
-h, --help            show this help message and exit
-v, --verbose         Show debugging output (default: False)
--profile             Run the profiler (default: False)
-a, --all             Show all filesets (default: False)
-n NO_FILES, --no_files NO_FILES
                      No filesets to show (default: 10)
moa kill¶
Kill a running job.
This command checks if a job is running. If so - it tries to kill it by sending SIGKILL (-9) to the job.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa list¶
Lists all known local templates
Print a list of all templates known to this moa installation. This includes locally installed templates as well.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa lock¶
Lock a job - prevent execution
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa log¶
Show activity log
Shows a log of moa commands executed. Only commands with an impact on the pipeline are logged, such as moa run & moa set.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa map¶
create an adhoc moa ‘map’ job
Moa will query the user for process, input & output files. A map job maps a set of input files on a set of output files, executing the process command for each combination. The process parameter is interpreted as a Jinja2 template with the input file available as {{ input }} and the output as {{ output }}.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False) -f, --force Force this action (default: False) -t TITLE, --title TITLE A title for this job (default: None)
moa map!¶
create an adhoc moa ‘map’ job
This command is exactly the same as moa map but uses the Moa local (or user) bash history instead.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False) -f, --force Force this action (default: False) -t TITLE, --title TITLE A title for this job (default: None)
moa mv¶
Move, rename or renumber a moa job.
- positional arguments:
from  move from
to    move to (default: None)
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa new¶
Create a new job.
This command creates a new job with the specified template in the current directory. If the directory already contains a job, creation needs to be forced using ‘-f’. It is possible to define arguments for the job on the command line using KEY=VALUE pairs after the template name. Note: do not use spaces around the ‘=’ sign. Use quotes if you need spaces in values (KEY=’two values’).
- positional arguments:
template   name of the template to use for this moa job
parameter  arguments for this job, specify as KEY=VALUE without spaces (default: None)
- optional arguments:
-h, --help     show this help message and exit
-v, --verbose  Show debugging output (default: False)
--profile      Run the profiler (default: False)
-f, --force    Force this action (default: False)
-t TITLE, --title TITLE
               mandatory job title (default: )
moa out¶
Show the stdout of the most recently executed moa job
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa pause¶
Pause a running job
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa postcommand¶
Execute ‘postcommand’
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa precommand¶
Execute ‘precommand’
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa raw_commands¶
return a list available commands
Print a list of known Moa commands: global and plugin-defined commands as well as template-specified ones. This command is meant to be used by software interacting with Moa.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa raw_parameters¶
Print a list of all known parameters
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa reduce¶
Create a ‘reduce’ adhoc job.
There are a number of ways this command can be used:
$ moa reduce -t 'a title' -- echo 'define a command'
Anything after -- will be the executable command. If omitted, Moa will query the user for a command.
Moa will also query the user for input & output files. An example session:
$ moa map -t 'something intelligent'
process:
> echo 'processing {{ input }} {{ output }}'
input:
> ../10.input/*.txt
output:
> ./*.out
Assuming you have a number of text files in the ../10.input/ directory, you will see, upon running:
processing ../10.input/test.01.txt ./test.01.out
processing ../10.input/test.02.txt ./test.02.out
processing ../10.input/test.03.txt ./test.03.out
...
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False) -f, --force Force this action (default: False) -t TITLE, --title TITLE A title for this job (default: None)
moa reduce!¶
Create a ‘reduce’ adhoc job using the bash history
This command is exactly the same as moa reduce, but uses the bash history instead of the moa process history.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False) -f, --force Force this action (default: False) -t TITLE, --title TITLE A title for this job (default: None)
moa refresh¶
Refresh the template
Reload the template from the original repository.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa rehash¶
cache a list of variables for command line completion
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa resume¶
Resume a running job
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa set¶
Set one or more variables
This command can be used in two ways. In its first form both parameter key and value are defined on the command line: moa set KEY=VALUE. Note that the command line will be processed by bash, which can either create complications or prove very useful. Take care to escape variables that you do not want to be expanded, and use single quotes where necessary. For example, to include a space in a variable: moa set KEY=’VALUE WITH SPACES’.
Alternatively, the set command can be used by specifying only the key: ‘moa set PARAMETER_NAME’, in which case Moa will prompt the user to enter a value - circumventing problems with bash interpretation.
Note: without -s, moa needs to be executed from within a Moa job
System configuration¶
By specifying -s or –system, the variable is stored as a system configuration variable in the YAML formatted ~/.config/moa/config. Please, use this with care!
The dots in the key name are interpreted as nested levels, so, running:
moa set -s plugins.job.completion.enabled=false
will result in the following section added on top of the YAML:
plugins:
  job:
    completion:
      enabled: false
Adding keys like this mixes safely with configuration information that is already present. So, setting:
moa set -s plugins.job.completion.something=else
will not remove the enabled: false heading under completion:, resulting in:
plugins:
  job:
    completion:
      enabled: false
      something: else
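The dotted-key behaviour amounts to a small nested-dictionary merge. A minimal Python sketch of the idea (illustrative only, not Moa's own code):

```python
def set_nested(config, dotted_key, value):
    """Interpret dots in the key as nested dictionary levels and merge
    the value into the existing configuration without disturbing
    sibling keys."""
    keys = dotted_key.split(".")
    node = config
    for k in keys[:-1]:
        # descend, creating intermediate dicts only where missing
        node = node.setdefault(k, {})
    node[keys[-1]] = value
    return config
```

Setting `plugins.job.completion.something` after `plugins.job.completion.enabled` leaves the earlier key in place, exactly as described above.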
- positional arguments:
parameter  arguments for this job, specify as KEY=VALUE without spaces
- optional arguments:
-h, --help     show this help message and exit
-v, --verbose  Show debugging output (default: False)
--profile      Run the profiler (default: False)
-f, --force    Force this action (default: False)
-s, --system   store this as a system configuration variable (default: False)
moa show¶
Show parameters known to this job.
The command outputs three columns: parameter name, flags and value. The flags have the following meaning:
- Origin: (l) locally defined; (d) default value; (r) recursively defined; (s) system defined; (x) extra value, not in the template; and (.) not defined.
- Private: a p indicates this variable to be private.
- Mandatory: a lower case o indicates this to be an optional variable and M means mandatory.
- optional arguments:
-h, --help     show this help message and exit
-v, --verbose  Show debugging output (default: False)
--profile      Run the profiler (default: False)
-u             show unrendered values (default: False)
-a             show all parameters (default: False)
-p             show private parameters (default: False)
moa simple¶
Create a ‘simple’ adhoc job.
Simple meaning that no in or output files are tracked. Moa will query you for a command to execute (the process parameter). Note that Moa tracks a history for all ‘process’ parameters used.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False) -f, --force Force this action (default: False) -t TITLE, --title TITLE A title for this job (default: None)
moa simple!¶
Create a ‘simple’ adhoc job.
This command is exactly the same as moa simple except for the fact that Moa uses the bash history specific for the moa job or, if absent, the user bash history. This is convenient if you would like to register or reuse a command that you have already executed.
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False) -f, --force Force this action (default: False) -t TITLE, --title TITLE A title for this job (default: None)
moa status¶
Show job status
Print a short status of the job, including configuration
- optional arguments:
-h, --help     show this help message and exit
-v, --verbose  Show debugging output (default: False)
--profile      Run the profiler (default: False)
-u             show unrendered values (when using inline parameters) (default: False)
-R             show recursively defined parameters not specified by the local template (default: False)
-p             show private parameters (default: False)
-a             show all parameters (default: False)
moa template¶
moa template - Print the template name of the current job
Usage:
moa template
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa test¶
Test the job parameters
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa tree¶
Show a directory tree and job status
- positional arguments:
- filter show only directories that match this filter (default: None)
- optional arguments:
-h, --help     show this help message and exit
-v, --verbose  Show debugging output (default: False)
--profile      Run the profiler (default: False)
-a, --all
moa unlock¶
Unlock a job - allow execution
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa unset¶
Remove a parameter from the configuration
Remove a configured parameter from this job. If the parameter was defined by the job template, it reverts to its default value. If it was an ad-hoc parameter, it is removed from the configuration.
- positional arguments:
- parameter parameter to unset
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
moa version¶
print moa version number
- optional arguments:
-h, --help show this help message and exit -v, --verbose Show debugging output (default: False) --profile Run the profiler (default: False)
Templates¶
Contents:
abyss_pe¶
Run Abysspe
Commands¶
- clean
- Remove all job data
- run
- Execute abysspe in paired-end mode
Filesets¶
- fq_forward
- fastq input files directory - forward
- fq_reverse
fastq input files directory - reverse
type: map, source: fq_forward, category: input, optional: True, pattern: */*_2.fq
- output
soap denovo output file
type: single, category: output, optional: True, pattern: {}
Parameters¶
- joinpairs
number of pairs needed to consider joining two contigs
type: integer, default: 10, optional: True
- kmer
kmer size
type: integer, default: 31, optional: True
- threads
no threads to use
type: integer, default: 3, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Mon, 21 Nov 2011 12:47:16
- Modification date
- Mon, 21 Nov 2011 12:47:22
abyss_se¶
Run abyss_se
Commands¶
- clean
- Remove all job data
- run
- Execute abyss se
Filesets¶
- input
- fastq input files directory
- output
soap denovo output file
type: single, category: output, optional: True, pattern: {}
Parameters¶
- kmer
kmer size
type: integer, default: 31, optional: True
- threads
no threads to use
type: integer, default: 3, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Mon, 21 Nov 2011 12:47:16
- Modification date
- Mon, 21 Nov 2011 12:47:22
archroot¶
Helper script for a root archive
Helper script for the root of an archive template
Commands¶
- run
- no help defined
Parameters¶
- moa_archive_parameters
space separated list of parameters to set for this template
type: string, default: {}, optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue, 17 Apr 2012 10:21:31
- Modification date
- Tue, 17 Apr 2012 10:21:25
autohagfish¶
Automatically run bowtie & hagfish combined
Run the preparatory steps for hagfish
Commands¶
- clean
- remove all Hagfish files
- finish
- finish up - find gaps - combine plots - create a report
- run
- Run hagfish
Filesets¶
- fasta
fasta sequence of the reference
type: single, category: prerequisite, optional: False, pattern: {}
- fw_fq
- forward fq input
- outbase
basename for output files
type: map, source: fw_fq, category: output, optional: True, pattern: ./*
- rev_fq
reverse fq input
type: map, source: fw_fq, category: input, optional: True, pattern: */*2.fq
Parameters¶
- max_ok
Maximal acceptable insert size for an aligned pair. If omitted, hagfish will make an estimate
type: int, default: 0, optional: True
- min_ok
Minimal acceptable insert size for an aligned pair. If omitted, hagfish will make an estimate
type: int, default: 0, optional: True
- threads
no threads to use
type: integer, default: 8, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Thu, 19 May 2011 20:49:04 +1200
bamextract¶
bamextract
Extract a region from a BAM file
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template.
- run
- Extract a region from a BAM file
Filesets¶
- bam
BAM input
type: single, category: input, optional: False, pattern: {}
- regions
List with regions to extract (id seqid start stop)
type: single, category: input, optional: False, pattern: {}
- vcf
optional VCF input
type: single, category: input, optional: True, pattern: {}
Parameters¶
- flank
flanking region to extract
type: integer, default: 100, optional: {}
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
bdbb¶
Bidirectional best BLAST hit
Discover the bidirectional best blast hit between two sets of sequences
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template.
- run
- generate a list of bidirectional best blast hits between two databases of sequences
Filesets¶
- input_a
First multi fasta input set
type: single, category: input, optional: False, pattern: */*.fasta
- input_b
Second multi fasta input set
type: single, category: input, optional: False, pattern: */*.fasta
- output
List of bidirectional best blasts hits
type: map, source: input_a, category: output, optional: True, pattern: */*.list
Parameters¶
- eval
e value cutoff
type: float, default: 1e-10, optional: True
- extract
Extract the identified sequences from the input fasta files
type: boolean, default: False, optional: True
- nothreads
Threads to run blast with
type: integer, default: 4, optional: True
- protein
Is this a protein set
type: boolean, default: False, optional: True
- tblastx
If this is a nucleotide set, use tblastx? (otherwise use blastn)
type: boolean, default: F, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- unknown
bfast_aln¶
Generate bam format alignments using bfast
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run bfast match, localalign, postprocess commands
Filesets¶
- fa_input
- fasta input file
- fq_input
- fastq input files
- output_aln
type: map, source: fq_input, category: output, optional: {}, pattern: ./*.aln
- output_bam
type: map, source: fq_input, category: output, optional: {}, pattern: ./*.bam
Parameters¶
- algorithm_colour_space
true -> colour space, false -> NT space
type: boolean, default: False, optional: True
- avg_mism_qual
Specifies the average mismatch quality
type: integer, default: 10, optional: True
- extra_params_localalign
Any extra parameters for the localalign command
type: string, default: '', optional: True
- extra_params_match
Any extra parameters for the match command
type: string, default: '', optional: True
- extra_params_postprocess
Any extra parameters for the postprocess command
type: string, default: '', optional: True
- min_mapping_qual
Specifies to remove low mapping quality alignments
type: integer, default: -2147483648, optional: True
- min_norm_score
Specifies to remove low (alignment) scoring alignments
type: integer, default: -2147483648, optional: True
- output_format
0 - BAF, 1 - SAM
type: integer, default: 1, optional: True
- paired_opp_strands
Specifies that paired reads are on opposite strands
type: boolean, default: False, optional: True
- pairing_std_dev
Specifies the pairing distance standard deviation to examine when rescuing
type: float, default: 2.0, optional: True
- print_params
print program parameters
type: boolean, default: False, optional: True
- thread_num
Specifies the number of threads to use
type: integer, default: 1, optional: True
- timing_information
specifies output timing information
type: boolean, default: True, optional: True
- ungapped_aln
Do ungapped local alignment
type: boolean, default: False, optional: True
- ungapped_pairing_rescue
Specifies that ungapped pairing rescue should be performed
type: boolean, default: False, optional: True
- unpaired_reads
True value specifies that pairing should not be performed
type: boolean, default: False, optional: True
- usage_summary
Display usage summary (help)
type: boolean, default: False, optional: True
- which_strand
0 - consider both strands, 1 - forwards strand only, 2 - reverse strand only
type: integer, default: 0, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Feb 15 10:06:48 2011
- Modification date
- unknown
bfast_db¶
Generate db index files for aligning reads with bfast
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run bfast fasta2brg and index commands
Filesets¶
- fa_input
- fasta input file
Parameters¶
- algorithm_colour_space
true -> colour space, false -> NT space
type: boolean, default: False, optional: True
- depth
The depth of the splitting (d). The index will be split into 4^d parts.
type: integer, default: 0, optional: True
- extra_params
Any extra parameters
type: string, default: '', optional: True
- hash_width
The hash width for the index (recommended from manual = 14)
type: integer, default: {}, optional: False
- index_num
Specifies this is the ith index you are creating
type: integer, default: 1, optional: True
- mask
The mask or spaced seed to use.
type: string, default: {}, optional: False
- print_params
print program parameters
type: boolean, default: False, optional: True
- thread_num
Specifies the number of threads to use
type: integer, default: 1, optional: True
- timing_information
specifies output timing information
type: boolean, default: True, optional: True
- usage_summary
Display usage summary (help)
type: boolean, default: False, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Feb 15 10:06:48 2011
- Modification date
- unknown
blast¶
Basic Local Alignment Tool
Wraps BLAST [[Alt90]], probably the most popular similarity search tool in bioinformatics.
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template.
- report
- Generate a text BLAST report.
- run
- Running BLAST takes an input directory, determines what sequences are present and executes BLAST on each of these. Moa BLAST is configured to create XML output (as opposed to the standard text based output) in the out directory. The output XML is subsequently converted to GFF3 by the custom blast2gff script (using BioPython). Additionally, a simple text report is created.
Filesets¶
- db
Blast database
type: single, category: prerequisite, optional: False, pattern: */*
- input
- Directory with the input files for BLAST, in Fasta format
- outgff
GFF output files
type: map, source: input, category: output, optional: True, pattern: gff/*.gff
- output
XML blast output files
type: map, source: input, category: output, optional: True, pattern: out/*.out
Parameters¶
- eval
e value cutoff
type: float, default: 1e-10, optional: True
- gff_blasthit
(T,**F**) - export an extra blasthit feature to the created gff, grouping all hsp (match) features.
type: set, default: F, optional: True
- gff_source
source field to use in the gff
type: string, default: BLAST, optional: True
- nohits
number of hits to report
type: integer, default: 50, optional: True
- nothreads
threads to run blast with (note the overlap with the Make -j parameter)
type: integer, default: 2, optional: True
- program
blast program to use (default: blastn)
type: set, default: blastn, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
blastdb¶
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template.
- run
- Takes either a set of fasta files or a single multi-fasta input file and creates a BLAST database.
Filesets¶
- dbname
type: map, source: input, category: output, optional: {}, pattern: ./db
- input
The file with all input FASTA sequences for the blastdb.
type: single, category: input, optional: False, pattern: */*.fasta
Parameters¶
- protein
Protein database? (T)rue or (F)alse (default: F)
type: set, default: F, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Tue, 03 Jan 2012 15:00:23
bowtie¶
Bowtie
Run BOWTIE on a set of input files (query) vs a database index.
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template
- run
- no help defined
Filesets¶
- input
- Fasta/fastq input files for bowtie
- output
Output files
type: map, source: input, category: output, optional: {}, pattern: ./*.bam
Parameters¶
- db
The (basename of the) bowtie database to use.
type: string, default: {}, optional: False
- extra_params
extra parameters to feed bowtie
type: string, default: '', optional: True
- input_format
Format of the input files
type: set, default: fastq, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
bowtie_pe¶
Run BOWTIE on a set of input files (query) vs a database index.
Commands¶
- clean
- Remove all job data, not the Moa job itself
- finish
- finish up
- report
- Create a report on the results
- run
- Execute soapdenovo in paired-end mode
Filesets¶
- db
The (basename of the) bowtie database to use.
type: single, category: prerequisite, optional: False, pattern: ../20.bowtiedb/db
- fq_forward_input
- Fastq input files - forward
- fq_reverse_input
Fastq input files - reverse
type: map, source: fq_forward_input, category: input, optional: True, pattern: */*_2.fq
- output
Bam output file
type: map, source: fq_forward_input, category: output, optional: {}, pattern: ./*.bam
Parameters¶
- extra_params
extra parameters to feed to bowtie
type: string, default: '', optional: True
- input_format
Format of the input files
type: set, default: fastq, optional: True
- lots_of_data
Keep unmapped reads, unsorted BAM - takes up a lot of space!
type: boolean, default: False, optional: True
- max_insertsize
Maximum allowed insertsize
type: integer, default: 250, optional: True
- min_insertsize
Minimum allowed insertsize
type: integer, default: 1, optional: True
- orientation
orientation of the reads, allowed values are fr, rf, ff
type: {}, default: fr, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
bowtie_se¶
Run BOWTIE on a set of input files (query) vs a database index.
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template
- run
- no help defined
Filesets¶
- fq_input
- fastq input files directory
- output
Bam output file
type: map, source: fq_input, category: output, optional: {}, pattern: ./*.bam
Parameters¶
- ebwt_base
The (basename of the) bowtie database to use.
type: string, default: {}, optional: False
- extra_params
extra parameters to feed to bowtie
type: string, default: '', optional: True
- input_format
Format of the input files
type: set, default: fastq, optional: True
- output_format
Format of the output file
type: set, default: bam, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
bowtiedb¶
Builds a bowtie index from a reference sequence
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- Create the bowtie database
Filesets¶
- input
- Input fasta file for the bowtie database
- output
database name to create
type: single; category: output; optional: {}; pattern: db
Parameters¶
- extra_params
any optional parameters
type: string; default: ``; optional: True
- title
type: {}; default: Bowtie index builder; optional: {}
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Dec 09 07:56:48 2010
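The bowtiedb and bowtie_se templates above are typically chained through numbered directories, one Moa job per directory. The session below is a minimal sketch, assuming the standard `moa new TEMPLATE -t 'title'`, `moa set key=value` and `moa run` commands; all directory names and file paths are hypothetical examples.

```shell
# Build the bowtie index in its own directory...
mkdir 20.bowtiedb && cd 20.bowtiedb
moa new bowtiedb -t 'build bowtie index'
moa set input='../10.reference/genome.fasta'
moa run

# ...then align reads against it from a sibling directory;
# ebwt_base points at the db fileset of the index job.
cd .. && mkdir 30.bowtie && cd 30.bowtie
moa new bowtie_se -t 'align reads'
moa set fq_input='../00.reads/*.fq'
moa set ebwt_base='../20.bowtiedb/db'
moa run
```

Because each job lives in its own directory, the whole chain can be inspected and rerun with ordinary command line tools.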
bwa_aln¶
Use BWA to align a set of fastq reads against a db
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- run bwa aln
Filesets¶
- input
- Fastq input files
- output
type: map; source: input; category: output; optional: {}; pattern: ./*.sai
Parameters¶
- best_hits_stop
stop searching when there are >INT equally best hits
type: integer; default: {}; optional: True
- color_space
input sequences are in the color space
type: boolean; default: False; optional: True
- db
bwa database to align against
type: string; default: {}; optional: False
- edit_dist_missing_prob
max
type: float; default: {}; optional: True
- gap_ext_max
type: integer; default: {}; optional: True
- gap_ext_penalty
gap extension penalty
type: integer; default: {}; optional: True
- gap_open_penalty
gap open penalty
type: integer; default: {}; optional: True
- gap_opens_max
maximum number or fraction of gap opens
type: integer; default: {}; optional: True
- log_gap_penalty_del
log-scaled gap penalty for long deletions
type: boolean; default: {}; optional: True
- max_ext_long_del
maximum occurrences for extending a long deletion
type: integer; default: {}; optional: True
- max_queue_entry
maximum entries in the queue
type: integer; default: {}; optional: True
- mismatch_penalty
mismatch penalty
type: integer; default: {}; optional: True
- no_indel_from_ends
do not put an indel within INT bp towards the ends
type: integer; default: {}; optional: True
- non_iterative
non-iterative mode: search for all n-difference hits (slow)
type: boolean; default: False; optional: True
- quality_step
quality threshold for read trimming down to 35bp
type: integer; default: {}; optional: True
- seed_len
Seed length
type: integer; default: {}; optional: True
- seed_max_diff
Maximum differences in the seed
type: integer; default: {}; optional: True
- thread_num
number of threads
type: integer; default: {}; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- unknown
bwa_index¶
Bwa index builder
Builds a bwa index from a reference sequence
Commands¶
- clean
- Remove all job data
- run
- Create the index
Filesets¶
- input
Input fasta file for the bwa index
type: single; category: input; optional: False; pattern: */*.fasta
Parameters¶
- algorithm
Algorithm for constructing BWT index. Available options are ‘is’ and ‘bwtsw’
type: string; default: is; optional: True
- color_space
input sequences are in the color space
type: boolean; default: False; optional: True
- prefix
Name of the bwa index to create
type: string; default: db; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
bwa_sampe¶
Generate alignments in SAM format given paired end reads
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run bwa sampe
Filesets¶
- fq_forward_input
- fastq input files directory - forward
- fq_reverse_input
fastq input files directory - reverse
type: map; source: fq_forward_input; category: input; optional: True; pattern: */*_2.fq
- output_bam
type: map; source: fq_forward_input; category: output; optional: {}; pattern: ./*.bam
- sai_forward_input
sai input files - forward
type: map; source: fq_forward_input; category: input; optional: False; pattern: */*_1.sai
- sai_reverse_input
sai input files - reverse files
type: map; source: sai_forward_input; category: input; optional: True; pattern: */*_2.sai
Parameters¶
- db
bwa database to align against
type: string; default: {}; optional: False
- disable_insert_size
disable insert size estimate (force -s)
type: boolean; default: False; optional: True
- disable_SW
disable Smith-Waterman for the unmapped mate
type: boolean; default: False; optional: True
- lots_of_data
store unmapped reads - takes up a lot of space!
type: boolean; default: False; optional: True
- max_aln_out
maximum hits to output for paired reads
type: integer; default: 3; optional: True
- max_insert_size
maximum insert size
type: integer; default: 500; optional: True
- max_occ_read
maximum occurrences for one end
type: integer; default: {}; optional: True
- max_out_discordant_pairs
maximum hits to output for discordant pairs
type: integer; default: {}; optional: True
- preload_index
preload index into memory (for base-space reads only)
type: boolean; default: False; optional: True
- prior_chimeric_rate
prior of chimeric rate (lower bound)
type: integer; default: {}; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Nov 25 17:06:48 2010
- Modification date
- unknown
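The three bwa templates (bwa_index, bwa_aln, bwa_sampe) form a natural pipeline through the directory tree. A minimal sketch, again assuming the standard `moa new` / `moa set` / `moa run` commands and hypothetical directory names:

```shell
# 1. index the reference
mkdir 10.bwa_index && cd 10.bwa_index
moa new bwa_index -t 'index the reference'
moa set input='../00.reference/genome.fasta'
moa run

# 2. align the reads with bwa aln
cd .. && mkdir 20.bwa_aln && cd 20.bwa_aln
moa new bwa_aln -t 'align reads'
moa set input='../00.reads/*.fq'
moa set db='../10.bwa_index/db'
moa run

# 3. pair the .sai alignments into BAM with bwa sampe
cd .. && mkdir 30.bwa_sampe && cd 30.bwa_sampe
moa new bwa_sampe -t 'pair the alignments'
moa set fq_forward_input='../00.reads/*_1.fq'
moa set sai_forward_input='../20.bwa_aln/*_1.sai'
moa set db='../10.bwa_index/db'
moa run
```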
bwa_samse¶
Generate alignments in SAM format given single end reads, using ‘bwa samse’.
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run bwa samse
Filesets¶
- fq_input
- fastq input file
- output_bam
output bam file
type: map; source: fq_input; category: output; optional: {}; pattern: ./*.bam
- sai_input
sai input directory - filenames must correspond to the fastq input files
type: map; source: fq_input; category: input; optional: False; pattern: */*.sai
Parameters¶
- db
bwa database to align against
type: string; default: ``; optional: False
- max_aln_out
Maximum number of alignments to output in the XA tag for reads paired properly
type: integer; default: 3; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Nov 25 17:06:48 2010
- Modification date
- unknown
cdsmatrix¶
CdsMatrix
Generates a matrix of CDS's.
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- Generate a matrix of CDS’s
Filesets¶
- input
- Directory with the cds files for Glimmer3
- output
Output blast files
type: map; source: input; category: output; optional: True; pattern: ./*.out
- reference
reference multi fasta file
type: single; category: prerequisite; optional: {}; pattern: */*.fasta
- table
table files
type: map; source: input; category: output; optional: True; pattern: ./*.tab
Parameters¶
- cutoff
score cutoff value - disregards hits below this score
type: {}; default: 100; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Thu, 21 Jul 2011 20:31:10 +1200
empty¶
empty
Do nothing...
Commands¶
Parameters¶
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Mon Apr 04 16:02:58 2011
- Modification date
- Mon Apr 04 16:03:18 2011
fastainfo¶
gather information on a set of fasta files
gather info on a set of input files
Commands¶
- finish
- create a report
- run
- generate info on each of the input sequences
Filesets¶
- input
- “fastainfo” input files
- output
“fastainfo” raw output files
type: map; source: input; category: output; optional: True; pattern: stats/*.out
- stats
“fastainfo” collect stat files
type: map; source: input; category: output; optional: True; pattern: stats/*.stat
Parameters¶
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Mon, 11 Jul 2011 15:15:20
- Modification date
- Mon, 11 Jul 2011 15:15:12
fastqc¶
Run FastQC for fastq QC
Run FastQC on a set of fastq files - quality assessment
Commands¶
- finish
Run Fastqc
finish delegates execution to: report
- report
- Generate a simple fastqc report
- run
- no help defined
Filesets¶
- input
- fastqc input files
- touch
touch files - track if a file has been processed - do not touch this unless you know what you’re doing.
type: map; source: input; category: output; optional: True; pattern: ./*.touch
Parameters¶
- output_dir
output directory for the fastQC report
type: dir; default: .; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Thu, 28 Apr 2011 09:27:17 +1200
- Modification date
- Thu, 28 Apr 2011 14:19:04 +1200
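As a quick illustration of how a single-step template such as fastqc is used - a sketch assuming the standard Moa commands; the read directory is a hypothetical example:

```shell
mkdir 15.fastqc && cd 15.fastqc
moa new fastqc -t 'QC the raw reads'
moa set input='../10.reads/*.fq'
moa run
```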
fastx_clipper¶
run fastx_clipper
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run fastx_clipper
Filesets¶
- input
- fastq input files directory
- output
type: map; source: input; category: output; optional: {}; pattern: ./*.fq
Parameters¶
- adaptor
ADAPTER string. default is CCTTAAGG (dummy adapter).
type: string; default: CCTTAAGG; optional: True
- adaptor_and_bases
Keep the adapter and N bases after it.
type: integer; default: 0; optional: True
- compress_output
Compress output with GZIP.
type: boolean; default: False; optional: True
- debug_output
DEBUG output.
type: boolean; default: False; optional: True
- help
help screen
type: boolean; default: False; optional: True
- keep_unknown_nuc_seq
keep sequences with unknown (N) nucleotides; default is to discard such sequences.
type: boolean; default: False; optional: True
- out_adaptor_only_seq
Report Adapter-Only sequences.
type: boolean; default: False; optional: True
- rm_clipped_seq
Discard clipped sequences (i.e. keep only sequences which did not contain the adapter).
type: boolean; default: False; optional: True
- rm_non_clipped_seq
Discard non-clipped sequences (i.e. keep only sequences which contained the adapter).
type: boolean; default: False; optional: True
- rm_short_seq
discard sequences shorter than N nucleotides; default is 5.
type: integer; default: 5; optional: True
- verbose
Verbose - report number of sequences. If [-o] is specified, the report will be printed to STDOUT. If [-o] is not specified (and output goes to STDOUT), the report will be printed to STDERR.
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Dec 06 17:06:48 2010
- Modification date
- unknown
fastx_qual_stats¶
run fastx_quality_stats, fastq_quality_boxplot_graph.sh and fastx_nucleotide_distribution_graph.sh
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run fastx_quality_stats, fastq_quality_boxplot_graph.sh and fastx_nucleotide_distribution_graph.sh
Filesets¶
- boxplot_output
type: map; source: input; category: output; optional: {}; pattern: ./*.png
- input
fastq input files directory
- nuc_distr_output
type: map; source: input; category: output; optional: {}; pattern: ./*.png
- qual_output
type: map; source: input; category: output; optional: {}; pattern: ./*.txt
Parameters¶
- gen_postScript_file
Generate PostScript (.PS) file. Default is PNG image.
type: boolean; default: False; optional: True
- graph_title
Title - will be plotted on the graph.
type: string; default: {{ input_glob }}; optional: True
- help
help screen
type: boolean; default: False; optional: True
- new_out_format
New output format (with more information per nucleotide/cycle)
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Dec 03 17:06:48 2010
- Modification date
- unknown
filterwgs_pair¶
Execute a “map22” ad-hoc analysis - two input files, two output files
Filter raw WGS data
Commands¶
- run
- Filter WGS data
Filesets¶
- input1
- forward input fastq
- input2
reverse input fastq
type: map; source: input1; category: input; optional: False; pattern: */*
- output1
forward output fastq
type: map; source: input1; category: output; optional: True; pattern: ./*
- output2
reverse output fastq
type: map; source: input1; category: output; optional: True; pattern: ./*
Parameters¶
- adapters
Fasta file with the adapter sequences to trim
type: file; default: {}; optional: False
- minlen
Minimum remaining sequence length
type: int; default: 50; optional: True
- qual
quality threshold causing trimming
type: int; default: 13; optional: True
- title
type: {}; default: Filter paired fastq files using fastq-mcf; optional: {}
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Mon, 13 Feb 2012 09:16:36 +1300
genemarks¶
geneMarkS
predict genes using geneMarkS
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- no help defined
Filesets¶
- input
- Directory with the input files for Genemarks
Parameters¶
- gff_source
source field to use in the gff. Defaults to “geneMarkS”
type: string; default: genemarkS; optional: True
- matrix
the matrix to use
type: file; default: ``; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
glimmer3¶
Glimmer3
Predicts (prokaryotic) genes using glimmer3.
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- Glimmer3 is an open reading frame discovery program from the EMBOSS [[emboss]] package. It takes a set of input sequences and predicts all open reading frames. Additionally, this template converts the default output (predicted protein sequences) to GFF3.
Filesets¶
- cds
CDS output files from glimmer3
type: map; source: input; category: output; optional: True; pattern: cds/*.fasta
- gff
GFF output files from glimmer3
type: map; source: input; category: output; optional: True; pattern: gff/*.gff
- input
Directory with the input files for Glimmer3
- output
Raw output files from glimmer3
type: map; source: input; category: output; optional: True; pattern: out/*.g3
- pep
peptide output files from glimmer3
type: map; source: input; category: output; optional: True; pattern: pep/*.fasta
Parameters¶
- gene_len
Minimum gene length (glimmer3 -g/–gene_len)
type: integer; default: 110; optional: True
- gff_source
source field to use in the gff. Defaults to “glimmer3”
type: string; default: glimmer3; optional: True
- max_overlap
Maximum overlap, see the glimmer documentation for the -o or –max_olap parameter
type: integer; default: 50; optional: True
- stop_codons
stop codons
type: {}; default: tag,tga,taa,nnn,tnn,ann,gnn,cnn; optional: True
- treshold
threshold for calling a gene a gene (glimmer3 -t)
type: integer; default: 30; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
gmap¶
Gmap
Run GMAP on a set of input files (query) vs a database index.
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- no help defined
Filesets¶
- align
type: map; source: input; category: output; optional: {}; pattern: ./align/*.align
- genepred
type: map; source: input; category: output; optional: {}; pattern: ./genepred/*.genepred
- gff
type: map; source: input; category: output; optional: {}; pattern: ./gff/*.gff
- gff_invert
type: map; source: input; category: output; optional: {}; pattern: ./gff/*.invert.gff
- input
Sequences to map
- raw
type: map; source: input; category: output; optional: {}; pattern: ./raw/*.raw
Parameters¶
- db
Gmap db
type: file; default: ``; optional: False
- extra_parameters
extra parameters to feed to gmap
type: string; default: ``; optional: True
- gff_source
Source field to use in the output GFF
type: string; default: gmap; optional: True
- invert_gff
Invert the GFF (T/F)
type: set; default: T; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
gmapdb¶
gmapdb index builder
Builds gmapdb index from a reference sequence
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- no help defined
Filesets¶
- input
The reference sequence to build a gmap database with.
type: single; category: input; optional: False; pattern: */*.fasta
Parameters¶
- name
Name of the gmap index to create
type: string; default: gmapdb; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
hagfish¶
Run hagfish_extract & hagfish_combine
Run the preparatory steps for hagfish
Commands¶
- clean
- remove all Hagfish files
- finish
- finish up - find gaps - combine plots - create a report
- run
- Run hagfish
Filesets¶
- fasta
fasta sequence of the reference
type: single; category: prerequisite; optional: False; pattern: {}
- input
- “hagfish” input files
- output
“hagfish” touch files - track what files are done - please do not touch this!
type: map; source: input; category: output; optional: True; pattern: ./touch/*.touch
Parameters¶
- circosbinsize
Binsize for generating circos formatted histograms
type: int; default: {}; optional: True
- max_ok
Maximal acceptable insert size for an aligned pair. If omitted, hagfish will make an estimate
type: int; default: 0; optional: True
- min_ok
Minimal acceptable insert size for an aligned pair. If omitted, hagfish will make an estimate
type: int; default: 0; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Thu, 19 May 2011 20:49:04 +1200
kanga¶
use kanga to align short reads to a reference genome
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- run kanga
Filesets¶
- input_fasta
- Fasta input file
- output
output files
type: map; source: rds_input; category: output; optional: True; pattern: ./*.sam
- output_bam
output files
type: map; source: rds_input; category: output; optional: True; pattern: ./*.bam
- output_log
output log file
type: map; source: rds_input; category: output; optional: {}; pattern: ./*.log.txt
- rds_input
rds (preprocessed) input files
- sfx_input
sfx array lookup file
Parameters¶
- color_space
process for colorspace (SOLiD)
type: boolean; default: False; optional: True
- extra_params
any extra parameters
type: string; default: ``; optional: True
- help
print this help and exit
type: boolean; default: False; optional: True
- max_Ns
maximum number of intermediate N’s in reads before treating the read as unalignable
type: integer; default: 1; optional: True
- max_pair_len
accept paired end alignments with an apparent length of at most this
type: integer; default: 300; optional: True
- min_pair_len
accept paired end alignments with an apparent length of at least this
type: integer; default: 100; optional: True
- no_multireads
do not accept multiple reads aligning to the same loci
type: boolean; default: False; optional: True
- out_format
0 - CSV loci only, 1 - CSV loci + match sequence, 2 - CSV loci + read sequence, 3 - CSV loci + read + match sequence, 4 - UCSC BED, 5 - SAM format
type: integer; default: 0; optional: True
- pe_mode
0 - none, 1 - paired ends with recover orphan ends, 2 - paired end no orphan recovery
type: integer; default: 0; optional: True
- quality
fastq quality scoring: 0 - Sanger, 1 - Illumina 1.3+, 2 - Solexa < 1.3, 3 - ignore quality
type: integer; default: 3; optional: True
- thread_num
number of processing threads (0 sets threads to number of CPU cores)
type: integer; default: 0; optional: True
- trim3
trim this number of bases from the 3’ end of reads when loading raw reads
type: integer; default: 0; optional: True
- trim5
trim this number of bases from the 5’ end of reads when loading raw reads
type: integer; default: 0; optional: True
- version
print version information and exit
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- unknown
kangar_pe¶
use kangar to pre process raw fq reads
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- run kangar
Filesets¶
- fq_forward_input
- fastq input files - forward - containing the 5’ end
- fq_reverse_input
fastq input files directory - reverse - containing the 3’ end
type: map; source: fq_forward_input; category: input; optional: True; pattern: */*_2.fq
- output_log
output log file
type: map; source: fq_forward_input; category: output; optional: {}; pattern: ./*.log.txt
- rds_output
output rds file
type: map; source: fq_forward_input; category: output; optional: True; pattern: ./*.rds
Parameters¶
- extra_params
any extra parameters
type: string; default: ``; optional: True
- help
print this help and exit
type: boolean; default: False; optional: True
- mode
processing mode: 0 - single end create, 1 - paired end create, 2 - output statistics, 3 - dump as fasta
type: integer; default: 0; optional: True
- quality
fastq quality scoring: 0 - Sanger, 1 - Illumina 1.3+, 2 - Solexa < 1.3, 3 - ignore quality
type: integer; default: 3; optional: True
- reads_num
limit the number of reads (or dumps) in each input file to this many; 0 for no limit
type: integer; default: 0; optional: True
- rm_duplicates
remove duplicate reads, retaining only one
type: boolean; default: False; optional: True
- trim3
trim this number of bases from the 3’ end of each sequence
type: integer; default: 0; optional: True
- trim5
trim this number of bases from the 5’ end of each sequence
type: integer; default: 0; optional: True
- version
print version information and exit
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- unknown
kangar_se¶
use kangar to pre process raw fq single end reads
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- run kangar
Filesets¶
- fq_input
- fastq input files - forward - containing the 5’ end
- output_log
output log file
type: map; source: fq_input; category: output; optional: {}; pattern: ./*.log.txt
- rds_output
output rds file
type: map; source: fq_input; category: output; optional: True; pattern: ./*.rds
Parameters¶
- extra_params
any extra parameters
type: string; default: ``; optional: True
- help
print this help and exit
type: boolean; default: False; optional: True
- mode
processing mode: 0 - single end create, 1 - paired end create, 2 - output statistics, 3 - dump as fasta
type: integer; default: 0; optional: True
- quality
fastq quality scoring: 0 - Sanger, 1 - Illumina 1.3+, 2 - Solexa < 1.3, 3 - ignore quality
type: integer; default: 3; optional: True
- reads_num
limit the number of reads (or dumps) in each input file to this many; 0 for no limit
type: integer; default: 0; optional: True
- rm_duplicates
remove duplicate reads, retaining only one
type: boolean; default: False; optional: True
- trim3
trim this number of bases from the 3’ end of each sequence
type: integer; default: 0; optional: True
- trim5
trim this number of bases from the 5’ end of each sequence
type: integer; default: 0; optional: True
- version
print version information and exit
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- unknown
kangax¶
use kangax to create the suffix array lookup database for the reference genome
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- run kangax
Filesets¶
- input_fasta
- Fasta input file
- output_log
output log file
type: map; source: input_fasta; category: output; optional: {}; pattern: ./*.log.txt
- output_sfx
output suffix array lookup
type: map; source: input_fasta; category: output; optional: {}; pattern: ./*.sfx
Parameters¶
- block_seq_len
generated suffix blocks to hold at most this length (MB) concatenated sequences
type: integer; default: 3300; optional: True
- color_space
generate for colorspace (SOLiD)
type: boolean; default: False; optional: True
- extra_params
any extra parameters
type: string; default: ``; optional: True
- help
print this help and exit
type: boolean; default: False; optional: True
- reference_species
reference species
type: string; default: ``; optional: False
- target_dep
generate target file only if missing or older than any independent source files
type: boolean; default: False; optional: True
- version
print version information and exit
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- unknown
map¶
Execute a “map” ad-hoc analysis
Execute one command, on a number of input files.
Commands¶
- run
- no help defined
Filesets¶
- input
- “map” input files
- output
“map” output files
type: map; source: input; category: output; optional: True; pattern: ./*
Parameters¶
- process
The command to execute
type: string; default: {}; optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
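The map template is the workhorse for ad-hoc analyses: one command applied to every input file. The sketch below compresses every filtered fastq file. It assumes `{{ input }}` / `{{ output }}` placeholders are substituted per input/output file pair, and hypothetical directory names; check the template help in your Moa installation for the exact placeholder syntax.

```shell
mkdir 40.compress && cd 40.compress
moa new map -t 'compress the filtered reads'
moa set input='../30.filter/*.fq'
moa set output='./*.fq.gz'
moa set process='gzip -c {{ input }} > {{ output }}'
moa run
```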
map2¶
Execute a “map2” ad-hoc analysis
Execute one command, on a number of input files.
Commands¶
- run
- no help defined
Filesets¶
- input1
- “map” input files set 1
- input2
“map” input files set 2
type: map; source: input1; category: input; optional: False; pattern: */*
- output
“map” output files
type: map; source: input1; category: output; optional: True; pattern: ./*
Parameters¶
- process
The command to execute
type: string; default: True; optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
map22¶
Execute a “map22” ad-hoc analysis - two input files, two output files
Execute one command, on a number of input files.
Commands¶
- run
- no help defined
Filesets¶
- input1
- “map” input files set 1
- input2
“map” input files set 2
type: map; source: input1; category: input; optional: False; pattern: */*
- output1
“map” output files
type: map; source: input1; category: output; optional: True; pattern: ./*
- output2
“map” output files
type: map; source: input1; category: output; optional: True; pattern: ./*
Parameters¶
- process
The command to execute
type: string; default: True; optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
maq_pe¶
Generate alignments in SAM format given paired end reads using Maq.
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run maq’s fasta2bfa, fastq2bfq and map.
Filesets¶
- bam_output
bam alignment output file
type: map; source: fq_forward_input; category: output; optional: {}; pattern: ./*.bam
- bfa_output
BFA Index name
type: single; category: other; optional: {}; pattern: {}
- bfq_forward_output
bfq files - forward files
type: map; source: fq_forward_input; category: output; optional: {}; pattern: ./*_1.bfq
- bfq_reverse_output
bfq files - reverse files
type: map; source: fq_forward_input; category: output; optional: {}; pattern: ./*_2.bfq
- fa_input
directory with the reference fasta file
- fq_forward_input
fastq input files directory - forward files
- fq_reverse_input
fastq input files directory - reverse files
type: map; source: fq_forward_input; category: input; optional: {}; pattern: */*_2.fq
- map_output
maq map output files
type: map; source: fq_forward_input; category: output; optional: {}; pattern: ./*.map
Parameters¶
- disable_sw
disable Smith-Waterman alignment
type: boolean; default: False; optional: True
- extra_parameters
Any extra parameters
type: string; default: ``; optional: True
- first_read_len
length of the first read (<=127)
type: integer; default: 0; optional: True
- match_in_colorspace
match in the colorspace
type: boolean; default: False; optional: True
- max_dist_read_pairs
max distance between two paired reads
type: integer; default: 250; optional: True
- max_dist_RF_read_pairs
max distance between two RF paired reads
type: integer; default: 0; optional: True
- max_mismatch_qual_sum
maximum allowed sum of qualities of mismatches
type: integer; default: 70; optional: True
- max_num_hits_out
max number of hits to output. >512 for all 01 hits.
type: integer; default: 250; optional: True
- num_mismatch_24bp
number of mismatches in the first 24bp
type: integer; default: 2; optional: True
- read_ref_diff_rate
rate of difference between reads and references
type: float; default: 0.001; optional: True
- sec_read_len
length of the second read (<=127)
type: integer; default: 0; optional: True
- trim_all_reads
trim all reads (usually not recommended)
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Dec 03 17:06:48 2010
- Modification date
- unknown
maq_se¶
Generate alignments in SAM format given single end reads using Maq.
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run maq’s fasta2bfa, fastq2bfq and map.
Filesets¶
- bam_output
bam alignment output file
type: map; source: fq_input; category: output; optional: {}; pattern: ./*.bam
- bfa_output
BFA Index name
type: single; category: other; optional: {}; pattern: {}
- bfq_output
bfq files - forward files
type: map; source: fq_input; category: output; optional: {}; pattern: ./*.bfq
- fa_input
directory with the reference fasta file
- fq_input
fastq input files
- map_output
maq map output files
type: map; source: fq_input; category: output; optional: {}; pattern: ./*.map
Parameters¶
- disable_sw
disable Smith-Waterman alignment
type: boolean; default: False; optional: True
- extra_parameters
other parameters
type: string; default: ``; optional: True
- match_in_colorspace
match in the colorspace
type: boolean; default: False; optional: True
- max_mismatch_qual_sum
maximum allowed sum of qualities of mismatches
type: integer; default: 70; optional: True
- max_num_hits_out
max number of hits to output
type: integer; default: 250; optional: True
- num_mismatch_24bp
number of mismatches in the first 24bp
type: integer; default: 2; optional: True
- read_ref_diff_rate
rate of difference between reads and references
type: float; default: 0.001; optional: True
- trim_all_reads
trim all reads (usually not recommended)
type: boolean; default: False; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers, Yogini Idnani
- Creation date
- Wed Dec 02 17:06:48 2010
- Modification date
- unknown
mummer¶
mummer
Run mummer between two sequences
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- Run mummer
Filesets¶
- input
- Set 1 input fasta files
- reference
- Reference input fasta files
Parameters¶
- base
base name for all generated files
type: {}; default: out; optional: True
- breaklen
Set the distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200)
type: integer; default: 200; optional: True
- genomecenter
genome center - used in the AGP file
type: {}; default: pflnz; optional: True
- gff_source
GFF source field
type: {}; default: mumscaff; optional: True
- linker
linker sequence for the merged output sequence
type: {}; default: NNNNNNCTAGCTAGCATGNNNNNN; optional: True
- matchmode
use all matching fragments (max) or only unique matches (mum)
type: set; default: mum; optional: True
- mum_plot_raw
plot an alternative visualization where mummer does not attempt to put the sequences in the correct order
type: boolean; default: False; optional: True
- organism
Organism name - used in the AGP file
type: {}; default: ``; optional: True
- taxid
Taxonomy id - used in the AGP file
type: {}; default: ``; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
ncbi¶
Download data from NCBI
Download a set of sequences from NCBI based on a query string (ncbi_query) and database (ncbi_db). This template will run only once; after a successful run it creates a lock file that you need to remove to rerun.
Commands¶
- clean
- Remove all job data, but not the Moa job itself; note that this must be implemented by the template.
- run
- Download from NCBI
Parameters¶
- db
NCBI database
type: string; default: nuccore; optional: True
- query
NCBI query (for example txid9397[Organism%3Aexp])
type: string; default: ``; optional: False
- rename_sequence
try to rename the sequence - note that this does not work if you are downloading more than one sequence
type: boolean; default: False; optional: True
- sequence_name
Name of the file to write the downloaded sequences to. Use ‘from_dir’ to have the sequence name extracted from the directory name
type: string; default: out; optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Nov 10 07:56:48 2010
newjobtest¶
Execute a “simple” ad hoc analysis
Execute one command. No in- or output files are tracked by Moa.
Commands¶
- run
- no help defined
Parameters¶
- process
The command to execute
type: string, default: {}, optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
orthomcl¶
Run OrthoMCL
Execute one command. No in- or output files are tracked by Moa.
Commands¶
- run
- no help defined
Parameters¶
- db
Db name
type: string, default: {}, optional: False
- eval
Evalue cutoff for blast to use
type: string, default: 1e-5, optional: True
- group_prefix
OrthoMCL prefix for group names
type: string, default: g_, optional: True
- host
Db Host
type: localhost, default: {}, optional: True
- input_dir
Input directory with compliant (read the manual) fasta files
type: string, default: {}, optional: False
- login
Db username
type: string, default: None, optional: False
- mcl_i
mcl -i value
type: float, default: 1.5, optional: True
- num_threads
Number of threads to use
type: integer, default: 4, optional: True
- pass
Db password
type: string, default: None, optional: False
- port
Db port
type: integer, default: 3306, optional: True
- prefix
OrthoMCL prefix for the database tables
type: string, default: ortho, optional: True
- vendor
Db vendor
type: string, default: mysql, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
project¶
Create a project
Placeholder for a Moa Project
Commands¶
- run
- This template does not do anything - it is a project placeholder.
Parameters¶
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue, 10 Jan 2012 14:54:39 +1300
- Modification date
- Wed Nov 10 07:56:48 2010
recursive_map¶
Recursively map a genome to the reference
Recursively map a resequence dataset against a reference genome
Commands¶
- run
- recursive map
Filesets¶
- fq_forward
fastq input files directory - forward
- fq_reverse
fastq input files directory - reverse
type: map, source: fq_forward, category: input, optional: True, pattern: */*_2.fq
- output
base output filename
type: single, category: output, optional: True, pattern: output
- reference
type: single, category: prerequisite, optional: False, pattern: */*
Parameters¶
- iterations
number of iterations to run
type: integer, default: 3, optional: True
- param_first
First set of parameters - get the low hanging fruit
type: string, default: --fast, optional: True
- param_second
Second set of parameters - more sensitive
type: string, default: --very-sensitive, optional: True
- threads
Number of threads to use
type: integer, default: 4, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Fri, 08 Jun 2012 13:32:30 +1200
- Modification date
- Fri, 08 Jun 2012 13:43:19 +1200
reduce¶
Execute a “reduce” ad-hoc analysis
Execute one command on a number of input files.
Commands¶
- run
- no help defined
Filesets¶
- input
“reduce” input files
- output
“reduce” output files
type: single, category: output, optional: True, pattern: ./*
Parameters¶
- process
The command to execute
type: string, default: echo "input: {{ input|join(" ") }}"; echo "output: {{ output }}", optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
samtools_pileup¶
Print the alignment in the pileup format.
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run samtools pileup command
Filesets¶
- fasta
reference fasta file
type: single, category: prerequisite, optional: True, pattern: */*.fasta
- input
bam or sam files
- output
type: map, source: input, category: output, optional: {}, pattern: ./*.pileup
- output_bam
type: map, source: input, category: output, optional: {}, pattern: ./*.sorted
Parameters¶
- cap_mapQ_at
cap mapping quality at INT
type: integer, default: 60, optional: True
- extra_params
any extra parameters
type: string, default: ``, optional: True
- filter_read_bits
filtering reads with bits in INT
type: integer, default: 1796, optional: True
- input_is_SAM
the input is in SAM
type: boolean, default: False, optional: True
- num_haplotypes
number of haplotypes in the sample (for -c/-g)
type: integer, default: 2, optional: True
- out_2nd_best
output the 2nd best call and quality
type: boolean, default: False, optional: True
- out_GLFv3_format
output in the GLFv3 format (suppressing -c/-i/-s)
type: boolean, default: False, optional: True
- out_maq_consensus
output the maq consensus sequence
type: boolean, default: False, optional: True
- phred_prob_indel
phred prob. of an indel in sequencing/prep. (for -c/-g)
type: integer, default: 40, optional: True
- print_variants_only
print variants only (for -c)
type: boolean, default: False, optional: True
- prior_diff_haplotypes
prior of a difference between two haplotypes (for -c/-g)
type: float, default: 0.001, optional: True
- prior_indel_haplotypes
prior of an indel between two haplotypes (for -c/-g)
type: float, default: 0.00015, optional: True
- show_lines_indels
only show lines/consensus with indels
type: boolean, default: False, optional: True
- simple_pileup_format
simple (yet incomplete) pileup format
type: boolean, default: False, optional: True
- theta_maq_model
theta in the maq consensus calling model (for -c/-g)
type: float, default: 0.85, optional: True
- use_SOAPsnp_model
use the SOAPsnp model for SNP calling
type: boolean, default: False, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Dec 15 17:06:48 2010
- Modification date
- unknown
simple¶
Execute a “simple” ad hoc analysis
Execute one command. No in- or output files are tracked by Moa.
Commands¶
- run
- no help defined
Parameters¶
- process
The command to execute
type: string, default: {}, optional: False
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue Mar 29 16:34:19 2011
- Modification date
- Wed Mar 30 06:02:01 2011
smalt_pe¶
Run SMALT on a set of input files (query) vs a database index.
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- Execute SMALT with paired-end fastq
Filesets¶
- db
The (basename of the) smalt database to use.
type: single, category: prerequisite, optional: False, pattern: ../10.smaltdb/db
- fasta
reference fasta file
type: single, category: prerequisite, optional: False, pattern: *.fasta
- fq_forward_input
fastq input files directory - forward
- fq_reverse_input
fastq input files directory - reverse
type: map, source: fq_forward_input, category: input, optional: True, pattern: */*_2.fq
- output
output BAM file (automatically converted & filtered for reads that do not map)
type: map, source: fq_forward_input, category: output, optional: {}, pattern: ./*.sam
Parameters¶
- extra_params
extra parameters to feed to smalt
type: string, default: ``, optional: True
- max_insertsize
Maximum allowed insert size
type: integer, default: 250, optional: True
- min_insertsize
Minimum allowed insert size
type: integer, default: 1, optional: True
- output_format
output format (sam or samsoft)
type: {}, default: sam, optional: True
- pairtype
pair type (pe: fr/illumina short; mp: rf/illumina mate pairs; or pp: ff)
type: {}, default: pe, optional: True
- threads
Number of threads to use
type: int, default: 4, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Tue, 27 Mar 2012 10:05:40 +1300
- Modification date
- Tue, 27 Mar 2012 10:31:09 +1300
smaltdb¶
Smalt index builder
Builds a smalt index from a reference sequence
Commands¶
- clean
- Remove all job data, not the Moa job itself, note that this must be implemented by the template.
- run
- Create the smalt index
Filesets¶
- input
Input fasta file for the smalt database
type: single, category: input, optional: False, pattern: */*.fasta
- output
database name to create
type: single, category: output, optional: {}, pattern: db
Parameters¶
- word_length
word length
type: int, default: 10, optional: True
- word_spacing
word spacing
type: int, default: 6, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Wed Nov 10 07:56:48 2010
- Modification date
- Wed Dec 09 07:56:48 2010
soapdenovo_pe¶
Run Soapdenovo
Commands¶
- clean
- Remove all job data
- run
- Execute soapdenovo in paired-end mode
Filesets¶
- fq_forward
fastq input files directory - forward
- fq_reverse
fastq input files directory - reverse
type: map, source: fq_forward, category: input, optional: True, pattern: */*_2.fq
- output
soap denovo output file
type: single, category: output, optional: True, pattern: output.scafSeq
Parameters¶
- avg_insert
library insert size
type: integer, default: 200, optional: {}
- executable_not_used_anymore
which executable to use (SOAPdenovo-127mer, SOAPdenovo-31mer or SOAPdenovo-63mer)
type: {}, default: SOAPdenovo-31mer, optional: True
- kmer
kmer size
type: integer, default: 31, optional: True
- skip_config_file
skip automatic config file generation - if you skip this, make sure that you have a soap.config configuration file in the current directory
type: boolean, default: False, optional: True
- threads
number of threads to use
type: integer, default: 8, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Mon, 21 Nov 2011 12:47:16
- Modification date
- Mon, 21 Nov 2011 12:47:22
soapdenovo_postprocess¶
Run Soapdenovo
Commands¶
- run
- Postprocess - run GapCloser & SSpace
Filesets¶
- fq_forward
fastq input files directory - forward
- fq_reverse
fastq input files directory - reverse
type: map, source: fq_forward, category: input, optional: True, pattern: */*_2.fq
- input
input scaffold to process
type: single, category: input, optional: False, pattern: {}
- output
output file to generate
type: single, category: output, optional: True, pattern: final.fasta
Parameters¶
- avg_insert
library insert size
type: integer, default: 200, optional: {}
- noruns
number of times to run GapCloser & SSPACE
type: integer, default: 2, optional: True
- run_sspace
run SSPACE?
type: boolean, default: True, optional: True
- sspace_executable
SSPACE executable
type: {}, default: SSPACE_Basic_v2.0.pl, optional: True
- sspace_extra_variables
Extra variables to pass to SSPACE
type: {}, default: ``, optional: True
- threads
number of threads to use
type: integer, default: 8, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Mon, 21 Nov 2011 12:47:16
- Modification date
- Mon, 21 Nov 2011 12:47:22
statsidx¶
Retrieve and print stats from BAM file to an index file
Commands¶
- clean
- Remove all job data, not the Moa job itself
- run
- run samtools idxstats
Filesets¶
- input
bam input files directory - forward files
- output
type: map, source: input, category: output, optional: {}, pattern: ./*.index
Parameters¶
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Dec 08 17:06:48 2010
- Modification date
- unknown
sync¶
Sync directories
Keep this directory in sync with another directory
Commands¶
- run
- Sync!
Parameters¶
- ignore
ignore these names (space separated list)
type: {}, default: ``, optional: True
- original
The local directory to use as a source. If the target (based on what is in the source) does not exist, this directory is copied. If the target exists, only the configuration is copied and all directory contents are left alone. If this parameter is omitted, the directory with the most recently changed moa configuration is used.
type: string, default: {}, optional: True
- recursive
copy the jobs/config recursively
type: boolean, default: False, optional: True
- source
The directory to keep in sync with. If not specified, this template just keeps the local directory synced.
type: string, default: {}, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Thu, 30 Jun 2011 21:26:19
- Modification date
- Thu, 30 Jun 2011 21:25:53
unittest¶
Template used in testing - has no other purpose
Commands¶
- clean
- Remove all job data
- prepare
- prepare for the unittest
- run
Prepare & Run
run delegates execution to: prepare, run2
- run2
- actually run
Filesets¶
- input_1
Input file set 1
- input_2
Input file set 2
type: map, source: input_1, category: input, optional: {}, pattern: in2/*_2.txt
- output
output files
type: map, source: input_1, category: output, optional: {}, pattern: ./*.out
Parameters¶
- test_string
Test string values
type: string, default: {}, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Yogini Idnani, Mark Fiers
- Creation date
- Wed Nov 25 17:06:48 2010
- Modification date
- unknown
wget¶
wget
Use wget to download files. This template has two modes. One is wget_mode=mirror, in which case both wget_url and wget_pattern (default *) are used. The other mode is wget_mode=get, where the single file defined by wget_url is downloaded. In mirror mode it is possible to download only those files that are newer than the files already downloaded, by using the wget_timestamp parameter.
Commands¶
- run
- Download
Parameters¶
- pass
Password for the remote site (note - this is not very safe, the password will be stored in plain text)
type: password, default: ``, optional: True
- url
The url of the file to download
type: string, default: {}, optional: False
- user
Username for the remote site
type: string, default: ``, optional: True
miscellaneous¶
- Backend
- ruff
- Author
- Mark Fiers
- Creation date
- Thu, 02 Jun 2011 10:22:31 +1200
- Modification date
- Thu, 02 Jun 2011 10:22:53 +1200
Moa API¶
moa.actor¶
‘Simple’ wrapper around subprocess to execute code
- moa.actor.async(f)¶
decorator designating an actor to be asynchronous
- moa.actor.getLastStderr(job)¶
Get the last stderr
- moa.actor.getLastStdout(job)¶
Get the last stdout
- moa.actor.getRecentOutDir(job)¶
Return the most recent output directory
- moa.actor.simpleRunner(wd, cl, conf={}, **kwargs)¶
Don’t think - just run - here & now
What does this function do?
- put env in the environment
- Execute the commandline (in cl)
- store stdout & stderr in log files
- return the rc
- moa.actor.sync(f)¶
decorator designating an actor to be synchronous
moa.commands¶
Handle Moa commands (i.e. anything that you can run as moa COMMAND on the command line)
- exception moa.exceptions.CannotGetAFileLock(f)¶
Cannot get a file lock
- exception moa.exceptions.MoaCommandDoesNotExist¶
Command does not exist
- exception moa.exceptions.MoaDirNotWritable¶
Moa directory is not writable
- exception moa.exceptions.MoaFileError¶
Error handling a file
- exception moa.exceptions.MoaInvalidCommandLine¶
Invalid command line
- exception moa.exceptions.MoaPermissionDenied¶
Permission denied - you do not have the rights to perform this operation
- exception moa.exceptions.NotAMoaDirectory(dir)¶
This is not a moa directory
- exception moa.exceptions.NotAMoaTemplate(template)¶
This is not a valid moa template
fileset - define sets of in&output files¶
- moa.filesets.render(job)¶
render all filesets - i.e. figure out what files belong in what sets
moa.job¶
- class moa.job.Job(wd)¶
Class defining a single job
Note - in the moa system, there can be only one current job - many operations try to access the job in sysConf
>>> wd = tempfile.mkdtemp()
>>> job = Job(wd)
>>> assert(isinstance(job, Job))
>>> assert(job.template.name == 'nojob')
- checkCommands(command)¶
Check command, and rearrange if there are delegates.
>>> job = newTestJob('unittest')
## >>> assert(job.template.commands.run.delegate == ['prepare', 'run2'])
## >>> assert(job.checkCommands('run2') == ['run2'])
## >>> assert(job.checkCommands('run') == ['prepare', 'run2'])
## >>> assert(job.checkCommands('prepare') == ['prepare'])
Parameters: commands (list of strings) – The list of commands to check
Returns: The checked list of commands
Return type: list of strings
- checkConfDir()¶
Check if the configuration directory exists. If not, create it.
>>> job = newTestJob('unittest')
>>> confdir = os.path.join(job.wd, '.moa')
>>> assert(os.path.exists(confdir))
>>> import shutil
>>> shutil.rmtree(confdir)
>>> assert(os.path.exists(confdir) == False)
>>> job.checkConfDir()
>>> assert(os.path.exists(confdir))
- defineCommands(commandparser)¶
Register template commands with the argparser
- defineOptions(parser)¶
Set command line options - deferred to the backend - PER COMMAND
>>> job = newTestJob('unittest')
>>> import optparse
>>> parser = optparse.OptionParser()
>>> job.defineOptions(parser)
- execute(job, args, **kwargs)¶
Execute command in the context of this job. Execution is always deferred to the backend
#Note: this is the function that will be called from argparse
#Note: Uncertain how to test verbose & silent
Parameters:
- verbose (Boolean) – output lots of data
- silent (Boolean) – output nothing
- finishExecute()¶
Finish the run!
- getFiles()¶
Return all moa files - i.e. all files crucial to this job.
- hasCommand(command)¶
Check if this job defines a certain command
Warning
THIS METHOD DOES NOT WORK PROPERLY YET
>>> job = newTestJob('unittest')
>>> assert(job.hasCommand('run'))
### >>> assert(job.hasCommand('dummy'))
- init2()¶
Continue initialization
- initialize()¶
Initialize a new job in the current wd
- isMoa()¶
Check if this is a Moa directory - Currently, this needs to be overridden
TODO: check if this ever gets called
- loadBackend()¶
load the backend
- loadTemplate()¶
Load the template for this job, based on what configuration can be found
- prepareExecute()¶
Give this job a chance to prepare for execution.
- refreshTemplate()¶
Reload the template into the local .moa/template.d directory
>>> job = newTestJob('unittest')
>>> templ = os.path.join(job.confDir, 'template.d', 'unittest.jinja2')
>>> assert(os.path.exists(templ))
>>> os.unlink(templ)
>>> assert(not os.path.exists(templ))
>>> job.refreshTemplate()
>>> assert(os.path.exists(templ))
- run_hook(hook, **kwargs)¶
Shortcut to run a job plugin hook
- setTemplate(name, provider=None)¶
Set a new template for this job
>>> job = newTestJob('unittest')
>>> job.setTemplate('simple')
>>> afile = os.path.join(job.confDir, 'template.d', 'simple.jinja2')
>>> assert(os.path.exists(afile))
- moa.job.newJob(job, template, title, parameters=[], provider=None)¶
Create a new job in the wd and return the proper job object. Currently only makefile jobs are supported - later we’ll scan the template and instantiate the proper job type.
Parameters:
- job – Job object to fill - needs only wd set.
- template (String) – Template name for this job
- parameters (list of (key, value) tuples) – A list of parameters to set for this job
Return type: instance of moa.job.Job
- moa.job.newTestJob(template, title='Test job', provider=None)¶
For testing purposes - creates a temporary directory and uses that to instantiate a job. This function returns the job object created.
>>> job = newTestJob(template = 'simple', title='test title')
>>> assert(isinstance(job, Job))
>>> assert(os.path.exists(job.wd))
>>> assert(job.conf.title == 'test title')
>>> assert(os.path.exists(os.path.join(job.wd, '.moa')))
>>> assert(os.path.exists(os.path.join(job.wd, '.moa', 'template')))
### >>> assert(job.template.name == 'simple')
Returns: the created job
Return type: instance of moa.job.Job
moa.jobConf¶
moa job configuration
- class moa.jobConf.JobConf(job)¶
to distinguish between attributes of this object & proper job configuration parameters
- doNotCheck = None¶
these fields are not to be type-checked
- doNotSave = None¶
these fields are not to be saved
- getRendered(key)¶
Get the rendered value of this key
- isEmpty()¶
Check if the config is empty
- isPrivate(k)¶
Is this a private variable? Can be locally defined or in the template definition
- keys()¶
return a dict with all known parameters and values, either defined in the job configuration or the template
- load(confFile, delta=None)¶
Load a configuration file
Parameters: delta – if a value appears to be a relative path, try to correct for this. Currently this only works for files that exist.
- private = None¶
these fields are private (i.e. not to be displayed by default)
- save()¶
Save the conf to disk
- setRecursiveVar(k, v)¶
Register a recursive variable
- class moa.logger.MoaFormatter(fmt=None, datefmt=None)¶
A somewhat more advanced formatter
- format(record)¶
Defines two extra fields in the record class upon formatting:
- visual, a visual indication of the severity of the message
- tb, a formatted traceback, used when sending mail
@param record: the log message record
moa.sysConf¶
Store Moa wide configuration
moa.ui¶
communicate information to the user
- moa.ui.askUser(parameter, default='', xtra_history=None)¶
Parameters:
- parameter – parameter to ask value of
- default – default value - if absent use the last history item
- xtra_history – extra history file to show to the user
moa.utils¶
A set of random utilities used by Moa
- moa.utils.deprecated(func)¶
Decorator function to flag a function as deprecated
Parameters: func – any function
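The behaviour of such a deprecation decorator can be sketched with the standard library alone. This is a minimal sketch of the idea described above, not Moa's own implementation:

```python
import functools
import warnings

def deprecated(func):
    """Flag func as deprecated: emit a DeprecationWarning on every call.
    A sketch of the behaviour described above, not Moa's code."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn("call to deprecated function %s" % func.__name__,
                      category=DeprecationWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper

@deprecated
def old_sum(a, b):
    # still works, but warns the caller
    return a + b
```

Calling `old_sum(1, 2)` returns 3 while a DeprecationWarning is raised through the warnings machinery.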
- moa.utils.flog(f)¶
A simple logger - uses the moa.logger code to log the calling function. Use as a decorator:
@moa.utils.flog
def any_function(*args):
    ...
This is for debugging purposes (obviously)
Parameters: func – Any python function
- moa.utils.getCwd()¶
Do not use os.getcwd() - need to make sure symbolic links do not get dereferenced
hijacked some code from: http://stackoverflow.com/questions/123958/how-to-get-set-logical-directory-path-in-python
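The symlink-preserving idea can be sketched as follows: the shell keeps the logical (un-dereferenced) path in the PWD environment variable, so prefer that when it still points at the physical current directory. A sketch of the approach, not Moa's implementation:

```python
import os

def get_cwd():
    """Return the logical working directory, keeping symlinks intact.
    Sketch only: use $PWD when the shell maintains it and it still
    refers to the physical current directory; fall back to os.getcwd()."""
    logical = os.environ.get('PWD')
    if logical and os.path.realpath(logical) == os.path.realpath(os.curdir):
        return logical
    return os.getcwd()
```

If you `cd` into a symlinked directory, `os.getcwd()` returns the dereferenced target while `get_cwd()` returns the path you actually typed.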
- moa.utils.getMoaBase()¶
Return MOABASE - the directory where Moa is installed. This function also sets an environment variable MOABASE
>>> d = getMoaBase()
>>> assert(os.path.isdir(d))
>>> assert(os.path.isfile(os.path.join(d, 'README')))
>>> assert(os.path.isdir(os.path.join(d, 'lib')))
Return type: string (path)
- moa.utils.getProcessInfo(pid)¶
Return some info on a process
- moa.utils.moaDirOrExit(job)¶
Check if the job contains a proper Moa job; if not, exit with an error message and a non-zero exit code.
Parameters: job – An instance of moa.job.Job
- moa.utils.niceRunTime(d)¶
Nice representation of the run time; d is a time duration string
- moa.utils.printstack(func)¶
Decorator function to print stack
Parameters: func – any function
- moa.utils.removeIndent(txt)¶
Removes indentation from a txt - for use by moa.args and moa.api
- moa.utils.sendmail(server, sender, recipient, subject, message)¶
Send an email.
- moa.utils.simple_decorator(decorator)¶
This decorator can be used to turn simple functions into well-behaved decorators, so long as the decorators are fairly simple. If a decorator expects a function and returns a function (no descriptors), and if it doesn’t modify function attributes or docstring, then it is eligible to use this. Simply apply @simple_decorator to your decorator and it will automatically preserve the docstring and function attributes of functions to which it is applied.
Note: I got this code from somewhere, but forgot where exactly. This seems the most likely source:
http://svn.navi.cx/misc/trunk/djblets/djblets/util/decorators.py
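The described behaviour - a decorator-of-decorators that copies name, docstring and attributes from the wrapped function onto the wrapper - can be sketched like this (a sketch of the technique, not the code linked above):

```python
import functools

def simple_decorator(decorator):
    """Turn a plain function-wrapping function into a well-behaved
    decorator that preserves the decorated function's metadata."""
    @functools.wraps(decorator)
    def new_decorator(f):
        g = decorator(f)
        g.__name__ = f.__name__
        g.__doc__ = f.__doc__
        g.__dict__.update(f.__dict__)
        return g
    return new_decorator

@simple_decorator
def shout(f):
    # a deliberately naive decorator that would normally clobber metadata
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    """Return a greeting."""
    return "hello %s" % name
```

Without `@simple_decorator`, `greet.__name__` would be `'wrapper'`; with it, name and docstring survive.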
moa.template¶
moa.template¶
Store information on a template. This module is also responsible for retrieving template information.
- moa.template.initTemplate(*args, **kwargs)¶
- moa.template.installTemplate(wd, tName, provider=None)¶
Initialize the template - this means: try to figure out where the template came from & copy the template files into job/.moa/template & job/.moa/template.d/extra.
Currently all templates come from the moa repository. In the future, multiple sources must be possible
>>> import tempfile
>>> wd = tempfile.mkdtemp()
>>> installTemplate(wd, 'simple')
>>> templateFile = os.path.join(wd, '.moa', 'template')
>>> adhocFile = os.path.join(wd, '.moa', 'template.d', 'simple.jinja2')
>>> assert(os.path.exists(templateFile))
>>> assert(os.path.exists(adhocFile))
- moa.template.refresh(wd)¶
Refresh the template - try to find out what the template is from {{wd}}/.moa/template.meta. If that doesn’t work, revert to the default template. If default is not specified, exit with an error
>>> import tempfile
>>> wd = tempfile.mkdtemp()
>>> installTemplate(wd, 'simple')
>>> templateFile = os.path.join(wd, '.moa', 'template')
>>> adhocFile = os.path.join(wd, '.moa', 'template.d', 'simple.jinja2')
>>> os.unlink(adhocFile)
>>> os.unlink(templateFile)
>>> assert(not os.path.exists(templateFile))
>>> assert(not os.path.exists(adhocFile))
>>> refresh(wd)
>>> assert(os.path.exists(templateFile))
>>> assert(os.path.exists(adhocFile))
moa.template.template¶
Store information on a template. This module is also responsible for retrieving template information.
- class moa.template.template.Template(wd)¶
Template extends Yaco
- getRaw()¶
Return a Yaco representation of the yaml-template, without any of this Template processing. This is really useful when processing a template that needs to be written back to disk
>>> import moa.job
>>> job = moa.job.newTestJob(template='simple')
>>> raw = job.template.getRaw()
>>> assert(isinstance(raw, Yaco.Yaco))
>>> assert(raw.has_key('parameters'))
- loadMeta()¶
Load the template meta data for this job, based on what configuration can be found
moa.plugin¶
Yaco¶
Yaco provides a dict-like structure that can be serialized to & from yaml. Yaco objects behave as dictionaries but also allow attribute access (loosely based on this `recipe <http://code.activestate.com/recipes/473786/>`_). Sublevel dictionaries are automatically converted to Yaco objects, allowing sublevel attribute access, for example:
>>> x = Yaco()
>>> x.test = 1
>>> x.sub.test = 2
>>> x.sub.test
2
Note that sub-dictionaries do not need to be initialized. As a consequence, requesting uninitialized items automatically returns an empty Yaco object (inherited from a dictionary).
Yaco can be found in the Python package index and is also part of the Moa source distribution
Autogenerating keys¶
An important feature (or annoyance) of Yaco is the auto generation of keys that are not present (yet). For example:
>>> x = Yaco()
>>> x.a.b.c.d = 1
>>> assert(x.a.b.c.d == 1)
works - a, b and c are assumed to be Yaco dictionaries and d is given the value 1. This makes populating data structures easy.
It might also generate some confusion when querying for keys in the Yaco structure - if a key does not exist, it automatically comes back as an empty dict or Yaco object (renders as {}). This means that it is easy to check if a certain ‘branch’ of a Yaco datastructure exists:
>>> x = Yaco()
>>> assert (not x.a.b)
but now the following works as well:
>>> assert('a' in x)
>>> assert('b' in x.a )
So, a safe way to test a data structure, without introducing extra branches is:
>>> x = Yaco()
>>> assert(not 'a' in x)
Todo: Need to find a more elegant way of testing without introducing data structures
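The autovivification behaviour described above (including the read-creates-branch caveat) can be sketched with a small dict subclass. This is an illustration of the idea only, not Yaco's implementation:

```python
class AutoDict(dict):
    """Sketch of Yaco-style autovivification (not Yaco itself):
    attribute access on a missing key silently creates an empty
    sub-dictionary - even on reads, which is the caveat above."""
    def __getattr__(self, key):
        if key.startswith('_'):
            raise AttributeError(key)
        if key not in self:
            self[key] = AutoDict()   # merely *looking* creates the branch
        return self[key]
    def __setattr__(self, key, value):
        self[key] = value

x = AutoDict()
x.a.b.c = 1     # a and b spring into existence automatically
y = AutoDict()
_ = y.p.q       # a read - but 'p' now exists in y
```

After this, `'p' in y` is True even though nothing was assigned, mirroring the Yaco caveat; the safe test is `'p' in y` *before* touching `y.p`.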
- class Yaco.PolyYaco(name='PY', files=[], pattern='*.config', leaf='')¶
A meta object that allows a composite Yaco object to be loaded from any number of different files which are kept as a stack of Yaco objects. If looking for a value, this object will check each of the layers in the stack and return the first value that it comes across.
Changes are only made to the toplevel object.
The goal is to have multiple configuration files, for example in:
/location/to/python/package/etc/config.yaml
/etc/APPLICATION.yaml
~/.config/APPLICATION/config.yaml
and have values in the latter file override those in the former. Saving changed values will also be done to the latter, but system and application wide settings can be maintained as well (manually for the time being).
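This layered lookup-and-write behaviour is the same idea as the standard library's ChainMap, which makes for a compact illustration (an analogy, not PolyYaco's implementation):

```python
from collections import ChainMap

# The PolyYaco idea sketched with ChainMap: the first layer that
# defines a key wins, and assignments only touch the topmost layer.
system_conf = {'threads': 4, 'tmpdir': '/tmp'}   # e.g. a system-wide file
user_conf = {'threads': 8}                       # e.g. a per-user file

conf = ChainMap(user_conf, system_conf)

conf['verbose'] = True   # written to user_conf only; system_conf untouched
```

Here `conf['threads']` resolves to 8 (the user layer overrides the system layer) while `conf['tmpdir']` falls through to the system layer.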
- load(leaf, files, pattern)¶
- class Yaco.Yaco(data={}, leaf=None)¶
Originated from: http://code.activestate.com/recipes/473786/
>>> v = Yaco()
>>> v.a = 1
>>> assert(v.a == 1)
>>> assert(v['a'] == 1)
>>> v = Yaco({'a':1})
>>> assert(v.a == 1)
>>> assert(v['a'] == 1)
- get_data()¶
Prepare & parse data for export
>>> y = Yaco()
>>> y.a = 1
>>> y.b = 2
>>> y._c = 3
>>> assert(y._c == 3)
>>> d = y.get_data()
>>> assert('a' in d)
>>> assert('b' in d)
>>> assert(not 'c' in d)
>>> y._private = ['b']
>>> d = y.get_data()
>>> assert('a' in d)
>>> assert(not 'b' in d)
>>> assert(not '_c' in d)
- load(from_file, leaf=None)¶
Load this dict from_file
Note - it can load the file into a leaf, instead of the root of this Yaco structure. Note - the leaf variable is a string, but may contain dots (which are automatically interpreted)
>>> import tempfile
>>> tf = tempfile.NamedTemporaryFile(delete=True)
>>> tf.close()
>>> x = Yaco({'a': [1,2,3, [1,2,3, {'d' : 4}]],
...           'b': 4, 'c': '5', 'uni' : "Aπ"})
>>> x.save(tf.name)
>>> y = Yaco()
>>> y.load(tf.name)
>>> assert(y.a[3][3].d == 4)
>>> assert(sys.version_info[0] == 2 or y.uni == "Aπ")
- pretty()¶
Return data as a pprint.pformatted string
- save(to_file, doNotSave=[])¶
- simple()¶
return a simplified representation of this Yaco struct - remove Yaco from the equation - and all object references. Leave only bool, float, str, lists, tuples and dicts
>>> x = Yaco()
>>> x.y.z = 1
>>> assert(isinstance(x.y, Yaco))
>>> s = x.simple()
>>> assert(s['y']['z'] == 1)
>>> assert(isinstance(s['y'], dict))
>>> assert(not isinstance(s['y'], Yaco))
- soft_update(data)¶
As update - but only update keys that do not yet have a value.
Note - lists are treated as single values: they are kept or replaced as a whole, never merged.
>>> d1 = {'a' : [1,2,3,{'b': 12}], 'd' : {'e': 72}}
>>> d2 = {'a' : [2,3,4,{'b': 12}], 'd' : {'e': 73, 'f': 18}, 'c' : 18}
>>> v = Yaco(d1)
>>> assert(v.a[2] == 3)
>>> assert(v.d.e == 72)
>>> v.soft_update(d2)
>>> assert(v.d.e == 72)
>>> assert(v.d.f == 18)
>>> assert(v.a[2] == 3)
- update(data)¶
>>> v = Yaco({'a' : [1,2,3,{'b' : 12}]})
>>> assert(v.a[3].b == 12)
>>> v = Yaco({'a' : [1,2,3,[1,{'b' : 12}]]})
>>> assert(v.a[3][1].b == 12)
- class Yaco.YacoDir(dirname, pattern='*.config')¶
As Yaco, but load all files in a directory on top of each other.
Order of loading is the alphanumerical sort of filenames.
Files in subdirectories are loaded into leaves. E.g. a file /tmp/test/sub/a.yaml containing only (x=1) will end up as follows:
>>> y = YacoDir('/tmp/test')
>>> y.sub.x == 1
Note, YacoDir will try to cache itself in a .yacodir.cache file in the root of the dirname. If the modification date of this file is the same as the directory, that will be loaded instead.
- load(dirname, pattern)¶
Load from the defined directory
- save()¶
Save is disabled.
fist¶
Filesets
Handle & manipulate sets of files
This module aims at providing classes to handle and manipulate sets of files. Two simple examples are a simple set containing one file (fist.fistSingle) or a glob based set of files (fist.fistFileset). A more complicated example is fistMapset, which maps another fileset based on a pattern.
Each fileset inherits from list - hence fist filesets behave as lists.
Future work should allow the definition of remote filesets (for example over http or ssh).
Each fist class is instantiated with a url defining the file(set). In the case of fist.fistFileset this url contains globbing characters:
fs = fist.fistFileset('/tmp/*.txt')
This fileset object contains a list with all *.txt files in /tmp. Subsequently it is possible to map this set.
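The core idea - a list whose contents are resolved from a globbing url - can be sketched in a few lines of plain Python. A sketch of the concept, not the fist code itself:

```python
import glob
import os
import tempfile

class FileSet(list):
    """Sketch of the fistFileset idea (not fist's implementation):
    a list subclass whose contents are filled in by resolving a glob."""
    def __init__(self, url):
        super(FileSet, self).__init__()
        self.url = url
    def resolve(self):
        # replace the list contents with the sorted glob matches
        self[:] = sorted(glob.glob(self.url))

# populate a scratch directory with three .txt files to resolve against
wd = tempfile.mkdtemp()
for i in range(3):
    open(os.path.join(wd, 'in%d.txt' % i), 'w').close()

fs = FileSet(os.path.join(wd, '*.txt'))
fs.resolve()
```

Because `FileSet` inherits from list, `len(fs)`, iteration and indexing all work as usual once `resolve()` has run, which mirrors how fist filesets behave as lists.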
- class fist.fistCore(url, context=None)¶
Core class for all fist classes
- resolve()¶
This function needs to be overridden
- class fist.fistFileset(url, context=None)¶
Most basic set of files - handle a set of files described by a single URI with wildcards, for example:
* `*.txt`
* `../*.txt`
* `file:///home/name/data/*.txt`
>>> f = fistFileset('*.txt')
>>> assert(f.path=='.')
>>> assert(f.glob=='*.txt')
>>> f = fistFileset('/tmp')
>>> assert(f.path=='/tmp')
>>> assert(f.glob=='*')
>>> f = fistFileset('/tmp/*.txt')
>>> assert(f.path=='/tmp')
>>> assert(f.glob=='*.txt')
>>> f = fistFileset('../*.txt')
>>> assert(f.path=='..')
>>> assert(f.glob=='*.txt')
>>> f = fistFileset(os.path.join(wd, 'in', '*.txt'))
>>> f.resolve()
>>> assert(len(f) == 100)
>>> f = fistFileset(os.path.join(wd, 'in', 'in1*.txt'))
>>> f.resolve()
>>> assert(len(f) == 10)
>>> f = fistFileset('~/*')
>>> f.resolve()
>>> assert(len(f) > 0)
- class fist.fistMapset(url, context=None)¶
Map set - map a fileset based on a target uri
>>> f = fistFileset(os.path.join(wd, 'in', '*'))
>>> f.resolve()
>>> assert(len(f) == 100)
>>> ## Null mapping
>>> m = fistMapset('*/*')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert(os.path.join(wd, 'in/in18.txt') in m)
>>> ## simple folder mapping
>>> m = fistMapset('out/*')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert('out/in18.txt' in m)
>>> ## simple folder mapping
>>> m = fistMapset('./*')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert('./in18.txt' in m)
>>> ## simple folder mapping & extension append
>>> m = fistMapset('out/*.out')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert('out/in18.txt.out' in m)
>>> ## New from fileset - now with a pattern defining the extension
>>> f = fistFileset(os.path.join(wd, 'in', '*.txt'))
>>> f.resolve()
>>> ## extension mapping
>>> m = fistMapset('out/*.out')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert('out/in18.out' in m)
>>> ## New from fileset - now with a pattern defining file glob & extension
>>> f = fistFileset(os.path.join(wd, 'in', 'in*.txt'))
>>> f.resolve()
>>> ## more complex filename mapping
>>> m = fistMapset('out/test*.out')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert('out/test18.out' in m)
>>> ## mapping keeping the extension the same
>>> m = fistMapset('out/test*.txt')
>>> m.resolve(f)
>>> assert(len(m) == 100)
>>> assert('out/test18.txt' in m)
- resolve(mapFrom)¶
Resolve the mapped set based on an input fileset
- resolver(mapFrom, list)¶
map all files in the incoming list