This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Workflows with Common Workflow Language: Setup

Software

These lessons are best followed using VSCode with the Benten extension, which provides a language server for CWL. We will also be using the CWL reference runner, cwltool. Instructions for installing these are given below.

VSCode and Benten

  1. Download and install VSCode.

  2. Open Benten in the marketplace and click the Install button or follow the directions. The VSCode Benten extension will require the Benten server to be installed too. It will prompt you to do this the first time you activate the extension.

Docker, cwltool, and graphviz

This tutorial requires three pieces of software to run and visualize the workflows: Docker, cwltool, and graphviz. Follow the instructions for your operating system below.

Windows users first need to install the “Windows Subsystem for Linux 2” (WSL2) and Docker Desktop before installing cwltool. Follow the steps below, taken from the official setup guide:

  1. Install Windows Subsystem for Linux 2 (WSL2), and Docker Desktop
  2. Install Debian from the Microsoft Store
  3. Set Debian as your default WSL 2 distro: wsl --set-default debian
  4. Return to the Docker Desktop, choose Settings → Resources → WSL Integration and under “Enable integration with additional distros” select “Debian”
  5. Reboot
  6. Launch Debian
  7. Install the “Remote - WSL” extension for VS Code: open its marketplace page and click the Install button, or follow the directions given there.
  8. After installation choose “Open a Remote - WSL Window” and then “New WSL Window”. Your VS Code window should now say “WSL: Debian” in green at the lower left corner.
  9. Enable the Benten CWL extension in this “WSL: Debian” window: press Ctrl+Shift+X to open the “Extensions” view and click the “Install in WSL: Debian” button.
  10. Choose Terminal → New Terminal. Execute sudo apt-get update && sudo apt-get install -y python3-venv wget in the terminal.
  11. Install cwltool using the instructions for Linux users below.

Linux users already have a Bash terminal and can start with the steps below.

  1. Install docker
  2. Enable docker usage as a non-root user
  3. Install the latest version of cwltool. To make sure you get the latest version, install it with pip inside a virtual environment created with venv.
     python3 -m venv env        # Create a virtual environment named 'env' in the current directory
     source env/bin/activate    # Activate the 'env' environment

     The virtual environment needs to be activated with the source env/bin/activate command every time you open a new terminal.

     Next, install cwltool.

     pip install cwltool
    
  4. For the visualisation of the workflow, please install graphviz:
    sudo apt-get install -y graphviz
    

Mac users already have a Terminal program and should follow the steps below:

  1. Install docker
  2. Install miniconda
  3. Create a virtual environment using conda
     $ conda create --name cwltutorial
    
  4. Activate the virtual environment
     $ conda activate cwltutorial
    
  5. Install cwltool and graphviz using conda
     $ conda install -c bioconda cwltool
     $ conda install -c anaconda graphviz
    

The virtual environment needs to be activated with conda activate cwltutorial every time you open a new terminal.
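
Because activation has to be repeated in every new terminal, it can help to check which environment, if any, is currently active. The following is a minimal sketch for a POSIX shell. It relies on the VIRTUAL_ENV variable set by the venv activate script and the CONDA_DEFAULT_ENV variable set by conda; active_env is a hypothetical helper name introduced here for illustration.

```shell
# active_env is a hypothetical helper, not part of venv or conda.
# It reports which kind of environment the current shell has activated.
active_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        # Set by 'source env/bin/activate'
        echo "venv: $VIRTUAL_ENV"
    elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        # Set by 'conda activate <name>'
        echo "conda: $CONDA_DEFAULT_ENV"
    else
        echo "no environment active"
    fi
}

active_env
```

If this prints "no environment active", run the activation command for your platform before using cwltool.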

Confirm the software is installed correctly

To confirm docker is installed, run the following command to display the version number:

$ docker version

You should see something similar to the output shown below.

Client: Docker Engine - Community
 Version:           20.10.13
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 10 14:08:15 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.13
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       906f57f
  Built:            Thu Mar 10 14:06:05 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.10
  GitCommit:        2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

To confirm cwltool is installed, run the following command to display the version number:

cwltool --version

You should see something similar to the output shown below.

/home/learner/env/bin/cwltool 3.1.20220312132609

To confirm graphviz is installed, run the following command to display the version number:

$ dot -V

You should see something similar to the output shown below.

dot - graphviz version 2.40.1 (20161225.0304)
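
The three checks above can also be combined into a single pass. The following is a minimal sketch, assuming a POSIX shell; check_tool is a hypothetical helper name introduced here, not part of Docker, cwltool, or graphviz. It only tests whether each command is on the PATH and does not compare version numbers.

```shell
# check_tool is a hypothetical helper: it reports whether a command
# can be found on the PATH and returns non-zero when it cannot.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# 'dot' is the graphviz command used for the version check above.
missing=0
for tool in docker cwltool dot; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

If cwltool is reported as missing, the most likely cause is that the virtual environment has not been activated in the current terminal.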

Files

You will need to download some example files for this lesson. In this tutorial we will use RNA sequencing data.

Setting up a practice repository

This tutorial builds its workflow from existing tools, which we will import from GitHub. First we need to create an empty git repository for all our files. To do this, use this command:

git init novice-tutorial-exercises

Next, we need to move into our empty git repo:

cd novice-tutorial-exercises

Then import bio-cwl-tools with this command:

git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git

Downloading sample and reference data

Create a new directory inside the novice-tutorial-exercises directory and download the data:

mkdir rnaseq
cd rnaseq
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/

Downloading or generating STAR index

To run the STAR tool, index files generated from the reference files are needed. You can download them, but this is a large download (4 GB), so it is also possible to generate these files yourself.

Downloading

mkdir hg19-chr1-STAR-index
cd hg19-chr1-STAR-index
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/

Generating

Create chr1-star-index.yaml in the novice-tutorial-exercises directory:

InputFiles:
  - class: File
    location: rnaseq/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
Overhang: 99
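
The two location entries above are relative paths, so the reference files must already exist under rnaseq/reference_data. As a sketch, assuming it is run from the novice-tutorial-exercises directory, the following checks that both files are present before the index is generated; check_inputs is a hypothetical helper name introduced here.

```shell
# check_inputs is a hypothetical helper: it prints one line per path
# and returns non-zero if any of the given files is missing.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}

# Paths copied from the location fields of chr1-star-index.yaml.
check_inputs rnaseq/reference_data/chr1.fa \
             rnaseq/reference_data/chr1-hg19_genes.gtf \
    || echo "Download the sample and reference data first."
```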

Generate the index files with cwltool:

cwltool bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml