{{Draft}}
[[Category:Software]]
CheckV is a fully automated command-line pipeline for assessing the quality of single-contig viral genomes, including identification of host contamination for integrated proviruses, estimation of completeness for genome fragments, and identification of closed genomes.
[https://bitbucket.org/berkeleylab/checkv/src/master/ CheckV]
[https://pypi.org/project/checkv/ PyPI-CheckV]
This page is a demo of the steps to follow when you want to use a piece of software on the clusters. We use CheckV as an example, but the goal is for you to be able to apply the same steps to any other software.
== How to find if a software is available on the clusters? ==
=== Modules ===
You will find all the information about available software on the [[Available software]] page. In short, some software is available by loading the appropriate module.
To find a module, run:
{{Command|module spider nameOfYourSoftware}}
If you do not know the full name, you can put the part you know between double quotes (""). The search is not case-sensitive, so uppercase, lowercase, or a mix of both gives the same output.
You can also add the version number after the name to get more details, including any modules that must be loaded before or together with your software. We call these modules dependencies.
{{Command|module spider nameOfYourSoftware/10.2}}
=== Python packages ===
In our example, the module search returns no output for CheckV because it is distributed as a Python wheel.
Python packages are provided as binary wheels; see [[Available Python wheels]].
You can find them by typing:
{{Command|avail_wheels CheckV}}
The same search tricks apply as for modules: you can use double quotes, and the search is not case-sensitive. You can add --all-versions to list all the available versions.
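The two lookups above can be combined into one helper. This is only a minimal sketch: the function name find_software is hypothetical, and it assumes that module and avail_wheels behave as described above. Each lookup runs only if the corresponding command exists, so the function can be sourced safely outside a cluster.

```shell
#!/bin/bash
# find_software is a hypothetical helper, not a command provided on the clusters.
# It searches for a name first among the modules, then among the Python wheels.
find_software() {
    local name="$1"
    if [ -z "$name" ]; then
        echo "usage: find_software nameOfYourSoftware" >&2
        return 2
    fi

    echo "== Modules matching '$name' =="
    if type module >/dev/null 2>&1; then
        # Lmod may print its report on stderr, so merge both streams.
        module spider "$name" 2>&1
    else
        echo "(module command not found)"
    fi

    echo "== Python wheels matching '$name' =="
    if type avail_wheels >/dev/null 2>&1; then
        avail_wheels "$name"
    else
        echo "(avail_wheels command not found)"
    fi
}
```

After sourcing the file, you would call it as find_software checkv.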
== What do I need to do if the software I want to use is not available? ==
The first step is to look at the documentation of the software. Most software has a development page, often a GitHub repository, with installation steps that you can follow. Note that you '''cannot''' use Conda environments on the clusters (see [[Anaconda]]). The wiki page [[Installing software in your home directory]] explains how to install software locally in your account, or you can email [[Technical support]] to get help installing it either in your account or on the clusters.
For Python wheels, you can search the [https://pypi.org/ PyPI] website, which is a collection of wheels made available for everyone. We go into more detail in the following section, but you can install them in your virtual environment with this command:
{{Command|pip install nameOfTheWheel}}
This command installs the wheel from the web, not from our wheelhouse. You can also contact us to have your preferred wheel added to the wheelhouse. To install a wheel from our wheelhouse, add the --no-index parameter:
{{Command|pip install --no-index nameOfTheWheel}}
== Installation ==
'''1. Load the necessary modules.'''
As mentioned in section [[CheckV#How to find if a software is available on the clusters?]], you can find the dependencies that must be loaded before your software by looking at a specific version with module spider nameOfYourSoftware/10.2.
There may also be other dependencies, which are usually listed on the software development page. You need to go through section [[CheckV#How to find if a software is available on the clusters?]] for each dependency to find out whether it is present on the clusters.
{{Command|module load gcc hmmer/3.3.2 prodigal-gv/2.6.3 diamond/2.0.4 python/3.10}}
'''2. Create and activate the virtual environment.'''
{{Commands
|virtualenv ~/CheckV_env
|source ~/CheckV_env/bin/activate
}}
'''3. You should also upgrade pip in the environment.'''
This step is important if you are using a Python version older than 3.10.2.
{{Command|pip install --no-index --upgrade pip}}
''' 4. Install the wheel and its dependencies (if you have any).'''
4.1 A wheel from the wheelhouse (preferred choice):
{{Command|pip install --no-index checkv}}
4.2 A wheel from the web. Note that if you install a wheel from the web inside your virtual environment, you will not be able to use a requirements file. Use option 4.3 instead if you need one.
{{Command|pip install checkv}}
4.3 If you want to use a wheel from the web and also use a requirements file, run the installation outside the virtual environment:
{{Commands
|deactivate
|pip install checkv
}}
''' 5. Validate it.'''
{{Commands
|python -c 'import checkv'
|checkv --help
}}
Freeze the environment and requirements set. To see how the requirements text file is used, have a look at the bash submission script described in point number {}. Remember that you can use a requirements file only with installation options 4.1 and 4.3.
{{Command|pip freeze > checkv-1.0.1-requirements.txt}}
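The installation steps above can be sketched as a single shell function. This is a hedged example, not an official installer: the names install_checkv, run, ENV_DIR, and DRY_RUN are made up for this page, and only installation option 4.1 (wheel from the wheelhouse) is covered. The freeze step is left out. Setting DRY_RUN=1 prints each command instead of running it, which lets you review the sequence first.

```shell
#!/bin/bash
# Sketch of installation steps 1 to 5 (option 4.1) as one function.
# install_checkv, run, ENV_DIR and DRY_RUN are names made up for this example.
ENV_DIR="${ENV_DIR:-$HOME/CheckV_env}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_checkv() {
    # 1. Load the modules (same versions as in step 1).
    run module load gcc hmmer/3.3.2 prodigal-gv/2.6.3 diamond/2.0.4 python/3.10 || return 1
    # 2. Create and activate the virtual environment.
    run virtualenv "$ENV_DIR" || return 1
    run source "$ENV_DIR/bin/activate" || return 1
    # 3. Upgrade pip, then 4. install the wheel from the wheelhouse.
    run pip install --no-index --upgrade pip || return 1
    run pip install --no-index checkv || return 1
    # 5. Validate the installation.
    run python -c 'import checkv' || return 1
    run checkv --help
}
```

To preview the commands without changing anything, source the file and run DRY_RUN=1 install_checkv. The function stops at the first step that fails.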
== Datasets ==
''' 1. Download the database'''
You must download the database before submitting your job. Scratch storage is the best choice for intensive read/write operations on large files, which is why we usually recommend downloading databases to your scratch space.
{{Command|checkv download_database $SCRATCH/}}
Some users may wish to update the database using their own complete genomes:
{{Command|checkv update_database /path/to/checkv-db /path/to/updated-checkv-db genomes.fna}}
Some users may wish to download a specific database version. See [https://portal.nersc.gov/CheckV/] for an archive of all previous database versions. If you go this route, you will need to build the DIAMOND database manually:
{{Commands
|wget https://portal.nersc.gov/CheckV/checkv-db-archived-version.tar.gz
|tar -zxvf checkv-db-archived-version.tar.gz
|cd /path/to/checkv-db/genome_db
|diamond makedb --in checkv_reps.faa --db checkv_reps
}}
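Before submitting a job, it can help to confirm that a manually built database is complete. The sketch below is an assumption-laden example: check_checkv_db is a hypothetical helper, and the two file names are inferred from the commands above (checkv_reps.faa as the input, and checkv_reps.dmnd as the file DIAMOND writes for --db checkv_reps).

```shell
#!/bin/bash
# check_checkv_db is a hypothetical helper. The file names come from the
# commands above: checkv_reps.faa is the input of "diamond makedb" and
# checkv_reps.dmnd is the file DIAMOND writes for "--db checkv_reps".
check_checkv_db() {
    local db="$1"
    if [ ! -f "$db/genome_db/checkv_reps.faa" ]; then
        echo "missing $db/genome_db/checkv_reps.faa" >&2
        return 1
    fi
    if [ ! -f "$db/genome_db/checkv_reps.dmnd" ]; then
        echo "DIAMOND database not built in $db/genome_db" >&2
        return 2
    fi
    echo "database looks complete: $db"
}
```

You would call it as check_checkv_db /path/to/checkv-db. A return code of 2 means the protein file is present but the diamond makedb step still has to be run.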
''' 2. Download a test sequence file'''
Some software provides a data set that you can use to test it. Check the web or the software's repository to see whether one is available. For CheckV, the data set is available here [https://bitbucket.org/berkeleylab/checkv/src/master/test/]. You can download it with this command:
{{Command|wget https://bitbucket.org/berkeleylab/checkv/raw/3f185b5841e8c109848cd0b001df7117fe795c50/test/test_sequences.fna}}
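A quick sanity check after the download is to count the FASTA records in the file. This is a minimal sketch with a hypothetical helper name, count_fasta_records. It only counts header lines (lines starting with ">") and does not validate the sequences themselves.

```shell
#!/bin/bash
# count_fasta_records is a hypothetical helper: it counts FASTA header lines
# (lines starting with ">") and fails when there are none, for example when
# the download saved an error page instead of sequences.
count_fasta_records() {
    local file="$1" n
    if [ ! -s "$file" ]; then
        echo "file is missing or empty: $file" >&2
        return 1
    fi
    # grep -c exits with status 1 when the count is 0, so ignore its status.
    n=$(grep -c '^>' "$file" || true)
    echo "$n"
    [ "$n" -gt 0 ]
}
```

You would call it as count_fasta_records test_sequences.fna. It prints the number of records and returns a non-zero status if no record is found.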
== Usage ==
=== Job submission ===
==== Interactive session ====
The first step for running your job is to use an interactive session.
If you need to refresh your knowledge of the #SBATCH parameters, we recommend having a look at the [https://slurm.schedmd.com/sbatch.html Slurm sbatch command page] and the [[Running jobs]] wiki page.
To learn more about interactive jobs, have a look at the wiki page [[Running jobs#Interactive job]].
''' 1. Gather information on the command line and the software.'''
The first thing to do is to analyze the proposed command line and check the help menu for any information about threading or other parameters that can help you set up HPC (high-performance computing) usage.
In our case, here is the command line proposed for a full pipeline analysis:
{{Command|checkv end_to_end input_file.fna output_directory -t 16}}
Here, you should find out what the -t parameter does. To display the help menu for the end_to_end program:
{{Command|checkv end_to_end --help
|result=
Run full pipeline to estimate completeness, contamination, and identify closed genomes
usage: checkv end_to_end