[[Category:Software]]
== Description ==
Samtools is a suite of programs for interacting with high-throughput sequencing data. It is closely related to BCFtools and to HTSlib. Primary documentation for all three of these packages can be found at https://www.htslib.org/

* Samtools is for reading, writing, editing, indexing, and viewing files in SAM, BAM, or CRAM format.
* BCFtools is for reading and writing files in BCF2, VCF, and gVCF format, and for calling, filtering, and summarizing SNP and short indel sequence variants.
* HTSlib is a C-language library for reading and writing high-throughput sequencing data. It is used by both Samtools and BCFtools.

This page does not cover all features of Samtools. Please refer to [http://www.htslib.org/doc/samtools.html Samtools] for the complete list of subcommands.

To load the default version of samtools, use module load samtools. Running samtools without arguments then prints its version and usage:
{{Commands
|module load samtools
|samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.20 (using htslib 1.20)
Usage: samtools [options]
}}
For more on the module command, including how to find other versions of samtools, see [[Utiliser_des_modules/en|Using modules]].

== General usage ==
Samtools provides tools for manipulating alignments in SAM and BAM formats. A common task is to convert SAM ("Sequence Alignment/Map") files to BAM files. BAM files are compressed binary versions of SAM files (the "B" stands for "binary") and are much smaller. They are efficient to process and well suited to storing large nucleotide sequence alignments. CRAM is a more recent format for the same type of data, and offers even greater compression.

=== Converting a SAM file to a BAM file ===
Before converting, check whether your SAM file has a header section, made of lines that begin with the character “@”.
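Because the SAM format places all header lines before the alignment records, a script can tell whether a plain-text SAM file has a header by looking only at its first line. The following is a minimal sketch of that check, assuming an uncompressed SAM file; has_sam_header is a hypothetical helper name, not a samtools subcommand.

```shell
#!/bin/bash
# Minimal sketch, assuming an uncompressed, plain-text SAM file.
# Header lines (starting with "@") come before the alignment records,
# so checking the first line is sufficient.
# has_sam_header is a hypothetical helper, not a samtools subcommand.
has_sam_header() {
    local sam_file=$1
    # Succeeds (exit status 0) only if the first line starts with "@".
    head -n 1 -- "${sam_file}" | grep -q '^@'
}

# Example usage:
# if has_sam_header my_sample.sam; then
#     echo "header found"
# else
#     echo "no header: supply reference names and lengths with samtools view -t"
# fi
```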
You can inspect the header section using the view command:
{{Command|samtools view -H my_sample.sam}}
If the SAM file contains a header, either of these forms can be used to convert the data to BAM format:
{{Commands
|samtools view -bo my_sample.bam my_sample.sam
|samtools view -b my_sample.sam -o my_sample.bam
}}
If the header is absent, you can supply the reference sequence names and lengths with the -t option. The FASTA index (.fai) that samtools faidx creates from the reference FASTA file has the required format:
{{Commands
|samtools faidx ref_seq.fa
|samtools view -bt ref_seq.fa.fai -o my_sample.bam my_sample.sam
}}

=== Sorting and indexing BAM files ===
Many downstream applications require BAM files to be coordinate-sorted and indexed:
{{Commands
|samtools sort my_sample.bam -o my_sample_sorted.bam
|samtools index my_sample_sorted.bam
}}
You can also convert a SAM file directly to a sorted BAM file using a shell pipe:
{{Command|samtools view -b my_sample.sam {{!}} samtools sort -o my_sample_sorted.bam}}
A sorted BAM file, together with its index file (extension .bai), is a common prerequisite for many other processes, such as variant calling and feature counting.

=== Processing multiple files with multithreading and/or GNU parallel ===
You will typically have more than one SAM file to process at one time. A job script with a loop is a good way to handle multiple files, as in the following example:
{{File
|name=samtools.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --time=3:00:00

module load samtools/1.20

for FILE in *.sam
do
    time samtools view -b "${FILE}" {{!}} samtools sort -o "${FILE%.*}_mt_sorted.bam"
done
}}
By default, samtools runs on a single core, but in some cases multithreading or GNU parallel can make better use of your allocation.
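In the loop script above, each output name is built with the bash parameter expansion ${FILE%.*}, which strips the last extension from the input name. Before submitting a job, it may help to preview the names the loop will produce. This is a minimal sketch, assuming bash and the same *.sam pattern and _mt_sorted.bam suffix as the script; preview_outputs is a hypothetical helper name.

```shell
#!/bin/bash
# Minimal sketch: list the output names the loop would create,
# without running samtools. preview_outputs is a hypothetical helper.
preview_outputs() {
    local FILE
    for FILE in *.sam
    do
        # If nothing matches, bash keeps the literal pattern "*.sam"; skip it.
        [ -e "${FILE}" ] || continue
        # ${FILE%.*} removes the shortest trailing ".*" match (the last
        # extension): sample1.sam -> sample1, sample.v2.sam -> sample.v2
        echo "${FILE} -> ${FILE%.*}_mt_sorted.bam"
    done
}

# Example: run from the directory that holds your SAM files
# preview_outputs
```

The [ -e ] guard matters because, without it, an empty directory would make the loop pass the literal string *.sam to samtools.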
Samtools can take advantage of multiple cores ("multithreading") if given the -@ flag:
{{File
|name=samtools_multithreading.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --time=3:00:00

module load samtools/1.20

for FILE in *.sam
do
    # -@ gives each samtools command extra threads; here both view and sort use it
    time samtools view -@ ${SLURM_CPUS_PER_TASK} -b "${FILE}" {{!}} samtools sort -@ ${SLURM_CPUS_PER_TASK} -o "${FILE%.*}_mt_sorted.bam"
done
}}
A different way to take advantage of multiple cores is to use GNU parallel to process several files concurrently:
{{File
|name=samtools_gnuparallel.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --time=3:00:00

module load samtools/1.20

# {} is the input path found by find; {.} is the same path without its extension
find . -name "*.sam" {{!}} parallel -j ${SLURM_CPUS_PER_TASK} "time samtools view -b {} {{!}} samtools sort -o {.}_mt_sorted.bam"
}}
The above script runs view and sort on up to four SAM files concurrently. If there are more than four input files, GNU parallel starts each remaining file as soon as a running job finishes; to process more files at the same time, increase the --cpus-per-task request.
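To judge whether raising --cpus-per-task is worthwhile, you can estimate how many "rounds" of jobs GNU parallel will need: with -j N, at most N files are processed at once, so the work takes roughly the ceiling of files divided by N. This is a rough sketch, assuming every file takes about the same time (real run times vary with file size); estimate_rounds is a hypothetical helper name.

```shell
#!/bin/bash
# Rough sketch, assuming every SAM file takes about the same time to process.
# With -j SLOTS, GNU parallel runs at most SLOTS jobs at once, so the work
# takes about ceil(FILES / SLOTS) rounds. estimate_rounds is hypothetical.
estimate_rounds() {
    local files=$1 slots=$2
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (files + slots - 1) / slots ))
}

# Example usage inside a job script (counts files the same way as the find
# command above; falls back to 1 slot outside a Slurm job):
# n_files=$(find . -name "*.sam" | wc -l)
# slots=${SLURM_CPUS_PER_TASK:-1}
# echo "about $(estimate_rounds "${n_files}" "${slots}") round(s)"
```

For example, ten SAM files with --cpus-per-task=4 need about three rounds, while eight files need two, so going from four to five cores would not shorten the eight-file case.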