Overview

Many commonly used bioinformatics software packages on the HPC clusters are available as individual modules or as Python packages bundled in the bioconda modules.

Please see our HowTo for more information about using this software on the HPC system.

Software Availability

If a particular package is not available, several options are available. If it is sufficiently widely used, Research Computing staff will install it as a new module. If we determine that it is too specialized, you can install it yourself. Please use permanent storage such as your home directory to install software. If you have difficulty we can assist you to install the package.

Please see below for a full listing of available bioinformatics software. If you do not find it there, please check the bioconda package before requesting that we install it.

Reference Genomes

Research Computing makes some standard reference genomes available. For a listing and information about how to copy them, please see our HowTo.

Full List of Bioinformatics Software Modules

Below is a list of software installed as separate modules. Other packages that are based on Python are available in the bioconda environment. Please see the bioconda page for a listing of those packages.

Module Category Description
afni bio AFNI (Analysis of Functional NeuroImages) is a leading software suite of C, Python, R programs and shell scripts primarily developed for the analysis and display of anatomical and functional MRI (FMRI) data. It is freely available (both in source code and in precompiled binaries) for research purposes. The software is made to run on virtually an Unix system with X11 and Motif displays. Binary Packages are provided for MacOS and Linux systems including Fedora, Ubuntu (including Ubuntu under the Windows Subsytem for Linux)
alphafold bio Open source code for AlphaFold
alphapulldown bio AlphaPulldown is a Python package that streamlines protein-protein interaction screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer
angsd bio Program for analysing NGS data.
anvio bio Anvi'o is an open-source, community-driven analysis and visualization platform for microbial 'omics. It brings together many aspects of today's cutting-edge strategies including genomics, metagenomics, metatranscriptomics, pangenomics, metapangenomics, phylogenomics, and microbial population genetics in an integrated and easy-to-use fashion through extensive interactive visualization capabilities.
augustus bio AUGUSTUS is a program to find genes and their structures in one or more genomes.
bamtools bio BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
bart bio BART (Binding Analysis for Regulation of Transcription) is a bioinformatics tool for predicting functional transcription factors (TFs) that bind at genomic cis-regulatory regions to regulate gene expression in the human or mouse genomes, given a query gene set or a ChIP-seq dataset as input.
bart-mri bio The Berkeley Advanced Reconstruction Toolbox (BART) toolbox is a free and open-source image-reconstruction framework for Computational Magnetic Resonance Imaging developed by the research groups of Martin Uecker (Goettingen University), Jon Tamir (UT Austin), and Michael Lustig (UC Berkeley). It consists of a programming library and a toolbox of command-line programs. The library provides common operations on multi-dimensional arrays, Fourier and wavelet transforms, as well as generic implementations of iterative optimization algorithms. The command-line tools provide direct access to basic operations on multi-dimensional arrays as well as efficient implementations of many calibration and reconstruction algorithms for parallel imaging and compressed sen.
bbmap bio BBMap includes a short read aligner, and other bioinformatic tools.
bcftools bio SAMtools is a suite of programs for interacting with high-throughput sequencing data. BCFtools - Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
bcl2fastq2 bio bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.
beagle bio Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
bedops bio BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
bedtools bio The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.
bicseq2-norm bio BICseq2 is an algorithm developed for the normalization of high-throughput sequencing (HTS) data and detect copy number variations (CNV) in the genome. BICseq2 can be used for detecting CNVs with or without a control genome. BICseq2-norm is for normalizing potential biases in the sequencing data.
bicseq2-seg bio BICseq2 is an algorithm developed for the normalization of high-throughput sequencing (HTS) data and detect copy number variations (CNV) in the genome. BICseq2 can be used for detecting CNVs with or without a control genome. BICseq2-seg is for detecting CNVs based on the normalized data given by BICseq2-norm.
bioawk bio Bioawk is an extension to Brian Kernighan's awk, adding the support of several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and an command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.
bioconda bio Bioconda is a channel for the conda package manager specializing in bioinformatics software.
bioperl bio Bioperl is the product of a community effort to produce Perl code which is useful in biology. Examples include Sequence objects, Alignment objects and database searching objects.
biopython bio Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
bismark bio A tool to map bisulfite converted sequence reads and determine cytosine methylation states
blasr bio Variation graphs provide a succinct encoding of the sequences of many genomes.
blast bio Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
blat bio BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
bowtie2 bio Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
bracken bio Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
bsmap bio BSMAP is a short reads mapping program for bisulfite sequencing in DNA methylation study. Bisulfite treatment coupled with next generation sequencing could estimate the methylation ratio of every single Cytosine location in the genome by mapping high throughput bisulfite reads to the reference sequences.
bwa bio Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
canu bio Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing
caviar bio caviar is a statistical framework that quantifies the probability of each variant to be causal while allowing with arbitrary number of causal variants.
cd-hit bio CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
cellassign bio cellassign automatically assigns single-cell RNA-seq data to known cell types across thousands of cells accounting for patient and batch specific effects. Information about a priori known markers cell types is provided as input to the model in the form of a (binary) marker gene by cell-type matrix. cellassign then probabilistically assigns each cell to a cell type, removing subjective biases from typical unsupervised clustering workflows.
cellpose bio a generalist algorithm for cellular segmentation
cellprofiler bio CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
cellranger bio A set of analysis piplines that perform sample demultiplexing, barcode processing, and single cell 3' gene counting.
cellranger-arc bio Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility and their linkage.
cellranger-atac bio Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
cellranger-dna bio Cell Ranger DNA is a set of analysis pipelines that process Chromium single cell DNA sequencing output to align reads, identify copy number variation (CNV), and compare heterogeneity among cells.
circos bio Circos is a software package for visualizing data and information. It visualizes data in a circular layout - this makes Circos ideal for exploring relationships between objects or positions.
clara-parabricks bio NVIDIA Parabricks is the only GPU-accelerated computational genomics toolkit that delivers fast and accurate analysis for sequencing centers, clinical teams, genomics researchers, and next-generation sequencing instrument developers.
clearcut bio Clearcut is the reference implementation for the Relaxed Neighbor Joining (RNJ) algorithm by J. Evans, L. Sheneman, and J. Foster from the Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho.
cnnpeaks bio CNN-peaks is a Convolution Neural Network(CNN) based ChIP-Seq peak calling software.
cp-analyst bio CellProfiler Analyst (CPA) allows interactive exploration and analysis of data, particularly from high-throughput, image-based experiments. Included is a supervised machine learning system which can be trained to recognize complicated and subtle phenotypes, for automatic scoring of millions of cells. CellProfiler is an image processing package to generate morphometric measurements.
cufflinks bio Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
cumulus_feature_barcoding bio A fast C++ tool to extract feature-count matrix from sequence reads in FASTQ files. We uses isal-l for decompressing and Heng Li's kseq library for read parsing. It is used by Cumulus for feature-count matrix generation of cell hashing, nucleus hashing, CITE-Seq and Perturb-seq protocols, using either 10x Genomics V2 or V3 chemistry.
cutadapt bio Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
cytoscape bio Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of Apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.
danpos bio A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2.
dbg2olc bio A genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long erroneous 3GS sequencing reads and short accurate NGS sequencing reads.
decontaminer bio decontaMiner, a tool for detecting contaminating organisms in human unmapped sequences.
deeplabcut bio DeepLabCut is a toolbox for markerless pose estimation of animals performing various tasks.
deeptools bio deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and for functional annotations of the genome.
diamond bio DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup of BLAST ranging up to x20,000.
eigensoft bio The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.
emboss bio EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.
evm bio The EVidenceModeler (aka EVM) software combines ab intio gene predictions and protein and transcript alignments into weighted consensus gene structures. EVM provides a flexible and intuitive framework for combining diverse evidence types into a single automated gene structure annotation system.
exonerate bio Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using a many alignment models, using either exhaustive dynamic programming, or a variety of heuristics.
fasta bio The FASTA programs find regions of local or global (new) similarity between protein or DNA sequences, either by searching Protein or DNA databases, or by identifying local duplications within a sequence.
fastenloc bio fastENLOC: fast enrichment estimation aided colocalization analysis enables integrative genetic association analysis of molecular QTL data and GWAS data.
fastqc bio FastQC is a Java application which takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report.
fastx-toolkit bio The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
finestructure bio fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using dense sequencing data.
fmriprep bio fMRIPrep is a NiPreps (NeuroImaging PREProcessing toolS) application (www.nipreps.org) for the preprocessing of task-based and resting-state functional MRI (fMRI).
freebayes bio FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
freesurfer bio FreeSurfer is a set of tools for analysis and visualization of structural and functional brain imaging data. FreeSurfer contains a fully automatic structural imaging stream for processing cross sectional and longitudinal data.
fsa bio FSA:Fast Statistical Alignment, is a probabilistic multiple sequence alignment algorithm which uses a distance-based approach to aligning homologous protein, RNA or DNA sequences.
fsl bio FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data.
gatk bio The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
gd bio GD.pm - Interface to Gd Graphics Library
gemma bio Genome-wide Efficient Mixed Model Association
genometools bio The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules.
genrich bio Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment.
gffcompare bio The program gffcompare can be used to compare, merge, annotate, and estimate accuracy of one or more GFF files (the 'query' files), when compared with a reference annotation (also provided as GFF).
gmap-gsnap bio GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences GSNAP: Genomic Short-read Nucleotide Alignment Program
gpunufft bio GPU Regridding of arbitrary 3-D/2-D MRI data
gsea bio Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
hic-pro bio HiC-Pro is an optimized and flexible pipeline for Hi-C data processing.
hisat2 bio HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).
hmmer bio HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.
htslib bio A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
igvtools bio This package contains command line utilities for preprocessing, computing feature count density (coverage), sorting, and indexing data files. See also http://www.broadinstitute.org/software/igv/igvtools_commandline.
impute2 bio IMPUTE version 2 (also known as IMPUTE2) is a genotype imputation and haplotype phasing program based on ideas from Howie et al. 2009
io_lib bio Io_lib is a library of file reading and writing code to provide a general purpose trace file (and Experiment File) reading interface. The programmer simply calls the (eg) read_reading to create a "Read" C structure with the data loaded into memory. It has been compiled and tested on a variety of unix systems, MacOS X and MS Windows.
iqtree bio Efficient phylogenomic software by maximum likelihood
irfinder bio IRFinder is a tool for detecting intron retention from RNA-Seq experiments.
isoseqenv bio IsoDeq3 is a Scalable De Novo Isoform Discovery
jcuda bio Java bindings for NVIDIA CUDA and related libraries.
jellyfish bio Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
juicebox bio Juicer is a one-click pipeline for processing terabase scale Hi-C datasets.
kallisto bio Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
kent-tools bio A set of genome utilities developed at the University of California Santa Cruz.
kraken2 bio Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
libgtextutils bio ligtextutils is a dependency of fastx-toolkit and is provided via the same upstream
locuszoom bio LocusZoom Standalone is for the command line (standalone) version of LocusZoom, an application for creating regional plots from genome-wide association studies built in Python and R.
longranger bio Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants.
macs2 bio With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq) is getting popular to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis method, we presented the Model-based Analysis of ChIP-Seq (MACS), for identifying transcript factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions and MACS improves the spatial resolution of binding sites through combining the information of both sequencing tag position and orientation.
maestro bio MAESTRO(Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suit built using snakemake. MAESTRO combines several dozen tools and packages to create an integrative pipeline, which enables scRNA-seq and scATAC-seq analysis from raw sequencing data (fastq files) all the way through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, celltype annotation and transcription regulation analysis.
mafft bio MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
manta bio Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.
marge bio MARGE is a robust methodology that leverages a comprehensive library of genome-wide H3K27ac ChIP-seq profiles to predict key regulated genes and cis-regulatory regions in human or mouse.
maxquant bio MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Several labeling techniques as well as label-free quantification are supported.
meme bio The MEME Suite allows you to: * discover motifs using MEME, DREME (DNA only) or GLAM2 on groups of related DNA or protein sequences, * search sequence databases with motifs using MAST, FIMO, MCAST or GLAM2SCAN, * compare a motif to all motifs in a database of motifs, * associate motifs with Gene Ontology terms via their putative target genes, and * analyse motif enrichment using SpaMo or CentriMo.
metamorpheus bio MetaMorpheus is a bottom-up proteomics database search software with integrated post-translational modification (PTM) discovery capability. This program combines features of Morpheus and G-PTM-D in a single tool.
mirdeep2 bio miRDeep2 discovers active known or novel miRNAs from deep sequencing data (Solexa/Illumina, 454, ...).
mothur bio Mothur is a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
mrtrix3 bio MRtrix3 provides a set of tools to perform various types of diffusion MRI analyses, from various forms of tractography through to next-generation group-level analyses. It is designed with consistency, performance, and stability in mind, and is freely available under an open-source license. It is developed and maintained by a team of experts in the field, fostering an active community of users from diverse backgrounds.
mrtrix3tissue bio MRtrix3Tissue is a fork of the MRtrix3 project. It aims to add capabilities for 3-Tissue CSD modelling and analysis to a complete version of the MRtrix3 software.
multiqc bio MultiQC searches a given directory for analysis logs and compiles a HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
mummer bio MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.
muscle bio MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes-only a handful of command-line options are needed to perform common alignment tasks.
mutect bio MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
mutsigcv bio MutSig stands for "Mutation Significance". MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
nanopolish bio Software package for signal-level analysis of Oxford Nanopore sequencing data.
ncbi-vdb bio The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
neuron bio Empirically-based simulations of neurons and networks of neurons.
ngs bio NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.
ngsf bio ngsF is a program to estimate per-individual inbreeding coefficients under a probabilistic framework that takes the uncertainty of genotype's assignation into account. It avoids calling genotypes by using genotype likelihoods or posterior probabilities.
nibabies bio NiBabies is an open-source software pipeline designed to process anatomical and functional magnetic resonance imaging data. A member of the NeuroImaging PREProcessing toolS (NiPreps) family, NiBabies is designed and optimized for human infants between 0-2 years old.
nseg bio Nseg is used to identify low complexity sequencesi.
openms bio OpenMS is an open-source software C++ library for LC-MS data management and analyses. It offers an infrastructure for rapid development of mass spectrometry related software.
paintor bio PAINTOR is a statistical fine-mapping method that integrates functional genomic data with association strength from potentially multiple populations (or traits) to prioritize variants for follow-up analysis.
pasapipeline bio PASA, acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.
pbwt bio The pbwt package provides a core implementation and development environment for PBWT (Positional Burrows-Wheeler Transform) methods for storing and computing on genome variation data sets.
peakseq bio PeakSeq is a program for identifying and ranking peak regions in ChIP-Seq experiments. It takes as input, mapped reads from a ChIP-Seq experiment, mapped reads from a control experiment and outputs a file with peak regions ranked with increasing Q-values.
peer bio PEER is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods.
picard bio A set of tools (in Java) for working with next generation sequencing data in the BAM format.
plink bio PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
proteowiz bio ProteoWizard provides a set of open-source, cross-platform software libraries and tools (e.g. msconvert, Skyline, IDPicker, SeeMS) that facilitate proteomics data analysis. The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations.
psipred bio The PSIPRED Workbench provides a range of protein structure prediction methods.
psmc bio PSMC infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model.
qtltools bio QTLtools is a tool set for molecular QTL discovery and analysis. It allows to go from the raw sequence data to collection of molecular Quantitative Trait Loci (QTLs) in few easy-to-perform steps.
qualimap bio Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Inteface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
rasqual bio RASQUAL (Robust Allele Specific QUAntification and quality controL) maps QTLs for sequenced based cellular traits by combining population and allele-specific signals.
raxml bio RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.
rdp-classifier bio The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provides taxonomic assignments from domain to genus, with confidence estimates for each assignment.
regtools bio RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.
relion bio RELION (for REgularised LIkelihood OptimisatioN, pronounce rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
rip-md bio RIP-MD allows to apply Residue Interaction Networks (RINs) to the analysis of molecular dynamics simulations of protein.
rmats-turbo bio rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net). The major difference between rMATS turbo and rMATS is speed and space usage. rMATS turbo is 100 times faster and the output file is 1000 times smaller than rMATS. These advantages make analysis and storage of a large scale dataset easy and convenient.
rosetta bio The Rosetta software suite includes algorithms for computational modeling and analysis of protein structures. It has enabled notable scientific advances in computational biology, including de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes.
rsem bio RNA-Seq by Expectation-Maximization
saint bio Significance Analysis of INTeractome (SAINT) consists of a series of software tools for assigning confidence scores to protein-protein interactions based on quantitative proteomics data in AP-MS experiments.
saintexpress bio Significance Analysis of INTeractome (SAINT) consists of a series of software tools for assigning confidence scores to protein-protein interactions based on quantitative proteomics data in AP-MS experiments.
salmon bio Salmon is a wicked-fast program to produce a highly-accurate, transcript-level quantification estimates from RNA-seq data.
sambamba bio Sambamba is a tool for processing BAM files.
samtools bio SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
seacr bio SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by zeroes (i.e. regions with no read coverage).
seqoutbias bio Molecular biology enzymes have nucleic acid preferences for their substrates; the preference of an enzyme is typically dictated by the sequence at or near the active site of the enzyme. This bias may result in spurious read count patterns when used to interpret high-resolution molecular genomics data. The seqOutBias program aims to correct this issue by scaling the aligned read counts by the ratio of genome-wide observed read counts to the expected sequence based counts for each k-mer.
sga bio SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.
shapeit4 bio SHAPEIT4 is a fast and accurate method for estimation of haplotypes (aka phasing) for SNP array and high coverage sequencing data. The version 4 is a refactored and improved version of the SHAPEIT algorithm.
slim bio SLiM is an evolutionary simulation package that provides facilities for very easily and quickly constructing genetically explicit individual-based evolutionary models.
smrtlink bio PacBio’s open-source SMRT Analysis software suite is designed for use with Single Molecule, Real-Time (SMRT) Sequencing data. You can analyze, visualize, and manage your data through an intuitive GUI or command-line interface. You can also integrate SMRT Analysis in your existing data workflow through the extensive set of APIs provided
sortmerna bio SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads.
spaceranger bio A set of analysis piplines that perform sample demultiplexing, barcode processing, and single cell 3' gene counting.
spades bio SPAdes - St. Petersburg genome assembler - is an assembly toolkit containing various assembly pipelines.
sparc bio Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads
sparseassembler bio A sparse graph approach to de novo genome assembly
sratoolkit bio The SRA Toolkit, and the source-code SRA System Development Kit (SDK), will allow you to programmatically access data housed within SRA and convert it from the SRA format
stacks bio Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
star bio STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
stringtie bio StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
thermorawfileparser bio Wrapper around the .net (C#) ThermoFisher ThermoRawFileReader library for running on Linux with mono (works on Windows too).
tophat bio TopHat is a fast splice junction mapper for RNA-Seq reads.
torus bio TORUS - QTL Discovery utilizing Genomic Annotations is a free software package that implements a computational procedure for discovering molecular QTLs incorporating genomic annotations.
trf bio Tandem Repeats Finder: a program to analyze DNA sequences.
trimgalore bio Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
trimmomatic bio Trimmomatic performs a variety of useful trimming tasks for illumina paired-end and single ended data.
trinity bio Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads.
varscan bio VarScan - Variant calling and somatic mutation/CNV detection for next-generation sequencing data
vcell bio VCell (Virtual Cell) is a comprehensive platform for modeling cell biological systems that is built on a central database and disseminated as a web application.
vcftools bio The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
vg bio Variation graphs provide a succinct encoding of the sequences of many genomes.
viennarna bio The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
vsearch bio VSEARCH which supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
wasp bio WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs.
wigtobigwig bio The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph. BigWig files are created from wiggle (wig) type files using the program wigToBigWig.