afni |
bio |
AFNI (Analysis of Functional NeuroImages) is a leading software suite of C, Python, and R programs and shell scripts primarily developed for the analysis and display of anatomical and functional MRI (FMRI) data. It is freely available (both as source code and as precompiled binaries) for research purposes. The software is made to run on virtually any Unix system with X11 and Motif displays. Binary packages are provided for macOS and Linux systems, including Fedora and Ubuntu (including Ubuntu under the Windows Subsystem for Linux). |
alphafold |
bio |
Open source code for AlphaFold
|
alphapulldown |
bio |
AlphaPulldown is a Python package that streamlines protein-protein interaction screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer |
angsd |
bio |
Program for analysing NGS data. |
anvio |
bio |
Anvi'o is an open-source, community-driven analysis and visualization platform for microbial 'omics. It brings together many aspects of today's cutting-edge strategies including genomics, metagenomics, metatranscriptomics, pangenomics, metapangenomics, phylogenomics, and microbial population genetics in an integrated and easy-to-use fashion through extensive interactive visualization capabilities.
|
augustus |
bio |
AUGUSTUS is a program to find genes and their structures in one or more genomes. |
bamtools |
bio |
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files. |
bart |
bio |
BART (Binding Analysis for Regulation of Transcription) is a bioinformatics tool for predicting functional transcription factors (TFs) that bind at genomic cis-regulatory regions to regulate gene expression in the human or mouse genomes, given a query gene set or a ChIP-seq dataset as input. |
bart-mri |
bio |
The Berkeley Advanced Reconstruction Toolbox (BART) is a free and open-source image-reconstruction framework for Computational Magnetic Resonance Imaging developed by the research groups of Martin Uecker (Goettingen University), Jon Tamir (UT Austin), and Michael Lustig (UC Berkeley). It consists of a programming library and a toolbox of command-line programs. The library provides common operations on multi-dimensional arrays, Fourier and wavelet transforms, as well as generic implementations of iterative optimization algorithms. The command-line tools provide direct access to basic operations on multi-dimensional arrays as well as efficient implementations of many calibration and reconstruction algorithms for parallel imaging and compressed sensing.
|
bbmap |
bio |
BBMap includes a short read aligner, and other bioinformatic tools. |
bcftools |
bio |
BCFtools is part of the SAMtools suite of programs for interacting with high-throughput sequencing data.
It reads and writes BCF2/VCF/gVCF files and calls, filters, and summarises SNP and short indel sequence
variants. |
bcl2fastq2 |
bio |
bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis. |
beagle |
bio |
Beagle is a software package for phasing genotypes and for imputing ungenotyped markers. |
bedops |
bio |
BEDOPS is an open-source command-line toolkit that performs highly efficient and
scalable Boolean and other set operations, statistical calculations, archiving, conversion and
other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for
distributing whole-genome analyses across a computational cluster. |
bedtools |
bio |
The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps
and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF,
and SAM/BAM. |
bicseq2-norm |
bio |
BICseq2 is an algorithm developed for the normalization of high-throughput sequencing (HTS) data and the detection of copy number variations (CNVs) in the genome. BICseq2 can be used for detecting CNVs with or without a control genome. BICseq2-norm normalizes potential biases in the sequencing data. |
bicseq2-seg |
bio |
BICseq2 is an algorithm developed for the normalization of high-throughput sequencing (HTS) data and the detection of copy number variations (CNVs) in the genome. BICseq2 can be used for detecting CNVs with or without a control genome. BICseq2-seg detects CNVs based on the normalized data produced by BICseq2-norm. |
bioawk |
bio |
Bioawk is an extension to Brian Kernighan's awk, adding support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk. |
bioconda |
bio |
Bioconda is a channel for the conda package manager specializing in bioinformatics software. |
bioperl |
bio |
Bioperl is the product of a community effort to produce Perl code which is useful in biology.
Examples include Sequence objects, Alignment objects and database searching objects. |
biopython |
bio |
Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in
bioinformatics. |
bismark |
bio |
A tool to map bisulfite converted sequence reads and
determine cytosine methylation states |
blasr |
bio |
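BLASR (Basic Local Alignment with Successive Refinement) is a long-read aligner developed by Pacific Biosciences for mapping single-molecule sequencing reads to a reference genome. |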
blast |
bio |
Basic Local Alignment Search Tool, or BLAST, is an algorithm
for comparing primary biological sequence information, such as the amino-acid
sequences of different proteins or the nucleotides of DNA sequences. |
blat |
bio |
BLAT on DNA is designed to quickly find sequences of 95% and
greater similarity of length 25 bases or more. |
bowtie2 |
bio |
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads
to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s
of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome,
its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes. |
bracken |
bio |
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. |
bsmap |
bio |
BSMAP is a short-read mapping program for bisulfite sequencing in DNA methylation studies. Bisulfite treatment coupled with next-generation sequencing can estimate the methylation ratio of every single cytosine location in the genome by mapping high-throughput bisulfite reads to the reference sequences. |
bwa |
bio |
Burrows-Wheeler Aligner (BWA) is an efficient program that aligns
relatively short nucleotide sequences against a long reference sequence such as the human genome. |
canu |
bio |
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing |
caviar |
bio |
CAVIAR is a statistical framework that quantifies the probability of each variant being causal while allowing for an arbitrary number of causal variants. |
cd-hit |
bio |
CD-HIT is a very widely used program for clustering and
comparing protein or nucleotide sequences. |
cellassign |
bio |
cellassign automatically assigns single-cell RNA-seq data to known cell types across thousands of cells, accounting for patient- and batch-specific effects. Information about a priori known cell-type markers is provided as input to the model in the form of a (binary) marker gene by cell-type matrix. cellassign then probabilistically assigns each cell to a cell type, removing subjective biases from typical unsupervised clustering workflows.
|
cellpose |
bio |
Cellpose is a generalist algorithm for cellular segmentation. |
cellprofiler |
bio |
CellProfiler is free, open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
|
cellranger |
bio |
A set of analysis pipelines that perform sample demultiplexing, barcode processing, and single cell 3' gene counting. |
cellranger-arc |
bio |
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility and their linkage. |
cellranger-atac |
bio |
Cell Ranger ATAC is a set of analysis pipelines that process
Chromium Single Cell ATAC data. |
cellranger-dna |
bio |
Cell Ranger DNA is a set of analysis pipelines that process Chromium single cell DNA sequencing output to align reads, identify copy number variation (CNV), and compare heterogeneity among cells. |
circos |
bio |
Circos is a software package for visualizing data and information.
It visualizes data in a circular layout - this makes Circos ideal for exploring
relationships between objects or positions. |
clara-parabricks |
bio |
NVIDIA Parabricks is a GPU-accelerated computational genomics toolkit that delivers fast and accurate analysis for sequencing centers, clinical teams, genomics researchers, and next-generation sequencing instrument developers.
|
clearcut |
bio |
Clearcut is the reference implementation for the Relaxed Neighbor Joining (RNJ) algorithm by J. Evans, L. Sheneman, and J. Foster from the Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho. |
cnnpeaks |
bio |
CNN-peaks is a convolutional neural network (CNN)-based ChIP-Seq peak-calling software.
|
cp-analyst |
bio |
CellProfiler Analyst (CPA) allows interactive exploration and analysis of data, particularly from high-throughput, image-based experiments. Included is a supervised machine learning system which can be trained to recognize complicated and subtle phenotypes, for automatic scoring of millions of cells. CellProfiler is an image processing package to generate morphometric measurements. |
cufflinks |
bio |
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. |
cumulus_feature_barcoding |
bio |
A fast C++ tool to extract feature-count matrices from sequence reads in FASTQ files. It uses isa-l for decompression and Heng Li's kseq library for read parsing. It is used by Cumulus for feature-count matrix generation for cell hashing, nucleus hashing, CITE-Seq and Perturb-seq protocols, using either 10x Genomics V2 or V3 chemistry.
|
cutadapt |
bio |
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. |
cytoscape |
bio |
Cytoscape is an open source software platform for visualizing
complex networks and integrating these with any type of attribute data.
A lot of Apps are available for various kinds of problem domains,
including bioinformatics, social network analysis, and semantic web. |
danpos |
bio |
A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2. |
dbg2olc |
bio |
A genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long erroneous 3GS sequencing reads and short accurate NGS sequencing reads. |
decontaminer |
bio |
decontaMiner is a tool for detecting contaminating organisms in human unmapped sequences. |
deeplabcut |
bio |
DeepLabCut is a toolbox for markerless pose estimation of animals performing various tasks.
|
deeptools |
bio |
deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and for functional annotations of the genome. |
diamond |
bio |
DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup over BLAST of up to 20,000x. |
eigensoft |
bio |
The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al.
2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal
components analysis to explicitly model ancestry differences between cases and controls along continuous axes of
variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral
populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT
package has a built-in plotting script and supports multiple file formats and quantitative phenotypes. |
emboss |
bio |
EMBOSS is 'The European Molecular Biology Open Software Suite'.
EMBOSS is a free Open Source software analysis package specially developed for
the needs of the molecular biology (e.g. EMBnet) user community. |
evm |
bio |
The EVidenceModeler (aka EVM) software combines ab initio gene predictions and protein and transcript alignments into weighted consensus gene structures. EVM provides a flexible and intuitive framework for combining diverse evidence types into a single automated gene structure annotation system. |
exonerate |
bio |
Exonerate is a generic tool for pairwise sequence comparison.
It allows you to align sequences using many alignment models, using either
exhaustive dynamic programming, or a variety of heuristics. |
fasta |
bio |
The FASTA programs find regions of local or global (new) similarity between
protein or DNA sequences, either by searching Protein or DNA databases, or by identifying
local duplications within a sequence. |
fastenloc |
bio |
fastENLOC: fast enrichment estimation aided colocalization analysis enables integrative genetic association analysis of molecular QTL data and GWAS data. |
fastqc |
bio |
FastQC is a Java application which takes a FastQ file and runs a series
of tests on it to generate a comprehensive QC report. |
fastx-toolkit |
bio |
The FASTX-Toolkit is a collection of command line tools for
Short-Reads FASTA/FASTQ files preprocessing. |
finestructure |
bio |
fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using
dense sequencing data. |
fmriprep |
bio |
fMRIPrep is a NiPreps (NeuroImaging PREProcessing toolS) application (www.nipreps.org) for the preprocessing of task-based and resting-state functional MRI (fMRI). |
freebayes |
bio |
FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. |
freesurfer |
bio |
FreeSurfer is a set of tools for analysis and visualization
of structural and functional brain imaging data. FreeSurfer contains a fully
automatic structural imaging stream for processing cross sectional and
longitudinal data. |
fsa |
bio |
FSA (Fast Statistical Alignment) is a probabilistic multiple sequence alignment algorithm which uses a distance-based approach to aligning homologous protein, RNA or DNA sequences. |
fsl |
bio |
FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data. |
gatk |
bio |
The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute
to analyse next-generation resequencing data. The toolkit offers a wide variety of tools,
with a primary focus on variant discovery and genotyping as well as strong emphasis on
data quality assurance. Its robust architecture, powerful processing engine and
high-performance computing features make it capable of taking on projects of any size. |
gd |
bio |
GD.pm - Interface to Gd Graphics Library |
gemma |
bio |
Genome-wide Efficient Mixed Model Association |
genometools |
bio |
The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules. |
genrich |
bio |
Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment. |
gffcompare |
bio |
The program gffcompare can be used to compare, merge, annotate, and estimate accuracy of one or more GFF files (the 'query' files), when compared with a reference annotation (also provided as GFF). |
gmap-gsnap |
bio |
GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences
GSNAP: Genomic Short-read Nucleotide Alignment Program |
gpunufft |
bio |
GPU Regridding of arbitrary 3-D/2-D MRI data |
gsea |
bio |
Gene Set Enrichment Analysis (GSEA) is a computational method that
determines whether an a priori defined set of genes shows statistically
significant, concordant differences between two biological states
(e.g. phenotypes). |
hic-pro |
bio |
HiC-Pro is an optimized and flexible pipeline for Hi-C data processing. |
hisat2 |
bio |
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads
(both DNA and RNA) against the general human population (as well as against a single reference genome). |
hmmer |
bio |
HMMER is used for searching sequence databases for homologs
of protein sequences, and for making protein sequence alignments. It
implements methods using probabilistic models called profile hidden Markov
models (profile HMMs). Compared to BLAST, FASTA, and other sequence
alignment and database search tools based on older scoring methodology,
HMMER aims to be significantly more accurate and more able to detect remote
homologs because of the strength of its underlying mathematical models. In the
past, this strength came at significant computational expense, but in the new
HMMER3 project, HMMER is now essentially as fast as BLAST. |
htslib |
bio |
A C library for reading/writing high-throughput sequencing data.
This package includes the utilities bgzip and tabix |
igvtools |
bio |
This package contains command line utilities for preprocessing,
computing feature count density (coverage), sorting, and indexing data files.
See also http://www.broadinstitute.org/software/igv/igvtools_commandline. |
impute2 |
bio |
IMPUTE version 2 (also known as IMPUTE2) is a genotype imputation
and haplotype phasing program based on ideas from Howie et al. 2009 |
io_lib |
bio |
Io_lib is a library of file reading and writing code that provides a general-purpose trace file (and Experiment File) reading interface. The programmer simply calls a function such as read_reading to create a "Read" C structure with the data loaded into memory. It has been compiled and tested on a variety of Unix systems, MacOS X and MS Windows. |
iqtree |
bio |
Efficient phylogenomic software by maximum likelihood |
irfinder |
bio |
IRFinder is a tool for detecting intron retention from RNA-Seq experiments. |
isoseqenv |
bio |
IsoSeq3 is a tool for scalable de novo isoform discovery. |
jcuda |
bio |
Java bindings for NVIDIA CUDA and related libraries. |
jellyfish |
bio |
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. |
juicebox |
bio |
Juicer is a one-click pipeline for processing terabase scale Hi-C datasets. |
kallisto |
bio |
Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. |
kent-tools |
bio |
A set of genome utilities developed at the University of California Santa Cruz. |
kraken2 |
bio |
Kraken is a system for assigning taxonomic labels to short DNA sequences,
usually obtained through metagenomic studies. Previous attempts by other
bioinformatics software to accomplish this task have often used sequence
alignment or machine learning techniques that were quite slow, leading to
the development of less sensitive but much faster abundance estimation
programs. Kraken aims to achieve high sensitivity and high speed by
utilizing exact alignments of k-mers and a novel classification algorithm. |
libgtextutils |
bio |
libgtextutils is a dependency of fastx-toolkit and is provided by the same upstream. |
locuszoom |
bio |
LocusZoom Standalone is the command-line version of LocusZoom, an application built in Python and R for creating regional plots from genome-wide association studies. |
longranger |
bio |
Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants. |
macs2 |
bio |
With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing
(ChIP-Seq) is becoming a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq
analysis methods, we present Model-based Analysis of ChIP-Seq (MACS) for identifying transcription factor binding
sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it
improves the spatial resolution of binding sites by combining the information of both sequencing tag position and
orientation.
|
maestro |
bio |
MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built using Snakemake. MAESTRO combines several dozen tools and packages to create an integrative pipeline, which enables scRNA-seq and scATAC-seq analysis from raw sequencing data (fastq files) all the way through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, cell-type annotation and transcription regulation analysis.
|
mafft |
bio |
MAFFT is a multiple sequence alignment program for unix-like operating systems.
It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment
of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc. |
manta |
bio |
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow. |
marge |
bio |
MARGE is a robust methodology that leverages a comprehensive library of genome-wide H3K27ac ChIP-seq profiles to predict key regulated genes and cis-regulatory regions in human or mouse. |
maxquant |
bio |
MaxQuant is a quantitative proteomics software package designed for analyzing large
mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Several labeling
techniques as well as label-free quantification are supported. |
meme |
bio |
The MEME Suite allows you to: * discover motifs using MEME, DREME (DNA only) or
GLAM2 on groups of related DNA or protein sequences, * search sequence databases with motifs using
MAST, FIMO, MCAST or GLAM2SCAN, * compare a motif to all motifs in a database of motifs, * associate
motifs with Gene Ontology terms via their putative target genes, and * analyse motif enrichment
using SpaMo or CentriMo. |
metamorpheus |
bio |
MetaMorpheus is a bottom-up proteomics database search software with integrated post-translational modification (PTM) discovery capability. This program combines features of Morpheus and G-PTM-D in a single tool.
|
mirdeep2 |
bio |
miRDeep2 discovers active known or novel miRNAs from deep sequencing data (Solexa/Illumina, 454, ...). |
mothur |
bio |
Mothur is a single piece of open-source, expandable software
to fill the bioinformatics needs of the microbial ecology community. |
mrtrix3 |
bio |
MRtrix3 provides a set of tools to perform various types of diffusion MRI analyses, from various forms of tractography through to next-generation group-level analyses. It is designed with consistency, performance, and stability in mind, and is freely available under an open-source license. It is developed and maintained by a team of experts in the field, fostering an active community of users from diverse backgrounds. |
mrtrix3tissue |
bio |
MRtrix3Tissue is a fork of the MRtrix3 project. It aims to add capabilities for 3-Tissue CSD modelling and analysis to a complete version of the MRtrix3 software. |
multiqc |
bio |
MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools. |
mummer |
bio |
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it. |
muscle |
bio |
MUSCLE is one of the best-performing multiple alignment programs
according to published benchmark tests, with accuracy and speed that are consistently
better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users
learn everything they need to know about MUSCLE in a few minutes; only a handful of
command-line options are needed to perform common alignment tasks. |
mutect |
bio |
MuTect is a method developed at the Broad Institute for the reliable
and accurate identification of somatic point mutations in next generation sequencing
data of cancer genomes. |
mutsigcv |
bio |
MutSig stands for "Mutation Significance". MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes. |
nanopolish |
bio |
Software package for signal-level analysis of Oxford Nanopore sequencing data. |
ncbi-vdb |
bio |
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for
using data in the INSDC Sequence Read Archives. |
neuron |
bio |
Empirically-based simulations of neurons and networks of neurons. |
ngs |
bio |
NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from
Next Generation Sequencing. |
ngsf |
bio |
ngsF is a program to estimate per-individual inbreeding coefficients under a probabilistic framework that takes the uncertainty of genotype assignment into account. It avoids calling genotypes by using genotype likelihoods or posterior probabilities. |
nibabies |
bio |
NiBabies is an open-source software pipeline designed to process anatomical and functional magnetic resonance imaging data. A member of the NeuroImaging PREProcessing toolS (NiPreps) family, NiBabies is designed and optimized for human infants between 0-2 years old. |
nseg |
bio |
Nseg is used to identify low-complexity sequences. |
openms |
bio |
OpenMS is an open-source C++ software library for LC-MS data management and analyses. It offers an infrastructure for rapid development of mass spectrometry related software. |
paintor |
bio |
PAINTOR is a statistical fine-mapping method that integrates functional genomic data with association strength from potentially multiple populations (or traits) to prioritize variants for follow-up analysis. |
pasapipeline |
bio |
PASA, acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments. |
pbwt |
bio |
The pbwt package provides a core implementation and development environment for PBWT (Positional Burrows-Wheeler Transform) methods for storing and computing on genome variation data sets. |
peakseq |
bio |
PeakSeq is a program for identifying and ranking peak regions in ChIP-Seq
experiments. It takes as input mapped reads from a ChIP-Seq experiment and mapped reads from
a control experiment, and outputs a file with peak regions ranked by increasing Q-value. |
peer |
bio |
PEER is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods. |
picard |
bio |
A set of tools (in Java) for working with next generation sequencing data in the BAM format. |
plink |
bio |
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. |
proteowiz |
bio |
ProteoWizard provides a set of open-source, cross-platform software libraries and tools (e.g. msconvert, Skyline, IDPicker, SeeMS) that facilitate proteomics data analysis. The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations. |
psipred |
bio |
The PSIPRED Workbench provides a range of protein structure prediction methods. |
psmc |
bio |
PSMC infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model. |
qtltools |
bio |
QTLtools is a tool set for molecular QTL discovery and analysis.
It allows users to go from raw sequence data to a collection of molecular quantitative trait loci (QTLs)
in a few easy-to-perform steps. |
qualimap |
bio |
Qualimap 2 is a platform-independent application written in Java and R that provides both
a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of
alignment sequencing data and its derivatives like feature counts. |
rasqual |
bio |
RASQUAL (Robust Allele Specific QUAntification and quality controL) maps QTLs for sequence-based cellular traits by combining population and allele-specific signals. |
raxml |
bio |
RAxML implements a search algorithm for maximum-likelihood-based inference of phylogenetic trees. |
rdp-classifier |
bio |
The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provide taxonomic
assignments from domain to genus, with confidence estimates for each assignment. |
regtools |
bio |
RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a
regulatory and splicing context. |
relion |
bio |
RELION (for REgularised LIkelihood OptimisatioN, pronounced rely-on) is a stand-alone computer
program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class
averages in electron cryo-microscopy (cryo-EM). |
rip-md |
bio |
RIP-MD applies Residue Interaction Networks (RINs) to the analysis of molecular dynamics simulations of proteins. |
rmats-turbo |
bio |
rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net). The major differences between rMATS turbo and rMATS are speed and space usage: rMATS turbo is 100 times faster than rMATS, and its output file is 1000 times smaller. These advantages make analysis and storage of large-scale datasets easy and convenient.
|
rosetta |
bio |
The Rosetta software suite includes algorithms for computational modeling and analysis of protein structures.
It has enabled notable scientific advances in computational biology, including de novo protein design, enzyme
design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes. |
rsem |
bio |
RNA-Seq by Expectation-Maximization |
saint |
bio |
Significance Analysis of INTeractome (SAINT) consists of a series of software tools for assigning confidence scores to protein-protein interactions based on quantitative proteomics data in AP-MS experiments. |
saintexpress |
bio |
Significance Analysis of INTeractome (SAINT) consists of a series of software tools for assigning confidence scores to protein-protein interactions based on quantitative proteomics data in AP-MS experiments. |
salmon |
bio |
Salmon is a wicked-fast program that produces highly accurate,
transcript-level quantification estimates from RNA-seq data. |
sambamba |
bio |
Sambamba is a tool for processing BAM files. |
samtools |
bio |
SAM Tools provide various utilities for manipulating alignments in the SAM format,
including sorting, merging, indexing and generating alignments in a per-position format. |
seacr |
bio |
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by zeroes (i.e. regions with no read coverage). |
seqoutbias |
bio |
Molecular biology enzymes have nucleic acid preferences for their substrates; the preference of an
enzyme is typically dictated by the sequence at or near the active site of the enzyme. This bias may result
in spurious read count patterns when used to interpret high-resolution molecular genomics data. The
seqOutBias program aims to correct this issue by scaling the aligned read counts by the ratio of genome-wide
observed read counts to the expected sequence-based counts for each k-mer.
|
sga |
bio |
SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.
|
shapeit4 |
bio |
SHAPEIT4 is a fast and accurate method for estimation of haplotypes (aka phasing) for SNP array and high-coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm. |
slim |
bio |
SLiM is an evolutionary simulation package that provides facilities for very easily and quickly constructing genetically explicit individual-based evolutionary models. |
smrtlink |
bio |
PacBio’s open-source SMRT Analysis software suite is designed for use with Single Molecule,
Real-Time (SMRT) Sequencing data. You can analyze, visualize, and manage your data through an intuitive GUI
or command-line interface. You can also integrate SMRT Analysis in your existing data workflow through
the extensive set of APIs provided. |
sortmerna |
bio |
SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads. |
spaceranger |
bio |
A set of analysis pipelines that perform sample demultiplexing, barcode processing, and single cell 3' gene counting. |
spades |
bio |
SPAdes - St. Petersburg genome assembler - is an assembly toolkit containing various assembly pipelines. |
sparc |
bio |
Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads |
sparseassembler |
bio |
A sparse graph approach to de novo genome assembly |
sratoolkit |
bio |
The SRA Toolkit, and the source-code SRA System Development
Kit (SDK), will allow you to programmatically access data housed within SRA
and convert it from the SRA format |
stacks |
bio |
Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
|
star |
bio |
STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays. |
stringtie |
bio |
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. |
thermorawfileparser |
bio |
Wrapper around the .NET (C#) ThermoFisher ThermoRawFileReader library for running on Linux with Mono (works on Windows too). |
tophat |
bio |
TopHat is a fast splice junction mapper for RNA-Seq reads. |
torus |
bio |
TORUS - QTL Discovery utilizing Genomic Annotations is a free software package that implements a computational procedure for discovering molecular QTLs incorporating genomic annotations. |
trf |
bio |
Tandem Repeats Finder: a program to analyze DNA sequences. |
trimgalore |
bio |
Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data. |
trimmomatic |
bio |
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. |
trinity |
bio |
Trinity represents a novel method for the efficient and robust de novo reconstruction
of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm,
Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. |
varscan |
bio |
VarScan - Variant calling and somatic mutation/CNV detection for next-generation sequencing data |
vcell |
bio |
VCell (Virtual Cell) is a comprehensive platform for modeling cell biological systems that is built on a central database and disseminated as a web application. |
vcftools |
bio |
The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files. |
vg |
bio |
Variation graphs provide a succinct encoding of the sequences of many genomes. |
viennarna |
bio |
The Vienna RNA Package consists of a C code library and several
stand-alone programs for the prediction and comparison of RNA secondary structures. |
vsearch |
bio |
VSEARCH supports de novo and reference-based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads. |
wasp |
bio |
WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. |
wigtobigwig |
bio |
The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph.
BigWig files are created from wiggle (wig) type files using the program wigToBigWig.
|