Bioinformatics Software

Personal

Collection of tools for stuff I work with can be found in this repo.

Community Reference

A common collection of tools from community members around the globe for organization and accessibility.

Workflow Tools

| Program | Description | Source |
| --- | --- | --- |
| Artemis | A genome browser and annotation tool that allows visualization of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation. | Download |
| BamTools | C++ API & command-line toolkit for working with BAM (Binary SAM file) data. Provides a programmer’s API and an end-user’s toolkit for handling BAM files. | Clone |
| BaseMount | Explore runs, projects, samples, app results and analyses by interacting directly with BaseSpace’s API as a locally mounted file system. | Install |
| BaseSpace | The BaseSpace Sequence Hub is a cloud-based genomics analysis and storage platform that directly integrates with all Illumina sequencers. | N/A |
| BaseSpace CLI | Work with the BaseSpace Sequence Hub data using the command line interface (CLI). Supports scripting and programmatic access to BaseSpace Sequence Hub for automation, bulk operations, and other routine functions. It can be used independently or in conjunction with BaseMount. | Install |
| bcl2fastq | Demultiplexes data and converts base calls in the per-cycle BCL files generated by Illumina sequencing systems to standard FASTQ file formats in a single step for downstream analysis. | Download |
| BLAST+ | Command line application suite of BLAST tools that utilizes the NCBI C++ Toolkit. | Download |
| EDirect | An advanced method for accessing the NCBI’s set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal. | N/A |
| E-utilities | Entrez Programming Utilities (E-utilities) are a set of nine server-side programs that provide a stable interface into the Entrez query and database system at the NCBI. | N/A |
| FastQC | A quality control tool for high throughput sequence data. | Clone, Download |
| IGV | Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. The igvtools utility provides a set of tools for pre-processing data files. | Download |
| Martian | Martian is a language and framework for developing and executing complex computational pipelines. | Clone, Download |
| Nextflow | Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages. | Clone |
| Samtools | A suite of programs for interacting with high-throughput sequencing (HTS) data. It consists of three separate repositories. Samtools: reading/writing/editing/indexing/viewing SAM/BAM/CRAM format. BCFtools: reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants. HTSlib: a C library for reading/writing high-throughput sequencing data. | Download |
| Seqtk | Fast and lightweight tool for processing sequences in the FASTA or FASTQ format. | Clone |
| SRA Toolkit | The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. | Download |
| VCFtools | Package designed for working with complex genetic variation data in the form of VCF files. | Download |
| WebLogo | Create sequence logos, a graphical representation of an amino acid or nucleic acid multiple sequence alignment. | Clone, Visit |

Analysis

DNA

| Program | Description | Purpose | Source |
| --- | --- | --- | --- |
| AUGUSTUS | Ab initio, trainable gene prediction in eukaryotic genomic sequences. | Gene Prediction | Download |
| BUSCO | Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. | Assembly Quality Assessment | Download |
| Circlator | Predict and automate assembly circularization and produce accurate linear representations of circular sequences. | Circularize Genome | Download |
| Clustal | Fast and scalable multiple sequence alignment (can align hundreds of thousands of sequences in hours). | MSA | Download |
| Galaxy | Web portal for accessible, reproducible, and transparent computational research. | Analysis Package | Download |
| HOMER | HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for motif discovery and next-gen sequencing analysis. | Prediction and Analysis | Download |
| HMMER | Searches sequence databases for sequence homologs and makes sequence alignments, using profile hidden Markov models. | Detect Homologs | Download |
| HTSeq | HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. | Analysis Package | Clone, Download |
| Mauve | A system for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. | Genome Aligner | Download |
| Mothur | Expandable software to fill the bioinformatics needs of the microbial ecology community. | Microbial Ecology Pipeline | Download |
| MUMmer Package | Ultra-fast alignment of large-scale DNA and protein sequences. A system for rapidly aligning entire genomes, whether in complete or draft form. MUMmer is a suffix tree algorithm designed to find maximal exact matches of some minimum length between two input sequences. NUCmer is a standard DNA sequence aligner: a robust pipeline that allows multiple reference and multiple query sequences to be aligned in a many vs. many fashion. PROmer is like NUCmer with one exception: all matching and alignment routines are performed on the six-frame amino acid translation of the DNA input sequence. | Genome Aligner | Download |
| MUSCLE | MUSCLE can align hundreds of sequences in seconds. | MSA | Download |
| Picard | Set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. | HTS Toolkit | Download |
| QIIME | Bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication quality graphics and statistics. | Microbial Ecology Pipeline | Install |
| QUAST | Evaluates genome assemblies. | Evaluate Genome Assemblies | Download |
| T-Coffee | A multiple sequence alignment package that can align sequences (Protein, DNA, and RNA) or combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle…) into one unique alignment (M-Coffee). It is also able to combine sequence information with protein structural information (3D-Coffee/Expresso), profile information (PSI-Coffee) or RNA secondary structures. | MSA | Download |
| ViennaRNA Package | Programs for the prediction and comparison of RNA secondary structures. | Prediction | Download |

PacBio Sequencing

| Program | Description | Purpose | Source |
| --- | --- | --- | --- |
| BLASR | PacBio® long read aligner. | Sequence Aligner | Download |
| Canu | Fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). | Genome Assembly | Download |
| Celera Assembler | Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler, and can use any combination of platform reads. | Genome Assembly | Download |
| Cerulean | Cerulean uses long reads, such as PacBio RS long reads, to extend contigs assembled from short read datasets such as Illumina paired-end reads. | Hybrid Assembly | Download |
| PBSuite | PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants. | Reference Mapping, Variant Calling | Download |
| SMRT Analysis | Self-contained software suite designed for use with Single Molecule, Real-Time (SMRT) Sequencing data. | Analysis Package | Download |
| SPAdes | Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads; it can also provide hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. | Hybrid Assembly | Download |
| Sprai | Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. | Sequencing Error-correction | Download |

Illumina Sequencing

Referenced

| Program | Description | Purpose | Source |
| --- | --- | --- | --- |
| Bowtie2 | An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | Reference Aligner | Download |
| BWA | Mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | Reference Mapping | Download |
| HISAT2 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). | Whole-Genome Mapping | Clone |

De novo

| Program | Description | Purpose | Source |
| --- | --- | --- | --- |
| ABySS | De novo, parallel, paired-end sequence assembler designed for short reads and large genomes. | Genome Assembly | Download, Install |
| ALLPATHS-LG | Short read assembler that works on both small and large (mammalian size) genomes. | Genome Assembly | Download |
| DISCOVAR | Genome assembler and variant caller. | Genome Assembly | Download |
| SOAPdenovo | Novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. | Genome Assembly | Download |
| SPAdes | Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads; it can also provide hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. | Genome/Hybrid Assembly | Download |
| Velvet | Short read de novo assembler using de Bruijn graphs. | Genome Assembly | Download |

RNA-Seq

| Program | Description | Purpose | Source |
| --- | --- | --- | --- |
| Ballgown | A program for computing differentially expressed genes in two or more RNA-seq experiments, using the output of StringTie or Cufflinks. The Ballgown package provides functions to organize, visualize, and analyze expression measurements. | Transcriptome Assembly | Clone, Bioconductor |
| Cufflinks | Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. | Transcriptome Assembly | Clone |
| DESeq2 | The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models. | Differential Expression | Clone, Bioconductor |
| edgeR | Differential expression analysis of RNA-seq expression profiles with biological replication. It can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. | Differential Expression | Bioconductor |
| HISAT2 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). | Transcriptome Mapping | Clone |
| HTSeq | HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. | Analysis Package | Clone, Download |
| STAR | Ultrafast universal RNA-seq aligner. | RNA-seq Aligner | Clone |
| StringTie | StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. | Transcriptome Assembly | Clone |
| Trinity | Trinity assembles transcript sequences from Illumina RNA-Seq data. | Transcriptome Assembly | Download |

Single Cell

| Program | Description | Purpose | Source |
| --- | --- | --- | --- |
| Celda | Bayesian hierarchical modeling for clustering single cell RNA-seq data. | Differential Expression | Clone |
| cellTree | This package computes a Latent Dirichlet Allocation (LDA) model of single-cell RNA-seq data and builds a compact tree modelling the relationship between individual cells over time or space. | Visualization | Bioconductor |
| Chromium Single Cell Software Suite | Package for analyzing and visualizing single cell 3’ RNA-seq data produced by the 10x Chromium Platform. Cell Ranger (Pipelines) is a set of analysis pipeline tools that perform sample demultiplexing, barcode processing, and single cell 3’ gene counting. Loupe™ Cell Browser is an interactive desktop application that helps find significant genes, cell types, and substructure within your single cell data. Cell Ranger (R Kit) is an R package for secondary analysis of Cell Ranger matrix data, including PCA and t-SNE projection, and k-means clustering. | Analysis Package | Clone, Download |
| Pagoda | Framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells. | Pathway/Gene Set Analysis | Clone |
| SCDE | The SCDE package implements a set of statistical methods for analyzing single cell RNA-seq data, including differential expression analysis and pathway and gene set overdispersion analysis (PAGODA). | Differential Expression | Clone, Download |
| Seurat | R package designed for QC, analysis, and exploration of single cell RNA-seq data. | Differential Expression | Clone, Install |
| SPRING | SPRING is a kinetic interface tool for uncovering high-dimensional structure in single cell gene expression data. | Visualization | Clone, Visit |
| Monocle | An analysis toolkit for single cell RNA-seq that performs differential expression and time-series analysis for single cell expression experiments. | Differential Expression | Clone, Bioconductor |
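
The Artemis and PROmer entries above both mention a six-frame translation: the three reading frames of the forward strand plus the three of the reverse complement. As a rough illustration of the idea only (this is not code from either tool), here is a minimal Python sketch that assumes the standard genetic code and uppercase or lowercase A/C/G/T input. The function names and the `+1`..`-3` frame labels are hypothetical choices made for this example.

```python
# Illustrative sketch only; not taken from Artemis or MUMmer/PROmer.
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; a trailing partial codon is dropped.

    Codons containing anything other than A/C/G/T become 'X' (an assumption
    made for this sketch).
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons are rendered as `*`. Real tools also handle ambiguity codes and alternative genetic codes, which this sketch leaves out.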