Software Containers
[Deprecated] On Dec 18, 2023, Singularity was upgraded to Apptainer, a continuation of the Singularity project.
Overview
Singularity is a container application targeted at multi-user, high-performance computing systems. It interoperates well with Slurm and with the Lmod modules system. Singularity can create and run its own containers, or it can import Docker containers.
Creating Singularity Containers
To create your own image from scratch, you must have root privileges on a computer running Linux (any version). Follow the instructions at the Singularity site. If you have only a Mac or Windows machine, you can use the Vagrant environment. Vagrant is a pre-packaged system that runs under several virtual-machine environments, including the free VirtualBox environment.
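As a rough illustration of the Docker import mentioned in the overview, the sketch below assembles (but does not run) a `pull` command that converts a Docker image into a local image file. The helper name `build_pull_command`, the image reference, and the output file name are hypothetical and chosen only for illustration; the command shape assumes the usual `pull <output file> docker://<image>` form accepted by both `singularity` and `apptainer`.

```python
import shlex


def build_pull_command(docker_ref, output_file, runtime="apptainer"):
    """Return the argv list for pulling a Docker image into a local image file.

    runtime is "apptainer" or "singularity"; this sketch assumes both accept
    the form `pull <output file> docker://<image reference>`.
    """
    if runtime not in ("apptainer", "singularity"):
        raise ValueError(f"unknown runtime: {runtime}")
    return [runtime, "pull", output_file, f"docker://{docker_ref}"]


# Hypothetical image reference and output file, for illustration only.
cmd = build_pull_command("python:3.11-slim", "python_3.11.sif")

# Print the command as it would be typed in a shell; nothing is executed here.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` later, should you choose to execute it on a system where the runtime is installed.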
Software Containers
Overview
Containers bundle an application, the libraries and other executables it may need, and even the data used with the application into portable, self-contained files called images. Containers simplify installation and management of software with complex dependencies and can also be used to package workflows.
Please refer to the following pages for further information.
Singularity (before Dec 18, 2023)
Apptainer (after Dec 18, 2023)
Short course: Software Containers for HPC
Container Registries for UVA Research Computing
Images built by Research Computing are hosted on Docker Hub (and previously Singularity Library).
Singularity Library
Due to storage limits, we can no longer add Singularity images to Singularity Library.
Microservice Deployments
Kubernetes is a container orchestrator for both short-running jobs (such as workflow or pipeline stages) and long-running services (such as web and database servers). Containerized applications running in the UVARC Kubernetes cluster are visible to UVA Research networks (and therefore from Rivanna, Afton, Skyline, etc.). Web applications can be made visible to the UVA campus or the public Internet.
Kubernetes
Research Computing runs microservices in a Kubernetes cluster that automates the deployment of many containers, making their
management easy and scalable. This cluster will eventually consist of several dozen instances, >2000 cores and >2TB of memory allocated to
running containerized services. It will also have over 300TB of cluster storage and can attach to both project and
Container Services
Container-based architecture, also known as “microservices,” is an approach to designing and running applications as a distributed set of components or layers. Such applications are typically run within containers, made popular in the last few years by Docker. Containers are portable, efficient, and reusable, and they package code and its dependencies in a single unit. Containerized services typically run a single process, rather than an entire stack, within the same environment. This allows developers to replace, scale, or troubleshoot one portion of an application at a time.
General Availability (GA) of Kubernetes
Research Computing now manages microservice orchestration with Kubernetes, the open-source tool originally developed by Google.
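To make the "one process per container, scaled independently" idea more concrete, here is a minimal sketch that builds a Kubernetes Deployment manifest as a Python dictionary and prints it as JSON. The function name `single_process_deployment`, the service name `demo-web`, and the image `nginx:1.25` are illustrative assumptions, not anything deployed by Research Computing; only the standard Deployment fields (`apiVersion`, `kind`, `spec.replicas`, `spec.selector`, `spec.template`) are taken from Kubernetes itself.

```python
import json


def single_process_deployment(name, image, port, replicas=1):
    """Build a Deployment manifest for one containerized service.

    Each pod runs a single container (one component of the application),
    and `replicas` controls how many copies of that component run.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical service name and image, for illustration only.
manifest = single_process_deployment("demo-web", "nginx:1.25", 80, replicas=2)
print(json.dumps(manifest, indent=2))
```

Scaling this one component means changing `replicas` in its own manifest; other components of the application would have separate manifests and are left untouched.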
BART Web
BART (Binding Analysis for Regulation of Transcription) Web
Working with researchers in the Zang Lab in the Center for Public Health Genomics
(CPHG), RC helped launch BARTweb,
an interactive web-based tool for users to analyze their gene list or ChIP-seq datasets. BARTweb is a containerized
Flask front-end (written in Python) that ingests files and submits them to a more robust Python-based genomics pipeline
running on Rivanna, UVA’s high-performance computing (HPC) cluster. This architecture – of a public web application that
uses a supercomputer to process data – is a new model for UVA, and one that eases the learning curve for researchers who
LOLAweb
The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data resources and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, annotations from external data sources can be easily connected to new genomic data.
SOM Research Computing is working with faculty in the UVA Center for Public Health Genomics to implement LOLAweb, an online tool for performing genomic locus overlap annotations and analyses. This project, written in the statistical programming language R, allows users to specify region set data in BED format for automated enrichment analysis.
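The overlap test described above can be sketched in a few lines. The example below parses small BED-formatted strings and counts how many query regions overlap each region set in a toy database. It is not LOLAweb's implementation (which is written in R); the function names, region-set names, and coordinates are made up for illustration, and a real enrichment analysis would go on to compare these counts against a background set with a statistical test, which is omitted here.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def overlaps(a, b):
    """True if two half-open regions on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, region_set):
    """Count query regions that overlap at least one region in region_set."""
    return sum(any(overlaps(q, r) for r in region_set) for q in query)


# Made-up query regions and database region sets, for illustration only.
query = parse_bed("chr1\t100\t200\nchr1\t500\t600\nchr2\t50\t150")
database = {
    "enhancers": parse_bed("chr1\t150\t250\nchr2\t0\t60"),
    "promoters": parse_bed("chr1\t700\t800"),
}

counts = {name: count_overlapping(query, regions) for name, regions in database.items()}
print(counts)  # {'enhancers': 2, 'promoters': 0}
```

The pairwise scan is quadratic in the number of regions, which is fine for a sketch; tools working at genome scale typically sort regions or use interval trees instead.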
Refgenie: A Reference Genome Resource Manager
Reference genome assemblies are essential for high-throughput sequencing analysis projects. Typically, genome assemblies are stored on disk alongside related resources; e.g., many sequence aligners require the assembly to be indexed. The resulting indexes are broadly applicable for downstream analysis, so it makes sense to share them. However, there is no simple tool to do this.
Refgenie is a reference genome assembly asset manager. Refgenie makes it easier to organize, retrieve, and share genome analysis resources. In addition to genome indexes, refgenie can manage any files related to reference genomes, including sequences and annotation files. Refgenie includes a command line interface and a server application that provides a RESTful API, so it is useful for both tool development and analysis.
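The idea of an asset manager, looking up a shared resource by genome and asset name instead of by hard-coded path, can be illustrated with a toy in-memory registry. This is only a conceptual sketch and not refgenie's API or data model; the class `AssetRegistry`, its methods, and all paths below are hypothetical.

```python
from pathlib import PurePosixPath


class AssetRegistry:
    """Toy index mapping (genome, asset) names to file paths.

    Hypothetical illustration only; it does not reflect refgenie's internals.
    """

    def __init__(self, root):
        self.root = PurePosixPath(root)
        self._assets = {}

    def add(self, genome, asset, relative_path):
        """Register an asset stored at relative_path under the root folder."""
        self._assets[(genome, asset)] = PurePosixPath(relative_path)

    def seek(self, genome, asset):
        """Return the full path of a registered asset as a string."""
        try:
            return str(self.root / self._assets[(genome, asset)])
        except KeyError:
            raise KeyError(f"no asset {asset!r} registered for genome {genome!r}") from None

    def list_assets(self, genome):
        """Return the sorted asset names registered for one genome."""
        return sorted(a for g, a in self._assets if g == genome)


# Made-up root folder and asset locations, for illustration only.
registry = AssetRegistry("/data/genomes")
registry.add("hg38", "fasta", "hg38/fasta/hg38.fa")
registry.add("hg38", "bowtie2_index", "hg38/bowtie2_index")

print(registry.seek("hg38", "fasta"))  # /data/genomes/hg38/fasta/hg38.fa
print(registry.list_assets("hg38"))    # ['bowtie2_index', 'fasta']
```

Because analysis scripts ask the registry for `("hg38", "bowtie2_index")` rather than a literal path, the same script can run unchanged on machines that store the files in different places.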