.. image:: https://user-images.githubusercontent.com/6096414/93030687-c7cf7300-f61c-11ea-92b8-102ec17ef6aa.png

UMI-tools was published in Genome Research on 18 Jan '17 (open access).

For full documentation see https://umi-tools.readthedocs.io/en/latest/

Tools for dealing with Unique Molecular Identifiers
====================================================

This repository contains tools for dealing with Unique Molecular
Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell
RNA-Seq cell barcodes. Currently there are 6 commands.

The ``extract`` and ``whitelist`` commands are used to prepare a
fastq containing UMIs +/- cell barcodes for alignment.

* whitelist:
   **Builds a whitelist of the 'real' cell barcodes**
      This is useful for droplet-based single cell RNA-Seq where the
      identity of the true cell barcodes is unknown. The whitelist can
      then be used to filter reads with ``extract`` (see below).

* extract:
   **Flexible removal of UMI sequences from fastq reads.**
      UMIs are removed and appended to the read name. Any other
      barcode, for example a library barcode, is left on the read. Can
      also filter reads by quality or against a whitelist (see above).

The remaining commands, ``group``, ``dedup`` and ``count``/``count_tab``,
are used to identify PCR duplicates using the UMIs and perform different
levels of analysis depending on the needs of the user. Several UMI
deduplication schemes are available. The recommended method is
*directional*.

* dedup:
   **Groups PCR duplicates and deduplicates reads to yield one read per group**
      Use this when you want to remove the PCR duplicates prior to any
      downstream analysis.

* group:
   **Groups PCR duplicates using the same methods available through ``dedup``.**
      This is useful when you want to manually interrogate the PCR
      duplicates.

* count:
   **Groups and deduplicates PCR duplicates and counts the unique molecules per gene**
      Use this when you want to obtain a matrix with unique molecules
      per gene, per cell, for scRNA-Seq.

* count_tab:
   **As per count except input is a flatfile**

See `QUICK_START.md <./doc/QUICK_START.md>`_ for a quick tutorial on
the most common usage pattern.

If you want to use UMI-tools in single-cell RNA-Seq data processing,
see `Single_cell_tutorial.md <./doc/Single_cell_tutorial.md>`_

**Important update**: We now recommend the use of ``alevin`` for
droplet-based scRNA-Seq (e.g. 10X, inDrop etc). ``alevin`` is an
accurate, fast and convenient end-to-end tool to go from fastq ->
count matrix. It extends the UMI error correction in ``UMI-tools``
within a framework that also enables quantification of droplet
scRNA-Seq without discarding multi-mapped reads. See the alevin
documentation and the alevin pre-print for more information.

The ``dedup``, ``group``, and ``count`` / ``count_tab`` commands make
use of network-based methods to resolve similar UMIs with the same
alignment coordinates. For background on these methods, see the
Genome Research publication and the blog post discussing
network-based methods.

Installation
------------

If you're using Conda, you can use:

.. code:: bash

   $ conda install -c bioconda -c conda-forge umi_tools

Or pip:

.. code:: bash

   $ pip install umi_tools

Or if you'd like to work directly from the git repository:

.. code:: bash

   $ git clone https://github.com/CGATOxford/UMI-tools.git

Enter repository and run:

.. code:: bash

   $ python setup.py install

For more detail see `INSTALL.rst <./doc/INSTALL.rst>`_

Help
-----

For full documentation see https://umi-tools.readthedocs.io/en/latest/

See `QUICK_START.md <./doc/QUICK_START.md>`_ and
`Single_cell_tutorial.md <./doc/Single_cell_tutorial.md>`_ for tutorials
on the most common usage patterns.

To get help on umi_tools run

.. code:: bash

   $ umi_tools --help

To get help on the options for a specific [COMMAND], run

.. code:: bash

   $ umi_tools [COMMAND] --help

Dependencies
------------

umi_tools is dependent on ``python>=3.5``, ``numpy``, ``pandas``,
``scipy``, ``cython``, ``pysam``, ``future``, ``regex`` and ``matplotlib``
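
As an illustration of the network-based grouping that ``dedup``, ``group`` and ``count`` rely on, the following is a minimal Python sketch of a directional-style grouping of UMIs observed at a single alignment position. It is not the UMI-tools implementation: the function names ``hamming`` and ``directional_groups`` are hypothetical, the counts are made-up example data, and the edge rule (at most one mismatch, and ``count(a) >= 2 * count(b) - 1``) is an assumption based on the method as described in the Genome Research publication.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts, threshold=1):
    """Group UMIs seen at one alignment position (illustrative only).

    A directed edge a -> b is drawn when the UMIs differ by at most
    `threshold` bases and count(a) >= 2 * count(b) - 1, i.e. b is
    plausibly a sequencing/PCR error derived from a.
    """
    # Most abundant UMIs first, so each group is seeded by its likely
    # "true" UMI; ties are broken alphabetically for determinism.
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))

    adjacency = {
        a: [
            b for b in umis
            if b != a
            and hamming(a, b) <= threshold
            and umi_counts[a] >= 2 * umi_counts[b] - 1
        ]
        for a in umis
    }

    groups, seen = [], set()
    for seed in umis:
        if seed in seen:
            continue
        seen.add(seed)
        group, stack = [seed], [seed]
        # Follow directed edges outwards from the seed.
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    group.append(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups


# Made-up counts for six UMIs at the same alignment coordinates.
example = {"ACGT": 456, "TCGT": 2, "CCGT": 2, "ACAT": 72, "ACAG": 1, "AAAT": 90}
for group in directional_groups(example):
    print(group[0], "represents", len(group), "UMI(s)")
```

With these example counts, ``AAAT`` (90) and ``ACAT`` (72) differ by a single base but have counts too similar for an edge under the assumed rule, so ``AAAT`` stays in its own group while the low-count UMIs are absorbed into the group seeded by ``ACGT``.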