The "analysis set" is a version of the genome prepared for next-generation
sequencing alignment pipelines. It has one PAR region masked with Ns, outdated
patches removed, alternate sequences marked as such, and an EBV sequence added
as a decoy for reads. For a full description of the files' contents, see
NCBI's original README file:
https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/seqs_for_alignment_pipelines/README_ANALYSIS_SETS

In this directory, we are making available modified versions of these NCBI
files. We changed only the sequence names and nothing else. For example,
chr10_GL383543_patch was changed to chr10_gl383543_fix, chr17_GL000258_alt to
chr17_ctg5_hap1, and chrM to chrMT. All changes are listed in
chromAlias/ncbiToUcsc.txt.

The no_alt_analysis_set is the one most likely to be relevant for most
aligners. It removes alternate alleles, which most aligners cannot yet use.

Note that the chrMT mitochondrial sequence is the official GRCh37 one; it was
added to the Genome Browser, in addition to our old chrM, as part of patch 13.
The hg19 patch13 genome is available and explained at
https://hgdownload.gi.ucsc.edu/goldenPath/hg19/bigZips/p13.plusMT/

We provide mappings with the chromosome sequence names of genome versions from
NCBI, 1000 Genomes and Ensembl in the directory ./chromAlias/.

We also provide pre-indexed versions of the no_alt_analysis_set version of the
genome file here. The versions of the software used for indexing were:

    bwa 0.7.12-r1039
    hisat2 2.1.0
    bowtie2 2.3.4.3

This blog post by Heng Li explains problems with the different GRCh37 versions:
https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use
The no_alt_analysis_set file here has none of the mentioned problems.

GATK has a useful page on the background. It was written for hg38, but most of
it also applies to the hg19 analysis set:
https://gatk.broadinstitute.org/hc/en-us/articles/360035890951-Human-genome-reference-builds-GRCh38-or-hg38-b37-hg19
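
The sequence renaming described above (for example chrM to chrMT) can be
illustrated with a short Python sketch. It is not the procedure used to build
these files. It assumes a tab-separated mapping with the old name in the first
column and the new name in the second; the actual column layout of
chromAlias/ncbiToUcsc.txt should be checked before relying on this. The
function names and the toy FASTA records are hypothetical.

```python
import io


def load_alias_table(lines):
    """Parse 'old_name<TAB>new_name' lines into a dict.

    Assumption: two tab-separated columns, old name first. Blank lines and
    lines starting with '#' are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        mapping[fields[0]] = fields[1]
    return mapping


def rename_fasta_headers(lines, mapping):
    """Yield FASTA lines with the sequence name in each header replaced.

    Only the first word after '>' is treated as the sequence name; any
    description after the first space is kept. Names missing from the
    mapping are left unchanged. Sequence lines pass through untouched.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            name, sep, rest = header.partition(" ")
            yield ">" + mapping.get(name, name) + sep + rest + "\n"
        else:
            yield line


# Toy data only: two mapping rows taken from the examples in the text above
# and a made-up FASTA fragment.
alias_text = "chr10_GL383543_patch\tchr10_gl383543_fix\nchrM\tchrMT\n"
mapping = load_alias_table(io.StringIO(alias_text))

fasta = [">chrM\n", "GATCACAGGT\n", ">chr1\n", "NNNN\n"]
renamed = list(rename_fasta_headers(fasta, mapping))
print("".join(renamed), end="")
```

For real files, the same functions could be fed lines from
gzip.open(path, "rt"), since the FASTA files here are gzip-compressed.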
Name                                                        Last modified      Size  Description
Parent Directory                                                                  -
md5sum.txt                                                  2020-03-13 17:39    418
hg19.p13.plusMT.no_alt_analysis_set.hisat2_index.tar.gz     2020-03-10 03:20   3.9G
hg19.p13.plusMT.no_alt_analysis_set.fa.gz                   2020-03-09 10:21   823M
hg19.p13.plusMT.no_alt_analysis_set.bwa_index.tar.gz        2020-03-10 03:41   3.2G
hg19.p13.plusMT.no_alt_analysis_set.bowtie2_index.tar.gz    2020-03-10 03:26   3.4G
hg19.p13.plusMT.full_analysis_set.fa.gz                     2020-03-09 10:24   859M
chromAlias/                                                 2020-03-09 11:46      -
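
Because the listing includes an md5sum.txt file, downloads can be checked
against it. The following Python sketch shows one way to do that. It assumes
md5sum.txt uses the usual output format of the md5sum tool (checksum,
whitespace, file name), which has not been verified for this directory. The
function names are hypothetical.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at 'path'."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def parse_md5sum(text):
    """Parse md5sum-style text into {file_name: checksum}.

    Assumption: each line is '<checksum><whitespace><file name>'. A leading
    '*' on the file name (md5sum binary-mode marker) is dropped.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        checksum, name = parts
        expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify(path, name, expected):
    """Return True if the file at 'path' matches the checksum listed for 'name'."""
    return name in expected and md5_of_file(path) == expected[name]


# Example use after downloading both files into the current directory:
#
#   with open("md5sum.txt") as handle:
#       expected = parse_md5sum(handle.read())
#   name = "hg19.p13.plusMT.no_alt_analysis_set.fa.gz"
#   print(name, "OK" if verify(name, name, expected) else "MISMATCH")
```

Reading in chunks keeps memory use low, which matters for the multi-gigabyte
index archives.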