This directory contains mappings from the sequence names used in genomes distributed by other
websites to the names used by UCSC for the genome hg19.p13.plusMT. Each line has the format
<otherGenomeSeqName> <tab> <ucscName>
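As a rough illustration of the format, the following Python sketch reads such a two-column file into a dictionary. The function name load_alias_map and the two sample rows are hypothetical, and skipping blank lines and lines starting with "#" is an assumption, not something the mapping files are known to require.

```python
import io

def load_alias_map(handle):
    """Read <otherGenomeSeqName><tab><ucscName> lines into a dict."""
    alias = {}
    for line in handle:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines, if any, carry no mapping.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("expected two tab-separated fields: %r" % line)
        alias[fields[0]] = fields[1]
    return alias

# Illustrative rows only; real files are opened with open("ncbiToUcsc.txt").
sample = io.StringIO("1\tchr1\n2\tchr2\n")
alias = load_alias_map(sample)
```

With a downloaded file, the same function would be called as load_alias_map(open("g1kToUcsc.txt")).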
Text files such as .bed, .sam or .bedGraph that contain sequence identifiers from these other
genome distributions can be converted with these mapping files and our small tool chromToUcsc.
Example:
wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/chromToUcsc
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/analysisSet/chromAlias/ncbiToUcsc.txt
chmod a+x chromToUcsc
./chromToUcsc -i test2.bed -o test2.ucsc.bed -a ncbiToUcsc.txt
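For readers who want to see the idea behind the conversion, here is a minimal Python sketch of the renaming step for formats that keep the sequence name in the first tab-separated column (.bed, .bedGraph). It is not the chromToUcsc implementation: the function name to_ucsc, the sample mapping and the handling of "track", "browser" and "#" lines are assumptions made for illustration. The sketch also does not cover .sam, which stores the sequence name in a different column.

```python
def to_ucsc(line, alias):
    """Return `line` with its first tab-separated field renamed via `alias`."""
    # Assumption: header and comment lines are passed through unchanged.
    if not line or line.startswith(("#", "track", "browser")):
        return line
    chrom, sep, rest = line.partition("\t")
    # An unknown sequence name raises KeyError rather than being silently kept.
    return alias[chrom] + sep + rest

# Illustrative mapping and BED lines (hypothetical data).
alias = {"1": "chr1", "2": "chr2"}
bed_lines = ["track name=example", "1\t100\t200\tfeatA", "2\t300\t400\tfeatB"]
converted = [to_ucsc(line, alias) for line in bed_lines]
```

Raising an error on unmapped names is a design choice in this sketch; it makes a mismatch between the input file and the chosen mapping file visible immediately.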
The genomes used were:
g1k:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
NCBI:
https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/seqs_for_alignment_pipelines/GCA_000001405.14_GRCh37.p13_full_analysis_set.fna.gz
Ensembl:
http://ftp.ensembl.org/pub/grch37/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa.gz
A full log of the Unix commands that were run to create these files is,
as always, available in our makeDoc directory:
https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.analysisSet.txt
Name                 Last modified      Size
g1kToUcsc.txt        2020-03-09 08:25   1.8K
ensemblToUcsc.txt    2020-03-09 09:18   1.8K
ncbiToUcsc.txt       2020-03-09 09:36   10K