These subfolders contain pairwise alignment chains and nets, with the following genomes used as the reference:

For placental mammals:
- human hg38
- mouse mm10
- cow HLbosTau10=GCF_002263795.3
- elephant HLeleMaxInd3A=GCF_024166365.1

For birds:
- chicken HLgalGal7=GCF_016699485.2
- emu HLdroNov3=GCF_036370855.1
- zebrafinch HLtaeGut5=GCF_003957565.2
- crow HLcorHaw3=GCF_020740725.1
- kittiwake HLrisTri2=GCF_028500815.1

In each of these folders, you will find 'vs_{query}' subfolders that contain the following three files:
- reference.query.allfilled.chain.gz --> sensitive pairwise alignment chains
- reference.query.net.gz --> alignment nets (netting the allfilled.chains)
- reference.query.over.chain.gz --> liftOver chains (ONLY to be used with UCSC's liftOver tool)

{query} refers to our internal assembly ID (prefixed with HL) or a UCSC assembly (e.g. mm10).
Information about the genome assembly (its source, NCBI accession numbers, contig/scaffold N50) and the species (taxonomy, scientific and common names) can be found in the assemblies_and_species.tsv file in the parent directory.


====== Usages of these files ======
*allfilled.chain.gz --> TOGA2 needs this as a key input.
*net.gz --> Alignment nets can be used to reconstruct ancestral karyotypes (e.g. with DESCHRAMBLER) and chromosome rearrangements. In addition, after filtering (see Hecker and Hiller 2020), they can serve as input for Multiz to produce a reference-based multiple genome alignment.
*over.chain.gz --> These are the nets converted back to chains, to be used with UCSC's liftOver tool to transfer annotations from the reference to the query.


====== Methods ======
Alignment chains were generated with our sensitive alignment pipeline (Hecker and Hiller 2020). Briefly, we used:
- lastz (Harris 2007) with the sensitive parameters (K = 2400, L = 3000, Y = 9400, H = 2000, default scoring matrix) from Sharma and Hiller (2017) to obtain local alignments between the reference and query genome,
- axtChain (Kent et al. 2003) (default parameters except linearGap = loose) to obtain alignment chains,
- RepeatFiller (Osipova et al. 2019) with default parameters to increase alignment sensitivity by adding missed repeat-overlapping local alignments to the alignment chains, and
- chainCleaner (Suarez et al. 2017) (default parameters except for minBrokenChainScore = 75,000 and -doPairs) to improve alignment specificity.
Finally, pairwise alignment chains were converted into alignment nets using a modified version of chainNet that computes real scores of partial nets (Suarez et al. 2017).


====== References: ======
Harris, Robert. 2007. "Improved Pairwise Alignment of Genomic DNA." Pennsylvania State University. https://www.bx.psu.edu/~rsharris/rsharris_phd_thesis_2007.pdf.

Hecker, Nikolai, and Michael Hiller. 2020. "A Genome Alignment of 120 Mammals Highlights Ultraconserved Element Variability and Placenta-Associated Enhancers." GigaScience 9 (1). https://doi.org/10.1093/gigascience/giz159.

Kent, W. James, Robert Baertsch, Angie Hinrichs, Webb Miller, and David Haussler. 2003. "Evolution's Cauldron: Duplication, Deletion, and Rearrangement in the Mouse and Human Genomes." Proceedings of the National Academy of Sciences of the United States of America 100 (20): 11484-89.

Osipova, Ekaterina, Nikolai Hecker, and Michael Hiller. 2019. "RepeatFiller Newly Identifies Megabases of Aligning Repetitive Sequences and Improves Annotations of Conserved Non-Exonic Elements." GigaScience 8 (11). https://doi.org/10.1093/gigascience/giz132.

Sharma, Virag, and Michael Hiller. 2017. "Increased Alignment Sensitivity Improves the Usage of Genome Alignments for Comparative Gene Annotation." Nucleic Acids Research 45 (14): 8369-77.

Suarez, Hernando G., Bjoern E. Langer, Pradnya Ladde, and Michael Hiller. 2017. "chainCleaner Improves Genome Alignment Specificity and Sensitivity." Bioinformatics (Oxford, England) 33 (11): 1596-1603.
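

====== Example: inspecting a chain file ======
For a quick look at the *.allfilled.chain.gz or *.over.chain.gz files described above, the sketch below reads only the chain header lines from a gzipped file. It assumes the standard UCSC chain format, in which each chain starts with a line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id". The names ChainHeader, parse_chain_header and iter_chain_headers are illustrative and not part of any tool mentioned in this README, and the path passed to iter_chain_headers is whatever file you downloaded.

```python
import gzip
from typing import Iterator, NamedTuple


class ChainHeader(NamedTuple):
    """Fields of one UCSC chain header line (t = reference, q = query)."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Parse a single 'chain ...' header line into a ChainHeader."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    (_, score, t_name, t_size, t_strand, t_start, t_end,
     q_name, q_size, q_strand, q_start, q_end, chain_id) = fields
    return ChainHeader(
        # Go through float() so a score written with a decimal point still parses.
        int(float(score)),
        t_name, int(t_size), t_strand, int(t_start), int(t_end),
        q_name, int(q_size), q_strand, int(q_start), int(q_end),
        chain_id,
    )


def iter_chain_headers(path: str) -> Iterator[ChainHeader]:
    """Yield the header of every chain in a gzipped chain file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Alignment block lines start with a number, so they are skipped here.
            if line.startswith("chain "):
                yield parse_chain_header(line)
```

For example, max(iter_chain_headers(path), key=lambda h: h.score) would return the header of the highest-scoring chain in a file. Because only header lines are parsed, memory use stays small even for large files.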