Chromosome 1
Chromosome 1 is the largest human chromosome, comprising 8% of the human genome. Over 350 human diseases are associated with disruptions in the sequence of chromosome 1, including neurological and developmental disorders, cancers and Mendelian disorders, for which many of the corresponding genes have not yet been identified. We have generated 223.5 Mb of finished sequence, which accounts for 99.4% of the euchromatic region of the chromosome, and have manually annotated 3,141 gene structures and 991 pseudogenes. The finished sequence of chromosome 1 has enabled us to draw together an accurate framework to relate genetic and biological features such as sequence variation, selection, recombination and replication timing to genomic sequence features, which will provide a strong foundation for future studies.
Chromosome 6
Chromosome 6 is a submetacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes. Our analysis further shows that at least 96% of the protein-coding genes have been identified, as assessed by multi-species comparative sequence analysis, and provides evidence for the presence of further, otherwise unsupported exons/genes. Among these are genes directly implicated in cancer, schizophrenia, autoimmunity and many other diseases. Chromosome 6 harbours the largest transfer RNA gene cluster in the genome; we have shown that this cluster co-localizes with a region of high transcriptional activity. Within the essential immune loci of the major histocompatibility complex, we found HLA-B to be the most polymorphic gene on chromosome 6 and in the human genome.
Chromosome 9
Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection.
Human chromosome 9 is approximately 145 megabases in length. Our aim is to map and sequence this entire chromosome in collaboration with the chromosome 9 community.
The sequencing procedure being used for chromosome 9 is as follows. Bacterial clones of genomic origin are shotgun subcloned into M13 and pUC vectors and sequenced using fluorescent dye primer and dye terminator chemistries. All clones, including clones provided by collaborators, must pass a series of quality checks prior to sequencing. Once the sequence assembly has started, each contig >1 kb will be made available via our ftp site as unfinished sequence. The sequences will be updated every night and, once finished, they will be moved to the finished sequences page. There are also summary tables of all projects in progress. Finished sequences undergo comprehensive, semi-automatic analysis prior to submission to EMBL and entry into the Chromosome 9 database (9ace).
The Chromosome 9 ACEDB database (9ace) is used as a tool for managing the in-house data and acts as the primary means by which chromosome 9 data generated at The Sanger Institute will be released into the public domain in an annotated and usable form. Additionally, with the co-operation of external groups, we are collating information from the global community with the hope that 9ace will provide a cohesive and dynamic representation of the state of the global project.
Chromosome 10
The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.
Chromosome 13
Chromosome 13 is the largest human acrocentric chromosome. The short arm of the chromosome is heterochromatic and is homologous to the short arms of chromosomes 14, 15, 21 and 22. The sequence of the euchromatic, long arm of the chromosome was determined at the Sanger Institute and covers 95,567,076 base pairs. The analysis of the sequence, reported in Nature, identifies 633 gene structures and 296 pseudogenes, which means that chromosome 13 has the lowest gene density of the autosomes analysed to date. The genes present include ones linked to various cancers (BRCA2, RB1) and to schizophrenia. 105 putative non-coding RNA genes have also been identified, including 9 microRNAs. Multi-species sequence comparison indicates that over 95% of protein coding genes on the chromosome have been identified. This analysis also reveals 112 non-exonic conserved regions, some of which could be regulatory or structural elements.
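The gene-density comparison can be checked roughly against the figures quoted in these summaries. The sketch below is only an illustration: it divides each annotated gene count by the length of finished sequence reported here. It assumes the counts are comparable across chromosomes, which is not guaranteed, since the summaries use slightly different terms ("genes", "gene structures"). Chromosome 10 is left out because its quoted count of 1,357 includes pseudogenes. The names `CHROMOSOMES` and `genes_per_mb` are made up for this example.

```python
# Figures copied from the summaries above: (annotated genes, finished sequence in bp).
# Chromosome 1 is quoted only as 223.5 Mb, so its length here is approximate.
CHROMOSOMES = {
    "1": (3141, 223_500_000),
    "6": (1557, 166_880_988),
    "9": (1149, 109_044_351),
    "13": (633, 95_567_076),
}


def genes_per_mb(gene_count, sequence_bp):
    """Return annotated genes per megabase of finished sequence."""
    return gene_count / (sequence_bp / 1_000_000)


densities = {name: genes_per_mb(*values) for name, values in CHROMOSOMES.items()}

# Print the chromosomes from lowest to highest gene density.
for name, density in sorted(densities.items(), key=lambda item: item[1]):
    print(f"chromosome {name}: {density:.1f} genes per Mb")

lowest = min(densities, key=densities.get)
print(f"lowest density among these: chromosome {lowest}")
```

On these numbers chromosome 13 comes out at roughly 6.6 genes per Mb, below the other chromosomes listed, which is consistent with the statement above.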
Chromosome 20
Chromosome 20 is metacentric and has an estimated size of 63.7 Mb (NCBI build 34). We completed 99.4% of the sequence of the euchromatic part of the short (p) and long (q) arm of the chromosome in 6 contigs (59,187,298 bp). An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm.
Chromosome 22
The sequence of human chromosome 22 was completed in December 1999. An international consortium of sequencing centres released into the public domain the genetic code of the 33.5 million bps that comprise the euchromatic portion of human chromosome 22. This was the first time a human chromosome had been sequenced, and it included the largest continuous sequence determined from any organism at the time (23 million bps).
Chromosome 22 is the second smallest of the human autosomes. The short arm (22p) contains a series of tandem repeat structures including the array of genes that encode the structural RNAs of the ribosomes, and is highly similar to the short arms of chromosomes 13, 14, 15 and 21. The long arm (22q) is the portion of human chromosome 22 that contains the protein coding genes and this is the region that has now been sequenced. The completed sequence consisted of 12 contiguous segments covering 33.4 million bps separated by 11 gaps of known size. One of these gaps has subsequently been closed by the Oklahoma group. The sequence is estimated to cover 97% of 22q, and is complete to the limits of currently available reagents and methodologies. The largest contig is >23 million bps, which at the time was the largest piece of continuous sequence determined.
Chromosome X
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. Together with colleagues in the USA and Germany, we have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of Mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterised with the aid of the DNA sequence.
References
Nature 2004;429;6990;369-74