Will gene mutations give rise to a protein that is shorter than the full length protein?

What is the relationship between the deletion or insertion of nucleotides in a gene sequence and the length of the protein encoded by the mutated gene?


It depends on the nature of the insertion or deletion.

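Many mutations are point mutations, in which a single nucleotide is changed:

Original: ATG ACT ACG CTA TTC AGT → M T T L F S
Mutant:   ATG TCT ACG CTA TTC AGT → M S T L F S

Changing the A at position 4 to a T turned the second codon from ACT (threonine) into TCT (serine), so the protein has an S where it had a T but keeps its length. A point mutation can still shorten the protein if it turns a codon into a stop codon (a nonsense mutation), but a substitution never shifts the reading frame. An insertion or deletion is different. Start with this sequence: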
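To check the example above, a minimal Python sketch can translate each sequence codon by codon with the standard genetic code. The names `CODON_TABLE`, `translate` and `substitute` are illustrative helpers, not a library API, and the sketch assumes the sequence starts at the ATG start codon and is read in that frame.

```python
from itertools import product

# Standard genetic code, built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence, stopping at the first stop codon.

    Returns one-letter amino acid codes; a trailing "*" means a stop codon was reached.
    Leftover bases that do not form a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def substitute(dna: str, position: int, base: str) -> str:
    """Replace the base at a 1-based position (a point mutation)."""
    return dna[:position - 1] + base + dna[position:]


original = "ATGACTACGCTATTCAGT"
mutant = substitute(original, 4, "T")
print(mutant)               # ATGTCTACGCTATTCAGT
print(translate(original))  # MTTLFS
print(translate(mutant))    # MSTLFS
```

Both proteins have six residues: the substitution changes one amino acid but not the length.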

ATG TCT ATT GAC GCT ATT CAG → M S I D A I Q

Now insert an A after the A at position 7:

ATG TCT AAT TGA CGC TAT TCA G → M S N STOP

The fourth codon is now TGA, a stop codon, so translation ends after three amino acids and the protein is truncated. A frameshift does not always produce such an early stop, though. Instead of inserting the A after position 7, insert it after position 9:

ATG TCT ATT AGA CGC TAT TCA G → M S I R R Y S

The protein is about the same length, but every amino acid after the insertion differs from the original:

Original:  M S I D A I Q
Insertion: M S I R R Y S
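The two insertions can be reproduced with the same codon-by-codon translation. This self-contained Python sketch uses illustrative helpers (`insert_after`, `translate`) rather than a library API; translation stops at the first stop codon, which is shown as `*`.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; stop after the first "*"."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def insert_after(dna: str, position: int, bases: str) -> str:
    """Insert bases immediately after the given 1-based position."""
    return dna[:position] + bases + dna[position:]


original = "ATGTCTATTGACGCTATTCAG"
print("original", translate(original))  # original MSIDAIQ
for pos in (7, 9):
    mutant = insert_after(original, pos, "A")
    print(pos, mutant, translate(mutant))
# 7 ATGTCTAATTGACGCTATTCAG MSN*
# 9 ATGTCTATTAGACGCTATTCAG MSIRRYS
```

Inserting after position 7 creates an in-frame TGA three codons in, while inserting after position 9 only scrambles the downstream residues.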

Both insertions are frameshift mutations: adding a single base shifts the reading frame, so every codon downstream is read differently. Frameshifts are usually harmful because they change all the amino acids after the mutation. The resulting protein is often non-functional, can end up a different size (depending on where a stop codon now falls in frame), and may even be unstable and rapidly degraded.

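So, to answer your question about protein length: an insertion or deletion of a multiple of three nucleotides keeps the reading frame, adding or removing one amino acid for every three nucleotides (unless it creates a stop codon). Any other insertion or deletion causes a frameshift, which can lengthen or shorten the protein drastically depending on where the next in-frame stop codon falls.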
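The rule in the previous paragraph can be checked with a small Python sketch. `indel_effect` is a hypothetical helper, and the three-base insertion (GCC after position 9) and the deletion of the GAC codon are made-up examples applied to the sequence used above, not mutations from the original answer.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; stop after the first "*"."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def indel_effect(net_change: int) -> str:
    """Classify an insertion (positive) or deletion (negative) by its effect on the frame."""
    if net_change % 3 == 0:
        return f"in-frame: {net_change // 3:+d} amino acid(s) unless a stop codon is created"
    return "frameshift: downstream codons change until the next in-frame stop codon"


original = "ATGTCTATTGACGCTATTCAG"                      # M S I D A I Q
in_frame_insertion = original[:9] + "GCC" + original[9:]  # hypothetical +3 nt
in_frame_deletion = original[:9] + original[12:]          # hypothetical -3 nt (removes GAC)

print(indel_effect(+3), translate(in_frame_insertion))  # translation: MSIADAIQ (8 residues)
print(indel_effect(-3), translate(in_frame_deletion))   # translation: MSIAIQ (6 residues)
print(indel_effect(+1))
```

Python's `%` returns a non-negative remainder for negative numbers, so deletions are classified with the same test as insertions.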


Strong association between pseudogenization mechanisms and gene sequence length

Pseudogenes arise from the decay of gene copies following either RNA-mediated duplication (processed pseudogenes) or DNA-mediated duplication (nonprocessed pseudogenes). Here, we show that long protein-coding genes tend to produce more nonprocessed pseudogenes than short genes, whereas the opposite is true for processed pseudogenes. Protein-coding genes longer than 3000 bp are 6 times more likely to produce nonprocessed pseudogenes than processed ones.

Reviewers

This article was reviewed by Dr. Dan Graur and Dr. Craig Nelson (nominated by Dr. J Peter Gogarten).


Abstract

The list of genetic diseases caused by mutations that affect mRNA translation is rapidly growing. Although protein synthesis is a fundamental process in all cells, the disease phenotypes show a surprising degree of heterogeneity. Studies of some of these diseases have provided intriguing new insights into the functions of proteins involved in translation; for example, evidence suggests that several of these proteins have other functions in addition to their roles in translation. Given the numerous proteins involved in mRNA translation, it is likely that further inherited diseases will turn out to be caused by mutations in genes involved in this complex process.



Discussion

The FU gene

In the present paper, the FU gene was identified and its structure determined. FU consists of 29 exons, of which exons 1 and 2 encode 5'UTRs, exon 3 contains the initiating ATG codon, and exons 13 and 29 contain in-frame stop codons (Fig. 1). Exons 1 and 2 may serve as alternative first exons, like the alternative exons 1, 1A and 1B found in the PTCH1 gene [23]. Exons 3 to 9 encode the kinase domain. This segment has strong similarity to Drosophila fu, whereas the remaining C-terminal part has a much weaker similarity [8]. Using the DIALIGN program [24], it is possible to align fu to L-FU in two regions of the C-terminal part (not shown). These are largely encoded by exons 15–16 and 22–29. This suggests that exons 10–14 and 17–21 may have been recruited to the FU gene during evolution.

Investigations of Syndactyly patient material

We investigated the possibility that FU underlies SD1 based upon the fact that FU lies within the localization interval for SD1 and that it is part of the Sonic Hedgehog signaling pathway, which participates in digital patterning [1]. Although three previously reported single nucleotide polymorphisms were identified, we did not detect any mutation in the FU coding region or flanking intronic regions. While these results do not implicate FU in the causation of SD1, it is possible that this disorder is caused by mutations in the noncoding regions not screened in this study. Alternatively, SD1 could be caused by a genomic rearrangement not identified by sequence analysis, although no altered bands were detected in an affected member of the SD1 family by Southern analysis using a FU cDNA clone as probe (data not shown).

Expression analyses

Analyses of FU expression have shown that transcripts are detected in all tissues examined. For the first time, evidence is presented that more than one transcript can be expressed from this gene. The Northern blots clearly show that FU transcripts of different sizes exist. Here the transcripts are estimated to be slightly larger than previously reported, and in some tissues more than one transcript is evident. The available cDNAs and RT/PCR-based transcript analyses make it clear that alternative splicing occurs and that the transcripts carry different 5'UTRs. At least two protein isoforms, besides the previously described L-FU [8], may be produced. The S-FU isoform differs most dramatically from L-FU, consisting only of the N-terminal one third of L-FU. S-FU expression results from inclusion of exon 13 in the mature transcript. This alternative splicing event was detected in all tissues examined and at an apparently constant ratio. A case of regulated alternative splicing was also detected by RT/PCR, but with a much less dramatic impact at the protein level, since it results only in the loss of the 21 residues encoded by exon 24. However, its expression pattern points to a possible tissue-specific regulation of this alternative splicing event, which may well reflect a biological role for L-FUΔ24 different from that of L-FU. Since the mRNA for L-FUΔ24 appears not to be expressed in small intestine and prostate, it can be speculated that FU has a different role there, if a leucine zipper is truly lost in L-FUΔ24. It is intriguing that testis appears to express transcripts both with and without the 63 bp segment and is also the tissue with the strongest expression. Perhaps the combined expression of L-FUΔ24 and L-FU is linked to the function of Desert Hedgehog, which has been shown to have a particular role in spermatogenesis [25].
Whether interactions with GLI proteins, SUFU or other components of the signaling pathway are altered, and whether this has any impact on GLI or SUFU activities, remains to be investigated. Certainly, this adds another variable to the complicated picture of Hedgehog signaling and GLI regulation in vertebrates.
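The reading-frame arithmetic behind the exon 24 event can be made explicit: a 63 bp segment spans 63 / 3 = 21 codons, so skipping it removes 21 residues while leaving the downstream frame intact. The Python sketch below uses a hypothetical helper, `exon_skip_effect`, and assumes the skipped exon lies within the coding sequence and contains no in-frame stop codon (unlike exon 13, which the text notes contains one and whose inclusion gives the truncated S-FU).

```python
def exon_skip_effect(exon_length_bp: int) -> str:
    """Describe the protein-level effect of skipping one internal coding exon.

    Assumes the exon lies within the coding sequence and contains no stop codon.
    If the exon boundaries fall inside codons, the single residue at the new
    junction may also change; this sketch ignores that detail.
    """
    codons, remainder = divmod(exon_length_bp, 3)
    if remainder == 0:
        return f"in-frame deletion of {codons} residues"
    return "frameshift downstream of the skipped exon"


print(exon_skip_effect(63))  # in-frame deletion of 21 residues
print(exon_skip_effect(64))  # frameshift downstream of the skipped exon
```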

Functional investigations and perspectives

The functional assessment revealed that S-FU was not able to regulate GLI1 or GLI2 when expressed in 293 cells. In contrast, both L-FU and a variant lacking a full kinase domain (2HFU) were able to enhance GLI2-induced transcription. These results are qualitatively similar to those previously reported in C3H/10T½ cells [8]. However, L-FU and 2HFU enhanced GLI2 activity only 2- to 3-fold in 293 cells, whereas 5- to 8-fold inductions are seen in C3H/10T½ cells. This may reflect the fact that the latter cell line expresses additional components of the Hedgehog signaling pathway that are required for full activity of FU. Unlike in the previous investigations [8], no effect of L-FU on SUFU was observed. Again, this difference may be explained by differences between the cell lines used. Understanding the signaling events downstream of SMO may reveal functional differences between the vertebrate proteins involved and their fruit fly counterparts. Although SUFU inhibits GLI transcription factors and su(fu) inhibits Ci, there are still striking differences. As yet, there have been no reports of a cos2 counterpart in vertebrates. Instead, it has been observed that FU interacts with all GLI proteins and SUFU [8], even though fu does not bind to Ci [26]. It has also been observed that both L-FU and SUFU can be found in the nucleus [5–8], which has not been observed for fu or su(fu). It is likely that both FU and SUFU are shuttled in and out of the nucleus by binding to GLI proteins [5, 8]. Although the basic activities of both FU and SUFU in regulating GLI have been conserved, significant differences from their fruit fly counterparts also appear to exist. Clearly, FU does not affect GLI1 in the way it affects GLI2. Additional investigations are needed to establish the role of FU in Hedgehog signaling and GLI control. The role of the different isoforms also remains to be elucidated.
These have to be tested individually for their regulation of all GLI proteins and proteolytic products. Fu is known to have at least two separate physiological functions in the fly, one of which is dependent upon the kinase domain [27]. Likewise, FU may well have two or more distinct functions in signaling, represented by different domains, isoforms and protein interactions.


Results

Patient mutations in KDM5C

We obtained fibroblasts from seven patients with XLID caused by four independent mutations in KDM5C (Table 1). The locations of the mutations with respect to annotated features of the KDM5C protein are shown in Figure 1A. The mutations and clinical symptoms have been previously reported for p.D402Y (12), p.P480L (22) and c.2T>C (23). The patient with the c.3223delG (p.V1075Yfs*2) mutation exhibited severe ID, with developmental delay, short stature (5th–10th percentile) and hyperreflexia. Patient 3223delG-1 walked at 22 months, used only 6–8 words by the age of 13.5 years, and displays strabismus, a high narrow palate and constipation. We also examined fibroblasts from a female heterozygous carrier of the c.3223delG mutation (3223delG-HET, the mother of the affected male), who required tutoring in school and has a high narrow palate. The patient's sister (not investigated) is also a carrier of the mutation and exhibits mild intellectual disability, developmental delay, hyperreflexia and a high narrow palate, suggesting that the mutation can be deleterious in the heterozygous setting. For controls, we obtained BJ foreskin fibroblasts from Dr George Q Daley (Boston Children's Hospital) and control male skin fibroblasts from the Coriell Cell Repository (Control-1: GM03348, 10 years; Control-2: AG06234, 17 years).

Patient fibroblast information

Patient identifier | Family | Age at biopsy (years) | cDNA mutation | Protein mutation | ID | Clinical notes
D402Y-1 (38505) | A034 | 72 | c.1204G>T | p.D402Y | Moderate to severe | DD, short stature, aggression
D402Y-2 (38506) | A034 | 67 | c.1204G>T | p.D402Y | Moderate to severe | DD, short stature, rare seizures, aggression
P480L-1 (13185) | K9330 | 8 | c.1439C>T | p.P480L | Moderate | DD, strabismus, mild scoliosis
P480L-2 (13186) | K9330 | 1 | c.1439C>T | p.P480L | Mild | DD, short stature, tremor, seizures, aggression
2T>C-1 (13008) | | 23 | c.2T>C | p.M1_E165del | Severe | DD, short stature, strabismus, seizures, ataxia, aggression
2T>C-2 (13009) | | 13 | c.2T>C | p.M1_E165del | Severe | DD, short stature, ataxia
3223delG-1 (22986) | K9607 | 17 | c.3223delG | p.V1075Yfs*2 | Severe | DD, short stature, strabismus, hyperreflexia

Annotations based on Homo sapiens lysine (K)-specific demethylase 5C (KDM5C) transcript variant 1 (uc004drz.3/NM_004187) in the GRCh38/hg38 genome assembly.

ID, intellectual disability DD, developmental delay.
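The cDNA and protein columns of Table 1 are linked by codon arithmetic. Position c.1 is the A of the ATG start codon, so c.N falls in codon (N − 1) // 3 + 1, at base (N − 1) % 3 + 1 within that codon. The sketch below assumes only the standard genetic code (NCBI translation table 1). It does not need the actual KDM5C codons: it checks that every aspartate codon becomes tyrosine with a G>T change at its first base, that every proline codon becomes leucine with a C>T change at its second base, and that c.2T>C turns the ATG start codon into ACG (threonine). The helper names `codon_position` and `substitute` are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); bases indexed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_position(c_pos: int) -> tuple[int, int]:
    """Map c.N (c.1 = A of the ATG start codon) to (codon number, base 1-3 within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, base: int, ref: str, alt: str) -> str:
    """Apply a single-base substitution at position `base` (1-3), checking the reference base."""
    if codon[base - 1] != ref:
        raise ValueError(f"{codon} has no {ref} at position {base}")
    return codon[: base - 1] + alt + codon[base:]


# (cDNA position, ref base, alt base, every codon for the reference amino acid)
VARIANTS = {
    "c.1204G>T": (1204, "G", "T", ["GAT", "GAC"]),                # Asp codons
    "c.1439C>T": (1439, "C", "T", ["CCT", "CCC", "CCA", "CCG"]),  # Pro codons
    "c.2T>C": (2, "T", "C", ["ATG"]),                             # start codon
}

for name, (pos, ref, alt, codons) in VARIANTS.items():
    codon_no, base = codon_position(pos)
    changes = {CODON_TABLE[c] + "->" + CODON_TABLE[substitute(c, base, ref, alt)] for c in codons}
    print(name, "codon", codon_no, "base", base, sorted(changes))
```

Running it prints `c.1204G>T codon 402 base 1 ['D->Y']`, `c.1439C>T codon 480 base 2 ['P->L']` and `c.2T>C codon 1 base 2 ['M->T']`. These results agree with p.D402Y and p.P480L, and with loss of the canonical start codon in c.2T>C patients.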

Figure 1. The c.3223delG mutation precludes KDM5C expression, but other patient mutations are compatible with transcript and protein production. (A) Schematic of KDM5C protein structure, with domains and patient mutations indicated. M166 is the predicted start codon for patients with the c.2T>C mutation. The KDM5C antibody is raised against the C-terminal 100 amino acids of the protein. (B) Analysis of KDM5C RNA amplified using either random primers (filled bars) or oligo dT primers (open bars) demonstrates that KDM5C RNA is detected in affected males exhibiting the p.D402Y, p.P480L or c.2T>C mutations, but not in the affected male with the c.3223delG mutation. Mean and standard deviation of 3–4 independent experiments are displayed (*P < 0.01 compared with Control-1, Student's t-test). RNA was detected using primers spanning exon–exon junctions, and normalized to housekeeping gene RSP18 and then Control-1 fibroblasts. (C) Whole cell lysates from patient skin fibroblasts show KDM5C protein expression in affected males exhibiting the p.P480L and p.D402Y mutations. No expression is detected in the c.3223delG affected male. Males carrying the c.2T>C mutation produce a truncated KDM5C product. Lowest band, x, is non-specific. TUBULIN, loading control. (D) KDM5C shRNAs reduce KDM5C transcript in control-BJ and c.2T>C fibroblasts. Mean and standard deviation of two independent experiments are displayed (*P < 0.05 compared with FF, Student's t-test). RNA was detected using primers spanning exon–exon junctions, and normalized to housekeeping gene RSP18 and then control shRNA samples (FF, firefly luciferase). (E) Western blots show that the truncated product produced in c.2T>C fibroblasts is reduced upon KDM5C-shRNA treatment, suggesting that it is specific. Lowest band, x, is non-specific. ACTIN, loading control.

Patient mutations reduce KDM5C protein levels

We first analyzed the effect of the genetic mutations on KDM5C RNA levels in patient and control fibroblasts using qRT-PCR (quantitative reverse transcription PCR; Fig. 1B). We analyzed both total RNA amplified using random hexamers (filled bars) and polyadenylated mRNA amplified using oligo dT primers (open bars). Patient fibroblasts harboring the p.D402Y, p.P480L and c.2T>C mutations expressed KDM5C. However, levels of total and polyadenylated KDM5C RNA were significantly reduced in fibroblasts with the p.P480L mutation, as were levels of total KDM5C RNA in fibroblasts with the p.D402Y mutation. KDM5C mRNA levels in c.3223delG fibroblasts were dramatically reduced, to ∼15% of those in control cell lines, consistent with the resultant mRNA being targeted for nonsense-mediated mRNA decay (NMD; Fig. 1B). KDM5C mRNA levels in the c.3223delG carrier female were slightly but significantly reduced compared with control skin fibroblasts.
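The figure legend says qRT-PCR values were normalized first to the housekeeping gene and then to Control-1. It does not name the calculation. A common way to apply that double normalization is the 2^-ΔΔCt (Livak) method, sketched below. This method is an assumption here, and so are the Ct numbers in the example. The method also assumes roughly 100% amplification efficiency for both primer pairs.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ΔΔCt (Livak) method.

    Each argument is a list of replicate Ct values. `ct_ref` is the housekeeping
    gene in the same sample; the `_cal` lists come from the calibrator line
    (e.g. Control-1). Assumes ~100% amplification efficiency for both primer pairs.
    """
    delta_sample = mean(ct_target) - mean(ct_ref)       # normalize to housekeeping gene
    delta_cal = mean(ct_target_cal) - mean(ct_ref_cal)  # same step for the calibrator line
    delta_delta = delta_sample - delta_cal              # then normalize to the calibrator
    return 2 ** (-delta_delta)


# Hypothetical Ct values: relative to the housekeeping gene, the target crosses the
# threshold 3 cycles later in the sample than in the calibrator, so 2^-3 = 0.125.
print(relative_expression([28.0], [18.0], [25.0], [18.0]))  # 0.125
```

On this scale, a level of about 15% of control corresponds to a ΔΔCt of roughly log2(1/0.15) ≈ 2.7 cycles.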

KDM5C is a member of the KDM5 demethylase family and KDM5C homologs might compensate for KDM5C loss-of-function. We therefore analyzed the mRNA expression level of KDM5 family members KDM5A, KDM5B and KDM5D (also known as RBP2, PLU-1 and SMCY, respectively), to see whether these were altered in patient fibroblasts with KDM5C mutations (Supplementary Material, Fig. S1A). Levels of KDM5A mRNA were comparable in all patient and control fibroblast lines. Levels of KDM5B RNA were slightly but significantly reduced in P480L-2, 2T>C-1, 3223delG-1 and 3223delG-HET. Y-linked SMCY was undetectable in the carrier female as expected, but present at similar levels in affected males and controls.

c.1204G>T and c.1439C>T are missense mutations predicted to give rise to full-length KDM5C protein with an amino acid substitution at a single residue: p.D402Y and p.P480L, respectively. As predicted, we detected KDM5C protein at the expected molecular weight (176 kDa) on western blots of whole cell lysates from fibroblasts with c.1204G>T and c.1439C>T mutations (Fig. 1C) using an antibody raised against the C-terminus of KDM5C (Fig. 1A). In contrast, c.3223delG is predicted to cause a frameshift mutation introducing a premature termination codon (PTC) at the position immediately after the first altered amino acid (p.V1075Yfs*2). This is consistent with the low levels of KDM5C RNA detected in 3223delG-1 fibroblasts (Fig. 1B) and we anticipated no protein production. We detected no KDM5C protein in c.3223delG fibroblasts (Fig. 1C), but the KDM5C antibody is raised to the portion of KDM5C beyond the PTC (Fig. 1A) and therefore would not detect a potentially truncated product. Interestingly, we detected KDM5C protein in fibroblasts with the c.2T>C mutation in the translation start codon of KDM5C (Fig. 1C). The c.2T>C KDM5C protein runs at a lower molecular weight than wild-type KDM5C, suggesting production of an in-frame, N-terminally truncated protein. Importantly, global levels of KDM5C protein are lower in all patient lines compared to controls (Fig. 1C).
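The frameshift notation p.V1075Yfs*2 encodes where translation stops. Val1075 becomes Tyr, and the new stop codon sits at position 2 when the first changed residue counts as position 1. The predicted product therefore ends at residue 1075. The sketch below parses one-letter HGVS frameshift descriptions and compares the result with the antibody epitope, which lies in the C-terminal 100 residues. The 1,560-residue length of canonical KDM5C (UniProt P41229) is an assumption supplied here, not a value stated in this section. The function and constant names are hypothetical.

```python
import re

# One-letter HGVS protein frameshift, e.g. "p.V1075Yfs*2": Val1075 is replaced by
# Tyr, the reading frame shifts, and the new stop codon is at position 2 when the
# first changed residue (Tyr1075) is counted as position 1.
FRAMESHIFT = re.compile(r"^p\.([A-Z])(\d+)([A-Z])fs\*(\d+)$")


def truncated_length(hgvs_protein: str) -> int:
    """Number of residues translated before the premature termination codon."""
    match = FRAMESHIFT.match(hgvs_protein)
    if match is None:
        raise ValueError(f"not a one-letter frameshift description: {hgvs_protein}")
    first_changed = int(match.group(2))
    stop_position = int(match.group(4))  # counts the first changed residue as 1
    # The stop is codon first_changed + stop_position - 1 and adds no residue.
    return first_changed + stop_position - 2


FULL_LENGTH = 1560                     # assumed canonical KDM5C length (UniProt P41229)
EPITOPE_START = FULL_LENGTH - 100 + 1  # antibody raised against the C-terminal 100 residues

length = truncated_length("p.V1075Yfs*2")
print(length, "residues; contains epitope:", length >= EPITOPE_START)
# 1075 residues; contains epitope: False
```

Any truncated c.3223delG product would therefore lack residues 1461 onward and would be invisible to this antibody. The parser handles only one-letter codes with a numeric stop position. Three-letter forms such as p.Val1075Tyrfs*2, or an unknown stop (fs*?), would need additional patterns.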

To verify that the lower molecular weight band seen in KDM5C western blots from c.2T>C fibroblasts is an alternative version of KDM5C, we treated control and c.2T>C fibroblasts with shRNAs targeting KDM5C. qRT-PCR analysis confirmed that two KDM5C shRNAs reduced the levels of KDM5C transcripts in control and c.2T>C patient cells compared with uninfected and control shRNA-infected (FF, firefly luciferase) samples (Fig. 1D). Western blot analysis showed that KDM5C shRNA constructs decrease the full-length KDM5C signal in control fibroblasts, and also decrease the lower molecular weight signal in c.2T>C fibroblasts (Fig. 1E), corroborating that the protein recognized by the KDM5C antibody in these patient cells is indeed truncated KDM5C.

c.2T>C patients express N-terminally truncated p.M1_E165del KDM5C protein

Translation initiation site prediction algorithms based on DNA sequence anticipate that when the c.2T>C mutation is present, KDM5C translation starts at the methionine (M) at position 166, suggesting that the truncated protein is p.M1_E165del. M166 is in frame with the canonical start codon, in keeping with recognition of the truncated form by an antibody raised against the C-terminus of KDM5C (Fig. 1A). p.M1_E165del is 20 kDa smaller than wild-type KDM5C (156 versus 176 kDa), which is consistent with the molecular weight difference observed on western blots (Fig. 1C), and lacks the highly conserved JmjN and ARID (AT-rich interactive domain) domains (Fig. 1A).
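Two things follow from the same codon arithmetic: that M166 is "in frame", and that the M166 construct is named c.A1_A495del. Codon n begins at c.3(n − 1) + 1, so M166 starts at c.496. Everything upstream is 495 nt, which is exactly 165 codons and matches the 165 residues removed in p.M1_E165del. A minimal check follows. The function name is hypothetical, and the full-length value of 1,560 residues is an assumption, not a figure from this section.

```python
def codon_start(codon_number: int) -> int:
    """First coding-DNA nucleotide (c.) of a codon, with c.1 = A of the ATG start codon."""
    return 3 * (codon_number - 1) + 1


start_m166 = codon_start(166)  # first base of the M166 codon
upstream_nt = start_m166 - 1   # c.1 to c.495, removed in the c.A1_A495del construct
print(start_m166, upstream_nt, upstream_nt % 3 == 0, upstream_nt // 3)
# 496 495 True 165

FULL_LENGTH = 1560             # assumed canonical KDM5C length in residues
print(FULL_LENGTH - upstream_nt // 3)  # residues in the M166-initiated protein: 1395
```

Because 495 is divisible by 3, initiation at M166 reads the rest of the coding sequence in the original frame. This is why the C-terminal epitope is retained.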

To test the hypothesis that c.2T>C patient cells produce p.M1_E165del KDM5C, we generated C-terminally tagged KDM5C expression constructs encoding either the full-length coding sequence with the c.2T>C mutation, or a truncated coding sequence beginning with the M166 codon (c.A1_A495del). Overexpression of these constructs in fibroblasts lacking endogenous KDM5C (c.3223delG patient fibroblasts, Fig. 2A) or in BJ control fibroblasts (Supplementary Material, Fig. S1B) followed by western blots showed that proteins encoded by c.2T>C and c.A1_A495del KDM5C run at a similar molecular weight, supporting our hypothesis. Data from overexpression of N-terminally tagged KDM5C with other patient mutations were consistent with results from patient cells: the missense mutations p.P480L and p.D402Y produce full-length protein (Fig. 2A; Supplementary Material, Fig. S1B).

Figure 2. p.M1_E165del KDM5C is produced in c.2T>C patient fibroblasts. (A) Overexpression of KDM5C-Flag-HA in fibroblasts lacking endogenous KDM5C shows that proteins expressed from a c.2T>C KDM5C vector and from a vector starting at p.M166 are of similar size. Overexpression of HA-Flag-KDM5C harboring patient mutations confirms that p.P480L and p.D402Y are missense mutations. H514A is a catalytic-null mutation (13). ACTIN, loading control. (B) Overexpression of untagged KDM5C demonstrates that KDM5C starting at M166 has a similar molecular weight to endogenous KDM5C in c.2T>C patient fibroblasts. Lowest band, x, is non-specific.

We then overexpressed either wild-type or c.A1_A495del KDM5C with no tags in KDM5C-null fibroblasts and compared the molecular weight of the resultant protein with endogenous KDM5C protein from control or c.2T>C patient fibroblasts (Fig. 2B). We observed that overexpressed wild-type KDM5C runs at a similar molecular weight to KDM5C in control fibroblasts, as expected. Overexpressed c.A1_A495del produces a KDM5C protein that runs at a similar molecular weight to the endogenous KDM5C in c.2T>C fibroblasts, suggesting that p.M1_E165del is produced in these patients.

p.M1_E165del KDM5C associates with chromatin

We first assessed whether the absence of the ARID DNA-binding domain (24) prevents p.M1_E165del KDM5C from associating with chromatin. To test this, we overexpressed KDM5C carrying patient mutations, or KDM5C lacking either the JmjN or the ARID domain, and fractionated cells into cytoplasm, nucleoplasm and chromatin before western blotting for KDM5C (Fig. 3). Loss of neither the JmjN nor the ARID domain prevents KDM5C from being detected in the chromatin fraction. The truncated p.M1_E165del KDM5C is also found in the chromatin fraction (Fig. 3). Fractionation of c.2T>C patient cell lines confirmed endogenous association with chromatin (Supplementary Material, Fig. S2).

Figure 3. Truncated c.2T>C KDM5C can associate with chromatin. Fractionation of KDM5C-null fibroblasts overexpressing KDM5C-Flag-HA demonstrates that KDM5C does not require the JmjN or ARID domains for chromatin association. TUBULIN (cytoplasm), P300 (nucleoplasm) and H3 (chromatin) loading controls. Right panel, fractionation controls.

Patient mutations in KDM5C reduce protein stability

Global levels of KDM5C protein are reduced in patient fibroblasts versus controls (Fig. 1C), while RNA levels are comparable (Fig. 1B), suggesting that patient mutations could affect KDM5C stability. To directly test this hypothesis, we carried out time-course analyses using the translation inhibitor cycloheximide on control and patient fibroblasts (Fig. 4). KDM5C protein is relatively stable in control fibroblasts, with a half-life of at least 12 h. Fibroblasts with the missense mutation p.D402Y have a KDM5C half-life of ∼4 h, while p.P480L KDM5C shows a more dramatic reduction to around 2 h. Fibroblasts with the c.2T>C mutation also show drastically reduced protein stability, with a half-life of ∼2 h (Fig. 4).
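This section does not state how half-lives were derived from the cycloheximide chases. One standard approach assumes first-order decay, L(t) = L0·e^(−kt), and fits ln(level) against time by least squares, giving t½ = ln 2 / k. The sketch below implements that fit. The input levels are assumed to be band intensities already normalized to the loading control and to t = 0. The example data are synthetic, not measurements from this study.

```python
import math


def half_life(times_h, levels):
    """Half-life (h) from a cycloheximide chase, assuming first-order decay.

    `levels` are band intensities already normalized to the loading control and
    to t = 0. Fits ln(level) = ln(L0) - k*t by least squares; t1/2 = ln 2 / k.
    """
    logs = [math.log(y) for y in levels]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # decay rate constant (1/h)
    if k <= 0:
        raise ValueError("no net decay over the chase; half-life not estimable")
    return math.log(2) / k


# Synthetic (not measured) chase: a protein that halves every 4 h.
times = [0, 2, 4, 6, 8]
levels = [0.5 ** (t / 4) for t in times]
print(round(half_life(times, levels), 2))  # 4.0
```

When little decay occurs within the chase window, as for control KDM5C, the fitted slope is small and noisy. Reporting a lower bound such as "at least 12 h" is then more defensible than a point estimate.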

Figure 4. Patient mutations destabilize KDM5C. Cycloheximide (CHX, 100 μg/ml) time-courses on control and patient fibroblasts demonstrate that KDM5C in patient fibroblasts with c.2T>C, p.P480L and p.D402Y mutations is less stable than KDM5C in control fibroblasts. TUBULIN, loading control.

Similarly, cycloheximide time-courses on KDM5C-null fibroblasts overexpressing KDM5C with different mutations show that the c.2T>C- and c.A1_A495del-encoded proteins have substantially shorter half-lives than wild-type KDM5C (Supplementary Material, Fig. S3). Interestingly, stability is also reduced for KDM5C lacking the JmjN domain, but not for KDM5C lacking the ARID domain, suggesting that the absence of the JmjN domain reduces the stability of c.2T>C patient KDM5C (Supplementary Material, Fig. S3). This is in keeping with the JmjN domain being important for stability in other JmjC-containing demethylases (25, 26).

Patient mutations in KDM5C reduce its histone demethylase activity

Some patient missense mutations, including p.D402Y, have been shown to compromise KDM5C demethylase activity (13, 14, 17), but the effect of the p.P480L mutation has not been investigated. Moreover, the truncated p.M1_E165del KDM5C produced in c.2T>C patients contains the JmjC catalytic domain but is missing other highly conserved domains, and its demethylase activity is not known.

We therefore asked whether patient mutations decrease KDM5C demethylase activity in vitro. We purified wild-type KDM5C, KDM5C carrying a catalytic-null mutation (p.H514A) (13), and KDM5C carrying patient mutations (p.D402Y, p.P480L or p.M1_E165del) from insect cells (Fig. 5A). We used these purified proteins in demethylase assays on histone peptides (Fig. 5B) or purified histones (Fig. 5C), and assessed the results by mass spectrometry or by western blotting with histone modification antibodies, respectively. Note that p.M1_E165del could not be expressed and purified to the same level as the other constructs (Fig. 5A), likely because of the inherent instability of this protein; this is a caveat when assessing its demethylase activity.

Figure 5. Missense and translation re-start mutations reduce the demethylase activity of KDM5C in vitro. (A) Coomassie and KDM5C blots of KDM5C purified from Sf9 insect cells and used in demethylase assays on peptides (B) or histones (C). Wild-type KDM5C demethylates H3K4me3 and H3K4me2; the H514A catalytic-null form has no activity. p.D402Y, p.P480L and p.M1_E165del mutations reduce KDM5C demethylase activity.

Wild-type KDM5C demethylates H3K4me2 peptides to H3K4me1, and H3K4me3 to H3K4me2 (Fig. 5B), as seen previously (13, 14). The catalytically dead p.H514A protein shows no detectable demethylase activity on H3K4me2 or H3K4me3 peptides. Patient mutations p.P480L and p.D402Y strongly reduce demethylase activity on both H3K4me2 and H3K4me3 peptides compared with wild-type KDM5C, although they retain residual activity relative to the catalytic-null mutant (Fig. 5B).
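In a mass spectrometry readout, each demethylation step shows up as a loss of one CH2 unit (14.01565 Da, monoisotopic) from the peptide. The sketch below assigns deconvoluted (neutral) mass shifts, measured relative to the unmodified peptide, to methyl states. It then reports the fraction of signal in each state. The peak list, the 0.01 Da tolerance and the function names are all hypothetical; this is not the analysis pipeline used in the study.

```python
CH2 = 14.01565  # monoisotopic mass (Da) added per methyl group (CH3 replaces H)


def methyl_state(mass_shift: float, tol: float = 0.01) -> int:
    """Number of methyl groups (0-3) implied by a neutral peptide mass shift in Da."""
    for n in range(4):
        if abs(mass_shift - n * CH2) <= tol:
            return n
    raise ValueError(f"shift of {mass_shift} Da matches no methyl state")


def state_fractions(peaks):
    """peaks: (mass_shift, intensity) pairs -> {methyl state: fraction of signal}."""
    totals = {}
    for shift, intensity in peaks:
        state = methyl_state(shift)
        totals[state] = totals.get(state, 0.0) + intensity
    grand_total = sum(totals.values())
    return {state: value / grand_total for state, value in sorted(totals.items())}


# Hypothetical readout for an H3K4me3 peptide after incubation with enzyme:
# 70% of the signal has lost one methyl group (me2) and 30% remains me3.
print(state_fractions([(42.047, 30.0), (28.031, 70.0)]))  # {2: 0.7, 3: 0.3}
```

The tight tolerance matters in practice. Trimethylation (+42.047 Da) and acetylation (+42.011 Da) differ by only about 0.036 Da, so an acetyl shift is rejected here rather than miscounted as me3.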

In addition to demethylase assays using histone peptides, we tested whether patient KDM5C mutations alter demethylase activity on purified histones, which are more physiologically relevant substrates. As previously reported, wild-type KDM5C demethylates H3K4me2 and H3K4me3 on purified histones, but it causes no substantial change in the levels of H3K4me1, H3K9me3, H3K27me3, H3K36me2 or H3K36me3 (Fig. 5C). Catalytic-null p.H514A KDM5C does not demethylate H3K4me2/3 on purified histones, and neither does p.M1_E165del. p.D402Y and p.P480L KDM5C show limited demethylase activity on H3K4me2, whereas their demethylation of H3K4me3 is comparable to that of wild-type KDM5C. The differences between mutant KDM5C activity on histones and on peptides may be due to additional modifications on purified histones that crosstalk with H3K4 methylation.

To complement these in vitro demethylase assays, we overexpressed KDM5C with N-terminal (Fig. 6A) or C-terminal (Supplementary Material, Fig. S4) tags in control fibroblasts and assessed H3K4me3 at the single-cell level using immunofluorescence microscopy. In cells expressing high levels of wild-type KDM5C, H3K4me3 staining was markedly reduced (arrows, Fig. 6A; Supplementary Material, Fig. S4). This was not the case in cells expressing high levels of the p.H514A catalytic mutant, where H3K4me3 levels were comparable across the cell population regardless of KDM5C level (Fig. 6A; Supplementary Material, Fig. S4). High levels of p.D402Y induced a reduction in H3K4me3, whereas H3K4me3 staining remained high in cells overexpressing the p.P480L, p.M1_E165del (Fig. 6A) and c.2T>C (Supplementary Material, Fig. S4) mutants, suggesting that these mutations reduce KDM5C activity in vivo. The JmjN and ARID domains are both required for demethylase activity (Supplementary Material, Fig. S4).
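The immunofluorescence comparison above is made by inspecting individual cells. One way to quantify it, offered as an assumption rather than the authors' method, is to rank nuclei by tagged-KDM5C (Flag) signal. The median H3K4me3 signal of the top fraction can then be compared with that of the remaining nuclei. The 10% cut-off, the function name and the example intensities are hypothetical.

```python
from statistics import median


def high_expresser_ratio(cells, high_fraction=0.1):
    """cells: (flag_intensity, h3k4me3_intensity) per nucleus.

    Ranks nuclei by Flag (tagged KDM5C) signal and returns the median H3K4me3 of
    the top `high_fraction` divided by the median H3K4me3 of the remaining nuclei.
    A ratio well below 1 suggests the overexpressed enzyme is demethylating H3K4me3.
    """
    if len(cells) < 2:
        raise ValueError("need at least two nuclei")
    ranked = sorted(cells, key=lambda cell: cell[0], reverse=True)
    n_high = min(len(ranked) - 1, max(1, int(len(ranked) * high_fraction)))
    high = [h3 for _, h3 in ranked[:n_high]]
    rest = [h3 for _, h3 in ranked[n_high:]]
    return median(high) / median(rest)


# Hypothetical nuclei: one strong expresser among nine low expressers.
active = [(100.0, 20.0)] + [(1.0, 100.0)] * 9     # H3K4me3 lost where KDM5C is high
inactive = [(100.0, 100.0)] + [(1.0, 100.0)] * 9  # e.g. a catalytic-null construct
print(high_expresser_ratio(active), high_expresser_ratio(inactive))  # 0.2 1.0
```

Using medians rather than means limits the influence of a few unusually bright or dim nuclei. In real images, a ratio computed this way would still need comparison across constructs and replicates.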

Figure 6. p.P480L and p.M1_E165del KDM5C have compromised demethylase activity ex vivo, and patient fibroblasts demonstrate local gene expression changes. (A) HA-Flag-KDM5C harboring different patient mutations was overexpressed in Control-BJ fibroblasts and the levels of KDM5C (Flag) and H3K4me3 were assessed by immunofluorescence. Cells with high WT or p.D402Y KDM5C showed reduced H3K4me3 staining, demonstrating demethylase activity. Overexpression of p.H514A, p.P480L and p.M1_E165del KDM5C did not reduce H3K4me3 levels. Arrows, cells expressing tagged KDM5C at high levels. (B) Acid histone extracts from patient skin fibroblasts show no consistent change in global H3K4 methylation. H3, loading control. (C) Changes in gene expression in patient fibroblasts compared with controls. Mean and standard deviation of three independent experiments are displayed (*P < 0.05 compared with Control-1, Student's t-test). RNA was detected using primers spanning exon–exon junctions, and normalized to housekeeping gene RSP18 and then Control-1.

Patient fibroblasts show localized changes in gene expression, but not global changes in histone methylation

Total cellular histone methylation levels are determined by multiple different methyltransferases and demethylases. To assess the effect of reduced KDM5C activity in patients (Figs 5 and 6A) on global histone methylation in vivo, we extracted histones from patient and control fibroblasts. Western blots showed no consistent change in H3K4 methylation in patient fibroblasts compared with controls, suggesting that KDM5C mutations do not determine the global levels of H3K4 methylation (Fig. 6B).

Altered gene expression patterns in XLID do not necessarily require a global change in histone methylation, but can result from localized changes at specific loci. This is consistent with reports of increased expression (27), or reduced DNA methylation (22), at certain genes in lymphoblastoid cell lines and blood from patients with KDM5C mutations compared with controls. We tested these previously identified genes for expression changes in patient fibroblasts compared with controls using qRT-PCR (Fig. 6C). Three genes were significantly up-regulated in patient fibroblasts compared with controls: EMILIN2 (elastin microfibril interfacer 2), TNFSF4 [tumor necrosis factor (ligand) superfamily member 4] and ZMYND12 (zinc-finger MYND-type containing 12). These findings support the idea of XLID-associated local changes in H3K4 methylation patterns and gene expression in patient cells.



Sasin JM, Godzik A, Bujnicki JM: SURF'S UP! - protein classification by surface comparisons. J Biosci. 2007, 32: 97-100. 10.1007/s12038-007-0009-0.

Porter CT, Bartlett GJ, Thornton JM: The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 2004, 32: D129-D133. 10.1093/nar/gkh028.

Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Res. 2004, 32: W549-W554. 10.1093/nar/gkh439.

Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSite: a database of the 3D structure of protein functional sites. Nucleic Acids Res. 2005, 33: D183-D187. 10.1093/nar/gki105.

George RA, Spriggs RV, Bartlett GJ, Gutteridge A, MacArthur MW, Porter CT, Al-Lazikani B, Thornton JM, Swindells MB: Effective function annotation through catalytic residue conservation. Proc Natl Acad Sci USA. 2005, 102: 12299-12304. 10.1073/pnas.0504833102.

Laskowski RA, Watson JD, Thornton JM: ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 2005, 33: W89-W93. 10.1093/nar/gki414.

Stark A, Russell RB: Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res. 2003, 31: 3341-3344. 10.1093/nar/gkg506.

Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol. 1996, 257: 342-358. 10.1006/jmbi.1996.0167.

Kristensen DM, Ward RM, Lisewski AM, Erdin S, Chen BY, Fofanov VY, Kimmel M, Kavraki LE, Lichtarge O: Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics. 2008, 9: 17-10.1186/1471-2105-9-17.

Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O: De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS ONE. 2008, 3: e2136-10.1371/journal.pone.0002136.

Herrgard S, Cammer SA, Hoffman BT, Knutson S, Gallina M, Speir JA, Fetrow JS, Baxter SM: Prediction of deleterious functional effects of amino acid mutations using a library of structure-based function descriptors. Proteins. 2003, 53: 806-816. 10.1002/prot.10458.

Pal D, Eisenberg D: Inference of protein function from protein structure. Structure. 2005, 13: 121-130. 10.1016/j.str.2004.10.015.

Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32: D449-D451. 10.1093/nar/gkh086.

Friedberg I, Harder T, Godzik A: JAFA: a protein function annotation meta-server. Nucleic Acids Res. 2006, 34: W379-W381. 10.1093/nar/gkl045.

von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750.

Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Mark P, Stümpflen V, Mewes HW, Ruepp A, Frishman D: The MIPS mammalian protein-protein interaction database. Bioinformatics. 2005, 21: 832-834. 10.1093/bioinformatics/bti115.

Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks?. Genome Biol. 2006, 7: 120-10.1186/gb-2006-7-11-120.

Aloy P, Russell RB: Ten thousand interactions for the molecular biologist. Nat Biotechnol. 2004, 22: 1317-1321. 10.1038/nbt1018.

Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 2005, 21: 410-412. 10.1093/bioinformatics/bti011.

Stein A, Russell RB, Aloy P: 3did: interacting protein domains of known three-dimensional structure. Nucleic Acids Res. 2005, 33: D413-D417. 10.1093/nar/gki037.

Riley R, Lee C, Sabatti C, Eisenberg D: Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 2005, 6: R89-10.1186/gb-2005-6-10-r89.

Jothi R, Cherukuri PF, Tasneem A, Przytycka TM: Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol. 2006, 362: 861-875. 10.1016/j.jmb.2006.07.072.

Pagel P, Wong P, Frishman D: A domain interaction map based on phylogenetic profiling. J Mol Biol. 2004, 344: 1331-1346. 10.1016/j.jmb.2004.10.019.

Pagel P, Oesterheld M, Tovstukhina O, Strack N, Stumpflen V, Frishman D: DIMA 2.0--predicted and known domain interactions. Nucleic Acids Res. 2008, D651-D655. 36 Database

Raghavachari B, Tasneem A, Przytycka TM, Jothi R: DOMINE: a database of protein domain interactions. Nucleic Acids Res. 2008, D656-D661. 36 Database

Moult J, Pedersen JT, Judson R, Fidelis K: A large-scale experiment to assess protein structure prediction methods. Proteins. 1995, 23: ii-v. 10.1002/prot.340230303.

Soro S, Tramontano A: The prediction of protein function at CASP6. Proteins. 2005, 61 (Suppl 7): 201-213. 10.1002/prot.20738.

Pellegrini-Calace M, Soro S, Tramontano A: Revisiting the prediction of protein function at CASP6. FEBS J. 2006, 273: 2977-2983. 10.1111/j.1742-4658.2006.05309.x.


6 Definitions

The following definitions should aid the user in understanding what is being named, and in understanding the principles underlying these guidelines.

6.1 Gene

A gene is a functional unit, usually encoding a protein or RNA, whose inheritance can be followed experimentally. Inheritance is usually assayed in genetic crosses, but identifying the gene on cytogenetic or physical maps is another means of mapping its locus. The existence of a gene can also be inferred in the absence of any genetic or physical map information, for example from a cDNA sequence.

6.2 Pseudogene

A pseudogene is a sequence, at another locus within the genome, that closely resembles a known functional gene but is non-functional as a consequence of (usually several) mutations that prevent its transcription, its translation, or both. In general, pseudogenes arise either from reverse transcription of a transcript of their "normal" paralog (in which case the pseudogene typically lacks introns and includes a poly(A) tail; such pseudogenes are often called processed pseudogenes) or from recombination (in which case the pseudogene is typically a tandem duplication of its "normal" paralog). Note that in mouse and rat, a gene may be a pseudogene in one inbred strain but not in another.

6.3 Locus

A locus is a point in the genome, identified by a marker, that can be mapped by some means. It does not necessarily correspond to a gene; it could, for example, be an anonymous non-coding DNA segment or a cytogenetic feature. A single gene may contain several loci (each defined by a different marker), and these markers may be separated in genetic or physical mapping experiments. In such cases it is useful to define the different loci, but the gene name should normally be used to designate the gene itself, as it usually conveys the most information.

6.4 Marker

A marker is the means by which a gene or a locus is identified. A marker depends on an assay, which could, for example, detect a mutant phenotype or the presence of an enzyme activity, protein band, or DNA fragment. The assay must reveal genetic variation of the marker to map the locus on a genetic map, but this is not required to place the locus on a physical map.

6.5 Allele

The two copies of an autosomal gene or locus, one on the maternal and one on the paternal chromosome, are alleles. If the two alleles are identical, the animal is homozygous at that locus. When genetically inherited variants of a gene or locus are detectable by any means, the different alleles enable genetic mapping. A single chromosome can carry only a single allele and, except in cases of duplication, deletion, or trisomy, an animal carries two autosomal alleles. In particular, a transgene inserted randomly in the genome is not an allele of the endogenous locus; the condition is termed hemizygous if the transgene is present in only one of the two parental chromosome sets. By contrast, a gene modified by targeting at the endogenous locus is an allele and should be named as such.

6.6 Allelic Variant

Allelic variants are differences between alleles, detectable by any assay. For example, differences in anonymous DNA sequences can be detected as simple sequence length polymorphisms (SSLPs) or single nucleotide polymorphisms (SNPs). Other types of variants include differences in protein molecular weight or charge, differences in enzyme activity, or differences in single-strand conformation (single-strand conformation polymorphism, SSCP). Many allelic variants, in particular DNA variants, do not confer any external phenotype on the animal. These variants are often termed “polymorphisms”, although, strictly speaking, that term applies only to variants with a frequency of more than 1% in the population.
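Two of the DNA-level variant types named above, SNPs and SSLPs, can be illustrated with a minimal Python sketch. It assumes the two allele sequences are already aligned for the SNP case (equal length, no insertions or deletions) and that the SSLP is a simple dinucleotide repeat. The sequences and the function names `find_snps` and `repeat_count` are hypothetical; in practice these variants are detected by an assay such as sequencing or PCR fragment sizing, not by comparing strings.

```python
def find_snps(allele_a: str, allele_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in A, base in B) for every single-base difference.

    Assumes the two sequences are pre-aligned and of equal length (no indels).
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("this sketch assumes equal-length, aligned sequences")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


def repeat_count(seq: str, unit: str) -> int:
    """Length, in repeat units, of the longest uninterrupted tandem run of `unit`."""
    best = current = 0
    i = 0
    while i <= len(seq) - len(unit):
        if seq[i:i + len(unit)] == unit:
            current += 1              # extend the current tandem run
            best = max(best, current)
            i += len(unit)
        else:
            current = 0               # run broken; keep scanning one base at a time
            i += 1
    return best


# Hypothetical alleles: one pair differs at a single base (SNP),
# the other pair differs in the number of CA repeats (SSLP).
print(find_snps("GATTACAGGT", "GATTGCAGGT"))  # [(5, 'A', 'G')]
print(repeat_count("TTCACACACAGG", "CA"), repeat_count("TTCACACACACACAGG", "CA"))  # 4 6
```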

6.7 Splice Variant or Alternative Splice

Alternative splicing of a gene results in different, normally occurring forms of mRNA, defined by which exons (or parts of exons) are used. Thus one or more alternative protein products can be produced by a single allele of a gene. Alternative splice forms may or may not differ among alleles, depending on whether the sequence difference between the alleles affects the normal splicing mechanism and changes exon (or partial exon) usage. For example, allele A may produce mRNAs of splice forms 1, 2, and 3, allele B mRNAs of splice forms 1, 2, and 4, and allele C mRNAs of splice forms 1, 2, and 3. Because A, B, and C are distinct alleles, each must by definition differ in DNA sequence. However, the difference between allele B and alleles A and C must include a sequence difference that affects the splicing pattern of the gene.
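The example of alleles A, B, and C can be made concrete with a minimal Python sketch in which each splice form is simply an ordered list of exon numbers joined into a mature mRNA. The exon sequences, the splice-form definitions, and the helper `splice` are hypothetical, and partial-exon usage (alternative splice sites within an exon) is ignored for brevity.

```python
# Hypothetical exon sequences for one gene (not real data); lengths are multiples
# of 3 so every splice form keeps the reading frame.
EXONS = {1: "AUGGCC", 2: "GAUUCA", 3: "CCGUUU", 4: "AAAGGG", 5: "UGA"}

# Hypothetical splice forms: which exons each mature mRNA joins, in order.
SPLICE_FORMS = {
    1: [1, 2, 3, 5],
    2: [1, 3, 5],
    3: [1, 2, 4, 5],
    4: [1, 4, 5],
}

# Splice forms produced by each allele, mirroring the example in the text.
ALLELE_FORMS = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {1, 2, 3}}


def splice(form: int) -> str:
    """Join the exons used by one splice form into a mature mRNA sequence."""
    return "".join(EXONS[e] for e in SPLICE_FORMS[form])


for allele, forms in sorted(ALLELE_FORMS.items()):
    print(allele, [splice(f) for f in sorted(forms)])

# A and C share a splice pattern; B differs (form 4 instead of form 3).
print(ALLELE_FORMS["A"] == ALLELE_FORMS["C"], ALLELE_FORMS["A"] == ALLELE_FORMS["B"])  # True False
```

Comparing splice-form sets, rather than sequences, reflects the point of the definition: A and C may differ in sequence without that difference touching splicing, whereas B's difference must.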

6.8 Mutation

A mutation is a particular class of variant allele that usually confers a phenotypically identifiable difference to a reference "wild type" phenotype. However, in some cases, such as when homologous recombination is used to target a gene, a readily identified phenotype may not result even though the gene may be rendered non-functional. In such cases, the targeted genes are nevertheless referred to as mutant alleles.

6.9 Dominant and Recessive

Dominant and recessive refer to the nature of inheritance of phenotypes, not to genes, alleles, or mutations. A recessive phenotype is detected only when both alleles carry a particular variant or mutation. A dominant phenotype is detectable when only one variant allele is present. If both alleles can be detected simultaneously by an assay, they are codominant. For example, an assay that detects variation in DNA or protein will almost invariably detect codominant inheritance, as both alleles are detected. If a mutation produces a phenotype in the heterozygote that is intermediate between those of the homozygous normal and homozygous mutant animals, the phenotype is referred to as semidominant. A single mutation may confer both a dominant and a recessive phenotype. For example, the mouse patch (Ph) mutation has a heterozygous (dominant) pigmentation phenotype but also a homozygous (recessive) lethal phenotype. Because the terms apply to phenotypes rather than to genes or alleles, when a gene has multiple mutant alleles, each allele can confer a phenotype that is dominant to some, but recessive to other, phenotypes conferred by other alleles.
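Because these terms describe phenotypes, they can be sketched as a comparison of the heterozygote's phenotype with those of the two homozygotes. The minimal Python sketch below assumes a single numeric, noise-free phenotype value per genotype class; the function name `classify_inheritance` and the example values are hypothetical. Codominance is left out because it depends on whether an assay detects both alleles, not on a phenotype value.

```python
def classify_inheritance(wild_type: float, heterozygote: float, mutant: float) -> str:
    """Classify how a mutant phenotype is inherited, from one value per genotype.

    wild_type:    phenotype of the homozygous normal animal
    heterozygote: phenotype of the heterozygote
    mutant:       phenotype of the homozygous mutant animal
    Assumes noise-free values; real data would need replicates and a statistical test.
    """
    if wild_type == mutant:
        return "no difference"    # the variant has no detectable phenotype
    if heterozygote == mutant:
        return "dominant"         # one mutant allele is enough
    if heterozygote == wild_type:
        return "recessive"        # seen only when both alleles are mutant
    low, high = sorted((wild_type, mutant))
    if low < heterozygote < high:
        return "semidominant"     # heterozygote is intermediate
    return "other"                # heterozygote lies outside the homozygote range


# Hypothetical values. Viability in the patch (Ph) example: heterozygotes survive,
# homozygotes do not, so lethality is a recessive phenotype.
print(classify_inheritance(1.0, 1.0, 0.0))  # recessive
print(classify_inheritance(0.0, 1.0, 1.0))  # dominant
print(classify_inheritance(0.0, 0.5, 1.0))  # semidominant
```

Since the function takes phenotype values, the same mutation can be classified differently for different traits, as in the Ph example.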

Penetrance is a quantitative measure of how often the phenotype occurs in a population and expressivity is a measure of how strongly a phenotype is expressed in an individual. Particularly in segregating crosses, or where there is a threshold effect on phenotypic manifestation, these measures provide additional ways to describe how particular allelic combinations result in a phenotype.

6.10 Genotype

Genotype is the description of the genetic composition of an animal, usually in terms of particular alleles at particular loci. It may refer to a single gene or locus or to many. Genotype can be determined only by assaying phenotype, including test mating to reveal carriers of recessive mutations. Strictly speaking, even direct determination of DNA variants assays phenotype, not genotype, as it depends on a particular assay, although it is so close to genotype that it serves as a surrogate.

6.11 Phenotype

Phenotype is the result of interaction between genotype and the environment and can be determined by any assay.

6.12 Quantitative Trait Loci (QTLs)

Quantitative trait loci (QTLs) are polymorphic loci containing alleles that differentially affect the expression of continuously distributed phenotypic traits. Usually they are markers identified by statistical association with quantitative variation in particular phenotypic traits that are controlled by the cumulative action of alleles at multiple loci.
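The statistical association mentioned above can be illustrated with the simplest single-marker approach: compare trait means across marker genotypes and fit an ordinary least-squares slope of the trait on the number of copies of one marker allele (0, 1, or 2). The Python sketch below uses hypothetical data and function names; real QTL analyses also test significance and scan many markers, none of which is shown.

```python
from statistics import mean

# Hypothetical data: copies of one marker allele (0, 1 or 2) per animal, and the
# quantitative trait measured in the same animals (arbitrary units).
dosages = [0, 0, 1, 1, 1, 2, 2]
traits = [10.1, 9.9, 11.0, 11.2, 10.8, 12.0, 12.0]


def means_by_genotype(dosages: list[int], traits: list[float]) -> dict[int, float]:
    """Mean trait value in each marker genotype class."""
    groups: dict[int, list[float]] = {}
    for d, t in zip(dosages, traits):
        groups.setdefault(d, []).append(t)
    return {d: mean(values) for d, values in sorted(groups.items())}


def additive_effect(dosages: list[int], traits: list[float]) -> float:
    """Least-squares slope of trait on allele dosage: the estimated change in the
    trait per additional copy of the marker allele."""
    x_bar, y_bar = mean(dosages), mean(traits)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, traits))
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("the marker is not polymorphic in this sample")
    return sxy / sxx


print(means_by_genotype(dosages, traits))
print(round(additive_effect(dosages, traits), 3))  # 1.0
```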

6.13 Haplotype

A haplotype is the association of genetically linked alleles, as found in a gamete. It may be a combination of any type of markers and may extend over large, genetically separable distances, or lie within a short distance, such as within a gene, where the alleles are not normally separated.

6.14 Homolog

Genes are homologous if they recognizably have evolved from a common ancestor. Note that genes are either homologous or not: there are no degrees of homology. For example, all globin genes, and myoglobin, are homologs, even though some are more closely related to each other than others. When a measure of relatedness between sequences is required, percent identity or similarity should be used.

6.15 Ortholog

Genes in different species are orthologs if they have evolved from a single common ancestral gene. For example, the beta globin genes of mouse, rat and human are orthologs. Note that several genes in the mouse or rat may have a single ortholog in another species and vice versa.

6.16 Paralog

Paralogous genes are genes within the same species that have arisen from a common ancestor by duplication and subsequent divergence. For example, the mouse alpha globin and beta globin genes are paralogs.


Structure of the APOB Gene/APOB mRNA Editing

In adult humans, apoB-100 (4536 amino acids; 100%) is produced in the liver and apoB-48 (2152 amino acids; 48%) in the intestine, via a unique mRNA editing process (1). The 43-kb APOB gene is located on the short arm of human chromosome 2. It consists of 29 exons, of which exon 26, at 7572 bp, is the largest and encodes more than half of the full-length protein. A novel posttranscriptional editing process acting on the 14.5-kb mRNA leads to apoB-48 production. In brief, an RNA-specific cytidine deaminase, apoB mRNA editing enzyme catalytic polypeptide 1 (apobec-1), binds to the mRNA and deaminates the cytosine at base 6666 to uracil, converting the glutamine 2153 codon (CAA) into a termination codon (UAA) (2). Translation therefore stops after residue 2152, yielding the shorter apoB-48. Apobec-1 is produced only in intestinal epithelial cells in adult humans. It is the editing process that determines which apoB isoform is synthesized, which in turn dictates the type of lipoprotein, either chylomicron or VLDL particle, to be assembled.
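The effect of a single C-to-U edit can be illustrated with a toy translation in Python. The sequence below is a short hypothetical open reading frame, not the APOB mRNA; it only shows how deaminating the C of a CAA (glutamine) codon produces UAA, so translation stops there and the product is shorter, analogous to the edit at codon 2153 that yields the 2152-residue apoB-48. The names `translate`, `edit_c_to_u`, and `CODON_TABLE` are assumptions for illustration.

```python
from itertools import product

# Standard genetic code; codons enumerated with U, C, A, G at each position
# (first base varying slowest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def edit_c_to_u(mrna: str, position: int) -> str:
    """Deaminate the cytidine at a 1-based position to uridine (C -> U editing)."""
    if mrna[position - 1] != "C":
        raise ValueError(f"position {position} is not a C")
    return mrna[:position - 1] + "U" + mrna[position:]


# Hypothetical ORF: AUG GCU CAA GGU UGG UAA  (Met Ala Gln Gly Trp stop)
unedited = "AUGGCUCAAGGUUGGUAA"
edited = edit_c_to_u(unedited, 7)   # CAA at codon 3 becomes UAA

print(translate(unedited))  # MAQGW
print(translate(edited))    # MA
```

As with apoB-48, the edited message yields a protein identical to the full-length one up to the edited codon and simply truncated there.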



Figure 1. Schematic representations of the site-scanning saturation mutagenesis (SSSM), multi-site-directed mutagenesis (MSDM), and PCR-based accurate synthesis (PAS) gene synthesis workflows. (A, left to right) SSSM workflow: the user’s input includes the plasmid sequence (blue) with the gene of interest, a list of amino acid positions to be changed (red squares; only 3 sites are labeled for simplicity), forward and reverse flanking primers (Ffwd and Frev, purple arrows), a degenerate codon (marked as “X”), and a list of desired parameters (Tm, GC content, flanking primers, etc.; not shown). Mutation Maker outputs pairs of forward and reverse overlapping mutagenic primers (yellow) for multiple parallel PCR reactions in multiple 96-well plates (green arrow). To create the N-terminal and C-terminal gene fragments, PCR reactions are first conducted separately and in parallel for each site (top and bottom rows, respectively) using mutagenic (yellow with red squares) and flanking (purple) primer pairs and a gene template (blue). The overlapping N- and C-terminal fragments are then stitched together with the flanking primers (purple) in a second PCR reaction (SOEing, splicing by overlap extension) that produces full-length gene variants. These are subsequently pooled to create a gene library in which each variant carries a single random amino acid substitution at a specified position. The library is then cloned into an expression vector (blue circles). (B, left to right) MSDM workflow: user input includes the gene of interest, a list of target residues and mutations (red squares; only 3 mutations are shown for simplicity), and a list of parameters (Tm, GC, host organism, degeneracy, etc.; not shown). Mutation Maker outputs a set of unidirectional mutagenic primers (yellow with red squares), as well as primers carrying parental codons (light yellow squares on primers), that cover one or more target sites. The resulting primers can be used in MSDM experiments with the QuikChange Lightning Multi Site-Directed Mutagenesis Kit (Agilent) to yield a diverse combinatorial library (blue circles). (C, left to right) De novo gene synthesis workflow using the PCR-based accurate synthesis (PAS) protocol. The input to Mutation Maker consists of the gene of interest (blue line) with a list of target residues and mutations (red squares, as indicated), as well as a list of constraints (Tm, GC, host organism, motifs to exclude, degeneracy, etc.; not shown). Mutation Maker outputs a set of overlapping oligos (yellow) that may carry mutation and parental codon combinations (red and light yellow squares, respectively) in their non-overlapping portions. These oligos can be assembled into full-length gene variants (blue lines), each carrying a different combination of mutations, which can later be cloned into an expression vector (blue circles).
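The caption marks the degenerate codon only as "X". As a hedged illustration of what such a codon encodes, the Python sketch below expands an IUPAC degenerate codon into its concrete codons and the amino acids they specify. NNK is used as the example because it is a common choice for saturation mutagenesis; it is an assumption here, not necessarily the codon used in the workflow shown, and the helper names `expand_codon` and `encoded_residues` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes (DNA) and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; codons enumerated with T, C, A, G at each position
# (first base varying slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand_codon(degenerate: str) -> list[str]:
    """List every concrete codon matched by a 3-letter IUPAC degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate.upper()))]


def encoded_residues(degenerate: str) -> set[str]:
    """Amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand_codon(degenerate)}


codons = expand_codon("NNK")
residues = encoded_residues("NNK")
print(len(codons))                              # 32
print(len(residues - {"*"}), "*" in residues)   # 20 True
```

For NNK, the 32 codons cover all 20 amino acids plus a single stop codon (TAG), which is why such codons are attractive when each variant should carry a random substitution at one position.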