“Most startup companies fail. This is not due to a mistake in the valuation of the founding concept or due to the competency of the workers hired to turn that concept into reality. Rather it is due to a failure in management.” So said the founder of a startup company to my wife, who worked there for a time. When the venture capital people put in bad management, he left, as did my wife eventually.
If we think of ourselves as startup companies where our components (our workers) are the genes with which we are born, the numbers are vastly different but the forces at play are the same. By and large, most of us are successful. We live long lives compared to our ancestors. We get more than enough to eat, we pass on our genes for better or for worse, and we get to play far more than was ever possible. When we fail, it is sometimes due to a failure of a component part (think cystic fibrosis for example) but far more often it is a failure of management.
Now I bet you are thinking I am about to go on a rant about lifestyle. I could, I suppose, but actually the motivation for today’s post is a recent paper in the Proceedings of the National Academy of Sciences (PNAS) about genome-wide association studies looking for sequences associated with type 2 diabetes. Notice I didn’t use the word “gene” in the last sentence. This was deliberate. In actuality, this research examined sequences that were non-coding. They exist in that mysterious 99% of our genome that does not code for proteins. Does it have a function and if so what? We believe that much of it does have a function and that function is, in a word, management.
The research I am about to describe involved a genomic entity called a SNP and this needs some explanation. SNP stands for single nucleotide polymorphism and essentially it is a 1 base pair change in the sequence of DNA. Another more common word for this is mutation. From the early 1900s onward, biomedical researchers have been convinced that genetics would be the key to unraveling human disease. Indeed the success of genetics has been astounding. Initial work involved things that segregated with a simple Mendelian pattern (i.e. traits that involved a single gene). Diseases like cystic fibrosis and others were mapped and the dysfunctional genes discovered in this fashion. However, many diseases (such as type 1 and type 2 diabetes) were not so easy to analyze.
Let’s go over some nomenclature so as not to get confused. For all chromosomes except the X and Y chromosomes there are 2 copies in all normal people. These chromosomes (very long DNA molecules) encode around 20,000 genes, which are blueprints for proteins. Each gene can have many variants. These arose by mutation and account for the vast differences in a biological population. We call each of these variants an allele. So, each of us can have up to 2 different alleles of any given gene, and sometimes we have 2 copies of the same allele for some gene or other. We refer to a location on the genome as a locus and say that a person is heterozygous at such and such locus, meaning that they have 2 different alleles there. Conversely, if they have 2 copies of the same allele we would say that they are homozygous at that locus. Since families obviously share a smaller pool of genes than the species as a whole, we might find it useful to examine the alleles in a family that is prone to some disease in order to better understand how that disease may be affected by those genes. As I mentioned above, many human diseases appear to involve more than one gene. We can begin to get a handle on this by analyzing how often two alleles of separate genes might be found together in an individual that has the disease being studied. If we find that the 2 alleles are found together far more often than would be the case due to random chance, we say that the 2 alleles are in linkage disequilibrium. In other words, when these 2 particular alleles get together (sort of like two otherwise good teenagers) bad things tend to happen. Geneticists nowadays take blood samples from a family and then analyze the genome of each family member for markers that span the genome. These markers tell the researcher what alleles are present. It took thousands of person-years of work (spread out over hundreds of genetics labs) to establish markers that are useful for this enterprise.
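For readers who like to see the arithmetic, here is a minimal sketch of how "found together more often than chance" can be turned into a number. It uses the standard textbook measure D (the observed frequency of the pair minus the frequency expected if the two alleles were independent) and its normalized form r². Neither statistic is taken from the paper discussed here, and the function name, the two-letter haplotype strings, and the sample data are all made up for illustration.

```python
def linkage_disequilibrium(haplotypes, allele_1="A", allele_2="B"):
    """Return (D, r_squared) for two alleles at two loci.

    haplotypes: list of two-character strings, one per chromosome copy,
    e.g. "AB" means allele A at the first locus and allele B at the second.
    (This encoding is an assumption made for this sketch.)
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Frequency of each allele on its own, and of the two together.
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n
    p12 = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n

    # D is zero when the alleles pair up exactly as often as chance predicts.
    d = p12 - p1 * p2

    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus has only one allele")
    return d, d * d / denom


# Hypothetical sample: A and B travel together in 4 of 10 chromosomes.
sample = ["AB"] * 4 + ["Ab"] + ["aB"] + ["ab"] * 4
d, r2 = linkage_disequilibrium(sample)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")  # D = 0.15, r^2 = 0.36
```

If A and B were independent, each at 50% frequency, we would expect to see them together on 25% of chromosomes. In this invented sample they appear together on 40%, so D comes out positive.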
As scientists began to analyze the genome at the sequence level (this was long before we had the entire genome sequenced) it was noticed that there were regions that had lots of repetitive sequence. The repeated unit is usually from 2 to 6 base pairs in length. It can be repeated tens and sometimes hundreds of times. Interestingly, the number of repeats can vary from person to person. We call these repeats microsatellite markers or sometimes simple sequence repeats (SSRs). We have no idea as to their function. When children were examined, it was found that one could identify the regions of DNA that were inherited from the mother and the father based on the microsatellite lengths found. These markers were used extensively but they had a problem: they were spaced too far apart. The minimum distance between 2 of these markers was a million base pairs. Several genes could be present within the space between 2 of these markers, so the resolution of inheritance using this technique was rather low. By the late twentieth century, a new kind of marker had become predominant: the SNP. SNPs are found on the order of 1 every 100 to 300 base pairs. Thus each gene contains numerous SNPs. I just did a search of the National Center for Biotechnology Information (NCBI) SNP database and found over 13 million entries within the human genome. More are being submitted each day. That is a LOT of data to crunch. With the rise in the use of computers to store and analyze huge amounts of data we can now begin to look for patterns in SNPs among populations who are suffering from a particular disease, such as type 2 diabetes. Furthermore, several companies have developed technologies to allow the simultaneous measurement of thousands of SNPs.
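To give a feel for what "looking for patterns in SNPs" means in practice, here is a minimal sketch of the simplest comparison one can make: count how often one version of a SNP shows up in people with the disease versus people without it, and express the difference as an odds ratio. This is a generic illustration, not the method used in the studies discussed here. The function names, the two-letter genotype strings, and the sample data are all hypothetical.

```python
def allele_counts(genotypes, risk_allele):
    """Count risk and non-risk alleles in a list of two-letter genotypes.

    Each genotype is a string such as "AG": one letter per chromosome copy.
    (This encoding is an assumption made for this sketch.)
    """
    risk = sum(g.count(risk_allele) for g in genotypes)
    total = 2 * len(genotypes)  # every person carries two copies
    return risk, total - risk


def allelic_odds_ratio(cases, controls, risk_allele):
    """Odds of carrying the risk allele in cases relative to controls."""
    case_risk, case_other = allele_counts(cases, risk_allele)
    ctrl_risk, ctrl_other = allele_counts(controls, risk_allele)
    if case_other == 0 or ctrl_risk == 0:
        raise ValueError("odds ratio undefined: a cell of the 2x2 table is empty")
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


# Hypothetical genotypes at one SNP for 10 patients and 10 healthy people.
cases = ["GG"] * 5 + ["AG"] * 4 + ["AA"] * 1      # 14 G alleles, 6 A alleles
controls = ["GG"] * 2 + ["AG"] * 4 + ["AA"] * 4   # 8 G alleles, 12 A alleles
print(allelic_odds_ratio(cases, controls, "G"))   # 3.5
```

An odds ratio of 1 would mean the allele is equally common in both groups. A real study repeats this kind of comparison across hundreds of thousands of SNPs and thousands of people, with proper statistical testing on top, which is why the computing power matters.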
This brings us back to our featured article. The authors begin by noting a number of seminal studies, done around 2007, that identified 3 groups of SNPs that segregated in type 2 diabetic as well as obese populations. One fascinating observation that came out of these reports was that almost all of these SNPs were found in non-coding regions of the genome. In other words, they did not have anything to do with some sort of poorly functioning gene. Now this group has been working on an idea, namely that there exist mysterious regions of the genome that somehow control large regions consisting of many genes. They have referred to these regions as genomic regulatory blocks (GRBs). They have hypothesized that these regions play a role in determining the timing and strength of expression of genes that control development. These genes would be transcription factors (of which I have written before) that sit on the regulatory sequence of genes and nucleate the formation of (or inhibit the formation of) the transcription complex, thus determining which genes get expressed and when. How these GRBs might do this is unknown. In this paper they extended their hypothesis to consider the possibility that many of the risk factors that have been identified for complex human diseases (such as type 2 diabetes) do not actually involve the nearest gene but instead involve some subtle (and as yet unidentified) change in how the GRB functions, which would mean that they work through the transcription factors within each GRB locus. In turn these transcription factors would affect many downstream genes to ultimately set things up for increased disease risk. In effect, disease is caused by a failure of management.
So, their goal was to see if they could identify GRBs for each of the 3 groups of SNPs identified in the 2007 studies as being in linkage disequilibrium, and then find transcription factors that would play some global role suitable for altering the risk for type 2 diabetes. Indeed they did find that in each case the SNPs fell within or partially within a different GRB region. This was done by computer (bioinformatics analysis). One block of SNPs fell within a region that contained 3 genes: HHEX, KIF11, and IDE. Using their concept of GRB regulation they looked at the genomes of lots of different species and concluded that only HHEX, a transcription factor that is involved in the development of the pancreas, was part of this linkage disequilibrium when considered in this new way. The fact that it regulated the development of the pancreas (the source of insulin) made it an attractive candidate gene for type 2 diabetes risk. So far so good. A second block of SNPs was located in the CDKAL1 locus. Unfortunately, unlike HHEX, CDKAL1 was not a transcription factor and had nothing to do with anything that seemed to relate to diabetes. However, again by using their GRB concept they were able to link CDKAL1 to another gene: the transcription factor SOX4. Now SOX4 has been associated with both pancreatic development and with insulin secretion, so again they found a good candidate gene. The third group of SNPs that was in linkage disequilibrium with type 2 diabetes was within the FTO locus. Unfortunately the transcription factor associated with this locus, IRX3, had no literature that associated it with pancreatic development or any other aspect of type 2 diabetes.
Undaunted, the researchers went to the bench and started doing experiments. Their favorite model organism is the zebrafish. There are a variety of technical reasons why zebrafish are really good genetic model organisms and perhaps some other time I’ll talk about it. Using some fancy molecular biology they actually uncovered a relationship between IRX3 and another transcription factor known to affect pancreatic development: Nkx2.2. It turned out that IRX3 was needed for full Nkx2.2 function. So, 3 groups of SNPs associated with type 2 diabetes were associated with 3 transcription factors that played a role in pancreatic development. The important point here is that none of these genes was damaged or mutated in any way. The mutations were in regions that we assume somehow physically manage the timing and intensity of expression of these critical developmental regulators. How these regions work is anyone’s guess.
The authors state up front that they have not proved anything. Rather, they have presented an intriguing new relationship between genetic markers for disease and what these markers are marking. Numerous experiments need to be done manipulating these regions by directed mutagenesis (probably in mice) and examining the effects on pancreatic development, metabolism (especially insulin release), and diabetes susceptibility.
By this point (and congratulations for making it this far) you are probably wondering what good this is going to do for anyone. The answer is personalized medicine. We are not simply our genes. It is absolutely clear to us now that all sorts of environmental factors play subtle roles in the shaping of our development. We are just beginning to get a glimmer of the majestic dance that is played between nature and nurture. If we can understand it, perhaps we can guide it. For example, if we know that certain SNP combinations create an increased risk of obesity, perhaps we might find some new diet or therapy administered at some critical juncture that guides the patient safely past that decision point. Of course this is the medicine of the future. It will be important to avoid the sort of “Brave New World” envisioned by Aldous Huxley but I have hope that we can use this sort of power for empowerment and not for enslavement. For the present, my consciousness has enough problems making decent decisions. I just hope my genome can fend for itself.