The other night, I attended a panel discussion on Life Science Analytics. Three local professionals– Paul Rejto, the head of computational biology in Oncology Research at Pfizer; Lucila Ohno-Machado, the chief of the UC San Diego Division of Biomedical Informatics; and Sandor Szalma, senior research fellow at Centocor and adjunct professor at Rutgers University– introduced and took questions on their roles in the budding field of Life Science Analytics.
In the simplest terms, Life Science Analytics refers to the field of study that marries computational analysis with biology and life science. At this particular panel, the speakers addressed several aspects of Life Science Analytics, including the study of bioinformatics. Bioinformatics attempts to improve our knowledge of biological systems like genes, cell structure, protein folding, and so on using the tools of computational analysis. In other words, bioinformatics addresses situations like: “We have sequenced the genomes of ten patients with breast cancer; a zillion lines of As, Cs, Gs, and Ts, representing DNA– does something in all this data stand out as different from people without breast cancer?”
I like bioinformatics, conceptually– not only because I like data analysis, but because much of the work being done in diabetes research right now is aided by bioinformatics and genomic analysis, especially when it comes to drug discovery and hormonal treatments.
Based on what the speakers were saying, I see that there are a number of outstanding challenges in bioinformatics, and therefore in the ability of scientists to do meaningful bioinformatic research towards diabetes treatments and cures. Two of the big obstacles:
- As in many industries right now, there is the perplexing problem of having too much information; relatively cheaply, you can sequence a person’s entire genome– which means potentially billions of base-pairs of genomic details, and tons of data, per patient sample. Trying to process and analyze such large chunks of data, and then comparing across populations and other data sets, is an enormous task. Notably, this problem is not unique to bioinformatics– data mining and classification is a rapidly growing industry, as many different companies and domains are faced with large data sets that need analysis and normalization. (For example, Google and similar companies are trying to categorize and evaluate… all of the text on the internet.)
- A strange facet of the problem of too much data is that the data that does exist is often incomplete in important ways. More precisely, it’s nice to have a person’s genome, but the DNA sequences are much less meaningful at this point unless analysts also have a catalogue of every disease, ailment and characteristic of the person to associate with the gene sequence being looked at. In other words, if you have ten samples from patients with Parkinson’s and ten samples from patients without, you can start to make meaningful comparisons. However, if you just have twenty samples with no idea of what you’re looking for, you just have lots of data.
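To make that second point a bit more concrete, here is a toy sketch in Python of what a "meaningful comparison" might look like. It is only an illustration under heavy assumptions: the sequences are made up, very short, and already aligned to the same length, and the function names and the 0.5 threshold are hypothetical choices of mine, not anything the panelists described. Real genomic analysis involves far more data and proper statistics.

```python
from collections import Counter


def base_frequencies(samples, position):
    """Fraction of samples carrying each base at one position."""
    counts = Counter(sample[position] for sample in samples)
    total = len(samples)
    return {base: n / total for base, n in counts.items()}


def positions_that_stand_out(cases, controls, threshold=0.5):
    """List (position, base, case_freq, control_freq) wherever a base's
    frequency differs between the two groups by at least `threshold`.

    Assumes every sequence is aligned and has the same length.
    """
    hits = []
    for pos in range(len(cases[0])):
        case_freq = base_frequencies(cases, pos)
        control_freq = base_frequencies(controls, pos)
        for base in sorted(set(case_freq) | set(control_freq)):
            in_cases = case_freq.get(base, 0.0)
            in_controls = control_freq.get(base, 0.0)
            if abs(in_cases - in_controls) >= threshold:
                hits.append((pos, base, in_cases, in_controls))
    return hits


# Made-up toy data: four labeled "patient" samples, four labeled "control" samples.
cases = ["ACGTAC", "ACGTAC", "ACGTAT", "ACGTAC"]
controls = ["ACCTAC", "ACCTAC", "ACGTAC", "ACCTAC"]

for pos, base, in_cases, in_controls in positions_that_stand_out(cases, controls):
    print(f"position {pos}: {base} in {in_cases:.0%} of cases, {in_controls:.0%} of controls")
```

Notice that the comparison only works because each sample arrives with a label saying which group it belongs to. Take the labels away and there is nothing to split the samples on, which is exactly the "lots of data" situation described above.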
There are two avenues of approaching these challenges. The first is improving the nature of the analysis itself– writing more clever methods of comparing and tagging ever larger sets of information. This will come with time, as the technology and knowledge base of bioinformatics grow.
The second avenue is improving the quality of the data. And here’s where we come in. In theory at least, it could be extremely useful to have the genetic sequences of lots of Type 1 diabetics. Scientists and bioinformaticists could find key differences between the diabetic genome and the non-diabetic genome that lead to new treatments and insights.
But this brings us to the third major challenge facing the field of bioinformatics: privacy. What happens if I get my genome, or part of my genome, sequenced, and give the data to an academic or private institution doing research, and they publish study results that include some portion of my sequenced DNA?
Hey, wait a minute. That’s me out there, in a unique and identifiable format. Labeled as a diabetic at first, with the potential for getting all sorts of new labels as the accuracy of genomic analysis improves: this gene implies I’m more likely to get thyroid cancer; that one means I might be bipolar. Oh, and see here, this one means my earlobes are the attached kind, not the kind that hang loose.
Besides the base-level self-consciousness issue– I’d rather not be so exposed in a public setting– there is also the ever-present insurance issue: ten years down the line, will I be denied coverage because of some foreboding gene found in my data set? Or, worse still, will my children be denied coverage because they may have inherited some gene that I have?
And so there is a dilemma: am I willing to advance science, potentially to the point of curing diabetes, if it means I have to be genetically naked in public?
Of course, this is currently a purely theoretical question; no one has asked for my genome yet. But if medicine continues the way it’s going, the question may not be theoretical for long. So am I willing? Depends on the context– if I met the researchers, and knew what was being researched, and could see the value of adding my personal data to the project, then yes, certainly. I want diabetes cured, and I am willing to sacrifice elements of my privacy to that end. But, if it were an anonymous or open-ended project– as in, “Here, sequence your genome, we’ll file it away and use it to compare demographic information and for all sorts of other things you may not hear about,”– then no, I’d rather not. Sorry, but the risk to my personhood and integrity is too high given the great unknown of what will be possible in the future.
But that’s just me. What about you– diabetic or not– would you sacrifice your privacy to medicine?