Two years ago, I interviewed Alex Kostic, who was then a postdoctoral fellow at the Broad Institute of MIT and Harvard exploring the microbiome’s connection to type 1 diabetes. I spoke with Kostic again in his lab late last year. Toward the end of the conversation, we were joined by Jacob Luber, a PhD student in the Bioinformatics and Integrative Genomics Program at Harvard Medical School, who pairs computational analysis with new sequencing technologies to study the microbiome. He also has type 1 diabetes, so the Kostic lab’s work is of both personal and professional interest to him.
For part 1 of this interview, click here.
You’ve talked about the huge amount of data this field, like many others, is generating right now: sequencing DNA is cheap, but storing and analyzing the resulting data is expensive. How is your lab approaching this problem?
Jacob Luber: I’m taking a lot of the data from lab experiments and human cohorts and trying to build models that predict how certain genes in the microbiome relate to the onset of type 1 diabetes and other autoimmune diseases. To do that, we sequenced samples from a very large number of individuals when they were infants, then built a non-redundant gene catalogue of essentially every gene that appears in the microbiome over time. That’s a massive number of genes: 9.8 million, and we have to run statistical analyses on every one of them. We found that we had terabytes of data [a terabyte is equal to a thousand gigabytes].
When we submitted the job to Harvard’s supercomputer, we found it would spend months waiting in the queue: every lab has a priority, and you get your allocation of resources. So we had to develop some new technologies to make this work.
So we turned to Amazon, which has built huge data centers, far bigger than anyone else’s in the world. Businesses like Netflix pay Amazon to use that computing capacity. Amazon has a guaranteed on-demand price for computing, while the price of its spare “spot” capacity can fluctuate by an order of magnitude. We wrote an algorithm that uses a lot of linear algebra to weigh pricing history against our computing needs and come up with a strategy to reduce costs. That cut our supercomputing costs by almost an order of magnitude.
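Luber doesn’t spell out his algorithm here, beyond saying it applies linear algebra to pricing history and computing needs, so the sketch below is not his method. It is a deliberately simpler, hypothetical illustration of the underlying idea: when the price of computing fluctuates, choosing when to run a job can change its cost dramatically. Given a made-up hourly price history, the hypothetical function `cheapest_window` finds the start hour that minimizes the total price of a fixed-length job, and `savings_vs_on_demand` compares that with paying an assumed flat rate. All names, prices and rates are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def cheapest_window(prices: Sequence[int], hours: int) -> Tuple[int, int]:
    """Return (start_hour, total_cost) of the cheapest run of `hours` consecutive hours.

    `prices` is a hypothetical hourly price history (e.g. cents per
    instance-hour). A sliding-window sum keeps this O(len(prices)).
    """
    if not 0 < hours <= len(prices):
        raise ValueError("hours must be between 1 and len(prices)")
    window = sum(prices[:hours])
    best_start, best_cost = 0, window
    for start in range(1, len(prices) - hours + 1):
        # Slide the window forward one hour: drop the oldest price, add the next.
        window += prices[start + hours - 1] - prices[start - 1]
        if window < best_cost:
            best_start, best_cost = start, window
    return best_start, best_cost


def savings_vs_on_demand(flat_rate: int, prices: List[int], hours: int) -> Tuple[int, int, float]:
    """Compare the cheapest fluctuating-price window with an assumed flat hourly rate.

    Returns (start_hour, fluctuating_cost, flat_cost / fluctuating_cost).
    """
    start, variable_cost = cheapest_window(prices, hours)
    flat_cost = flat_rate * hours
    return start, variable_cost, flat_cost / variable_cost


if __name__ == "__main__":
    history = [90, 80, 10, 10, 20, 70]  # made-up prices, cents per hour
    print(savings_vs_on_demand(100, history, 3))
```

With these made-up numbers and a three-hour job, the cheapest window starts at hour 2 and costs 40 cents, versus 300 cents at the assumed flat rate of 100 cents per hour, a 7.5-fold difference. A real scheduler would also have to handle interruptions, many instance types and jobs of different sizes, which is where a more elaborate optimization comes in.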
Alex Kostic: It’s meant that we can do our analysis at a scale and a pace that we could never achieve if we relied on the Harvard infrastructure. We are reanalyzing the same data sets as before, but at much higher resolution. Before, when we analyzed the data sets, we were only analyzing the taxonomies, the names of the organisms; we never went down to the gene level. With this cloud infrastructure and the tools Jacob has built, we can now actually go down to the gene level.
JL: If you look at the academic supercomputing world, Harvard is definitely not a light hitter; they have substantial resources. But the scope of this analysis and the complexity of the problem are so great that they truly require novel solutions.
As for the scope of the analysis that Amazon’s computing capacity makes possible: previously, researchers could only take a sample from an infant who was either on track not to develop type 1 diabetes or on track to seroconvert [develop the autoantibodies that precede the disease] and say, at this point in time, these microbes were present at these abundance levels. But the genomes change so rapidly as an infant’s microbiome develops that this kind of analysis doesn’t tell the whole story. That is why we’re looking at the genes within the microbes. We took specific strains out of the picture, so now, for each of these 9.8 million genes, we’re asking how gene x or gene y changes over time in the microbiome, and how that could relate to the onset of autoimmunity.
So that’s much more specific than studying the strains.
JL: Yes. Each strain can have a few thousand genes, maybe five or six thousand.
As someone who has type 1, what developments are you most excited about in terms of the microbiome?
JL: I was diagnosed with diabetes in June of 1995, so a long time ago. Every time you go to a doctor’s appointment, they say, “Oh, we’re going to find the cure in five years.” And it’s never been true. If we discover that the microbiome influences the onset of diabetes and plays a role in how you metabolize carbohydrates, and if we can find a way to perturb the microbiome with synthetic biology, we might not reach a cure, but we could have far better treatment options. For example, there could be methods that change how you respond to carbohydrates, so you could potentially take fewer insulin shots a day and have better control.
AK: The most exciting thing is how dynamic and malleable the system is compared to the human genome. You can swap out these bacteria, and essentially the genes they are expressing, and you can do it in very well-designed ways, using synthetic biology and natural probiotics. That’s the kind of thing that gets us excited.
JL: There are also huge gains to be made in basic science. For years, people have tried to understand the genetics of type 1 diabetes, which is not heritable in an obvious way. Maybe there is an element of why you get diabetes that we’ll be able to elucidate from the microbiome, which Alex has already started to do. So in ten years we might be asking: How can we supplement the microbiome in infancy so that children never get diabetes in the first place?