Hi, my name is Michael Cortes. I'm currently a PhD student
at Stony Brook University studying Applied Mathematics and
Statistics. All in all, I am a mathematical / computational / machine learning modeler
who has fun building models of just about anything! Hopefully this site will make it
easier for you to learn a little bit more about me.
When lambda phage viruses infect their host, E. coli, the outcome of infection is either lysis (active viral growth/replication) or lysogeny (dormancy). However, under seemingly similar conditions, the virus appears to choose either fate with a probability anywhere from 20% to 80%. Why is that?
Experiments have revealed the key viral genes and proteins involved in the decision (see the review by Oppenheim, Annual Reviews 2005). This figure shows many of the important ones and their regulatory interactions. Somehow, this network is able to sense external signals and adjust the abundances of specific key proteins, making the decision to lyse or lysogenize more or less probable.
After simplifying the detailed model in the above figure, we can approximate the dynamics of two key proteins, CII (c) and Q (q), during the decision-making period of the infection. This mathematical model treats the outcome of infection as the result of a race between CII and Q. In my recent publication, we show that this model reproduces experimental data. Below, the left figure shows how the levels of CII and Q depend on the timing of delayed infections. On the right are the rescaled differential equations on which these results are based.
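The following minimal Python sketch makes the idea of a "race" concrete. It does not use the published rescaled equations. As a hypothetical stand-in, each protein is produced at a constant rate (`alpha_c`, `alpha_q`) and degraded linearly (`beta_c`, `beta_q`), and whichever protein first crosses its own made-up threshold (`theta_c`, `theta_q`) decides the outcome: CII winning means lysogeny, and Q winning means lysis. All names and parameter values are assumptions for illustration.

```python
"""Toy 'race' between CII (c) and Q (q).

Hypothetical model, NOT the published equations: constant production,
linear degradation, and a fixed threshold for each protein.
"""


def race(alpha_c, alpha_q, beta_c=1.0, beta_q=1.0,
         theta_c=0.5, theta_q=0.5, dt=1e-3, t_max=10.0):
    """Integrate dc/dt = alpha_c - beta_c*c and dq/dt = alpha_q - beta_q*q
    with forward Euler from c = q = 0 and report which threshold is hit first.

    Returns ("lysogeny", t) if CII reaches theta_c first, ("lysis", t) if Q
    reaches theta_q first, or ("undecided", t_max) if neither does in time.
    If both cross in the same step, CII is checked first and wins.
    """
    c, q, t = 0.0, 0.0, 0.0
    while t < t_max:
        c += dt * (alpha_c - beta_c * c)
        q += dt * (alpha_q - beta_q * q)
        t += dt
        if c >= theta_c:
            return "lysogeny", t
        if q >= theta_q:
            return "lysis", t
    return "undecided", t_max


if __name__ == "__main__":
    print(race(2.0, 1.0))  # CII is produced faster, so it wins the race
    print(race(1.0, 2.0))  # Q is produced faster, so it wins the race
```

With these toy rates, `race(2.0, 1.0)` reports lysogeny at t ≈ 0.29 (the exact crossing time is ln(4/3)), because Q would only reach its threshold at t = ln 2 ≈ 0.69.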
The previous mathematical model treated the reaction mixture as well mixed and the concentrations as continuous, fluid-like quantities. In reality, such models only approximate the discrete nature of molecule copy numbers, and the intracellular environment may not be well mixed. To see how the results depend on discrete copy numbers and spatial diffusion (Brownian motion), we developed a discrete, stochastic, agent-based simulation. Each distinctly colored dot is a different viral or host-cell protein, and the orange circle is the viral DNA. Eventually, the replication complex (big black dot) forms on the DNA molecule and leads to very fast replication. Ultimately, so much protein and DNA is produced that the cell bursts.
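The core of a diffusion-based agent simulation is the Brownian step. In overdamped motion, each coordinate of a particle changes by a Gaussian increment with variance 2·D·dt, where D is the diffusion coefficient. The Python sketch below is a deliberately stripped-down, hypothetical version of such a simulation. It has one protein species, a 2-D circular cell, and a single DNA target that captures proteins irreversibly. The geometry, parameter values, and the function `simulate` are assumptions, not the simulation shown above.

```python
import math
import random


def simulate(n_proteins=50, radius=1.0, dna_pos=(0.0, 0.0), capture=0.1,
             diffusion=0.01, dt=0.01, n_steps=2000, seed=0):
    """Brownian motion of point proteins in a hypothetical 2-D circular cell.

    Each step displaces every free protein by N(0, 2*D*dt) along each axis.
    A move that would leave the cell is rejected (the protein stays put),
    which is a crude stand-in for a reflecting membrane. A protein ending a
    step within `capture` of the DNA binds irreversibly.
    Returns (number bound, list of free protein positions).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusion * dt)
    # Place proteins uniformly inside the cell by rejection sampling.
    free = []
    while len(free) < n_proteins:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            free.append((x, y))
    bound = 0
    for _ in range(n_steps):
        still_free = []
        for x, y in free:
            nx, ny = x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)
            if nx * nx + ny * ny > radius * radius:
                nx, ny = x, y  # reject moves that exit the cell
            if math.hypot(nx - dna_pos[0], ny - dna_pos[1]) <= capture:
                bound += 1     # protein captured by the DNA
            else:
                still_free.append((nx, ny))
        free = still_free
    return bound, free


if __name__ == "__main__":
    n_bound, _ = simulate()
    print("proteins bound to DNA:", n_bound)
```

Because every molecule is tracked individually, copy-number fluctuations and the time needed to diffuse to the DNA appear naturally, and a continuous, well-mixed ODE model cannot capture either effect.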
In the wild, phages infect cells at the population level. When multiple phages infect a given cell, the probability that they will lyse (actively replicate and kill the host) or lysogenize (lie dormant) depends principally on the number of infecting phages, also called the multiplicity of infection (MOI). How does the precise MOI dependence affect the propagation of the phage? To address this, we developed stochastic, agent-based simulations that model phages attacking cells at the population level. In this simulation, the red dots are phages, the green dots are uninfected cells, the cyan dots are infected cells that haven't yet decided their fate, and the blue dots are lysogenized cells. Overall, the simulation suggests that the MOI dependence may have evolved to maximize the number of lysogens produced, possibly because lysogens are better equipped to survive in poor nutrient conditions.
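A simple mean-field calculation can complement the agent-based simulation. If phages adsorb to cells independently and at random, the number of phages per cell is approximately Poisson-distributed with mean m, the phage-to-cell ratio. The expected fraction of cells that become lysogens is then the sum over k ≥ 1 of P(k; m)·p(k). The Python sketch below assumes this Poisson model and a hypothetical lysogeny probability `p_lysogeny` (0.1 for single infections, 0.9 otherwise). Neither these numbers nor the function names come from our simulations.

```python
import math


def poisson_pmf(k, m):
    """Probability that a cell is hit by exactly k phages when phages
    adsorb independently and at random with mean m per cell (assumption)."""
    return math.exp(-m) * m ** k / math.factorial(k)


def p_lysogeny(moi):
    """Hypothetical MOI dependence: single infections mostly lyse,
    multiple infections mostly lysogenize. Values are illustrative only."""
    return 0.1 if moi == 1 else 0.9


def expected_outcomes(m, k_max=50):
    """Expected fractions of ALL cells that stay uninfected, lyse, or
    become lysogens at phage-to-cell ratio m (sum truncated at k_max)."""
    uninfected = poisson_pmf(0, m)
    lysed = sum(poisson_pmf(k, m) * (1.0 - p_lysogeny(k))
                for k in range(1, k_max + 1))
    lysogen = sum(poisson_pmf(k, m) * p_lysogeny(k)
                  for k in range(1, k_max + 1))
    return uninfected, lysed, lysogen


if __name__ == "__main__":
    for m in (0.1, 0.5, 1.0, 3.0):
        u, l, g = expected_outcomes(m)
        print(f"m={m}: uninfected={u:.3f} lysed={l:.3f} lysogens={g:.3f}")
```

Swapping in a different `p_lysogeny` shows how the shape of the MOI dependence shifts the balance between lysis and lysogeny across phage-to-cell ratios.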
In biology, we often see a diversification of phenotypic traits within a given ecological environment. For example, foxes hunt and kill rabbits, but rabbits are nimble, fast, and reproduce quickly. It's natural to wonder which is better: to be an efficient predator that replicates slowly, or a fast replicator that is vulnerable to predation? In this simulation, different species (marked by their color) try to take over the entire region by growing quickly, by killing neighboring competitors, or by some balance of the two. Colors closer to blue are slow growers but efficient killers, whereas colors closer to red are fast growers but weak killers. Orange colors are in between. Mutations occur randomly in the population. Interestingly, the competition appears to converge to a state in which 2 or 3 species coexist, each leaning slightly toward either the fast-replicator side or the efficient-killer side. Overall, this simulation shows how natural selection can maintain the diversification of species.
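A toy lattice version of this growth-versus-killing trade-off fits in a few dozen lines of Python. The rule set below is hypothetical and does not reproduce the exact rules of the simulation above. Each occupied site carries a trait t in [0, 1]. It colonizes an empty neighboring site with probability t, and it kills a neighbor of a different type with probability 1 - t. Offspring occasionally mutate to a nearby trait value. Treating each distinct trait value as its own species is a simplifying assumption.

```python
import random

NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(grid, rng, mutation_rate=0.01, mutation_size=0.05):
    """One asynchronous update of a hypothetical growth-vs-killing game.

    grid[i][j] is None (empty) or a trait t in [0, 1]:
    t near 1 = fast grower / weak killer ("red"),
    t near 0 = slow grower / efficient killer ("blue").
    """
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    t = grid[i][j]
    if t is None:
        return
    di, dj = rng.choice(NEIGHBOURS)
    ni, nj = (i + di) % n, (j + dj) % n  # periodic boundaries
    target = grid[ni][nj]
    if target is None:
        # Growth: colonise the empty site with probability t.
        if rng.random() < t:
            child = t
            if rng.random() < mutation_rate:
                child = min(1.0, max(0.0, t + rng.uniform(-mutation_size, mutation_size)))
            grid[ni][nj] = child
    elif target != t and rng.random() < 1.0 - t:
        # Killing: remove a competitor of a different type with probability 1 - t.
        grid[ni][nj] = None


def run(n=30, n_steps=200_000, seed=1):
    """Start from a half-filled grid of random traits and iterate `step`."""
    rng = random.Random(seed)
    grid = [[rng.random() if rng.random() < 0.5 else None for _ in range(n)]
            for _ in range(n)]
    for _ in range(n_steps):
        step(grid, rng)
    return grid


if __name__ == "__main__":
    final = run()
    traits = sorted(v for row in final for v in row if v is not None)
    print("occupied sites:", len(traits))
    if traits:
        print("median trait:", traits[len(traits) // 2])
```

Plotting the trait values over time, with a blue-to-red color map, gives a picture comparable in spirit to the animation, although the dynamics of this toy need not match it.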
Neural networks are fairly sophisticated and quite remarkable in their ability to classify data and make predictions after being trained. So, what kinds of things can they be used for?
Several factors present at the moment of infection could contribute to the decision
to lyse or lysogenize. These include the number of infecting phages, the host cell's volume,
the timing of infections, host cell physiology (growth), and even temperature.
Using data collected from experiments (Zeng et al., Cell 2010),
we can create and train a neural network to map the initial state of the infection to the decision of lysis or lysogeny.
In this case, the initial state of the infection is transformed into a 4-dimensional vector of predictive "features":
MOI, cell length, cell width, and location of infections. The neural network therefore has 4 neurons in the input layer,
6 neurons in the hidden layer, and 1 neuron in the output layer. The output neuron produces a number between 0 and 1,
with 1 corresponding to lysogeny and 0 corresponding to lysis.
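For concreteness, here is a dependency-free Python sketch of a 4-6-1 network with sigmoid units. It is trained by stochastic gradient descent on the binary cross-entropy loss. The experimental data are not reproduced here, so `make_data` generates synthetic examples under a made-up rule: lysogeny if and only if MOI ≥ 2. Three uninformative features stand in for cell length, cell width, and infection location. The data, learning rate, and all names are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """4-6-1 fully connected network with sigmoid hidden and output units."""

    def __init__(self, n_in=4, n_hidden=6, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the output in (0, 1); near 1 means lysogeny."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hk for w, hk in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One stochastic-gradient step on the binary cross-entropy loss."""
        h, y = self.forward(x)
        d_out = y - target  # gradient w.r.t. the output pre-activation
        for k, hk in enumerate(h):
            # Back-propagate through hidden unit k before updating w2[k].
            d_hid = d_out * self.w2[k] * hk * (1.0 - hk)
            self.w2[k] -= lr * d_out * hk
            for i, xi in enumerate(x):
                self.w1[k][i] -= lr * d_hid * xi
            self.b1[k] -= lr * d_hid
        self.b2 -= lr * d_out


def make_data(n=200, seed=1):
    """Synthetic stand-in data (NOT experimental): label 1 iff MOI >= 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        moi = rng.randint(1, 4)
        # Centred MOI plus three uninformative placeholder features
        # (hypothetical stand-ins for cell length, width, infection location).
        x = [moi - 1.5] + [rng.uniform(-0.5, 0.5) for _ in range(3)]
        data.append((x, 1.0 if moi >= 2 else 0.0))
    return data


def train(net, data, epochs=100):
    """Plain per-example SGD over the data set for a fixed number of epochs."""
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target)


def accuracy(net, data):
    """Fraction of examples where thresholding the output at 0.5 is correct."""
    hits = sum((net.forward(x)[1] >= 0.5) == (target == 1.0) for x, target in data)
    return hits / len(data)


if __name__ == "__main__":
    data = make_data()
    net = TinyMLP()
    train(net, data)
    print("training accuracy:", accuracy(net, data))
```

In practice, a library such as PyTorch or scikit-learn would replace the hand-written backpropagation. The explicit version only shows what the 4-6-1 topology computes.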
In the figure below, we have sampled stochastic trajectories from the detailed model of the phage lambda gene regulatory network. The data is noisy
and appears complex. Can we train a neural network to "look" at these plots and determine whether the outcome of each simulation is lysogeny or lysis?
To see whether this is possible, we train a neural network on the raw trajectory data from the detailed stochastic simulations (above). The input to the network is a 60-dimensional feature vector whose t-th component is the concentration of a key protein at minute t. The network has 60 neurons in the input layer, 60 in the hidden layer, and 1 in the output layer. Prediction accuracy is approximately 100% on new data, which means it works!
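Stochastic simulations based on Gillespie's algorithm produce trajectories that change only at reaction events. The 60-dimensional input therefore has to be obtained by sampling each trajectory once per minute. The Python sketch below shows one way to do this. It uses a hypothetical single-protein birth-death model in place of the detailed network, and it works with copy numbers rather than concentrations (dividing by cell volume would convert them). Component t holds the value at minute t for t = 0, ..., 59. `birth_death_ssa` and `to_feature_vector` are illustrative names.

```python
import bisect
import random


def birth_death_ssa(k_prod=2.0, k_deg=0.05, t_end=60.0, seed=0):
    """Gillespie simulation of one protein: 0 -> P at rate k_prod and
    P -> 0 at rate k_deg * n. Hypothetical stand-in for the detailed model.
    Returns event times and the copy number just after each event."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [0.0], [0]
    while True:
        production, degradation = k_prod, k_deg * n
        total = production + degradation
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its rate.
        n += 1 if rng.random() * total < production else -1
        times.append(t)
        counts.append(n)
    return times, counts


def to_feature_vector(times, counts, n_minutes=60):
    """Sample a piecewise-constant trajectory at minutes 0..n_minutes-1.

    features[t] is the copy number set by the last event at or before minute t.
    """
    features = []
    for minute in range(n_minutes):
        idx = bisect.bisect_right(times, minute) - 1
        features.append(counts[idx])
    return features


if __name__ == "__main__":
    times, counts = birth_death_ssa()
    print(to_feature_vector(times, counts)[:10])
```

Each sampled trajectory becomes one 60-component input row, labeled with the lysis or lysogeny outcome of that simulation.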
Can the traditional problem of determining protein secondary structure (local folded structure, such as helices and strands) from the primary structure (sequence of amino acids) be attacked using neural networks? To address this, we analyzed approximately 200,000 sequences culled from the Protein Data Bank (PDB). This analysis showed that the important predictive features of these sequences are the inter-residue distances (see figure below; each letter is a type of amino acid residue).
For each sequence, we extracted the inter-residue distances between all pairs of residues. These were then transformed into a
2000-dimensional vector, which served as the input to a neural network with 2000 neurons in the input layer, 2000 neurons in the hidden layer, and 3
neurons in the output layer, one for each class of secondary structure (alpha helix, beta strand, neither).
Training this neural network on 5% of the data culled from the PDB gave a prediction accuracy greater than 98% on the remaining 95% of the data, wow!
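How the pairwise distances were turned into exactly 2000 numbers is not specified above. One hypothetical scheme that happens to give that size works as follows. For every ordered pair of the 20 standard amino acids (400 pairs), it counts how often the pair occurs at each sequence separation d = 1 to 5 along the chain, which gives 400 × 5 = 2000 features. The Python sketch below implements that assumed encoding, treating "distance" as separation along the sequence. It also includes the softmax that converts the 3 output activations into class probabilities. It is an illustration and not necessarily the featurization used here.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MAX_SEPARATION = 5                    # hypothetical choice of distances 1..5
PAIR_INDEX = {pair: k for k, pair in enumerate(product(AMINO_ACIDS, repeat=2))}
N_FEATURES = len(PAIR_INDEX) * MAX_SEPARATION  # 400 * 5 = 2000


def featurize(sequence):
    """Count residue pairs (a, b) separated by d = 1..MAX_SEPARATION positions.

    Feature index = (d - 1) * 400 + PAIR_INDEX[(a, b)]. Non-standard letters
    (e.g. 'X') are skipped. Counts are divided by the sequence length.
    """
    vec = [0.0] * N_FEATURES
    seq = sequence.upper()
    for i, a in enumerate(seq):
        for d in range(1, MAX_SEPARATION + 1):
            j = i + d
            if j >= len(seq):
                break
            key = (a, seq[j])
            if key in PAIR_INDEX:
                vec[(d - 1) * len(PAIR_INDEX) + PAIR_INDEX[key]] += 1.0
    if seq:
        vec = [v / len(seq) for v in vec]
    return vec


def softmax(logits):
    """Map 3 output activations to probabilities (helix, strand, neither)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    x = featurize("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(x), sum(x))
    print(softmax([0.2, 1.5, -0.3]))
```

Normalizing by sequence length keeps long and short sequences on a comparable scale. Raw counts are an equally plausible choice.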
Neural networks are almost like magic. They can learn some very complicated things. Could they learn a complicated mathematical function?
To test this possibility, we can build a neural network to "learn" the function: