Welcome!

Hi, my name is Michael Cortes. I'm currently a PhD student at Stony Brook University studying Applied Mathematics and Statistics. All in all, I am a mathematical / computational / machine learning modeler who has fun building models of just about anything! Hopefully this site will make it easier for you to learn a little bit more about me.



Below you'll find some models I've built to address some interesting biology-related questions.


Viruses can't make up their minds

When lambda phage viruses infect their host E. coli, the outcome of infection is either lysis (active viral growth/replication) or lysogeny (dormancy). However, under apparently similar conditions, the virus chooses either fate with a probability anywhere from 20% to 80%. Why is that?


Biochemical reaction network

Experiments have revealed the key viral genes and proteins involved in the decision (Review Paper (Oppenheim, Annual Reviews 2005)). This figure shows many of the important ones and their regulatory interactions. Somehow, this network is able to sense external signals and adjust the abundances of specific key proteins. By doing so, it makes the decision to lyse or lysogenize more or less probable.



Mathematical simulations of viral protein dynamics

After simplifying the detailed model in the above figure, we can approximate the dynamics of two key proteins, CII (c) and Q (q), during the decision-making period of the infection. This mathematical model treats the outcome of infection as the result of a race between CII and Q. In my recent publication, we show that this model reproduces experimental data. Below, the left figure shows how the levels of CII and Q depend on the timing of delayed infections. On the right, we show the rescaled differential equations on which the results are based.



Agent-based model of gene expression dynamics

The prior mathematical model treated the reaction mixture as well-mixed and the concentrations as continuous and fluid-like. In reality, such models only approximate the discrete nature of molecule copy counts, and the intracellular environment may not be well-mixed. To see how things may depend on the discrete molecule copy count and spatial diffusion (Brownian motion), we developed a discrete, stochastic, agent-based simulation. Every distinctly colored dot is a different viral or host cell protein, and the orange circle is the viral DNA. Eventually, the replication complex (big black dot) forms on the DNA molecule and leads to very fast replication. In the end, so much protein and DNA is produced that the cell explodes.
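The diffusion part of such a simulation can be sketched as a random walk. The snippet below is only a minimal illustration, not the simulation described above: it assumes a square 2D box with reflecting walls, and the names `reflect` and `brownian_step`, the diffusion coefficient, and the time step are all hypothetical.

```python
import math
import random

def reflect(coord, size):
    """Fold a coordinate back into [0, size] (reflecting walls)."""
    while coord < 0.0 or coord > size:
        coord = -coord if coord < 0.0 else 2.0 * size - coord
    return coord

def brownian_step(positions, diffusion, dt, size, rng):
    """Move every molecule by one Brownian step of duration dt."""
    # In each dimension the displacement is Gaussian with variance 2*D*dt.
    sigma = math.sqrt(2.0 * diffusion * dt)
    return [
        (reflect(x + rng.gauss(0.0, sigma), size),
         reflect(y + rng.gauss(0.0, sigma), size))
        for x, y in positions
    ]

rng = random.Random(0)
size = 1.0  # box side length in arbitrary units (assumed)
molecules = [(rng.uniform(0.0, size), rng.uniform(0.0, size)) for _ in range(50)]
for _ in range(1000):
    molecules = brownian_step(molecules, diffusion=0.01, dt=0.01, size=size, rng=rng)
```

A full agent-based model would add reaction rules on top of this (for example, two molecules binding when they come within some distance of each other), but the movement update would look much the same.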



Host cells being attacked by viruses which can develop lytically or lysogenically

In the wild, phages infect cells at the population level. When multiple phages infect a given cell, the probability they will lyse (actively replicate and kill the host) or lysogenize (lie dormant) depends principally on the number of infections, also called the multiplicity of infection or MOI. How does the precise MOI dependence affect the propagation of the phage? To address this, we developed agent-based, stochastic simulations that model phages attacking cells at the population level. In this simulation, the red dots are phages, the green dots are uninfected cells, the cyan dots are infected cells that haven't yet decided their fate, and the blue dots are lysogenized cells. Overall, the simulation suggests the MOI dependency may have evolved to maximize the number of lysogens produced, possibly because lysogens are better equipped to survive in poor nutrient conditions.
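To get a feel for how MOI varies from cell to cell, a common simplification is to assume phages adsorb to cells independently, so the number of phages per cell follows a Poisson distribution. That is an assumption made here for illustration and not necessarily what the simulation above does; the function name `moi_distribution` and the average of 1.0 phages per cell are hypothetical.

```python
import math

def moi_distribution(mean_moi, max_moi):
    """Poisson probability that a cell receives k phages, for k = 0..max_moi."""
    return [
        math.exp(-mean_moi) * mean_moi ** k / math.factorial(k)
        for k in range(max_moi + 1)
    ]

# With one phage per cell on average, many cells still get 0 or 2+ phages.
probs = moi_distribution(mean_moi=1.0, max_moi=5)
for k, p in enumerate(probs):
    print(f"MOI = {k}: fraction of cells = {p:.3f}")
```

Even at a fixed phage-to-cell ratio, individual cells see a spread of MOIs, which is why the shape of the MOI dependence matters for the population as a whole.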



Evolution under trade-off between fast replication and efficient predation

In biology, we often see a diversification of phenotypic traits in a given ecological environment. For example, foxes hunt and kill rabbits, but rabbits are very nimble, fast, and reproduce quickly. It's natural to wonder which is better: to be a more efficient predator that replicates more slowly, or to be a fast replicator that is vulnerable to predation? In this simulation, different species (marked by their color) try to take over the entire region by growing quickly, killing neighboring competitors, or striking some balance between these two traits. Colors closer to blue are slow growers but efficient killers, whereas colors closer to red are fast growers but non-aggressive killers. Orange colors are in between. Mutations randomly occur in the population. Interestingly, the outcome of the competition appears to converge to a state where there are 2 or 3 species that are slightly on the fast replicator side or the efficient killer side. Overall, this simulation shows how natural selection can maintain the diversification of species.



Using neural networks to make predictions

Neural networks are fairly sophisticated and quite remarkable in their ability to classify data and make predictions after being trained. So, what kinds of things can they be used for?


Predicting cell-fate choices using neural networks

Several factors present at the moment of infection could potentially contribute to the decision to lyse or lysogenize. These include the number of infecting phages, the host's cell volume, the timing of infections, host cell physiology (growth), and even temperature. Using data collected from experiments (Lanying et al, Cell 2010) we can create and train a neural network to map the initial state of the infection to the output decision of lysis or lysogeny. In this case, the initial state of the infection is transformed into a 4-dimensional vector of predictive "features". These features are: MOI, cell length, cell width, and location of infections. Thus, we create a neural network that has 4 neurons in the input layer, 6 neurons in the hidden layer, and 1 neuron in the output layer. The output is a number between 0 and 1, with 1 corresponding to lysogeny and 0 corresponding to lysis.
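For readers who want to see what a 4-6-1 network looks like in code, here is a minimal forward-pass sketch. It assumes fully connected layers with sigmoid activations and uses random, untrained weights; the feature values and the helper names (`init_layer`, `layer_forward`, `predict`) are hypothetical and not taken from the original analysis.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights (one row per output neuron) and zero biases."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def layer_forward(weights, biases, inputs):
    """Weighted sum plus bias for each neuron, passed through a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def predict(network, features):
    activations = features
    for weights, biases in network:
        activations = layer_forward(weights, biases, activations)
    return activations[0]  # single output neuron

rng = random.Random(0)
network = [init_layer(4, 6, rng), init_layer(6, 1, rng)]  # 4 -> 6 -> 1

# Hypothetical infection: MOI, cell length, cell width, infection location.
features = [2.0, 3.1, 0.9, 0.5]
p_lysogeny = predict(network, features)
print(f"Network output (closer to 1 means lysogeny): {p_lysogeny:.3f}")
```

Because the weights here are random, the output is meaningless until the network is trained; the sketch only shows how the four features flow through the layers to a single number between 0 and 1.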



Training the network on a subset of the data and using the rest as a test set, we obtain approximately 70% prediction accuracy. Varying the network's topology results in roughly the same prediction accuracy. Does this mean that the neural network is not smart enough to learn when a cell should lyse or lysogenize?

No

This result suggests that biochemical stochasticity in the firing of reactions is a non-negligible component of this decision-making system. It behaves as a biased coin whose "probability of heads" changes depending on the relevant variables (such as MOI and cell size). If the neural network's predictions were instead highly accurate, it would suggest a relatively limited role for biochemical stochasticity. But this is not what we observe, so we cannot rule out the importance of stochastic biochemical reactions influencing the decision.
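The biased-coin picture can be made concrete. If, for a given set of features, a cell lysogenizes with probability p, then even a perfect predictor that always guesses the more likely fate is right only max(p, 1 - p) of the time. The sketch below checks this by simulation; the probabilities assigned to each MOI are made up for illustration and are not measured values.

```python
import random

def best_possible_accuracy(p_lysogeny):
    """Accuracy of always guessing the more likely outcome of a biased coin."""
    return max(p_lysogeny, 1.0 - p_lysogeny)

def simulated_accuracy(p_lysogeny, n_cells, rng):
    """Flip the biased coin n_cells times and score the best fixed guess."""
    guess_lysogeny = p_lysogeny >= 0.5
    correct = 0
    for _ in range(n_cells):
        lysogenized = rng.random() < p_lysogeny
        if lysogenized == guess_lysogeny:
            correct += 1
    return correct / n_cells

rng = random.Random(1)
p_by_moi = {1: 0.25, 2: 0.5, 3: 0.75}  # hypothetical probabilities of lysogeny
for moi, p in p_by_moi.items():
    ceiling = best_possible_accuracy(p)
    observed = simulated_accuracy(p, 20000, rng)
    print(f"MOI = {moi}: ceiling = {ceiling:.2f}, simulated = {observed:.2f}")
```

In this toy setting, no change of network topology can push accuracy above the ceiling, because the remaining errors come from the coin flip itself and not from the model.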



Using neural networks to identify lysis or lysogeny directly from stochastic trajectory data

In the figure below, we have sampled stochastic trajectories for the detailed model of the phage lambda gene regulatory network. The data is noisy and appears complex. Can we train a neural network to "look" at these plots and determine whether the outcome of each simulation is lysogeny or lysis?

To find out, we train a neural network on the raw stochastic trajectory data from the detailed stochastic simulations (above). The input to the network is a 60-dimensional feature vector whose t-th component is the concentration of a key protein at minute t. The topology of this neural network is 60 neurons in the input layer, 60 in the hidden layer, and 1 in the output layer. Prediction accuracy is approximately 100% on new data, which means it works!



Using neural networks to predict protein secondary structure from primary structure

Can the traditional problem of determining protein secondary structure (folded structure) from the primary structure (sequence of amino acids) be attacked using neural networks? To address this, we analyzed approximately 200,000 sequences culled from the Protein Data Bank (PDB). After analyzing the sequences, we learned that the important predictive features of these sequences are the inter-residue distances (see figure below, each letter is a type of amino acid residue).

For each sequence, we extracted all the inter-residue distances between all pairs of residues in the given sequence. This was then transformed into a 2000-dimensional vector, which served as the input to a neural network having 2000 neurons in the input layer, 2000 neurons in the hidden layer, and 3 neurons in the output layer. The output layer had 3 neurons for predicting 3 classes of secondary structure (alpha helix, beta strand, neither).
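To put the size of this network in perspective, here is a quick count of trainable parameters for the topologies mentioned on this page. The count assumes fully connected layers with one bias per neuron, which the descriptions above do not state explicitly; the function name `count_parameters` is just for illustration.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected feed-forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights between consecutive layers
        total += n_out         # one bias per neuron in the receiving layer
    return total

topologies = {
    "cell-fate (4-6-1)": [4, 6, 1],
    "trajectories (60-60-1)": [60, 60, 1],
    "secondary structure (2000-2000-3)": [2000, 2000, 3],
}
for name, sizes in topologies.items():
    print(f"{name}: {count_parameters(sizes):,} parameters")
# cell-fate (4-6-1): 37 parameters
# trajectories (60-60-1): 3,721 parameters
# secondary structure (2000-2000-3): 4,008,003 parameters
```

Under these assumptions the secondary-structure network has roughly four million parameters, about five orders of magnitude more than the cell-fate network.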

Training this neural network on 5% of the data culled from the PDB gave a prediction accuracy greater than 98% on the remaining 95% of the data, wow!



Fitting output of complex stochastic simulation using neural network

Neural networks are almost like magic. They can learn some very complicated things. Could they learn a complicated mathematical function? To test this possibility, we can build a neural network to "learn" the function:

The neural network is trained on 100 randomly chosen training points and tested on 100 randomly chosen test points. The network has 6 neurons in the input layer (one for each x-variable), 8 neurons in the hidden layer, and 1 neuron in the output layer. The function values are normalized by the max value so that they lie between 0 and 1 and can be matched by the output neuron of the neural network. From the figure, we see that the neural network does a reasonably good job of approximating this function. Different network topologies may give a better fit between the f(x) function and the neural network. Also, this neural network took a long time to converge, possibly because the function to be learned is quite complicated.
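The data preparation step described above can be sketched as follows. The function `f` below is a hypothetical stand-in chosen only because it is non-negative and takes 6 inputs; it is not the function used in the experiment, and the sampling range is likewise an assumption.

```python
import math
import random

def f(x):
    """Hypothetical stand-in target function (non-negative by construction)."""
    return sum(math.sin(xi) ** 2 for xi in x)

def sample_points(n_points, n_vars, rng):
    """Draw random input vectors with each component uniform in [-3, 3]."""
    return [[rng.uniform(-3.0, 3.0) for _ in range(n_vars)] for _ in range(n_points)]

rng = random.Random(0)
train_x = sample_points(100, 6, rng)  # 100 training points, 6 x-variables
test_x = sample_points(100, 6, rng)   # 100 test points

raw_train = [f(x) for x in train_x]
raw_test = [f(x) for x in test_x]

# Divide by the largest value so every target lies in [0, 1], the range
# a sigmoid output neuron can produce.
f_max = max(raw_train + raw_test)
train_y = [v / f_max for v in raw_train]
test_y = [v / f_max for v in raw_test]
```

Dividing by the maximum only maps values into [0, 1] when the function is non-negative; for a function that can go negative, min-max scaling would be the analogous choice.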