Research
I earned my PhD at the University of Oxford in the Department of Statistics, where I worked with Prof. Charlotte Deane MBE. My thesis was about the analysis of protein structures and the protein folding process, and in particular the effect of the biophysics of protein formation on the protein folding pathway. During the PhD, I worked as a Campaign DigiDem and realized that I could use my technical and analytical skills for social good.
Protein folding as a biological process
Proteins fold to their native structure as a result of a complex, dynamic process. Since this process takes place over nanometers and microseconds, direct observations of the folding process are challenging. I worked on developing models which simplify the multidimensional space of protein conformations and allow us to predict trajectories through that space.
Bioinformatics of protein structure prediction
Protein structure prediction has been the focus of intense scientific effort for more than three decades. Over the last ten years, advanced statistical methods, including deep learning, have transformed our ability to predict protein structures. These methods rely on the analysis of large numbers of related protein sequences from many sources. My work in this area has focused on the biophysical relevance of these kinds of analyses and on possible differences in how useful they are for studying protein folding.
Papers
-
Nissley, D. A., Carbery, A., Chonofsky, M., & Deane, C. M. (2021). Ribosome occupancy profiles are conserved between structurally and evolutionarily related yeast domains. Bioinformatics, 37(13), 1853–1859. [abstract] [full text]
-
M. Chonofsky, S.H.P. de Oliveira, K. Krawczyk, C. Deane. The evolution of contact prediction: evidence that contact selection in statistical contact prediction is changing. Bioinformatics, btz16, 6 November 2019. [abstract] [full text]
Motivation: Over the last few years, the field of protein structure prediction has been transformed by increasingly accurate contact prediction software. These methods are based on the detection of coevolutionary relationships between residues from multiple sequence alignments (MSAs). However, despite speculation, there is little evidence of a link between contact prediction and the physico-chemical interactions which drive amino-acid coevolution. Furthermore, existing protocols predict only a fraction of all protein contacts and it is not clear why some contacts are favoured over others. Using a dataset of 863 protein domains, we assessed the physico-chemical interactions of contacts predicted by CCMpred, MetaPSICOV and DNCON2, as examples of direct coupling analysis, meta-prediction and deep learning.
Results: We considered correctly predicted contacts and compared their properties against the protein contacts that were not predicted. Predicted contacts tend to form more bonds than non-predicted contacts, which suggests these contacts may be more important than contacts that were not predicted. Comparing the contacts predicted by each method, we found that MetaPSICOV and DNCON2 favour accuracy, whereas CCMpred detects contacts with more bonds. This suggests that the push for higher accuracy may lead to a loss of physico-chemically important contacts. These results underscore the connection between protein physico-chemistry and the coevolutionary couplings that can be derived from MSAs. This relationship is likely to be relevant to protein structure prediction and functional analysis of protein structure and may be key to understanding their utility for different problems in structural biology.
Availability and implementation: We use publicly available databases. Our code is available for download at [https://opig.stats.ox.ac.uk/].
-
A.K. Drukier, Ch. Cantor, M. Chonofsky, G.M. Church, R.L. Fagaly, K. Freese, A. Lopez, T. Sano, C. Savage, W.P. Wong, New class of biological detectors for WIMPs, Int. J. Mod. Phys. A, 29 (19), 1443007 (2014). [abstract] [pdf]
Weakly Interacting Massive Particles (WIMPs) may constitute a large fraction of the matter in the Universe. There are excess events in the data of DAMA/LIBRA, CoGeNT, CRESST-II, and recently CDMS-Si, which could be consistent with WIMP masses of approximately 10 GeV/c². However, for M_DM > 10 GeV/c² null results of the CDMS-Ge, XENON, and LUX detectors may be in tension with the potential detections for certain dark matter scenarios and assuming a certain light response. We propose the use of a new class of biological dark matter (DM) detectors to further examine this light dark matter hypothesis, taking advantage of new signatures with low atomic number targets. Two types of biological DM detectors are discussed here: DNA-based detectors and enzymatic reactions (ER) based detectors. In the case of DNA-based detectors, we discuss a new implementation. In the case of ER detectors, there are four crucial phases of the detection process: a) change of state due to energy deposited by a particle; b) amplification due to the release of energy derived from the action of an enzyme on its substrate; c) sustainable but non-explosive enzymatic reaction; d) self-termination due to the denaturation of the enzyme, when the temperature is raised. This paper provides information on how to design and optimize these four processes.
-
Downes DJ, Chonofsky M, Tan K, Pfannenstiel BT, Reck-Peterson SL, Todd RB. (2014) Characterization of the Mutagenic Spectrum of 4-Nitroquinoline 1-Oxide (4-NQO) in Aspergillus nidulans by Whole Genome Sequencing. G3 4: 2483-2492. [abstract] [full text]
4-Nitroquinoline 1-oxide (4-NQO) is a highly carcinogenic chemical that induces mutations in bacteria, fungi, and animals through the formation of bulky purine adducts. 4-NQO has been used as a mutagen for genetic screens and in both the study of DNA damage and DNA repair. In the model eukaryote Aspergillus nidulans, 4-NQO-based genetic screens have been used to study diverse processes, including gene regulation, mitosis, metabolism, organelle transport, and septation. Early work during the 1970s using bacterial and yeast mutation tester strains concluded that 4-NQO was a guanine-specific mutagen. However, these strains were limited in their ability to determine full mutagenic potential, as they could not identify mutations at multiple sites, unlinked suppressor mutations, or G:C to C:G transversions. We have now used a whole genome resequencing approach with mutant strains generated from two independent genetic screens to determine the full mutagenic spectrum of 4-NQO in A. nidulans. Analysis of 3994 mutations from 38 mutant strains reveals that 4-NQO induces substitutions in both guanine and adenine residues, although with a 19-fold preference for guanine. We found no association between mutation load and mutagen dose and observed no sequence bias in the residues flanking the mutated purine base. The mutations were distributed randomly throughout most of the genome. Our data provide new evidence that 4-NQO can potentially target all base pairs. Furthermore, we predict that current practices for 4-NQO-induced mutagenesis are sufficient to reach gene saturation for genetic screens with feasible identification of causative mutations via whole genome resequencing.
-
Tan K, Roberts AJ, Chonofsky M, Egan MJ, Reck-Peterson SL. (2014) A microscopy-based screen employing multiplex genome sequencing identifies cargo-specific requirements for dynein velocity. Mol. Biol. Cell 25: 669-678. [abstract] [full text]
The timely delivery of membranous organelles and macromolecules to specific locations within the majority of eukaryotic cells depends on microtubule-based transport. Here we describe a screening method to identify mutations that have a critical effect on intracellular transport and its regulation using mutagenesis, multicolor-fluorescence microscopy, and multiplex genome sequencing. This screen exploits the filamentous fungus Aspergillus nidulans, which has many of the advantages of yeast molecular genetics but uses long-range microtubule-based transport in a manner more similar to metazoan cells. Using this method, we identified seven mutants that represent novel alleles of components of the intracellular transport machinery: specifically, kinesin-1, cytoplasmic dynein, and the dynein regulators Lis1 and dynactin. The two dynein mutations identified in our screen map to dynein’s AAA+ catalytic core. Single-molecule studies reveal that both mutations reduce dynein’s velocity in vitro. In vivo these mutants severely impair the distribution and velocity of endosomes, a known dynein cargo. In contrast, another dynein cargo, the nucleus, is positioned normally in these mutants. These results reveal that different dynein functions have distinct stringencies for motor performance.
Conference proceedings and talks
-
Tan K, Roberts A, Egan M, Chonofsky M, Reck-Peterson SL. Whole-genome sequencing identifies novel alleles of genes required for organelle distribution and motility in Aspergillus nidulans. 27th Fungal Genetics Conference, March 2013. [abstract]
Many organelles are transported long distances along microtubules in eukaryotic organisms by dynein and kinesin motors. To identify novel alleles and genes required for microtubule-based transport, we performed a genetic screen in the filamentous fungus, Aspergillus nidulans. We fluorescently labeled three different organelle populations known to be cargo of dynein and kinesin in Aspergillus: nuclei, endosomes, and peroxisomes. We then used a fluorescence microscopy-based screen to identify mutants with defects in the distribution or motility of these organelles. Using whole-genome sequencing, we found a number of single nucleotide polymorphisms (SNPs) that resulted in misdistribution of peroxisomes, endosomes, or nuclei. Some of these SNPs were novel alleles of cytoplasmic dynein/nudA, Arp1/nudK (dynactin), Lis1/nudF, and kinesin-1/kinA. Here, we characterize the in vivo transport defects in these novel mutants and analyze the single molecule in vitro motility properties of purified mutant motor proteins. We also describe our methods for using whole genome sequencing as a tool in mutagenesis studies in A. nidulans.
-
Reck-Peterson S, Tan K, Egan M, Chonofsky M, Roberts A. Novel alleles of genes required for organelle distribution and motility in Aspergillus nidulans: a whole-genome sequencing approach. ASCB 2012. [abstract]
Many organelles are transported long distances along microtubules in most eukaryotic organisms. To identify novel factors required for microtubule-based organelle distribution and motility, we performed a screen in the filamentous fungus, Aspergillus nidulans. As a model system, A. nidulans combines some of the advantages of yeast molecular genetics with the long-range microtubule-based transport of metazoans. Using whole-genome sequencing, we found a number of single nucleotide polymorphisms (SNPs) that resulted in misdistribution of peroxisomes, endosomes, or nuclei. Some of these SNPs were novel alleles of cytoplasmic dynein, dynactin, Lis1, and kinesin-1. Here, we characterize the transport defects in these novel mutants and report methods for using whole genome sequencing as a tool in mutagenesis studies in A. nidulans.
-
Chonofsky, M. A new clustering algorithm to improve motor protein tracking. Clare Research Symposium, March 2012. [abstract]
Cytoplasmic dynein is a dimeric motor protein which is active in cell division, transport processes (including viral transport), cilia, and other contexts. When studied by TIRF microscopy, the data present an interesting statistical problem: noisy data must be assigned to discrete steps. While reminiscent of planar k-means clustering (which is NP-hard), the strictly time-bound nature of the data presents difficulties for standard clustering algorithms. I present a stochastic steepest-descent algorithm for this problem, outlining its utility in theory and practice. I will present some simulation results, and then demonstrate the application of the algorithm to dynein stepping data.
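To make the step-assignment problem concrete, the following is a minimal Python sketch of a stochastic greedy descent over change-point positions. It is an illustration only and is not the algorithm presented in the talk: the function names (`fit_steps`, `segment_cost`), the squared-error cost, the fixed number of change points, and the one-sample random moves are all assumptions made for this example. It does respect the time-ordered structure of the data, because every cluster is a contiguous segment of the trace.

```python
import random


def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean over data[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def fit_steps(data, n_changes, n_iter=5000, seed=0):
    """Place n_changes change points in a time-ordered trace (hypothetical helper).

    Each iteration moves one randomly chosen change point by one sample and
    keeps the move only if the total squared error decreases.
    """
    n = len(data)
    if n < n_changes + 1:
        raise ValueError("need at least one sample per segment")
    rng = random.Random(seed)

    # Prefix sums make each segment cost an O(1) lookup.
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def total_cost(bounds):
        return sum(
            segment_cost(prefix, prefix_sq, a, b)
            for a, b in zip(bounds, bounds[1:])
        )

    # Start from evenly spaced boundaries; bounds[0] = 0 and bounds[-1] = n are fixed.
    bounds = [i * n // (n_changes + 1) for i in range(n_changes + 2)]
    cost = total_cost(bounds)

    for _ in range(n_iter):
        i = rng.randint(1, n_changes)            # pick an interior boundary
        candidate = bounds[i] + rng.choice((-1, 1))
        if not bounds[i - 1] < candidate < bounds[i + 1]:
            continue                             # segments must stay non-empty
        trial = bounds[:i] + [candidate] + bounds[i + 1:]
        trial_cost = total_cost(trial)
        if trial_cost < cost:                    # accept only downhill moves
            bounds, cost = trial, trial_cost

    return bounds[1:-1], cost
```

Calling `fit_steps(trace, n_changes=2)` returns the indices at which a new step begins together with the residual squared error. Because only downhill moves are accepted, a sketch like this can stall in a local minimum on noisy data, so in practice one would expect to need restarts or larger moves.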
-
Chonofsky, M. The phylogenetic SOWH test: simulation assessment and a new alternative. Clare Research Symposium, March 2011. [abstract]
Finding the best evolutionary history (or phylogenetic tree) of a set of organisms is a complex task due to the combinatorial structure of the problem. Small changes in branch pattern can substantially alter how well the tree fits the data, and the parameter space is both discrete and multidimensional. Hence, it is computationally prohibitive to fully describe the distribution of support for trees. Phylogenetic trees are used in taxonomy, epidemiology, and many other fields, so it is important to be able to find the best possible phylogenies. My work uses computer simulation to assess confidence in phylogenetic trees. Given some data and a single “best” phylogenetic tree, how likely is it that another tree explains the data just as well? I will present new simulation results about a particular parametric bootstrap test of phylogeny, including an improved method for estimating tree confidence.
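The general shape of a parametric bootstrap test can be sketched in a few lines of Python. This is a generic illustration and not the improved method from the talk: the function name and its arguments are hypothetical, and the phylogeny-specific parts are left to the caller. In a SOWH-style test, `simulate_null` would simulate an alignment on the hypothesised tree and `statistic` would return the log-likelihood difference between the best unconstrained tree and the hypothesised tree.

```python
import random
from typing import Callable, TypeVar

Data = TypeVar("Data")


def parametric_bootstrap_p_value(
    observed_stat: float,
    simulate_null: Callable[[random.Random], Data],
    statistic: Callable[[Data], float],
    n_replicates: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate a one-sided p-value by simulating data under the null model.

    simulate_null and statistic are supplied by the caller (assumed
    interfaces for this sketch).
    """
    rng = random.Random(seed)
    at_least_as_extreme = sum(
        1
        for _ in range(n_replicates)
        if statistic(simulate_null(rng)) >= observed_stat
    )
    # Count the observed data set as one replicate so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_replicates + 1)
```

A small p-value means that data simulated under the hypothesised tree rarely produce a statistic as large as the one observed, which counts as evidence against that tree.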
-
Rai, H., Chonofsky, M., Kelch, D., Cronn, R., Parks, M., Nagalingum, N., et al. Generic And Familial Relationships Within Conifers Inferred From Nuclear Data. The annual conference of the Botanical Society of America, July 2009. [abstract]
Extant conifers (Coniferae) comprise approximately 700 species in 70 genera. Currently, systematists recognize 6-8 conifer families. Independent lines of evidence have provided support for the monophyly of Pinaceae and of Cupressophyta (all other extant conifers). If gnetophytes are excluded from conifers, these two clades are sister taxa. Within cupressophytes, a close relationship between the southern hemisphere Araucariaceae and Podocarpaceae has been supported, as has the recircumscription of Taxodiaceae, now included in Cupressaceae. However, uncertainty remains regarding the relative positions of Cephalotaxaceae, Cupressaceae, and Taxaceae. Moreover, a robust test of familial relationships inferred from plastid data is lacking, and in many cases, generic relationships remain to be elucidated. We are surveying multiple, independently evolving phytochrome loci and associated intronic regions from a large number of genera representing all 8 conifer families. In initial trees inferred from incomplete phytochrome matrices, Cephalotaxaceae are sister to, rather than nested in Taxaceae, and within Podocarpaceae, Prumnopitys is the sister group of a clade that includes Parasitaxus as the sister group of Manoao/Lagarostrobos.