Notes on SARS-CoV-2 and phylodynamics: Software, Papers, links, numbers, questions

Started 2020-03-24.

Software for phylodynamics: Inference

BEAST2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. Many people (including me) have written packages that handle particular types of analysis. Some (though not mine) focus on infectious diseases. I got the BEAST2 ones from here.

BEAST2 packages

BADTRIP. Infer transmission time for non-haplotype data and epi data

SCOTTI. Structured COalescent Transmission Tree Inference

bdmm. Multitype birth-death model (aka birth-death-migration model)

BDSKY. Birth-death skyline. Handles serially sampled tips, piecewise-constant rate changes through time, and sampled ancestors.

EpiInf. BD/SIR/SIS epidemic trajectory inference.

PhyDyn. Epidemiological modelling with BEAST

phylodynamics. BDSIR and Stochastic Coalescent

BASTA. Bayesian structured coalescent approximation

CoalRe. Infer viral reassortment networks

BEAST v1 models

There is also beastlier which is implemented in BEAST v1. Github link.

Others

TransPhylo R package

PHYLOSCANNER R package. Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Molecular Biology and Evolution.

Phybreak R package. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

TreeFix-TP. TreeFix-TP: Phylogenetic Error-Correction for Infectious Disease Transmission Network Inference

SLAFEEL. Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases

No doubt there is other software...

Software for phylodynamics: Simulations

FAVITES. Simultaneous simulation of transmission networks, phylogenetic trees and sequences

Software for phylodynamics: Visualisation and analysis

The previously mentioned R package PHYLOSCANNER may be useful.

About the virus

Mutation rate

Based on figures for the first SARS, Andrew Rambaut's preliminary look, and a genome of about 30000 sites, I reckon on somewhat less than one mutation per genome per week, or roughly one within each host. (0.0015 per site per year gives 30000*0.0015/50 = 0.9 per genome per week, taking a year as roughly 50 weeks; 0.0015 is on the high side of the estimates.)
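The arithmetic above can be checked with a few lines of Python. This is only a sketch of the back-of-envelope calculation: the function name and constants are my own, the genome length of 30000 is the rounded figure used in the note, and the second call shows what changes if a year is taken as 52 weeks rather than 50.

```python
GENOME_SITES = 30_000       # rounded genome length used in the note
WEEKS_PER_YEAR_ROUGH = 50   # the note's rounding; 52 is closer


def per_genome_per_week(rate_per_site_per_year,
                        sites=GENOME_SITES,
                        weeks_per_year=WEEKS_PER_YEAR_ROUGH):
    """Convert a per-site, per-year rate into a per-genome, per-week rate."""
    return rate_per_site_per_year * sites / weeks_per_year


# The note's calculation, with a 50-week year.
rough = per_genome_per_week(0.0015)

# The same rate with a 52-week year, for comparison.
closer = per_genome_per_week(0.0015, weeks_per_year=52)

print(f"50-week year: {rough:.2f} per genome per week")
print(f"52-week year: {closer:.2f} per genome per week")
```

Either way the figure stays a little below one mutation per genome per week, so the rounding does not change the conclusion.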

Direct RNA sequencing and early evolution of SARS-CoV-2, George Taiaroa et al, puts it at 0.0012 substitutions/site/year (95% HPD 0.00063 to 0.0017)

Phylodynamic analyses based on 128 sequences, Tanja Stadler et al, has two estimates, for two different models, both near 0.0007.

Temporal signal and the phylodynamic threshold of SARS-CoV-2, Sebastian Duchene, Leo Featherstone, Melina Haritopoulou-Sinanidou, Andrew Rambaut, Philippe Lemey, and Guy Baele, 2020. "Analyses of subsequent data sets, that included between 47 to 122 genomes, converged at an evolutionary rate of about 1.1 x 10^-3 subs/site/year and a time of origin of around late November 2019. Our study provides guidelines to assess the phylodynamic threshold and demonstrates that establishing this threshold constitutes a fundamental step for understanding the power and limitations of early data in outbreak genome surveillance."
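To compare the published estimates listed in this section on a common scale, the sketch below converts each into expected substitutions per genome per year and the implied average number of days between substitutions along a lineage. The rates are copied from the notes above (the Stadler et al. value is the approximate "near 0.0007" figure); the labels, function names, and the 30000-site genome length are assumptions for illustration.

```python
GENOME_SITES = 30_000   # rounded genome length, an assumption
DAYS_PER_YEAR = 365

# Rates in substitutions/site/year, as quoted in the notes above.
ESTIMATES = {
    "Taiaroa et al. (lower 95% HPD)": 0.00063,
    "Stadler et al. (approximate)": 0.0007,
    "Duchene et al.": 0.0011,
    "Taiaroa et al. (point estimate)": 0.0012,
    "Taiaroa et al. (upper 95% HPD)": 0.0017,
}


def subs_per_genome_per_year(rate, sites=GENOME_SITES):
    """Expected substitutions across the whole genome in one year."""
    return rate * sites


def days_between_subs(rate, sites=GENOME_SITES):
    """Average waiting time, in days, between substitutions on a lineage."""
    return DAYS_PER_YEAR / subs_per_genome_per_year(rate, sites)


for label, rate in ESTIMATES.items():
    print(f"{label:32s} {subs_per_genome_per_year(rate):5.1f} subs/genome/year, "
          f"one every {days_between_subs(rate):4.1f} days")
```

On these numbers the estimates differ by a factor of two to three, which is worth bearing in mind when reading anything into one or two mutations between samples.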

Replication fidelity

Coronaviruses: An RNA proofreading machine regulates replication fidelity and diversity has some useful information about murine coronavirus, but I haven't got an absolute value.

Coronaviruses as DNA Wannabes: A New Model for the Regulation of RNA Virus Replication Fidelity

Thinking Outside the Triangle: Replication Fidelity of the Largest RNA Viruses. If I read Fig 2 correctly, the number of errors per nucleotide per replication cycle is between 1e-6 and 1e-7 for coronaviruses.
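If that reading of the figure is right, the per-nucleotide range translates into a per-genome figure as sketched below. The code assumes a genome of 30000 sites and that errors occur independently at each site with equal probability, which is a simplification of mine rather than anything taken from the paper; the function names are hypothetical.

```python
GENOME_SITES = 30_000   # rounded genome length, an assumption


def errors_per_genome(error_rate_per_nt, sites=GENOME_SITES):
    """Expected number of errors in one copy of the genome."""
    return error_rate_per_nt * sites


def prob_error_free_copy(error_rate_per_nt, sites=GENOME_SITES):
    """Probability that one copy has no errors, assuming independent sites."""
    return (1.0 - error_rate_per_nt) ** sites


# The two ends of the range read off the figure.
for rate in (1e-6, 1e-7):
    print(f"rate {rate:.0e}: {errors_per_genome(rate):.3f} errors per copy, "
          f"P(error-free copy) = {prob_error_free_copy(rate):.4f}")
```

Under these assumptions the large majority of genome copies carry no new error at all, at either end of the range.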

Mutation biases

The divergence between SARS-CoV-2 and RaTG13 might be overestimated due to the extensive RNA modification, Yue Li, Xinai Yang, Na Wang, Haiyan Wang, Bin Yin, Xiaoping Yang & Wenqing Jiang.

Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2, Salvatore Di Giorgio, Filippo Martignano, Maria Gabriella Torcia, Giorgio Mattiuz, Silvestro G. Conticello.

Evidence for strong mutation bias towards, and selection against, T/U content in SARS-CoV2: implications for attenuated vaccine design, Alan M Rice, Atahualpa Castillo Morales, Alexander T Ho, Christine Mordstein, Stefanie Mühlhausen, Samir Watson, Laura Cano, Bethan Young, Grzegorz Kudla, Laurence D. Hurst.

Issues with SARS-CoV-2 sequencing data, Nicola De Maio, Conor Walker, Rui Borges, Lukas Weilguny, Greg Slodkowicz, Nick Goldman.

Rampant C to U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses – causes and consequences for their short and long evolutionary trajectories, Peter Simmonds.

Longitudinal peripheral blood transcriptional analysis of COVID-19 patients captures disease progression and reveals potential biomarkers, Qihong Yan et al. "Nevertheless, there was a significant increase in type I interferon-stimulated genes (ISGs) in S1, including ISG15, ISG20, TRIM5, TRIM25, and APOBEC3A, IFN-induced GTP-binding protein (MX1 and MX2), 2'-5'-oligoadenylate synthase (OAS1, OAS2, OAS3, and OASL), and interferon-inducible protein (IFI6, IFI27, IFI35, and IFITM3)." (my emphasis.)

Generation time

How long does it take from a virus particle entering a cell to its progeny entering other cells? I don't know.

Recombination rate

This is probably high. This article has the following abstract.

Mouse hepatitis virus (MHV), a coronavirus, has been shown to undergo a high frequency of RNA recombination both in tissue culture and in animal infection. So far, RNA recombination has been demonstrated only between genomic RNAs of two coinfecting viruses. To understand the mechanism of RNA recombination and to further explore the potential of RNA recombination, we studied whether recombination could occur between a replicating MHV RNA and transfected RNA fragments. We first used RNA fragments which represented the 5' end of genomic-sense sequences of MHV RNA for transfection. By using polymerase chain reaction amplification with two specific primers, we were able to detect recombinant RNAs which incorporated the transfected fragment into the 5' end of the viral RNA in the infected cells. Surprisingly, even the anti-genomic-sense RNA fragments complementary to the 5' end of MHV genomic RNA could also recombine with the MHV genomic RNAs. This observation suggests that RNA recombination can occur during both positive- and negative-strand RNA synthesis. Furthermore, the recombinant RNAs could be detected in the virion released from the infected cells even after several passages of virus in tissue culture cells, indicating that these recombinant RNAs represented functional virion RNAs. The crossover sites of these recombinants were detected throughout the transfected RNA fragments. However, when an RNA fragment with a nine-nucleotide (CUUUAUAAA) deletion immediately downstream of a pentanucleotide (UCUAA) repeat sequence in the leader RNA was transfected into MHV-infected cells, most of the recombinants between this RNA and the MHV genome contained crossover sites near this pentanucleotide repeat sequence. In contrast, when exogenous RNAs with the intact nine-nucleotide sequence were used in similar experiments, the crossover sites of recombinants in viral genomic RNA could be detected at more-downstream sites. 
This study demonstrated that recombination can occur between replicating MHV RNAs and RNA fragments which do not replicate, suggesting the potential of RNA recombination for genetic engineering.

Transmission time

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing, Ferretti et al, has information about the distribution of transmission times.

General links

There is an excellent podcast called TWIV (This Week in Virology). The episodes are long conversations or interviews; some require quite a bit of background, and some answer emails from the general public. There are many hours about COVID-19.

Half-hour talk on the mathematics of the Corona outbreak by Tom Britton. Only needs school maths.

If you are a programmer and want to help, these may interest you.
Biohackathons
Kaggle

More generally, there is
Crowdfight COVID-19

You can contribute your computer's time here, to help understand the virus proteins.
Folding at home