12,399 views

Gro-Tsen

@gro_tsen

, 44 tweets, 9 min read

My Authors

https://twitter.com/gro_tsen/status/1258346734029340672

https://twitter.com/gro_tsen/status/1258346734029340672

Let me say a bit more about these curves modelling the attack rate as a function of the variance of infectious contacts, for a given reproduction number (here R₀=2.5), and explain how they were computed. •1/44

https://twitter.com/gro_tsen/status/1258346734029340672

This is a very simple graph (for some reason people prefer “network”) based epidemiological model. No time involved! We just have a graph of “infectious contacts”, start with an infected vertex and propagate forward: the attack rate is the proportion that gets infected. •2/44

The graph itself can be directed (=oriented) or undirected. The blue curve is for a directed graph, the red one for an undirected one, I'll get back to this. The graph is computed using a random graph model which specifies the distribution of degrees. •3/44

I'm not sure how this random graph model should be called, but the details aren't terribly important, what matters is that one can specify the degree distribution and that there aren't things like degree-degree correlation or clustering. Here's a VAGUE account: … •4/44

Basically, in the undirected case, take a large number of vertices, draw the degree of each vertex at random following the specified distribution, place as many “stubs” around each vertex as its degree should have, then create the edges by repeatedly connecting stubs: … •5/44

… choose two stubs at random, create an edge between the vertices in question replacing the stubs, and repeat until there are no more stubs. This is sometimes called the (Bollobás) “configuration model” or Molloy-Reed model. en.wikipedia.org/wiki/Configura… •6/44

The directed case is essentially the same except that we do draw in-degrees and out-degrees following two different distributions having the same mean, and we use in-stubs and out-stubs. I don't know if this random graph construction has a name! •7/44

But again, I'm being sketchy on details here because there's probably a wide range of random graph models for which the calculation will apply. For the directed case we don't even care about the out-degree distribution, only the in-degree distribution: … •8/44

… so we can take the out-degrees to be Poisson-distributed (and construct the graph by ignoring out-stubs completely, just pick an in-stub and create an edge from it to a uniformly chosen random vertex). See threadreaderapp.com/thread/1252581… for more on this. •9/44

ANYWAY, given a distribution of degrees, which is all that will matter, we care about the expected size of the giant connected component of the random graph: for the undirected case this has the pretty straightforward meaning; … •10/44

… whereas for the directed case it will be the size of a giant out-component (the “out-component” or “down-set” of a vertex x being the set of vertices which can be reached from x), meaning the size of the out-component for a vertex x such that it is giant. •11/44

The reason why we care about the giant connected component is that we are assuming the epidemic actually spreads: we care about how many people it hits, its attack rate, which is the relative size of this giant component (in proportion of all vertices). •12/44

Now this may seem tremendously complicated! Choose a distribution of degrees, construct a random graph by a process that I didn't fully describe, take the giant connected component that I didn't accurately define, and take its expected size! How do we do this? •13/44

Well, it turns out we don't. There's a formula for the expected size of this component in function of the distribution of degrees, and this formula is all I use. I never actually constructed any graphs. So what's this formula? •14/44

First recall that if p_i is a probability distribution on ℕ (meaning p₀,p₁,p₂,… are nonnegative reals with sum 1, where p_i is ℙ(X=i)), recall that the generating function G(z) of this distribution is ∑_i p_i·z^i (a power series of z). •15/44

Equivalently, it is the function which takes z to the expected value 𝔼(z^X) of z^X where X has the distribution in question. Clearly, G is nonnegative, increasing, satisfies G(1)=1, and G′(1) = ∑_i p_i · i is the expected value 𝔼(X) of X. •16/44

Next, recall what a Galton-Watson process is: based on a probability distribution p_i as above, construct a tree by starting with the root and, for each node independently and recursively, give it a number of children selected from the given distribution. •17/44

The “extinction probability” of the G-W process is the probability that the tree thus constructed is finite (the process dies out in a finite number of steps). There is a simple formula for this: it is the smallest fixed point of G, namely, the smallest u s.t. G(u)=u. •18/44

Here's a handwavy explanation for this: if u is the probability that the G-W process terminates (starting from a single node), the probability that it terminates starting from k nodes is u^k, but a single node has probability p_k of having k children: … •19/44

… so, dividing according to the number of nodes i that the root node has, we have u = ∑_i p_i·u^i, meaning u=G(u). And u certainly isn't the largest fixed point of G (that's 1), and my G's will only have two fixed points anyway, so that's good enough. •20/44

To compute the attack rate (=size of the giant out-component) in our directed graph, the formula is then this: it is equal to the non-extinction probability (1−u) of the G-W process based on the in-degree distribution on which the graph was constructed. •21/44

The intuitive explanation for this is that to determine whether a random vertex was reached by the epidemic, we trace BACK from that vertex: we look at the nodes that could have infected it, then those that could have infected those, and so on: … •22/44

… so, given sufficiently many nodes, this behaves like a G-W process based on the distribution of in-degrees, and this process will nonterminate iff the node x is infected by the epidemic. This is what I tried to explain in threadreaderapp.com/thread/1252581… (mentioned above). •23/44

Again, lots of handwaving here on my part, but if you want a precise theorem, it's theorem 4 in the paper “The Strong Giant in a Random Digraph” by Mathew Penrose, arxiv.org/abs/1409.4371 •24/44

So anyway, in the directed graph case, the attack rate of the epidemic is 1−u where u is the smallest fixed point of the generating function G for the distribution of IN-degrees used to construct the graph. •25/44

Now what about the undirected case? Well, the formula is a tiny bit more complicated (but it's also more standard, because undirected random graphs have been more studied): let G₁(z) := G′(z)/G′(1) where G′(z) = ∑_i i·p_i·z^{i−1} is the derivative of G. •26/44

Then the (expected) relative size of the giant connected component of the random graph is 1−G(u) where u is the smallest fixed point of G₁ this time. A very sketchy intuitive explanation is this: again, we follow an exploration process of the G-W kind, … •27/44

… but this time, when we follow an edge, we don't arrive at some uniform random vertex but by an one chosen with probability proportional to the number of edges leading there! And that's exactly what G₁ does (multiply p_i by i and renormalize). •28/44

Again, this is super super sketchy, and I left out some crucial hypotheses (supercriticity!), but those who want a precise theorem and proof will find it in “The Size of the Giant Component of a Random Graph with a Given Degree Sequence” by Molloy and Reed, theorem 1, … •29/44

… and the Molloy–Reed theorem isn't even obvious to decode into what I wrote: see “Random graphs with arbitrary degree distributions and their applications” by Newman, Strogatz and Watts (‌arxiv.org/abs/cond-mat/0…‌) for some explanations. •30/44

But anyway, these are the formulas I'll be using: in the directed case, 1−u where u=G(u), and in the undirected case, 1−G(u) where u=G₁(u) with G₁(z):=G′(z)/G′(1). But what distributions do we take? •31/44

Well, one very important, in a sense “universal” or “unbiased”, distribution of degrees, is Poisson, namely p_i = exp(−κ) · κ^i/i!. This is what you get if you just create edges between random vertices until the desired average degree. Its variance equals its mean, κ. •32/44

For the Poisson distribution, the generating function is G(z) = exp(−κ(1−z)). Note that we also have G₁(z)=G(z). The smallest fixed point of G is −W(−κ·exp(−κ))/κ where W is Lambert's W function (W(z) is the largest solution of w·exp(w)=z for z real). •33/44

I already discussed this value in the thread threadreaderapp.com/thread/1236324… as the attack rate for the classical SIR model. This is expected: if we run SIR on a complete graph, contagions will occur at random and contagion in-degree distribution will be Poisson. •34/44

Another natural distribution is geometric (=discrete exponential): p_i = κ^i/((κ+1)^{i+1}) for mean κ and variance κ(κ+1). (This occurs as the distribution of OUT-degrees in the context of the previous tweet, for example, given recovery in exponential process.) •35/44

For the geometric distibution, generating function is G(z) = 1/(κ+1−κz) and its smallest fixed point is u=1/κ. This 1−u = 1 − 1/κ that we get here coincides with the herd immunity threshold of classical SIR. There's probably a reason for this, but I couldn't find it! •36/44

As a sequence of distributions that naturally includes Poisson and geometric as special or limit cases, I chose the binomial and negative binomial (Pólya) distributions: binomial has G(z) = ((n−κ+κz)/n)^n where κ is mean and n = κ²/(κ−σ²) for variance σ². •37/44

While negative binomial has G(z) = (r/(r+κ+κz))^r where κ is mean and r = κ²/(σ²−κ) for variance σ². Both tend to Poisson when σ² → κ; negative binomial has geometric as a special case when r=1 (that's σ²=κ²+κ). •38/44

(There's something a little weird, here: the binomial distribution is meaningless, as a probability distro, unless n is an integer. So my curves should be dotted in the left-hand region. But the formulas still make perfect sense in general and give a reasonable value. … •39/44

… But that's not really important anyway, because I care mostly about the right-hand region of the curves, where the distribution is negative-binomial, and that one makes sense even if r isn't an integer. The lack of symmetry bugs me, but never mind.) •40/44

So my curves are as follows: the blue one gives 1− the smallest fixed point, u, of the function G equal to G(z) = (r/(r+κ+κz))^r for r = κ²/(σ²−κ) when σ²>κ, or to G(z) = ((n−κ+κz)/n)^n for n = κ²/(κ−σ²) when σ²<κ, with G(z) = exp(−κ(1−z)) as limit case. •41/44

And the red curve is 1−G(u) where u is the smallest fixed point of G₁(z) = (r/(r+κ+κz))^{r+1} or G₁(z) = ((n−κ+κz)/n)^{n−1} or G₁(z) = exp(−κ(1−z)), in these respective cases. Both are plotted in function of σ (the square root of the variance). •42/44

Again, the Sage code is here, and it really isn't very complicated: gist.github.com/Gro-Tsen/a74fe… — it just does what the previous two tweets say. •43/44

What does it all mean? Well, writing this thread has exhausted me, so I'll stop here for the moment and maybe later try to comment on how these curves should or shouldn't be read. But I have no idea how large we can expect the variance to be! •44/44

Enjoying this thread?

Try unrolling a thread yourself!

Enjoying this thread?

Try unrolling a thread yourself!

More from @gro_tsen see all

Embed code for your website

Did Thread Reader help you today?