Interpreting Covid-19 Test Results

Tags: Covid-19, Medical tests, Conditional Probability, Desmos

(Also available in Spanish: “Interpretación de resultados de pruebas de laboratorio de Covid-19”, i.e. “Interpreting Covid-19 laboratory test results”)

Ever since the Covid-19 pandemic started, the subject of interpreting Covid-19 test results has kept coming back to bite me.

I knew that the probability of a Covid-19 test result giving correct information depends on the portion of the population that is infected at that moment, but I did not know the details or understand why this should be the case. Every time I tried to read about it, I ended up with articles full of tables of made-up numbers that did not help me gain any clarity or understanding.

I finally found an AMS Feature Column by Bill Casselman that got me a bit closer to understanding. In this post I give a very graphical explanation using probability trees, provide some extra details, and show some interactive plots I made to play around with the ideas in the column.

Before going any further, I need to establish some terminology.

Some terminology

  • A Covid-19 test may give a positive result when the person is not infected. This is called a false positive.
  • Similarly, a Covid-19 test may give a negative result when the person is in fact infected. This is called a false negative.
  • The two remaining outcomes are when the test tells the truth. These are the true positive and true negative.

Which of these four possible outcomes of a Covid-19 test matters most depends on who you are and what you are worried about. From a societal perspective, false negatives are probably the most serious: they make people think they are not infected when they in fact are, and so increase the chances of them spreading the disease.

Don’t believe a test saying “Guaranteed to have less than 1% false negatives!”

A natural assumption is to believe that the percentage of each of these cases (false negatives, true positives, etc.) depends on the quality of the test:

  • a high quality test has few false negatives, and few false positives.
  • a low quality test has more false negatives, and more false positives.

Now, the thing is, this is not entirely true! The percentage of test results that fall into each case depends on the population being tested and on the moment in time the testing is done. One cannot look at the specification sheet of a PCR Covid-19 test and read “Guaranteed to have less than 1% false negatives!”. A statement like this simply cannot be made.

Wrapping one’s head around this fact, and understanding why it holds, is the subject of the AMS Column by Bill Casselman and of this blog post.

I will try my best to be self-contained in what follows, so there is no need to read the column before going on. Hopefully the column will make even more sense after reading this!

Before moving on, here are the cool applets I made!

The applet below gives the chances that a test result is actually true (true negative, or true positive).

How it works, and what it says:

  • Move the point $p$ (on the horizontal axis) to specify the “amount of Covid-19 around you”, as a percentage (which we identify here with the “percentage of infected population”; later we will make this more precise).
  • The vertical axis tells you what the chances are of a test result being actually true (true negative, or true positive) given the value of $p$ you chose.
  • The “quality” of the test is determined by the values of $a$ and $b$ which you can adjust with the sliders. These are also numbers between $0\%$ and $100\%$ and have fancy names:
    • $a$: the sensitivity of the test.
    • $b$: the specificity of the test.

I’ll explain what sensitivity and specificity mean below; they are the numbers that you do find on the specification sheet of a Covid-19 test!

And here is its complementary applet, about false test results:

The applet below shows the chances of a test result being false (false negative, or false positive). Move the point $p$, and change the quality of the test by changing $a$ and $b$!

Here they are together (all true and false possible outcomes), in all their messy glory:

How are these applets generated?

The short answer is that I created them in Desmos, building on what Bill Casselman explains in his AMS Column. A detailed explanation with lots of pictures follows below!

Some background - probability trees

Probabilities multiply, and it is often useful to visualize this in trees.

Here is an example: say I draw a card from a standard 52-card deck and then, without putting the first one back, draw a second one. What is the probability that I get a heart and then another heart? To answer this, it helps to think about the two steps separately.

There are 4 possible outcomes in total:

  • heart then heart
  • heart then not heart
  • not heart then heart, and
  • not heart and then not heart

These four outcomes show up at the far right of the tree diagram below:

So, what is the chance of each outcome? The nice thing about the tree is that we can think about each step separately and then multiply.

Specifically, for the first card:

  • The probability that the first card is a heart is $13/52$ (since there are 13 hearts and 52 cards in total). Note that $13/52=1/4$ which gives us another way to understand this: probability that the first card is a heart is $1/4$ (since a quarter of the cards are hearts). So, there is a $25\%$ chance of grabbing a heart.
  • The probability that the first one is not a heart is $39/52=3/4$ (all the other cards). So, there is a $75\%$ chance of not grabbing a heart.

We put these probabilities in the tree:

For the second card the chances are not the same, since there are fewer cards and the number of hearts varies!

For example, if the first card was a heart, then there are only 12 hearts in the remaining 51 cards. So, if the first card was a heart, the chances of grabbing a second heart are $12/51$, or $23.53\%$ (as expected, lower than $25\%$, since one heart was taken out). This second probability is what is known as a conditional probability.

We put this new information in the tree:

What if the first card was not a heart? Well, then all 13 hearts are still among the remaining 51 cards, so there is a $13/51=25.49\%$ chance that the second card is a heart if the first one was not a heart (as expected, higher than $25\%$, since there are the same number of hearts but fewer cards).

We put this new information in the tree too:

The thick blue paths above correspond to the two ways that one can obtain a heart in the second draw:

  • 1st is heart and 2nd is heart (path at the top)
  • 1st is not heart and 2nd is heart (the other path)

Similar reasoning allows one to complete the whole tree. This is what one gets:

Now, we can put them together and multiply! The percentages on the right are the probability of each outcome:

What this says:

  • There is a $5.88\%$ chance that the first card is a heart and the second one is a heart.
  • There is a $19.12\%$ chance that the first card is a heart and the second one is not a heart (same as in the opposite order, which is the third outcome in the tree).
  • There is a $55.88\%$ chance that the first card is not a heart and the second card is also not a heart.

So, there is a $5.88\%$ chance that one grabs two hearts. Much lower than the $55.88\%$ chance of not grabbing any hearts.

Note that all the percentages on the right add up to $100\%$, as they should since they cover all the possible outcomes!
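
The whole card tree can be reproduced exactly with Python's standard `fractions` module. This is only an illustrative sketch: `card_tree` is a made-up helper name, and it assumes, as in the example, a standard 52-card deck with two draws and no replacement.

```python
from fractions import Fraction

# Standard 52-card deck with 13 hearts; two draws without replacement.
DECK, HEARTS = 52, 13

def card_tree():
    """Return the exact probability of each of the four leaves of the tree."""
    p_heart_1 = Fraction(HEARTS, DECK)          # first card is a heart: 13/52
    p_not_heart_1 = 1 - p_heart_1               # first card is not a heart: 39/52
    # Conditional probabilities for the second draw (51 cards left).
    p_heart_2_if_heart_1 = Fraction(HEARTS - 1, DECK - 1)  # 12/51
    p_heart_2_if_not_1 = Fraction(HEARTS, DECK - 1)        # 13/51
    # Multiply along each path of the tree.
    return {
        ("heart", "heart"): p_heart_1 * p_heart_2_if_heart_1,
        ("heart", "not heart"): p_heart_1 * (1 - p_heart_2_if_heart_1),
        ("not heart", "heart"): p_not_heart_1 * p_heart_2_if_not_1,
        ("not heart", "not heart"): p_not_heart_1 * (1 - p_heart_2_if_not_1),
    }

leaves = card_tree()
for outcome, prob in leaves.items():
    print(outcome, f"{float(prob):.2%}")
assert sum(leaves.values()) == 1  # the leaves cover every possible outcome
```

The printed percentages are $5.88\%$, $19.12\%$, $19.12\%$ and $55.88\%$, the same numbers as on the right of the tree. Using `Fraction` instead of floats keeps the check that everything adds up to $100\%$ exact.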

Back to Covid-19

This is the tree related to Covid-19 testing:

What probabilities do we know in this tree?

This is where the sensitivity and specificity of the test come in. When a test is created, medical studies are conducted to figure out:

  • Test sensitivity ($a\%$): Chances that a test will correctly identify an infected person (i.e. say positive if the person is infected).
  • Test specificity ($b\%$): Chances that a test will correctly identify a non-infected person (i.e. say negative if the person is not infected).

These are what one could call “quality” parameters of the test. The important fact is that they do not depend on the population one is testing. They also happen to be the numbers in our tree!

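If we know these, then we also know the other two chances at the same “branch level”, since they are complementary events: an infected person gets a negative result with chance $(100-a)\%$, and a non-infected person gets a positive result with chance $(100-b)\%$: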

What about the first two branches of the tree?

Well, the chances of being infected change according to the proportion of people sick at the current moment. If $p\%$ of the people are infected, then we can take this as the chance of being infected (more on this later). Similarly, the chance of not being infected is $(100-p)\%$ (all the other people).

So, our tree with the chances of each step is now complete!

We can now multiply the chances to calculate the probability of each outcome. Note that everything depends on two things: the parameters of the test ($a\%$ and $b\%$), and on the current proportion $p\%$ of the population that is infected (which is independent of the test).
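
As with the cards, multiplying along each path gives the chance of each of the four outcomes. A minimal sketch, assuming chances are written as fractions between 0 and 1 rather than percentages; `covid_tree` is a hypothetical helper name, and the values $p=30\%$, $a=95\%$, $b=85\%$ are just the example numbers used in the worked example later in the post.

```python
def covid_tree(p, a, b):
    """Chance of each of the four outcomes on the right of the Covid-19 tree.

    p: pre-test chance of infection, a: sensitivity, b: specificity,
    each a fraction between 0 and 1 (0.95 means 95%).
    """
    return {
        "infected, test positive (true positive)": p * a,
        "infected, test negative (false negative)": p * (1 - a),
        "not infected, test positive (false positive)": (1 - p) * (1 - b),
        "not infected, test negative (true negative)": (1 - p) * b,
    }

# Example values only: p = 30%, a = 95%, b = 85%.
outcomes = covid_tree(p=0.30, a=0.95, b=0.85)
for name, chance in outcomes.items():
    print(f"{name}: {chance:.2%}")

# The four outcomes cover everything, so they add up to 100% (up to rounding).
assert abs(sum(outcomes.values()) - 1) < 1e-12
```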

So, where do the crazy curves in the applets above come from?

The natural sort of questions one asks oneself after getting a Covid-19 test are of the form:

  1. “I got a negative test result, what are the chances I am in reality infected?”
  2. “I got a negative test result, what are the chances that it is true (I am not infected)?”
  3. “I got a positive test result, what are the chances that the test is wrong?”
  4. “I got a positive test result, what are the chances that it is correct?”

These are not questions about the probability of an outcome in the tree, but one can figure out what one needs to compute from the tree to be able to answer them.

This is how it works: example with question (1).

Say we want to answer question (1): “I got a negative test result, what are the chances I am really infected?”

In question (1) the test result came out negative. These are all the paths in the tree with a negative test result:

Looking at this tree, the question we want to answer is: out of those two possible outcomes giving a negative test result, what are the chances that I am really on the top path (the one corresponding to being truly infected)? Well, you just divide the chances:

$$ \frac{\text{chance of infected path}}{\text{sum of chances of all negative paths}} =\frac{p\%(100\%-a\%)}{p\%(100\%-a\%)+(100\%-p\%)b\%} $$

That is it! Those are the chances of actually being infected given a negative Covid-19 test result:

$$ \begin{array}{c}\text{Chance of having}\\ \text{Covid-19 if the test }\\ \text{result is negative}\end{array}=\frac{p\%(100\%-a\%)}{p\%(100\%-a\%)+(100\%-p\%)b\%} $$
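
The same “divide the paths” recipe answers all four questions; only the set of paths changes (negative paths for (1) and (2), positive paths for (3) and (4)). Below is a minimal Python sketch of that recipe, not the code behind the applets: the function names are made up for illustration, and chances are written as fractions between 0 and 1 instead of percentages.

```python
def chance_infected_given(result, p, a, b):
    """Chance of being infected given the test result.

    Divides the chance of the infected path by the total chance of all
    paths in the tree that end in `result` ("positive" or "negative").
    p: pre-test chance of infection, a: sensitivity, b: specificity (0 to 1).
    """
    if result == "negative":
        infected_path, not_infected_path = p * (1 - a), (1 - p) * b
    elif result == "positive":
        infected_path, not_infected_path = p * a, (1 - p) * (1 - b)
    else:
        raise ValueError("result must be 'positive' or 'negative'")
    # Assumes the observed result has a nonzero chance of happening.
    return infected_path / (infected_path + not_infected_path)


def four_questions(p, a, b):
    """Answers to questions (1)-(4), as fractions between 0 and 1."""
    if_negative = chance_infected_given("negative", p, a, b)
    if_positive = chance_infected_given("positive", p, a, b)
    return {
        "(1) negative, but actually infected": if_negative,
        "(2) negative, and truly not infected": 1 - if_negative,
        "(3) positive, but the test is wrong": 1 - if_positive,
        "(4) positive, and the test is correct": if_positive,
    }


for question, chance in four_questions(p=0.30, a=0.95, b=0.85).items():
    print(f"{question}: {chance:.2%}")
```

With $p=30\%$, $a=95\%$, $b=85\%$, answer (1) comes out at $2.46\%$ (the same number computed by hand a bit further down), while answer (4) is only $73.08\%$.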

What does this look like with actual numbers?

Say the PCR test that was taken has sensitivity $a\%=95\%$, and specificity $b\%=85\%$, then $$ \begin{array}{c}\text{Chance of having}\\ \text{Covid-19 if the test }\\ \text{result is negative}\end{array}=\frac{p\%\times 5\%}{p\%\times 5\%+(100\%-p\%)\times 85\%} $$ … still depends on $p$!

Ok, so what if I assume the chances in my community of being infected at that particular moment are $30\%$? Then one does get a number: $$ \begin{array}{c}\text{Chance of having}\\ \text{Covid-19 if the test }\\ \text{result is negative}\\ \text{assuming $p=30$}\end{array}=\frac{30\%\times 5\%}{30\%\times 5\%+70\%\times 85\%}=2.459\% $$ It is a small chance!

But now, say I have Covid-19 related symptoms and was with someone who tested positive. Then taking $p\%$ to be $30\%$ seems to be a mistake, since I am not a randomly selected member of my population. Maybe it should be $80\%$ instead? $$ \begin{array}{c}\text{Chance of having}\\ \text{Covid-19 if the test }\\ \text{result is negative}\\ \text{assuming $p=80$}\end{array}=\frac{80\%\times 5\%}{80\%\times 5\%+20\%\times 85\%}=19.05\% $$ The chance goes up considerably.

And then again, since $p$ varies with time and depends on my surroundings, maybe instead of repeating this computation over and over again, one should just plot the chances as a function of $p$!

Here is the plot, for a test with sensitivity $a\%=95\%$ and specificity $b\%=85\%$:

In the figure you can see the chances for $p=30$ and $p=80$ that we computed above: just follow the curve. See below.
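
Instead of redoing the computation by hand for each $p$, one can tabulate the curve. A minimal sketch, assuming the same example test ($a=95\%$, $b=85\%$) and chances written as fractions between 0 and 1; `chance_infected_given_negative` is a hypothetical name for the formula of question (1).

```python
def chance_infected_given_negative(p, a, b):
    """P(infected | negative) = p(1-a) / (p(1-a) + (1-p)b), all values in [0, 1]."""
    false_negative = p * (1 - a)   # infected, but the test says negative
    true_negative = (1 - p) * b    # not infected, and the test says negative
    return false_negative / (false_negative + true_negative)

SENSITIVITY, SPECIFICITY = 0.95, 0.85  # the example test used in the plot

# Sample the curve every 10 percentage points of p.
for percent in range(0, 101, 10):
    chance = chance_infected_given_negative(percent / 100, SENSITIVITY, SPECIFICITY)
    print(f"p = {percent:3d}%:  {chance:6.2%}")
```

The rows for $p=30\%$ and $p=80\%$ print $2.46\%$ and $19.05\%$, the same numbers as the hand computations above.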

As you may be starting to see, the simple questions one asks about a Covid-19 test result are not that simple to answer!

Plots for different values of $a$ and $b$

Below is an interactive Desmos version of the plot above, with sliders for the sensitivity $a$ and specificity $b$ of the test, so that you can see how they affect the curve that answers question (1): “I got a negative test result, what are the chances I am really infected?”

(It may be better to open this in its own window; if you want to play with it, follow this link.)

You can also generate plots for other values of $a$ and $b$ using the TikZ+pgfplots source file I created to make the figures in this post. You just need to change the numbers in \pgfmathsetmacro{\SensA}{0.95} and \pgfmathsetmacro{\SpecB}{0.85} (they need to be given in decimal form, not as percentages) and run pdflatex on the file.

Probability of being infected $p$: pre-test probability

One final thing before wrapping up….

As mentioned above, one should not simply take $p$ to be the percentage of the population that is infected, but should also take into account the particular situation of the person whose test is being interpreted.

The person-and-time-specific estimate of $p$ should involve considering at least:

  • What % of the population has Covid-19 right now? (not the reported one, but the true expected one). This is the first estimate for $p$.
  • Does the person have symptoms? If so, the estimate of $p$ should increase.
  • Has the person been in contact with someone who has symptoms or who has tested positive? If so, the estimate of $p$ should increase.

The final $p$ is what is sometimes called the pre-test infection probability. It depends both on the moment in time and on the particular situation of the person. It can only be very roughly estimated.

With it, one can look at the test result and interpret the result accordingly, but it is all based on assumptions and estimates!

Then there are issues with the timing of the test relative to the possible time of infection… the values of $a$ and $b$ for a specific test may change if it is “too early” or “too late” to test. Estimates have to be made everywhere…

Desmos Applets again!

Ok, so I think this covers everything needed to understand what went into generating the Desmos plots showing the chances that Covid-19 test results are true or false.

Here they are again. There are so many moving parts that it is hard to understand what is going on! Try thinking of a specific case you have gone through, and figure out what the plots say in that case. Also consider opening the applet in full size by following this link.

What to take from all of this?

  • The chances that Covid-19 test results are true or false depend on:
    • the initial chance of infection $p$ (pre-test chance of infection), and this chance depends both on the moment of time and particular person.
    • the Sensitivity and Specificity of the test.
  • Sensitivity and Specificity are conditional probability “quality” parameters for a Covid-19 test that are independent of the population that is being tested.
  • Sensitivity and Specificity, however, are not completely constant, and may be different if it is “too early” or “too late” to test.
  • The chances that Covid-19 test results are true or false are different depending on whether the result is positive or negative! This may be one of the most important things to take home!
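
The last point can also be checked without the tree, by brute force. The sketch below is an illustration under stated assumptions, not anything from the applets: it simulates testing a large made-up population with a pre-test chance of infection of $30\%$ and the example test from above ($a=95\%$, $b=85\%$), then counts how many of the positive results and how many of the negative results were correct. The helper name `simulate`, the population size and the random seed are arbitrary choices.

```python
import random

def simulate(p, a, b, n=200_000, seed=1):
    """Simulate testing n people; return the share of correct results among
    positive and among negative results. p, a, b are fractions in [0, 1]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = {"positive": [0, 0], "negative": [0, 0]}  # result -> [correct, total]
    for _ in range(n):
        infected = rng.random() < p
        if infected:
            result = "positive" if rng.random() < a else "negative"
        else:
            result = "negative" if rng.random() < b else "positive"
        correct = (result == "positive") == infected
        counts[result][0] += correct
        counts[result][1] += 1
    return {result: c / t for result, (c, t) in counts.items()}

shares = simulate(p=0.30, a=0.95, b=0.85)
print(f"correct among positives: {shares['positive']:.2%}")  # tree gives 73.08%
print(f"correct among negatives: {shares['negative']:.2%}")  # tree gives 97.54%
```

For these values the tree gives $73.08\%$ for positive results and $97.54\%$ for negative ones, and the simulated shares should land close to those. So, at least for this $p$, the same test is far more trustworthy when it says negative than when it says positive.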
