COVID-19

Discussions about serious topics, for serious people
Locked
User avatar
shpalman
Princess POW
Posts: 8621
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond
Contact:

Re: COVID-19

Post by shpalman »

shpalman wrote: Sun Dec 13, 2020 5:00 pm Lombardy is a yellow zone as of today, and it was a nice day, so you can imagine everyone went into town.

This was taken at about 15:20.

Image

To be fair, the narrow streets in the centre are usually almost impassable during the weekends in December.
Italy let people go to the shops and everyone went to the shops...
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

KAJ wrote: Sun Dec 13, 2020 8:49 pm
shpalman wrote: Sun Dec 13, 2020 6:37 pm Depends how it's handled if a swab is carried out on one day but processed the next day.
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.

In developers-guide#params-structure I don't see a metric for tests other than by publish date. On the other hand I don't see "newPillarOneTwoTestsByPublishDate" either, which I've been using for some time and must have picked up from coronavirus.data.gov.uk in the first place.

I guess the dates of cases by publish date may correspond to those of tests by publish date. Taking that ratio I get:
CasesTests.png

Code:

Analysis of Variance Table

Response: log(Cases.Tests)
              Df  Sum Sq Mean Sq F value    Pr(>F)    
poly(date, 3)  3 1.99814 0.66605  34.030  2.12e-12 ***
day            6 0.59199 0.09867   5.041 0.0003737 ***
Residuals     53 1.03734 0.01957                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -2.63043    0.04667 -56.365  < 2e-16 ***
poly(date, 3)1 -0.76742    0.14078  -5.451 1.33e-06 ***
poly(date, 3)2 -1.08763    0.13991  -7.774 2.58e-10 ***
poly(date, 3)3  0.46106    0.14190   3.249  0.00201 ** 
dayMon          0.13816    0.06597   2.094  0.04105 *  
dayTue         -0.12832    0.06605  -1.943  0.05735 .  
dayWed         -0.08010    0.06617  -1.211  0.23140    
dayThu         -0.12394    0.06634  -1.868  0.06725 .  
dayFri         -0.17354    0.06605  -2.627  0.01123 *  
daySat         -0.09323    0.06597  -1.413  0.16347    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1399 on 53 degrees of freedom
Multiple R-squared:  0.714,	Adjusted R-squared:  0.6655 
F-statistic:  14.7 on 9 and 53 DF,  p-value: 1.385e-11
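
For anyone who wants to reproduce this shape of analysis, here is a minimal R sketch of a model that produces output of the form above (the data frame d, its column names and the numeric date column are assumptions for illustration, not necessarily the script actually used):

Code:

# Sketch only: assumes a daily data frame d with counts PubCases, Tests and a Date column date
d$day <- factor(format(d$date, "%a"),      # weekday abbreviation, English locale assumed
                levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))  # Sunday = reference
d$t   <- as.numeric(d$date - min(d$date))  # numeric date, since poly() needs a numeric predictor

fit <- lm(log(PubCases / Tests) ~ poly(t, 3) + day, data = d)
anova(fit)     # sequential ANOVA table, like the one quoted above
summary(fit)   # coefficients, standard errors, R-squared, overall F
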
That has a pretty similar (small, F <=5) weekday effect to cases by publish date
PubCases.png

Code:

Analysis of Variance Table

Response: log(PubCases)
              Df  Sum Sq Mean Sq F value   Pr(>F)    
poly(date, 3)  3 1.71301 0.57100 28.4255  4.3e-11 ***
day            6 0.39217 0.06536  3.2538 0.008492 ** 
Residuals     53 1.06465 0.02009                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.81603    0.04756 206.393  < 2e-16 ***
poly(date, 3)1 -0.53409    0.14262  -3.745 0.000446 ***
poly(date, 3)2 -0.85512    0.14174  -6.033 1.60e-07 ***
poly(date, 3)3  0.79031    0.14376   5.498 1.13e-06 ***
dayMon         -0.05330    0.06684  -0.797 0.428773    
dayTue         -0.06282    0.06692  -0.939 0.352065    
dayWed          0.12376    0.06704   1.846 0.070454 .  
dayThu          0.14975    0.06721   2.228 0.030135 *  
dayFri          0.09062    0.06743   1.344 0.184700    
daySat          0.08855    0.06770   1.308 0.196555    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1417 on 53 degrees of freedom
Multiple R-squared:  0.6641,	Adjusted R-squared:  0.6071 
F-statistic: 11.64 on 9 and 53 DF,  p-value: 7.734e-10
So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
Looking at this tweet - ironically, by someone who seems to be a supporter of Clare Craig

https://twitter.com/TarboilerBill/statu ... 7791280128
@ClareCraigPath
you assume the Labs produce consistent results. They don’t appear too in Scotland. Almost double the chance of getting a +ve diagnoses at the weekend.
EpG7EEcXcAA2ia1.jpg
My reply:
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
Have you considered stupidity as an explanation
User avatar
shpalman
Princess POW
Posts: 8621
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond
Contact:

Re: COVID-19

Post by shpalman »

new variant of Covid may explain fast spread of virus in south of England
We have identified a new variant of coronavirus, which may be associated with the fastest spread in the south-east of England. Initial analysis suggests that this variant is growing faster than the existing variants. We’ve currently identified over 1,000 cases with this variant, predominantly in the south of England, although cases have been identified in nearly 60 different local authority areas, and numbers are increasing rapidly.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
User avatar
Little waster
After Pie
Posts: 2385
Joined: Tue Nov 12, 2019 12:35 am
Location: About 1 inch behind my eyes

Re: COVID-19

Post by Little waster »

One for the No sh.t Sherlock pile.
A report in the Sunday Times over the weekend suggests that the decision not to impose a circuit-breaker lockdown was influenced by a meeting involving the prime minister, the chancellor and three proponents of a “herd immunity” approach to managing the virus: Prof Sunetra Gupta and Prof Carl Heneghan of the University of Oxford and Prof Anders Tegnell, the Swedish epidemiologist who has masterminded Sweden’s catastrophic Covid control policy (in the last month, Sweden has reported 1,400 Covid deaths, while neighbours Norway and Finland, both of which have roughly half its population, reported 100 and 80 respectively). The delay in imposing national restrictions resulted in an estimated 1.3 million extra Covid infections.
If you asked me to list the three scientists I'd least want advising the Government on when and whether to impose a lockdown, they'd be right at the top.
This place is not a place of honor, no highly esteemed deed is commemorated here, nothing valued is here.
What is here was dangerous and repulsive to us.
This place is best shunned and left uninhabited.
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

jimbob wrote: Sun Dec 13, 2020 6:54 pm
KAJ wrote: Sun Dec 13, 2020 6:03 pm I've been musing on what might be behind the strong dependence of 'Cases by Specimen Date' on day of week. I surmised that it was due to the dependence of number of tests on day of week.
<snip>

Anyone got any suggestions I might investigate?
I think it depends on how the tests are handled. Hospital admissions, I guess would be less affected by weekends compared to drive in testing, so you might see a difference in the positivity ratio for the different types. Those isolating because of contacts might get tests but not hurry.

I can see how those could swing it either way.

Meanwhile - this is my (far simpler) plot of the tests vs specimen date for each day of the week against week number.

It tells the same story as your more in-depth analysis - but I think the graph is easy to see.
<snip graph>
jimbob wrote: Sun Dec 13, 2020 10:09 pm
KAJ wrote: Sun Dec 13, 2020 8:49 pm
shpalman wrote: Sun Dec 13, 2020 6:37 pm Depends how it's handled if a swab is carried out on one day but processed the next day.
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.

<snip>

So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
<snip>
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
So I'm ready to believe that the dependence of cases by specimen date on weekday is (at least) largely due to weekday variation in number, nature etc. of specimens. I don't have the data to investigate that, but it doesn't really matter. The weekday dependence is really just a nuisance factor clouding the view of the time (date) dependence which I really want to see.

Some people use cases by publish date, which reduces the weekday dependence. Others use 7-day moving averages which practically eliminates the weekday dependence. But both of these effectively associate some cases with dates other than the specimen date, distorting the date dependence I want to see.

Better is to model the weekday dependence to remove its effect. I may not know how it originates, but I can model it quite well; it is reasonably constant (in logs) from week to week.

I quite like jimbob's representation with one line for each weekday. I can use it for the models I've been fitting (ggplot2 is nice!). This is today's data and fitted model in my representation:
xy1.png
This is the same in jimbob's representation:
xy2.png
That presentation makes clear that the date dependence is modelled identically for the different weekdays; they differ only in their vertical position (and that the modelled points are at different dates). I could allow the date dependence to differ between weekdays by nesting the date polynomial in weekdays, giving this:
xy3.png
That graph suggests to me that the date dependence is very similar for the different weekdays so it isn't worth going to the nested model. Especially as the nested model has 28 coefficients (each weekday has a constant + 3 polynomial coefficients) compared to 10 for the non-nested model (3 polynomial coefficients + one constant per weekday). These examples have 63 dates (of which 5 are zero weighted). The non-nested model estimates 10 coefficients from 58 data points, which seems reasonable. The nested model estimates 28 coefficients from 58 data points, which seems a stretch. Putting it another way, each weekday occurs 9 times in the 63 dates. Fitting a cubic (or even a quadratic) to 9 points feels too much of a stretch. So I'm not going to use the nested model.
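
As a minimal sketch of the two alternatives (assuming a data frame d with specimen-date counts SpecCases, a numeric date t and a weekday factor day set up as in the earlier sketch), the models differ only in whether the cubic is shared across weekdays or nested within them:

Code:

# Non-nested: one shared cubic in date plus a constant offset per weekday
common <- lm(log(SpecCases) ~ poly(t, 3) + day, data = d)

# Nested: each weekday gets its own constant and its own cubic in date
nested <- lm(log(SpecCases) ~ day / poly(t, 3), data = d)

length(coef(common))   # 10: intercept + 3 polynomial terms + 6 weekday offsets
length(coef(nested))   # 28: 7 weekday constants + 7 weekdays x 3 polynomial terms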

Having decided that, I now prefer my representation to jimbob's, which implies separate curves for each weekday. I think my representation more clearly represents the model and its relation to the data.
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

KAJ wrote: Mon Dec 14, 2020 9:09 pm
jimbob wrote: Sun Dec 13, 2020 6:54 pm
KAJ wrote: Sun Dec 13, 2020 6:03 pm I've been musing on what might be behind the strong dependence of 'Cases by Specimen Date' on day of week. I surmised that it was due to the dependence of number of tests on day of week.
<snip>

Anyone got any suggestions I might investigate?
I think it depends on how the tests are handled. Hospital admissions, I guess would be less affected by weekends compared to drive in testing, so you might see a difference in the positivity ratio for the different types. Those isolating because of contacts might get tests but not hurry.

I can see how those could swing it either way.

Meanwhile - this is my (far simpler) plot of the tests vs specimen date for each day of the week against week number.

It tells the same story as your more in-depth analysis - but I think the graph is easy to see.
<snip graph>
jimbob wrote: Sun Dec 13, 2020 10:09 pm
KAJ wrote: Sun Dec 13, 2020 8:49 pm
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.

<snip>

So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
<snip>
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
So I'm ready to believe that the dependence of cases by specimen date on weekday is (at least) largely due to weekday variation in number, nature etc. of specimens. I don't have the data to investigate that, but it doesn't really matter. The weekday dependence is really just a nuisance factor clouding the view of the time (date) dependence which I really want to see.

Some people use cases by publish date, which reduces the weekday dependence. Others use 7-day moving averages which practically eliminates the weekday dependence. But both of these effectively associate some cases with dates other than the specimen date, distorting the date dependence I want to see.

Better is to model the weekday dependence to remove its effect. I may not know how it originates, but I can model it quite well; it is reasonably constant (in logs) from week to week.

I quite like jimbob's representation with one line for each weekday. I can use it for the models I've been fitting (ggplot2 is nice!). This is today's data and fitted model in my representation:
xy1.png
This is the same in jimbob's representation:
xy2.png

That presentation makes clear that the date dependence is modelled identically for the different weekdays; they differ only in their vertical position (and that the modelled points are at different dates). I could allow the date dependence to differ between weekdays by nesting the date polynomial in weekdays, giving this:
xy3.png

That graph suggests to me that the date dependence is very similar for the different weekdays so it isn't worth going to the nested model. Especially as the nested model has 28 coefficients (each weekday has a constant + 3 polynomial coefficients) compared to 10 for the non-nested model (3 polynomial coefficients + one constant per weekday). These examples have 63 dates (of which 5 are zero weighted). The non-nested model estimates 10 coefficients from 58 data points, which seems reasonable. The nested model estimates 28 coefficients from 58 data points, which seems a stretch. Putting it another way, each weekday occurs 9 times in the 63 dates. Fitting a cubic (or even a quadratic) to 9 points feels too much of a stretch. So I'm not going to use the nested model.

Having decided that, I now prefer my representation to jimbob's, which implies separate curves for each weekday. I think my representation more clearly represents the model and its relation to the data.
Yes, I prefer yours too. My main point was to use the colour scale to show the weekday numbers. I could probably improve mine but this is pretty much how I approach work. If it's a subtle effect - there's usually a larger effect to deal with, or it's not a problem. I tend to use graphs (including spatial data) rather than more formal statistical tests for about 90% of the work I do.
Have you considered stupidity as an explanation
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

Something something hammer something something nail
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

sTeamTraen wrote: Tue Dec 15, 2020 2:21 pm Well, this is very boring. ECDC will no longer be producing daily data summaries. :cry:
I tweaked my code to use the Johns Hopkins data.

UK 7-day moving average of new cases is 19772 - the highest since 23 November, a week before the end of the lockdown.
Something something hammer something something nail
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

sTeamTraen wrote: Tue Dec 15, 2020 5:06 pm
sTeamTraen wrote: Tue Dec 15, 2020 2:21 pm Well, this is very boring. ECDC will no longer be producing daily data summaries. :cry:
I tweaked my code to use the Johns Hopkins data.

UK 7-day moving average of new cases is 19772 - the highest since 23 November, a week before the end of the lockdown.
I also tweaked my code which uses the coronavirus.data.gov.uk data. I now correct for the weekday effect by estimating it and generating a fit with all points having a constant (Sunday) weekday effect; this shows trends much more clearly. Remember the latest 5 points are marked as "incomplete" by .gov.uk so I've given them zero weight in the fit.
SpecCases.png
I agree with you, the weekday corrected values have been increasing since the end of lockdown.

In case anyone's interested, here are some regression statistics,

Code:

Analysis of Variance Table

Response: log(SpecCases)
              Df  Sum Sq Mean Sq F value    Pr(>F)    
poly(date, 3)  3 1.35257 0.45086  44.233 7.383e-14 ***
day            6 2.00832 0.33472  32.839 2.185e-15 ***
Residuals     48 0.48926 0.01019                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.56115    0.03601 265.502  < 2e-16 ***
poly(date, 3)1 -0.72911    0.13406  -5.439 1.78e-06 ***
poly(date, 3)2 -0.56530    0.14042  -4.026 0.000201 ***
poly(date, 3)3  0.86350    0.13549   6.373 6.73e-08 ***
dayMon          0.53046    0.04930  10.759 2.19e-14 ***
dayTue          0.42975    0.04921   8.733 1.76e-11 ***
dayWed          0.39323    0.05073   7.751 5.25e-10 ***
dayThu          0.31430    0.05062   6.209 1.20e-07 ***
dayFri          0.28242    0.05054   5.588 1.06e-06 ***
daySat          0.02138    0.05050   0.423 0.673923    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.101 on 48 degrees of freedom
Multiple R-squared:  0.8729,	Adjusted R-squared:  0.8491 
F-statistic: 36.64 on 9 and 48 DF,  p-value: < 2.2e-16
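
A minimal sketch of the correction step described above, i.e. zero-weighting the "incomplete" dates and dividing out the estimated weekday offsets so every point sits at the Sunday level (column names and details are illustrative assumptions, not the actual code):

Code:

# Assumes d with a Date column date, counts SpecCases, numeric date t and weekday
# factor day (reference level "Sun"), as in the earlier sketches
d$w <- ifelse(d$date > max(d$date) - 5, 0, 1)   # zero weight for the 5 incomplete days

fit <- lm(log(SpecCases) ~ poly(t, 3) + day, data = d, weights = w)

# Divide out each weekday's estimated offset so all points sit at the Sunday level
offs <- c(0, coef(fit)[paste0("day", levels(d$day)[-1])])
names(offs) <- levels(d$day)
d$SpecCasesAdj <- d$SpecCases / exp(offs[as.character(d$day)])
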
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

Just to counter any suggestion that the changes in case numbers were due to changes in the number, type etc. of specimens, here's a similar analysis of hospital admissions. Smaller weekday effect. Very similar shape of date effect.
Admits.png

Code:

Analysis of Variance Table

Response: log(Admits)
              Df  Sum Sq Mean Sq  F value    Pr(>F)    
poly(date, 3)  3 2.33169 0.77723 160.6072 < 2.2e-16 ***
day            6 0.15234 0.02539   5.2466 0.0002649 ***
Residuals     53 0.25648 0.00484                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     7.23734    0.02326 311.202  < 2e-16 ***
poly(date, 3)1  1.01718    0.07000  14.531  < 2e-16 ***
poly(date, 3)2 -0.98118    0.06957 -14.103  < 2e-16 ***
poly(date, 3)3  0.54315    0.07056   7.698 3.41e-10 ***
dayMon          0.04039    0.03281   1.231   0.2236    
dayTue          0.05327    0.03284   1.622   0.1107    
dayWed          0.07238    0.03290   2.200   0.0322 *  
dayThu          0.04636    0.03298   1.405   0.1657    
dayFri         -0.05533    0.03310  -1.672   0.1004    
daySat         -0.05851    0.03281  -1.783   0.0802 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06957 on 53 degrees of freedom
Multiple R-squared:  0.9064,	Adjusted R-squared:  0.8905 
F-statistic: 57.03 on 9 and 53 DF,  p-value: < 2.2e-16
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

KAJ wrote: Tue Dec 15, 2020 8:36 pm In case anyone's interested, here are some regression statistics,
Do these data meet the assumptions for performing a regression analysis? I don't think we can treat cases or deaths as a random variable, because of the nature of the process(es) driving them. So you would need a specific analysis for time series --- perhaps a latent growth curve model.

On a related note, we can't interpret the p values either, as we are not sampling. We have the entire population of cases or deaths. (Well, some may be missing, but we're not trying to infer that.)
Something something hammer something something nail
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

sTeamTraen wrote: Wed Dec 16, 2020 3:09 pm
KAJ wrote: Tue Dec 15, 2020 8:36 pm In case anyone's interested, here are some regression statistics,
Do these data meet the assumptions for performing a regression analysis? I don't think we can treat cases or deaths as a random variable, because of the nature of the process(es) driving them. So you would need a specific analysis for time series --- perhaps a latent growth curve model.

On a related note, we can't interpret the p values either, as we are not sampling. We have the entire population of cases or deaths. (Well, some may be missing, but we're not trying to infer that.)
I should emphasise more often and more clearly that I'm not using the regressions to draw inferences, and I'd include as inferences interpreting p values or doing significance tests or making predictions. I'm using the regressions simply as descriptive statistics - the data can be (approximately) described as "a curve of <this> form with weekday dependent offsets of <this>". A complete description would need (approximately) as many parameters as data points. The R-sq indicates the completeness of the regression description; I'm pleasantly surprised that a cubic curve and simple weekday offsets fit over 80% of the variation in the data. The ANOVA indicates how much that accuracy depends on the curve and on the weekday offsets.

If we were using the regression to assess the validity of a process model and estimate its parameters, I would share your reservations. I really wouldn't want to model the process, and I would take a lot of convincing that a single model was applicable for an extended period.

Thanks for the insightful and apposite comment.
User avatar
Bird on a Fire
Princess POW
Posts: 10142
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: COVID-19

Post by Bird on a Fire »

Thanks for the interesting posts, KAJ.

Re modelling - that James Annan on twitter gets a good fit using a random walk Markov chain approach. He's not modelling the process exactly, certainly not mechanistically, beyond saying that the next week's figures will be directly dependent on the previous week's by an exponent drawn from some distribution or other. I think he constrains those distributions a bit according to lockdown level at the time, but basically lets the modelling process estimate it from the data.
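
For what it's worth, a toy R illustration of that general idea (a deliberate over-simplification, not James Annan's actual model): the growth rate itself follows a random walk, and each week's count is the previous week's scaled by the exponential of that rate.

Code:

# Toy only: cases[t] = cases[t-1] * exp(r[t]), with the growth rate r a random walk
set.seed(42)
n_weeks <- 20
r     <- cumsum(rnorm(n_weeks, mean = 0, sd = 0.05))   # slowly drifting growth rate
cases <- 10000 * exp(cumsum(r))                        # implied weekly case counts
plot(seq_len(n_weeks), cases, log = "y", type = "b",
     xlab = "week", ylab = "simulated cases (log scale)")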

But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
We have the right to a clean, healthy, sustainable environment.
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

Today's UK numbers are the highest since 24 November (deaths) / 13 November (cases).

There is a week to go of restrictions that are less severe than the November lockdown, and then there will be 5 days of merriment, with subsidised coach travel (seriously, WTAF)

At this rate, the first couple of weeks of January are going to see horrors the like of which the NHS has never faced in its existence, even without Brexit.
Something something hammer something something nail
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
I'm really not very happy with the cubic.

I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.
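
As a tiny worked example of that point (entirely synthetic numbers): if the straight-line fit to log(cases) has slope b per day, the doubling time is log(2)/b days.

Code:

# Synthetic series doubling every 10 days
t     <- 0:30
cases <- 100 * 2^(t / 10)
fit0  <- lm(log(cases) ~ t)
unname(log(2) / coef(fit0)["t"])   # recovers a doubling time of ~10 days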

When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.

When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.

I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?

Comments and suggestions welcome!
User avatar
shpalman
Princess POW
Posts: 8621
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond
Contact:

Re: COVID-19

Post by shpalman »

KAJ wrote: Wed Dec 16, 2020 9:05 pm
Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
I'm really not very happy with the cubic.

I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.

When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.

When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.

I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?

Comments and suggestions welcome!
To get a line which makes physical sense you need a model of the underlying physical process.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
AMS
Snowbonk
Posts: 466
Joined: Mon Nov 11, 2019 11:14 pm

Re: COVID-19

Post by AMS »

Any curve fit that tends to either plus or minus infinity is never going to work for long. In the meantime you're just feeding it more parameters to constrain it nearer to the data for a bit longer, but these don't actually tell you anything.
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

shpalman wrote: Wed Dec 16, 2020 9:14 pm To get a line which makes physical sense you need a model of the underlying physical process.
I'm not looking for a line which makes physical sense. I'm just looking for a (relatively) simple and interpretable description of the data.
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

AMS wrote: Wed Dec 16, 2020 9:28 pm Any curve fit that tends to either plus or minus infinity is never going to work for long. In the meantime you're just feeding it more parameters to constrain it nearer to the data for a bit longer, but these don't actually tell you anything.
Yes. I'm beginning to talk myself into restricting the range of the data.
My objective of a simple and interpretable description of the data conflicts with a data range covering more complex behaviours.
User avatar
Bird on a Fire
Princess POW
Posts: 10142
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: COVID-19

Post by Bird on a Fire »

I think loess is a decent idea after a while, to describe overall movements in the location of the data. You could split the data into sections and fit your own curves to them, but really that's just you being an idiosyncratic local regression algorithm ;)
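
In case it helps, a minimal ggplot2 sketch of the loess idea (column names are assumptions; because the scale transformation is applied before the statistic, the smooth is fitted on the log scale):

Code:

library(ggplot2)

# Sketch only: d with a Date column date and daily counts SpecCases
ggplot(d, aes(date, SpecCases)) +
  geom_point() +
  geom_smooth(method = "loess", span = 0.5, se = FALSE) +
  scale_y_log10()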

And I don't know that modelling the underlying process in much detail really is necessary - random walks work pretty well for a lot of demographic data, and as I mentioned a few posts ago seem to be being used with some degree of success for UK covid numbers specifically. I'm not quite sure what shpalman means by 'physical sense' though.
We have the right to a clean, healthy, sustainable environment.
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

It seems from the RHS of the chart on that page that there are several days of recent cases missing, with the peak before the absence being around ~2,000 per day. The 11,000 are apparently backlogged from last weekend, so perhaps to be spread over 5 days. That will probably bring the number for the last 5 days up to around 2,400/day. For 3.1 million people that's close to what Iowa, Nevada, and Arkansas are experiencing. It means 1% of the population will get infected in less than a fortnight. :shock:

Wales has about 180 ICU beds, and I've been told by an ICU doctor that the average stay of a COVID-19 patient, whatever the outcome, is at least 3 weeks. That means that once they are full, beds will turn over at around 9-10 per day. We're looking at a humanitarian tragedy (and PTSD on a previously unheard-of scale among medical staff, I imagine).
Something something hammer something something nail
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

shpalman wrote: Wed Dec 16, 2020 9:14 pm
KAJ wrote: Wed Dec 16, 2020 9:05 pm
Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
I'm really not very happy with the cubic.

I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.

When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.

When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.

I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?

Comments and suggestions welcome!
To get a line which makes physical sense you need a model of the underlying physical process.
And this is why I went for the simplest plot of the rolling 7-day centred average* against reported date on a log scale.

we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.

*I guess that an argument can be made for geometric means
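
A minimal base-R sketch of that plot (daily_cases is an assumed vector of counts in reported-date order):

Code:

ma7 <- stats::filter(daily_cases, rep(1 / 7, 7), sides = 2)   # centred 7-day moving average
plot(seq_along(daily_cases), ma7, log = "y", type = "l",
     xlab = "day", ylab = "7-day centred average (log scale)")
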
Have you considered stupidity as an explanation
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

jimbob wrote: Wed Dec 16, 2020 10:39 pm <snip>
And this is why I went for the simplest plot of the rolling 7-day centred average* against reported date on a log scale.

we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.

*I guess that an argument can be made for geometric means
* I don't think a good argument can be made for geometric means.

The sum of a week's daily counts (cases, patients, deaths, ...) is meaningful. Indeed, a daily count is [can be interpreted as] the sum of hourly counts. The counts that agencies report are the sums of counts from different places.

But what does the product of counts mean? And do we really want a statistic that is zero if any datum is zero?

I suggest that the principal reasons we plot on log axes in this application are (a) it is the most familiar way of handling wide dynamic ranges (b) it gives a straight line for exponential growth. But neither of these are a reason for geometric means.

If we want a measure of central tendency that is more robust to outliers, then a median might be a better choice. But I haven't found that outliers are a problem here. Others' experience may differ.
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

KAJ wrote: Thu Dec 17, 2020 10:15 am
jimbob wrote: Wed Dec 16, 2020 10:39 pm <snip>
And this is why I went for the simplest plot of the rolling 7-day centred average* against reported date on a log scale.

we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.

*I guess that an argument can be made for geometric means
* I don't think a good argument can be made for geometric means.

The sum of a week's daily counts (cases, patients, deaths, ...) is meaningful. Indeed, a daily count is [can be interpreted as] the sum of hourly counts. The counts that agencies report are the sums of counts from different places.

But what does the product of counts mean? And do we really want a statistic that is zero if any datum is zero?

I suggest that the principal reasons we plot on log axes in this application are (a) it is the most familiar way of handling wide dynamic ranges (b) it gives a straight line for exponential growth. But neither of these are a reason for geometric means.

If we want a measure of central tendency that is more robust to outliers, then a median might be a better choice. But I haven't found that outliers are a problem here. Others' experience may differ.
Yes I was having a brainfart
Have you considered stupidity as an explanation
Locked