COVID-19

Discussions about serious topics, for serious people
Locked
User avatar
shpalman
Princess POW
Posts: 8621
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond
Contact:

Re: COVID-19

Post by shpalman »

shpalman wrote: Sun Dec 13, 2020 5:00 pm Lombardy is a yellow zone as of today, and it was a nice day, so you can imagine everyone went into town.

This was taken at about 15:20.

Image

To be fair, the narrow streets in the centre are usually almost impassable during the weekends in December.
Italy let people go to the shops and everyone went to the shops...
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

KAJ wrote: Sun Dec 13, 2020 8:49 pm
shpalman wrote: Sun Dec 13, 2020 6:37 pm Depends how it's handled if a swab is carried out on one day but processed the next day.
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.

In developers-guide#params-structure I don't see a metric for tests other than by publish date. On the other hand I don't see "newPillarOneTwoTestsByPublishDate" either, which I've been using for some time and must have picked up from coronavirus.data.gov.uk in the first place.

I guess the dates of cases by publish date may correspond to those of tests by publish date. Taking that ratio I get:
CasesTests.png

Code:

Analysis of Variance Table

Response: log(Cases.Tests)
              Df  Sum Sq Mean Sq F value    Pr(>F)    
poly(date, 3)  3 1.99814 0.66605  34.030  2.12e-12 ***
day            6 0.59199 0.09867   5.041 0.0003737 ***
Residuals     53 1.03734 0.01957                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -2.63043    0.04667 -56.365  < 2e-16 ***
poly(date, 3)1 -0.76742    0.14078  -5.451 1.33e-06 ***
poly(date, 3)2 -1.08763    0.13991  -7.774 2.58e-10 ***
poly(date, 3)3  0.46106    0.14190   3.249  0.00201 ** 
dayMon          0.13816    0.06597   2.094  0.04105 *  
dayTue         -0.12832    0.06605  -1.943  0.05735 .  
dayWed         -0.08010    0.06617  -1.211  0.23140    
dayThu         -0.12394    0.06634  -1.868  0.06725 .  
dayFri         -0.17354    0.06605  -2.627  0.01123 *  
daySat         -0.09323    0.06597  -1.413  0.16347    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1399 on 53 degrees of freedom
Multiple R-squared:  0.714,	Adjusted R-squared:  0.6655 
F-statistic:  14.7 on 9 and 53 DF,  p-value: 1.385e-11
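
For anyone who wants to reproduce this shape of analysis, here is a minimal R sketch of a model that produces output of the form above (the data frame d, its column names and the numeric date column are assumptions for illustration, not necessarily the script actually used):

Code:

# Sketch only: assumes a daily data frame d with counts PubCases, Tests and a Date column date
d$day <- factor(format(d$date, "%a"),      # weekday abbreviation, English locale assumed
                levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))  # Sunday = reference
d$t   <- as.numeric(d$date - min(d$date))  # numeric date, since poly() needs a numeric predictor

fit <- lm(log(PubCases / Tests) ~ poly(t, 3) + day, data = d)
anova(fit)     # sequential ANOVA table, like the one quoted above
summary(fit)   # coefficients, standard errors, R-squared, overall F
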
That has a pretty similar (small, F <=5) weekday effect to cases by publish date
PubCases.png

Code:

Analysis of Variance Table

Response: log(PubCases)
              Df  Sum Sq Mean Sq F value   Pr(>F)    
poly(date, 3)  3 1.71301 0.57100 28.4255  4.3e-11 ***
day            6 0.39217 0.06536  3.2538 0.008492 ** 
Residuals     53 1.06465 0.02009                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.81603    0.04756 206.393  < 2e-16 ***
poly(date, 3)1 -0.53409    0.14262  -3.745 0.000446 ***
poly(date, 3)2 -0.85512    0.14174  -6.033 1.60e-07 ***
poly(date, 3)3  0.79031    0.14376   5.498 1.13e-06 ***
dayMon         -0.05330    0.06684  -0.797 0.428773    
dayTue         -0.06282    0.06692  -0.939 0.352065    
dayWed          0.12376    0.06704   1.846 0.070454 .  
dayThu          0.14975    0.06721   2.228 0.030135 *  
dayFri          0.09062    0.06743   1.344 0.184700    
daySat          0.08855    0.06770   1.308 0.196555    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1417 on 53 degrees of freedom
Multiple R-squared:  0.6641,	Adjusted R-squared:  0.6071 
F-statistic: 11.64 on 9 and 53 DF,  p-value: 7.734e-10
So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
Looking at this tweet - ironically, by someone who seems to be a supporter of Clare Craig

https://twitter.com/TarboilerBill/statu ... 7791280128
@ClareCraigPath
you assume the Labs produce consistent results. They don’t appear too in Scotland. Almost double the chance of getting a +ve diagnoses at the weekend.
EpG7EEcXcAA2ia1.jpg
My reply:
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
Have you considered stupidity as an explanation
User avatar
shpalman
Princess POW
Posts: 8621
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond
Contact:

Re: COVID-19

Post by shpalman »

new variant of Covid may explain fast spread of virus in south of England
We have identified a new variant of coronavirus, which may be associated with the fastest spread in the south-east of England. Initial analysis suggests that this variant is growing faster than the existing variants. We’ve currently identified over 1,000 cases with this variant, predominantly in the south of England, although cases have been identified in nearly 60 different local authority areas, and numbers are increasing rapidly.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
User avatar
Little waster
After Pie
Posts: 2385
Joined: Tue Nov 12, 2019 12:35 am
Location: About 1 inch behind my eyes

Re: COVID-19

Post by Little waster »

One for the No sh.t Sherlock pile.
A report in the Sunday Times over the weekend suggests that the decision not to impose a circuit-breaker lockdown was influenced by a meeting involving the prime minister, the chancellor and three proponents of a “herd immunity” approach to managing the virus: Prof Sunetra Gupta and Prof Carl Heneghan of the University of Oxford and Prof Anders Tegnell, the Swedish epidemiologist who has masterminded Sweden’s catastrophic Covid control policy (in the last month, Sweden has reported 1,400 Covid deaths, while neighbours Norway and Finland, both of which have roughly half its population, reported 100 and 80 respectively). The delay in imposing national restrictions resulted in an estimated 1.3 million extra Covid infections.
If you asked me to list the three scientists I'd least want advising the Government on when and whether to impose a lockdown, they'd be right at the top.
This place is not a place of honor, no highly esteemed deed is commemorated here, nothing valued is here.
What is here was dangerous and repulsive to us.
This place is best shunned and left uninhabited.
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

jimbob wrote: Sun Dec 13, 2020 6:54 pm
KAJ wrote: Sun Dec 13, 2020 6:03 pm I've been musing on what might be behind the strong dependence of 'Cases by Specimen Date' on day of week. I surmised that it was due to the dependence of number of tests on day of week.
<snip>

Anyone got any suggestions I might investigate?
I think it depends on how the tests are handled. Hospital admissions, I guess would be less affected by weekends compared to drive in testing, so you might see a difference in the positivity ratio for the different types. Those isolating because of contacts might get tests but not hurry.

I can see how those could swing it either way.

Meanwhile - this is my (far simpler) plot of the tests vs specimen date for each day of the week against week number.

It tells the same story as your more in-depth analysis - but I think the graph is easy to see.
<snip graph>
jimbob wrote: Sun Dec 13, 2020 10:09 pm
KAJ wrote: Sun Dec 13, 2020 8:49 pm
shpalman wrote: Sun Dec 13, 2020 6:37 pm Depends how it's handled if a swab is carried out on one day but processed the next day.
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.

<snip>

So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
<snip>
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
So I'm ready to believe that the dependence of cases by specimen date on weekday is (at least) largely due to weekday variation in number, nature etc. of specimens. I don't have the data to investigate that, but it doesn't really matter. The weekday dependence is really just a nuisance factor clouding the view of the time (date) dependence which I really want to see.

Some people use cases by publish date, which reduces the weekday dependence. Others use 7-day moving averages which practically eliminates the weekday dependence. But both of these effectively associate some cases with dates other than the specimen date, distorting the date dependence I want to see.

Better is to model the weekday dependence to remove its effect. I may not know how it originates, but I can model it quite well; it is reasonably constant (in logs) from week to week.

I quite like jimbob's representation with one line for each weekday. I can use it for the models I've been fitting (ggplot2 is nice!). This is today's data and fitted model in my representation:
xy1.png
This is the same in jimbob's representation:
xy2.png
That presentation makes clear that the date dependence is modelled identically for the different weekdays; they differ only in their vertical position (and that the modelled points are at different dates). I could allow the date dependence to differ between weekdays by nesting the date polynomial in weekdays, giving this:
xy3.png
That graph suggests to me that the date dependence is very similar for the different weekdays so it isn't worth going to the nested model. Especially as the nested model has 28 coefficients (each weekday has a constant + 3 polynomial coefficients) compared to 10 for the non-nested model (3 polynomial coefficients + one constant per weekday). These examples have 63 dates (of which 5 are zero weighted). The non-nested model estimates 10 coefficients from 58 data points, which seems reasonable. The nested model estimates 28 coefficients from 58 data points, which seems a stretch. Putting it another way, each weekday occurs 9 times in the 63 dates. Fitting a cubic (or even a quadratic) to 9 points feels too much of a stretch. So I'm not going to use the nested model.
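
As a minimal sketch of the two alternatives (assuming a data frame d with specimen-date counts SpecCases, a numeric date t and a weekday factor day set up as in the earlier sketch), the models differ only in whether the cubic is shared across weekdays or nested within them:

Code:

# Non-nested: one shared cubic in date plus a constant offset per weekday
common <- lm(log(SpecCases) ~ poly(t, 3) + day, data = d)

# Nested: each weekday gets its own constant and its own cubic in date
nested <- lm(log(SpecCases) ~ day / poly(t, 3), data = d)

length(coef(common))   # 10: intercept + 3 polynomial terms + 6 weekday offsets
length(coef(nested))   # 28: 7 weekday constants + 7 weekdays x 3 polynomial terms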

Having decided that, I now prefer my representation to jimbob's, which implies separate curves for each weekday. I think my representation more clearly represents the model and its relation to the data.
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

KAJ wrote: Mon Dec 14, 2020 9:09 pm
jimbob wrote: Sun Dec 13, 2020 6:54 pm
KAJ wrote: Sun Dec 13, 2020 6:03 pm I've been musing on what might be behind the strong dependence of 'Cases by Specimen Date' on day of week. I surmised that it was due to the dependence of number of tests on day of week.
<snip>

Anyone got any suggestions I might investigate?
I think it depends on how the tests are handled. Hospital admissions, I guess would be less affected by weekends compared to drive in testing, so you might see a difference in the positivity ratio for the different types. Those isolating because of contacts might get tests but not hurry.

I can see how those could swing it either way.

Meanwhile - this is my (far simpler) plot of the tests vs specimen date for each day of the week against week number.

It tells the same story as your more in-depth analysis - but I think the graph is easy to see.
<snip graph>
jimbob wrote: Sun Dec 13, 2020 10:09 pm
KAJ wrote: Sun Dec 13, 2020 8:49 pm
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.

<snip>

So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
<snip>
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
So I'm ready to believe that the dependence of cases by specimen date on weekday is (at least) largely due to weekday variation in number, nature etc. of specimens. I don't have the data to investigate that, but it doesn't really matter. The weekday dependence is really just a nuisance factor clouding the view of the time (date) dependence which I really want to see.

Some people use cases by publish date, which reduces the weekday dependence. Others use 7-day moving averages which practically eliminates the weekday dependence. But both of these effectively associate some cases with dates other than the specimen date, distorting the date dependence I want to see.

Better is to model the weekday dependence to remove its effect. I may not know how it originates, but I can model it quite well; it is reasonably constant (in logs) from week to week.

I quite like jimbob's representation with one line for each weekday. I can use it for the models I've been fitting (ggplot2 is nice!). This is today's data and fitted model in my representation:
xy1.png
This is the same in jimbob's representation:
xy2.png

That presentation makes clear that the date dependence is modelled identically for the different weekdays; they differ only in their vertical position (and that the modelled points are at different dates). I could allow the date dependence to differ between weekdays by nesting the date polynomial in weekdays, giving this:
xy3.png

That graph suggests to me that the date dependence is very similar for the different weekdays so it isn't worth going to the nested model. Especially as the nested model has 28 coefficients (each weekday has a constant + 3 polynomial coefficients) compared to 10 for the non-nested model (3 polynomial coefficients + one constant per weekday). These examples have 63 dates (of which 5 are zero weighted). The non-nested model estimates 10 coefficients from 58 data points, which seems reasonable. The nested model estimates 28 coefficients from 58 data points, which seems a stretch. Putting it another way, each weekday occurs 9 times in the 63 dates. Fitting a cubic (or even a quadratic) to 9 points feels too much of a stretch. So I'm not going to use the nested model.

Having decided that, I now prefer my representation to jimbob's, which implies separate curves for each weekday. I think my representation more clearly represents the model and its relation to the data.
Yes, I prefer yours too. My main point was to use the colour scale to show the weekday numbers. I could probably improve mine but this is pretty much how I approach work. If it's a subtle effect - there's usually a larger effect to deal with, or it's not a problem. I tend to use graphs (including spatial data) rather than more formal statistical tests for about 90% of the work I do.
Have you considered stupidity as an explanation
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

Something something hammer something something nail
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

sTeamTraen wrote: Tue Dec 15, 2020 2:21 pm Well, this is very boring. ECDC will no longer be producing daily data summaries. :cry:
I tweaked my code to use the Johns Hopkins data.

UK 7-day moving average of new cases is 19772 - the highest since 23 November, a week before the end of the lockdown.
Something something hammer something something nail
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

sTeamTraen wrote: Tue Dec 15, 2020 5:06 pm
sTeamTraen wrote: Tue Dec 15, 2020 2:21 pm Well, this is very boring. ECDC will no longer be producing daily data summaries. :cry:
I tweaked my code to use the Johns Hopkins data.

UK 7-day moving average of new cases is 19772 - the highest since 23 November, a week before the end of the lockdown.
I also tweaked my code which uses the coronavirus.data.gov.uk data. I now correct for the weekday effect by estimating it and generating a fit with all points having a constant (Sunday) weekday effect; this shows trends much more clearly. Remember the latest 5 points are marked as "incomplete" by .gov.uk so I've given them zero weight in the fit.
SpecCases.png
I agree with you, the weekday corrected values have been increasing since the end of lockdown.

In case anyone's interested, here are some regression statistics,

Code:

Analysis of Variance Table

Response: log(SpecCases)
              Df  Sum Sq Mean Sq F value    Pr(>F)    
poly(date, 3)  3 1.35257 0.45086  44.233 7.383e-14 ***
day            6 2.00832 0.33472  32.839 2.185e-15 ***
Residuals     48 0.48926 0.01019                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.56115    0.03601 265.502  < 2e-16 ***
poly(date, 3)1 -0.72911    0.13406  -5.439 1.78e-06 ***
poly(date, 3)2 -0.56530    0.14042  -4.026 0.000201 ***
poly(date, 3)3  0.86350    0.13549   6.373 6.73e-08 ***
dayMon          0.53046    0.04930  10.759 2.19e-14 ***
dayTue          0.42975    0.04921   8.733 1.76e-11 ***
dayWed          0.39323    0.05073   7.751 5.25e-10 ***
dayThu          0.31430    0.05062   6.209 1.20e-07 ***
dayFri          0.28242    0.05054   5.588 1.06e-06 ***
daySat          0.02138    0.05050   0.423 0.673923    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.101 on 48 degrees of freedom
Multiple R-squared:  0.8729,	Adjusted R-squared:  0.8491 
F-statistic: 36.64 on 9 and 48 DF,  p-value: < 2.2e-16
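
A minimal sketch of the correction step described above, i.e. zero-weighting the "incomplete" dates and dividing out the estimated weekday offsets so every point sits at the Sunday level (column names and details are illustrative assumptions, not the actual code):

Code:

# Assumes d with a Date column date, counts SpecCases, numeric date t and weekday
# factor day (reference level "Sun"), as in the earlier sketches
d$w <- ifelse(d$date > max(d$date) - 5, 0, 1)   # zero weight for the 5 incomplete days

fit <- lm(log(SpecCases) ~ poly(t, 3) + day, data = d, weights = w)

# Divide out each weekday's estimated offset so all points sit at the Sunday level
offs <- c(0, coef(fit)[paste0("day", levels(d$day)[-1])])
names(offs) <- levels(d$day)
d$SpecCasesAdj <- d$SpecCases / exp(offs[as.character(d$day)])
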
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

Just to counter any suggestion that the changes in case numbers were due to changes in the number, type etc. of specimens, here's a similar analysis of hospital admissions. Smaller weekday effect. Very similar shape of date effect.
Admits.png

Code:

Analysis of Variance Table

Response: log(Admits)
              Df  Sum Sq Mean Sq  F value    Pr(>F)    
poly(date, 3)  3 2.33169 0.77723 160.6072 < 2.2e-16 ***
day            6 0.15234 0.02539   5.2466 0.0002649 ***
Residuals     53 0.25648 0.00484                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     7.23734    0.02326 311.202  < 2e-16 ***
poly(date, 3)1  1.01718    0.07000  14.531  < 2e-16 ***
poly(date, 3)2 -0.98118    0.06957 -14.103  < 2e-16 ***
poly(date, 3)3  0.54315    0.07056   7.698 3.41e-10 ***
dayMon          0.04039    0.03281   1.231   0.2236    
dayTue          0.05327    0.03284   1.622   0.1107    
dayWed          0.07238    0.03290   2.200   0.0322 *  
dayThu          0.04636    0.03298   1.405   0.1657    
dayFri         -0.05533    0.03310  -1.672   0.1004    
daySat         -0.05851    0.03281  -1.783   0.0802 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06957 on 53 degrees of freedom
Multiple R-squared:  0.9064,	Adjusted R-squared:  0.8905 
F-statistic: 57.03 on 9 and 53 DF,  p-value: < 2.2e-16
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

KAJ wrote: Tue Dec 15, 2020 8:36 pm In case anyone's interested, here are some regression statistics,
Do these data meet the assumptions for performing a regression analysis? I don't think we can treat cases or deaths as a random variable, because of the nature of the process(es) driving them. So you would need a specific analysis for time series --- perhaps a latent growth curve model.

On a related note, we can't interpret the p values either, as we are not sampling. We have the entire population of cases or deaths. (Well, some may be missing, but we're not trying to infer that.)
Something something hammer something something nail
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

sTeamTraen wrote: Wed Dec 16, 2020 3:09 pm
KAJ wrote: Tue Dec 15, 2020 8:36 pm In case anyone's interested, here are some regression statistics,
Do these data meet the assumptions for performing a regression analysis? I don't think we can treat cases or deaths as a random variable, because of the nature of the process(es) driving them. So you would need a specific analysis for time series --- perhaps a latent growth curve model.

On a related note, we can't interpret the p values either, as we are not sampling. We have the entire population of cases or deaths. (Well, some may be missing, but we're not trying to infer that.)
I should emphasise more often and more clearly that I'm not using the regressions to draw inferences, and I'd include as inferences interpreting p values or doing significance tests or making predictions. I'm using the regressions simply as descriptive statistics - the data can be (approximately) described as "a curve of <this> form with weekday dependent offsets of <this>". A complete description would need (approximately) as many parameters as data points. The R-sq indicates the completeness of the regression description; I'm pleasantly surprised that a cubic curve and simple weekday offsets fit over 80% of the variation in the data. The ANOVA indicates how much that accuracy depends on the curve and on the weekday offsets.

If we were using the regression to assess the validity of a process model and estimate its parameters, I would share your reservations. I really wouldn't want to model the process, and I would take a lot of convincing that a single model was applicable for an extended period.

Thanks for the insightful and apposite comment.
User avatar
Bird on a Fire
Princess POW
Posts: 10142
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: COVID-19

Post by Bird on a Fire »

Thanks for the interesting posts, KAJ.

Re modelling - that James Annan on twitter gets a good fit using a random walk Markov chain approach. He's not modelling the process exactly, certainly not mechanistically, beyond saying that the next week's figures will be directly dependent on the previous week's by an exponent drawn from some distribution or other. I think he constrains those distributions a bit according to lockdown level at the time, but basically lets the modelling process estimate it from the data.
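
For what it's worth, a toy R illustration of that general idea (a deliberate over-simplification, not James Annan's actual model): the growth rate itself follows a random walk, and each week's count is the previous week's scaled by the exponential of that rate.

Code:

# Toy only: cases[t] = cases[t-1] * exp(r[t]), with the growth rate r a random walk
set.seed(42)
n_weeks <- 20
r     <- cumsum(rnorm(n_weeks, mean = 0, sd = 0.05))   # slowly drifting growth rate
cases <- 10000 * exp(cumsum(r))                        # implied weekly case counts
plot(seq_len(n_weeks), cases, log = "y", type = "b",
     xlab = "week", ylab = "simulated cases (log scale)")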

But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
We have the right to a clean, healthy, sustainable environment.
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

Today's UK numbers are the highest since 24 November (deaths) / 13 November (cases).

There is a week to go of restrictions that are less severe than the November lockdown, and then there will be 5 days of merriment, with subsidised coach travel (seriously, WTAF)

At this rate, the first couple of weeks of January are going to see horrors the like of which the NHS has never faced in its existence, even without Brexit.
Something something hammer something something nail
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
I'm really not very happy with the cubic.

I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.
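
As a tiny worked example of that point (entirely synthetic numbers): if the straight-line fit to log(cases) has slope b per day, the doubling time is log(2)/b days.

Code:

# Synthetic series doubling every 10 days
t     <- 0:30
cases <- 100 * 2^(t / 10)
fit0  <- lm(log(cases) ~ t)
unname(log(2) / coef(fit0)["t"])   # recovers a doubling time of ~10 days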

When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.

When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.

I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?

Comments and suggestions welcome!
User avatar
shpalman
Princess POW
Posts: 8621
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond
Contact:

Re: COVID-19

Post by shpalman »

KAJ wrote: Wed Dec 16, 2020 9:05 pm
Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
I'm really not very happy with the cubic.

I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.

When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.

When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.

I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?

Comments and suggestions welcome!
To get a line which makes physical sense you need a model of the underlying physical process.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
AMS
Snowbonk
Posts: 466
Joined: Mon Nov 11, 2019 11:14 pm

Re: COVID-19

Post by AMS »

Any curve fit that tends to either plus or minus infinity is never going to work for long. In the meantime you're just feeding it more parameters to constrain it nearer to the data for a bit longer, but these don't actually tell you anything.
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

shpalman wrote: Wed Dec 16, 2020 9:14 pm To get a line which makes physical sense you need a model of the underlying physical process.
I'm not looking for a line which makes physical sense. I'm just looking for a (relatively) simple and interpretable description of the data.
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

AMS wrote: Wed Dec 16, 2020 9:28 pm Any curve fit that tends to either plus or minus infinity is never going to work for long. In the meantime you're just feeding it more parameters to constrain it nearer to the data for a bit longer, but these don't actually tell you anything.
Yes. I'm beginning to talk myself into restricting the range of the data.
My objective of a simple and interpretable description of the data conflicts with a data range covering more complex behaviours.
User avatar
Bird on a Fire
Princess POW
Posts: 10142
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: COVID-19

Post by Bird on a Fire »

I think loess is a decent idea after a while, to describe overall movements in the location of the data. You could split the data into sections and fit your own curves to them, but really that's just you being an idiosyncratic local regression algorithm ;)
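
In case it helps, a minimal ggplot2 sketch of the loess idea (column names are assumptions; because the scale transformation is applied before the statistic, the smooth is fitted on the log scale):

Code:

library(ggplot2)

# Sketch only: d with a Date column date and daily counts SpecCases
ggplot(d, aes(date, SpecCases)) +
  geom_point() +
  geom_smooth(method = "loess", span = 0.5, se = FALSE) +
  scale_y_log10()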

And I don't know that modelling the underlying process in much detail really is necessary - random walks work pretty well for a lot of demographic data, and as I mentioned a few posts ago seem to be being used with some degree of success for UK covid numbers specifically. I'm not quite sure what shpalman means by 'physical sense' though.
We have the right to a clean, healthy, sustainable environment.
User avatar
sTeamTraen
Stummy Beige
Posts: 2601
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: COVID-19

Post by sTeamTraen »

It seems from the RHS of the chart on that page that there are several days of recent cases missing, with the peak before the absence being around ~2,000 per day. The 11,000 are apparently backlogged from last weekend, so perhaps to be spread over 5 days. That will probably bring the number for the last 5 days up to around 2,400/day. For 3.1 million people that's close to what Iowa, Nevada, and Arkansas are experiencing. It means 1% of the population will get infected in less than a fortnight. :shock:

Wales has about 180 ICU beds, and I've been told by an ICU doctor that the average stay of a COVID-19 patient, whatever the outcome, is at least 3 weeks. That means that once they are full, beds will turn over at around 9-10 per day. We're looking at a humanitarian tragedy (and PTSD on a previously unheard-of scale among medical staff, I imagine).
Something something hammer something something nail
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

shpalman wrote: Wed Dec 16, 2020 9:14 pm
KAJ wrote: Wed Dec 16, 2020 9:05 pm
Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.

Models aside, the actual numbers are not looking good :(
I'm really not very happy with the cubic.

I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.

When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.

When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.

I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?

Comments and suggestions welcome!
To get a line which makes physical sense you need a model of the underlying physical process.
And this is why I went for the simplest plot of the rolling 7-day centred average* against reported date on a log scale.

we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.

*I guess that an argument can be made for geometric means
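
A minimal base-R sketch of that plot (daily_cases is an assumed vector of counts in reported-date order):

Code:

ma7 <- stats::filter(daily_cases, rep(1 / 7, 7), sides = 2)   # centred 7-day moving average
plot(seq_along(daily_cases), ma7, log = "y", type = "l",
     xlab = "day", ylab = "7-day centred average (log scale)")
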
Have you considered stupidity as an explanation
KAJ
Fuzzable
Posts: 313
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: COVID-19

Post by KAJ »

jimbob wrote: Wed Dec 16, 2020 10:39 pm <snip>
And this is why I went for the simplest plot of the rolling 7-day centred average* against reported date on a log scale.

we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.

*I guess that an argument can be made for geometric means
* I don't think a good argument can be made for geometric means.

The sum of a week's daily counts (cases, patients, deaths, ...) is meaningful. Indeed, a daily count is [can be interpreted as] the sum of hourly counts. The counts that agencies report are the sums of counts from different places.

But what does the product of counts mean? And do we really want a statistic that is zero if any datum is zero?

I suggest that the principal reasons we plot on log axes in this application are (a) it is the most familiar way of handling wide dynamic ranges (b) it gives a straight line for exponential growth. But neither of these are a reason for geometric means.

If we want a measure of central tendency that is more robust to outliers, then a median might be a better choice. But I haven't found that outliers are a problem here. Others' experience may differ.
User avatar
jimbob
Light of Blast
Posts: 5665
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: COVID-19

Post by jimbob »

KAJ wrote: Thu Dec 17, 2020 10:15 am
jimbob wrote: Wed Dec 16, 2020 10:39 pm <snip>
And this is why I went for the simplest plot of the rolling 7-day centred average* against reported date on a log scale.

we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.

*I guess that an argument can be made for geometric means
* I don't think a good argument can be made for geometric means.

The sum of a week's daily counts (cases, patients, deaths, ...) is meaningful. Indeed, a daily count is [can be interpreted as] the sum of hourly counts. The counts that agencies report are the sums of counts from different places.

But what does the product of counts mean? And do we really want a statistic that is zero if any datum is zero?

I suggest that the principal reasons we plot on log axes in this application are (a) it is the most familiar way of handling wide dynamic ranges (b) it gives a straight line for exponential growth. But neither of these are a reason for geometric means.

If we want a measure of central tendency that is more robust to outliers, then a median might be a better choice. But I haven't found that outliers are a problem here. Others' experience may differ.
Yes I was having a brainfart
Have you considered stupidity as an explanation
Locked