Italy let people go to the shops and everyone went to the shops...
shpalman wrote: Sun Dec 13, 2020 5:00 pm Lombardy is a yellow zone as of today, and it was a nice day, so you can imagine everyone went into town.
This was taken at about 15:20.
To be fair, the narrow streets in the centre are usually almost impassable during the weekends in December.
COVID-19
- shpalman
- Princess POW
- Posts: 8621
- Joined: Mon Nov 11, 2019 12:53 pm
- Location: One step beyond
- Contact:
Re: COVID-19
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
Re: COVID-19
Looking at this tweet - ironically, by someone who seems to be a supporter of Clare Craig:
KAJ wrote: Sun Dec 13, 2020 8:49 pm Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
shpalman wrote: Sun Dec 13, 2020 6:37 pm Depends how it's handled if a swab is carried out on one day but processed the next day.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.
In developers-guide#params-structure I don't see a metric for tests other than by publish date. On the other hand I don't see "newPillarOneTwoTestsByPublishDate" either, which I've been using for some time and must have picked up from coronavirus.data.gov.uk in the first place.
I guess the dates of cases by publish date may correspond to those of tests by publish date. Taking that ratio I get:
CasesTests.png
That has a pretty similar (small, F <= 5) weekday effect to cases by publish date.
Code: Select all
Analysis of Variance Table
Response: log(Cases.Tests)
              Df  Sum Sq Mean Sq F value    Pr(>F)
poly(date, 3)  3 1.99814 0.66605  34.030  2.12e-12 ***
day            6 0.59199 0.09867   5.041 0.0003737 ***
Residuals     53 1.03734 0.01957
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     -2.63043    0.04667 -56.365  < 2e-16 ***
poly(date, 3)1  -0.76742    0.14078  -5.451 1.33e-06 ***
poly(date, 3)2  -1.08763    0.13991  -7.774 2.58e-10 ***
poly(date, 3)3   0.46106    0.14190   3.249  0.00201 **
dayMon           0.13816    0.06597   2.094  0.04105 *
dayTue          -0.12832    0.06605  -1.943  0.05735 .
dayWed          -0.08010    0.06617  -1.211  0.23140
dayThu          -0.12394    0.06634  -1.868  0.06725 .
dayFri          -0.17354    0.06605  -2.627  0.01123 *
daySat          -0.09323    0.06597  -1.413  0.16347
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1399 on 53 degrees of freedom
Multiple R-squared: 0.714, Adjusted R-squared: 0.6655
F-statistic: 14.7 on 9 and 53 DF, p-value: 1.385e-11
PubCases.png
So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
Code: Select all
Analysis of Variance Table
Response: log(PubCases)
              Df  Sum Sq Mean Sq F value   Pr(>F)
poly(date, 3)  3 1.71301 0.57100 28.4255  4.3e-11 ***
day            6 0.39217 0.06536  3.2538 0.008492 **
Residuals     53 1.06465 0.02009
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      9.81603    0.04756 206.393  < 2e-16 ***
poly(date, 3)1  -0.53409    0.14262  -3.745 0.000446 ***
poly(date, 3)2  -0.85512    0.14174  -6.033 1.60e-07 ***
poly(date, 3)3   0.79031    0.14376   5.498 1.13e-06 ***
dayMon          -0.05330    0.06684  -0.797 0.428773
dayTue          -0.06282    0.06692  -0.939 0.352065
dayWed           0.12376    0.06704   1.846 0.070454 .
dayThu           0.14975    0.06721   2.228 0.030135 *
dayFri           0.09062    0.06743   1.344 0.184700
daySat           0.08855    0.06770   1.308 0.196555
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1417 on 53 degrees of freedom
Multiple R-squared: 0.6641, Adjusted R-squared: 0.6071
F-statistic: 11.64 on 9 and 53 DF, p-value: 7.734e-10
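The weekday-offset idea in the lm/ANOVA output above can be sketched without R: group the log positivity ratio by weekday and compare each group's mean to the overall mean. This is a minimal illustration in plain Python with invented numbers, not the actual dashboard data:

```python
import math
from collections import defaultdict

# Hypothetical daily (cases, tests) for three weeks, Mon..Sun.
cases = [900, 850, 880, 860, 840, 700, 650,
         920, 870, 900, 880, 860, 720, 670,
         940, 890, 920, 900, 880, 740, 690]
tests = [30000] * 21  # constant tests, so the pattern comes from cases

log_ratio = [math.log(c / t) for c, t in zip(cases, tests)]

# Average log(cases/tests) by weekday; the differences between weekday
# means are the log-scale offsets that the 'day' term in the lm estimates.
by_day = defaultdict(list)
for i, lr in enumerate(log_ratio):
    by_day[i % 7].append(lr)  # 0 = Mon ... 6 = Sun in this toy layout

offsets = {d: sum(v) / len(v) for d, v in by_day.items()}
overall = sum(log_ratio) / len(log_ratio)
for d in range(7):
    print(d, round(offsets[d] - overall, 3))
```

In these invented numbers the Monday offset comes out positive and the Sunday offset negative, the same shape of effect the regression reports.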
https://twitter.com/TarboilerBill/statu ... 7791280128
My reply: "@ClareCraigPath you assume the labs produce consistent results. They don't appear to in Scotland. Almost double the chance of getting a +ve diagnosis at the weekend."
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
Have you considered stupidity as an explanation
- shpalman
- Princess POW
- Posts: 8621
- Joined: Mon Nov 11, 2019 12:53 pm
- Location: One step beyond
- Contact:
Re: COVID-19
New variant of Covid may explain fast spread of virus in south of England
We have identified a new variant of coronavirus, which may be associated with the fastest spread in the south-east of England. Initial analysis suggests that this variant is growing faster than the existing variants. We’ve currently identified over 1,000 cases with this variant, predominantly in the south of England, although cases have been identified in nearly 60 different local authority areas, and numbers are increasing rapidly.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
- Little waster
- After Pie
- Posts: 2385
- Joined: Tue Nov 12, 2019 12:35 am
- Location: About 1 inch behind my eyes
Re: COVID-19
One for the No sh.t Sherlock pile.
If you asked me to list the three scientists I'd least want advising the Government on when and whether to impose a lockdown, these would be right at the top.
A report in the Sunday Times over the weekend suggests that the decision not to impose a circuit-breaker lockdown was influenced by a meeting involving the prime minister, the chancellor and three proponents of a “herd immunity” approach to managing the virus: Prof Sunetra Gupta and Prof Carl Heneghan of the University of Oxford and Prof Anders Tegnell, the Swedish epidemiologist who has masterminded Sweden’s catastrophic Covid control policy (in the last month, Sweden has reported 1,400 Covid deaths, while neighbours Norway and Finland, both of which have roughly half its population, reported 100 and 80 respectively). The delay in imposing national restrictions resulted in an estimated 1.3 million extra Covid infections.
This place is not a place of honor, no highly esteemed deed is commemorated here, nothing valued is here.
What is here was dangerous and repulsive to us.
This place is best shunned and left uninhabited.
Re: COVID-19
jimbob wrote: Sun Dec 13, 2020 6:54 pm
I think it depends on how the tests are handled. Hospital admissions, I guess, would be less affected by weekends compared to drive-in testing, so you might see a difference in the positivity ratio for the different types. Those isolating because of contacts might get tests but not hurry.
KAJ wrote: Sun Dec 13, 2020 6:03 pm
I've been musing on what might be behind the strong dependence of 'Cases by Specimen Date' on day of week. I surmised that it was due to the dependence of number of tests on day of week.
<snip>
Anyone got any suggestions I might investigate?
I can see how those could swing it either way.
Meanwhile - this is my (far simpler) plot of the tests vs specimen date for each day of the week against week number.
It tells the same story as your more in-depth analysis - but I think the graph is easy to see.
<snip graph>
So I'm ready to believe that the dependence of cases by specimen date on weekday is (at least) largely due to weekday variation in number, nature etc. of specimens. I don't have the data to investigate that, but it doesn't really matter. The weekday dependence is really just a nuisance factor clouding the view of the time (date) dependence which I really want to see.
jimbob wrote: Sun Dec 13, 2020 10:09 pm <snip>
KAJ wrote: Sun Dec 13, 2020 8:49 pm
Good point. "Cases by specimen date" is clearly(?) by the date the swab is carried out.
shpalman wrote: Sun Dec 13, 2020 6:37 pm Depends how it's handled if a swab is carried out on one day but processed the next day.
But I've been using Tests = "newPillarOneTwoTestsByPublishDate" which is by publish date, very often(?) later than specimen date.
<snip>
So I guess correcting cases by tests doesn't work because they relate to different days, leaving viable the hypothesis that the weekday dependence of cases by specimen date is due to a weekday dependence of total (positive and negative) numbers of specimens. But I don't think I have the data to investigate that.
Now why would that be? Maybe because it's mostly hospital testing, as opposed to routine testing?
Some people use cases by publish date, which reduces the weekday dependence. Others use 7-day moving averages which practically eliminates the weekday dependence. But both of these effectively associate some cases with dates other than the specimen date, distorting the date dependence I want to see.
Better is to model the weekday dependence to remove its effect. I may not know how it originates but I can model it quite well, it is reasonably constant (in logs) from week to week.
I quite like jimbob's representation with one line for each weekday. I can use it for the models I've been fitting (ggplot2 is nice!). This is today's data and fitted model in my representation:
xy1.png
This is the same in jimbob's representation:
xy2.png
That presentation makes clear that the date dependence is modelled identically for the different weekdays; they differ only in their vertical position (and the modelled points are at different dates). I could allow the date dependence to differ between weekdays by nesting the date polynomial in weekdays, giving this:
xy3.png
That graph suggests to me that the date dependence is very similar for the different weekdays, so it isn't worth going to the nested model. Especially as the nested model has 28 coefficients (each weekday has a constant + 3 polynomial coefficients) compared to 10 for the non-nested model (3 polynomial coefficients + one constant per weekday). These examples have 63 dates (of which 5 are zero weighted). The non-nested model estimates 10 coefficients from 58 data, which seems reasonable. The nested model estimates 28 coefficients from 58 data, which seems a stretch. Putting it another way, each weekday occurs 9 times in the 63 dates. Fitting a cubic (or even a quadratic) to 9 points feels too much of a stretch. So I'm not going to use the nested model.
Having decided that, I now prefer my representation to jimbob's, which implies separate curves for each weekday. I think my representation more clearly represents the model and its relation to the data.
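The coefficient bookkeeping in that comparison can be checked mechanically; a trivial sketch, with the counts taken straight from the post:

```python
# Non-nested model: one shared cubic in date plus a constant per weekday.
# Nested model: a separate constant + cubic for each weekday.
weekdays = 7
poly_terms = 3  # cubic: 3 polynomial coefficients (constants counted separately)

non_nested = poly_terms + weekdays        # 3 shared slopes + 7 intercepts
nested = weekdays * (1 + poly_terms)      # each weekday: intercept + cubic

n_dates, zero_weighted = 63, 5
effective_n = n_dates - zero_weighted     # data actually carrying weight

print(non_nested, nested, effective_n)    # 10 28 58
```

So the nested model spends almost half the effective data on parameters, which is the "stretch" being described.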
Re: COVID-19
Yes, I prefer yours too. My main point was to use the colour scale to show the weekday numbers. I could probably improve mine, but this is pretty much how I approach work: if it's a subtle effect, there's usually a larger effect to deal with, or it's not a problem. I tend to use graphs (including spatial data) rather than more formal statistical tests for about 90% of the work I do.
KAJ wrote: Mon Dec 14, 2020 9:09 pm <snip>
Have you considered stupidity as an explanation
- sTeamTraen
- Stummy Beige
- Posts: 2601
- Joined: Mon Nov 11, 2019 4:24 pm
- Location: Palma de Mallorca, Spain
Re: COVID-19
Well, this is very boring. ECDC will no longer be producing daily data summaries. 
Something something hammer something something nail
- sTeamTraen
- Stummy Beige
- Posts: 2601
- Joined: Mon Nov 11, 2019 4:24 pm
- Location: Palma de Mallorca, Spain
Re: COVID-19
I tweaked my code to use the Johns Hopkins data.
sTeamTraen wrote: Tue Dec 15, 2020 2:21 pm <snip>
UK 7-day moving average of new cases is 19772 - the highest since 23 November, a week before the end of the lockdown.
Something something hammer something something nail
Re: COVID-19
I also tweaked my code, which uses the coronavirus.data.gov.uk data. I now correct for the weekday effect by estimating it and generating a fit with all points having a constant (Sunday) weekday effect; this shows trends much more clearly. Remember the latest 5 points are marked as "incomplete" by .gov.uk, so I've given them zero weight in the fit. I agree with you: the weekday-corrected values have been increasing since the end of lockdown.
sTeamTraen wrote: Tue Dec 15, 2020 5:06 pm <snip>
In case anyone's interested, here are some regression statistics,
Code: Select all
Analysis of Variance Table
Response: log(SpecCases)
Df Sum Sq Mean Sq F value Pr(>F)
poly(date, 3) 3 1.35257 0.45086 44.233 7.383e-14 ***
day 6 2.00832 0.33472 32.839 2.185e-15 ***
Residuals 48 0.48926 0.01019
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.56115 0.03601 265.502 < 2e-16 ***
poly(date, 3)1 -0.72911 0.13406 -5.439 1.78e-06 ***
poly(date, 3)2 -0.56530 0.14042 -4.026 0.000201 ***
poly(date, 3)3 0.86350 0.13549 6.373 6.73e-08 ***
dayMon 0.53046 0.04930 10.759 2.19e-14 ***
dayTue 0.42975 0.04921 8.733 1.76e-11 ***
dayWed 0.39323 0.05073 7.751 5.25e-10 ***
dayThu 0.31430 0.05062 6.209 1.20e-07 ***
dayFri 0.28242 0.05054 5.588 1.06e-06 ***
daySat 0.02138 0.05050 0.423 0.673923
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.101 on 48 degrees of freedom
Multiple R-squared: 0.8729, Adjusted R-squared: 0.8491
F-statistic: 36.64 on 9 and 48 DF, p-value: < 2.2e-16
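A minimal sketch of the correction step described above, using the dayXxx coefficients from the fitted model (Sunday is the zero baseline); the function and variable names here are mine, not from the analysis code:

```python
import math

# Log-scale weekday coefficients from the regression output above
# (Sunday is the baseline, coefficient 0).
day_coef = {"Sun": 0.0, "Mon": 0.53046, "Tue": 0.42975, "Wed": 0.39323,
            "Thu": 0.31430, "Fri": 0.28242, "Sat": 0.02138}

def sunday_equivalent(count, day):
    """Deflate a raw daily count by its estimated weekday multiplier,
    putting every day on a common (Sunday) footing."""
    return count / math.exp(day_coef[day])

# A Sunday count is unchanged; a Monday count is deflated by
# exp(0.53) ~ 1.7x before being compared with the trend.
print(sunday_equivalent(10000, "Sun"), round(sunday_equivalent(17000, "Mon")))
```

Fitting the trend to these adjusted values is equivalent to holding the weekday term constant in the regression.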
Re: COVID-19
Just to counter any suggestion that the changes in case numbers were due to changes in the numbers, type etc. of specimens, here's a similar analysis of hospital admissions. Smaller weekday effect. Very similar shape of date effect.
Code: Select all
Analysis of Variance Table
Response: log(Admits)
Df Sum Sq Mean Sq F value Pr(>F)
poly(date, 3) 3 2.33169 0.77723 160.6072 < 2.2e-16 ***
day 6 0.15234 0.02539 5.2466 0.0002649 ***
Residuals 53 0.25648 0.00484
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.23734 0.02326 311.202 < 2e-16 ***
poly(date, 3)1 1.01718 0.07000 14.531 < 2e-16 ***
poly(date, 3)2 -0.98118 0.06957 -14.103 < 2e-16 ***
poly(date, 3)3 0.54315 0.07056 7.698 3.41e-10 ***
dayMon 0.04039 0.03281 1.231 0.2236
dayTue 0.05327 0.03284 1.622 0.1107
dayWed 0.07238 0.03290 2.200 0.0322 *
dayThu 0.04636 0.03298 1.405 0.1657
dayFri -0.05533 0.03310 -1.672 0.1004
daySat -0.05851 0.03281 -1.783 0.0802 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06957 on 53 degrees of freedom
Multiple R-squared: 0.9064, Adjusted R-squared: 0.8905
F-statistic: 57.03 on 9 and 53 DF, p-value: < 2.2e-16
- sTeamTraen
- Stummy Beige
- Posts: 2601
- Joined: Mon Nov 11, 2019 4:24 pm
- Location: Palma de Mallorca, Spain
Re: COVID-19
Do these data meet the assumptions for performing a regression analysis? I don't think we can treat cases or deaths as a random variable, because of the nature of the process(es) driving them. So you would need a specific analysis for time series, perhaps a latent growth curve model.
KAJ wrote: Tue Dec 15, 2020 8:36 pm In case anyone's interested, here are some regression statistics.
On a related note, we can't interpret the p values either, as we are not sampling. We have the entire population of cases or deaths. (Well, some may be missing, but we're not trying to infer that.)
Something something hammer something something nail
Re: COVID-19
I should emphasise more often and more clearly that I'm not using the regressions to draw inferences, and I'd include as inferences interpreting p values, doing significance tests, or making predictions. I'm using the regressions simply as descriptive statistics: the data can be (approximately) described as "a curve of <this> form with weekday-dependent offsets of <this>". A complete description would need (approximately) as many parameters as the number of data. The R-sq indicates the completeness of the regression description; I'm pleasantly surprised that a cubic curve and simple weekday offsets fit over 80% of the variation in the data. The ANOVA indicates how much that accuracy depends on the curve and on the weekday offsets.
sTeamTraen wrote: Wed Dec 16, 2020 3:09 pm <snip>
If we were using the regression to assess the validity of a process model and estimate its parameters, I would share your reservations. I really wouldn't want to model the process, and I would take a lot of convincing that a single model was applicable for an extended period.
Thanks for the insightful and apposite comment.
- Bird on a Fire
- Princess POW
- Posts: 10142
- Joined: Fri Oct 11, 2019 5:05 pm
- Location: Portugal
Re: COVID-19
Thanks for the interesting posts, KAJ.
Re modelling - that James Annan on twitter gets a good fit using a random walk Markov chain approach. He's not modelling the process exactly, certainly not mechanistically, beyond saying that the next week's figures will be directly dependent on the previous week's by an exponent drawn from some distribution or other. I think he constrains those distributions a bit according to lockdown level at the time, but basically lets the modelling process estimate it from the data.
But yes, it is impressive how well the quadratic fit that section of the data - I was just about to muse on how long it would last when you posted the cubic, which to my eyes does seem to be smoothing through the peak and the trough a little but still does a decent job.
Models aside, the actual numbers are not looking good
We have the right to a clean, healthy, sustainable environment.
- sTeamTraen
- Stummy Beige
- Posts: 2601
- Joined: Mon Nov 11, 2019 4:24 pm
- Location: Palma de Mallorca, Spain
Re: COVID-19
Today's UK numbers are the highest since 24 November (deaths) / 13 November (cases).
There is a week to go of restrictions that are less severe than the November lockdown, and then there will be 5 days of merriment, with subsidised coach travel (seriously, WTAF)
At this rate, the first couple of weeks of January are going to see horrors the like of which the NHS has never faced in its existence, even without Brexit.
Something something hammer something something nail
Re: COVID-19
I'm really not very happy with the cubic.
Bird on a Fire wrote: Wed Dec 16, 2020 6:45 pm <snip>
I started out with a straight line (in log numbers) because that leads directly to a doubling time. If the fit is decent it leads directly to a simple, easily interpretable description.
When it became clear that the slope was changing I used a quadratic. Not quite as simple, but not too bad - the slope is changing at a constant rate measured by the square term.
When it turned around I used a cubic as the simplest (?) curve to allow 3 slope directions (+ve, -ve, +ve). But it isn't as easy to interpret and it's a bit inflexible. I'm not certain it's meeting my purpose of an easily fitted, easily expressed, easily understood, function.
I've considered loess but that isn't really "easily expressed, easily understood" except graphically. But perhaps graphically is all I really want?
Perhaps I should restrict the range to where a quadratic fits well. I don't really expect the behaviour to be consistent over prolonged periods. Perhaps I'm being greedy trying to describe a couple of months with one description?
Comments and suggestions welcome!
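For the straight-line-in-logs case this progression started from, the doubling time falls straight out of the fitted slope. A self-contained sketch in plain Python (least squares by hand; the function name is mine):

```python
import math

def doubling_time(counts):
    """Fit a least-squares line through log(counts) vs day index and
    convert the slope to a doubling time in days (negative => halving)."""
    n = len(counts)
    xs = range(n)
    ys = [math.log(c) for c in counts]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return math.log(2) / slope

# A series that exactly doubles every 3 days recovers a 3-day doubling time:
print(round(doubling_time([100 * 2 ** (d / 3) for d in range(14)]), 6))  # 3.0
```

The quadratic and cubic fits lose exactly this property: once the slope varies, there is no single doubling time to quote.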
- shpalman
- Princess POW
- Posts: 8621
- Joined: Mon Nov 11, 2019 12:53 pm
- Location: One step beyond
- Contact:
Re: COVID-19
To get a line which makes physical sense you need a model of the underlying physical process.
KAJ wrote: Wed Dec 16, 2020 9:05 pm <snip>
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
@shpalman.bsky.social / bsky.app/profile/chrastina.net
threads.net/@dannychrastina
Re: COVID-19
Any curve fit that tends to either plus or minus infinity is never going to work for long. In the meantime you're just feeding it more parameters to constrain it nearer to the data for a bit longer, but these don't actually tell you anything.
Re: COVID-19
I'm not looking for a line which makes physical sense. I'm just looking for a (relatively) simple and interpretable description of the data.
shpalman wrote: Wed Dec 16, 2020 9:14 pm To get a line which makes physical sense you need a model of the underlying physical process.
Re: COVID-19
Yes. I'm beginning to talk myself into restricting the range of the data.
AMS wrote: Wed Dec 16, 2020 9:28 pm Any curve fit that tends to either plus or minus infinity is never going to work for long. In the meantime you're just feeding it more parameters to constrain it nearer to the data for a bit longer, but these don't actually tell you anything.
My objective of a simple and interpretable description of the data conflicts with a data range covering more complex behaviours.
- Bird on a Fire
- Princess POW
- Posts: 10142
- Joined: Fri Oct 11, 2019 5:05 pm
- Location: Portugal
Re: COVID-19
I think loess is a decent idea after a while, to describe overall movements in the location of the data. You could split the data into sections and fit your own curves to them, but really that's just you being an idiosyncratic local regression algorithm 
And I don't know that modelling the underlying process in much detail really is necessary - random walks work pretty well for a lot of demographic data, and as I mentioned a few posts ago seem to be being used with some degree of success for UK covid numbers specifically. I'm not quite sure what shpalman means by 'physical sense' though.
We have the right to a clean, healthy, sustainable environment.
- sTeamTraen
- Stummy Beige
- Posts: 2601
- Joined: Mon Nov 11, 2019 4:24 pm
- Location: Palma de Mallorca, Spain
Re: COVID-19
It seems from the RHS of the chart on that page that there are several days of recent cases missing, with the peak before the absence being around 2,000 per day. The 11,000 are apparently backlogged from last weekend, so perhaps to be spread over 5 days. That will probably bring the number for the last 5 days up to around 2,400/day. For 3.1 million people that's close to what Iowa, Nevada, and Arkansas are experiencing. It means 1% of the population will get infected in less than a fortnight.
Wales has about 180 ICU beds, and I've been told by an ICU doctor that the average stay of a COVID-19 patient, whatever the outcome, is at least 3 weeks. That means that once they are full, beds will turn over at around 9-10 per day. We're looking at a humanitarian tragedy (and PTSD on a previously unheard-of scale among medical staff, I imagine).
Something something hammer something something nail
Re: COVID-19
And this is why I went for the simplest plotting of the rolling 7-day centred average* against reported date on a log plot.
shpalman wrote: Wed Dec 16, 2020 9:14 pm <snip>
We know (or hope) that our interventions are changing the basic parameters, and also that behaviour doesn't just change with official rules.
*I guess that an argument can be made for geometric means
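The rolling average can be sketched in a few lines; the point of a centred 7-day window is that it contains each weekday exactly once, which is why the weekday cycle cancels (plain Python, illustrative numbers):

```python
def centred_7day_mean(xs):
    """Centred 7-day moving average: the value for day i averages
    days i-3 .. i+3, so every weekday contributes exactly once."""
    return [sum(xs[i - 3:i + 4]) / 7 for i in range(3, len(xs) - 3)]

# A pure weekday pattern with no underlying trend flattens completely:
week = [60, 90, 120, 150, 120, 90, 70]   # sums to 700
series = week * 3
print(centred_7day_mean(series))          # fifteen values, all 100.0
```

The cost, as noted earlier in the thread, is that each smoothed value mixes counts from dates other than its own, which blurs the date dependence.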
Have you considered stupidity as an explanation
Re: COVID-19
* I don't think a good argument can be made for geometric means.
jimbob wrote: Wed Dec 16, 2020 10:39 pm <snip>
And this is why I went for the simplest plotting of the rolling 7-day centred average* against reported date on a log plot.
we know (or hope) that our interventions are changing the basic parameters and also that behaviour doesn't just change with official rules.
*I guess that an argument can be made for geometric means
The sum of a week's daily counts (cases, patients, deaths, ...) is meaningful. Indeed, a daily count is [can be interpreted as] the sum of hourly counts. The counts that agencies report are the sums of counts from different places.
But what does the product of counts mean? And do we really want a statistic that is zero if any datum is zero?
I suggest that the principal reasons we plot on log axes in this application are (a) it is the most familiar way of handling wide dynamic ranges (b) it gives a straight line for exponential growth. But neither of these are a reason for geometric means.
If we want a measure of central tendency that is more robust to outliers, then a median might be a better choice. But I haven't found that outliers are a problem here. Others' experience may differ.
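The zero-count objection is easy to demonstrate (plain Python; `math.prod` needs Python 3.8+):

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # nth root of the product: a single zero datum zeroes the statistic
    return math.prod(xs) ** (1 / len(xs))

week = [120, 110, 100, 90, 80, 70, 0]    # one day with a zero count
print(arithmetic_mean(week))              # still informative
print(geometric_mean(week))               # collapses to 0.0
```

A geometric mean of daily counts is the antilog of the mean of the logs, so it only makes sense when every count is strictly positive and when the log scale itself is the quantity of interest.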
Re: COVID-19
Yes, I was having a brainfart.
KAJ wrote: Thu Dec 17, 2020 10:15 am <snip>
Have you considered stupidity as an explanation