Re: COVID-19

Posted: Sun Dec 27, 2020 4:16 am
by dyqik
Squeak wrote: Tue Dec 22, 2020 11:26 pm
dyqik wrote: Tue Dec 22, 2020 8:42 pm Oh, and CoVID has now made it to Antarctica.

Story

Fortunately the various Antarctic bases are fairly well isolated from each other (except, for example, Amundsen-Scott and McMurdo, where one is the stopping-off point on the way to the other), and this is only in the Chilean base for now.
That's going to give the logistics folk the willies. I know the UK has already been running three sets of plans for this summer for several months now and even their best case scenario involved a massive reduction in fieldwork. Everyone's worked so hard to keep it out of Antarctica because it's so hard to do complicated health care down there. :(

And there's a nasty little covid gift brewing for oceanographers and climate modellers in coming years, both in Antarctica and globally. People can't get on ships to deploy gear and service moorings so there are going to be big observational holes in the next few years. :(
My colleagues down the hall* got their winter-over for 2021 on a couple of weeks ago. Their 2020 winter-over leaves in a month.

But deployment of major new hardware has been put off for a year. Which is good, because I can do tests on it this year before it ships.

*The BICEP/Keck CMB group at Harvard.

Re: COVID-19

Posted: Sun Dec 27, 2020 5:21 pm
by shpalman
The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.

Italy only has 8931 but with fewer swabs carried out, so the positivity rate was 14.9% (while the recent average is usually around 10%; the UK's was more like 8% except we haven't had testing figures through for a few days).

Veneto is having a bad time.

Re: COVID-19

Posted: Sun Dec 27, 2020 5:46 pm
by KAJ
shpalman wrote: Sun Dec 27, 2020 5:21 pm The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.
Look at the data underlying Cases by specimen date. No cases have been reported from specimens after 22 December, where the numbers are already 41,749 with, surely, more to come. 21 December is already at 46,536 with, surely, more to come.

Code:

date	PubCases	SpecCases
27/12	30,501	
26/12	35,691	
25/12	32,725	
24/12	39,877	
23/12	39,237	
22/12	36,804	41,749
21/12	33,364	46,536
20/12	35,928	32,310
19/12	27,052	25,043
18/12	28,507	35,810
17/12	35,383	33,855
16/12	25,161	34,845
15/12	18,450	33,843
14/12	20,263	34,696
By the way, I'm still working on the graph we were talking about. It's led me down a rabbit hole, interesting but not the kind of thing to spend time on at the moment.

Re: COVID-19

Posted: Sun Dec 27, 2020 6:00 pm
by shpalman
KAJ wrote: Sun Dec 27, 2020 5:46 pm
shpalman wrote: Sun Dec 27, 2020 5:21 pm The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.
Look at the data underlying Cases by specimen date. No cases have been reported from specimens after 22 December...
"by specimen date" for the UK is being held up because Northern Ireland and Scotland aren't reporting. England is reporting though:

Code:

date	newCasesBySpecimenDate	cumCasesBySpecimenDate
2020-12-26	1977	1963217
2020-12-25	9394	1961240
2020-12-24	21027	1951846
2020-12-23	23487	1930819
2020-12-22	37347	1907332
2020-12-21	41813	1869985
2020-12-20	28772	1828172

Re: COVID-19

Posted: Sun Dec 27, 2020 6:35 pm
by shpalman
shpalman wrote: Sun Dec 27, 2020 5:21 pm The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.

Italy only has 8931 but with fewer swabs carried out, so the positivity rate was 14.9% (while the recent average is usually around 10%; the UK's was more like 8% except we haven't had testing figures through for a few days).

Veneto is having a bad time.
Ok here's why Veneto is having a bad time: it has a high ICU capacity.

This means that even though its numbers of cases were going up in a way which made it obvious what was going to happen, technically it remained as a yellow zone since the evaluation of what zone to be in also takes into account health care capacity.

So, no, you can't run your covids at what you think is an acceptable level that your health system can cope with, whoever suggested that in whatever thread it was.

England is way past this kind of reasoning, of course.

Re: COVID-19

Posted: Sun Dec 27, 2020 11:00 pm
by sTeamTraen
shpalman wrote: Sun Dec 27, 2020 5:21 pm The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.
Can you run me through the calculation there?

Re: COVID-19

Posted: Mon Dec 28, 2020 7:31 am
by shpalman
sTeamTraen wrote: Sun Dec 27, 2020 11:00 pm
shpalman wrote: Sun Dec 27, 2020 5:21 pm The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.
Can you run me through the calculation there?
The total reported for 26th December was 34693, but NI hadn't reported any cases that day. Yesterday, the 27th, 998 cases in NI were added to its reporting total for the 26th not the 27th, taking the UK total for the 26th to 35691.
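
Spelled out as arithmetic, a minimal sketch in R (the variable names are made up for illustration; the figures are the ones quoted above):

```r
reported_26 <- 34693  # UK total originally reported for 26 December
ni_late     <- 998    # NI cases published on the 27th but assigned to the 26th

# Corrected by-date-reported total for 26 December
corrected_26 <- reported_26 + ni_late
corrected_26
```

This should give the 35691 now shown for the 26th.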

Re: COVID-19

Posted: Mon Dec 28, 2020 12:52 pm
by sTeamTraen
shpalman wrote: Mon Dec 28, 2020 7:31 am
sTeamTraen wrote: Sun Dec 27, 2020 11:00 pm
shpalman wrote: Sun Dec 27, 2020 5:21 pm The UK has 30501 new covids today, but 998 covids from Northern Ireland got snuck to yesterday's data so the total was actually 35691.
Can you run me through the calculation there?
The total reported for 26th December was 34693, but NI hadn't reported any cases that day. Yesterday, the 27th, 998 cases in NI were added to its reporting total for the 26th not the 27th, taking the UK total for the 26th to 35691.
Thanks. I presumed it was something like that but I hadn't been able to track down appropriate numbers to make it all fit together.

Re: COVID-19

Posted: Mon Dec 28, 2020 12:57 pm
by shpalman
I don't remember what the number for the 26th was at the time (I'm reverse-engineering the solution in my previous post) but I noticed when I put the new total in yesterday that the difference between that and the previous day didn't match up with what the website was declaring for new cases. That's when I went in to look at the last few days of cases-by-date-reported (which normally aren't corrected after the fact, unlike cases-by-date-of-test).

The sneaky thing is that those ~1000 cases didn't show up in the "new cases" figure for either day. Normally when a bunch of cases comes through late, they count towards the cases-by-date-reported total for the day on which they show up.

Re: COVID-19

Posted: Mon Dec 28, 2020 4:56 pm
by jimbob
Now that the ECDC has moved to a weekly notification period, the data is here:

https://www.ecdc.europa.eu/en/publicati ... e-covid-19

although I might move to Our World in Data:

https://github.com/owid/covid-19-data/t ... ublic/data for all countries

Re: COVID-19

Posted: Mon Dec 28, 2020 5:35 pm
by OffTheRock
bob sterman wrote: Fri Dec 25, 2020 7:11 am
plodder wrote: Thu Dec 24, 2020 3:49 pm :shock:

oops. caveats apply etc
The Sun appear to have photos of the Excel centre hall empty.

Maybe the gear is all packed away? My apologies, looks like this Tice guy could have been correct about this. Even a stopped clock....
He might still be wrong. The Excel centre hall was emptied when they packed everything away in June. Could be that they’ve made a decision since then not to reopen it but the empty hall isn’t proof of that. I’d just assume Tice is wrong tbh.

Re: COVID-19

Posted: Mon Dec 28, 2020 6:57 pm
by shpalman
KAJ wrote: Sun Dec 27, 2020 5:46 pm By the way, I'm still working on the graph we were talking about. It's led me down a rabbit hole, interesting but not the kind of thing to spend time on at the moment.
I think it's now possible to see in the deaths-by-date-of-death data that the trend is increasing, as of the second week of December.

Re: COVID-19

Posted: Mon Dec 28, 2020 7:46 pm
by sTeamTraen
Russia has just admitted (in German, use Chrome translate) that its figure of 53,000 COVID-19 deaths is absurdly low. They now put the figure at 186,000 and are reporting 229,700 excess deaths for the year 2020.

Re: COVID-19

Posted: Mon Dec 28, 2020 9:00 pm
by KAJ
I'm guessing reporting delays such as recently discussed might underlie my confusion, laid out below.

With respect to graphs I'd posted about lags in reporting deaths:
shpalman wrote: Wed Dec 23, 2020 9:22 pm Not sure, the curves look like they're the wrong way around.

What I want is deaths-per-day versus date-of-death for each data set reported on a different day.
I drafted this rambling response yesterday and have just finished it.

So you want:
  • x-axis: publication date
  • y-axis: number of deaths newly published on this publication date
  • colour: date-of-death.
  • Numbers from different dates-of-death 'stacked' within publication date, so total 'height' at each publication date = total number of deaths newly published on this publication date. Line joins each date-of-death to corresponding values at adjacent publication dates.
I've been saving published values of "Deaths within 28 days of positive test by date of death" (DateDeaths) (and, incidentally, cases by specimen date) from 17 November until today (27 December), 41 publication dates so far.

My immediate problem is that my data for each publication date doesn't have (and I don't think https://coronavirus.data.gov.uk/ gives) the newly published deaths versus date-of-death, just the totals at that publication date for each date-of-death. I addressed that by sorting the data by publication date within date-of-death and calculating successive differences for each date-of-death, e.g. below for date-of-death = 1 December. In this dataframe (dDF) I added column 'Published' when saving the data, and 'diff' to do the required plot.

Code:

            date DateDeaths  Published diff
11203 2020-12-01        119 2020-12-02  119
11204 2020-12-01        278 2020-12-03  159
11205 2020-12-01        325 2020-12-04   47
11206 2020-12-01        346 2020-12-05   21
11207 2020-12-01        350 2020-12-06    4
11208 2020-12-01        352 2020-12-07    2
11209 2020-12-01        353 2020-12-08    1
11210 2020-12-01        364 2020-12-09   11
11211 2020-12-01        368 2020-12-10    4
11212 2020-12-01        379 2020-12-11   11
11213 2020-12-01        383 2020-12-12    4
11214 2020-12-01        386 2020-12-13    3
11215 2020-12-01        386 2020-12-14    0
11216 2020-12-01        389 2020-12-15    3
11217 2020-12-01        394 2020-12-16    5
11218 2020-12-01        394 2020-12-17    0
11219 2020-12-01        395 2020-12-18    1
11220 2020-12-01        397 2020-12-19    2
11221 2020-12-01        398 2020-12-20    1
11222 2020-12-01        398 2020-12-21    0
11223 2020-12-01        398 2020-12-22    0
11224 2020-12-01        399 2020-12-23    1
11225 2020-12-01        401 2020-12-24    2
11226 2020-12-01        399 2020-12-25   -2
11227 2020-12-01        402 2020-12-26    3
11228 2020-12-01        401 2020-12-27   -1
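
The successive-difference step described above can be sketched in R roughly as follows. This is only an illustration, not the code actually used: the five-row data frame is a hypothetical cut-down stand-in for dDF (values copied from the first rows of the table above), and using ave() is an assumption about one reasonable way to difference within each date-of-death.

```r
# Hypothetical miniature of dDF: totals for one date-of-death as published
# on five successive days (values copied from the table above).
dDF <- data.frame(
  date       = as.Date("2020-12-01"),
  DateDeaths = c(119, 278, 325, 346, 350),
  Published  = as.Date("2020-12-02") + 0:4
)

# Sort by publication date within date-of-death
dDF <- dDF[order(dDF$date, dDF$Published), ]

# Successive differences within each date-of-death. The first published
# value counts in full because there is nothing earlier to subtract.
dDF$diff <- ave(dDF$DateDeaths, format(dDF$date),
                FUN = function(x) c(x[1], diff(x)))
dDF
```

With more than one date-of-death in the data frame, the same call differences each date-of-death separately, since ave() applies the function group by group.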
Two things jump out. Firstly, the number of deaths dated 1 December is still being changed at 27 December.

More surprising, there are some negative differences - on 25 Dec and 27 Dec the number decreased from the previous day. This isn't unusual. Of the 11,533 differences most are zero, as expected, 1,167 are positive, but 329 are negative.

Code:

> summary(subset(dDF, diff < 0)$diff)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-20.000  -2.000  -1.000  -1.617  -1.000  -1.000  
I don't really know what to make of these. Maybe they're corrections of "blunders", but there are many of them, and they're quite widely spread.

Code:

> unique(subset(dDF, diff < 0)$Published)
 [1] "2020-11-21" "2020-12-09" "2020-12-12" "2020-11-19" "2020-11-24"
 [6] "2020-11-25" "2020-11-30" "2020-12-06" "2020-12-10" "2020-12-18"
[11] "2020-12-23" "2020-12-15" "2020-11-29" "2020-12-01" "2020-12-19"
[16] "2020-12-27" "2020-12-24" "2020-12-08" "2020-12-13" "2020-11-27"
[21] "2020-12-20" "2020-11-22" "2020-12-16" "2020-12-22" "2020-12-25"
[26] "2020-12-26" "2020-11-20" "2020-11-18" "2020-12-03" "2020-12-14"
[31] "2020-12-05" "2020-12-07" "2020-12-17" "2020-12-04" "2020-12-21"
[36] "2020-11-23" "2020-11-28" "2020-12-11" "2020-12-02" 
Perhaps (I really don't know) these negative values are related to the mismatch noted below.

I had intended the 'diff' column in dataframe dDF above to be the newly published deaths versus date-of-death needed for the plot, giving me this:
plot.png
The line colours differ by date-of-death. They look very similar because there are 298 dates-of-death and the most visible ones are quite similar.

But apart from that, there are some very funny features in this graph, especially in the totals at each publication date.

If I sum 'diff' across Published for each date-of-death I get the most recently published deaths by date-of-death (DateDeaths). That has to be the case, it follows from how 'diff' was calculated.
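
That telescoping can be checked directly on the 1 December rows shown earlier. A minimal sketch, with the 'diff' values typed in by hand from that table (the vector name is made up for illustration):

```r
# 'diff' values for date-of-death 2020-12-01, publication dates 2 to 27 December
diff_1dec <- c(119, 159, 47, 21, 4, 2, 1, 11, 4, 11, 4, 3, 0,
               3, 5, 0, 1, 2, 1, 0, 0, 1, 2, -2, 3, -1)

# The running total of the differences reproduces each published DateDeaths value
running <- cumsum(diff_1dec)

# The final running total is the most recently published figure
tail(running, 1)
```

The last value comes to 401, the DateDeaths figure published on 27 December, negative corrections included.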

In an ideal world, if I had all the data back to day zero, and summed 'diff' across date-of-death for each Published, I should get the most recently published "Deaths within 28 days of positive test by date reported" (PubDeaths). These are the cases where they disagree:

Code:

>   subset(bypub,sum.diff != PubDeaths) # mismatches
      Pubdate sum.diff PubDeaths
1  2020-11-17    52744       598
2  2020-11-18      528       529
3  2020-11-19      347       501
4  2020-11-20      664       511
5  2020-11-21      219       341
6  2020-11-22      249       398
7  2020-11-23      478       206
8  2020-11-24      607       608
10 2020-11-26       25       498
11 2020-11-27      843       521
12 2020-11-28      496       479
13 2020-11-29       53       215
14 2020-11-30      494       205
15 2020-12-01      607       603
19 2020-12-05      300       397
20 2020-12-06       70       231
21 2020-12-07      446       189
22 2020-12-08      601       616
23 2020-12-09      531       533
24 2020-12-10      512       516
25 2020-12-11      397       424
26 2020-12-12      422       519
27 2020-12-13       30       144
28 2020-12-14      476       232
29 2020-12-15      390       506
30 2020-12-16      726       613
31 2020-12-17      533       532
33 2020-12-19      426       534
34 2020-12-20      120       326
35 2020-12-21      530       215
36 2020-12-22      689       691
37 2020-12-23      742       744
38 2020-12-24      455       585
39 2020-12-25      219       570
40 2020-12-26       47       230
41 2020-12-27       44       316 
The first, very large, disagreement is expected: I don't have publication dates prior to 17 November, so all deaths up to then are assigned to that publication date, which I've omitted from the graph. But there are large disagreements later too.

I had hoped that a given date-of-death would become constant after 2 or 3 weeks, giving zero differences, no longer contributing to sums for publication dates. But, as an example, here are the non-zero contributions to the 13 December publication date.

Code:

> subset(dDF, (Published == as.Date("2020-12-13")) & (diff != 0))
            date DateDeaths  Published diff
2897  2020-05-09        377 2020-12-13   -1
3307  2020-05-19        275 2020-12-13    1
3348  2020-05-20        267 2020-12-13   -1
9826  2020-10-25        249 2020-12-13    1
10400 2020-11-08        412 2020-12-13   -1
10605 2020-11-13        433 2020-12-13   -1
10687 2020-11-15        449 2020-12-13   -1
10728 2020-11-16        423 2020-12-13   -1
10879 2020-11-20        449 2020-12-13   -1
10949 2020-11-22        474 2020-12-13   -1
11047 2020-11-25        477 2020-12-13   -1
11077 2020-11-26        431 2020-12-13    1
11105 2020-11-27        415 2020-12-13   -1
11133 2020-11-28        453 2020-12-13    1
11188 2020-11-30        419 2020-12-13    1
11214 2020-12-01        386 2020-12-13    3
11284 2020-12-04        433 2020-12-13    3
11305 2020-12-05        347 2020-12-13    1
11326 2020-12-06        365 2020-12-13    1
11346 2020-12-07        370 2020-12-13    2
11383 2020-12-09        356 2020-12-13    5
11400 2020-12-10        313 2020-12-13   20 
These go back to date-of-death 9 May. There doesn't seem to be a time after which the influence of a date-of-death on a sum for publication date disappears.

But that isn't enough to explain the differences. According to https://coronavirus.data.gov.uk/details ... itive-test "Deaths by date reported - each death is assigned to the date when it was first included in the published totals", so a "Deaths by date reported" figure shouldn't include any deaths published earlier.

As an example, on 13 December the "Deaths by date reported" was 144. But look at the by-date-of-death data published on 12 and 13 December: here are all the dates-of-death where the numbers differed between the two publication dates. There just aren't 144 deaths included on 13 December which weren't included earlier.

Code:

> tDF <- data.frame(subset(dDF, (Published == as.Date("2020-12-12")), 1:3),
        subset(dDF, (Published == as.Date("2020-12-13"))), check.names = TRUE)
> subset(tDF, diff !=0)
            date DateDeaths  Published     date.1 DateDeaths.1 Published.1 diff
2896  2020-05-09        378 2020-12-12 2020-05-09          377  2020-12-13   -1
3306  2020-05-19        274 2020-12-12 2020-05-19          275  2020-12-13    1
3347  2020-05-20        268 2020-12-12 2020-05-20          267  2020-12-13   -1
9825  2020-10-25        248 2020-12-12 2020-10-25          249  2020-12-13    1
10399 2020-11-08        413 2020-12-12 2020-11-08          412  2020-12-13   -1
10604 2020-11-13        434 2020-12-12 2020-11-13          433  2020-12-13   -1
10686 2020-11-15        450 2020-12-12 2020-11-15          449  2020-12-13   -1
10727 2020-11-16        424 2020-12-12 2020-11-16          423  2020-12-13   -1
10878 2020-11-20        450 2020-12-12 2020-11-20          449  2020-12-13   -1
10948 2020-11-22        475 2020-12-12 2020-11-22          474  2020-12-13   -1
11046 2020-11-25        478 2020-12-12 2020-11-25          477  2020-12-13   -1
11076 2020-11-26        430 2020-12-12 2020-11-26          431  2020-12-13    1
11104 2020-11-27        416 2020-12-12 2020-11-27          415  2020-12-13   -1
11132 2020-11-28        452 2020-12-12 2020-11-28          453  2020-12-13    1
11187 2020-11-30        418 2020-12-12 2020-11-30          419  2020-12-13    1
11213 2020-12-01        383 2020-12-12 2020-12-01          386  2020-12-13    3
11283 2020-12-04        430 2020-12-12 2020-12-04          433  2020-12-13    3
11304 2020-12-05        346 2020-12-12 2020-12-05          347  2020-12-13    1
11325 2020-12-06        364 2020-12-12 2020-12-06          365  2020-12-13    1
11345 2020-12-07        368 2020-12-12 2020-12-07          370  2020-12-13    2
11382 2020-12-09        351 2020-12-12 2020-12-09          356  2020-12-13    5
11399 2020-12-10        293 2020-12-12 2020-12-10          313  2020-12-13   20
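
The size of the gap can be put in numbers. A minimal sketch, with the 'diff' column of the table above typed in by hand (variable names are made up for illustration):

```r
# 'diff' values for publication date 2020-12-13, in table order
diff_13dec <- c(-1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1,
                 1, -1, 1, 1, 3, 3, 1, 1, 2, 5, 20)

net   <- sum(diff_13dec)                  # net change, the sum.diff value
added <- sum(diff_13dec[diff_13dec > 0])  # increases only, ignoring reductions

PubDeaths <- 144  # "Deaths by date reported" for 13 December

c(net = net, added = added, unexplained = PubDeaths - added)
```

Even counting only the increases, the by-date-of-death data accounts for 40 newly included deaths, leaving 104 of the 144 unexplained.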
Sorry for the loser-length post. I feel I must be misunderstanding something, probably something very simple - it has been known before! Can anyone put me straight?

Re: COVID-19

Posted: Mon Dec 28, 2020 9:09 pm
by KAJ
shpalman wrote: Mon Dec 28, 2020 6:57 pm
KAJ wrote: Sun Dec 27, 2020 5:46 pm By the way, I'm still working on the graph we were talking about. It's led me down a rabbit hole, interesting but not the kind of thing to spend time on at the moment.
I think it's now possible to see in the deaths-by-date-of-death data that the trend is increasing, as of the second week of December.
It doesn't help that (at least for the UK) we don't have any deaths-by-date-of-death data after 22 Dec, and the latest 5 of those are marked as incomplete. I, too, think I see an increasing trend, but I can't extract any really strong evidence - the (random?) scatter is of broadly equal magnitude to the underlying trend.
DateDeaths.png

Code:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     6.09524    0.01374 443.754   <2e-16 ***
poly(date, 2)1  0.04221    0.09596   0.440   0.6638    
poly(date, 2)2  0.22273    0.09300   2.395   0.0244 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06162 on 25 degrees of freedom
Multiple R-squared:  0.2188,	Adjusted R-squared:  0.1563 
F-statistic: 3.501 on 2 and 25 DF,  p-value: 0.04567

Re: COVID-19

Posted: Mon Dec 28, 2020 11:37 pm
by tenchboy
Just HOW-ON-EARTH can these things 1,6,7, be compatible?
HOW.png

Re: COVID-19

Posted: Mon Dec 28, 2020 11:49 pm
by jimbob
KAJ wrote: Mon Dec 28, 2020 9:00 pm I'm guessing reporting delays such as recently discussed might underlie my confusion, laid out below.
...
Using date of reporting is probably the fastest and most stable measure.

If you want to retrospectively review impacts of particular measures, then after a sufficient delay, use date of occurrence.

Re: COVID-19

Posted: Tue Dec 29, 2020 7:43 am
by shpalman
KAJ wrote: Mon Dec 28, 2020 9:00 pm I'm guessing reporting delays such as recently discussed might underlie my confusion, laid out below.

With respect to graphs I'd posted about lags in reporting deaths:
shpalman wrote: Wed Dec 23, 2020 9:22 pm Not sure, the curves look like they're the wrong way around.

What I want is deaths-per-day versus date-of-death for each data set reported on a different day.
I drafted this rambling response yesterday and have just finished it.

So you want:
  • x-axis: publication date
  • y-axis: number of deaths newly published on this publication date
  • colour: date-of-death.
  • Numbers from different dates-of-death 'stacked' within publication date, so total 'height' at each publication date = total number of deaths newly published on this publication date. Line joins each date-of-death to corresponding values at adjacent publication dates.
Yes, but not necessarily "stacked", just superimposed so that you can see the time-evolution of the deaths-by-date-of-death data. It doesn't have to be "newly published", just whatever total number of deaths was considered to have happened on a certain date according to data from some other date.
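
A minimal sketch of that superimposed view in base R. It assumes a data frame shaped like dDF (columns date, DateDeaths, Published); the numbers are made up for illustration, and this is not necessarily how either poster built their graphs.

```r
# Toy data (made-up): deaths by date-of-death from two publication dates.
dDF <- data.frame(
  date       = as.Date(rep(c("2020-12-01", "2020-12-02", "2020-12-03"), 2)),
  Published  = as.Date(rep(c("2020-12-04", "2020-12-05"), each = 3)),
  DateDeaths = c(380, 300, 150, 386, 330, 240)
)

# One row per date-of-death, one column per publication date.
wide <- tapply(dDF$DateDeaths, list(dDF$date, dDF$Published), sum)
print(wide)

# Superimpose one line per publication date (skipped when run as a script).
if (interactive()) {
  days <- as.Date(rownames(wide))
  plot(days, wide[, 1], type = "n", ylim = range(wide, na.rm = TRUE),
       xlab = "date of death", ylab = "deaths")
  for (j in seq_len(ncol(wide))) lines(days, wide[, j], col = j)
}
```

Each column of wide is one day's view of the whole curve, so later columns sitting above earlier ones show where counts were revised upwards.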
KAJ wrote: Mon Dec 28, 2020 9:00 pm...

Two things jump out. Firstly, the number of deaths dated 1 December is still being changed at 27 December.
Well, this is what I was looking for. Deaths-by-reporting-date are clearly starting to go up but deaths-by-date-of-death aren't showing the same, which means the reports now must be dated all the way back to the second-wave peak and they're moving the whole graph up, not just the last few days of it.
KAJ wrote: Mon Dec 28, 2020 9:00 pm More surprising, there are some negative differences - on 25 Dec and 27 Dec the number decreased from the previous day. This isn't unusual. Of the 11,533 differences most are zero, as expected, 1,167 are positive, but 329 are negative.
I don't know, they could be moving a death to the "correct" day in some cases; they can't be re-attributing a death to not-covid since this metric is supposed to count all deaths within 28 days of a positive test, whatever the reason.
KAJ wrote: Mon Dec 28, 2020 9:00 pm I had intended the 'diff' column in dataframe dDF above to be the newly published deaths versus date-of-death needed for the plot, giving me this:

Image

The line colours differ by date-of-death. They look very similar because there are 298 dates-of-death and the most visible ones are quite similar.

But apart from that, there are some very funny features in this graph, especially in the totals at each publication date.

If I sum 'diff' across Published for each date-of-death I get the most recently published deaths by date-of-death (DateDeaths). That has to be the case, it follows from how 'diff' was calculated.
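
A sketch of why that sum has to come out that way, assuming 'diff' is the change in DateDeaths from one publication date to the next within each date-of-death, with the first available publication contributing its whole count. The numbers are made up and the code is an illustration, not the code behind the post.

```r
# Toy snapshots (made-up numbers): deaths by date-of-death as published
# on consecutive days.
dDF <- data.frame(
  date       = as.Date(c("2020-12-01", "2020-12-01", "2020-12-01",
                         "2020-12-02", "2020-12-02")),
  Published  = as.Date(c("2020-12-02", "2020-12-03", "2020-12-04",
                         "2020-12-03", "2020-12-04")),
  DateDeaths = c(10, 14, 13, 7, 12)
)

# Order by publication date within each date-of-death, then difference.
dDF <- dDF[order(dDF$date, dDF$Published), ]
# The first publication of a date contributes its whole count
# (a difference from an implicit zero); later ones may be negative.
dDF$diff <- ave(dDF$DateDeaths, dDF$date,
                FUN = function(x) c(x[1], diff(x)))

# Summing diff over publication dates recovers the latest published value.
sum_diff <- tapply(dDF$diff, dDF$date, sum)
latest   <- tapply(dDF$DateDeaths, dDF$date, function(x) x[length(x)])
print(cbind(sum_diff, latest))
```

The sum telescopes: every intermediate value is added once and subtracted once, leaving only the most recent count.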

In an ideal world, if I had all the data back to day zero, and summed 'diff' across date-of-death for each Published I should get the most recently published "Deaths within 28 days of positive test by date reported" (PubDeaths). These are the cases when they disagree:

Code: Select all

>   subset(bypub,sum.diff != PubDeaths) # mismatches
      Pubdate sum.diff PubDeaths
1  2020-11-17    52744       598
2  2020-11-18      528       529
3  2020-11-19      347       501
4  2020-11-20      664       511
5  2020-11-21      219       341
6  2020-11-22      249       398
7  2020-11-23      478       206
8  2020-11-24      607       608
10 2020-11-26       25       498
11 2020-11-27      843       521
12 2020-11-28      496       479
13 2020-11-29       53       215
14 2020-11-30      494       205
15 2020-12-01      607       603
19 2020-12-05      300       397
20 2020-12-06       70       231
21 2020-12-07      446       189
22 2020-12-08      601       616
23 2020-12-09      531       533
24 2020-12-10      512       516
25 2020-12-11      397       424
26 2020-12-12      422       519
27 2020-12-13       30       144
28 2020-12-14      476       232
29 2020-12-15      390       506
30 2020-12-16      726       613
31 2020-12-17      533       532
33 2020-12-19      426       534
34 2020-12-20      120       326
35 2020-12-21      530       215
36 2020-12-22      689       691
37 2020-12-23      742       744
38 2020-12-24      455       585
39 2020-12-25      219       570
40 2020-12-26       47       230
41 2020-12-27       44       316 
The first, very large, disagreement is expected: I don't have publication dates prior to 17 November, so all deaths up to then are assigned to that publication date, which I've omitted from the graph. But there are large disagreements later.

I had hoped that a given date-of-death would become constant after 2 or 3 weeks, giving zero differences, no longer contributing to sums for publication dates. But, as an example, here are the non-zero contributions to the 13 December publication date.

Code: Select all

> subset(dDF, (Published == as.Date("2020-12-13")) & (diff != 0))
            date DateDeaths  Published diff
2897  2020-05-09        377 2020-12-13   -1
3307  2020-05-19        275 2020-12-13    1
3348  2020-05-20        267 2020-12-13   -1
9826  2020-10-25        249 2020-12-13    1
10400 2020-11-08        412 2020-12-13   -1
10605 2020-11-13        433 2020-12-13   -1
10687 2020-11-15        449 2020-12-13   -1
10728 2020-11-16        423 2020-12-13   -1
10879 2020-11-20        449 2020-12-13   -1
10949 2020-11-22        474 2020-12-13   -1
11047 2020-11-25        477 2020-12-13   -1
11077 2020-11-26        431 2020-12-13    1
11105 2020-11-27        415 2020-12-13   -1
11133 2020-11-28        453 2020-12-13    1
11188 2020-11-30        419 2020-12-13    1
11214 2020-12-01        386 2020-12-13    3
11284 2020-12-04        433 2020-12-13    3
11305 2020-12-05        347 2020-12-13    1
11326 2020-12-06        365 2020-12-13    1
11346 2020-12-07        370 2020-12-13    2
11383 2020-12-09        356 2020-12-13    5
11400 2020-12-10        313 2020-12-13   20 
These go back to date-of-death 9 May. There doesn't seem to be a time after which the influence of a date-of-death on a sum for publication date disappears.

But that isn't enough to explain the differences. According to https://coronavirus.data.gov.uk/details ... itive-test "Deaths by date reported - each death is assigned to the date when it was first included in the published totals", so a "Deaths by date reported" figure shouldn't include any deaths published earlier.

As an example, on 13 December the "Deaths by date reported" was 144. But look at the by date-of-death data published on 12 and 13 December, here are all the dates-of-death where the numbers differed between the two publication dates. There just aren't 144 deaths included on 13 December which weren't included earlier.

Code: Select all

> tDF <- data.frame(subset(dDF, (Published == as.Date("2020-12-12")), 1:3),
        subset(dDF, (Published == as.Date("2020-12-13"))), check.names = TRUE)
> subset(tDF, diff !=0)
            date DateDeaths  Published     date.1 DateDeaths.1 Published.1 diff
2896  2020-05-09        378 2020-12-12 2020-05-09          377  2020-12-13   -1
3306  2020-05-19        274 2020-12-12 2020-05-19          275  2020-12-13    1
3347  2020-05-20        268 2020-12-12 2020-05-20          267  2020-12-13   -1
9825  2020-10-25        248 2020-12-12 2020-10-25          249  2020-12-13    1
10399 2020-11-08        413 2020-12-12 2020-11-08          412  2020-12-13   -1
10604 2020-11-13        434 2020-12-12 2020-11-13          433  2020-12-13   -1
10686 2020-11-15        450 2020-12-12 2020-11-15          449  2020-12-13   -1
10727 2020-11-16        424 2020-12-12 2020-11-16          423  2020-12-13   -1
10878 2020-11-20        450 2020-12-12 2020-11-20          449  2020-12-13   -1
10948 2020-11-22        475 2020-12-12 2020-11-22          474  2020-12-13   -1
11046 2020-11-25        478 2020-12-12 2020-11-25          477  2020-12-13   -1
11076 2020-11-26        430 2020-12-12 2020-11-26          431  2020-12-13    1
11104 2020-11-27        416 2020-12-12 2020-11-27          415  2020-12-13   -1
11132 2020-11-28        452 2020-12-12 2020-11-28          453  2020-12-13    1
11187 2020-11-30        418 2020-12-12 2020-11-30          419  2020-12-13    1
11213 2020-12-01        383 2020-12-12 2020-12-01          386  2020-12-13    3
11283 2020-12-04        430 2020-12-12 2020-12-04          433  2020-12-13    3
11304 2020-12-05        346 2020-12-12 2020-12-05          347  2020-12-13    1
11325 2020-12-06        364 2020-12-12 2020-12-06          365  2020-12-13    1
11345 2020-12-07        368 2020-12-12 2020-12-07          370  2020-12-13    2
11382 2020-12-09        351 2020-12-12 2020-12-09          356  2020-12-13    5
11399 2020-12-10        293 2020-12-12 2020-12-10          313  2020-12-13   20
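
As a quick arithmetic check on the rows above, the two DateDeaths columns can be differenced directly. The values are copied from the table; the variable names are only for illustration.

```r
# DateDeaths as published on 12 and 13 December, copied from the table above
# (only the dates-of-death whose count changed between the two).
pub_12 <- c(378, 274, 268, 248, 413, 434, 450, 424, 450, 475, 478,
            430, 416, 452, 418, 383, 430, 346, 364, 368, 351, 293)
pub_13 <- c(377, 275, 267, 249, 412, 433, 449, 423, 449, 474, 477,
            431, 415, 453, 419, 386, 433, 347, 365, 370, 356, 313)
change <- pub_13 - pub_12

net_change   <- sum(change)              # upward and downward revisions combined
gross_upward <- sum(change[change > 0])  # upward revisions only
reported     <- 144                      # "by date reported" figure for 13 December

cat("net:", net_change, " upward only:", gross_upward,
    " reported:", reported, "\n")
```

The net change is 30, matching sum.diff for 2020-12-13 in the mismatch table, and even counting only the upward revisions gives 40, still well short of 144.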
Sorry for the loser-length post. I feel I must be misunderstanding something, probably something very simple - it has been known before! Can anyone put me straight?
Ok, I don't know what's going on but you might have found something interesting. I save yesterday's by-date-of-death data so when today's comes out I'll compare it.

Re: COVID-19

Posted: Tue Dec 29, 2020 10:14 am
by Little waster
tenchboy wrote: Mon Dec 28, 2020 11:37 pm Just HOW-ON-EARTH can these things 1,6,7, be compatible?

HOW.png
Is it because 1 and 6 involve experts and 7 comes from the Gove comfort space of making baseless assertions which are politically convenient?

It is almost like the haunted mannequin was a journalist in a previous life.

Re: COVID-19

Posted: Tue Dec 29, 2020 11:43 am
by KAJ
shpalman wrote: Tue Dec 29, 2020 7:43 am
KAJ wrote: Mon Dec 28, 2020 9:00 pm I'm guessing reporting delays such as recently discussed might underlie my confusion, laid out below.

With respect to graphs I'd posted about lags in reporting deaths:
shpalman wrote: Wed Dec 23, 2020 9:22 pm Not sure, the curves look like they're the wrong way around.

What I want is deaths-per-day versus date-of-death for each data set reported on a different day.
I drafted this rambling response yesterday and have just finished it.

So you want:
  • x-axis: publication date
  • y-axis: number of deaths newly published on this publication date
  • colour: date-of-death.
  • Numbers from different dates-of-death 'stacked' within publication date, so total 'height' at each publication date = total number of deaths newly published on this publication date. Line joins each date-of-death to corresponding values at adjacent publication dates.
Yes, but not necessarily "stacked", just superimposed so that you can see the time-evolution of the deaths-by-date-of-death data. It doesn't have to be "newly published", just whatever total number of deaths was considered to have happened on a certain date according to data from some other date.
Same graph but not stacked.
unstacked.png

Re: COVID-19

Posted: Tue Dec 29, 2020 12:43 pm
by Woodchopper
Why is COVID-19 less severe in children? A review of the proposed mechanisms underlying the age-related difference in severity of SARS-CoV-2 infections

Factors proposed to explain the difference in severity of COVID-19 in children and adults include those that put adults at higher risk and those that protect children. The former include: (1) age-related increase in endothelial damage and changes in clotting function; (2) higher density, increased affinity and different distribution of angiotensin converting enzyme 2 receptors and transmembrane serine protease 2; (3) pre-existing coronavirus antibodies (including antibody-dependent enhancement) and T cells; (4) immunosenescence and inflammaging, including the effects of chronic cytomegalovirus infection; (5) a higher prevalence of comorbidities associated with severe COVID-19 and (6) lower levels of vitamin D.

Factors that might protect children include: (1) differences in innate and adaptive immunity; (2) more frequent recurrent and concurrent infections; (3) pre-existing immunity to coronaviruses; (4) differences in microbiota; (5) higher levels of melatonin; (6) protective off-target effects of live vaccines and (7) lower intensity of exposure to SARS-CoV-2.
Article is open access.
https://adc.bmj.com/content/early/2020/ ... 020-320338

Re: COVID-19

Posted: Tue Dec 29, 2020 1:26 pm
by KAJ
KAJ wrote: Tue Dec 29, 2020 11:43 am
shpalman wrote: Tue Dec 29, 2020 7:43 am
KAJ wrote: Mon Dec 28, 2020 9:00 pm I'm guessing reporting delays such as recently discussed might underlie my confusion, laid out below.

With respect to graphs I'd posted about lags in reporting deaths:

I drafted this rambling response yesterday and have just finished it.

So you want:
  • x-axis: publication date
  • y-axis: number of deaths newly published on this publication date
  • colour: date-of-death.
  • Numbers from different dates-of-death 'stacked' within publication date, so total 'height' at each publication date = total number of deaths newly published on this publication date. Line joins each date-of-death to corresponding values at adjacent publication dates.
Yes, but not necessarily "stacked", just superimposed so that you can see the time-evolution of the deaths-by-date-of-death data. It doesn't have to be "newly published", just whatever total number of deaths was considered to have happened on a certain date according to data from some other date.
Same graph but not stacked.
unstacked.png
Alternative representation:
  • Reasonable subset of that data.
  • Offset each date-of-death series to x-axis origin (i.e. x-axis = lag = Published - date-of-death).
  • Boxplot.

Code: Select all

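library(ggplot2)
# Lag in days between publication date and date of death
dDF$lag <- as.numeric(with(dDF, Published - date))
# One box per lag: spread of newly published deaths (diff) across dates-of-death
ggplot(subset(dDF, (lag < 20) & Published > as.Date("2020-11-17")),
       aes(y = diff, x = as.factor(lag))) +
  geom_boxplot()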
boxplot.png
The contribution of a date-of-death peaks after a couple of days then tails off and is more or less complete after a couple of weeks.
This is pretty much the same as my routine chart where the y axis is scaled by maximum of each date-of-death series (and on a non-linear y scale to emphasise differences from 1).
boxplot2.png
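
A rough sketch of the scaling used for that second chart, assuming "scaled by maximum" means dividing each date-of-death series by the largest count ever published for that date. The data are made up, the column name rel is hypothetical, and the non-linear y scale is not reproduced.

```r
# Toy data (made-up): one date-of-death seen in four successive publications.
dDF <- data.frame(
  date       = as.Date("2020-12-01"),
  Published  = as.Date("2020-12-02") + 0:3,
  DateDeaths = c(100, 180, 200, 198)
)

# Lag in days between publication and date of death.
dDF$lag <- as.numeric(dDF$Published - dDF$date)
# Scale each date-of-death series by its own maximum, so 1 = highest count seen.
dDF$rel <- dDF$DateDeaths / ave(dDF$DateDeaths, dDF$date, FUN = max)
print(dDF[, c("lag", "rel")])
```

Plotting rel against lag puts every date-of-death on the same 0 to 1 footing, whatever its absolute size.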

Re: COVID-19

Posted: Tue Dec 29, 2020 5:24 pm
by shpalman
This is limited because I only started yesterday, but this is the deaths-by-date-of-death data for yesterday and today compared, with a graph starting on the 1st of September:
by-date-of-since-sep1st.png
The 7-day average line stops just before the numbers obviously start dropping off because they haven't come through yet (and in the case of yesterday's data, a few of the countries in the UK hadn't reported for a few days).
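
For anyone wanting to reproduce that kind of line, a minimal sketch in R of a 7-day average that stops before the incomplete days. Whether the graph uses a centred or trailing window isn't stated, so the centred window here is an assumption, as are the made-up numbers and the cut-off n_incomplete.

```r
# Toy series (made-up): daily deaths by date of death; the last few days
# are still filling in, so they look artificially low.
deaths <- c(400, 410, 420, 430, 440, 450, 460, 470, 480, 300, 150, 40)

# Hypothetical cut-off: how many of the most recent days to leave out.
n_incomplete <- 3
complete <- head(deaths, -n_incomplete)

# Centred 7-day mean; days without a full window on both sides come out NA.
avg7 <- as.numeric(stats::filter(complete, rep(1 / 7, 7), sides = 2))
print(avg7)
```

Dropping the incomplete days before averaging keeps the drop-off at the end from dragging the line down.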

Having a closer look at the data since the 1st of November:
by-date-of-since-nov1st.png
You can see the uptick in the line right at the end, suggesting that most of the new deaths being reported really are from the last week or two. Otherwise you can see a few points which move up a tiny bit but not enough to really move the line.

Re: COVID-19

Posted: Tue Dec 29, 2020 5:33 pm
by KAJ
shpalman wrote: Tue Dec 29, 2020 5:24 pm This is limited because I only started yesterday, but this is the deaths-by-date-of-death data for yesterday and today compared, with a graph starting on the 1st of September:
If you want, drop me a PM with an email address and I'll send you the deaths-by-date-of-death data back to 17 Nov (389 KB) in this format:

Code: Select all

        date DateDeaths  Published
2 2020-11-16        103 2020-11-17
3 2020-11-15        260 2020-11-17
4 2020-11-14        273 2020-11-17
5 2020-11-13        306 2020-11-17
6 2020-11-12        354 2020-11-17
7 2020-11-11        339 2020-11-17
etc.

Re: COVID-19

Posted: Tue Dec 29, 2020 6:38 pm
by KAJ
shpalman wrote: Tue Dec 29, 2020 5:24 pm This is limited because I only started yesterday, but this is the deaths-by-date-of-death data for yesterday and today compared, with a graph starting on the 1st of September:
To see if I'm making a data collection error somewhere, can you use your data to compare the total of deaths-by-date-of-death today and yesterday?

I get:

Code: Select all

> sum(aDF$DateDeaths, na.rm = TRUE) # today
[1] 71560
> sum(subset(dDF, Published == as.Date("2020-12-28"))$DateDeaths, na.rm = TRUE) # yesterday
[1] 69818
> 71560 - 69818 # difference
[1] 1742
But "Deaths within 28 days of positive test by date reported" for today (29-12-2020) is 414 :?