Weird date-as-numeric format

Posted: Tue Sep 28, 2021 5:40 pm
by Bird on a Fire
Got a weird problem here.

I have collected proportion-over-time data very similar to this published graph:
[Attachment: date.png]
and I'd like to compare my data with the fitted curve.

The paper gives the fitted quadratic as Y = 598809 – 31.075X + 0.0004X^2.

I'm trying to figure out what format I need to convert my dates into to fit a quadratic on the same scale. (Wouldn't be my first choice of model, but I'm trying to compare with the old paper).

That quadratic has a root at 42280.960215058, and I cannot figure out how to make 15 July 2005 be even roughly that number. By X=42281.3 Y is already over 1, so the entire x-axis, which spans about 50 days, has to fit between 42280.9 and 42281.3 - dafuq?
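
For anyone who wants to reproduce those two numbers, here is a minimal Python sketch. It uses only the coefficients quoted above; the helper name quadratic_roots is made up for illustration.

```python
import math

# Coefficients as printed in the paper: Y = C + B*X + A*X^2
A, B, C = 0.0004, -31.075, 598809.0

def quadratic_roots(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0, smaller first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    return tuple(sorted((r1, r2)))

zero_low, zero_high = quadratic_roots(A, B, C)      # where Y = 0
one_low, one_high = quadratic_roots(A, B, C - 1.0)  # where Y = 1

print(f"Y = 0 at X = {zero_high:.4f}")
print(f"Y = 1 at X = {one_high:.4f}")
print(f"X-distance from Y=0 to Y=1 on the rising branch: {one_high - zero_high:.3f}")
```

The other root (near 35406) sits on the falling side of the parabola, so it is the larger root that corresponds to the rising curve in the graph.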

The paper's analysis was done in SPSS, which apparently uses a Lilian-style format (number of seconds since October 14, 1582). That would put 2005 at around 1.3 × 10^10, so it's not that.

In Excel (well, LibreOffice), 42280 is 03/10/15, whereas 15/07/05 is 38548 - so it's not that (number of days from 1 Jan 1900) either.
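
A quick way to rule out the usual suspects is to compute 15 July 2005 under each convention and compare with 42281. This is only a sketch: the list of conventions is an assumption about what a stats package might plausibly use, not something taken from the paper.

```python
from datetime import date

target = date(2005, 7, 15)

# LibreOffice default: day 0 is 30 Dec 1899 (matches Excel serials for modern dates).
serial_days = (target - date(1899, 12, 30)).days

# SPSS-style value: seconds since 14 Oct 1582
# (Python's date type uses the proleptic Gregorian calendar).
spss_seconds = (target - date(1582, 10, 14)).days * 86400

# Unix-style day count: days since 1 Jan 1970.
unix_days = (target - date(1970, 1, 1)).days

# Day of year, in case the axis was a within-year offset.
day_of_year = target.timetuple().tm_yday

candidates = {
    "serial days": serial_days,
    "SPSS seconds": spss_seconds,
    "Unix days": unix_days,
    "day of year": day_of_year,
}
for name, value in candidates.items():
    print(f"{name:>12}: {value}")
```

None of these lands near 42281, which is consistent with the conclusion that the x values are not in any standard date encoding.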

I'm 99.9% sure no fraud has been committed here, but I cannot figure out how whichever program was used to fit the curve was treating the dates. Anybody got an idea?

Thanks in advance.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 5:45 pm
by jimbob
What numbers does it spit out on the x-axis to make that fit?

Would it be simply a delta from a recent starting point - say when the population is zero?

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 5:54 pm
by Bird on a Fire
It could be a delta - days after the 1st of some month is quite common. Except that it can't be days, or even months, because the range is too small.

(I don't have the original data, that's a screenshot from the pdf.)

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:13 pm
by WFJ
My guess would be they used a count rather than proportion in the fit, then changed the y axis later. The dates, when converted to reals for the fit, would almost certainly have units of seconds or days.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:18 pm
by dyqik
Bird on a Fire wrote:
Tue Sep 28, 2021 5:54 pm
It could be a delta - days after the 1st of some month is quite common. Except that it can't be days, or even months, because the range is too small.

(I don't have the original data, that's a screenshot from the pdf.)
Could it be quarters? Your 0.4 range looks like it covers maybe slightly more than a month, which would be a third of a quarter.

Or maybe 100 days?

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:19 pm
by Bird on a Fire
WFJ wrote:
Tue Sep 28, 2021 6:13 pm
My guess would be they used a count rather than proportion in the fit, then changed the y axis later. The dates, when converted to reals for the fit, would almost certainly have units of seconds or days.
That's possible. The y-axis values are supposedly daily mean proportions (of juvenile birds in different flocks - 66 flocks over 16 different days), so each day would have a different denominator etc.

I might just resort to plotting my data separately with the same x-limits, and overlaying the published curve using paint.

Reproducibility ftw.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:24 pm
by Bird on a Fire
dyqik wrote:
Tue Sep 28, 2021 6:18 pm
Bird on a Fire wrote:
Tue Sep 28, 2021 5:54 pm
It could be a delta - days after the 1st of some month is quite common. Except that it can't be days, or even months, because the range is too small.

(I don't have the original data, that's a screenshot from the pdf.)
Could it be quarters? Your 0.4 range looks like it covers maybe slightly more than a month, which would be a third of a quarter.

Or maybe 100 days?
Hmm, yes that would fit fairly well. I make it that 40 days is 0.36, roughlyish.

But if it's 42280 quarters, that implies starting 10570 years ago. I know SPSS is old, but that seems a bit crazy.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:30 pm
by dyqik
There's a typo in the formula. Plotting it shows that the quadratic term is nowhere near strong enough to produce the curvature in the plot.
[Attachment: plot.png]

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:37 pm
by Bird on a Fire
Thanks dyqik! I think you've cracked the problem.

Interestingly the other reported formulas in the paper (for other species) have coefficients on very similar scales.... f.ck knows what's going on there - some weirdness in SPSS's curve-fitting thing, or a f.ckup from the author.

So I'll use Paint, then. You've saved me a lot of time - cheers.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:46 pm
by Bird on a Fire
The reported degrees of freedom for some of the F tests don't stack up either, which gives me an inkling as to which hypothesis is likelier...

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:47 pm
by dyqik
This gets you somewhat closer to the curve in the plot, but I still have no idea what the date units are.
[Attachment: plot.png]

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 6:51 pm
by dyqik
By the way, SMath is a very useful tool for quickly throwing this kind of thing together.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 8:00 pm
by Bird on a Fire
Thanks again! MathCAD looks handy.

I'll drop the author an email and see if he can shed some light, but otherwise won't sweat it.

Re: Weird date-as-numeric format

Posted: Tue Sep 28, 2021 9:56 pm
by monkey
Bird on a Fire wrote:
Tue Sep 28, 2021 8:00 pm
Thanks again! MathCAD looks handy.

I'll drop the author an email and see if he can shed some light, but otherwise won't sweat it.
If there was a typo, the paper will need a correction. I've made that happen before when I found one where the equations were correct but put on the wrong figures (nothing else wrong with the paper).

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 1:14 am
by Millennie Al
I have had a look at the paper and I think the equations are wrong. I'd guess that the wrong input was used to derive them. The graphs look plausible, though, so maybe they were plotted first and then the analysis was run for the equations and used the wrong columns or something.

Here are my notes, as I may have made errors in copying or solving the equations. For each equation I solve for y=0 and also estimate the zero by eye from the corresponding graph. The values are then summarised at the end, sorted. T is 15 July. I also note two minor anomalies between graph and text.

redshank
y = 366.48 – 0.0095x
0 at about T+90 days = 38577

Black-Tailed godwit
y = 598809 – 31.075x + 0.0004x^2
0 at about T+7 = 42281
The text says "The first fledged juvenile in a flock was found on 19th July" (T+4), but the graph looks more like T+7

dunlin
y = –330820 + 17.134x – 0.0002x^2
0 at about T-2 = 29391

red knot
y = – 520573 + 26.956x – 0.0003x^2
0 at about T+10 = 28099

Purple sandpiper
y = – 907.64 + 0.0235x
0 at about T+3 = 38623

Sanderling
y = –1154.764 + 0.0299x
0 at about T+17 = 38621

Turnstone
y = 772857 – 40.088x + 0.0005x^2
0 at about T+10 = 47920

Ringed plover
Y = – 484.43 + 0.0126X
"When the first flocks were surveyed on 24 July, JP was already as high as 0.2"
but graph shows first two values are 0.4
0 at about T-20 = 38447

Sorting all those zeroes by solution (by-eye offset from T in days, then the solved value of x):
+10 28099
-2 29391
-20 38447
+90 38577
+17 38621
+3 38623
+7 42281
+10 47920

There is clearly no consistent time value there. Maybe the fitting used total number of juveniles seen, or total birds seen, instead of JP. Or maybe the date conversion went wrong. It's probably a simple fix for anyone who has the raw data - hopefully it's still around as the paper was published in 2006.
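
The solved zeroes in the notes above can be re-derived with a short Python sketch. The coefficients are copied from the notes; the function name rising_zero and the choice of which quadratic root to report (the one where the curve is rising) are assumptions made here, although that choice does reproduce the values listed.

```python
import math

# (species, constant, linear coefficient, quadratic coefficient), as transcribed above.
EQUATIONS = [
    ("redshank", 366.48, -0.0095, 0.0),
    ("black-tailed godwit", 598809.0, -31.075, 0.0004),
    ("dunlin", -330820.0, 17.134, -0.0002),
    ("red knot", -520573.0, 26.956, -0.0003),
    ("purple sandpiper", -907.64, 0.0235, 0.0),
    ("sanderling", -1154.764, 0.0299, 0.0),
    ("turnstone", 772857.0, -40.088, 0.0005),
    ("ringed plover", -484.43, 0.0126, 0.0),
]

def rising_zero(c, b, a):
    """Solve c + b*x + a*x^2 = 0.

    Linear case: return the single zero.
    Quadratic case: return the root where the slope is positive.
    """
    if a == 0.0:
        return -c / b
    disc = b * b - 4 * a * c
    # At this root the slope 2*a*x + b equals +sqrt(disc), so the curve is rising.
    return (-b + math.sqrt(disc)) / (2 * a)

zeros = {name: rising_zero(c, b, a) for name, c, b, a in EQUATIONS}
for name, x in sorted(zeros.items(), key=lambda kv: kv[1]):
    print(f"{x:10.1f}  {name}")
```

Printing the zeroes in sorted order makes the spread obvious: the linear fits cluster in the 38000s while the quadratic fits scatter from about 28000 to about 48000.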

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 10:40 am
by Bird on a Fire
Wow - thanks Millennie Al. That's above and beyond what I was expecting!

I'll get in touch with the author (I've met him) and see if he can shed some light.

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 3:55 pm
by sheldrake
The number given is close to, but not exactly, the number of days between Jan 1st 1900 and July 5th 2015

42188 vs 42280

Probably a red herring.

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 4:09 pm
by shpalman
That's definitely the "wrong" way to be doing it if similar curves have such wildly different coefficients, mainly because the date squared ends up being a huge number which needs a similarly huge constant to offset it.

Something like a(x-x0)^2+b(x-x0)+c would have been better - set x0 to the start of the data series somewhere, or leave it free (and its value will actually tell you something useful). Then depending on the model you might even be able to set c=0.

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 6:29 pm
by shpalman
Bird on a Fire wrote:
Tue Sep 28, 2021 5:40 pm
... 15 July 2005...
sheldrake wrote:
Thu Sep 30, 2021 3:55 pm
The number given is close to, but not exactly, the number of days between Jan 1st 1900 and July 5th 2015
Which year is it supposed to be anyway?

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 6:33 pm
by sheldrake
Typo on my part, I did the calculation for 2005

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 6:38 pm
by shpalman
shpalman wrote:
Thu Sep 30, 2021 4:09 pm
That's definitely the "wrong" way to be doing it if similar curves have such wildly different coefficients, mainly because the date squared ends up being a huge number which needs a similarly huge constant to offset it.
... which means that its coefficient is a small number, but it's vitally important to be machine-precise with it or else it completely changes the behaviour. You can't just say, as the author did, meh it's small I'll just give one significant figure.

Using LibreOffice's zero day convention (30/12/1899) and assuming the graph is 2005 I get something similar to the graph with

0.0004395799826*(date)^2-33.885347*date+653019.2

but if you don't use all those decimal places it is nowhere near.
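
A minimal Python sketch of that sensitivity, using the coefficients above and the LibreOffice serial 38548 for 15/07/05 quoted earlier in the thread. The rounding applied here (4, 3 and 0 decimal places) is an assumption meant to mimic the precision printed in the paper.

```python
SERIAL_15_JULY_2005 = 38548  # LibreOffice serial for 15/07/05

def quad(x, a, b, c):
    """Evaluate a*x^2 + b*x + c."""
    return a * x**2 + b * x + c

# Full-precision coefficients from the fit above.
full = (0.0004395799826, -33.885347, 653019.2)

# The same coefficients rounded to roughly the precision reported in the paper.
rounded = (round(full[0], 4), round(full[1], 3), round(full[2], 0))

for offset in (0, 21, 42):
    x = SERIAL_15_JULY_2005 + offset
    print(f"T+{offset:>2}: full = {quad(x, *full):8.3f}   rounded = {quad(x, *rounded):12.1f}")
```

With serial dates near 38548, x^2 is about 1.5 × 10^9, so dropping the fifth decimal place of the quadratic coefficient moves the result by tens of thousands.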

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 8:32 pm
by shpalman
It's https://www.hi.is/sites/default/files/m ... _props.pdf ?

By the way my "data" from the Black-Tailed Godwit image is something like

15/07/05	0.0
09/09/05	1.0
18/07/05	0.0
20/07/05	0.0
21/07/05	0.1
22/07/05	0.0
27/07/05	0.0
02/08/05	0.1
03/08/05	0.1
04/08/05	0.3
08/08/05	0.1
10/08/05	0.6
12/08/05	0.7
16/08/05	0.8
18/08/05	0.4
24/08/05	0.7
26/08/05	1.0
27/08/05	1.0
... if anyone wants to play with fitting a quadratic to it using various options for day zero.
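
As one possible starting point, here is a Python sketch that fits a quadratic to those points with day zero set to 15/07/05, i.e. the a(x-x0)^2+b(x-x0)+c form suggested earlier. Everything here is an assumption for illustration: the choice of x0, the helper names det3 and fit_quadratic, and the use of plain least squares via the normal equations. It is not the method used in the paper, and the points are by-eye readings, not the original data.

```python
from datetime import datetime

# Points read off the graph by eye, as posted above (dd/mm/yy, proportion).
RAW = """15/07/05 0.0
09/09/05 1.0
18/07/05 0.0
20/07/05 0.0
21/07/05 0.1
22/07/05 0.0
27/07/05 0.0
02/08/05 0.1
03/08/05 0.1
04/08/05 0.3
08/08/05 0.1
10/08/05 0.6
12/08/05 0.7
16/08/05 0.8
18/08/05 0.4
24/08/05 0.7
26/08/05 1.0
27/08/05 1.0"""

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    n = len(xs)
    s1, s2, s3, s4 = (sum(x**k for x in xs) for k in (1, 2, 3, 4))
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    # Normal equations, solved with Cramer's rule.
    m = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    base = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / base)
    return tuple(coeffs)

x0 = datetime.strptime("15/07/05", "%d/%m/%y")  # assumed day zero
days, props = [], []
for line in RAW.splitlines():
    stamp, value = line.split()
    days.append((datetime.strptime(stamp, "%d/%m/%y") - x0).days)
    props.append(float(value))

a, b, c = fit_quadratic(days, props)
print(f"y = {a:.6f}*(x-x0)^2 + {b:.6f}*(x-x0) + {c:.4f}")
```

Because x-x0 only runs from 0 to 56 here, the coefficients stay at sensible magnitudes and survive rounding far better than a fit against raw serial dates.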

As I said I think the problem, apart from not knowing which date is day zero, is imprecision in the reported coefficients, and I think the linear fits will suffer less from that:
Millennie Al wrote:
Thu Sep 30, 2021 1:14 am
...
redshank
y = 366.48 – 0.0095x
0 at about T+90 days = 38577
...
Purple sandpiper
y = – 907.64 + 0.0235x
0 at about T+3 = 38623

Sanderling
y = –1154.764 + 0.0299x
0 at about T+17 = 38621
...
Ringed plover
Y = – 484.43 + 0.0126X
"When the first flocks were surveyed on 24 July, JP was already as high as 0.2"
but graph shows first two values are 0.4
0 at about T-20 = 38447
Keeping the zeroes corresponding to the linear fits:
-20 38447
+90 38577
+17 38621
+3 38623
They're much closer together. 38548 is 15/07/05 in LibreOffice.
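
To see what those four zeroes would mean as calendar dates, here is a small Python sketch. It assumes the LibreOffice day-zero convention mentioned earlier in the thread (30/12/1899); serial_to_date is a name made up for illustration.

```python
from datetime import date, timedelta

EPOCH = date(1899, 12, 30)  # LibreOffice default day zero (assumed convention)

def serial_to_date(serial):
    """Convert a LibreOffice-style day serial to a calendar date."""
    return EPOCH + timedelta(days=serial)

reference = date(2005, 7, 15)
for serial in (38447, 38577, 38621, 38623):
    d = serial_to_date(serial)
    print(serial, d.isoformat(), f"{(d - reference).days:+d} days from 15 July 2005")
```

Under that assumption the four zeroes all fall in 2005, within a few months of the survey period, which is at least the right ballpark for day serials.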

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 8:53 pm
by dyqik
Is there really enough data there to conclude that a quadratic fit is better than a linear fit?

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 8:58 pm
by shpalman

Re: Weird date-as-numeric format

Posted: Thu Sep 30, 2021 9:08 pm
by monkey
dyqik wrote:
Thu Sep 30, 2021 8:53 pm
Is there really enough data there to conclude that a quadratic fit is better than a linear fit?
Both are wrong - I'd be surprised if you started a bit earlier and measured a negative population :)