Weird date-as-numeric format

Bird on a Fire
Princess POW
Posts: 10137
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: Weird date-as-numeric format

Post by Bird on a Fire » Thu Sep 30, 2021 10:05 pm

That is the right paper, and yes the data is from 2005.

I'm not sure why I got such a different value from LibreOffice - is that really the solution?

Good point about the precision.
We have the right to a clean, healthy, sustainable environment.

Bird on a Fire
Princess POW
Posts: 10137
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: Weird date-as-numeric format

Post by Bird on a Fire » Thu Sep 30, 2021 10:07 pm

monkey wrote:
Thu Sep 30, 2021 9:08 pm
dyqik wrote:
Thu Sep 30, 2021 8:53 pm
Is there really enough data there to conclude that a quadratic fit is better than a linear fit?
Both are wrong - I'd be surprised if you started a bit earlier and measured a negative population :)
Yeah, it's got to start at 0 (no juveniles at the beginning of the season). You may or may not reach a mean JP of 1, depending on whether the last juveniles depart alone or with adults, but it seems sensible to have 1 as another asymptote.

I'm using logistic regression to do it properly, but I'd also like to compare these curves if possible.
We have the right to a clean, healthy, sustainable environment.

Bird on a Fire
Princess POW
Posts: 10137
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: Weird date-as-numeric format

Post by Bird on a Fire » Thu Sep 30, 2021 10:29 pm

Bird on a Fire wrote:
Thu Sep 30, 2021 10:05 pm
That is the right paper, and yes the data is from 2005.

I'm not sure why I got such a different value from LibreOffice - is that really the solution?

Good point about the precision.
Yes, I just double-checked and if I put 01/06/2005 as a date in LibreOffice (version 6.4.7.2) then convert to numeric I get 38504. And 42280 comes out as 03/10/15. Checking the options, the zero date is 30/12/1899. The rest is presumably the decimal precision error shpalman identified.

Still, if I just need to work out an integer offset for Julian day of year I think this is doable. I'll re-fit the original quadratic in R.
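For anyone wanting to do the same conversion outside a spreadsheet, here's a minimal Python sketch. It assumes the 30/12/1899 zero date mentioned above; the function names are just illustrative, not from any particular library.

```python
from datetime import date, timedelta

# LibreOffice's default "zero date", as reported in the options dialog above.
EPOCH = date(1899, 12, 30)

def serial_to_date(serial: int) -> date:
    """Convert a spreadsheet day number to a calendar date."""
    return EPOCH + timedelta(days=serial)

def date_to_serial(d: date) -> int:
    """Convert a calendar date to a spreadsheet day number."""
    return (d - EPOCH).days

def serial_to_day_of_year(serial: int) -> int:
    """Day of year (1 = 1 January) for a spreadsheet day number."""
    return serial_to_date(serial).timetuple().tm_yday

print(date_to_serial(date(2005, 6, 1)))   # 38504
print(serial_to_date(42280))              # 2015-10-03
print(serial_to_day_of_year(38504))       # 152
print(serial_to_day_of_year(38548))       # 196, i.e. 15 July 2005
```

For 2005 the integer offset works out as 38352 (the serial number of 31/12/2004), so day of year = serial - 38352.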

Thank you, hivemind! :D
We have the right to a clean, healthy, sustainable environment.

Millennie Al
After Pie
Posts: 1621
Joined: Mon Mar 16, 2020 4:02 am

Re: Weird date-as-numeric format

Post by Millennie Al » Thu Sep 30, 2021 11:19 pm

shpalman wrote:
Thu Sep 30, 2021 8:32 pm
Keeping the zeroes corresponding to the linear fits:
-20 38447
+90 38577
+17 38621
+3 38623
They're much closer together. 38548 is 15/07/15 in LibreOffice.
That's almost certainly it. If you re-run the linear equations making the x coefficient +/- 1 unit in the least significant place you get:

T    -unit  exact  +unit
-20  38144  38447  38754
0    38175  38577  38987
0    38492  38621  38750
3    38459  38623  38788

and 38548 is in all those intervals.

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Fri Oct 01, 2021 7:09 am

shpalman wrote:
Thu Sep 30, 2021 8:32 pm
...
They're much closer together. 38548 is 15/07/15 in LibreOffice.
15/07/05

Sorry.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Fri Oct 01, 2021 9:43 am

shpalman wrote:
Thu Sep 30, 2021 6:38 pm
shpalman wrote:
Thu Sep 30, 2021 4:09 pm
That's definitely the "wrong" way to be doing it if similar curves have such wildly different coefficients, mainly because the date squared ends up being a huge number which needs a similarly huge constant to offset it.
... which means that its coefficient is a small number, but it's vitally important to be machine-precise with it or else it completely changes the behaviour. You can't just say, as the author did, meh it's small I'll just give one significant figure.

Using LibreOffice's zero day convention (30/12/1899) and assuming the graph is 2005 I get something similar to the graph with

0.0004395799826*(date)^2-33.885347*date+653019.2

but if you don't use all those decimal places it is nowhere near.
So here's my scrape of the data from the image, LibreOffice's "I wanted a curve so I made one with math" fit as the solid line with the equation displayed on the graph, and the crosses which follow that line are plotted using the equation given above to check that you really do need all those stupid digits.
weird-date-quadfit.png
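The sensitivity is easy to check numerically. This is a rough sketch using the coefficients quoted in this post and LibreOffice serial dates for summer 2005; rounding the second-order coefficient to one significant figure (0.0004) is my stand-in for what the paper did, and the function names are illustrative.

```python
def quad(d, a=0.0004395799826, b=-33.885347, c=653019.2):
    """Quadratic in the raw serial date d, coefficients as quoted above."""
    return a * d * d + b * d + c

def quad_rounded(d):
    """Same curve, but with the d^2 coefficient cut to one significant figure."""
    return quad(d, a=0.0004)

# 1 June, 15 July and 26 August 2005 as LibreOffice serial dates
for d in (38504, 38548, 38590):
    print(d, quad(d), quad_rounded(d))
```

The full-precision values stay of order 1 across the season, while the rounded version is off by tens of thousands. The d^2 term is about 650,000 and has to cancel against the other two terms almost exactly, so a relative error of a few percent in its coefficient swamps everything.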
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

Bird on a Fire
Princess POW
Posts: 10137
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: Weird date-as-numeric format

Post by Bird on a Fire » Fri Oct 01, 2021 10:21 am

shpalman wrote:
Fri Oct 01, 2021 9:43 am
shpalman wrote:
Thu Sep 30, 2021 6:38 pm
shpalman wrote:
Thu Sep 30, 2021 4:09 pm
That's definitely the "wrong" way to be doing it if similar curves have such wildly different coefficients, mainly because the date squared ends up being a huge number which needs a similarly huge constant to offset it.
... which means that its coefficient is a small number, but it's vitally important to be machine-precise with it or else it completely changes the behaviour. You can't just say, as the author did, meh it's small I'll just give one significant figure.

Using LibreOffice's zero day convention (30/12/1899) and assuming the graph is 2005 I get something similar to the graph with

0.0004395799826*(date)^2-33.885347*date+653019.2

but if you don't use all those decimal places it is nowhere near.
So here's my scrape of the data from the image, LibreOffice's "I wanted a curve so I made one with math" fit as the solid line with the equation displayed on the graph, and the crosses which follow that line are plotted using the equation given above to check that you really do need all those stupid digits.
weird-date-quadfit.png
That's awesome - thank you so much! :D
We have the right to a clean, healthy, sustainable environment.

dyqik
Princess POW
Posts: 7526
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Weird date-as-numeric format

Post by dyqik » Fri Oct 01, 2021 11:44 am

It still looks like a straight line starting at some point on the x axis fits better to me.

And a gaussian* convolved with a line or a step function starting at some point in time would be a much better motivated model (i.e. a "cloud" of birds arriving at some point in time, with a few ahead of the cloud and some stragglers, or the equivalent for what's being measured).

A quadratic needs some kind of explanation for why the distribution should be quadratic before you try to fit it.

* Because all soft edge things look like a gaussian convolved with something if you squint a bit.

Bird on a Fire
Princess POW
Posts: 10137
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: Weird date-as-numeric format

Post by Bird on a Fire » Fri Oct 01, 2021 11:54 am

The data is really representing the ratio between two (very) roughly normal distributions centred on different dates - adults arrive in Iceland, successful fledging happens in a staggered way, then adults leave before juveniles.

I don't think a quadratic is the best way to model that, but you would expect a curve that trundles along near zero for a while. I think it's fine for descriptive, if not predictive, purposes. (The godwit is one of few species in that paper where all the birds are breeding in Iceland, rather than being joined by migrants from more northerly populations in Greenland and Canada.)

At some point I can play around with simulations, but I've got to finish writing, practise and record a conference presentation by the end of the day, so y'know ;)
We have the right to a clean, healthy, sustainable environment.

basementer
Dorkwood
Posts: 1504
Joined: Mon Nov 11, 2019 1:03 pm
Location: 8024, Aotearoa

Re: Weird date-as-numeric format

Post by basementer » Fri Oct 01, 2021 1:41 pm

dyqik wrote:
Fri Oct 01, 2021 11:44 am
It still looks like a straight line starting at some point on the x axis fits better to me.

And a gaussian* convolved with a line or a step function starting at some point in time would be a much better motivated model (i.e. a "cloud" of birds arriving at some point in time, with a few ahead of the cloud and some stragglers, or the equivalent for what's being measured).

A quadratic needs some kind of explanation for why the distribution should be quadratic before you try to fit it.

* Because all soft edge things look like a gaussian convolved with something if you squint a bit.
This is so f.cking important.
Money is just a substitute for luck anyway. - Tom Siddell

dyqik
Princess POW
Posts: 7526
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Weird date-as-numeric format

Post by dyqik » Fri Oct 01, 2021 3:54 pm

Bird on a Fire wrote:
Fri Oct 01, 2021 11:54 am
The data is really representing the ratio between two (very) roughly normal distributions centred on different dates - adults arrive in Iceland, successful fledging happens in a staggered way, then adults leave before juveniles.

I don't think a quadratic is the best way to model that, but you would expect a curve that trundles along near zero for a while. I think it's fine for descriptive, if not predictive, purposes. (The godwit is one of few species in that paper where all the birds are breeding in Iceland, rather than being joined by migrants from more northerly populations in Greenland and Canada.)

At some point I can play around with simulations, but I've got to finish writing, practise and record a conference presentation by the end of the day, so y'know ;)
A step function smoothed with a gaussian would give you a function that goes from zero to one with some width, and only requires two parameters to be fit rather than three for a quadratic.

That's not a model, it's just the simplest description of the measure that obeys the same constraints as the measure (can't be less than zero, can't go above one).

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Fri Oct 01, 2021 4:18 pm

dyqik wrote:
Fri Oct 01, 2021 3:54 pm
Bird on a Fire wrote:
Fri Oct 01, 2021 11:54 am
The data is really representing the ratio between two (very) roughly normal distributions centred on different dates - adults arrive in Iceland, successful fledging happens in a staggered way, then adults leave before juveniles.

I don't think a quadratic is the best way to model that, but you would expect a curve that trundles along near zero for a while. I think it's fine for descriptive, if not predictive, purposes. (The godwit is one of few species in that paper where all the birds are breeding in Iceland, rather than being joined by migrants from more northerly populations in Greenland and Canada.)

At some point I can play around with simulations, but I've got to finish writing, practise and record a conference presentation by the end of the day, so y'know ;)
A step function smoothed with a gaussian would give you a function that goes from zero to one with some width, and only requires two parameters to be fit rather than three for a quadratic.

That's not a model, it's just the simplest description of the measure that obeys the same constraints as the measure (can't be less than zero, can't go above one).
That's the error function isn't it? Or rather, (1+erf[(x-x0)/w])/2 if you want to go from 0 to 1 around x0 and your width parameter is w.

https://docs.scipy.org/doc/scipy/refere ... l.erf.html

I'm sure there's an implementation in whatever mathematical software you're using.
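In plain Python, for instance, the standard library's math.erf is enough. A minimal sketch of the expression above (the function name smoothed_step is just for illustration):

```python
import math

def smoothed_step(x, x0, w):
    """Step from 0 to 1 centred on x0, smoothed by a Gaussian of width parameter w."""
    return 0.5 * (1.0 + math.erf((x - x0) / w))

# Evaluate in units of w either side of the centre.
for offset in (-3, -1, 0, 1, 3):
    print(offset, smoothed_step(offset, x0=0.0, w=1.0))
```

It is exactly 0.5 at x0, stays within [0, 1] everywhere, and only has the two parameters x0 and w.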
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

dyqik
Princess POW
Posts: 7526
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Weird date-as-numeric format

Post by dyqik » Fri Oct 01, 2021 4:31 pm

shpalman wrote:
Fri Oct 01, 2021 4:18 pm
dyqik wrote:
Fri Oct 01, 2021 3:54 pm
Bird on a Fire wrote:
Fri Oct 01, 2021 11:54 am
The data is really representing the ratio between two (very) roughly normal distributions centred on different dates - adults arrive in Iceland, successful fledging happens in a staggered way, then adults leave before juveniles.

I don't think a quadratic is the best way to model that, but you would expect a curve that trundles along near zero for a while. I think it's fine for descriptive, if not predictive, purposes. (The godwit is one of few species in that paper where all the birds are breeding in Iceland, rather than being joined by migrants from more northerly populations in Greenland and Canada.)

At some point I can play around with simulations, but I've got to finish writing, practise and record a conference presentation by the end of the day, so y'know ;)
A step function smoothed with a gaussian would give you a function that goes from zero to one with some width, and only requires two parameters to be fit rather than three for a quadratic.

That's not a model, it's just the simplest description of the measure that obeys the same constraints as the measure (can't be less than zero, can't go above one).
That's the error function isn't it? Or rather, (1+erf[(x-x0)/w])/2 if you want to go from 0 to 1 around x0 and your width parameter is w.

https://docs.scipy.org/doc/scipy/refere ... l.erf.html

I'm sure there's an implementation in whatever mathematical software you're using.
Yeah, that's the one.

jimbob
Light of Blast
Posts: 5276
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: Weird date-as-numeric format

Post by jimbob » Fri Oct 01, 2021 8:21 pm

Screenshot 2021-10-01 211926.png
Given the type of data, if you are going to fit a curve, aren't the populations going to be something akin to the epidemic modelling that we've got rather familiar with over time?

I hesitate to use the phrase "Gompertz function" but it *is* population dynamics.

Can the value exceed 1?
Have you considered stupidity as an explanation

Millennie Al
After Pie
Posts: 1621
Joined: Mon Mar 16, 2020 4:02 am

Re: Weird date-as-numeric format

Post by Millennie Al » Sat Oct 02, 2021 4:30 am

jimbob wrote:
Fri Oct 01, 2021 8:21 pm
Can the value exceed 1?
No. It's the proportion of birds seen which are juveniles, so it must be a rational number between 0 and 1 inclusive, further limited by the maximum possible number of birds you can count (whether due to counting ability or the fact that there's only so many birds in the world).

jimbob
Light of Blast
Posts: 5276
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: Weird date-as-numeric format

Post by jimbob » Sat Oct 02, 2021 8:39 am

Millennie Al wrote:
Sat Oct 02, 2021 4:30 am
jimbob wrote:
Fri Oct 01, 2021 8:21 pm
Can the value exceed 1?
No. It's the proportion of birds seen which are juveniles, so it must be a rational number between 0 and 1 inclusive, further limited by the maximum possible number of birds you can count (whether due to counting ability or the fact that there's only so many birds in the world).

I thought it had to be something like that, so any sensible function that tells you anything useful would have to also fit in that range.

Which is why I mentioned the Gompertz function.

https://en.m.wikipedia.org/wiki/Gompertz_function
Have you considered stupidity as an explanation

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Sat Oct 02, 2021 10:35 am

black-tailed-godwit.png
Most of the time it took to do that was me learning how to deal with date formats in numpy and matplotlib.

Fitting the error function centres it on the 12th of August with a width* of 14.7 days.

* - or 10.4 days if you multiply w by sqrt(2) in the definition, i.e. maybe it's more "correct" to do 0.5*(1+erf([x-x0]/[sqrt(2)*w])), since w is then the standard deviation of the smoothing Gaussian
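The two conventions describe the same curve; only the meaning of the width changes. A quick sketch to check that, using the 14.7-day width from the fit and measuring days relative to the centre (the function names are illustrative). With the sqrt(2) included, the curve is just the cumulative normal distribution, which Python's statistics.NormalDist also provides.

```python
import math
from statistics import NormalDist

def step_w(x, x0, w):
    """Convention without sqrt(2): 0.5*(1+erf((x-x0)/w))."""
    return 0.5 * (1.0 + math.erf((x - x0) / w))

def step_sigma(x, x0, sigma):
    """Convention with sqrt(2): sigma is the Gaussian's standard deviation."""
    return 0.5 * (1.0 + math.erf((x - x0) / (math.sqrt(2.0) * sigma)))

w = 14.7                      # fitted width in days, first convention
sigma = w / math.sqrt(2.0)    # equivalent width in the second convention
print(round(sigma, 1))        # 10.4

# Same curve either way, and it matches the normal CDF with that sigma.
cdf = NormalDist(mu=0.0, sigma=sigma).cdf
for x in (-20, -5, 0, 5, 20):
    print(x, step_w(x, 0.0, w), step_sigma(x, 0.0, sigma), cdf(x))
```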
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Sat Oct 02, 2021 11:37 am

black-tailed-godwit-log.png
That's using the simple logistic curve, 1/(1+exp(-((d-d0)/w))), and d0 is still the 12th of August but its width parameter is 6.2 days. Of course you can't quantitatively compare the width parameters between the erf and logistic models.
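One thing that can be compared between the two is the slope at the midpoint, which is 1/(4w) for the logistic and 1/(w*sqrt(pi)) for the erf form without the sqrt(2). A small sketch using the two fitted widths, with days measured from d0 (the function names are illustrative):

```python
import math

def logistic(d, d0, w):
    """Simple logistic curve, 1/(1+exp(-(d-d0)/w))."""
    return 1.0 / (1.0 + math.exp(-(d - d0) / w))

def erf_step(d, d0, w):
    """Error-function step, 0.5*(1+erf((d-d0)/w))."""
    return 0.5 * (1.0 + math.erf((d - d0) / w))

W_LOGISTIC = 6.2   # days, from the logistic fit
W_ERF = 14.7       # days, from the erf fit (no sqrt(2) in the definition)

# Analytic slopes at the midpoint d0, in "proportion per day"
slope_logistic = 1.0 / (4.0 * W_LOGISTIC)
slope_erf = 1.0 / (W_ERF * math.sqrt(math.pi))
print(slope_logistic, slope_erf)

# Both curves pass through 0.5 at d0
print(logistic(0.0, 0.0, W_LOGISTIC), erf_step(0.0, 0.0, W_ERF))
```

The two midpoint slopes come out within about 5% of each other, so the very different width numbers are describing much the same transition.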
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Sat Oct 02, 2021 11:46 am

black-tailed-godwit-fit.png
Yeah there's not much difference; I wouldn't try to make the model more complicated than this.
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Sat Oct 02, 2021 2:18 pm

Since there was no queue at the petrol station and no shortages at the supermarket I had time also to check the quadratic fit:
black-tailed-godwit-quad.png
So the fitting of a(d-d0)^2+b(d-d0)+c shifts d0 to 7th of August, and gives a=0.0004197306249464087, b=0.025756320921000418, c=0.33376244249276016, so the coefficient for the second order term is still roughly what it is in the paper.

This shifted version is a lot less sensitive, since I also plotted it with the coefficients rounded off and you can see it's more or less the same. Just that it's not at all physically reasonable to use this model.
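A sketch of that rounding check, with x = d - d0 in days. Rounding to two significant figures and checking a window of +/- 30 days are assumptions for illustration; they aren't necessarily what was used for the plot.

```python
# Fitted coefficients of a*x^2 + b*x + c, with x = d - d0 (d0 = 7 August)
A = 0.0004197306249464087
B = 0.025756320921000418
C = 0.33376244249276016

def shifted_quad(x, a=A, b=B, c=C):
    """Quadratic in days relative to d0."""
    return a * x * x + b * x + c

def shifted_quad_rounded(x):
    """Same curve with every coefficient rounded to two significant figures."""
    return shifted_quad(x, a=0.00042, b=0.026, c=0.33)

# Largest disagreement over a +/- 30 day window around d0
worst = max(abs(shifted_quad(x) - shifted_quad_rounded(x)) for x in range(-30, 31))
print(worst)
```

The worst-case gap is around 0.01 on a quantity that runs from 0 to 1, because none of the terms is large any more and nothing has to cancel.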
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk

Millennie Al
After Pie
Posts: 1621
Joined: Mon Mar 16, 2020 4:02 am

Re: Weird date-as-numeric format

Post by Millennie Al » Mon Oct 04, 2021 12:59 am

If we're getting around to discussing how the data should be graphed, I'd very much agree with the sentiment already expressed that it should depend on the underlying processes rather than just looking for stuff which fits (though, of course, that strategy does occasionally reveal some underlying mechanism that was previously unknown). So here's a trivial example. Suppose we have observations whereby starting on day 1, one adult arrives per day. Starting a day later, one juvenile arrives per day. When 16 adults have arrived, they stop arriving and start leaving at one per day. The same happens with the juveniles. This can be shown in this simple graph:
birds.png
If we then decide to graph JP instead (which is juvenile/(adult + juvenile)), we get:
jp.png
which looks nice and complicated, but tells us a lot less than the simple graph.
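The toy example is small enough to write out in a few lines. This sketch assumes counts are taken at the end of each day, so there are 16 adults present on day 16 and 15 on day 17; the function names are illustrative.

```python
def adults(day):
    """Adults present: one arrives per day up to 16, then one leaves per day."""
    if 1 <= day <= 16:
        return day
    if 17 <= day <= 31:
        return 32 - day
    return 0

def juveniles(day):
    """Juveniles follow the same pattern, one day behind the adults."""
    return adults(day - 1)

def jp(day):
    """Juvenile proportion, juvenile/(adult + juvenile); None if no birds are present."""
    a, j = adults(day), juveniles(day)
    return None if a + j == 0 else j / (a + j)

for day in range(1, 33):
    print(day, adults(day), juveniles(day), jp(day))
```

JP starts at 0 on day 1, jumps to 1/3 on day 2, creeps towards 1/2 while both groups build up, then climbs to 1 on day 32 when only the last juvenile is left.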

It may be that the proportion of juveniles is inherently significant. For example, maybe parents feed their offspring, so a low proportion of juveniles implies food shortage, while an unusually high proportion implies a good year for breeding (or heavier predation on adults).

shpalman
Princess POW
Posts: 8241
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Weird date-as-numeric format

Post by shpalman » Mon Oct 04, 2021 8:44 am

Millennie Al wrote:
Mon Oct 04, 2021 12:59 am
If we're getting around to discussing how the data should be graphed, I'd very much agree with the sentiment already expressed that it should depend on the underlying processes rather than just looking for stuff which fits (though, of course, that strategy does occasionally reveal some underlying mechanism that was previously unknown). So here's a trivial example. Suppose we have observations whereby starting on day 1, one adult arrives per day. Starting a day later, one juvenile arrives per day. When 16 adults have arrived, they stop arriving and start leaving at one per day. The same happens with the juveniles. This can be shown in this simple graph...
Bird on a Fire wrote:
Fri Oct 01, 2021 11:54 am
The data is really representing the ratio between two (very) roughly normal distributions centred on different dates...
Here's what you get with two normal distributions with peak height of 1 and the same width. The error function and logistic curves give roughly the right behaviour but the width parameters are related to both the width and the spacing of the two Gaussians. And then it depends if the peak height should be fixed, normalized so that the area is fixed, or variable, or what.
equally-normal.png
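For the specific case of equal peak heights and equal widths, the algebra can be done by hand. With adults A(x) = exp(-(x-mu_a)^2/(2 sigma^2)) and juveniles J(x) defined the same way around mu_j, the ratio J/(A+J) = 1/(1+A/J), and A/J simplifies to exp(-(x-x0)/w) with x0 = (mu_a+mu_j)/2 and w = sigma^2/(mu_j-mu_a). In other words, in that special case the ratio is exactly a logistic curve. A numerical check, where the centres and width are made-up illustrative values rather than anything fitted to the godwit data:

```python
import math

def gauss(x, mu, sigma):
    """Gaussian with peak height 1."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def jp_ratio(x, mu_a, mu_j, sigma):
    """Juvenile proportion J/(A+J) for two equal-width, equal-height Gaussians."""
    a = gauss(x, mu_a, sigma)
    j = gauss(x, mu_j, sigma)
    return j / (a + j)

def logistic(x, x0, w):
    return 1.0 / (1.0 + math.exp(-(x - x0) / w))

# Hypothetical parameters, in days
mu_a, mu_j, sigma = 0.0, 20.0, 10.0

x0 = 0.5 * (mu_a + mu_j)         # midpoint between the two peaks
w = sigma ** 2 / (mu_j - mu_a)   # logistic width from Gaussian width and spacing

worst = max(abs(jp_ratio(x, mu_a, mu_j, sigma) - logistic(x, x0, w))
            for x in range(-30, 51))
print(x0, w, worst)
```

The logistic width here is sigma^2 divided by the spacing, so it depends on both the width and the separation of the two Gaussians, and a fit to the ratio alone can't separate them. With unequal heights or widths the ratio is no longer exactly logistic.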
having that swing is a necessary but not sufficient condition for it meaning a thing
@shpalman@mastodon.me.uk
