Ok great, thanks! That's clear now.
So the problem here isn't with the calculation of means per se, it's linking the calculated means to the rest of the data. It's basically a database join problem, but luckily a pretty straightforward one I think.
It's weird that R isn't finding join(). One thing to check: join() lives in plyr, whereas dplyr's versions are called left_join(), inner_join() and so on, so having only dplyr loaded won't give you a plain join(). You can force R to look within a particular package by using a double colon (ooo errr), written as package::function(), e.g. plyr::join().
But if R is just being super weird, I'm fairly sure the merge() function from base R does the same thing in this case, just slower, which shouldn't matter unless you have a bazillion otoliths.
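In case it helps, here's a rough sketch of the double-colon syntax, using base::merge() so that it runs without any extra packages. The two little data.frames and their column names (Region, mean_value) are made up purely for illustration, not taken from your data.

```r
# Toy data; the column names and values here are invented for illustration.
sites <- data.frame(Site = c("A", "B"), Region = c("North", "South"))
site_means <- data.frame(Site = c("A", "B"), mean_value = c(1.5, 3.0))

# package::function() calls the function from that specific package,
# regardless of what else is loaded or masking it.
joined <- base::merge(sites, site_means, by = "Site")
print(joined)

# With plyr installed, the equivalent explicit call would be:
# joined <- plyr::join(sites, site_means, by = "Site")
```

The same pattern works for any installed package, so it's a handy way to rule out one package masking another's function.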
If you want a dataset with a row for each site, and preserving all the higher level geographical information, you need to make two data.frames and then join/merge them.
Dataframe 1 is the means you already have.
Dataframe 2 is a dataset with one line for each site, and all the other columns of interest, but without all the individual measurements. You can get this by doing a subset of your dataset that only has the columns you need. Then, you can use the duplicated() function to find all the extra site rows, because many sites will have multiple measurements:
Code: Select all
new_df_withoutAllThosePeskyDuplications <- new_df[!duplicated(new_df), ]
(The duplicated() function flags every row that repeats an earlier row in whatever it's given. It also works on vectors, which is very handy. The square brackets then subset the data.frame, and the ! flips the flags so that the rows returning TRUE, i.e. the duplicates, are the ones dropped.)
Your new_df_withoutAllThosePeskyDuplications will then have one line for each site, and all the other columns you selected for the subset earlier (which as I understand it should all have unique values).
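Here's a minimal sketch of that subset-then-deduplicate step on a toy dataset. The full_df data.frame and its Region and d13C columns are stand-ins I've invented; swap in your own column names.

```r
# Made-up stand-in for the full dataset: one row per otolith measurement.
full_df <- data.frame(
  Site   = c("A", "A", "B", "B", "B"),
  Region = c("North", "North", "South", "South", "South"),
  d13C   = c(-1.2, -1.4, -2.0, -2.2, -2.1)
)

# Keep only the site-level columns, dropping the individual measurements.
new_df <- full_df[, c("Site", "Region")]

# duplicated() returns TRUE for every row that repeats an earlier one;
# negating it with ! keeps only the first occurrence of each site.
new_df_withoutAllThosePeskyDuplications <- new_df[!duplicated(new_df), ]
print(new_df_withoutAllThosePeskyDuplications)
```

If a site still shows up more than once after this, it means one of the columns you kept isn't actually constant within that site, which is worth knowing before you join anything.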
To create the table of your dreams, all you need to do now is join them. The merge() and join() functions both work in much the same way: they append the columns of the second table to the first, matching rows on the values in whatever columns the two tables share. Your two data.frames should have exactly one column in common, Site, which will be used to join them.
Code: Select all
joined_data <- merge(new_df_withoutAllThosePeskyDuplications, Site_Means)
should give you a dataset of unique site details, with an additional column for the mean of whatever value.
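To make the join concrete, here's a small end-to-end sketch. I don't know how you calculated your means, so aggregate() is just my assumption here, and the toy data and the mean_d13C column name are invented too.

```r
# Invented toy data: one row per measurement.
full_df <- data.frame(
  Site   = c("A", "A", "B", "B", "B"),
  Region = c("North", "North", "South", "South", "South"),
  d13C   = c(-1.2, -1.4, -2.0, -2.2, -2.1)
)

# Dataframe 1: one mean per site (aggregate() is an assumed method).
Site_Means <- aggregate(d13C ~ Site, data = full_df, FUN = mean)
names(Site_Means)[2] <- "mean_d13C"

# Dataframe 2: one row per site with the site-level columns only.
site_cols <- full_df[, c("Site", "Region")]
new_df_withoutAllThosePeskyDuplications <- site_cols[!duplicated(site_cols), ]

# Join on the shared column; naming it in 'by' guards against
# accidentally matching on some other column with the same name.
joined_data <- merge(new_df_withoutAllThosePeskyDuplications, Site_Means, by = "Site")
print(joined_data)
```

A quick sanity check afterwards is that nrow(joined_data) equals the number of unique sites; if it doesn't, something went wrong in the deduplication or the site names don't match between the two tables.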
If there are several values you want to take site means of, you can then repeat the process adding each new column of means to joined_data, and so on.
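One possible way to do the repeating is a loop over the measurement columns, sketched below. Again, the data, the d13C and d18O column names, and the use of aggregate() are all assumptions on my part rather than anything from your actual dataset.

```r
# Invented toy data with two measurement columns.
full_df <- data.frame(
  Site   = c("A", "A", "B", "B"),
  Region = c("North", "North", "South", "South"),
  d13C   = c(-1, -3, -2, -4),
  d18O   = c(0.5, 1.5, 2.0, 4.0)
)

# Start from the one-row-per-site table.
site_cols <- full_df[, c("Site", "Region")]
joined_data <- site_cols[!duplicated(site_cols), ]

# For each measurement, compute the site means and merge them on.
for (col in c("d13C", "d18O")) {
  means <- aggregate(full_df[[col]], by = list(Site = full_df$Site), FUN = mean)
  names(means)[2] <- paste0("mean_", col)  # e.g. "mean_d13C"
  joined_data <- merge(joined_data, means, by = "Site")
}
print(joined_data)
```

Each pass adds exactly one new column, so joined_data should keep the same number of rows throughout.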
I hope some of that makes sense? As it happens I've just been tackling almost exactly the same problem for a PhD analysis, but with network analysis instead of isotopes. (I had a cool isotope plan for my PhD, but dropped it because I couldn't get the samples)
I've sort-of deliberately not given all the exact code, because finding the answers on my own with some hints was how I learned, and it sounds like you're keen to build this as a skill long term. But I'd also be happy to go through the actual data, by email or Zoom or just on here. I'm a bit of an R evangelist and I've had plenty of help with it, so I'm trying to pay it forward.