Title: | Supporting Files and Functions for the Book Bayesian Modelling with 'JAGS' |
---|---|
Description: | All the data and functions used to produce the book. We do not expect most people to use the package for any other reason than to get simple access to the 'JAGS' model files, the data, and perhaps run some of the simple examples. The authors of the book are David Lucy (now sadly deceased) and James Curran. It is anticipated that a manuscript will be provided to Taylor and Francis around August 2020, with bibliographic details to follow at that point. Until such time, further information can be obtained by emailing James Curran. |
Authors: | James Curran [aut, cre], David Lucy [aut] |
Maintainer: | James Curran <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2024-11-25 02:51:33 UTC |
Source: | https://github.com/jmcurran/jaggr |
Aspartic acid data for modern upper and lower first pre-molars: taken from Gillard et al 1991
acid.df
A data.frame with 37 rows and 3 columns:
Age in years.
Period of tooth, modern or victorian.
Percentage of D-aspartic acid.
Gillard, R.D., Hardman, S.M., Pollard, A.M., Sutton, P.A. and Whittaker, D.K. (1991) 'Determinations of age at death in the archaeological populations using the D/L ratio of aspartic acid in dental collagen' in Archaeometry 90, eds. Pernicka, E. and Wagner, G.A., p.637-644, Birkhauser Verlag, Berlin.
An experiment was conducted to compare the energy requirements of three physical activities: running, walking and bicycle riding. Eight subjects were asked to run, walk and bicycle a measured distance, and the number of kilocalories expended per kilometre was measured for each subject during each activity. The activities are run in random order with time for recovery between activities. Each activity was monitored exactly once for each individual.
activity.df
A data.frame with 24 rows and 3 columns:
a subject ID.
running, walking, riding.
energy expended during activity, in kilocalories (Cal)
Milton, J. S. (1992). Statistical Methods in the Biological and Health Sciences 2nd Edition, McGraw-Hill, New York, p. 316–319.
This data consists of 50 sentence lengths from each of 8 books. The books “Disclosure” and “Rising Sun” were written by Michael Crichton, whilst the others, “Four Past Midnight”, “The Dark Half”, “Eye of the Dragon”, “The Shining”, “The Stand” and “The Tommy-Knockers”, were written by Stephen King. The pages and sentences were chosen using a multistage design where the pages were selected at random, and then sentences within each page were selected at random. These data were collected by James Curran.
books.df
The data frame consists of 400 observations on 2 variables.
sentence length
a factor with levels: 4.Past.Mid, Dark.Half, Disclosure, Eye.Drag, Rising.Sun, Shining, Stand, T.Knock.
a factor with levels: MC, SK.
Students learning to programme are often taught the bubble sort algorithm for several reasons. Firstly, sorting is a commonly used operation in programming, so having a way of sorting vectors into order is useful. Secondly, it lets the instructor talk about the order of the algorithm, and how it is very inefficient. In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows. The bubble sort algorithm is known to be O(n^2). That is, the time taken to run the algorithm increases quadratically (with the square) with the size of the vector.
bsort.df
A data.frame with 200 rows and 2 columns:
Size of the random vector.
Time in seconds taken to sort the vector using bubbleSort.
This data set consists of 200 observations generated using the following code:

```
set.seed(123)
N = 200
bsort.df = data.frame(n = rep(0, N), time = rep(0, N))

# Random vector lengths between 100 and 1,000
n = sample(100:1000, size = N, replace = TRUE)
pb = txtProgressBar(0, N, style = 3)

for(i in 1:N){
  x = rnorm(n[i])
  bsort.df$n[i] = n[i]
  # Keep the first element of system.time(), the user CPU time
  bsort.df$time[i] = system.time(bubbleSort(x))[1]
  setTxtProgressBar(pb, i)
}
close(pb)
```

It consists of the times taken to sort 200 vectors of random length between 100 and 1,000. The vectors themselves are random samples of size n[i] from the standard normal distribution.
Sorts the vector x into ascending order using a very inefficient bubble sort algorithm
bubbleSort(x)
x |
a vector of numbers |
the vector x sorted into ascending order
set.seed(123)
x = rnorm(10)
bubbleSort(x)
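For readers who want to see where the quadratic cost comes from, the following is a minimal sketch of a bubble sort that also counts its comparisons. The name bubble_sort_sketch is hypothetical and the package's own bubbleSort may well be written differently; this is only an illustration of the general algorithm.

```r
# A sketch only: the package's bubbleSort() may be implemented differently.
bubble_sort_sketch <- function(x) {
  n <- length(x)
  comparisons <- 0
  if (n > 1) {
    for (pass in 1:(n - 1)) {
      # After each pass the largest remaining value has "bubbled" to the end
      for (j in 1:(n - pass)) {
        comparisons <- comparisons + 1
        if (x[j] > x[j + 1]) {
          tmp <- x[j]
          x[j] <- x[j + 1]
          x[j + 1] <- tmp
        }
      }
    }
  }
  list(sorted = x, comparisons = comparisons)
}

set.seed(123)
x <- rnorm(10)
res <- bubble_sort_sketch(x)
res$sorted
res$comparisons
```

This version always makes n(n - 1)/2 comparisons, which is the source of the O(n^2) behaviour described for bsort.df.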
Calculus marks from the 2012 first year calculus course from the Department of Mathematics and Statistics at Lancaster University.
calculus.df
A data.frame with 147 rows and two columns:
final coursework mark out of 100.
final examination mark out of 100.
George Moran, Department of Mathematics and Statistics at Lancaster.
This data set consists of 3,618 listings scraped from the New Zealand website trademe. trademe is similar to ebay in that it is an online auction site which allows sellers to list new and used goods for sale. Goods may be purchased via auction, or outright if the seller has enabled that option. Many New Zealanders, including commercial car dealers, use the website to buy and sell cars. The listings gathered are mostly for Mazda 3 and Toyota Corolla vehicles, along with imported vehicles which may be the same car, but with different badging.
car.prices.df
A data.frame with 3,618 rows and 13 columns:
The observation number, from 1 to 3618.
The listing title - basically the make and model of the car.
The year of manufacture of the vehicle.
The age of the vehicle as of 2013 (when this data was collected). So a car manufactured in 2009 would have an age of 4, for example.
The asking price, in NZD.
The number of kilometres on the odometer—i.e. the "mileage."
The displacement of the engine in cubic centimetres.
The fuel used by the vehicle: either Petrol (gasoline) or Diesel.
The number of doors in the car. Note 3 and 5 door cars are hatchbacks.
The colour of the car given in the listing.
An attempt to standardise the colour to a reduced category. For example sky blue, and light blue would both get transformed to blue.
The manufacturer of the car: either Mazda or Toyota
These observations were made by Robertson et al. They are the mean delta 13 C compositions of several individual trees from two locations in Central England. Mean temperatures from the CET (Central England Temperature record) are also given.
carbon.df
A data.frame with 200 rows and 4 columns:
The data comes from an experiment to measure the mortality of cancer cells under radiation, undertaken in the Department of Radiology, University of Cape Town. Four hundred cells were placed on a dish, and three dishes were irradiated at a time, or occasion. After the cells were irradiated, the surviving cells were counted. Since cells would also die naturally, dishes with cells were put into the radiation chamber without being irradiated, to establish the natural mortality. These data give only the zero-dose data. These data are from OzDASL.
cell_surv.df
An object of class data.frame with 27 rows and 2 columns.
The amount of fat (g) and energy (Cal) in 16 chocolate bars. Source is unknown, but we would be happy to give credit if someone tells us.
chocolate.df
A data.frame with 16 rows and 2 columns:
energy, in Calories = kilocalories
fat content, in grams
Source is unknown, but we would be happy to give credit if someone tells us.
This data arose from an experiment conducted by David to test the insulation of the ground floor bedroom of his house, The Spinney. The idea was that the better the insulation, the slower the rate of cooling, so for some exponential model y(t) = y(0) exp(-lambda t) the value of lambda should go down for a better insulated room. In the experiment, David ran two extension cords into the room through a service port to power two electric heaters and a fan. He then sealed up the room by shutting the windows and door. The heaters were left to heat up the room as much as they could, which happened to be about 24.6 C. He then turned the heaters and fan off and recorded the rate of cooling by observing a temperature probe from outside the room for about two hours.
Standard theory says that the rate of cooling is proportional to the temperature differential between the indoor and outdoor temperatures. To control for this, days were selected which had approximately the same external temperatures. The room has walls which are external and internal. It was assumed that the outside and the internal house (no heating) had reached an equilibrium, so that we only need to know the temperature outside the room but inside the house, rather than both.
cooling.df
A data.frame with 47 rows and 3 columns:
The time since turning off the heaters and fan
The recorded temperature with absolutely no insulation in the room whatsoever (outside temperature 8.0 C).
The recorded temperature with part of a wall and the floor insulated (outside temperature 8.1 C).
David Lucy
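The exponential cooling model in the description can be explored with a short simulation. The sketch below assumes that y(t) is the differential between the room temperature and the outside temperature, and it uses simulated, noise-free readings rather than the values in cooling.df. The cooling rate lambda_true and the observation times are hypothetical choices made only for illustration.

```r
# Values taken from the description above
y_start <- 24.6   # room temperature when the heaters were switched off (C)
t_out   <- 8.0    # outside temperature for the uninsulated run (C)

# Hypothetical values, for illustration only
lambda_true <- 0.01              # assumed cooling rate per minute
minutes <- seq(0, 120, by = 5)   # assumed observation times

# Simulated noise-free readings: the differential decays exponentially
room_temp <- t_out + (y_start - t_out) * exp(-lambda_true * minutes)

# log(differential) is linear in time with slope -lambda
fit <- lm(log(room_temp - t_out) ~ minutes)
lambda_hat <- -unname(coef(fit)["minutes"])
lambda_hat
```

Under this model a better insulated room would show a smaller estimated lambda. The book fits such models with JAGS; the log-linear regression here is just a quick way to see the structure.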
This function makes it easy to extract sampled values of one or more parameters. The function can extract multiple parameters from multiple chains.
extractValues(x, params, chain = NULL, drop = TRUE, ...)
x |
an object of class mcmc.list - usually from |
params |
a vector of one or more strings OR regular expressions which identifies the parameters we want to extract from the chain |
chain |
the chain, or chains we want to extract the parameters from. If |
drop |
used to preserve the dimensions of an array. If a single parameter is requested, then the results
will be returned as a vector rather than a matrix if |
... |
any other arguments. Not used yet. |
If there is only one chain or the user asks for results from exactly one chain, then a matrix with class mcmc will be returned containing only the parameters of interest in the columns. The column names of the matrix will correspond to the parameter. If there is more than one chain, and the user asks for results from more than one chain, or alternatively leaves chain as NULL, then a list of matrices with class mcmc will be returned where each matrix contains only the parameters of interest in the columns. The column names of each of the matrices will correspond to the parameter.
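The idea of selecting parameter columns by name or regular expression can be sketched in base R. The example below is not the package's implementation: it uses plain matrices in place of mcmc objects, and the helper pick_columns and the parameter names are hypothetical.

```r
# Two fake "chains" stored as plain matrices (stand-ins for mcmc objects)
set.seed(1)
chains <- lapply(1:2, function(i) {
  m <- matrix(rnorm(30), ncol = 3)
  colnames(m) <- c("beta[1]", "beta[2]", "sigma")
  m
})

# Keep only the columns whose names match any of the supplied patterns
pick_columns <- function(chain_list, patterns) {
  lapply(chain_list, function(m) {
    keep <- unique(unlist(lapply(patterns, grep, x = colnames(m))))
    m[, sort(keep), drop = FALSE]
  })
}

res <- pick_columns(chains, "^beta")
colnames(res[[1]])
```

With the real function, a call such as extractValues(x, "^beta") would play the same role for an mcmc.list, returning one matrix per chain as described above.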
This dataset contains the height, weight and 4 fingerprint measurements (length, width, area and circumference), collected from 200 participants. This data was collected with the intention of performing regression analysis to assess whether a significant relationship exists between fingerprint size and physical stature.
fingerprints.df
a data.frame with 200 rows and 11 columns:
participant number
self-declared gender of participant, female or male
age in years
dominant hand, left or right
height in centimetres, average of three measurements
weight in kilograms, average of three measurements
fingerprint temperature in degrees Celsius
fingerprint height in millimetres
fingerprint width in millimetres
fingerprint area in squared millimetres
fingerprint circumference in millimetres
McMurchie, Beth; Torrens, George; Kelly, Paul (2019). Height, weight and fingerprint measurements collected from 200 participants. Loughborough University. [Dataset](https://doi.org/10.17028/rd.lboro.7539206.v1)
This function provides an easy way for readers to get the JAGS model files used in the book. The modelID is the 4-5 character identifier used in the book. For example, to get 'model-001.bugs.R', you would use getModel("001").
getModel(modelID)
modelID |
a string containing a valid model ID |
a string containing the model. The intention is that this can be written to disk.
getModel("001")
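Since the returned string is intended to be written to disk, one way to do that is with writeLines. The sketch below uses a placeholder model string so that it runs without the package; the model text and the file name are hypothetical, and in practice the string would come from getModel("001").

```r
# Placeholder model text; in practice: model_string <- getModel("001")
model_string <- paste(
  "model {",
  "  for (i in 1:n) {",
  "    y[i] ~ dnorm(mu, tau)",
  "  }",
  "  mu ~ dnorm(0, 1.0E-6)",
  "  tau ~ dgamma(0.001, 0.001)",
  "}",
  sep = "\n"
)

# Write the model to a temporary file that JAGS could later read
model_file <- file.path(tempdir(), "model-sketch.bugs.R")
writeLines(model_string, model_file)
readLines(model_file)
```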
Age estimation based on changes in dental characteristics
gustafson.df
a data.frame with 759 rows and 10 columns:
sex of subject, female or male.
age, in years.
location in mouth of tooth
tooth identifier
qualitative assessment of remaining dentine
Hedgehog growth
hedgehog.growth.df
a data.frame with 77 rows and 2 columns:
Date in DD-Month-YYYY format
weight of the hedgehog, in grams
David Lucy
The Bunnell Index (or BI) is a measurement of how tightly a hedgehog is curled into a ball. One measurement is taken round the middle of the animal to cross at the point where the nose ends ("A," latitudinal circumference). The other measurement, using a second tape measure already secured underneath the animal, is taken round the hedgehog from head to tail ("B," longitudinal circumference). Care must be taken with both measurements to ensure that the ends of the tape measure meet easily without altering the shape/positioning of the hedgehog. When obtaining measurement A, the positioning of the tape measure is crucial; a measurement taken lower down toward the tail can result in a lower (inaccurate) reading. Repeatedly measuring many hedgehogs over several consecutive days demonstrated consistent BI values and hence the reliability of the method. A is divided by B to give a value for the BI. It is important to determine the BI value to two decimal places (i.e., a value of 0.794 becomes 0.79, while a value of 0.805 becomes 0.81).
hedgehog.survival.df
A data.frame with 31 observations and 2 columns:
The Bunnell Index (BI) of the hedgehog at the time of admission.
A logical variable recording whether the hedgehog survived or died.
Bunnell, T. (2002) The Assessment of British Hedgehog (Erinaceus europaeus) Casualties on Arrival and Determination of Optimum Release Weights Using a New Index. Journal of Wildlife Rehabilitation 25 (4):11-21
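The calculation of the Bunnell Index described above amounts to a ratio rounded to two decimal places. The following sketch uses hypothetical circumference measurements (they are not taken from hedgehog.survival.df), chosen so that the unrounded ratio is close to the 0.794 example in the description.

```r
# Hypothetical measurements, in centimetres
a_latitudinal  <- 27.0   # "A": round the middle of the animal
b_longitudinal <- 34.0   # "B": from head to tail

# BI is A divided by B, reported to two decimal places
bunnell_index <- round(a_latitudinal / b_longitudinal, 2)
bunnell_index
```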
Impact strength of insulation cuts in foot-pounds.
insulation.df
a data.frame with 100 rows and 3 columns:
Lot of insulating material
Lengthwise (Length) or crosswise (Cross)
Impact strength, in foot-pounds (ft-lb)
Ostle, B. (1963). Statistics in Research: Basic Concepts and Techniques for Research. Ames, Iowa. Iowa State University Press.
A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.
James Curran, David Lucy
Michelson's speed of light data
lightspeed.df
a data.frame with 43 rows and 2 columns:
The scaled speed of light measured in a single experiment. The scaling is the measurement minus 299,000 km/s. E.g. the first entry in the data.frame is 850, which is 299,850 km/s.
The year in which the experiment was conducted, either 1879 or 1882.
Stigler, S. M. (1977), "Do robust estimators work with real data?", The Annals of Statistics 5:1055-1098.
Ecologists Michael McCoy and James Gillooly were interested in predicting mortality rates for different species based on a number of variables including body mass and temperature. In their paper (McCoy and Gillooly, 2008) they explore the hypothesis that the natural logarithm of temperature-corrected mortality rate should be a linear function of the natural logarithm of body mass. The temperature-corrected mortality rate is based upon previous work which draws on results from biology, biochemistry, and thermodynamics. Users are encouraged to read the original source for a deeper explanation.
mortality.df
a data.frame with 2117 rows and 4 columns:
a factor indicating which one of the six taxonomic groups the observation belongs to: bird, fish, invertebrate, mammal, multicellular plant, and phytoplankton.
the species of the observation.
the body mass in grams (g).
the mortality rate.
the average body temperature in degrees Celsius.
average activation energy of heterotrophic respiration in animals (0.65 eV) or photosynthesis in plants (0.32 eV).
mortality corrected by a Boltzmann-Arrhenius factor, specifically, divided by exp(-E/k * (1 / T - 1 / T20)), where k is the Boltzmann constant (8.62 x 10^-5 eV/K), T20 is 20 degrees Celsius in degrees Kelvin, i.e. 293, and T is the average body temperature temp in degrees Kelvin.
McCoy, M.W. and Gillooly, J.F. (2008), Predicting natural mortality rates of plants and animals. Ecology Letters, 11: 710-716. https://doi.org/10.1111/j.1461-0248.2008.01190.x
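The temperature correction described in the format section can be written as a small function. This is a sketch under stated assumptions: the function name temp_correct and its argument names are hypothetical, and Celsius is converted to Kelvin by adding 273 so that 20 degrees Celsius matches the T20 = 293 quoted above. The input values in the example are made up.

```r
k_boltzmann <- 8.62e-5   # Boltzmann constant in eV/K, as quoted above
t20 <- 293               # 20 degrees Celsius in Kelvin, as quoted above

# Divide the raw mortality rate by the Boltzmann-Arrhenius factor
temp_correct <- function(mortality, temp_celsius, activation_energy) {
  temp_kelvin <- temp_celsius + 273   # assumed conversion, consistent with T20 = 293
  factor <- exp(-activation_energy / k_boltzmann * (1 / temp_kelvin - 1 / t20))
  mortality / factor
}

# Hypothetical inputs: an animal (E = 0.65 eV) at 20 C and at 30 C
at_20 <- temp_correct(0.5, 20, 0.65)
at_30 <- temp_correct(0.5, 30, 0.65)
c(at_20, at_30)
```

At 20 degrees the factor is exactly 1, so the rate is unchanged; at warmer body temperatures the factor exceeds 1 and the corrected rate is smaller than the raw rate.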
A group from Queensland University of Technology conducted an experiment where they recorded the distance flown by paper aeroplanes. The experimenters used a sealed corridor at the University, and controlled the design of the aeroplane, the weight of the paper from which each aeroplane was constructed, and the angle of incidence at launch for each paper plane. The data and further notes for this experiment can be found at OzDASL - Australasian Data and Story Library.
planes.df
A data.frame with 16 rows and 6 columns:
Distance travelled in mm.
Paper weight in grams per square metre (gsm), either 50 gsm or 80 gsm.
Angle of launch, horizontal (0 degrees) or 45 degrees.
Design of the plane, either simple or advanced.
The treatment number used in the experiment. There are eight combinations of the levels of the factors, so the treatment number corresponds to one of these unique combinations.
Replicate number within treatment. Each treatment is repeated twice so rep is either 1 or 2.
Mackisack, M. S. (1994). What is the use of experiments conducted by statistics students? Journal of Statistics Education, 2(1).
Smyth, G. K. (2011). Australasian Data and Story Library (OzDASL).
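The structure of the experiment (three two-level factors, eight treatments, two replicates each) can be reproduced with expand.grid. The sketch below only illustrates the layout: the column names are hypothetical, and the treatment numbering need not match the numbering used in planes.df.

```r
# Eight treatment combinations from three two-level factors
design <- expand.grid(
  paper = c(50, 80),                 # paper weight in gsm
  angle = c(0, 45),                  # launch angle in degrees
  plane = c("simple", "advanced"),   # plane design
  stringsAsFactors = FALSE
)
design$treatment <- seq_len(nrow(design))

# Each treatment is flown twice, giving 16 runs in total
layout <- design[rep(seq_len(nrow(design)), each = 2), ]
layout$rep <- rep(1:2, times = nrow(design))
rownames(layout) <- NULL
layout
```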
This function overrides the hidden method in the coda package that provides a print method for the output of summary on coda objects (class summary.mcmc). The idea is to be able to suppress some of the output so that only the summary statistics of interest are shown. This is primarily used in the preparation of the book.
## S3 method for class 'summary.mcmc'
print(
  x,
  digits = max(3, .Options$digits - 3),
  runDetails = FALSE,
  means = FALSE,
  quantiles = TRUE,
  ...
)
x |
an object of type |
digits |
The number of digits to print. |
runDetails |
if |
means |
if |
quantiles |
if |
... |
other arguments passed to |
x is invisibly returned
To measure the health consequences of this contamination, an index of exposure was calculated for each of the nine Oregon counties having frontage on either the Columbia River or the Pacific Ocean. This particular index was based on several factors, including the county's stream distance from Hanford and the average distance of its population from any water frontage. As a covariate, the cancer mortality rate was determined for each of these same counties. The data give the index of exposure and the cancer mortality rate during 1959-1964 for the nine Oregon counties affected. Higher index values represent higher levels of contamination.
radiation.df
An object of class data.frame with 9 rows and 3 columns.
Fadeley, R. C. (1965). Oregon malignancy pattern physiographically related to Hanford, Washington, Radioisotope Storage. Journal of Environmental Health 27, 883-897.
Times taken for a rat to navigate through a maze
ratmaze.df
A data.frame with 135 rows and 4 columns:
An ID for each rat
The treatment administered to the subject: control/none, thiouracil, thyroxin.
A maze number.
time, in seconds taken for the rat to navigate the maze.
Root dentine translucency is, in humans, an age related physiological feature. In the dentine of teeth in adult humans the tubular microstructures fill with a highly crystalline substance, making them nearly invisible when looked at in normal light. This process starts from the apical foramen in early adulthood, and progresses up the tooth into advanced old age. Solheim (Lucy et al., 1996) collected data on age and root dentine translucency for 71 maxillary second incisors from a Norwegian population. The sex of each individual was also noted.
rdt.df
A data.frame with 71 rows and 3 columns:
Age of subject, in years
Sex of subject, female or male
root dentine translucency
Lucy, D., Aykroyd, R.G., Pollard, A.M. and Solheim, T. (1996), "A Bayesian approach to adult human age estimation from dental observations by Johanson's age changes", Journal of Forensic Sciences 41(2):189-194.
Reorders the output from rjags::coda.samples to match the preferred order of the user. The function will stop if one or more of the specified variable names does not match the variable names in the first mcmc object of x.
## S3 method for class 'mcmc.list'
reorder(x, variable.names, ...)
x |
an object of type |
variable.names |
a vector of variable names in user order. |
... |
other arguments. Currently ignored. |
an object of type mcmc.list
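The behaviour described above (check the requested names against the first chain, stop on a mismatch, then reorder every chain) can be sketched in base R. This is not the package's code: plain matrices stand in for mcmc objects, reorder_sketch is a hypothetical name, and the sketch assumes only the named columns are kept.

```r
# Two fake "chains" stored as plain matrices (stand-ins for mcmc objects)
set.seed(2)
chains <- lapply(1:2, function(i) {
  m <- matrix(rnorm(20), ncol = 2)
  colnames(m) <- c("mu", "sigma")
  m
})

reorder_sketch <- function(chain_list, variable.names) {
  # Names are validated against the first chain only, as described above
  available <- colnames(chain_list[[1]])
  missing_names <- setdiff(variable.names, available)
  if (length(missing_names) > 0) {
    stop("Unknown variable name(s): ", paste(missing_names, collapse = ", "))
  }
  lapply(chain_list, function(m) m[, variable.names, drop = FALSE])
}

reordered <- reorder_sketch(chains, c("sigma", "mu"))
colnames(reordered[[1]])
```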
Set Plotting Preferences
setPlotPrefs(
  mar = c(3, 4, 1, 1),
  cex = 1,
  oma = c(0, 0, 0, 0),
  tcl = -0.35,
  mgp = c(1.5, 0.5, 0),
  las = 1,
  cex.lab = 1,
  font.lab = 1,
  lwd = 1,
  on.graph.line = 3,
  shading.density = 8,
  arrow.length = 0.1,
  on.graph.cex = 1,
  margin.cex = 1.2,
  ...
)
mar |
plot margins |
cex |
character expansion factor |
oma |
outer margins |
tcl |
tick length |
mgp |
margin lines for the axis title, axis labels and axis line |
las |
text rotation on axes |
cex.lab |
plot labels cex |
font.lab |
font of plot labels |
lwd |
line width |
on.graph.line |
no idea |
shading.density |
shading density |
arrow.length |
arrow head length |
on.graph.cex |
character expansion for text on graphs |
margin.cex |
character expansion for text for margins |
... |
other arguments to be passed to par |
the previous par settings so that they can be restored
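Returning the previous settings follows the usual base R idiom for par, which can be sketched without the package. The example below calls par directly with a few of the defaults listed in the usage above; it is an illustration of the save-and-restore pattern, not of what setPlotPrefs does internally.

```r
# Open a null PDF device so the sketch runs without creating a file
pdf(NULL)

# par() returns the old values of the settings it changes
old_par <- par(mar = c(3, 4, 1, 1), mgp = c(1.5, 0.5, 0), las = 1, tcl = -0.35)

plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")

# Restore the previous settings once plotting is finished
par(old_par)
restored_mar <- par("mar")
invisible(dev.off())
```

With the package function, the equivalent pattern would be to store the value returned by setPlotPrefs() and pass it to par() when done.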
Shotgun range data
In order to test the validity of range-of-fire estimates obtained by the application of regression analysis to shotgun pellet patterns, a blind study was conducted in which questioned pellet patterns were fired at randomly selected ranges between 3.0 and 15.2 m (10 and 50 ft) with two different 12-gauge shotguns, each firing a different type of buckshot cartridge. Test firings at known ranges were also conducted with the same weapons and ammunition.
shotgun.df
A data frame with 70 observations on 4 variables.
The range in feet of the firing.
The model of shotgun used in the experiment.
A factor recording whether the data was to be used for building/training the model, or testing it.
The area of the smallest rectangle that would enclose the pellet pattern.
Rowe, W.F. and Hanson, S.R. (1985) Range-of-fire estimates from regression analysis applied to the spreads of shotgun pellet patterns: Results of a blind study, Forensic Science International, 28(3-4): 239-250.
Simulated samples of weights from English terrier breeds with the parameter values for the means for the simulation taken from http://www.dogsindepth.com. The variances are assumed to be constant.
terriers.df
A data.frame with 30 rows and 2 columns.
Weight of dog in kg.
Breed, either Skye, Manchester or Norwich.
This function cleans up the formatting of the BUGS model file(s) found at path.
tidy_bugs(
  path = ".",
  arrow = TRUE,
  brace.newline = FALSE,
  indent = 2,
  wrap = TRUE,
  width.cutoff = 50
)
path |
location of file(s) |
arrow |
use the |
brace.newline |
move braces to a new line if TRUE |
indent |
number of spaces to indent code blocks |
wrap |
whether to wrap comments to the linewidth determined by
|
width.cutoff |
passed to |