Title: | Hotelling's T^2 Test and Variants |
---|---|
Description: | A set of R functions that implement Hotelling's T^2 test and some variants of it. Functions are also included for Aitchison's additive log ratio and centred log ratio transformations. |
Authors: | James Curran [aut, cre], Taylor Hersh [aut] |
Maintainer: | James Curran <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-8 |
Built: | 2024-11-13 02:42:55 UTC |
Source: | https://github.com/jmcurran/hotelling |
A set of R functions and data sets that implement Hotelling's T^2 test and some variants of it. Functions are also included for Aitchison's additive log ratio and centred log ratio transformations.
James Curran. Maintainer: James Curran <[email protected]>
Aitchison's additive log ratio transformation for compositional data
alr(form, data, group = NULL)
form |
a formula which specifies the denominator variable as the response |
data |
a data frame in which the data is stored |
group |
if not NULL then a character string specifying the name of the grouping variable |
This function will give a warning if zeros are present because the transformed data will have -Infs.
a data frame with the ALR transformation applied to data. Each row in the data frame is standardized with respect to a specific variable by dividing by that variable. The logarithms of the resulting ratios are returned. If a grouping variable is specified, then this is preserved.
James M. Curran
Aitchison, J. (1986). “The Statistical Analysis of Compositional Data”, Chapman and Hall, reprinted in 2003 with additional material by The Blackburn Press
clr
data(bottle.df)

## transform with respect to manganese
alr(Mn~., bottle.df, "Number")

## transform the data with respect to barium, but removing the
## grouping in column 1
alr(Ba~., bottle.df[,-1])
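The additive log ratio described above can be sketched in base R without the package. The helper name `alr_sketch` and the toy data are hypothetical, and dropping the denominator column from the result is an assumption made here for illustration, not confirmed package behaviour.

```r
# Hypothetical helper (not part of the package): additive log ratio in base R.
alr_sketch <- function(x, denom) {
  x <- as.matrix(x)
  # Zeros give log(0) = -Inf, so warn as the documented function does
  if (any(x == 0)) warning("zeros present: the result will contain -Inf")
  # Keep every column except the denominator (assumption: it is dropped)
  num <- x[, colnames(x) != denom, drop = FALSE]
  # Divide each row by its own denominator value, then take logs
  log(num / x[, denom])
}

# Toy compositional data, made up for illustration
comp <- data.frame(Mn = c(2, 4), Ba = c(8, 4), Sr = c(4, 16))
alr_sketch(comp, "Mn")
```

Each entry is the log of a variable divided by the chosen denominator in the same row, so a value of 0 means the two concentrations were equal.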
This data contains the elemental concentration of five different elements (Manganese, Barium, Strontium, Zirconium, and Titanium) in samples of glass taken from six different Heineken beer bottles. 20 measurements were taken from each bottle.
R. L. Bennett. Aspects of the analysis and interpretation of glass trace evidence. Master's thesis, Department of Chemistry, University of Waikato, 2002.
Aitchison's centered log ratio transformation for compositional data
clr(data, group = NULL)
data |
a data frame in which the data is stored |
group |
if not NULL then a character string specifying the name of the grouping variable |
This function will give a warning if zeros are present because the transformed data will have -Infs.
a data frame with the CLR transformation applied to data. Each row in the data frame is standardized by dividing by the geometric mean of that row. The logarithms of the resulting ratios are returned. If a grouping variable is specified, then this is preserved.
James M. Curran
Aitchison, J. (1986). “The Statistical Analysis of Compositional Data”, Chapman and Hall, reprinted in 2003 with additional material by The Blackburn Press
alr
data(bottle.df)

## transform preserving grouping
clr(bottle.df, "Number")

## transform the data but remove the
## grouping in column 1
clr(bottle.df[,-1])
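As a rough illustration of the centred log ratio, the sketch below divides each row by its geometric mean and takes logs, using base R only. The helper name `clr_sketch` and the toy data are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not part of the package): centred log ratio in base R.
clr_sketch <- function(x) {
  lx <- log(as.matrix(x))
  # Subtracting the row mean of the logs equals log(x / geometric mean of the row)
  lx - rowMeans(lx)
}

# Toy compositional data, made up for illustration
comp <- data.frame(Mn = c(1, 2), Ba = c(10, 4), Sr = c(100, 8))
clr_sketch(comp)
```

A handy check on any CLR implementation is that every transformed row sums to zero.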
This data contains the elemental concentration of nine different elements (Titanium, Aluminium, Iron, Manganese, Magnesium, Calcium, Barium, Strontium, and Zirconium) in specimens of glass taken from two different containers. Ten measurements were taken from each container.
Jose R. Almirall. Discrimination of glass samples by solution based ICP-OES PhD thesis, Department of Chemistry, Florida International University, 1998.
Calculate Hotelling's T-squared test statistic for the difference in two multivariate means.
hotelling.stat(x, y, shrinkage = FALSE, var.equal = TRUE)
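x |
an nx by p matrix containing the data points from sample 1, or a list containing the elements mean, cov, and n (the sample mean vector, covariance matrix, and sample size) |
y |
an ny by p matrix containing the data points from sample 2, or a list containing the elements mean, cov, and n (the sample mean vector, covariance matrix, and sample size) |
shrinkage |
set to TRUE to estimate the covariance matrices with the shrinkage estimator (see the references) |
var.equal |
set to TRUE if the two covariance matrices are assumed to be equal |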
Note, the sample size requirements are that nx + ny - 1 > p. The procedure will stop if this is not met and the shrinkage estimator is not being used. The shrinkage estimator has not been rigorously tested for this application (small p, smaller n).
A list containing the following components:
statistic |
Hotelling's (unscaled) T-squared statistic |
m |
The scaling factor. It can be used either by multiplying the test statistic by it, or by dividing the critical F value by it |
df |
a vector of length 2 containing the numerator and denominator degrees of freedom |
nx |
The sample size of sample 1 |
ny |
The sample size of sample 2 |
p |
The number of variables to be used in the comparison |
James M. Curran
Taylor Hersh
Hotelling, H. (1931). “The generalization of Student's ratio.” Annals of Mathematical Statistics 2 (3): 360–378.
Schaefer, J., and K. Strimmer (2005). “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.” Statist. Appl. Genet. Mol. Biol. 4: 32.
Opgen-Rhein, R., and K. Strimmer (2007). “Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach.” Statist. Appl. Genet. Mol. Biol. 6: 9.
Nel, D.G., and C.A. Van der Merwe (1986). “A solution to the multivariate Behrens-Fisher problem.” Comm. Statist. Theor. Meth., A15, 12, 3719-3736.
data(container.df)
split.data = split(container.df[,-1],container.df$gp)
x = split.data[[1]]
y = split.data[[2]]
hotelling.stat(x, y)
hotelling.stat(x, y, TRUE)
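For readers who want to see where the documented components come from, here is a minimal base R sketch of the textbook pooled-covariance statistic. The function name `t2_sketch` and the simulated data are hypothetical; it covers only the equal-variance, no-shrinkage case and is not claimed to match the package's internals.

```r
# Hypothetical helper (not the package code): pooled two-sample Hotelling T-squared.
t2_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  nx <- nrow(x); ny <- nrow(y); p <- ncol(x)
  # Same sample size requirement as stated in the note above
  stopifnot(ncol(y) == p, nx + ny - 1 > p)
  d <- colMeans(x) - colMeans(y)
  # Pooled covariance estimate under the equal-covariance assumption
  s_pooled <- ((nx - 1) * cov(x) + (ny - 1) * cov(y)) / (nx + ny - 2)
  # Unscaled T-squared statistic
  t2 <- drop((nx * ny / (nx + ny)) * t(d) %*% solve(s_pooled, d))
  list(statistic = t2,
       m = (nx + ny - p - 1) / (p * (nx + ny - 2)),  # scaling factor
       df = c(p, nx + ny - p - 1),                   # F degrees of freedom
       nx = nx, ny = ny, p = p)
}

# Simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
y <- matrix(rnorm(30, mean = 0.5), ncol = 2)
st <- t2_sketch(x, y)
st$m * st$statistic  # scaled statistic, referred to an F distribution with st$df
```

With a single variable (p = 1) the unscaled statistic reduces to the square of the pooled two-sample t statistic, which makes a convenient sanity check.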
Performs a two-sample Hotelling's T-squared test for the difference in two multivariate means
hotelling.test(x, ...)

## Default S3 method:
hotelling.test(
  x,
  y,
  shrinkage = FALSE,
  var.equal = TRUE,
  perm = FALSE,
  B = 10000,
  progBar = (perm && TRUE),
  ...
)

## S3 method for class 'formula'
hotelling.test(x, data = NULL, pair = c(1, 2), ...)
x |
a matrix containing the data points from sample 1, or a formula specifying the elements to be used as a response and the grouping variable as a predictor, or a list containing the elements mean, cov, and n |
... |
any additional arguments. This is useful for passing the optional arguments of the default method through the formula method |
y |
a matrix containing the data points from sample 2, or a list containing the elements mean, cov, and n |
shrinkage |
if TRUE then the shrinkage estimator of the covariance matrices is used (see the references) |
var.equal |
set to TRUE if the two covariance matrices are assumed to be equal |
perm |
if TRUE then a permutation test is used to obtain the P-value |
B |
if perm is TRUE, then B is the number of permutations to perform |
progBar |
if TRUE and perm is TRUE, then a progress bar is displayed while the permutations are carried out |
data |
a data frame, which must be specified if a formula is used to perform the test |
pair |
a vector of length two which can be used when the grouping factor has more than two levels to select different pairs of groups. For example, for a 3-level factor, pair could be set to c(1, 2), c(1, 3), or c(2, 3) |
A list (which is also of class 'hotelling.test') with the following elements:
stats |
a list containing all of the output from hotelling.stat |
pval |
the P-value from the test |
results |
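if perm is TRUE, the test statistics from the permuted samples (these are what the plot method displays) |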
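To make the link between the `stats` components and the P-value concrete, the sketch below applies the scaling factor in the two ways the hotelling.stat documentation describes. The numbers in `stats` are made up for illustration, and the 5% level is an arbitrary choice.

```r
# Made-up values laid out like the documented stats list (illustration only)
stats <- list(statistic = 12.5, m = 32 / 66, df = c(2, 32))

# Option 1: multiply the statistic by the scaling factor and use the F distribution
pval <- pf(stats$m * stats$statistic, stats$df[1], stats$df[2],
           lower.tail = FALSE)

# Option 2: divide the critical F value by the scaling factor and compare
# the unscaled statistic against it
crit <- qf(0.95, stats$df[1], stats$df[2]) / stats$m

pval
stats$statistic > crit
```

Both routes lead to the same decision at a given significance level, since they differ only in which side of the inequality is rescaled.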
default: Two-sample Hotelling's T-squared test
formula: Two-sample Hotelling's T-squared test
James M. Curran
Hotelling, H. (1931). “The generalization of Student's ratio.” Annals of Mathematical Statistics 2 (3): 360–378.
Schaefer, J., and K. Strimmer (2005). “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.” Statist. Appl. Genet. Mol. Biol. 4: 32.
Opgen-Rhein, R., and K. Strimmer (2007). “Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach.” Statist. Appl. Genet. Mol. Biol. 6: 9.
Campbell, G.P. and J. M. Curran (2009). “The interpretation of elemental composition measurements from forensic glass evidence III.” Science and Justice, 49(1),2-7.
hotelling.stat
data(container.df)
fit = hotelling.test(.~gp, data = container.df)
fit

subs.df = container.df[1:10,]
subs.df$gp = rep(1:2, c(5,5))
fitPerm = hotelling.test(Al+Fe~gp, data = subs.df, perm = TRUE)
fitPerm
plot(fitPerm)

data(bottle.df)
fit12 = hotelling.test(.~Number, data = bottle.df)
fit12

fit23 = hotelling.test(.~Number, data = bottle.df, pair = c(2,3))
fit23

data(manova1.df)
fit = hotelling.test(wratr+wrata~treatment, data = manova1.df, var.equal = FALSE)
fit

x = list(mean = c(7.81, 108.77, 44.92),
         cov = matrix(c(0.461, 1.18, 4.49,
                        1.18, 3776.4, -17.35,
                        4.49, -17.35, 147.24), nc = 3, byrow = TRUE),
         n = 13)
y = list(mean = c(5.89, 41.9, 20.8),
         cov = matrix(c(0.148, -0.679, 0.209,
                        -0.679, 96.10, 20.20,
                        0.209, 20.20, 24.18), nc = 3, byrow = TRUE),
         n = 10)
fit = hotelling.test(x, y, var.equal = FALSE)
fit
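The permutation option can be pictured with the following base R sketch: pool the two samples, repeatedly reassign rows to groups at random, and recompute the statistic each time. The names `t2` and `perm_test_sketch`, the simulated data, and the choice of `>=` when counting extreme statistics are assumptions for illustration; the package may differ in these details.

```r
# Hypothetical helper: unscaled pooled T-squared statistic for two matrices
t2 <- function(x, y) {
  nx <- nrow(x); ny <- nrow(y)
  d <- colMeans(x) - colMeans(y)
  s <- ((nx - 1) * cov(x) + (ny - 1) * cov(y)) / (nx + ny - 2)
  drop((nx * ny / (nx + ny)) * t(d) %*% solve(s, d))
}

# Hypothetical helper: permutation version of the two-sample test
perm_test_sketch <- function(x, y, B = 1000) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  z <- rbind(x, y)
  nx <- nrow(x)
  n <- nrow(z)
  observed <- t2(x, y)
  permuted <- replicate(B, {
    idx <- sample(n, nx)  # rows randomly assigned to "sample 1"
    t2(z[idx, , drop = FALSE], z[-idx, , drop = FALSE])
  })
  # Proportion of permuted statistics at least as large as the observed one
  list(statistic = observed,
       pval = mean(permuted >= observed),
       results = permuted)
}

# Simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)
y <- matrix(rnorm(20, mean = 1), ncol = 2)
fit <- perm_test_sketch(x, y, B = 500)
fit$pval
```

Because the P-value is a proportion out of B permutations, its resolution is 1/B, which is one reason to keep B large when small P-values matter.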
The data contains example data for testing the unequal variance option in the package. The dataset has four variables: wratr, wrata, treatment, and disability. treatment is the grouping variable, and wratr and wrata are the responses. disability is not used.
Plots a histogram of the distribution of the permuted test statistics for a permutation version of Hotelling's T-squared
## S3 method for class 'hotelling.test'
plot(x, ...)
x |
an object of type hotelling.test |
... |
any additional arguments to be passed to the hist command |
This function only works if you have performed a permutation test. It will return an error message if not. It could be programmed to draw the relevant F distribution in the standard case, but this seems rather pointless.
James M. Curran
data(bottle.df)
bottle.df = subset(bottle.df, Number == 1)
bottle.df$Number = rep(1:2,c(10,10))
fit = hotelling.test(.~Number, bottle.df, perm = TRUE)
plot(fit)
plot(fit, col = "lightblue")
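The kind of picture this method produces can be approximated with base R's `hist`. In the sketch below the permuted statistics are simulated stand-ins and the observed value is hypothetical; marking the observed statistic with a vertical line is an addition made here and is not claimed to be what the package draws.

```r
# Simulated stand-in for a vector of permuted test statistics (illustration only)
set.seed(42)
permuted <- rchisq(1000, df = 2)
observed <- 7.5  # hypothetical observed statistic

# Extra arguments such as col are passed straight to hist
h <- hist(permuted, breaks = 30, col = "lightblue",
          main = "Permuted test statistics", xlab = "T-squared")

# Mark the observed statistic for comparison (an addition in this sketch)
abline(v = observed, lwd = 2, lty = 2)
```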
Prints the test statistic, degrees of freedom and P-value from Hotelling's T-squared test for the difference in two multivariate sample means
## S3 method for class 'hotelling.test'
print(x, ...)
x |
an object of type hotelling.test |
... |
any additional arguments to be passed to the hist command |
James M. Curran
data(bottle.df)
bottle.df = subset(bottle.df, Number == 1)
bottle.df$Number = rep(1:2,c(10,10))
fit = hotelling.test(.~Number, bottle.df, perm = TRUE)
fit

fit = hotelling.test(.~Number, bottle.df)
fit

## an explicit call
print(fit)
Easily get summary statistics for each group present in the data
summarise(x, ...)

## Default S3 method:
summarise(
  x,
  y,
  stats = list(Mean = mean, Median = median, `Std. Dev.` = sd, N = length),
  ...
)

## S3 method for class 'formula'
summarise(
  x,
  data = NULL,
  stats = list(Mean = mean, Median = median, `Std. Dev.` = sd, N = length),
  ...
)

## S3 method for class 'data.frame'
summarise(x, y, ...)
x |
a matrix of multivariate observations, a list of summary statistics from multivariate observations, a data.frame of multivariate observations, or a formula with a multivariate response on the right hand side, and a grouping variable/factor on the left hand side. |
y |
a matrix of multivariate observations, a list of summary statistics from multivariate observations, OR a data.frame of multivariate observations |
stats |
a named list of summary statistics to compute on each variable in each group. Note 1: Quantiles are not supported yet because I can't think of a good way to handle the extra arguments. Help welcome. Note 2: The names of the elements in the list are used to label the columns of the output. They probably should be unique. |
data |
a data.frame containing the variables used in a formula |
... |
other arguments such as another matrix of multivariate observations:
see |
default: Summary statistics for grouped data
formula: Summary statistics for grouped data
data.frame: Summary statistics for grouped data
data(container.df)
split.data = split(container.df[,-1],container.df$gp)
x = split.data[[1]]
y = split.data[[2]]
summarise(x, y)

## Using the formula interface
data(container.df)
summarise(gp~., data = container.df)
summarise(gp~Al+Ti, data = container.df)
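The idea of applying a named list of statistics to every variable within each group can be sketched in base R as follows. The helper `summarise_sketch`, its arguments, and the toy data are hypothetical, and the layout of the result (one matrix per group) is an assumption that may not match the package's output.

```r
# Hypothetical helper (not the package code): per-group summary statistics
summarise_sketch <- function(data, group,
                             stats = list(Mean = mean, Median = median,
                                          `Std. Dev.` = sd, N = length)) {
  vars <- setdiff(names(data), group)
  # One block of rows per group level
  lapply(split(data[vars], data[[group]]), function(g) {
    # Rows are variables, columns are labelled by the names in stats
    sapply(stats, function(f) sapply(g, f))
  })
}

# Toy data, made up for illustration
toy <- data.frame(gp = c(1, 1, 2, 2),
                  Al = c(1, 3, 5, 9),
                  Ti = c(2, 2, 4, 8))
summarise_sketch(toy, "gp")
```

Because the column labels come from the names of the `stats` list, duplicate names would produce ambiguous columns, which is why unique names are advisable.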