Package 'Hotelling' reference manual

Title:	Hotelling's T^2 Test and Variants
Description:	A set of R functions which implements Hotelling's T^2 test and some variants of it. Functions are also included for Aitchison's additive log ratio and centred log ratio transformations.
Authors:	James Curran [aut, cre], Taylor Hersh [aut]
Maintainer:	James Curran <[email protected]>
License:	GPL (>= 2)
Version:	1.0-8
Built:	2025-02-11 02:40:27 UTC
Source:	https://github.com/jmcurran/hotelling

Hotelling

Description

A set of R functions and data sets which implements Hotelling's T^2 test, and some variants of it. Functions are also included for Aitchison's additive log ratio and centred log ratio transformations.

Author(s)

James Curran

Maintainer: James Curran <[email protected]>

Additive log ratio transformation

Description

Aitchison's additive log ratio transformation for compositional data

Usage

alr(form, data, group = NULL)

Arguments

`form`	a formula which specifies the denominator variable as the response
`data`	a data frame in which the data is stored
`group`	if not NULL then a character string specifying the name of the grouping variable

Details

This function will give a warning if zeros are present because the transformed data will have -Infs.

Value

a data frame with the ALR transformation applied to data. Each row in the data frame is standardized with respect to a specific variable by dividing by that variable. The logarithms of the resulting ratios are returned. If a grouping variable is specified, then this is preserved.
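The computation described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the helper name `alr_sketch` and the toy data are made up for the example, and dropping the denominator column from the result is an assumption.

```r
# Toy compositional data (made-up values, for illustration only)
comp <- data.frame(Mn = c(2, 4), Ba = c(1, 8), Sr = c(4, 2))

# Hypothetical helper: divide every other column by the denominator
# column, row by row, and take logs of the resulting ratios
alr_sketch <- function(df, denom) {
  parts <- setdiff(names(df), denom)
  ratios <- as.matrix(df[parts]) / df[[denom]]
  as.data.frame(log(ratios))
}

alr_mn <- alr_sketch(comp, "Mn")
alr_mn
```

A zero in any numerator column would give log(0) = -Inf in the output, which is the situation the warning in Details refers to.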

Author(s)

James M. Curran

References

Aitchison, J. (1986). “The Statistical Analysis of Compositional Data”, Chapman and Hall, reprinted in 2003 with additional material by The Blackburn Press

Examples


data(bottle.df)

## transform with respect to manganese
alr(Mn~., bottle.df, "Number")

## transform the data with respect to barium, but removing the
## grouping in column 1
alr(Ba~., bottle.df[,-1])

Bottle data

Description

This data contains the elemental concentration of five different elements (Manganese, Barium, Strontium, Zirconium, and Titanium) in samples of glass taken from six different Heineken beer bottles. 20 measurements were taken from each bottle.

References

R. L. Bennett. Aspects of the analysis and interpretation of glass trace evidence. Master's thesis, Department of Chemistry, University of Waikato, 2002.

Centered log ratio transformation

Description

Aitchison's centered log ratio transformation for compositional data

Usage

clr(data, group = NULL)

Arguments

`data`	a data frame in which the data is stored
`group`	if not NULL then a character string specifying the name of the grouping variable

Details

This function will give a warning if zeros are present because the transformed data will have -Infs.

Value

a data frame with the CLR transformation applied to data. Each row in the data frame is standardized by dividing by the geometric mean of that row. The logarithms of the resulting ratios are returned. If a grouping variable is specified, then this is preserved.
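As a rough base R sketch of the same idea (not the package's code; `clr_sketch` and the toy data are hypothetical), note that the log of a value divided by the row's geometric mean equals the log of the value minus the mean of the row's logs:

```r
# Toy compositional data (made-up values, for illustration only)
comp <- data.frame(Mn = c(1, 2), Ba = c(2, 2), Sr = c(4, 2))

# Hypothetical helper: log(x / geometric mean of row)
# is computed as log(x) - mean(log(row))
clr_sketch <- function(df) {
  logs <- log(as.matrix(df))
  as.data.frame(logs - rowMeans(logs))
}

clr_vals <- clr_sketch(comp)
clr_vals
```

Each row of the result sums to zero, which is a quick check that the centering was applied row by row.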

Author(s)

James M. Curran

References

Aitchison, J. (1986). “The Statistical Analysis of Compositional Data”, Chapman and Hall, reprinted in 2003 with additional material by The Blackburn Press

Examples


data(bottle.df)

## transform preserving grouping
clr(bottle.df, "Number")

## transform the data but remove the
## grouping in column 1
clr(bottle.df[,-1])

Container data

Description

This data contains the elemental concentration of nine different elements (Titanium, Aluminium, Iron, Manganese, Magnesium, Calcium, Barium, Strontium, and Zirconium) in specimens of glass taken from two different containers. Ten measurements were taken from each container.

References

Jose R. Almirall. Discrimination of glass samples by solution based ICP-OES. PhD thesis, Department of Chemistry, Florida International University, 1998.

Calculate Hotelling's two sample T-squared test statistic

Description

Calculate Hotelling's T-squared test statistic for the difference in two multivariate means.

Usage

hotelling.stat(x, y, shrinkage = FALSE, var.equal = TRUE)

Arguments

`x`	an nx by p matrix containing the data points from sample 1 or a list containing elements `mean`, `cov`, and `n` where `mean` is a mean vector of length p, `cov` is a variance-covariance matrix of dimension p by p, and `n` is the sample size
`y`	an ny by p matrix containing the data points from sample 2 or a list containing elements `mean`, `cov`, and `n` where `mean` is a mean vector of length p, `cov` is a variance-covariance matrix of dimension p by p, and `n` is the sample size
`shrinkage`	set to `TRUE` if the covariance matrices are to be estimated using Schaefer and Strimmer's James-Stein shrinkage estimator. Note this only works when raw data is supplied, and will not work if summary statistics are supplied.
`var.equal`	set to `TRUE` if the covariance matrices are (assumed to be) equal

Details

Note, the sample size requirements are that nx + ny - 1 > p. The procedure will stop if this is not met and the shrinkage estimator is not being used. The shrinkage estimator has not been rigorously tested for this application (small p, smaller n).

Value

A list containing the following components:

`statistic`	Hotelling's (unscaled) T-squared statistic
`m`	The scaling factor. This can be used by multiplying it with the test statistic, or by dividing the critical F value by it
`df`	a vector of length 2 containing the numerator and denominator degrees of freedom
`nx`	The sample size of sample 1
`ny`	The sample size of sample 2
`p`	The number of variables to be used in the comparison
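The relationship between the unscaled statistic, the scaling factor and the degrees of freedom can be sketched with the textbook pooled-covariance formulas in base R. This is an assumption-laden illustration, not the package's source: the simulated data and the variable names are made up, and it covers only the equal-covariance case without shrinkage.

```r
set.seed(1)
nx <- 12; ny <- 10; p <- 3
x <- matrix(rnorm(nx * p), nx, p)              # simulated sample 1
y <- matrix(rnorm(ny * p, mean = 0.5), ny, p)  # simulated sample 2

# Difference in mean vectors and pooled covariance matrix
d <- colMeans(x) - colMeans(y)
S_pooled <- ((nx - 1) * cov(x) + (ny - 1) * cov(y)) / (nx + ny - 2)

# Unscaled T-squared statistic
T2 <- sum(d * solve(S_pooled * (1 / nx + 1 / ny), d))

# Textbook scaling factor and degrees of freedom: m * T2 follows an
# F distribution with (p, nx + ny - p - 1) degrees of freedom under H0
m <- (nx + ny - p - 1) / (p * (nx + ny - 2))
df <- c(p, nx + ny - p - 1)
p_value <- pf(m * T2, df[1], df[2], lower.tail = FALSE)
```

Multiplying `T2` by `m` and comparing with an F critical value is equivalent to comparing `T2` with the critical value divided by `m`.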

Author(s)

James M. Curran

Taylor Hersh

References

Hotelling, H. (1931). “The generalization of Student's ratio.” Annals of Mathematical Statistics 2 (3): 360–378.

Schaefer, J., and K. Strimmer (2005). “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.” Statist. Appl. Genet. Mol. Biol. 4: 32.

Opgen-Rhein, R., and K. Strimmer (2007). “Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach.” Statist. Appl. Genet. Mol. Biol. 6: 9.

Nel, D.G. and Van der Merwe, C.A. (1986). “A solution to the multivariate Behrens-Fisher problem.” Comm. Statist. Theor. Meth., A15, 12, 3719-3736.

Examples


data(container.df)
split.data = split(container.df[,-1],container.df$gp)
x = split.data[[1]]
y = split.data[[2]]
hotelling.stat(x, y)
hotelling.stat(x, y, TRUE)

Two-sample Hotelling's T-squared test

Description

Performs a two-sample Hotelling's T-squared test for the difference in two multivariate means

Usage

hotelling.test(x, ...)

## Default S3 method:
hotelling.test(
  x,
  y,
  shrinkage = FALSE,
  var.equal = TRUE,
  perm = FALSE,
  B = 10000,
  progBar = (perm && TRUE),
  ...
)

## S3 method for class 'formula'
hotelling.test(x, data = NULL, pair = c(1, 2), ...)

Arguments

`x`	a matrix containing the data points from sample 1, or a formula specifying the elements to be used as a response and the grouping variable as a predictor, or a list containing elements `mean`, `cov`, and `n` where `mean` is a mean vector of length p, `cov` is a variance-covariance matrix of dimension p by p, and `n` is the sample size
`...`	any additional arguments. This is useful to pass the optional arguments for the default call from the formula version
`y`	a matrix containing the data points from sample 2, or a list containing elements `mean`, `cov`, and `n` where `mean` is a mean vector of length p, `cov` is a variance-covariance matrix of dimension p by p, and `n` is the sample size
`shrinkage`	if `TRUE` then Schaefer and Strimmer's James-Stein shrinkage estimator is used to calculate the sample covariance matrices
`var.equal`	set to `TRUE` if the covariance matrices are (assumed to be) equal
`perm`	if `TRUE` then permutation testing is used to estimate the non-parametric P-value for the hypothesis test
`B`	if perm is TRUE, then B is the number of permutations to perform
`progBar`	if `TRUE` and `perm` is TRUE then a progress bar will be displayed whilst the permutation procedure is carried out
`data`	a data frame needs to be specified if a formula is to be used to perform the test
`pair`	a vector of length two which can be used when the grouping factor has more than two levels to select different pairs of groups. For example, for a 3-level factor, `pair` could be set to `c(1,3)` to perform Hotelling's test between groups 1 and 3
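The idea behind the `perm` and `B` arguments can be sketched in base R as follows. This is a generic permutation scheme written for illustration, not the package's implementation: the helper `t2`, the simulated data, and the exact way the P-value is counted are all assumptions.

```r
set.seed(42)
nx <- 10; ny <- 10
x <- matrix(rnorm(nx * 2), nx, 2)            # simulated sample 1
y <- matrix(rnorm(ny * 2, mean = 1), ny, 2)  # simulated sample 2

# Hypothetical helper: pooled-covariance T-squared statistic
t2 <- function(a, b) {
  na <- nrow(a); nb <- nrow(b)
  d <- colMeans(a) - colMeans(b)
  S <- ((na - 1) * cov(a) + (nb - 1) * cov(b)) / (na + nb - 2)
  sum(d * solve(S * (1 / na + 1 / nb), d))
}

observed <- t2(x, y)
pooled <- rbind(x, y)
B <- 999  # number of random relabellings

# Shuffle the group labels B times and recompute the statistic each time
perm_stats <- replicate(B, {
  idx <- sample(nrow(pooled))
  t2(pooled[idx[1:nx], , drop = FALSE],
     pooled[idx[-(1:nx)], , drop = FALSE])
})

# Proportion of permuted statistics at least as large as the observed one
p_perm <- mean(perm_stats >= observed)
```

A histogram of `perm_stats` with the observed value marked gives the kind of picture that the plot method for permutation results is described as producing.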

Value

A list (which is also of class 'hotelling.test') with the following elements:

`stats`	a list containing all of the output from `hotelling.stat`
`pval`	the P-value from the test
`results`	if `perm == TRUE`, then all of the permutation test statistics are stored in `results`

Methods (by class)

default: Two-sample Hotelling's T-squared test
formula: Two-sample Hotelling's T-squared test

Author(s)

James M. Curran

References

Hotelling, H. (1931). “The generalization of Student's ratio.” Annals of Mathematical Statistics 2 (3): 360–378.

Schaefer, J., and K. Strimmer (2005). “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.” Statist. Appl. Genet. Mol. Biol. 4: 32.

Opgen-Rhein, R., and K. Strimmer (2007). “Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach.” Statist. Appl. Genet. Mol. Biol. 6: 9.

Campbell, G.P. and J. M. Curran (2009). “The interpretation of elemental composition measurements from forensic glass evidence III.” Science and Justice, 49(1),2-7.

Examples


data(container.df)
fit = hotelling.test(.~gp, data = container.df)
fit

subs.df = container.df[1:10,]
subs.df$gp = rep(1:2, c(5,5))
fitPerm = hotelling.test(Al+Fe~gp, data  = subs.df, perm =  TRUE)
fitPerm
plot(fitPerm)

data(bottle.df)
fit12 = hotelling.test(.~Number, data = bottle.df)
fit12

fit23 = hotelling.test(.~Number, data = bottle.df, pair = c(2,3))
fit23

data(manova1.df)
fit = hotelling.test(wratr+wrata~treatment, data = manova1.df, var.equal = FALSE)
fit

x = list(mean = c(7.81, 108.77, 44.92),
         cov = matrix(c(0.461, 1.18, 4.49,
                        1.18, 3776.4, -17.35, 
                        4.49, -17.35, 147.24), nc = 3, byrow = TRUE),
         n = 13)
y = list(mean = c(5.89, 41.9, 20.8),
         cov = matrix(c(0.148, -0.679, 0.209, 
                       -0.679, 96.10, 20.20,
                        0.209, 20.20, 24.18), nc = 3, byrow = TRUE),
         n = 10)
fit = hotelling.test(x, y, var.equal = FALSE)
fit

manova1 data

Description

This data set contains example data for testing the unequal variance option in the package. It has four variables: wratr, wrata, treatment, and disability. treatment is the grouping variable, and wratr and wrata are the responses. disability is not used.

References

https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Hotellings_Two-Sample_T2.pdf

Plots the results from a permutation based version of Hotelling's T-squared test for the difference in two multivariate sample means

Description

Plots a histogram of the distribution of the permuted test statistics for a permutation version of Hotelling's T-squared

Usage

## S3 method for class 'hotelling.test'
plot(x, ...)

Arguments

`x`	an object of type hotelling.test
`...`	any additional arguments to be passed to the hist command

Details

This function only works if you have performed a permutation test. It will return an error message if not. It could be programmed to draw the relevant F distribution in the standard case, but this seems rather pointless.

Author(s)

James M. Curran

Examples


data(bottle.df)
bottle.df = subset(bottle.df, Number == 1)
bottle.df$Number = rep(1:2,c(10,10))
fit = hotelling.test(.~Number, bottle.df, perm = TRUE)
plot(fit)
plot(fit, col = "lightblue")

Prints the results from a Hotelling's T-squared test for the difference in two multivariate sample means

Description

Prints the test statistic, degrees of freedom and P-value from Hotelling's T-squared test for the difference in two multivariate sample means

Usage

## S3 method for class 'hotelling.test'
print(x, ...)

Arguments

`x`	an object of type hotelling.test
`...`	any additional arguments

Author(s)

James M. Curran

Examples


data(bottle.df)
bottle.df = subset(bottle.df, Number == 1)
bottle.df$Number = rep(1:2,c(10,10))
fit = hotelling.test(.~Number, bottle.df, perm = TRUE)
fit
fit = hotelling.test(.~Number, bottle.df)
fit

## an explicit call
print(fit)

Summary statistics for grouped data

Description

Easily get summary statistics for each group present in the data

Usage

summarise(x, ...)

## Default S3 method:
summarise(
  x,
  y,
  stats = list(Mean = mean, Median = median, `Std. Dev.` = sd, N = length),
  ...
)

## S3 method for class 'formula'
summarise(
  x,
  data = NULL,
  stats = list(Mean = mean, Median = median, `Std. Dev.` = sd, N = length),
  ...
)

## S3 method for class 'data.frame'
summarise(x, y, ...)

Arguments

`x`	a matrix of multivariate observations, a list of summary statistics from multivariate observations, a data.frame of multivariate observations, or a formula with a multivariate response on the right hand side, and a grouping variable/factor on the left hand side.
`y`	a matrix of multivariate observations, a list of summary statistics from multivariate observations, OR a data.frame of multivariate observations
`stats`	a named list of summary statistics to compute on each variable in each group. Note 1: Quantiles are not supported yet because I can't think of a good way to handle the extra arguments. Help welcome. Note 2: The names of the elements in the list are used to label the columns of the output. They probably should be unique.
`data`	a data.frame containing the variables used in a formula
`...`	other arguments such as another matrix of multivariate observations: see `summarise.default`, or a data frame to be used with a formula: see `summarise.formula`
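How a named list of functions such as the default `stats` can drive a per-group summary is sketched below in base R. This illustrates the idea for a single variable only and is not the package's code; the toy data frame and the object name `by_group` are made up.

```r
# Same shape as the default `stats` argument shown in Usage
stats <- list(Mean = mean, Median = median, `Std. Dev.` = sd, N = length)

# Toy data: one response and a two-level grouping variable (made-up values)
toy <- data.frame(gp = c(1, 1, 1, 2, 2, 2),
                  Al = c(1, 2, 3, 4, 6, 8))

# Apply each statistic to the response within each group; the list
# names become the column labels of the result
by_group <- sapply(stats, function(f) tapply(toy$Al, toy$gp, f))
by_group
```

Because the list names label the columns, duplicated names would produce ambiguous output, which is presumably why the note on `stats` suggests keeping them unique.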

Methods (by class)

default: Summary statistics for grouped data
formula: Summary statistics for grouped data
data.frame: Summary statistics for grouped data

Examples


data(container.df)
split.data = split(container.df[,-1],container.df$gp)
x = split.data[[1]]
y = split.data[[2]]
summarise(x, y)

## Using the formula interface
data(container.df)
summarise(gp~., data = container.df)

summarise(gp~Al+Ti, data = container.df)
