Type: Package
Title: Data and Functions Used in Linear Models and Regression with R: An Integrated Approach
Version: 1.3
Date: 2025-11-10
Description: Data files and a few functions used in the book 'Linear Models and Regression with R: An Integrated Approach' by Debasis Sengupta and Sreenivas Rao Jammalamadaka (2019).
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Depends: MASS
Imports: stats
NeedsCompilation: no
Packaged: 2025-10-12 11:35:16 UTC; kjana
Author: Debasis Sengupta [aut], S. Rao Jammalamadaka [aut], Jinwen Qiu [aut], Kaushik Jana [cre]
Maintainer: Kaushik Jana <kaushikjana11@gmail.com>
Repository: CRAN
Date/Publication: 2025-10-12 16:50:02 UTC

Fisher's Iris data

Description

Measurements of four dimensions of flowers of three species of the plant Iris (Iris setosa, Iris versicolor, and Iris virginica).

Usage

data(Iris)

Format

A data frame with 150 observations on the following 6 variables.

Species_No

Species number

Petal_width

Petal width (in cm)

Petal_length

Petal length (in cm)

Sepal_width

Sepal width (in cm)

Sepal_length

Sepal length (in cm)

Species_name

Species names: Setosa, Virginica or Versicolor, a character vector

Source

Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7, pp.179-188.

Examples

data(Iris)
head(Iris)

LA crime and temperature data

Description

Monthly total counts of homicides and rapes in the city of Los Angeles from January 1975 to December 1993.

Usage

data(LAcrime)

Format

A data frame with 228 observations on the following 7 variables.

Year

Year of record

Month

Month of record

Population

Population of the city in the year of record

TempCelsius

Monthly average temperature recorded at the Los Angeles International Airport (in Celsius)

Fahrenheit

Monthly average temperature recorded at the Los Angeles International Airport (in Fahrenheit)

Homicide

Total count of homicides in the month and year of record

Rape

Total count of rapes in the month and year of record

Source

The crime data: Carlson, S.M. (1998), Uniform Crime Reports: Monthly Weapon-Specific Crime and Arrest Time Series, 1975-1993, ICPSR06792-v1, Inter-university Consortium for Political and Social Research, Ann Arbor, MI (https://www.icpsr.umich.edu/icpsrweb/NACJD/studies/6792). Temperature data for LAX (WMO ID 72295): National Oceanic and Atmospheric Administration, USA (http://www.ncdc.noaa.gov/ghcnm/v2.php).

Examples

data(LAcrime)
head(LAcrime)

Wright brothers' wind tunnel data

Description

Wright brothers' 1901 wind tunnel data on pressure over different types of wings at different angles.

Usage

data(Wright)

Format

A data frame with 222 observations on the following 3 variables.

Pressure

Air pressure (in psi)

Angle

Angle of wing (in degrees)

Wing

Wing type

Source

Dataplot webpage of the National Institute of Standards and Technology (NIST),
USA (https://www.itl.nist.gov/div898/software/dataplot/data/WRIGHT11.DAT)

Examples

data(Wright)
head(Wright)

Air speed experiment data

Description

Air speed data, which is part of a larger data set from a designed experiment (Wilkie, 1962).

Usage

data(airspeed)

Format

A data frame with 18 observations on the following 3 variables.

Posmaxspeed

The position of highest speed of air blown down the space between a roughened rod and a smoothed pipe surrounding it. The position is defined as the distance (in inches) from the center of the rod, in excess of 1.4 inches

Reynolds

Reynolds number of air flow (dimensionless)

Ribht

Height of ribs on the roughened rod (in inches)

Source

Wilkie, D. (1962) A method of analysis of mixed level factorial experiments. Applied Statistics, pp.184-195.

Examples

data(airspeed)
head(airspeed)

Six data sets with similar regression summary

Description

Six synthetic data sets with similar regression summary, for illustrating the importance of regression diagnostics.

Usage

data(anscombeplus)

Format

A data frame with 20 observations on 8 synthetic real-valued variables, labelled as x1, y1, y2, y3, y4, y5, x2, y6.

x1

Explanatory variable of first five data sets

y1

Response variable of first data set

y2

Response variable of second data set

y3

Response variable of third data set

y4

Response variable of fourth data set

y5

Response variable of fifth data set

x2

Explanatory variable of sixth data set

y6

Response variable of sixth data set

Details

This data set is presented by Sengupta and Jammalamadaka (2019), after expanding on the ideas of Anscombe (1973).

Source

Anscombe, F.J. (1973), Graphs in statistical analysis, American Statistician, vol.27, pp.17-21.

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach, World Scientific Publishing Co., Table 5.1.

Examples

data(anscombeplus)
head(anscombeplus)

Apple yield with cropping under tree

Description

Apple crop volume under various ground covers underneath tree (Pearce, 1983)

Usage

data(appletree)

Format

A data frame with 24 observations on the following 4 variables.

Weight

Total weight (in pounds) of apple produced in a plot in four years, post-treatment

Treatment

Five types of permanent cropping under the apple tree (coded as 1 to 5), or no cropping at all (0)

Block

Blocks coded as 1 to 4

Volume

Total crop volume (in bushels) in four years, pre-treatment

Source

Pearce, S.C. (1983) The Agricultural Field Experiment, Wiley, Chichester, p.284.

Examples

data(appletree)
head(appletree)

Basis of column space of a matrix

Description

Computes an orthonormal basis of the column space of a given matrix.

Usage

basis(M, tol=sqrt(.Machine$double.eps))

Arguments

M

Matrix for which basis of the column space is needed.

tol

A relative tolerance to determine rank through QR decomposition
(default = sqrt(.Machine$double.eps)).

Value

Returns a semi-orthogonal matrix with columns forming an orthonormal basis of the column space of M.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

  basis(matrix(c(2,1,3,4,2,3,2,6,4,2,6,8),4,3))
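
As a rough illustration of what such a basis looks like, the same idea can be sketched in base R with a pivoted QR decomposition. The helper name qr_basis is hypothetical and this is not necessarily how basis() is implemented; the sketch only shows the properties the returned matrix is expected to have.

```r
# Hypothetical helper (not the package's basis()): orthonormal basis of the
# column space obtained from base R's pivoted QR decomposition.
qr_basis <- function(M, tol = sqrt(.Machine$double.eps)) {
  decomp <- qr(M, tol = tol)                  # decomp$rank is the estimated rank
  qr.Q(decomp)[, seq_len(decomp$rank), drop = FALSE]
}

M <- matrix(c(2,1,3,4,2,3,2,6,4,2,6,8), 4, 3)  # third column = 2 * first column
Q <- qr_basis(M)
ncol(Q)                                 # estimated rank of M
all.equal(crossprod(Q), diag(ncol(Q)))  # columns of Q are orthonormal
all.equal(Q %*% crossprod(Q, M), M)     # projecting M onto span(Q) recovers M
```

The last check holds because every column of M already lies in the span of the columns of Q.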

Convert categorical variable to several binary variables

Description

Stacks up in columns the values of all the binary variables that can be associated with different levels of a categorical variable.

Usage

binaries(x)

Arguments

x

A categorical variable (either numeric or character).

Details

The name of each new variable is of the type v.x, where x is the level of the categorical variable for which this binary variable is equal to 1.

Value

A set of binary vectors, each having the value 1 for a unique level of x.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

x <- c(1,2,2,3,1,1,2,3,3,2,1)
binaries(x)
binaries(as.factor(x))
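
For readers who want to see the construction spelled out, a minimal base R sketch follows. The helper indicator_columns is hypothetical (it is not the package's binaries()), and the column naming simply imitates the v.x pattern described under Details.

```r
# Hypothetical helper: one 0/1 column per distinct level of x.
indicator_columns <- function(x) {
  lev <- sort(unique(as.character(x)))
  Z <- outer(as.character(x), lev, "==") + 0   # logical matrix turned into 0/1
  colnames(Z) <- paste0("v.", lev)
  Z
}

x <- c(1,2,2,3,1,1,2,3,3,2,1)
Z <- indicator_columns(x)
rowSums(Z)   # each observation belongs to exactly one level
colSums(Z)   # number of observations at each level
```

Since every row sums to 1, the indicator columns add up to the intercept column, which is why a model matrix containing an intercept and all indicators is rank deficient.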

Simultaneous confidence intervals in a linear model

Description

Produces two-sided Bonferroni and Scheffe simultaneous confidence intervals, together with corresponding single confidence intervals, for any vector of estimable functions A.beta in a linear model.

Usage

cisimult(y, X, A, alpha, tol=sqrt(.Machine$double.eps))

Arguments

y

Response vector in linear model.

X

Design/model matrix or matrix containing values of explanatory variables (generally including intercept).

A

Coefficient matrix (A.beta is the vector for which confidence interval is needed).

alpha

Collective non-coverage probability of confidence intervals.

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)).

Details

Normal distribution of response (given explanatory variables and/or factors) is assumed.

Value

Three sets of confidence intervals, as listed below:

BFCB

Two-sided Bonferroni simultaneous confidence intervals.

SFCB

Two-sided Scheffe simultaneous confidence intervals.

SNCB

The single confidence intervals.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(denim)
attach(denim)
X <- cbind(1, binaries(Denim), binaries(Laundry))
A <- rbind(c(0,1,-1,0,0,0,0), c(0,1,0,-1,0,0,0), c(0,0,1,-1,0,0,0))
cisimult(Abrasion, X, A, 0.05, tol = 1e-10)
detach(denim)
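
The three kinds of intervals differ in the multiplier applied to the standard error of each estimated function. The sketch below computes the usual textbook multipliers under the normality assumption; it is not taken from the package source. The values of q, m and df are assumptions worked out for the denim example above (three rows in A, of which the third is the difference of the first two, and 90 observations with a model matrix of rank 5).

```r
alpha <- 0.05
q  <- 3    # number of intervals (rows of A)
m  <- 2    # rank of A
df <- 85   # error degrees of freedom: 90 observations minus rank(X) = 5

single     <- qt(1 - alpha / 2, df)            # one interval at a time
bonferroni <- qt(1 - alpha / (2 * q), df)      # alpha split over q intervals
scheffe    <- sqrt(m * qf(1 - alpha, m, df))   # covers all combinations of A.beta
c(single = single, bonferroni = bonferroni, scheffe = scheffe)
```

Both simultaneous multipliers exceed the single-interval multiplier, so the simultaneous intervals are wider.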

Confidence interval for a linear parametric function in a linear model

Description

Computes point estimate and confidence interval for a single linear parametric function in a linear model.

Usage

cisngl(y, X, p, alpha, type, tol=sqrt(.Machine$double.eps))

Arguments

y

Response vector in linear model.

X

Design/model matrix or matrix containing values of explanatory variables (generally including intercept).

p

Coefficient vector of linear parametric function for which confidence interval is needed.

alpha

Non-coverage probability of confidence interval.

type

Type of confidence interval ("lower", "upper", "both").

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)).

Details

Normal distribution of response (given explanatory variables and/or factors) is assumed.

Value

Returns a list of two objects:

estimate

Point estimate.

ci

Confidence interval.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

library(MASS)
data(birthwt)
attach(birthwt)
X <- cbind(1, smoke, binaries(race))
p <- c(0,1,0,0,0)
cisngl(bwt, X, p, 0.05, type = "upper", tol = 1e-10)
cisngl(bwt, X, p, 0.05, type = "both", tol = 1e-10) 
detach(birthwt)
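
For comparison, a two-sided interval of this kind can be computed by hand from an lm fit. The sketch below assumes a full-rank parametrization (so p has four elements rather than five) and uses only base R and MASS; it is not the package's own computation.

```r
library(MASS)                        # provides the birthwt data
fit   <- lm(bwt ~ smoke + factor(race), data = birthwt)
p     <- c(0, 1, 0, 0)               # picks out the coefficient of smoke
alpha <- 0.05

est <- sum(p * coef(fit))                        # point estimate of p'beta
se  <- sqrt(drop(t(p) %*% vcov(fit) %*% p))      # its standard error
tq  <- qt(1 - alpha / 2, df.residual(fit))
ci  <- c(lower = est - tq * se, upper = est + tq * se)
ci
confint(fit, "smoke", level = 1 - alpha)         # should agree with ci
```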

Table of condition indices and singular vectors

Description

Computes the table of condition indices and model matrix singular vectors for a linear model.

Usage

cisv(lmobj)

Arguments

lmobj

An object produced by lm fitting.

Details

Columns containing different elements of a singular vector are labelled either as (Intercept) or by the variable name.

Value

Returns the table of condition indices and model matrix right singular vectors for the chosen model, with singular vectors appearing as rows next to the corresponding condition index. Columns containing different elements of a singular vector are labelled either as (Intercept) or by the variable name.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(imf2015)
lmimf <- lm(UNMP~CAB+DEBT+EXP+GDP+INFL+INV, data = imf2015)
cisv(lmimf)

Basis of orthogonal complement of column space of a matrix

Description

Computes an orthonormal basis of the orthogonal complement of the column space of a given matrix.

Usage

compbasis(M, tol=sqrt(.Machine$double.eps))

Arguments

M

Matrix for which basis of the orthogonal complement of the column space is needed.

tol

A relative tolerance to determine rank through QR decomposition
(default = sqrt(.Machine$double.eps)).

Value

Returns a semi-orthogonal matrix with columns forming an orthonormal basis of the orthogonal complement of the column space of M.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

compbasis(matrix(c(3,3,3,3),2,2))
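
One way to picture the result is through the complete QR decomposition: the columns of the square orthogonal factor beyond the rank of M span the orthogonal complement. The helper qr_compbasis below is a hypothetical base R sketch of this idea, not the package's implementation.

```r
# Hypothetical helper: orthonormal basis of the orthogonal complement of the
# column space, taken from the trailing columns of the complete Q factor.
qr_compbasis <- function(M, tol = sqrt(.Machine$double.eps)) {
  decomp <- qr(M, tol = tol)
  Qfull  <- qr.Q(decomp, complete = TRUE)      # square orthogonal matrix
  keep   <- seq_len(nrow(M) - decomp$rank) + decomp$rank
  Qfull[, keep, drop = FALSE]
}

M <- matrix(c(3,3,3,3), 2, 2)
C <- qr_compbasis(M)
C                           # a single unit-length column
max(abs(crossprod(M, C)))   # numerically zero: C is orthogonal to the columns of M
```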

Confidence ellipsoid for multiple parameters in a linear model

Description

Computes a confidence ellipsoid for a vector of estimable functions in a linear model.

Usage

confelps(y, X, A, alpha, tol=sqrt(.Machine$double.eps))

Arguments

y

Response vector in linear model.

X

Design/model matrix or matrix containing values of explanatory variables (generally including intercept).

A

Coefficient matrix (A.beta is the vector for which confidence ellipsoid is needed).

alpha

The non-coverage probability of confidence ellipsoid.

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)).

Details

Normal distribution of response (given explanatory variables and/or factors) is assumed.

Value

Returns a list of three objects:

CenterOfEllipse

Center of ellipsoid.

MatrixOfEllipse

Matrix of ellipsoid, for describing quadratic form in terms of the vector of deviations from center of ellipsoid.

threshold

Upper limit of quadratic form that completes specification of ellipsoid.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(denim)
attach(denim)
X <- cbind(1,binaries(Denim),binaries(Laundry))
A <- rbind(c(0,1,0,-1,0,0,0),c(0,0,1,-1,0,0,0))
confelps(Abrasion, X, A, 0.05,tol=1e-12)
detach(denim)

Abrasion of denim jeans

Description

Effects of laundering cycles and denim treatment on edge abrasion of denim jeans (Card et al., 2006). The data are simulated to match the reported means and standard deviations.

Usage

data(denim)

Format

A data frame with 90 observations on the following 3 variables.

Laundry

Three levels of laundry cycles (1 = 0 cycle, 2 = 5 cycles, 3 = 25 cycles)

Denim

Three types of denim treatments (1 = pre-washed, 2 = stone-washed, 3 = enzyme washed)

Abrasion

Abrasion score (lower score means higher damage)

Source

Card, A., Moore, M.A. and Ankeny, M. (2006) Garment washed jeans: Impact of launderings on physical properties. Int. J. Clothing Sc. Tech., 18, pp.43-52.

Examples

data(denim)
head(denim)

Price of drugs under generic and brand names

Description

Across-countries median of median price ratio (MPR) of some medicines available in the private market under the generic name and the brand name of the originator (Gelders et al., 2005).

Usage

data(drugprice)

Format

A data frame with 13 observations on the following 4 variables.

Drug

Generic name of drug, a character vector

Quantity

Unit for price computation, a character vector

OriginatorMPR

Originator median price ratio, a numeric vector

GenericMPR

Generic median price ratio, a numeric vector

Details

The data comes from a World Health Organization (WHO) commissioned study on variation of drug prices over a number of developing countries. For comparability, the price in a particular region is expressed as a ratio (called median price ratio or MPR) with respect to the organization's drug price indicator median values. The data reflect the across-country median of these ratios in respect of 13 medicines, most of which are in the WHO list of essential medicines.

Source

Gelders, S., Ewen, M., Noguchi, N. and Laing, R. (2005). Price, Availability and Affordability: An International Comparison of Chronic Disease Medicines, Background report prepared for the WHO Planning Meeting on the Global Initiative for Treatment of Chronic Diseases, Cairo, December 2005.

Examples

data(drugprice)
head(drugprice)

Frobenius norm of a matrix

Description

Computes the Frobenius norm of a given matrix.

Usage

frob(M)

Arguments

M

Matrix whose Frobenius norm is to be computed.

Value

A scalar value, describing the Frobenius norm (positive square root of sum of squared elements) of M.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

frob(matrix(2,3,2))
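
The definition given under Value can be checked directly in base R. The sketch below does not use frob() itself; it evaluates the definition and compares it with base R's norm().

```r
M <- matrix(2, 3, 2)      # six entries, each equal to 2
sqrt(sum(M^2))            # definition: sqrt(6 * 4) = sqrt(24)
norm(M, type = "F")       # base R's built-in Frobenius norm, same value
```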

ANOVA table for linear hypothesis in a linear model

Description

Prepares the Analysis of Variance table for testing a general linear hypothesis in a linear model.

Usage

ganova(y, X, A, xi, tol=sqrt(.Machine$double.eps))

Arguments

y

Response vector in linear model.

X

Design matrix or matrix containing values of explanatory variables (generally including intercept).

A

Coefficient matrix (A.beta = xi is the null hypothesis to be tested).

xi

A vector (A.beta = xi is the null hypothesis to be tested).

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case the model matrix is rank deficient (default = sqrt(.Machine$double.eps)).

Value

Returns analysis of variance table for testing A.beta = xi in the linear model with response vector y and matrix of explanatory variables/factors X.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(denim)
attach(denim)
X <- cbind(1,binaries(Denim), binaries(Laundry))
A <- rbind(c(0,1,-1,0,0,0,0), c(0,1,0,-1,0,0,0))
xi <- c(0, 0)
ganova(Abrasion, X, A, xi)
detach(denim)

Growth data for girls

Description

Heights of some adolescent girls, aged 7 to 12, in the southern part of Kolkata, India around the year 2008.

Usage

data(girlgrowth)

Format

A data frame with 905 observations on the following 2 variables.

Age

Age of girls (in years)

Height

Height of girls (in cm)

Source

Dasgupta (2015), Physical Growth, Body Composition and Nutritional Status of Bengali School aged Children, Adolescents and Young adults of Calcutta, India: Effects of Socioeconomic Factors on Secular Trends, Report 158, Ney-van Hoogstraten Foundation, The Netherlands.

Examples

data(girlgrowth)
head(girlgrowth)

ANOVA table for adequacy of a subset in a linear model

Description

Prepares the Analysis of Variance table for testing adequacy of a subset model within a linear model.

Usage

hanova(lm1, lm2)

Arguments

lm1

An lm object describing full model.

lm2

An lm object describing subset model.

Details

Normal distribution of response (given explanatory variables and/or factors) is assumed. The program simply reformats the output of the anova function.

Value

Returns analysis of variance table for testing adequacy of lm2 within lm1.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(birthwt)
lmbw <- lm(bwt ~ smoke+factor(race), data = birthwt)
lm1 <- lm(bwt ~ smoke, data = birthwt)
hanova(lmbw, lm1)  # full model first, subset model second
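
Since the function is described as a reformatting of the output of anova, the underlying comparison can be seen by calling anova directly on the two nested fits. The object names full_fit and sub_fit are illustrative only.

```r
library(MASS)                                   # provides the birthwt data
full_fit <- lm(bwt ~ smoke + factor(race), data = birthwt)
sub_fit  <- lm(bwt ~ smoke, data = birthwt)

tab <- anova(sub_fit, full_fit)   # F-test of the subset model within the full model
tab
tab$Df[2]                         # degrees of freedom dropped by the subset model
```

A small p-value in the second row suggests that the subset model is not adequate, that is, race carries information beyond smoke.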

HIV data

Description

Light absorbance for positive control samples in an ELISA test for HIV (Hoaglin et al., 1991).

Usage

data(hiv)

Format

A data frame with 75 observations on the following 3 variables.

Absorbance

Measurement of absorbance of light (dimensionless)

Lot

Five levels of lot

Run

Five levels of run

Source

Hoaglin, D.C., Mosteller, F. and Tukey, J.W. (1991) Fundamentals of Exploratory Analysis of Variance, Wiley, New York, p.107.

Examples

data(hiv)
head(hiv)

Hoop tree data

Description

Compressive strength and moisture content of wood in hoop trees (Williams, 1959).

Usage

data(hoop)

Format

A data frame with 50 observations on the following 4 variables.

Temp

Temperature (in Celsius)

Tree

Hoop tree number

Strength

Maximum compressive strength parallel to the grain (in MPa)

Moisture

Moisture content (100 times water mass/dry wood mass)

Source

Williams, E.J. (1959) Regression Analysis, Wiley, New York.

Examples

data(hoop)
head(hoop)

Testable and untestable hypotheses in linear model

Description

Reduces a general hypothesis in a linear model into a pair of completely testable and completely untestable hypotheses.

Usage

hypsplit(X, A, xi, tol=sqrt(.Machine$double.eps))

Arguments

X

Design/model matrix or matrix containing values of explanatory variables (generally including intercept).

A

Coefficient matrix (A.beta = xi is the null hypothesis to be split).

xi

A vector (A.beta = xi is the null hypothesis to be split).

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)).

Value

A list of two objects:

testable

Coefficient matrix and constant vector for testable part of hypotheses.

untestable

Coefficient matrix and constant vector for untestable part of hypotheses.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(denim)
attach(denim)
X <- cbind(1, binaries(Denim), binaries(Laundry))
A <- rbind(c(0,1,0,0,0,0,0), c(0,0,1,0,0,0,0), c(0,0,0,1,0,0,0))
xi <- c(0,0,0)
hypotheses <- hypsplit(X, A, xi, tol=1e-13)
hypotheses[[1]]  # testable
hypotheses[[2]]  # untestable
detach(denim)

Test of a linear hypothesis in a linear model

Description

Carries out test of a single linear hypothesis in a linear model.

Usage

hyptest(lmobj, p, xi = 0, type = "both")

Arguments

lmobj

An object produced by lm fitting.

p

A numeric vector containing coefficients of the linear combination of model parameters.

xi

A numeric variable containing hypothesized value of the linear combination of model parameters (default = 0).

type

A character variable indicating the type of alternative: "upper" (one-sided), "lower" (one-sided) or "both" (default, two-sided).

Details

It is assumed that all the model parameters are estimable and the linear model is homoscedastic and normal.

Value

Returns the estimated value of the linear combination of model parameters, its standard error, the t-statistic, the degrees of freedom and the p-value.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(lifelength)
lmlife <- lm(Lifelength~factor(Category), data = lifelength)
p <- c(0,0,0,1,-1,0,0,0)
hyptest(lmlife, p, xi = 1, type = "upper")
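
The quantities listed under Value can be reproduced by hand from any full-rank lm fit. The sketch below uses the built-in PlantGrowth data purely for illustration, and it assumes that "upper" refers to the alternative that the linear combination exceeds xi; it is not the package's own code.

```r
fit <- lm(weight ~ group, data = PlantGrowth)   # built-in data, illustration only
p   <- c(0, -1, 1)     # coefficient of trt2 minus coefficient of trt1
xi  <- 0

est   <- sum(p * coef(fit))                      # estimate of p'beta
se    <- sqrt(drop(t(p) %*% vcov(fit) %*% p))    # standard error
tstat <- (est - xi) / se
df    <- df.residual(fit)
pval  <- c(upper = pt(tstat, df, lower.tail = FALSE),
           lower = pt(tstat, df),
           both  = 2 * pt(abs(tstat), df, lower.tail = FALSE))
list(estimate = est, se = se, t = tstat, df = df, p.value = pval)
```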

IMF unemployment data

Description

The estimated or reported figures of a number of economic variables for a few countries in the year 2015, extracted from IMF World Economic Outlook (2017).

Usage

data(imf2015)

Format

A data frame with 33 observations on the following 8 variables.

Country

Country name, a character vector

CAB

Current account balance as % of GDP, a numeric vector

DEBT

Government gross debt as % of GDP, a numeric vector

EXP

Government total expenditure as % of GDP, a numeric vector

GDP

GDP per capita, current prices in '000 US$, a numeric vector

INFL

Inflation, average consumer prices in %, a numeric vector

INV

Total investment as % of GDP, a numeric vector

UNMP

Unemployment as % of labor force, a numeric vector

Source

http://www.imf.org/external/pubs/ft/weo/2017/01/weodata/weoselgr.aspx.

Examples

data(imf2015)
head(imf2015)

Basis of intersection of two column spaces

Description

Computes an orthonormal basis of the intersection of column spaces of two given matrices.

Usage

intsectbasis(A, B, tol1=sqrt(.Machine$double.eps), tol2=sqrt(.Machine$double.eps))

Arguments

A

First matrix.

B

Second matrix with identical number of rows.

tol1

A relative tolerance to detect zero singular values while computing generalized inverse, in case the matrix concerned is rank deficient (default = sqrt(.Machine$double.eps)).

tol2

A tolerance to detect if there is any non-zero singular value of a 'parallel sum' matrix, without which the intersection space is null (default = sqrt(.Machine$double.eps)).

Value

Returns a semi-orthogonal matrix with columns forming an orthonormal basis of the intersection of the column spaces of A and B.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

A<-matrix(2,3,5)
B<-matrix(3,3,2)
intsectbasis(A,B, tol1=sqrt(.Machine$double.eps), tol2=1e-14)

Whether one column space is contained in another

Description

Checks whether column space of one matrix is a subset of the column space of another matrix.

Usage

is.included(B, A, tol1=sqrt(.Machine$double.eps), tol2=sqrt(.Machine$double.eps))

Arguments

B

The matrix whose column space is to be checked for being a subset.

A

The matrix whose column space is to be checked for being a superset.

tol1

A relative tolerance to detect zero singular values while computing generalized inverse, in case A is rank deficient (default = sqrt(.Machine$double.eps)).

tol2

A relative tolerance to detect whether there is sufficient closeness between B and A.ginv(A).B (default = sqrt(.Machine$double.eps)).

Value

A logical value (TRUE if the column space of B is contained in the column space of A).

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

A <- cbind(c(2,1,-2),c(3,1,-1))
I <- diag(1,3)
is.included(A, I, tol1=sqrt(.Machine$double.eps), tol2=1e-15)
is.included(I, A, tol1=1e-14, tol2=sqrt(.Machine$double.eps))
is.included(projector(A), A, tol1=1e-15, tol2=1e-14)
is.included(A, projector(A))
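
The description of tol2 suggests that the check compares B with A.ginv(A).B. A minimal sketch of such a check, using ginv from MASS, is given below. The helper col_included and its single scaled tolerance are assumptions made for illustration and need not match the package's rule.

```r
library(MASS)   # for ginv()

# Hypothetical helper: TRUE if B is (numerically) unchanged by projection
# onto the column space of A.
col_included <- function(B, A, tol = sqrt(.Machine$double.eps)) {
  resid <- B - A %*% ginv(A) %*% B       # part of B outside the column space of A
  max(abs(resid)) <= tol * max(1, max(abs(B)))
}

A <- cbind(c(2,1,-2), c(3,1,-1))
I <- diag(1, 3)
col_included(A, I)   # TRUE: every column of A lies in the whole space
col_included(I, A)   # FALSE: A has only two columns, so it cannot span R^3
```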

Intercept augmented variance inflation factors

Description

Computes the intercept augmented variance inflation factors for a linear model.

Usage

ivif(lmobj)

Arguments

lmobj

An object produced by lm fitting.

Value

Returns the intercept augmented variance inflation factors for the model, with each VIF labelled either as (Intercept) or by the variable name.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(imf2015)
lmimf <- lm(UNMP~CAB+DEBT+EXP+GDP+INFL+INV, data = imf2015)
ivif(lmimf)

Kink bands in rocks

Description

Measurements of an angular dimension (beta angle) found in kink bands of Daling phyllite in the Darjeeling-Sikkim Himalayas.

Usage

data(kinks)

Format

A data frame with 100 observations on the following 3 variables.

beta

Beta angle in kink bands (in degrees)

order

Fold order (1 = main fold, 2 = sub-fold, 3,4 = sub-folds of successively higher order)

type

Type of kink band (1 = conjugate, 2 = dextral, 3 = sinistral)

Source

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach, World Scientific Publishing Co., Table 6.8.

Examples

data(kinks)
head(kinks) 

Treatment of leprosy

Description

Pre- and post-treatment scores on abundance of leprosy for patients receiving different treatments (Snedecor and Cochran, 1967).

Usage

data(leprosy)

Format

A data frame with 30 observations on the following 3 variables.

treatment

Treatment type: A, D or F (placebo), a character vector

pre

Pre-treatment score, a numeric vector

post

Post-treatment score, a numeric vector

Source

Snedecor, G.W. and Cochran, W.G. (1967) Statistical Methods, Iowa State University, Ames, p.421.

Examples

data(leprosy)
head(leprosy) 

Age at death

Description

William Guy's nineteenth century data on the age at death of persons belonging to different professions.

Usage

data(lifelength)

Format

A data frame with 690 observations on the following 2 variables.

Category

Code for profession: 1 = historian, 2 = poet, 3 = painter, 4 = musician, 5 = mathematician or astronomer, 6 = chemist or natural philosopher, 7 = naturalist, 8 = engineer, architect or surveyor

Lifelength

Age (in years) of deceased

Source

Guy, W. (1859) On the duration of life as affected by the pursuits of literature, science and art. J. Statist. Soc. London, 22.

Examples

data(lifelength)
head(lifelength)

Multiple comparison tests

Description

Produces p-values of Bonferroni and Scheffe multiple comparison tests of several testable linear hypotheses.

Usage

multcomp(y, X, A, xi, tol=sqrt(.Machine$double.eps))

Arguments

y

Response vector in linear model.

X

Design/model matrix or matrix containing values of explanatory variables (generally including intercept).

A

Coefficient matrix (A.beta=xi is the set of multiple hypotheses that has to be tested).

xi

A vector of values (A.beta=xi is the set of multiple hypotheses that has to be tested).

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)).

Details

Normal distribution of response (given explanatory variables and/or factors) is assumed.

Value

Returns the F statistics and the p-values of Bonferroni and Scheffe multiple comparison tests of the set of linear hypotheses. The output consists of five components:

A

Specified coefficient matrix.

xi

Specified values of A.beta.

Fstat

Set of F-ratios for each hypothesis.

Bonferroni.p

Set of Bonferroni p-values for different hypotheses.

Scheffe.p

Set of Scheffe p-values for different hypotheses.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(denim)
attach(denim)
X <- cbind(1,binaries(Denim),binaries(Laundry))
A <- rbind(c(0,1,-1,0,0,0,0),c(0,1,0,-1,0,0,0),c(0,0,1,-1,0,0,0))
xi <- c(0,0,0)
multcomp(Abrasion, X, A, xi, tol=1e-14)
detach(denim)

Olympic sprint finals data

Description

Times recorded by winners of men's Olympic sprint finals in different categories from 1900 to 1988 (Lunn and McNeil, 1991).

Usage

data(olympic)

Format

A data frame with 20 observations on the following 6 variables.

Year

Olympic year

X100m

Winner's time (in seconds) for 100 meters sprint

X200m

Winner's time (in seconds) for 200 meters sprint

X400m

Winner's time (in seconds) for 400 meters sprint

X800m

Winner's time (in seconds) for 800 meters sprint

X1500m

Winner's time (in seconds) for 1500 meters sprint

Details

Three years are missing from the data: 1916, 1940 and 1944, when world wars prevented the Olympic Games from being held.

Source

Lunn, A.D. and McNeil, D.R. (1991) Computer-Interactive Data Analysis, Wiley, Chichester.

Examples

data(olympic)
head(olympic) 

Survival times of poisoned animals

Description

Survival times of animals exposed to poison and treatment (Box and Cox, 1964).

Usage

data(poison)

Format

A data frame with 48 observations on the following 3 variables.

Survtime

Survival time (in 10 hour units)

Treatment

Treatment type: 1 = treatment A, 2 = treatment B, 3 = treatment C, 4 = treatment D

Poison

Poison type: 1 = Poison I, 2 = Poison II, 3 = Poison III

Source

Box, G.E.P. and Cox, D.R. (1964) An analysis of transformations. J. Roy. Statist. Soc. Ser. B, 26, pp.211-252.

Examples

data(poison)
head(poison) 

Orthogonal projector of a matrix

Description

Computes the orthogonal projection matrix for the column space of a given matrix.

Usage

projector(M, tol=sqrt(.Machine$double.eps))

Arguments

M

A matrix for which the orthogonal projection matrix is to be computed.

tol

A relative tolerance to detect zero singular values while computing generalized inverse, in case M is rank deficient (default = sqrt(.Machine$double.eps)).

Value

Returns the orthogonal projection matrix for the column space of M.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

projector(matrix(3,3,3))
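
A standard way to obtain such a projector is to multiply the matrix by its Moore-Penrose inverse. The sketch below does this with ginv from MASS and checks the defining properties; it is meant as an illustration of the concept and may differ in detail from the package's projector().

```r
library(MASS)              # for ginv()
M <- matrix(3, 3, 3)       # rank one: every column is a multiple of (1, 1, 1)
P <- M %*% ginv(M)         # orthogonal projector onto the column space of M
P                          # each entry is close to 1/3
all.equal(P %*% P, P)      # idempotent
all.equal(P, t(P))         # symmetric
all.equal(P %*% M, M)      # leaves the columns of M unchanged
```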

Egyptian skull development

Description

Measurements of male Egyptian skulls from time periods ranging from 4000 BC to 150 AD.

Usage

data(skulls)

Format

A data frame with 150 observations on the following 5 variables.

MB

Maximal breadth (in mm)

BH

Basibregmatic height (in mm)

BL

Basialveolar length (in mm)

NH

Nasal height (in mm)

Year

Approximate year of skull formation (negative = B.C., positive = A.D.)

Source

Thomson, A. and Randall-Maciver, R. (1905) Ancient Races of the Thebaid, Oxford University Press, Oxford.

Examples

data(skulls)
head(skulls)

Energy data

Description

Energy absorbed by four machines for Charpy V-notch testing.

Usage

data(splett2)

Format

A data frame with 99 observations on the following 2 variables.

Energy

Energy absorbed by machine (in foot-pounds)

Machine

Machine type (1 = Tinius1, 2 = Tinius2, 3 = Satec, 4 = Tokyo)

Source

Dataplot webpage of the National Institute of Standards and Technology (NIST),
USA (https://www.itl.nist.gov/div898/software/dataplot/data/SPLETT2.DAT).

Examples

data(splett2)
head(splett2)

Stars data 1

Description

Distance of galactic objects from Earth and their velocities (Hubble, 1929).

Usage

data(stars1)

Format

A data frame with 24 observations on the following 2 variables.

Distance

Distance from Earth (in millions of parsecs; 1 parsec = 3.26 light years)

Velocity

Velocity of galaxy (in km/s)

Source

Hubble, E. (1929) A relation between distance and radial velocity among extra-galactic nebulae. Proc. Nat. Acad. Sci. 15, pp.168-173.

Examples

data(stars1)
head(stars1)

Stars data 2

Description

Distance of additional galactic objects from Earth and their velocities (Humason, 1936).

Usage

data(stars2)

Format

A data frame with 21 observations on the following 2 variables.

Distance

Distance from Earth (in millions of parsecs; 1 parsec = 3.26 light years)

Velocity

Velocity of galaxy (in km/s)

Details

The galactic objects in this data set are much farther from Earth than those in the data set stars1. These measurements became available within a few years of the publication of Hubble's original work, through rapid advances in technology. Although the new data cemented Hubble's hypothesis that distant objects have proportionately higher velocity (as they should in a uniformly expanding universe), the constant of proportionality turned out to be somewhat different from Hubble's original estimate.

Source

Humason, M.L. (1936) The apparent radial velocities of 100 extra-galactic nebulae. Astrophys. J. 83, pp.10-22.

Examples

data(stars2)
head(stars2)
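
The proportionality between velocity and distance mentioned under Details corresponds to a straight-line fit through the origin, whose slope is the constant of proportionality. The sketch below shows such a fit in base R. The data frame toy holds made-up illustrative numbers, not values from stars1 or stars2; only the column names are borrowed from those data sets.

```r
# Made-up illustrative values (NOT taken from stars1 or stars2).
toy <- data.frame(Distance = c(1, 2, 3, 4),
                  Velocity = c(520, 980, 1510, 1990))

# Regression through the origin: Velocity = slope * Distance + error.
fit <- lm(Velocity ~ 0 + Distance, data = toy)
slope_lm <- unname(coef(fit))

# The same least squares slope from its closed form, sum(x * y) / sum(x^2).
slope_closed <- with(toy, sum(Distance * Velocity) / sum(Distance^2))

all.equal(slope_lm, slope_closed)
```

Replacing toy by stars1 or stars2 (after data(stars1) or data(stars2)) would give the corresponding estimates for the two data sets, which can then be compared.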

Supplementary basis vectors for column space of a matrix

Description

Computes a set of vectors that, together with a basis of the column space spanned by some columns of a matrix, forms a basis of the column space of the entire matrix.

Usage

supplbasis(A, B, tol=sqrt(.Machine$double.eps))

Arguments

A

Sub-matrix containing some columns of a matrix.

B

Sub-matrix containing the remaining columns of the same matrix.

tol

A relative tolerance for detecting rank deficiency during QR decomposition (default = sqrt(.Machine$double.eps)).

Value

Returns a semi-orthogonal matrix whose columns, together with a basis of the column space of A, constitute a basis of the column space of the entire matrix (A:B).

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

A <- cbind(c(2,1,-2),c(3,1,-1))
B <- diag(c(1,1,0))
supplbasis(A,B)
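
One way to obtain vectors with the property described under Value is to remove from B its component in the column space of A and then take an orthonormal basis of what remains. The base R sketch below does this for the matrices of the example above. The helper name suppl_sketch is hypothetical, and the construction is an assumption for illustration, not necessarily the algorithm used by supplbasis.

```r
# Hypothetical helper (not the lmreg implementation): orthonormal vectors that
# extend a basis of col(A) to a basis of col(cbind(A, B)).
suppl_sketch <- function(A, B, tol = sqrt(.Machine$double.eps)) {
  qa <- qr(A, tol = tol)
  # Orthonormal basis of the column space of A.
  Q <- qr.Q(qa)[, seq_len(qa$rank), drop = FALSE]
  # Part of B that is orthogonal to the column space of A.
  R <- B - Q %*% crossprod(Q, B)
  s <- svd(R)
  # Keep directions that are non-negligible on the scale of the input.
  keep <- s$d > tol * max(abs(cbind(A, B)))
  s$u[, keep, drop = FALSE]
}

A <- cbind(c(2, 1, -2), c(3, 1, -1))
B <- diag(c(1, 1, 0))
S <- suppl_sketch(A, B)

crossprod(S)                  # identity: the columns of S are orthonormal
crossprod(A, S)               # numerically zero: S is orthogonal to col(A)
qr(cbind(A, S))$rank == qr(cbind(A, B))$rank
```

In this sketch the returned columns are orthogonal to the column space of A, which is stronger than what the Value section requires.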

Trace of matrix

Description

Computes the trace of a given matrix.

Usage

tr(M)

Arguments

M

A matrix whose trace is to be computed.

Value

A scalar: the trace of M, i.e. the sum of its diagonal elements.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

tr(matrix(2,2,2))
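
For a square matrix, the trace is the sum of the diagonal elements, which base R can compute directly. The sketch below uses a hypothetical helper tr_sketch (not the package function) and also illustrates a standard fact: the trace of an orthogonal projection matrix equals its rank.

```r
# Hypothetical stand-in for the trace: sum of the diagonal elements.
tr_sketch <- function(M) sum(diag(M))

M <- matrix(2, 2, 2)          # the matrix used in the example above
tr_sketch(M)

# Orthogonal projector onto the span of (1, 1, 1): trace equals rank.
P <- matrix(1/3, 3, 3)
tr_sketch(P)
qr(P)$rank
```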

Brown trout hemoglobin data

Description

The measured hemoglobin content in the blood of brown trout that were randomly allocated to four troughs, where different concentrations of sulfamerazine in food were administered 35 days prior to measurement (Gutsell, 1951).

Usage

data(trout)

Format

A data frame with 40 observations on the following 2 variables.

Sulfamerazine

Concentrations of sulfamerazine (in grams per 100 pounds of fish)

Hemoglobin

Hemoglobin content (in grams per 100 ml of blood)

Source

Gutsell, James S. (1951) The effect of sulfamerazine on the erythrocyte and hemoglobin content of trout blood, Biometrics 7(2), pp.171-179.

Examples

data(trout)
head(trout)

Waist circumference and adipose tissue data

Description

Waist circumference and adipose tissue data (Daniel and Cross, 2013).

Usage

data(waist)

Format

A data frame with 109 observations on the following 2 variables.

Waist

Waist circumference (in centimeters)

AT

Area of lower abdominal adipose tissue (in square centimeters)

Source

Daniel, W.W. and Cross, C.L. (2013) Biostatistics: A Foundation for Analysis in the Health Sciences, tenth edition, Wiley, New York, Table 9.3.1.

Examples

data(waist)
head(waist)

World population data

Description

The midyear population of the world for the years 1981-2000.

Usage

data(worldpop)

Format

A data frame with 20 observations on the following 2 variables.

Year

Calendar year

Pop.billion

Population (in billions)

Source

U.S. Census Bureau, International Data Base (http://www.census.gov/ipc/www/idbnew.html)

Examples

data(worldpop)
head(worldpop)

World record running times data

Description

Men's and women's world record times for various outdoor running distances, recognized by the International Association of Athletics Federations (IAAF) as of 17 November 2017.

Usage

data(worldrecord)

Format

A data frame with 10 observations on the following 3 variables.

Distance

Running distance (in meters)

MenRecord

Men's record time (in seconds)

WomenRecord

Women's record time (in seconds)

Source

International Association of Athletics Federations (https://www.iaaf.org/records/by-category/world-records).

Examples

data(worldrecord)
head(worldrecord)

Prepare design matrix for two-way layout with single observation per cell

Description

Prepares the design matrix for two-way classified data with a single observation per cell, and the response vector in the corresponding order.

Usage

yX(response, treatments, blocks)

Arguments

response

Response vector as provided (numeric).

treatments

Vector of treatment levels as provided (either numeric or character).

blocks

Vector of block levels as provided (either numeric or character).

Value

Returns a list with the following components.

X

A binary matrix with number of rows equal to length of response and number of columns equal to the total number of levels of treatments and blocks plus one. Each row has exactly three 1s: in the first position and in the two positions representing the treatment and block levels.

y

Numeric vector of response values, permuted to correspond with the rows of X.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(airspeed)
yX(airspeed$Posmaxspeed,airspeed$Reynolds,airspeed$Ribht)
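
The structure of X described under Value can be reproduced by hand with indicator columns. The sketch below builds such a matrix in base R for a made-up layout with three treatments and two blocks; the data and all object names are hypothetical. Unlike yX, it keeps the rows in the order given rather than permuting them.

```r
# Made-up two-way layout with one observation per cell (illustration only).
treatments <- c("a", "a", "b", "b", "c", "c")
blocks     <- c(1, 2, 1, 2, 1, 2)
response   <- c(5.1, 4.8, 6.0, 5.7, 6.4, 6.2)

tf <- factor(treatments)
bf <- factor(blocks)

# One 0/1 indicator column per treatment level and per block level.
Tmat <- outer(as.integer(tf), seq_len(nlevels(tf)), "==") * 1
Bmat <- outer(as.integer(bf), seq_len(nlevels(bf)), "==") * 1

# Leading column of 1s, then treatment indicators, then block indicators.
X <- cbind(1, Tmat, Bmat)

dim(X)          # rows = length(response); columns = 1 + 3 + 2
rowSums(X)      # exactly three 1s in every row
qr(X)$rank      # smaller than ncol(X): the matrix is rank deficient
```

The rank deficiency arises because the treatment indicators sum to the first column, and so do the block indicators.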

Prepare design matrix for balanced two-way layout

Description

Prepares the design matrix for balanced two-way classified data, and the response vector in the corresponding order.

Usage

yXm(response, treatments, blocks)

Arguments

response

Response vector as provided (numeric).

treatments

Vector of treatment levels as provided (either numeric or character).

blocks

Vector of block levels as provided (either numeric or character).

Value

Returns a list with the following components.

X

A binary matrix with number of rows equal to length of response and number of columns equal to the total number of levels of treatments and blocks plus one. Each row has exactly three 1s: in the first position and in the two positions representing the treatment and block levels.

y

Numeric vector of response values, permuted to correspond with the rows of X.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(poison)
yXm(poison$Survtime,poison$Treatment,poison$Poison)
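
Since yXm is meant for balanced data, it can be useful to confirm beforehand that every treatment and block combination has the same number of observations. The sketch below shows one way to do so in base R on a made-up layout; the object names are hypothetical, and whether yXm performs a similar check internally is not stated in this documentation.

```r
# Made-up balanced layout: 3 treatments x 2 blocks, 2 replicates per cell.
cells <- expand.grid(rep = 1:2,
                     block = c("b1", "b2"),
                     treatment = c("t1", "t2", "t3"))

# Cross-tabulate treatments against blocks to get the cell counts.
counts <- table(cells$treatment, cells$block)

# Balanced means: no empty cell and the same count in every cell.
is_balanced <- all(counts > 0) && length(unique(as.vector(counts))) == 1

# Dropping one observation makes the layout unbalanced.
counts_drop <- table(cells$treatment[-1], cells$block[-1])
is_balanced_drop <- all(counts_drop > 0) &&
  length(unique(as.vector(counts_drop))) == 1
```

For the example above, the same check would be table(poison$Treatment, poison$Poison).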

Prepare design matrix for nested model with groups and subgroups

Description

Prepares design matrix for nested model with groups and subgroups and response vector in corresponding order.

Usage

yXn(response, group, subgroup)

Arguments

response

Response vector as provided (numeric).

group

Vector of group labels as provided (either numeric or character).

subgroup

Vector of subgroup labels as provided (either numeric or character).

Value

Returns a list with the following components.

X

A binary matrix with number of rows equal to length of response and number of columns equal to the total number of groups and subgroups plus one. Each row has exactly three 1s: in the first position and in the two positions representing the group and the subgroup.

y

Numeric vector of response values, permuted to correspond with the rows of X.

Author(s)

Debasis Sengupta <shairiksengupta@gmail.com>, Jinwen Qiu <qjwsnow_ctw@hotmail.com>

References

Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.

Examples

data(kinks)
yXn(kinks$beta,kinks$type,kinks$order)
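
A matrix with the structure described under Value can be built by hand from indicator columns. The base R sketch below does so for a made-up nested layout, under the assumption that a subgroup label is meaningful only within its group, so that subgroup 1 of g1 and subgroup 1 of g2 get separate columns. The data, the object names and this assumption are illustrative and may not match the column layout produced by yXn.

```r
# Made-up nested layout: 2 groups, 2 subgroups in each, 2 observations each.
group    <- c("g1", "g1", "g1", "g1", "g2", "g2", "g2", "g2")
subgroup <- c(1, 1, 2, 2, 1, 1, 2, 2)

gf <- factor(group)
# Assumption: subgroups are identified by the (group, subgroup) pair.
sf <- interaction(gf, subgroup, drop = TRUE)

# One 0/1 indicator column per group and per subgroup-within-group.
Gmat <- outer(as.integer(gf), seq_len(nlevels(gf)), "==") * 1
Smat <- outer(as.integer(sf), seq_len(nlevels(sf)), "==") * 1

# Leading column of 1s, then group indicators, then subgroup indicators.
X <- cbind(1, Gmat, Smat)

dim(X)          # 8 rows; 1 + 2 + 4 columns
rowSums(X)      # exactly three 1s in every row
qr(X)$rank      # equals the number of subgroups
```

The rank equals the number of subgroups because the first column and each group column are sums of subgroup columns.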