| Version: | 0.1-9 | 
| Date: | 2016-09-22 | 
| Title: | Sparse Discriminant Analysis | 
| Author: | Line Clemmensen <lhc@imm.dtu.dk>, contributions by Max Kuhn | 
| Maintainer: | Max Kuhn <mxkuhn@gmail.com> | 
| Imports: | elasticnet, MASS, mda | 
| Depends: | R (≥ 2.10) | 
| Description: | Performs sparse linear discriminant analysis for Gaussians and mixture of Gaussian models. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| URL: | http://www.imm.dtu.dk/~lhc, https://github.com/topepo/sparselda | 
| NeedsCompilation: | no | 
| Packaged: | 2016-09-22 14:49:09 UTC; kuhna03 | 
| Repository: | CRAN | 
| Date/Publication: | 2016-09-22 17:10:01 | 
Normalize training data
Description
Normalize a vector or matrix to zero mean and unit length columns
Usage
normalize(X)
Arguments
X | 
 a matrix with the training data with observations down the rows and variables in the columns.  | 
Details
The function can, for example, be used to normalize the training data for sda or smda.
Value
Returns a list with the following components:
Xc | 
 The normalized data.  | 
mx | 
 Mean of columns of X.  | 
vx | 
 Length of columns of X.  | 
Id | 
 Logical vector indicating which columns of X are retained in Xc. Columns with zero length are omitted.  | 
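The computation described above can be sketched in base R as follows. This is a minimal illustration, not the package's source code: the name normalize_sketch is hypothetical, and the sketch assumes that vx is the Euclidean length of each column after centering, which is what makes the normalized columns unit length.

```r
# Hypothetical base-R sketch of the documented behaviour of normalize().
normalize_sketch <- function(X) {
  X  <- as.matrix(X)
  mx <- colMeans(X)                 # column means
  Xc <- sweep(X, 2, mx)             # centre each column (default FUN is "-")
  vx <- sqrt(colSums(Xc^2))         # Euclidean length of each centred column
  Id <- vx != 0                     # flag columns with non-zero length
  # keep only non-degenerate columns and scale them to unit length
  Xc <- sweep(Xc[, Id, drop = FALSE], 2, vx[Id], "/")
  list(Xc = Xc, mx = mx, vx = vx, Id = Id)
}

set.seed(1)
X  <- cbind(matrix(rnorm(12), nrow = 4), 5)  # the last column is constant
Nm <- normalize_sketch(X)
round(colMeans(Nm$Xc), 12)                  # zero column means
sqrt(colSums(Nm$Xc^2))                      # unit column lengths
which(!Nm$Id)                               # the constant column is dropped
```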
Author(s)
Line Clemmensen
References
Clemmensen, L., Hastie, T. and Ersboell, K. (2008) "Sparse discriminant analysis", Technical report, IMM, Technical University of Denmark
Examples
## Data
X<-matrix(sample(seq(3),12,replace=TRUE),nrow=3)
## Normalize data
Nm<-normalize(X)
print(Nm$Xc)
## See if any variables have been removed
which(!Nm$Id)
Normalize test data
Description
Normalize test data using the output of normalize() applied to the training data.
Usage
normalizetest(Xtst,Xn)
Arguments
Xtst | 
 a matrix with the test data with observations down the rows and variables in the columns.  | 
Xn | 
 List returned by normalize(Xtr) for the training data Xtr.  | 
Details
The function can, for example, be used to normalize the test data for sda or smda.
Value
Returns the normalized test data
Xtst | 
 The normalized data.  | 
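A minimal base-R sketch of the same idea, under the same assumptions as the normalize() sketch: the test data are centered with the training means mx, divided by the training lengths vx, and restricted to the columns flagged in Id. Both function names are hypothetical stand-ins, not the package's code.

```r
# Hypothetical sketch of normalize() / normalizetest(); base R only.
normalize_sketch <- function(X) {
  mx <- colMeans(X)
  Xc <- sweep(X, 2, mx)
  vx <- sqrt(colSums(Xc^2))
  Id <- vx != 0
  list(Xc = sweep(Xc[, Id, drop = FALSE], 2, vx[Id], "/"),
       mx = mx, vx = vx, Id = Id)
}

normalizetest_sketch <- function(Xtst, Xn) {
  Xtst <- sweep(as.matrix(Xtst), 2, Xn$mx)                  # subtract training means
  sweep(Xtst[, Xn$Id, drop = FALSE], 2, Xn$vx[Xn$Id], "/")  # divide by training lengths
}

set.seed(2)
Xtr  <- matrix(rnorm(20), nrow = 5)
Xtst <- matrix(rnorm(8),  nrow = 2)
Nm   <- normalize_sketch(Xtr)
Xt   <- normalizetest_sketch(Xtst, Nm)
dim(Xt)
```

Applying normalizetest_sketch() to the training data itself reproduces Nm$Xc, which is a quick consistency check.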
Author(s)
Line Clemmensen
References
Clemmensen, L., Hastie, T. and Ersboell, K. (2007) "Sparse discriminant analysis", Technical report, IMM, Technical University of Denmark
Examples
## Data
Xtr<-matrix(sample(seq(3),12,replace=TRUE),nrow=3)
Xtst<-matrix(sample(seq(3),12,replace=TRUE),nrow=3)
## Normalize training data 
Nm<-normalize(Xtr)
## Normalize test data
Xtst<-normalizetest(Xtst,Nm)
Data set of three species of Penicillium fungi
Description
The data set penicilliumYES has 36 rows and 3754 columns. The variables are 
first-order statistics from multi-spectral images of three species of Penicillium fungi: 
Melanoconidium, Polonicum, and Venetum.
These are the data used in the Clemmensen et al. "Sparse Discriminant Analysis" paper.
Usage
data(penicilliumYES)
Format
This data set contains the following matrices:
- X
 A matrix with 36 rows and 3754 columns containing the training and test data. The first 12 rows are P. Melanoconidium species, rows 13-24 are P. Polonicum species, and the last 12 rows are P. Venetum species. The samples are ordered so that each consecutive group of three rows comes from the same isolate.
- Y
 A matrix of dummy variables for the training data.
- Z
 A matrix of probabilities for the subclasses of the training data.
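The row layout described above can be reconstructed with a few lines of base R. The objects species, isolate, ytr and Ytr are illustrative only and are not part of the data set; Iout is the hold-out index used in the sda() and smda() examples, and the sketch shows that it removes exactly one sample from each isolate. Whether penicilliumYES$Y has exactly this column order is an assumption.

```r
# Hypothetical reconstruction of the row layout of penicilliumYES$X
species <- factor(rep(c("P. Melanoconidium", "P. Polonicum", "P. Venetum"),
                      each = 12))
isolate <- rep(1:12, each = 3)        # consecutive triples share an isolate

# Hold-out rows used in the examples: the third sample of every isolate
Iout <- c(3, 6, 9, 12)
Iout <- c(Iout, Iout + 12, Iout + 24)

ytr <- species[-Iout]                 # labels of the 24 training rows
table(ytr)                            # 8 training samples per species

# Indicator (dummy) matrix for the training labels, one column per species
Ytr <- diag(nlevels(ytr))[as.integer(ytr), ]
colnames(Ytr) <- levels(ytr)
dim(Ytr)
```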
Details
The X matrix is not normalized.
References
Clemmensen, Hansen, Frisvad, Ersboell (2007) "A method for comparison of growth media in objective identification of Penicillium based on multi-spectral imaging" Journal of Microbiological Methods
Predict method for Sparse Discriminant Methods
Description
Prediction functions for sda and smda.
Usage
## S3 method for class 'sda'
predict(object, newdata = NULL, ...)
## S3 method for class 'smda'
predict(object, newdata = NULL, ...)
Arguments
object | 
 an object of class sda or smda  | 
newdata | 
 a matrix or data frame of predictors  | 
... | 
 arguments passed to   | 
Details
For mixture discriminant models (smda), the current implementation predicts the subclass probabilities.
Value
A list with components:
class | 
 The classification (a factor)  | 
posterior | 
 posterior probabilities for the classes (or subclasses for smda)  | 
x | 
 the scores  | 
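A sketch of how the class and posterior components relate, assuming (as in MASS's predict.lda) that the predicted class is the column with the largest posterior probability. The posterior matrix and the true labels below are made-up values for illustration.

```r
# Made-up posterior matrix: rows are observations, columns are classes
post <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6,
                 0.2, 0.5, 0.3),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))

# Assumed rule: predicted class = column with the largest posterior
cls <- factor(colnames(post)[max.col(post, ties.method = "first")],
              levels = colnames(post))

# Compare with (made-up) true labels
truth <- factor(c("A", "C", "C"), levels = colnames(post))
table(truth, predicted = cls)         # confusion table
mean(cls == truth)                    # classification accuracy
```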
Sparse discriminant analysis
Description
Performs sparse linear discriminant analysis, using an alternating minimization algorithm to minimize the SDA criterion.
Usage
sda(x, ...)
## Default S3 method:
sda(x, y, lambda = 1e-6, stop = -p, maxIte = 100,
    Q = K-1, trace = FALSE, tol = 1e-6, ...)
Arguments
x | 
 A matrix of the training data with observations down the rows and variables in the columns.  | 
y | 
 A matrix initializing the dummy variables representing the groups, or a factor of class labels (see Examples).  | 
lambda | 
 The weight on the L2-norm for elastic net regression. Default: 1e-6.  | 
stop | 
 If stop is negative, its absolute value corresponds to the desired number of variables. If stop is positive, it corresponds to an upper bound on the L1-norm of the b coefficients. There is a one-to-one correspondence between stop and t. The default is -p (minus the number of variables, i.e. all variables are allowed).  | 
maxIte | 
 Maximum number of iterations. Default: 100.  | 
Q | 
 Number of components. Maximum and default is K-1 (the number of classes less one).  | 
trace | 
 If TRUE, prints out its progress. Default: FALSE.  | 
tol | 
 Tolerance for the stopping criterion (change in RSS). Default is 1e-6.  | 
... | 
 additional arguments  | 
Details
The function finds sparse directions for linear classification.
Value
Returns a list with the following components:
beta | 
 The loadings of the sparse discriminative directions.  | 
theta | 
 The optimal scores.  | 
rss | 
 A vector of the Residual Sum of Squares at each iteration.  | 
varNames | 
 Names of the included variables.  | 
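To check how sparse a fit is, count the non-zero loadings in each column of beta. The sketch below uses a made-up loading matrix and assumes variables are in rows and discriminative directions in columns; with stop = -1, each column would be expected to hold a single non-zero entry. Multiplying normalized data by beta gives discriminant scores, which is presumably what the x component of predict() reports.

```r
# Made-up 6 x 2 loading matrix (6 variables, Q = 2 directions), shaped like
# the result one would expect from stop = -1
beta <- matrix(0, nrow = 6, ncol = 2,
               dimnames = list(paste0("v", 1:6), c("DV1", "DV2")))
beta["v2", "DV1"] <- 0.8
beta["v5", "DV2"] <- -1.1

colSums(beta != 0)                           # non-zero loadings per direction
used <- rownames(beta)[rowSums(beta != 0) > 0]
used                                         # variables actually used

# Project (already normalized) data onto the sparse directions
set.seed(4)
xn <- matrix(rnorm(4 * 6), nrow = 4, dimnames = list(NULL, rownames(beta)))
scores <- xn %*% beta
dim(scores)
```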
Author(s)
Line Clemmensen, modified by Trevor Hastie
References
Clemmensen, L., Hastie, T., Witten, D. and Ersboell, K. (2011) "Sparse discriminant analysis", Technometrics, To appear.
See Also
normalize, normalizetest, smda
Examples
## load data
data(penicilliumYES)
X <- penicilliumYES$X
Y <- penicilliumYES$Y
colnames(Y) <- c("P. Melanoconidium",
                 "P. Polonicum",
                 "P. Venetum")
## test samples
Iout<-c(3,6,9,12)
Iout<-c(Iout,Iout+12,Iout+24)
## training data
Xtr<-X[-Iout,]
k<-3
n<-dim(Xtr)[1]
## Normalize data
Xc<-normalize(Xtr)
Xn<-Xc$Xc
p<-dim(Xn)[2]
## Perform SDA with one non-zero loading for each discriminative
## direction with Y as matrix input
out <- sda(Xn, Y,
           lambda = 1e-6,
           stop = -1,
           maxIte = 25,
           trace = TRUE)
## predict training samples
train <- predict(out, Xn)
## testing
Xtst<-X[Iout,]
Xtst<-normalizetest(Xtst,Xc)
test <- predict(out, Xtst)
print(test$class)
## Factor Y as input
Yvec <- factor(rep(colnames(Y), each = 8))
out2 <- sda(Xn, Yvec,
            lambda = 1e-6,
            stop = -1,
            maxIte = 25,
            trace = TRUE)
Sparse mixture discriminant analysis
Description
Performs sparse linear discriminant analysis for mixture-of-Gaussians models.
Usage
smda(x, ...)
## Default S3 method:
smda(x, y, Z = NULL, Rj = NULL, 
     lambda = 1e-6, stop, maxIte = 50, Q=R-1,
     trace = FALSE, tol = 1e-4, ...)
Arguments
x | 
 A matrix of the training data with observations down the rows and variables in the columns.  | 
y | 
 A matrix initializing the dummy variables representing the groups.  | 
Z | 
 An optional matrix initializing the probabilities representing the subclasses.  | 
Rj | 
 K length vector containing the number of subclasses in each of the K classes.  | 
lambda | 
 The weight on the L2-norm for elastic net regression. Default: 1e-6.  | 
stop | 
 If stop is negative, its absolute value corresponds to the desired number of variables. If stop is positive, it corresponds to an upper bound on the L1-norm of the b coefficients. There is a one-to-one correspondence between stop and t.  | 
maxIte | 
 Maximum number of iterations. Default: 50.  | 
Q | 
 The number of components to include. Maximum and default is R-1 (total number of subclasses less one).  | 
trace | 
 If TRUE, prints out its progress. Default: FALSE.  | 
tol | 
 Tolerance for the stopping criterion (change in RSS). Default: 1e-4  | 
... | 
 additional arguments  | 
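A sketch of the shapes involved, assuming Z follows the usual mixture discriminant analysis layout: one row per observation and one column per subclass (R = sum(Rj) columns, grouped class by class), with each observation's probabilities confined to the subclasses of its own class and summing to one. The data and the uniform initialization below are illustrative choices, not necessarily what smda() does when Z = NULL.

```r
# Hypothetical example: 3 classes with Rj = c(2, 1, 3) subclasses,
# 6 observations, two per class
Rj <- c(2, 1, 3)
y  <- factor(rep(c("A", "B", "C"), each = 2))
R  <- sum(Rj)                     # total number of subclasses
Q_default <- R - 1                # default (and maximum) number of components

# Column indices of the subclasses that belong to each class
blocks <- split(seq_len(R), rep(seq_along(Rj), Rj))

# Uniform within-class initialization (illustrative only)
Z <- matrix(0, nrow = length(y), ncol = R)
for (i in seq_along(y)) {
  k <- as.integer(y[i])
  Z[i, blocks[[k]]] <- 1 / Rj[k]
}
rowSums(Z)                        # each row sums to one
```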
Details
The function finds sparse directions for linear classification of mixture-of-Gaussians models.
Value
Returns a list with the following components:
call | 
 The call  | 
beta | 
 The loadings of the sparse discriminative directions.  | 
theta | 
 The optimal scores.  | 
Z | 
 Updated subclass probabilities.  | 
Rj | 
 A vector of the number of subclasses per class.  | 
rss | 
 A vector of the Residual Sum of Squares at each iteration.  | 
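Because predict() currently returns subclass probabilities for smda fits, class probabilities can be recovered by summing the subclass columns within each class. The sketch below assumes the subclass columns are ordered class by class according to Rj; the posterior values are made up.

```r
# Made-up subclass posterior matrix: 2 observations, Rj = c(2, 1, 2)
Rj <- c(2, 1, 2)
post_sub <- matrix(c(0.5, 0.2, 0.1, 0.1, 0.1,
                     0.0, 0.1, 0.2, 0.3, 0.4),
                   nrow = 2, byrow = TRUE)

class_of_subclass <- rep(seq_along(Rj), Rj)   # 1 1 2 3 3

# Sum the subclass columns within each class
post_cls <- t(rowsum(t(post_sub), class_of_subclass))
post_cls

# Most probable class for each observation
max.col(post_cls, ties.method = "first")
```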
Author(s)
Line Clemmensen
References
Clemmensen, L., Hastie, T., Witten, D. and Ersboell, K. (2011) "Sparse discriminant analysis", Technometrics, To appear.
Examples
# load data
data(penicilliumYES)
X <- penicilliumYES$X
Y <- penicilliumYES$Y
Z <- penicilliumYES$Z
## test samples
Iout <- c(3, 6, 9, 12)
Iout <- c(Iout, Iout+12, Iout+24)
## training data
Xtr <- X[-Iout,]
k <- 3
n <- dim(Xtr)[1]
Rj <- rep(4, 3)
## Normalize data
Xc <- normalize(Xtr)
Xn <- Xc$Xc
p <- dim(Xn)[2]
## perform SMDA with five non-zero loadings for each discriminative
## direction
## Not run: 
smdaFit <- smda(x = Xn,
                y = Y, 
                Z = Z, 
                Rj = Rj,
                lambda = 1e-6,
                stop = -5,
                maxIte = 10,
                tol = 1e-2)
# testing
Xtst <- X[Iout,]
Xtst <- normalizetest(Xtst, Xc)
test <- predict(smdaFit, Xtst)
## End(Not run)