library(Colossus)
library(data.table)
library(parallel)
Colossus was developed for radiation epidemiologists who wanted to apply new methods to larger datasets. At the time of development, 32-bit Epicure was a popular software package for running radiation epidemiological analyses. This vignette was written with that in mind, to help transitioning users see how the two are similar and different and to provide guidance for converting between them. The script provided covers both Colossus-specific differences and general R capabilities that differ from Epicure.
The following peanuts script was used as part of validation efforts for Colossus. The script runs three regressions: a linear excess relative risk model, a log-linear hazard ratio model for dose overall, and a log-linear hazard ratio model for categorical bins of dose. Similar regressions were performed with radiation cohort data to assess the relationship between the risk of different mortality events and cumulative radiation exposure to organs.
The script starts by loading the dataset. In this example, we assume there is a file called “EX_DOSE.csv” in the local directory.
RECORDS 4100000 @
WSOPT VARMAX 30 @
USETXT EX_DOSE.csv @
INPUT @
Next, the script sets the interval start column and interval end column, and designates which columns have multiple levels.
ENTRY age_entry @
EXIT age_exit @
EVENT nonCLL @
levels SES_CAT YOB_CAT sexm dose_cat @
Past this point, the script specifies which columns belong to each subterm and term number. In this case, the model has a linear element in the second term. The script additionally sets the convergence options. The maximum number of iterations is set to 100 and the convergence criterion is set to 1e-9. The regression is also set to print the deviance and parameter estimates at each iteration, and to print the final parameter estimates and covariance after the regression concludes.
LOGLINEAR 0 SES_CAT YOB_CAT sexm @
LINEAR 1 cumulative_dose @
FITOPT P V ITER 100 CONV -9 @
Finally, the regression is run and results are printed to the console.
FIT @
Past this point, the same general process is followed for the two hazard ratio regressions. The model definition is reset, the subterm/term description is set, and the regression is run.
NOMODEL @
LOGLINEAR 0 SES_CAT YOB_CAT sexm cumulative_dose @
FIT @
NOMODEL @
LOGLINEAR 0 SES_CAT YOB_CAT sexm dose_cat @
FIT @
For the most part, the same steps are performed in Colossus. The script starts by reading the input file. The only difference is that R does not require memory to be set aside before reading the file.
df_Dose <- fread("EX_DOSE.csv")
The next step defines covariates with levels and assigns the interval/event column information. The main difference is that Colossus currently does not automatically split factored columns, so the first several lines transform the dataset to include factored columns, and later on the user has to list all of the factored columns by name.
col_list <- c("SES_CAT", "YOB_CAT", "dose_cat")
val <- factorize(df_Dose, col_list)
df_Dose <- val$df
t0 <- "age_entry"
t1 <- "age_exit"
event <- "nonCLL"
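To make the factoring step concrete, the following base R sketch shows what splitting one categorical column into 0/1 indicator columns can look like. It is an illustration only, not the implementation of `factorize`: the helper `add_indicators`, its `ref` argument, and the toy values are hypothetical, and the `<column>_<level>` naming with the reference level skipped is an assumption inferred from the column names used later in this vignette.

```r
# Toy data: the column name mirrors the vignette, the values are made up.
df <- data.frame(SES_CAT = c(0, 1, 2, 1, 0))

# Hypothetical helper (not part of Colossus): add one 0/1 column per level,
# skipping the chosen reference level, and return the new column names.
add_indicators <- function(df, col, ref) {
  new_cols <- character(0)
  for (lev in sort(unique(df[[col]]))) {
    if (lev == ref) next
    new_name <- paste0(col, "_", lev)
    df[[new_name]] <- as.integer(df[[col]] == lev)
    new_cols <- c(new_cols, new_name)
  }
  list(df = df, cols = new_cols)
}

val <- add_indicators(df, "SES_CAT", ref = 0)
df <- val$df
print(val$cols)
print(df)
```

The new indicator columns, rather than the original categorical column, are what would then be listed in the model definition.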
Next, the script defines the model and control information. Colossus stores the model information like columns in a table, so it expects lists of column names, subterm types, term numbers, etc., with one entry per covariate. In this case, the only non-default values are the column names and subterm types. The “control” variable serves a similar purpose to the “FITOPT” option in Epicure.
# ERR
names <- c(
  "cumulative_dose", "SES_CAT_1", "SES_CAT_2", "YOB_CAT_1", "YOB_CAT_2",
  "YOB_CAT_3", "YOB_CAT_4", "sexm"
)
tform <- c("plin", rep("loglin", length(names) - 1))
control <- list("Ncores" = 8, "maxiter" = 100, "verbose" = 2, "epsilon" = 1e-9, "der_epsilon" = 1e-9)
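Because the model is described column by column, it can help to lay the two vectors side by side before fitting. The sketch below uses only base R; the data frame `model_table` is a hypothetical name introduced for illustration and is not something Colossus requires or creates.

```r
# Column names and subterm types for the ERR model, as in the vignette.
names <- c(
  "cumulative_dose", "SES_CAT_1", "SES_CAT_2", "YOB_CAT_1", "YOB_CAT_2",
  "YOB_CAT_3", "YOB_CAT_4", "sexm"
)
tform <- c("plin", rep("loglin", length(names) - 1))

# The two vectors are matched by position, so they must have the same length.
stopifnot(length(names) == length(tform))

# Hypothetical helper table: one row per covariate, showing its subterm type.
model_table <- data.frame(column = names, subterm = tform)
print(model_table)
```

Reading the table row by row makes it easy to confirm that only the dose column is assigned the linear subterm and that every other covariate is log-linear.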
Finally, the regression is run and the results are stored in a variable “e”. Regressions in Colossus return a list of results, and a summary can be printed with an included function.
e <- RunCoxRegression(df_Dose, t0, t1, event, names, tform = tform, control = control)
Interpret_Output(e)
A similar process is run for the two hazard ratio regressions.
# HR
names <- c(
  "cumulative_dose", "SES_CAT_1", "SES_CAT_2", "YOB_CAT_1", "YOB_CAT_2",
  "YOB_CAT_3", "YOB_CAT_4", "sexm"
)
tform <- rep("loglin", length(names))
e <- RunCoxRegression(df_Dose, t0, t1, event, names, tform = tform, control = control)
Interpret_Output(e)
# Categorical
names <- c(
  "dose_cat_1", "dose_cat_2", "dose_cat_3", "dose_cat_4", "dose_cat_5",
  "dose_cat_6", "SES_CAT_1", "SES_CAT_2", "YOB_CAT_1", "YOB_CAT_2",
  "YOB_CAT_3", "YOB_CAT_4", "sexm"
)
tform <- rep("loglin", length(names))
e <- RunCoxRegression(df_Dose, t0, t1, event, names, tform = tform, control = control)
Interpret_Output(e)
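The three regressions differ only in their column names and subterm types, so one possible way to organize them is a named list of specifications. The sketch below is an optional convenience, not something the vignette or Colossus prescribes; `base_cols`, `dose_cat_cols`, and `models` are hypothetical names. The fitting call is left as a comment because it needs Colossus and the prepared `df_Dose`.

```r
# Covariates shared by all three models, as listed in the vignette.
base_cols <- c(
  "SES_CAT_1", "SES_CAT_2", "YOB_CAT_1", "YOB_CAT_2",
  "YOB_CAT_3", "YOB_CAT_4", "sexm"
)
dose_cat_cols <- paste0("dose_cat_", 1:6)

# One entry per regression: column names plus matching subterm types.
models <- list(
  ERR = list(
    names = c("cumulative_dose", base_cols),
    tform = c("plin", rep("loglin", length(base_cols)))
  ),
  HR = list(
    names = c("cumulative_dose", base_cols),
    tform = rep("loglin", length(base_cols) + 1)
  ),
  Categorical = list(
    names = c(dose_cat_cols, base_cols),
    tform = rep("loglin", length(dose_cat_cols) + length(base_cols))
  )
)

for (label in names(models)) {
  m <- models[[label]]
  # Names and subterm types are matched by position.
  stopifnot(length(m$names) == length(m$tform))
  cat(label, ":", length(m$names), "columns\n")
  # With Colossus loaded and df_Dose, t0, t1, event, control defined as above:
  # e <- RunCoxRegression(df_Dose, t0, t1, event, m$names, tform = m$tform, control = control)
  # Interpret_Output(e)
}
```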