| Type: | Package | 
| Title: | Bioinformatics Modeling with Recursion and Autoencoder-Based Ensemble | 
| Version: | 0.1.0 | 
| Description: | Tools for bioinformatics modeling using recursive transformer-inspired architectures, autoencoders, random forests, XGBoost, and stacked ensemble models. Includes utilities for cross-validation, calibration, benchmarking, and threshold optimization in predictive modeling workflows. The methodology builds on ensemble learning (Breiman 2001 <doi:10.1023/A:1010933404324>), gradient boosting (Chen and Guestrin 2016 <doi:10.1145/2939672.2939785>), autoencoders (Hinton and Salakhutdinov 2006 <doi:10.1126/science.1127647>), and recursive transformer efficiency approaches such as Mixture-of-Recursions (Bae et al. 2025 <doi:10.48550/arXiv.2507.10524>). | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.3 | 
| Depends: | R (≥ 4.2.0) | 
| Imports: | caret, recipes, themis, xgboost, magrittr, dplyr, pROC | 
| Suggests: | randomForest, testthat (≥ 3.0.0), PRROC, ggplot2, purrr, tibble, yardstick, knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| Config/testthat/edition: | 3 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-09-27 09:30:29 UTC; apple | 
| Author: | MD. Arshad [aut, cre] | 
| Maintainer: | MD. Arshad <arshad10867c@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-10-03 13:50:02 UTC | 
BioMoR: Bioinformatics Modeling with Recursion, Autoencoders, and Stacked Models
Description
The BioMoR package provides a modeling framework for bioinformatics tasks, combining recursive deep learning architectures (transformer-inspired), autoencoders for feature compression, and stacked models (RF, XGBoost, meta-learners).
Details
Main features:
- Data preparation utilities with recipe-based preprocessing and SMOTE-ready cross-validation.
- Base learners: Random Forest and XGBoost (caret interface).
- Meta-models: stacked learners with recursive refinements.
- Evaluation: ROC, PR, F1 tuning, balanced accuracy, Brier score, calibration.
Authors
Maintainer: MD. Arshad arshad10867c@gmail.com
Benchmark a trained model
Description
Evaluates a trained caret model on test data, returning Accuracy, F1 score, and ROC-AUC. If only one class is present in the test set, ROC-AUC is returned as NA.
Usage
biomor_benchmark(model, test_data, outcome_col)
Arguments
| model | A trained caret model |
| test_data | Data frame containing predictors and outcome |
| outcome_col | Name of the outcome column |
Value
A named list of metrics
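The three metrics named above can be sketched in base R. This is an illustration only, not BioMoR's source: the helper name `metrics_sketch`, the 0.5 cutoff, the `"Active"` positive class, and the element names of the returned list are all assumptions, and ROC-AUC is computed here with the rank-sum (Mann-Whitney) formula rather than with pROC.

```r
# Hypothetical helper: accuracy, F1, and ROC-AUC from labels and probabilities.
metrics_sketch <- function(y_true, y_prob, positive = "Active", cutoff = 0.5) {
  is_pos   <- as.character(y_true) == positive
  pred_pos <- y_prob >= cutoff

  tp <- sum(pred_pos & is_pos)
  fp <- sum(pred_pos & !is_pos)
  fn <- sum(!pred_pos & is_pos)

  accuracy <- mean(pred_pos == is_pos)
  f1 <- if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)

  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  auc <- if (n_pos == 0 || n_neg == 0) {
    NA_real_                      # AUC is undefined with a single class
  } else {
    r <- rank(y_prob)             # average ranks handle ties
    (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  }

  list(Accuracy = accuracy, F1 = f1, ROC_AUC = auc)
}

y_true <- factor(c("Active", "Active", "Inactive", "Inactive"))
y_prob <- c(0.9, 0.4, 0.6, 0.2)
metrics_sketch(y_true, y_prob)    # Accuracy 0.5, F1 0.5, ROC_AUC 0.75
```

With a test set that contains only one class, the same helper returns `NA` for `ROC_AUC`, mirroring the behavior described for `biomor_benchmark()`.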
Run full BioMoR pipeline
Description
Runs the full BioMoR pipeline on a labeled dataset and returns the trained models together with their benchmark reports.
Usage
biomor_run_pipeline(data, feature_cols = NULL, epochs = 50)
Arguments
| data | Data frame with a Label column and descriptor columns |
| feature_cols | Optional set of feature columns |
| epochs | Number of autoencoder training epochs |
Value
A list of trained models and benchmark reports.
Compute Brier Score
Description
The Brier score is the mean squared error between predicted probabilities and the true binary outcome (0/1). Lower is better.
Usage
brier_score(y_true, y_prob, positive = "Active")
Arguments
| y_true | True factor labels. |
| y_prob | Predicted probabilities for the positive class. |
| positive | Name of the positive class (default "Active"). |
Value
Numeric Brier score.
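As a worked illustration of the definition above, the mean squared error between probabilities and 0/1 outcomes can be written in a few lines of base R. The helper name `brier_sketch` is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper: mean squared error between probability and 0/1 outcome.
brier_sketch <- function(y_true, y_prob, positive = "Active") {
  stopifnot(length(y_true) == length(y_prob))
  y01 <- as.numeric(as.character(y_true) == positive)  # 1 = positive class
  mean((y_prob - y01)^2)
}

y_true <- factor(c("Active", "Inactive", "Active", "Inactive"),
                 levels = c("Inactive", "Active"))
y_prob <- c(0.9, 0.2, 0.6, 0.4)
brier_sketch(y_true, y_prob)  # (0.01 + 0.04 + 0.16 + 0.16) / 4 = 0.0925
```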
Calibrate model probabilities
Description
Calibrates the predicted probabilities of a trained model on test data, using either Platt scaling or isotonic regression.
Usage
calibrate_model(model, test_data, method = "platt")
Arguments
| model | A caret or xgboost model |
| test_data | Test data frame |
| method | Calibration method: "platt" or "isotonic" |
Value
Calibrated probabilities.
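The two calibration methods named above can be sketched with base R's stats functions. This is an assumption about the general technique, not a view into `calibrate_model()`: Platt scaling is shown as a logistic regression of the 0/1 outcome on the raw score, and isotonic calibration as a monotone step function from `isoreg()`. The toy scores and labels are made up for illustration.

```r
# Toy held-out scores and 0/1 outcomes (illustrative values only).
raw_score <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
y01       <- c(0,   0,   1,   0,   0,   1,   1,   0,   1)

# Platt scaling: fit a logistic regression on the raw score.
platt_fit  <- glm(y01 ~ raw_score, family = binomial())
platt_prob <- unname(predict(platt_fit,
                             newdata = data.frame(raw_score = raw_score),
                             type = "response"))

# Isotonic calibration: monotone non-decreasing step function of the score.
iso_fit  <- as.stepfun(isoreg(raw_score, y01))
iso_prob <- iso_fit(raw_score)

round(platt_prob, 3)
round(iso_prob, 3)
```

Platt scaling imposes a smooth sigmoid shape, while isotonic regression only assumes monotonicity and typically needs more data to avoid overfitting.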
Compute optimal threshold for maximum F1 score
Description
Sweeps thresholds between 0 and 1 to find the one that maximizes F1.
Usage
compute_f1_threshold(y_true, y_prob, positive = "Active")
Arguments
| y_true | True factor labels. |
| y_prob | Predicted probabilities for the positive class. |
| positive | Name of the positive class (default "Active"). |
Value
A list with elements:
- threshold: Best probability cutoff.
- best_f1: Maximum F1 score achieved.
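The threshold sweep described above can be sketched as follows. The helper name `f1_threshold_sketch` and the 0.01 grid step are assumptions; the manual does not state which grid `compute_f1_threshold()` uses or how it breaks ties (this sketch keeps the lowest threshold that reaches the maximum).

```r
# Hypothetical helper: sweep cutoffs and keep the one with the highest F1.
f1_threshold_sketch <- function(y_true, y_prob, positive = "Active",
                                grid = seq(0.01, 0.99, by = 0.01)) {
  is_pos <- as.character(y_true) == positive

  f1_at <- function(t) {
    pred_pos <- y_prob >= t
    tp <- sum(pred_pos & is_pos)
    fp <- sum(pred_pos & !is_pos)
    fn <- sum(!pred_pos & is_pos)
    if (tp == 0) return(0)        # F1 is taken as 0 when nothing is recovered
    2 * tp / (2 * tp + fp + fn)
  }

  f1   <- vapply(grid, f1_at, numeric(1))
  best <- which.max(f1)           # first (lowest) threshold at the maximum
  list(threshold = grid[best], best_f1 = f1[best])
}

y_true <- factor(c("Active", "Active", "Inactive", "Active", "Inactive"))
y_prob <- c(0.925, 0.815, 0.735, 0.425, 0.135)
f1_threshold_sketch(y_true, y_prob)  # threshold 0.14, best_f1 = 6/7
```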
Get caret cross-validation control
Description
Creates a caret::trainControl object for cross-validation, configured for two-class problems, ROC-based performance, and optional sampling strategies such as SMOTE or ROSE.
Usage
get_cv_control(cv = 5, sampling = NULL)
Arguments
| cv | Number of folds (default 5). |
| sampling | Sampling method (e.g., "smote", "rose", or NULL). |
Value
A caret::trainControl object.
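A control object of the kind described above could be built directly with caret as in the sketch below. The exact arguments that `get_cv_control()` passes are not shown in this manual, so the combination here (k-fold CV, class probabilities, `twoClassSummary`) is an assumption based on the description; the code is guarded so it only runs when caret is installed.

```r
# Sketch only: an assumed equivalent of a two-class, ROC-oriented CV control.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(
    method          = "cv",                    # k-fold cross-validation
    number          = 5,                       # number of folds
    classProbs      = TRUE,                    # needed for ROC-based metrics
    summaryFunction = caret::twoClassSummary,  # reports ROC, Sens, Spec
    sampling        = NULL                     # or a method name such as "smote"
  )
  ctrl$method
  ctrl$number
}
```

When `summaryFunction = caret::twoClassSummary` is used, the model is usually tuned with `metric = "ROC"` in the call to `caret::train()`.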
Get Embeddings from Autoencoder (stub)
Description
Placeholder for extracting embeddings from a trained autoencoder.
Usage
get_embeddings(ae_obj, data, feature_cols = NULL)
Arguments
| ae_obj | Autoencoder object |
| data | Input data |
| feature_cols | Columns to use as features |
Value
Matrix of embeddings (currently NULL since this is a stub)
Prepare dataset for modeling
Description
Prepares a data frame for modeling by converting the outcome column to a factor.
Usage
prepare_model_data(df, outcome_col = "Label")
Arguments
| df | A data.frame |
| outcome_col | Name of the outcome column |
Value
A processed data.frame with factor outcome
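The kind of preparation described above might look like the base R sketch below. It is not the package's code: dropping rows with a missing outcome and passing the labels through `make.names()` are assumptions, the latter chosen because caret requires class levels to be valid R names when class probabilities are requested. The helper name and the `MolWt` descriptor column are hypothetical.

```r
# Hypothetical helper: return a data frame whose outcome is a factor.
prepare_sketch <- function(df, outcome_col = "Label") {
  stopifnot(is.data.frame(df), outcome_col %in% names(df))
  df <- df[!is.na(df[[outcome_col]]), , drop = FALSE]  # assumption: drop NA labels
  labels <- make.names(as.character(df[[outcome_col]]))  # e.g. "1" becomes "X1"
  df[[outcome_col]] <- factor(labels)
  df
}

raw <- data.frame(Label = c(1, 0, 1, NA),
                  MolWt = c(180.2, 250.3, 310.1, 99.0))
prepared <- prepare_sketch(raw)
levels(prepared$Label)  # "X0" "X1"
```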
Train Autoencoder (stub)
Description
Placeholder for future autoencoder integration in BioMoR.
Usage
train_autoencoder(
  data,
  feature_cols = NULL,
  epochs = 10,
  batch_size = 32,
  lr = 0.001
)
Arguments
| data | Input data (matrix or data frame) |
| feature_cols | Columns to use as features |
| epochs | Number of training epochs |
| batch_size | Mini-batch size |
| lr | Learning rate |
Value
A placeholder list with class "autoencoder"
Train BioMoR Autoencoder
Description
Trains the BioMoR autoencoder on the selected feature columns and returns the model, the dataset, and the embeddings.
Usage
train_biomor(data, feature_cols, epochs = 100, batch_size = 50, lr = 0.001)
Arguments
| data | Data frame with numeric features and a Label column |
| feature_cols | Character vector of feature columns |
| epochs | Number of training epochs |
| batch_size | Batch size |
| lr | Learning rate |
Value
A list with elements model, dataset, and embeddings.
Train a Random Forest model with caret
Description
Trains a Random Forest classifier on a binary outcome through the caret interface, using the supplied cross-validation control.
Usage
train_rf(df, outcome_col = "Label", ctrl)
Arguments
| df | A data.frame containing predictors and outcome |
| outcome_col | Name of the outcome column (binary factor) |
| ctrl | A caret::trainControl object |
Value
A caret train object
Train an XGBoost model with caret
Description
Trains an XGBoost classifier on a binary outcome through the caret interface, using the supplied cross-validation control.
Usage
train_xgb_caret(df, outcome_col = "Label", ctrl)
Arguments
| df | A data.frame containing predictors and outcome |
| outcome_col | Name of the outcome column (binary factor) |
| ctrl | A caret::trainControl object |
Value
A caret train object