This vignette demonstrates how to simulate multivariate normal data and multivariate skewed Gamma data using pre-estimated statistics or datasets.
In this guide, we cover the following steps:
Simulate Multivariate Normal Data: Use pre-estimated statistics (mean vector and covariance matrix) to generate multivariate normal data with the simulate_group_data() function, passing MASS::mvrnorm() as the data generation function.
Estimate Multivariate Moments: Use a dataset to estimate key statistics such as the mean, variance, correlation, and skewness with the calculate_stats_gaussian() function.
Simulate Multivariate Skewed Gamma Data: Use pre-estimated statistics to generate multivariate skewed Gamma data with the simulate_group_data() function, passing generate_mvGamma_data as the data generation function.
Estimate Multivariate Skewed Gamma Parameters: Estimate the skewed Gamma distribution parameters (shape and rate) from an existing dataset with the calculate_stats_gamma() function, choosing either the Method of Moments (MoM) or Generalized Maximum Likelihood Estimation (gMLE).
We will load the necessary libraries and assume that we already have pre-estimated statistics for the multivariate normal and skewed Gamma data generation.
# Load necessary libraries
library(MASS) # For generating multivariate normal data
library(simBKMRdata) # For generating skewed Gamma data and estimating moments
Given the pre-estimated mean vector and covariance matrix for each group, we can generate multivariate normal data by passing the MASS::mvrnorm() function to simulate_group_data().
# Example using MASS::mvrnorm for normal distribution
param_list <- list(
  Group1 = list(
    mean_vec = c(1, 2),
    sampCorr_mat = matrix(c(1, 0.5, 0.5, 1), 2, 2),
    sampSize = 100
  ),
  Group2 = list(
    mean_vec = c(2, 3),
    sampCorr_mat = matrix(c(1, 0.3, 0.3, 1), 2, 2),
    sampSize = 150
  )
)
mvnorm_samples <- simulate_group_data(param_list, MASS::mvrnorm, "Group")
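For readers who want to see the idea without the wrapper, the following is a minimal sketch of per-group simulation using only MASS::mvrnorm() and base R. It is an illustration under stated assumptions, not the implementation of simulate_group_data(): the names groups, mu, Sigma, n, V1, V2, and manual_samples are hypothetical, and the column layout of the package's output may differ.

```r
library(MASS)

set.seed(1)  # reproducible draws

# Hypothetical parameter list mirroring Group1 and Group2 above
groups <- list(
  Group1 = list(mu = c(1, 2), Sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2), n = 100),
  Group2 = list(mu = c(2, 3), Sigma = matrix(c(1, 0.3, 0.3, 1), 2, 2), n = 150)
)

# Draw each group separately, then stack the draws with a group label
draws <- lapply(names(groups), function(g) {
  p <- groups[[g]]
  x <- MASS::mvrnorm(n = p$n, mu = p$mu, Sigma = p$Sigma)
  data.frame(V1 = x[, 1], V2 = x[, 2], Group = g)
})
manual_samples <- do.call(rbind, draws)

# Group sizes and per-group sample means
table(manual_samples$Group)
aggregate(cbind(V1, V2) ~ Group, data = manual_samples, FUN = mean)
```

The per-group sample means should land near the supplied mean vectors, with the usual sampling noise for 100 and 150 observations.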
Let’s visualize the first two variables of the generated multivariate normal data.
# Plot the first two variables of the multivariate normal data
plot(
  mvnorm_samples[, 1], mvnorm_samples[, 2],
  main = "Scatterplot: MV Normal Data",
  xlab = "Variable 1", ylab = "Variable 2",
  pch = 19, col = "blue"
)
Suppose we already have a dataset and need to estimate key multivariate moments such as the mean, variance, skewness, and correlation. We can use the calculate_stats_gaussian() function to calculate these statistics for each group.
myData <- data.frame(
  GENDER = c('Male', 'Female', 'Male', 'Female', 'Male', 'Female'),
  VALUE1 = c(1.2, 2.3, 1.5, 2.7, 1.35, 2.5),
  VALUE2 = c(3.4, 4.5, 3.8, 4.2, 3.6, 4.35)
)
calculate_stats_gaussian(data_df = myData, group_col = "GENDER")
$Female
$Female$sampSize
[1] 3
$Female$mean_vec
VALUE1 VALUE2
2.50 4.35
$Female$sampSD
VALUE1 VALUE2
0.20 0.15
$Female$sampCorr_mat
VALUE1 VALUE2
VALUE1 1 -1
VALUE2 -1 1
$Female$sampSkew
[1] 0.00000e+00 5.91091e-15
$Male
$Male$sampSize
[1] 3
$Male$mean_vec
VALUE1 VALUE2
1.35 3.60
$Male$sampSD
VALUE1 VALUE2
0.15 0.20
$Male$sampCorr_mat
VALUE1 VALUE2
VALUE1 1 1
VALUE2 1 1
$Male$sampSkew
[1] -4.411766e-15 2.240684e-15
The returned list contains, for each group:
sampSize: The number of observations.
mean_vec: Mean vector for each variable.
sampSD: Standard deviation of each variable.
sampCorr_mat: The correlation matrix.
sampSkew: The skewness for each variable.
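As a sanity check, the Female group's statistics can be reproduced with base R alone. The sketch below recomputes the mean, standard deviation, and correlation from the three Female rows of the example data. The skewness helper skew() is a hypothetical name and uses one common moment-based estimator; the package may use a different definition, so only agreement near zero is expected here.

```r
# The three Female rows from the example dataset
female <- data.frame(
  VALUE1 = c(2.3, 2.7, 2.5),
  VALUE2 = c(4.5, 4.2, 4.35)
)

female_mean <- colMeans(female)    # per-variable means
female_sd   <- sapply(female, sd)  # per-variable sample standard deviations
female_cor  <- cor(female)         # sample correlation matrix

# Assumed estimator: third central moment over variance^(3/2)
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
female_skew <- sapply(female, skew)

female_mean
female_sd
female_cor
female_skew
```

With only three observations per group, the correlation of exactly -1 and the near-zero skewness reflect the tiny, symmetric sample rather than any property of a real population.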
We can now generate multivariate skewed Gamma data based on pre-estimated shape and rate parameters for the Gamma distribution. This can be done using the generate_mvGamma_data function.
# Example using generate_mvGamma_data for Gamma distribution
param_list <- list(
  Group1 = list(
    sampCorr_mat = matrix(c(1, 0.5, 0.5, 1), 2, 2),
    shape_num = c(2, 2),
    rate_num = c(1, 1),
    sampSize = 100
  ),
  Group2 = list(
    sampCorr_mat = matrix(c(1, 0.3, 0.3, 1), 2, 2),
    shape_num = c(2, 2),
    rate_num = c(1, 1),
    sampSize = 150
  )
)
gamma_samples <- simulate_group_data(
  param_list, generate_mvGamma_data, "Group"
)
Let’s plot the density of the first two variables of the generated skewed Gamma data.
# Plot the density of the first and second variable for Gamma data
old_par_mfrow <- par()[["mfrow"]]
par(mfrow = c(2, 1))
plot(density(gamma_samples[, 1]), main = "Gamma Variable 1", col = "blue")
plot(density(gamma_samples[, 2]), main = "Gamma Variable 2", col = "blue")
par(mfrow = old_par_mfrow)
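To judge whether the simulated densities look plausible, it helps to know what the marginal Gamma distributions should look like in theory. The sketch below computes the textbook mean, variance, and skewness for shape 2 and rate 1, the values used in both groups. The variable names are hypothetical, and the code says nothing about how generate_mvGamma_data induces the correlation between variables.

```r
# Marginal parameters used for both variables in both groups
shape <- c(2, 2)
rate  <- c(1, 1)

# Standard Gamma identities under the shape/rate parameterisation
theo_mean <- shape / rate      # mean = shape / rate
theo_var  <- shape / rate^2    # variance = shape / rate^2
theo_skew <- 2 / sqrt(shape)   # skewness = 2 / sqrt(shape)

data.frame(
  variable = c("Variable 1", "Variable 2"),
  mean = theo_mean,
  variance = theo_var,
  skewness = theo_skew
)
```

Each marginal therefore has mean 2, variance 2, and skewness of about 1.41, so the density plots should be right-skewed with most of their mass below about 4.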
If we have an existing dataset, we can estimate the parameters of the multivariate skewed Gamma distribution using methods such as Method of Moments (MoM) or Generalized Maximum Likelihood Estimation (gMLE).
myData <- data.frame(
  GENDER = c('Male', 'Female', 'Male', 'Female', 'Male', 'Female'),
  VALUE1 = c(1.2, 2.3, 1.5, 2.7, 1.35, 2.5),
  VALUE2 = c(3.4, 4.5, 3.8, 4.2, 3.6, 4.35)
)
calculate_stats_gamma(data_df = myData, group_col = "GENDER", using = "MoM")
$Female
$Female$sampSize
[1] 3
$Female$mean_vec
VALUE1 VALUE2
2.50 4.35
$Female$sampCorr_mat
VALUE1 VALUE2
VALUE1 1 -1
VALUE2 -1 1
$Female$shape_num
VALUE1 VALUE2
156.25 841.00
$Female$rate_num
VALUE1 VALUE2
62.5000 193.3333
$Male
$Male$sampSize
[1] 3
$Male$mean_vec
VALUE1 VALUE2
1.35 3.60
$Male$sampCorr_mat
VALUE1 VALUE2
VALUE1 1 1
VALUE2 1 1
$Male$shape_num
VALUE1 VALUE2
81 324
$Male$rate_num
VALUE1 VALUE2
60 90
The returned list contains, for each group:
sampSize: The sample size.
mean_vec: The mean vector for each variable.
sampCorr_mat: The sample correlation matrix.
shape_num: The estimated shape parameters for each variable.
rate_num: The estimated rate parameters for each variable.
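The shape_num and rate_num values printed above are consistent with the standard method-of-moments identities for the Gamma distribution, shape = mean^2 / variance and rate = mean / variance. The sketch below applies those identities per group in base R. The helper mom_gamma() and the object mom_by_group are hypothetical names, and this is not taken from the package source.

```r
myData <- data.frame(
  GENDER = c("Male", "Female", "Male", "Female", "Male", "Female"),
  VALUE1 = c(1.2, 2.3, 1.5, 2.7, 1.35, 2.5),
  VALUE2 = c(3.4, 4.5, 3.8, 4.2, 3.6, 4.35)
)

# Method-of-moments estimates for one numeric vector
mom_gamma <- function(x) {
  m <- mean(x)
  v <- var(x)  # sample variance (n - 1 denominator)
  c(shape = m^2 / v, rate = m / v)
}

# Apply the estimator to each variable within each group;
# each element is a matrix with rows "shape" and "rate"
mom_by_group <- lapply(
  split(myData[, c("VALUE1", "VALUE2")], myData$GENDER),
  function(d) sapply(d, mom_gamma)
)

mom_by_group
```

For example, Female VALUE1 has mean 2.5 and variance 0.04, which gives shape 6.25 / 0.04 = 156.25 and rate 2.5 / 0.04 = 62.5, matching the output above. Estimates this large arise because the three-point samples have very small variance relative to their mean.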
In this vignette, we demonstrated how to:
Simulate multivariate normal data using pre-estimated mean and covariance parameters.
Estimate multivariate moments (mean, variance, correlation, skewness) from an existing dataset.
Simulate multivariate skewed Gamma data using pre-estimated shape and rate parameters.
Estimate skewed Gamma parameters using existing data with Method of Moments (MoM).
These methods let you generate and analyze synthetic multivariate datasets with specified properties, based on either pre-estimated statistics or available data. This is useful for simulation studies and statistical analysis in domains such as finance, healthcare, and engineering.