lm_significant reduces a multiple linear regression model to only those predictor variables that are statistically significant in the model. The function currently supports only additive linear regression models.

lm_significant(
  data,
  res,
  preds = NULL,
  p = 0.01,
  verbose = TRUE,
  all = FALSE,
  ...
)

Arguments

data

a data frame object containing the variables to be used as response and predictors in the model.

res

a character vector of length 1 that matches the name of the response variable column in data. The response variable must be numeric.

preds

a character vector naming the predictor variables in data. If NULL (the default), the function starts with every variable in data other than the response variable as an input predictor. Supply this argument to restrict the starting set to user-defined predictors.

p

a numeric value giving the p-value threshold used to select predictors based on their statistical significance in the model. The default threshold is 0.01.

verbose

a logical value indicating whether to print progress messages while the function runs. Default is TRUE.

all

if TRUE, the function returns a list with two lm model objects. The first is the original model with all the input predictors and the second is the model with only the significant predictors. The default is FALSE, which returns a single lm object with only the significant predictors.

...

additional arguments to be passed to the inner lm() function calls. Refer to the documentation of lm() for more details on those arguments.
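The roles of preds, p, and ... can be illustrated with base R alone. The sketch below is an assumption about the general approach, not the package's source: fit_full_model is a hypothetical helper, and the single pass over the p-values shown here does not claim to reproduce whatever iterative procedure lm_significant applies internally.

```r
# Hypothetical helper for illustration only; not part of the package.
fit_full_model <- function(data, res, preds = NULL, ...) {
  # With preds = NULL, start from every column except the response.
  if (is.null(preds)) {
    preds <- setdiff(names(data), res)
  }
  # Build an additive formula such as mpg ~ cyl + wt.
  model_formula <- reformulate(preds, response = res)
  # Extra arguments are forwarded unchanged to lm().
  lm(model_formula, data = data, ...)
}

# Default: all non-response columns of mtcars become predictors.
full_model <- fit_full_model(mtcars, res = "mpg")

# User-defined predictors, plus an lm() argument (x = TRUE) passed through `...`.
small_model <- fit_full_model(mtcars, res = "mpg",
                              preds = c("cyl", "wt"), x = TRUE)

# Compare coefficient p-values (intercept excluded) against a threshold.
p_threshold <- 0.01
coef_table <- summary(small_model)$coefficients
p_values <- coef_table[-1, "Pr(>|t|)"]
below_threshold <- names(p_values)[p_values < p_threshold]
```

Arguments that lm() evaluates inside data, such as subset or weights given as bare column names, may not resolve as expected when forwarded through a wrapper like this one, so the sketch forwards x = TRUE instead.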

Value

The function returns a list with two lm model objects if all = TRUE. The first is the original model with all the input predictors and the second is the model with only the significant predictors.

If all = FALSE, the function returns a single lm object with only the significant predictors.
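Because the returned objects are ordinary lm models, the usual base R tools apply to them. The sketch below builds a stand-in list by hand to mimic the documented shape of the all = TRUE result (full model first, reduced model second). The names models, original, and optimized are illustrative assumptions, and the reduced formula simply follows the predictors reported for mtcars in the Examples section.

```r
# Hand-built stand-in for the documented all = TRUE return shape;
# it is not produced by lm_significant itself.
full_model <- lm(mpg ~ ., data = mtcars)
reduced_model <- lm(mpg ~ cyl + wt, data = mtcars)
models <- list(full_model, reduced_model)

# Positional access: first the original model, then the optimized one.
original <- models[[1]]
optimized <- models[[2]]

# Inspect the optimized model with standard lm methods.
optimized_coefs <- coef(optimized)
optimized_adj_r2 <- summary(optimized)$adj.r.squared

# F-test comparing the nested reduced model against the full model.
comparison <- anova(optimized, original)

# Predict the response for a new observation.
new_car <- data.frame(cyl = 6, wt = 3)
predicted_mpg <- predict(optimized, newdata = new_car)
```

Positional indexing is used here because the documentation specifies the order of the two models but does not say whether the list elements are named.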

Examples

#cancer_sample data from datateachr package
library(datateachr)
sig_mod1 <- lm_significant(cancer_sample[,-2], res = "radius_mean")
#>
#> Response Variable: radius_mean
#>
#> Input Predictors: ID texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave_points_mean symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se area_se smoothness_se compactness_se concavity_se concave_points_se symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave_points_worst symmetry_worst fractal_dimension_worst
#>
#> Fitting a linear model
#>
#> Optimization of Predictors
#> ....
#>
#> Final Optimization...
#>
#>
#> Final Optimized Predictors: perimeter_mean compactness_mean radius_worst area_worst concavity_mean perimeter_worst compactness_worst
#Both the original model and optimized model
sig_mod2 <- lm_significant(mtcars, res = "mpg", all = TRUE)
#>
#> Response Variable: mpg
#>
#> Input Predictors: cyl disp hp drat wt qsec vs am gear carb
#>
#> Fitting a linear model
#>
#> Optimization of Predictors
#> .
#>
#> Final Optimization...
#>
#>
#> Final Optimized Predictors: cyl wt