We will use the cleaned flu analysis data, so we first load it from the processed_data folder.
dat <- readRDS(here::here("fluanalysis", "data", "processed_data", "flu_processed"))
Next, we set aside part of the cleaned data as a test data set, which we will use to evaluate how well the fitted model performs on data it has not seen. The remaining data, the training data set, is used to fit the model.
To do this, we first call set.seed() so that the random split is reproducible. We then use initial_split() from the rsample package to generate a splitting rule for the training and test data sets.
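The split described above can be sketched as follows. This is a minimal illustration, not the original analysis code: `dat_toy` is a hypothetical stand-in for the cleaned flu data, the seed values are arbitrary, and `prop = 3/4` is an assumed proportion. Only the names `training_data` and `test_data` come from the rest of this document.

```r
library(rsample)

# Hypothetical stand-in for the cleaned flu data (values are simulated)
set.seed(123)
dat_toy <- data.frame(
  Nausea    = factor(sample(c("No", "Yes"), 200, replace = TRUE)),
  RunnyNose = factor(sample(c("No", "Yes"), 200, replace = TRUE)),
  BodyTemp  = round(rnorm(200, mean = 99, sd = 1), 1)
)

# Fix the seed so the random split is identical on every run
set.seed(222)

# Splitting rule; prop = 3/4 is an assumption, not taken from the original analysis
data_split <- initial_split(dat_toy, prop = 3/4)

# Extract the two data sets from the splitting rule
training_data <- training(data_split)
test_data     <- testing(data_split)

nrow(training_data)
nrow(test_data)
```

With the real data, `dat_toy` would be replaced by `dat`. Rerunning the block with the same seed should return the same rows in each set.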
We use the tidymodels workflow to build our logistic regression model. Within this workflow, we use recipe() and workflow() to specify the outcome and the predictors of interest.
# Initialize the recipe: Nausea is the outcome, all other variables are predictors
flu_logit_rec <- recipe(Nausea ~ ., data = training_data)

# Initialize the logistic regression model specification
logit_mod <- logistic_reg() %>%
  set_engine("glm")

# Initialize the workflow
flu_wflow1 <- workflow() %>%
  add_model(logit_mod) %>%
  add_recipe(flu_logit_rec)

flu_wflow1
To assess how well the model makes predictions, we can use an ROC curve. roc_curve() computes the curve and autoplot() plots it, which lets us evaluate the model on training_data and test_data separately.
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
ℹ The deprecated feature was likely used in the yardstick package.
Please report the issue at <https://github.com/tidymodels/yardstick/issues>.
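The fitting and ROC steps are not shown in full here, so the following is a sketch of how they could look, under stated assumptions. The data are simulated stand-ins (`toy` is hypothetical), the fitted object name `flu_fit1` and the helper objects `train_aug`, `test_aug`, `train_roc`, and `test_roc` are introduced only for illustration, and "Yes" is assumed to be the second factor level of Nausea, which is why `event_level = "second"` is set.

```r
library(tidymodels)

# Hypothetical stand-in for the flu data (simulated values)
set.seed(123)
n <- 300
toy <- tibble(
  RunnyNose = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  BodyTemp  = rnorm(n, mean = 99, sd = 1),
  Nausea    = factor(
    ifelse(runif(n) < plogis(-0.5 + 1.0 * (RunnyNose == "Yes")), "Yes", "No"),
    levels = c("No", "Yes")
  )
)

# Split into training and test data (prop is an assumed value)
data_split    <- initial_split(toy, prop = 3/4)
training_data <- training(data_split)
test_data     <- testing(data_split)

# Recipe, model specification, and workflow as in the text above
flu_logit_rec <- recipe(Nausea ~ ., data = training_data)
logit_mod     <- logistic_reg() %>% set_engine("glm")
flu_wflow1    <- workflow() %>%
  add_model(logit_mod) %>%
  add_recipe(flu_logit_rec)

# Fit the workflow on the training data
flu_fit1 <- fit(flu_wflow1, data = training_data)

# Add predicted classes and class probabilities to each data set
train_aug <- augment(flu_fit1, new_data = training_data)
test_aug  <- augment(flu_fit1, new_data = test_data)

# ROC curves; "Yes" is the second level, so it is declared as the event
train_roc <- roc_curve(train_aug, truth = Nausea, .pred_Yes, event_level = "second")
test_roc  <- roc_curve(test_aug,  truth = Nausea, .pred_Yes, event_level = "second")

autoplot(train_roc)
autoplot(test_roc)
```

Comparing the two plots gives a first impression of overfitting: a curve that is clearly better on the training data than on the test data suggests the model does not generalize well.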
roc_auc() estimates the area under the ROC curve. An area close to 1 indicates that the model separates the two classes well, while an area near 0.5 means the model predicts no better than random guessing.
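The interpretation of the area under the curve can be checked on a tiny hand-made example. The two data frames below are hypothetical and exist only to illustrate the two extremes: in the first, every "Yes" case receives a higher predicted probability than every "No" case; in the second, the probabilities carry no information about the outcome.

```r
library(yardstick)

# "Yes" is the first factor level, so it is treated as the event by default
truth_levels <- c("Yes", "No")

# Predictions that rank every Yes above every No
perfect <- data.frame(
  truth     = factor(c("Yes", "Yes", "No", "No"), levels = truth_levels),
  .pred_Yes = c(0.9, 0.8, 0.2, 0.1)
)

# Predictions whose ranking is unrelated to the outcome
uninformative <- data.frame(
  truth     = factor(c("Yes", "No", "No", "Yes"), levels = truth_levels),
  .pred_Yes = c(0.9, 0.8, 0.2, 0.1)
)

auc_perfect       <- roc_auc(perfect, truth, .pred_Yes)
auc_uninformative <- roc_auc(uninformative, truth, .pred_Yes)

auc_perfect$.estimate        # 1
auc_uninformative$.estimate  # 0.5
```

In the second data frame, exactly half of the Yes/No pairs are ranked correctly, which is what an AUC of 0.5 expresses.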
Creating workflow and fitting model using all predictors
# Creating recipe and setting up dummy coding for all categorical variables
set.seed(123)
temp_rec = recipe(BodyTemp ~ ., data = training_data) %>%
  step_dummy(all_nominal())

# Specifying the linear regression model
lm_mod = linear_reg() %>%
  set_engine("lm")

# Creating workflow
temp_workflow = workflow() %>%
  add_model(lm_mod) %>%
  add_recipe(temp_rec)

temp_workflow
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
1 Recipe Step
• step_dummy()
── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
temp_fit = temp_workflow %>%
  fit(data = training_data)

# Checking the parameter estimates, arranged by their p-values
temp_fit %>%
  extract_fit_parsnip() %>%
  tidy() %>%
  arrange(p.value)
# A tibble: 3 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard      1.27
2 rsq     standard      0.0316
3 mae     standard      0.957
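The code that produced this metrics table is not shown. One way to obtain a table of this shape is yardstick's metrics(), which returns rmse, rsq, and mae for a numeric outcome. The sketch below is an assumption about how such a table could be computed, not the original code: `toy` is simulated stand-in data, and `temp_aug` and `temp_metrics` are hypothetical names.

```r
library(tidymodels)

# Hypothetical stand-in data with a numeric outcome (simulated values)
set.seed(123)
n <- 200
toy <- tibble(
  RunnyNose = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  BodyTemp  = 98.8 + 0.3 * (RunnyNose == "Yes") + rnorm(n, sd = 1)
)

# Recipe, model specification, and workflow mirroring the text above
temp_rec <- recipe(BodyTemp ~ ., data = toy) %>%
  step_dummy(all_nominal())
lm_mod <- linear_reg() %>%
  set_engine("lm")
temp_workflow <- workflow() %>%
  add_model(lm_mod) %>%
  add_recipe(temp_rec)

# Fit the workflow, then attach predictions in a .pred column
temp_fit <- fit(temp_workflow, data = toy)
temp_aug <- augment(temp_fit, new_data = toy)

# Default regression metrics: rmse, rsq, mae
temp_metrics <- metrics(temp_aug, truth = BodyTemp, estimate = .pred)
temp_metrics
```

Passing the test data as `new_data` instead would give the corresponding out-of-sample metrics.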
Creating workflow and fitting model using the main predictor (RunnyNose)
set.seed(234)
temp_rec2 = recipe(BodyTemp ~ RunnyNose, data = training_data)

# Specifying the linear regression model
lm_mod = linear_reg() %>%
  set_engine("lm")

# Creating workflow
temp_workflow2 = workflow() %>%
  add_model(lm_mod) %>%
  add_recipe(temp_rec2)

temp_workflow2
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
temp_fit2 = temp_workflow2 %>%
  fit(data = training_data)

# Checking the parameter estimates, arranged by their p-values
temp_fit2 %>%
  extract_fit_parsnip() %>%
  tidy() %>%
  arrange(p.value)