Implementing a Predictive Model Pipeline using R and Microsoft Azure Machine Learning

In this post, I aim to demonstrate the process of building a simple machine learning model in R and implementing it as a predictive web service in Microsoft Azure Machine Learning (Azure ML). One is not limited to the built-in machine learning capabilities of Azure ML: the environment lets you run R scripts and upload and use your own R packages.

In practice, this gives data scientists the flexibility to use their own carefully crafted R models within the Azure ML environment. Once an R-built model is fully implemented in Azure ML, the resulting web service exposes an API that external applications can call. This is valuable to any organization that seeks to automate decision processes using predictive modelling.

This demonstration is subdivided into three sections:

  1. Building a Machine Learning Model in R
  2. Preparing for Model Implementation in Microsoft Azure Machine Learning
  3. Creating a Predictive Web Service in Microsoft Azure Machine Learning

Before we begin, this demonstration assumes all of the following are satisfied:

  1. We have a verified and registered Microsoft Azure Machine Learning account. Microsoft allows you to try the product for free if you do not have a subscription; follow the sign-up steps on the Azure ML website to gain access.
  2. We have R and an associated IDE installed. I will be using the free version of R-Studio throughout this process.
  3. We have the following R packages installed: dplyr, dummies, caret, caretEnsemble
  4. We have the Human Resources Analytics data set downloaded, as this will be the main source of data for building our predictive model.

Building a Machine Learning Model in R

To keep things simple, I will build a logistic regression model.

Data Pre-processing

First, I pre-process the Human Resources Analytics data set to one-hot encode the categorical features sales and salary.


require(dummies)

dataset <- read.csv("HRdata.csv")
dataset <- dummy.data.frame(dataset, names = c("sales", "salary"), sep = "_")
dataset <- dataset[-c(15,19)] 

# Columns 15 and 19 represent sales_RandD and salary_high which are removed to prevent the dummy variable trap
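Dropping columns by position (15 and 19) relies on the exact column order, which depends on the factor levels present in the file. As an alternative, here is a base-R sketch that drops the reference dummies by name instead. The helper `one_hot()` and the toy rows are hypothetical stand-ins, not part of the dummies package.

```r
# Hypothetical helper: one-hot encode `cols`, skipping the reference level named in `ref`
one_hot <- function(df, cols, ref) {
  for (col in cols) {
    levels_found <- sort(unique(as.character(df[[col]])))
    for (lvl in setdiff(levels_found, ref[[col]])) {
      df[[paste(col, lvl, sep = "_")]] <- as.integer(df[[col]] == lvl)
    }
    df[[col]] <- NULL  # drop the original categorical column
  }
  df
}

# Toy rows standing in for HRdata.csv (values are made up)
toy <- data.frame(
  satisfaction_level = c(0.38, 0.80, 0.11),
  sales  = c("sales", "RandD", "IT"),
  salary = c("low", "high", "medium"),
  left   = c(1, 0, 1)
)

encoded <- one_hot(toy, cols = c("sales", "salary"),
                   ref = list(sales = "RandD", salary = "high"))
print(names(encoded))
```

Naming the reference levels explicitly makes the dummy-variable-trap fix independent of column positions.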

Training the Logistic Regression Model

Next, I train a Logistic Regression Model and check that it can successfully generate predictions for new data.

 

# Create logistic regression model (family = binomial; the glm() default is gaussian, i.e. linear regression)
glm_model <- glm(left ~ ., data = dataset, family = binomial)

# Generate predictions for new data
newdata <- data.frame(satisfaction_level = 0.5, last_evaluation = 0.5, number_project = 1, average_montly_hours = 160, time_spend_company = 2, Work_accident = 0, promotion_last_5years = 1, sales_accounting = 0, sales_hr = 0, sales_IT = 0, sales_management = 0, sales_marketing = 0, sales_product_mng = 0, sales_sales = 1, sales_support = 0, sales_technical = 0, salary_low = 0, salary_medium = 1)
# type = "response" returns a probability instead of log-odds
prediction <- predict(object = glm_model, newdata = newdata, type = "response")

print(prediction)

In my run, executing the above gave a probability of 0.193, indicating that this employee is at low risk of leaving.
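A logistic regression in R needs `family = binomial` (the default family of `glm()` is gaussian, i.e. ordinary linear regression), and `predict()` on such a model returns log-odds unless `type = "response"` is requested. The sketch below checks the relationship between the two scales on simulated data; the data and variable names are made up for illustration.

```r
set.seed(42)
# Simulated binary outcome driven by one predictor (illustrative only)
sim <- data.frame(x = rnorm(200))
sim$left <- rbinom(200, size = 1, prob = plogis(-1 + 2 * sim$x))

fit <- glm(left ~ x, data = sim, family = binomial)

new_obs <- data.frame(x = c(-1, 0, 1))
log_odds <- predict(fit, newdata = new_obs)                     # default type = "link"
probs    <- predict(fit, newdata = new_obs, type = "response")  # probabilities

# The response scale is the inverse logit (plogis) of the link scale
print(cbind(log_odds, probs, plogis(log_odds)))
```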

Saving the R-Built Model

Since the goal is to use our very own R-built model in Microsoft Azure Machine Learning, we need to be able to use the model without re-running the above code. We run the following code to save our model and all of its parameters:


saveRDS(glm_model, file = "glm_model.rds")

Running this code will save the glm_model.rds file in the active working directory within R-Studio. In this demonstration, the glm_model.rds file is saved to my desktop.

[Screenshot: glm_model.rds saved to the desktop]
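Before moving on, it is worth confirming that the saved file reproduces the model's predictions. The sketch below does this for a toy model fitted to simulated data and writes to a temporary directory rather than the desktop; with the real model, the same comparison applies to `glm_model` and `readRDS("glm_model.rds")`.

```r
set.seed(1)
# Toy model standing in for glm_model (simulated data)
sim <- data.frame(x = rnorm(100))
sim$left <- rbinom(100, 1, plogis(sim$x))
fit <- glm(left ~ x, data = sim, family = binomial)

# Save and reload through an .rds file
path <- file.path(tempdir(), "glm_model.rds")
saveRDS(fit, file = path)
reloaded <- readRDS(path)

# Predictions from the original and reloaded objects should match
new_obs <- data.frame(x = c(-0.5, 0.5))
before <- predict(fit, newdata = new_obs, type = "response")
after  <- predict(reloaded, newdata = new_obs, type = "response")
print(all.equal(before, after))
```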

Creating Package Project Environment

The next couple of steps are crucial in ensuring that we end up with a package file that can be uploaded to Azure ML. The reason for creating this package is to ensure that Azure ML can call on our logistic regression model to generate predictions.

First, we must initialize the package creation process by starting a new project. In R-Studio, this is achieved as follows:

  • Click “File” in the top left corner
  • Click “New Project…” and a pop-up screen will appear

    [Screenshot: “New Project” pop-up]

  • Click “New Directory”

    [Screenshot: new package]

  • Click “R Package”

    [Screenshot: create project]

  • Type in a package name. Here I used “azuremlglm” as my package name. Make sure to create the project folder by setting a project subdirectory. Here, I used my desktop as the location for this project folder.
  • After clicking “Create Project”, a new R-Studio working environment will open with the default R file being “hello.R”. Since I saved my project to my desktop, I also noticed that a new folder was created.

    [Screenshot: new azuremlglm project folder]

  • Now we are set to build our package. Within our package environment in R-Studio, we can close the “hello.R” file and create three new R scripts by pressing Ctrl + Shift + N three times. These three scripts will be needed in the following sections.
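If you prefer a scripted alternative to the point-and-click steps above, base R's `utils::package.skeleton()` can generate a similar package folder (a DESCRIPTION file plus R/ and man/ directories) from objects in the current session. This is a sketch under assumptions, not the workflow used in this post: it writes to a temporary directory and uses placeholder function bodies.

```r
# Placeholder versions of the two functions the package will contain
decision_policy <- function(probability) "placeholder"
prediction_function <- function(newdata) data.frame(prediction = NA)

pkg_root <- tempdir()
utils::package.skeleton(
  name = "azuremlglm",
  list = c("prediction_function", "decision_policy"),
  environment = environment(),  # look up the objects where they were defined
  path = pkg_root,
  force = TRUE                  # overwrite an earlier skeleton in the same location
)

print(list.files(file.path(pkg_root, "azuremlglm"), recursive = TRUE))
```

The generated folder still needs its DESCRIPTION and help files edited before it builds cleanly, which the R-Studio project wizard largely takes care of.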

Filling the Package with Necessary Items

By successfully setting up the package creation environment, we are now free to fill this package with anything that we may find useful in our predictive modelling pipeline. For the purposes of this demonstration, this package will only include the logistic regression model built from the first section, and a function that Azure ML can use to generate predictions.

In the first new script, we write the following R code to add the glm_model.rds file to our package. For this to work, drag the .rds file into the azuremlglm folder, since the R-Studio working directory is that project folder.


# Read the .rds file into the package environment
glm_model_rds <- readRDS("glm_model.rds")

[Screenshot: package working environment]

By reading in the .rds file that contained our logistic regression model into the package environment, we are now free to utilize the model in any way we wish. It is important that we save this script within the project folder. Here, I saved it as glm_model_rds.R as seen on the tab.

[Screenshot: glm_model_rds.R tab]

The Prediction Function

Since the primary use of this package is to utilize the logistic regression model to produce predictions, we need to create a function that takes in a data frame containing new data and outputs a prediction. This is very similar to the prediction verification procedure we did in the first section after building the model and using the predict function on new data.

In the second new R script, we write the following R code:


# Create function that allows Azure ML to generate predictions using logistic regression model

prediction_function <- function(newdata) {
 # type = "response" returns predicted probabilities rather than log-odds
 prediction <- predict(glm_model_rds, newdata = newdata, type = "response")
 return(data.frame(prediction))
}

Here, I saved this script as prediction_function.R.

[Screenshot: prediction function script]
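A quick way to see what Azure ML will receive back is to call a prediction function of this shape on a toy model. In the sketch below, a logistic regression fitted to simulated data stands in for `glm_model_rds`; only the shape of the output (a one-column data frame of probabilities, one row per input row) carries over to the real package.

```r
set.seed(7)
# Simulated data; only two of the HR columns are mimicked here
sim <- data.frame(satisfaction_level = runif(300),
                  number_project = sample(2:7, 300, replace = TRUE))
sim$left <- rbinom(300, 1, plogis(2 - 5 * sim$satisfaction_level))

# Stand-in for the model read from glm_model.rds inside the package
glm_model_rds <- glm(left ~ ., data = sim, family = binomial)

prediction_function <- function(newdata) {
  # type = "response" returns probabilities instead of log-odds
  prediction <- predict(glm_model_rds, newdata = newdata, type = "response")
  return(data.frame(prediction))
}

scored <- prediction_function(
  data.frame(satisfaction_level = c(0.1, 0.9), number_project = c(3, 3))
)
print(scored)
```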

The Decision Function

When Azure ML receives new data and passes it to this function, we expect the resulting predictive web service to return the predicted probability. But what if our decision process requires more than just the predicted probability?

Being able to create your own models and packages for Azure Machine Learning brings considerable added benefits. In many cases, you may want the Azure ML API call to output decisions that follow from the predictions of your machine learning model. Consider the following example:


# Create function with decision policy

decision_policy <- function(probability) {
 if (probability < 0.2) {
  return("Employee is low risk, occasional check-up where necessary.")
 } else if (probability >= 0.2 && probability < 0.6) {
  return("Employee is medium risk, take action in employee retention where necessary.")
 } else {
  return("Employee is high risk, notify upper management to ensure risk is mitigated in work environment.")
 }
}

This decision function takes the logistic regression model’s predicted probability for a new observation and applies a Human Resources policy that meets the organization’s needs. As you can see, instead of a predicted probability, this function recommends an action to take once the predictive model has been used. The result of a predictive model could trigger many different company-wide policies, whatever the industry.
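An if/else chain like decision_policy() evaluates one probability at a time, and since R 4.2 `if` raises an error when its condition has length greater than one, so a request scoring several rows at once would fail. Below is a vectorised sketch using base R's `cut()` with the same thresholds; the name `decision_policy_vec()` is hypothetical.

```r
# Hypothetical vectorised variant of decision_policy(); same thresholds
decision_policy_vec <- function(probability) {
  labels <- c(
    "Employee is low risk, occasional check-up where necessary.",
    "Employee is medium risk, take action in employee retention where necessary.",
    "Employee is high risk, notify upper management to ensure risk is mitigated in work environment."
  )
  # right = FALSE gives the intervals [-Inf, 0.2), [0.2, 0.6) and [0.6, Inf)
  band <- cut(probability, breaks = c(-Inf, 0.2, 0.6, Inf),
              labels = FALSE, right = FALSE)
  labels[band]
}

print(decision_policy_vec(c(0.05, 0.2, 0.59, 0.6)))
```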

Here’s another example, from the alternative business-financing industry. A business owner’s predicted probability of risk can trigger different loan-product pricing policies and different employee actions. If the Azure ML API call can output a series of policies and rules, there is huge value in automating decision processes so that loans are approved or rejected faster.

Creating your own models in R and including decision policies within your R packages could be the solution to an automated decision process within any organization.

Now, back to package creation. Given the newly created decision_policy function, we need to update our prediction_function so that it implements these new policies.


# Create function that allows Azure ML to generate decisions using the logistic regression model

prediction_function <- function(newdata) {
 prediction <- predict(glm_model_rds, newdata = newdata, type = "response")
 return(data.frame(decision_policy(prediction)))
}

With the prediction function and decision function ready to go, it is important that we run these functions so that they are saved within the package environment.

It is also important that we save this R script within the package folder. Here, I saved the decision function as decision_policy.R and re-saved prediction_function.R, as shown in the tabs.

[Screenshot: decision function script]

[Screenshot: updated prediction function script]
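Returning only the decision text discards the probability, which is often worth logging alongside the action. Here is a minimal sketch of a variant that returns both, applying decision_policy() row by row with `vapply()` so that multi-row requests also work. The function name `prediction_with_decision()` and the toy model are hypothetical.

```r
set.seed(11)
# Toy stand-in for glm_model_rds (simulated data)
sim <- data.frame(satisfaction_level = runif(300))
sim$left <- rbinom(300, 1, plogis(2 - 5 * sim$satisfaction_level))
glm_model_rds <- glm(left ~ satisfaction_level, data = sim, family = binomial)

decision_policy <- function(probability) {
  if (probability < 0.2) {
    "Employee is low risk, occasional check-up where necessary."
  } else if (probability < 0.6) {
    "Employee is medium risk, take action in employee retention where necessary."
  } else {
    "Employee is high risk, notify upper management to ensure risk is mitigated in work environment."
  }
}

# Hypothetical variant: return the probability and the decision for every row
prediction_with_decision <- function(newdata) {
  probability <- unname(predict(glm_model_rds, newdata = newdata, type = "response"))
  data.frame(
    probability = probability,
    decision = vapply(probability, decision_policy, character(1)),
    stringsAsFactors = FALSE
  )
}

res <- prediction_with_decision(data.frame(satisfaction_level = c(0.05, 0.5, 0.95)))
print(res)
```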

Once these three separate R scripts are saved, we are ready to build and save our package. To build and save the package, we do the following:

  • Click the “Build” tab in the top right corner

    [Screenshot: Build tab]

  • Click “Build & Reload”

    [Screenshot: Build & Reload]

  • Verify that the package was built and saved by going to the R library folder, “R/win-library/3.3”. Here, my library is saved in my Documents folder.

    [Screenshot: azuremlglm folder in the R library]

  • With the package folder from above, you want to create a .zip file of it. You can do this by right-clicking the folder, going to “Send to”, then selecting “Compressed (zipped) folder”. This creates a .zip file of your package. Do not rename this .zip file. I also dragged this .zip file to my desktop.

    [Screenshot: azuremlglm .zip file]

  • This next step is extremely important. With the newly saved .zip file, you want to create ANOTHER .zip file of it. This is needed because of the way Azure ML reads package files: Azure ML extracts the uploaded .zip when it is connected to an Execute R Script module, so the package itself has to be a .zip inside that .zip. This time, I named the new .zip file “2_azuremlglm”. You should now have a .zip file that contains a .zip file that contains the actual azuremlglm package folder. You can delete the first .zip file created in the previous step, as it is no longer needed.

    [Screenshot: 2_azuremlglm .zip file]

  • This is the resulting package file that will be uploaded to Azure ML.

Creating a Predictive Web Service in Microsoft Azure Machine Learning

We are in the final stretch of the implementation process! This last section will describe how to configure Microsoft Azure Machine Learning to utilize our logistic regression model and decision rules.

Uploading the Package File and Creating a New Experiment

Once we have logged in, we want to do the following steps:

  • Click the “NEW” button in the bottom left corner, click “DATASET”, and then click “FROM LOCAL FILE” as shown

    [Screenshot: NEW > DATASET > FROM LOCAL FILE]

  • Upload the .zip file created from the previous section

    [Screenshot: uploading the .zip file]

  • When the upload is successful, you should receive the following message at the bottom of the screen

    [Screenshot: upload complete message]

  • Next, we create a new blank experiment. We do this by clicking the “NEW” button at the bottom left corner again, click “EXPERIMENT”, and then click the first option “Blank Experiment”

    [Screenshot: Blank Experiment]

  • Now we are ready to configure our Azure Machine Learning experiment

    [Screenshot: Azure ML experiment environment]

Setting up the Experiment Platform

In order for Microsoft Azure Machine Learning to utilize our logistic regression model, we need to set up the platform in such a way that it knows to take in new data inputs and produce prediction outputs. We accomplish this with the following layout.

[Screenshot: experiment layout]

  • The Execute R Script module on the left defines the schema of the inputs and connects to the first input node of the second Execute R Script module. The code entered in this module is as follows:

    [Screenshot: schema script]

  • A module for the 2_azuremlglm.zip package is placed so that the package can be installed within the Azure ML environment. This module connects to the third input node of the second Execute R Script module.
  • The Execute R Script module in the center is where we use the logistic regression model package. This module contains the following code:

    [Screenshot: package-reading script]

  • Once all of the above are satisfied, we are ready to deploy the predictive web service.
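The code for both Execute R Script modules appeared only as screenshots. Here is a sketch of what such scripts typically look like, assuming the Azure ML Studio (classic) conventions: `maml.mapInputPort()` and `maml.mapOutputPort()` for the ports, and a .zip on the Script Bundle (third) port being extracted into a `src/` folder. The `exists()` guards are an addition so that the sketch also runs outside Azure ML. In the studio, each half goes in its own module.

```r
# --- Left-hand Execute R Script: defines the input schema (one example row) ---
data.set <- data.frame(
  satisfaction_level = 0.5, last_evaluation = 0.5, number_project = 1,
  average_montly_hours = 160, time_spend_company = 2, Work_accident = 0,
  promotion_last_5years = 1, sales_accounting = 0, sales_hr = 0, sales_IT = 0,
  sales_management = 0, sales_marketing = 0, sales_product_mng = 0,
  sales_sales = 1, sales_support = 0, sales_technical = 0,
  salary_low = 0, salary_medium = 1
)
# maml.mapOutputPort() only exists inside Azure ML Studio (classic)
if (exists("maml.mapOutputPort")) maml.mapOutputPort("data.set")

# --- Centre Execute R Script: installs the package and scores the input ---
if (exists("maml.mapInputPort")) {
  newdata <- maml.mapInputPort(1)
  # Assumed location: the inner azuremlglm.zip, extracted from 2_azuremlglm.zip into src/
  install.packages("src/azuremlglm.zip", lib = ".", repos = NULL, verbose = TRUE)
  library(azuremlglm, lib.loc = ".")
  data.set <- prediction_function(newdata)
  maml.mapOutputPort("data.set")
}
```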

Deploying the Predictive Web service

  • At the bottom of the screen, we deploy the web service by clicking “DEPLOY WEB SERVICE”, then clicking “Deploy Web service (classic)”.

    [Screenshot: DEPLOY WEB SERVICE menu]

  • Azure ML will then automatically add the Web service input and Web service output modules to the appropriate nodes as follows

    [Screenshot: layout with Web service input and output modules]

  • The Web service output was automatically connected to the second output node of the Execute R Script module. We actually want it connected to the first output node of the Execute R Script module, as shown:

    [Screenshot: Web service output connected to the first output node]

  • Click the “RUN” button at the bottom of the screen to verify the web service
  • Click the “DEPLOY WEB SERVICE” button once again, and select “Deploy Web service (Classic)”. The following page will show up:

    [Screenshot: web service API page]

  • Finally, we are able to test that our predictive model works by clicking the blue “Test” button in the “REQUEST/RESPONSE” row.

    [Screenshot: entering data to predict]

  • After confirming the test, we should get the following result

    [Screenshot: test output]

  • This confirms that our predictive model works and all decision policies have been correctly implemented. The API is ready to go and can be consumed by external applications.

Further Considerations

Throughout this post, I showcased the process of implementing a simple predictive model using R and Microsoft Azure Machine Learning. Of course, there are more direct ways of building predictive models, such as using the Azure ML platform itself to train, validate and test machine learning models, or writing all of the R code directly in an Execute R Script module.

I want to emphasize that, although the process outlined here may seem less efficient to build and carry out, I think it offers a good way to organize and automate decision pipelines. By building R packages that can then be uploaded to Azure ML, we are able to implement many decision rules within the package. For example, an organization may choose to implement several product pricing rules or internal decision policies based on what the predictive model outputs. There is plenty of room to automate these decisions for faster turnaround of work. Creating packages also gives us the ability to train, validate, and test more complex machine learning models and save their results accordingly. I am sure there are plenty of other situations in which building your own machine learning R packages and uploading them to Azure ML is highly beneficial.

In the future, I plan to apply this process to more complex machine learning models rather than a simple logistic regression. I also want to learn more about software application development, as this is clearly not the end of the data science pipeline. Since Azure ML produces an API, it would be nice to see the full extent of this pipeline by consuming the API from applications I build myself. Finally, the important takeaways from this post are the ability to organize and automate an operational data science pipeline and the thought process behind automating company-related decisions.
