Patient flow in a hospital is a complex, interrelated process affected by many critical factors. Analyzing and documenting delays is a time-consuming, manual task.
The root cause analysis of the patient flow delays in this example was developed over a six-month period by a cross-functional, multidisciplinary team of physicians, nurses, care management, transport, housekeeping, registration, finance, and information technology staff. The team was tasked with determining the reasons for delays in inpatient admissions originating from the emergency department. To do this, it analyzed each step of the process for the 20 longest admissions each day.
However, this investment of time can be leveraged by using a machine-learning model to predict future bottlenecks. Eliminating these bottlenecks will improve key performance indicators: fewer patients who leave the emergency department (ED) without being seen (LWBS), a shorter length of stay (LOS) for inpatients, and higher patient satisfaction. Patient satisfaction is hard to measure but invaluable, while improvements in LWBS and LOS translate into better patient care and increased revenue.
The process begins when a patient presents in the emergency department and ends when they are checked into an inpatient bed. The figure on the left lists the steps in the process, with each step's definition and the data source chosen for it.
The data for the model was extracted from two hospital systems: the emergency department information system (Tsystem) and the patient transport system (TeleTracking).
DayOfWeek is derived from the date the patient presented at the emergency department. The day of the week is a significant variable because ED volume may be higher on certain days, which can contribute to delays in care.
TotalTime is calculated by adding EDTotalTime and BedRequestTotalTime.
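A minimal sketch of the two derived fields follows. It assumes each record carries the ED arrival date and both durations in minutes, although the source does not state the units. The `AdmissionRecord` type and its property names are hypothetical. Only the field names in the comments come from the text. Weekday numbering follows Foundation's `Calendar`, where 1 is Sunday, and the sample values are made up.

```swift
import Foundation

// Hypothetical record type; the durations are assumed to be in minutes.
struct AdmissionRecord {
    let edArrival: Date              // when the patient presented at the ED
    let edTotalTime: Double          // EDTotalTime
    let bedRequestTotalTime: Double  // BedRequestTotalTime

    // TotalTime = EDTotalTime + BedRequestTotalTime
    var totalTime: Double { edTotalTime + bedRequestTotalTime }

    // DayOfWeek from the ED arrival date, using Foundation's numbering (1 = Sunday ... 7 = Saturday).
    func dayOfWeek(in calendar: Calendar) -> Int {
        calendar.component(.weekday, from: edArrival)
    }
}

// A fixed Gregorian calendar in UTC keeps the result independent of the machine's settings.
var utcCalendar = Calendar(identifier: .gregorian)
utcCalendar.timeZone = TimeZone(secondsFromGMT: 0)!

// Made-up example: Monday, January 7, 2019 at 14:00 UTC.
let arrival = utcCalendar.date(from: DateComponents(year: 2019, month: 1, day: 7, hour: 14))!
let record = AdmissionRecord(edArrival: arrival, edTotalTime: 240, bedRequestTotalTime: 95)
print(record.dayOfWeek(in: utcCalendar), record.totalTime) // 2 335.0
```

Pinning the time zone matters because a late-evening arrival can fall on a different weekday depending on the zone used. The same convention should therefore be applied to every record.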
Selecting the data to build your model is subjective. Other data were available, but these are the core data elements I determined were necessary to build the model.
Download the model training data by clicking the button below or create your own. The data is in Excel format and must be converted to a comma-delimited (.csv) file before it can be used in the Swift Playground in the next section to build the model.
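Before you load the converted file into Create ML, it can help to confirm that the export kept the header row, the "Result" column, and every record. The plain-Swift helper below, `summarizeCSV`, is a hypothetical sketch. It assumes a header on the first line and no commas inside field values. The sample column names are placeholders, not the real file's full header.

```swift
// Hypothetical helper: summarizes a CSV export before handing it to Create ML.
// Assumes the first line is a header and no field contains an embedded comma.
struct CSVSummary {
    let recordCount: Int
    let featureColumns: [String]
    let hasResultColumn: Bool
}

func summarizeCSV(_ text: String, targetColumn: String = "Result") -> CSVSummary {
    // Split into lines (handles "\n" and "\r\n"); blank lines are dropped.
    let lines = text.split(whereSeparator: \.isNewline).map(String.init)
    guard let header = lines.first else {
        return CSVSummary(recordCount: 0, featureColumns: [], hasResultColumn: false)
    }
    let columns = header.split(separator: ",", omittingEmptySubsequences: false).map(String.init)
    return CSVSummary(
        recordCount: lines.count - 1,                       // every line after the header
        featureColumns: columns.filter { $0 != targetColumn },
        hasResultColumn: columns.contains(targetColumn)
    )
}

let sample = """
DayOfWeek,EDTotalTime,BedRequestTotalTime,Result
2,240,95,1
5,180,60,2
"""
let summary = summarizeCSV(sample)
print(summary.recordCount, summary.featureColumns.count, summary.hasResultColumn) // 2 3 true
```

For the training file described here, you would expect 345 records and 17 feature columns. To check a real file, read it into a string first, for example with Foundation's `String(contentsOfFile:encoding:)`.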
There are 345 training records in this file. We are developing a supervised learning model. Supervised learning is the machine-learning task of learning a function that maps an input to an output based on example input-output pairs. In this case, the inputs (the time steps) come from two hospital systems, Tsystem and TeleTracking. The outputs (the reasons for delay) were provided by the cross-functional, multidisciplinary team of subject matter experts described above.
The model uses 17 fields to determine the best "Result". The time values in these fields have been converted to numeric values. "Result" is a pointer to the full admission training record for each patient, without any personally identifiable information.
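The source does not describe how the time values were converted. One plausible approach is shown below as a hypothetical sketch: it turns an elapsed time written as "H:MM" into total minutes. Both the input format and the `minutes(fromDuration:)` function are assumptions.

```swift
// Hypothetical conversion of an elapsed time written as "H:MM" (e.g. "4:05") to total minutes.
// Returns nil for anything that is not a valid non-negative duration.
func minutes(fromDuration text: String) -> Int? {
    let parts = text.split(separator: ":", omittingEmptySubsequences: false)
    guard parts.count == 2,
          let hours = Int(parts[0]), let mins = Int(parts[1]),
          hours >= 0, (0..<60).contains(mins) else {
        return nil
    }
    return hours * 60 + mins
}

for value in ["4:05", "0:45", "12:30", "abc"] {
    // Prints the converted minutes, or "invalid" when the text cannot be parsed.
    print(value, minutes(fromDuration: value).map { String($0) } ?? "invalid")
}
```

Returning `nil` instead of a default value makes malformed cells visible. Otherwise they would silently become zeros in a feature column.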
We are going to use Apple's Create ML framework to build our model in a Swift playground. You will need a Mac with Xcode. Create ML builds the model, and the Core ML framework integrates the trained model into your app.
A model is the result of applying a machine-learning algorithm to the training data. The first step in building the model with Create ML is selecting the best algorithm for the data.
Create a Swift Playground in Xcode on your Mac. The code below determines the best machine-learning algorithm for the model. Copy it into your Swift Playground and run it.
//
// PatientFlowModel.playground
//
import Foundation
import CreateML

// Create a data table from a CSV file (update the path to where you saved the file).
let trainingCSV = URL(fileURLWithPath: "/Users/jburke/Developer/Machine Learning/PatientFlow/Data/Training Data/Top20Data-Training-V2.csv")
let top20Data = try MLDataTable(contentsOf: trainingCSV)
// Randomly split the data: 80% for training, 20% held back to test the model.
let (trainingData, testData) = top20Data.randomSplit(by: 0.8, seed: 0)
// Create the model.
// The targetColumn "Result" is a sequential integer that points to the
// full time sequence of the training record, for comparison to the test record.
// Note: the classifier is trained on the full top20Data table, so testData overlaps data the model has already seen.
let patientFlowML = try MLClassifier(trainingData: top20Data, targetColumn: "Result")
// Evaluate the model.
let metrics = patientFlowML.evaluation(on: testData)
let trainingMetrics = patientFlowML.trainingMetrics
let validationMetrics = patientFlowML.validationMetrics
// Save the model in the ML Model directory.
let outputURL = URL(fileURLWithPath: "/Users/jburke/Developer/Machine Learning/PatientFlow/ML Model/PatientFlowML.mlmodel")
let modelMetadata = MLModelMetadata(author: "John Burke",
                                    shortDescription: "MLLogisticRegressionClassifier from patient flow dataset /Users/jburke/Developer/Machine Learning/PatientFlow/Data/Training Data/Top20Data-Training-V2.csv",
                                    license: nil,
                                    version: "2.0",
                                    additional: nil)
try patientFlowML.write(to: outputURL, metadata: modelMetadata)
MLClassifier trains several candidate classifiers on the tabular training data in our CSV file and picks the best one for the data. As the output below shows, it stepped through boosted trees, random forest, decision tree, and logistic regression classifiers.
The following is the output from running the code in the Swift Playground:
column_type_hints = {}
Finished parsing file /Users/jburke/Developer/Machine Learning/PatientFlow/Data/Training Data/Top20Data-Training-V2.csv
Parsing completed. Parsed 100 lines in 0.0292 secs.
Finished parsing file /Users/jburke/Developer/Machine Learning/PatientFlow/Data/Training Data/Top20Data-Training-V2.csv
Parsing completed. Parsed 345 lines in 0.006635 secs.
Using 17 features to train a model to predict Result.
Automatically generating validation set from 5% of the data.
Boosted trees classifier:
--------------------------------------------------------
Number of examples : 322
Number of classes : 322
Number of feature columns : 17
Number of unpacked features : 17
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1 | 0.316986 | 0.158385 | nan | 5.446630 | nan |
| 2 | 0.632418 | 0.440994 | nan | 5.120226 | nan |
| 3 | 0.951824 | 0.695652 | nan | 4.795928 | nan |
| 4 | 1.290456 | 0.838509 | nan | 4.473857 | nan |
| 5 | 1.657392 | 0.906832 | nan | 4.153214 | nan |
| 10 | 3.559111 | 0.993789 | nan | 2.567377 | nan |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
Random forest classifier:
--------------------------------------------------------
Number of examples : 322
Number of classes : 322
Number of feature columns : 17
Number of unpacked features : 17
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1 | 0.293015 | 0.083851 | nan | 4.958216 | nan |
| 2 | 0.573376 | 0.167702 | nan | 4.963799 | nan |
| 3 | 0.859441 | 0.251553 | nan | 4.955386 | nan |
| 4 | 1.139366 | 0.260870 | nan | 4.956903 | nan |
| 5 | 1.420277 | 0.282609 | nan | 4.955885 | nan |
| 10 | 2.837098 | 0.434783 | nan | 4.944054 | nan |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
Decision tree classifier:
--------------------------------------------------------
Number of examples : 322
Number of classes : 322
Number of feature columns : 17
Number of unpacked features : 17
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1 | 0.315552 | 0.158385 | nan | 4.719383 | nan |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
Logistic regression:
--------------------------------------------------------
Number of examples : 322
Number of classes : 322
Number of feature columns : 17
Number of unpacked features : 17
Number of coefficients : 11877
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes | Step size | Elapsed Time | Training Accuracy | Validation Accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Warning: Reached max step size.
| 0 | 5 | 25.000000 | 0.031883 | 0.195652 | 0.000000 |
| 1 | 10 | 1.941160 | 0.048351 | 0.512422 | 0.000000 |
| 2 | 13 | 0.504503 | 0.056284 | 0.726708 | 0.000000 |
| 3 | 15 | 0.344625 | 0.062461 | 0.785714 | 0.000000 |
| 4 | 16 | 0.430781 | 0.066019 | 0.819876 | 0.000000 |
| 9 | 21 | 1.000000 | 0.087242 | 0.919255 | 0.000000 |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Trained model successfully saved at /Users/jburke/Developer/Machine Learning/PatientFlow/ML Model/PatientFlowML.mlmodel.
Congratulations, you just created your first machine-learning model using Apple's Create ML tools. MLClassifier identified logistic regression as the best algorithm for the patient flow machine-learning model. Logistic regression is a supervised classification algorithm that estimates the probability that a record belongs to each class of the target variable.
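Each "Result" value is its own class, and the log reports as many classes as examples. That makes this classifier a multinomial logistic regression. Each class gets a linear score computed from the features, and a softmax turns the scores into probabilities that sum to 1. The hypothetical `ToyLogisticModel` below uses made-up weights for two features and three classes to illustrate that final step only. It does not reflect how Create ML stores or exposes its coefficients.

```swift
import Foundation

// Toy multinomial (softmax) logistic regression: one weight vector and one bias per class.
// The weights used below are made up for illustration only.
struct ToyLogisticModel {
    let weights: [[Double]]  // weights[k][j]: weight of feature j for class k
    let biases: [Double]     // biases[k]: intercept for class k

    // Returns P(class k | features) for every class k; the values sum to 1.
    func probabilities(for features: [Double]) -> [Double] {
        // Linear score per class: bias + sum of weight * feature.
        let scores = zip(weights, biases).map { (w, b) in
            zip(w, features).reduce(b) { $0 + $1.0 * $1.1 }
        }
        // Subtracting the largest score before exponentiating avoids overflow.
        let maxScore = scores.max() ?? 0
        let exps = scores.map { exp($0 - maxScore) }
        let total = exps.reduce(0, +)
        return exps.map { $0 / total }
    }
}

let model = ToyLogisticModel(
    weights: [[0.02, -0.5], [0.01, 0.3], [-0.01, 0.1]],
    biases: [0.0, -1.0, 0.5]
)
let p = model.probabilities(for: [240, 1])
// The predicted class is the one with the highest probability.
let predicted = p.indices.max(by: { p[$0] < p[$1] })!
print(predicted) // 0
```

The highest probability is what an app would typically show as the prediction's confidence. With hundreds of classes and few examples per class, that value can be small even when the predicted class is correct.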
The training accuracy of the new model is 91.9% (0.919255 in the log, or 296 of the 322 training examples). While this is good, the model's predictions will not carry a high confidence level because we have only 345 training records. The next step is to improve the training accuracy by tuning the MLLogisticRegressionClassifier model parameters.
The model reached this training accuracy by iteration 9 in the log. We are going to raise the maximum iteration count to 500 through the model parameters and assess the impact on training accuracy. Insert the following code before the model-creation statement:
let modelParameters = MLLogisticRegressionClassifier.ModelParameters(
    validation: .split(strategy: .automatic),
    maxIterations: 500,
    l1Penalty: 0.0,
    l2Penalty: 0.01,
    stepSize: 1.0,
    convergenceThreshold: 0.01,
    featureRescaling: true)
//create Model
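Raising maxIterations only removes a cap. Training can still end earlier if the optimizer converges. The sketch below illustrates the two stopping rules that iterative optimizers commonly use: a hard limit on iterations, and an early stop once progress falls below a threshold. It uses plain gradient descent on a one-variable function, not the L-BFGS optimizer that Create ML reports in its log. The exact quantity Create ML compares against `convergenceThreshold` may differ, and the `minimize` function is hypothetical.

```swift
// Illustration only: plain gradient descent on f(x) = (x - 3)^2, NOT Create ML's L-BFGS.
// Two stopping rules: a hard cap on iterations, and an early stop once an
// update changes x by less than convergenceThreshold.
func minimize(start: Double, stepSize: Double, maxIterations: Int,
              convergenceThreshold: Double) -> (x: Double, iterations: Int) {
    precondition(maxIterations > 0, "maxIterations must be positive")
    var x = start
    for iteration in 1...maxIterations {
        let gradient = 2 * (x - 3)                 // derivative of (x - 3)^2
        let next = x - stepSize * gradient         // gradient-descent update
        if abs(next - x) < convergenceThreshold {  // change is tiny: treat as converged
            return (next, iteration)
        }
        x = next
    }
    return (x, maxIterations)                      // stopped by the iteration cap
}

let capped = minimize(start: 0, stepSize: 0.1, maxIterations: 10, convergenceThreshold: 0.01)
let converged = minimize(start: 0, stepSize: 0.1, maxIterations: 500, convergenceThreshold: 0.01)
print(capped.iterations, converged.iterations) // 10 20
```

With a cap of 10, the run stops while still making progress. With a cap of 500, it stops on its own after 20 iterations, far below the limit.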
Next, modify the model-creation statement to call MLLogisticRegressionClassifier directly and pass it the new model parameters. Copy the following code into your Swift Playground, replacing the previous model-creation statement.
// Create the model.
// The targetColumn "Result" is a sequential integer that points to the full time sequence
// of the training record, for comparison to the test record.
let patientFlowML = try MLLogisticRegressionClassifier(trainingData: top20Data, targetColumn: "Result", featureColumns: nil, parameters: modelParameters)
Run the code in your Swift Playground to create an updated model. Let's look at the output to see whether the accuracy improved.
column_type_hints = {}
Finished parsing file /Users/jburke/Developer/Machine Learning/PatientFlow/Data/Training Data/Top20Data-Training-V2.csv
Parsing completed. Parsed 100 lines in 0.036646 secs.
Finished parsing file /Users/jburke/Developer/Machine Learning/PatientFlow/Data/Training Data/Top20Data-Training-V2.csv
Parsing completed. Parsed 345 lines in 0.005321 secs.
Using 17 features to train a model to predict Result.
Logistic regression:
--------------------------------------------------------
Number of examples : 317
Number of classes : 317
Number of feature columns : 17
Number of unpacked features : 17
Number of coefficients : 11376
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes | Step size | Elapsed Time | Training Accuracy | Validation Accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Warning: Reached max step size.
| 0 | 5 | 25.000000 | 0.012445 | 0.198738 | 0.000000 |
| 1 | 10 | 1.902339 | 0.024261 | 0.369085 | 0.000000 |
| 2 | 13 | 0.561964 | 0.031674 | 0.652997 | 0.000000 |
| 3 | 15 | 0.369213 | 0.036885 | 0.668770 | 0.000000 |
| 4 | 16 | 0.461516 | 0.040503 | 0.823344 | 0.000000 |
| 9 | 21 | 1.000000 | 0.060122 | 0.962145 | 0.000000 |
| 49 | 86 | 0.991060 | 0.305215 | 0.996845 | 0.000000 |
| 99 | 143 | 0.909512 | 0.547187 | 0.996845 | 0.000000 |
| 370 | 496 | 0.679607 | 1.859748 | 1.000000 | 0.000000 |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Trained model successfully saved at /Users/jburke/Developer/Machine Learning/PatientFlow/ML Model/PatientFlowML.mlmodel.
By increasing maxIterations to 500, we improved the model's training accuracy from 91.9% to 100% by iteration 370. Remember that the model is being checked against only 5% of the data we supplied to build it (about 17 records). A 100% training accuracy therefore does not mean the confidence level of the model's predictions will not fluctuate.
One of the key lessons learned from this project is the relationship between the amount of training data and model accuracy. More training data improves accuracy and raises the confidence level of the model's predictions: the more training data you have, the better your model will be.
The machine-learning model is now ready to use in an app.
This video demonstrates how to add the machine-learning model to your iOS app.
The test data is a collection of patients who experienced an admission delay for an unknown reason. We are going to use the machine-learning model we built to predict the reason for each delay.
It is important that no patients from the training data are included in the test set.
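A minimal sketch of enforcing that rule follows. It assumes each record has a de-identified patient key. The `Patient` type and `patientID` field are hypothetical, because the source does not say which field identifies a patient.

```swift
// Hypothetical de-identified key; the real data's identifier field is not specified in the source.
struct Patient {
    let patientID: String
}

// Returns the test patients whose IDs never appear in the training data.
func removingTrainingPatients(from testPatients: [Patient], training trainingPatients: [Patient]) -> [Patient] {
    let trainingIDs = Set(trainingPatients.map(\.patientID))  // O(1) membership checks
    return testPatients.filter { !trainingIDs.contains($0.patientID) }
}

let trainingPatients = [Patient(patientID: "A1"), Patient(patientID: "B2")]
let testPatients = [Patient(patientID: "B2"), Patient(patientID: "C3")]
let cleanTestSet = removingTrainingPatients(from: testPatients, training: trainingPatients)
print(cleanTestSet.map(\.patientID)) // ["C3"]
```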
Select a test patient to invoke the machine-learning model and predict the admission delay reason. The confidence level reported with the result indicates how strongly the model favors that prediction.
The result is a numeric pointer to the training data record containing the details and the reason for the admission delay.
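The following sketch shows how an app might turn a prediction into the linked training record. It assumes the model's output provides a dictionary of class probabilities keyed by "Result" value. Core ML classifier models can expose such a dictionary alongside the predicted label. It also assumes the training records are available in a lookup table. `TrainingRecord`, `explainPrediction`, and the sample delay reasons are hypothetical.

```swift
// Hypothetical training record: the "Result" number and the delay reason assigned by the review team.
struct TrainingRecord {
    let result: Int64
    let delayReason: String
}

// Picks the most probable "Result" and looks up the training record it points to.
// Returns nil if there are no probabilities or the result has no matching record.
func explainPrediction(probabilities: [Int64: Double],
                       records: [Int64: TrainingRecord]) -> (record: TrainingRecord, confidence: Double)? {
    guard let best = probabilities.max(by: { $0.value < $1.value }),
          let record = records[best.key] else {
        return nil
    }
    return (record, best.value)
}

let records: [Int64: TrainingRecord] = [
    17: TrainingRecord(result: 17, delayReason: "Example reason A"),
    42: TrainingRecord(result: 42, delayReason: "Example reason B")
]
if let match = explainPrediction(probabilities: [17: 0.2, 42: 0.7], records: records) {
    print(match.record.result, match.record.delayReason, match.confidence) // 42 Example reason B 0.7
}
```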
The detail view shows a side-by-side comparison of the test patient and the training patient record that the machine-learning model predicted.
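The side-by-side view can be sketched as a per-step comparison of durations between the test patient and the matched training record. The `StepComparison` type, the `compareSteps` function, the "TransportTime" step, and all durations below are hypothetical placeholders. Minutes are assumed as the unit.

```swift
// Hypothetical side-by-side comparison of step durations (in minutes) between a test patient
// and the training record the model matched. Step names and values are placeholders.
struct StepComparison {
    let step: String
    let testMinutes: Double
    let trainingMinutes: Double
    // Positive when the test patient spent longer in this step than the matched training patient.
    var difference: Double { testMinutes - trainingMinutes }
}

// Compares only the steps present in both records, sorted by name for a stable display order.
func compareSteps(test: [String: Double], training: [String: Double]) -> [StepComparison] {
    let sharedSteps = Set(test.keys).intersection(training.keys).sorted()
    return sharedSteps.map { step in
        StepComparison(step: step, testMinutes: test[step]!, trainingMinutes: training[step]!)
    }
}

let rows = compareSteps(
    test: ["EDTotalTime": 300, "BedRequestTotalTime": 120],
    training: ["EDTotalTime": 240, "BedRequestTotalTime": 150, "TransportTime": 20]
)
for row in rows {
    print(row.step, row.testMinutes, row.trainingMinutes, row.difference)
}
```

Showing the per-step difference highlights where the test patient's timeline diverges from the matched record. This is the kind of detail that helps a reviewer judge whether the predicted delay reason fits.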
This view allows you to further evaluate the prediction and reason for the admission delay.