Justin Smith, PhD, Director of Data Analytics, Sanford Health
Let us step away from reality for a moment and embark upon a thought experiment. Suppose you have a business problem to solve and your data is perfectly curated and readily available. Your data science team loads the data and goes to work. They train, test, and tune multiple machine learning algorithms to build models, one of which produces an outstanding predictive result. Armed with this new information, you begin to make decisions that directly impact your business. With this nearly infinite supply of predictive output, how do you and your team decide which results warrant action?
Presently, there is no indication that we will collect less data in the future. In all aspects of life, sensors are smaller, less expensive to produce, and generate large volumes of data. Consider, for example, that a Boeing 787 aircraft generates over half a terabyte of data per flight. As this enormous growth in data and connectivity is paired with machine-learning-driven probabilistic decision-making, humans will increasingly be tasked with deciding whether or not to act.
Continuing with the aviation example: when data collected from in-flight sensors (and machine learning techniques to process those data) show that a certain mechanical part is due for service, replacing it is an easy choice.
What threshold marks the algorithmic “must maintain” moment, however, is still up for debate. If the part has a 1% chance of failure, should it be replaced? What if the chance of failure is 20%? Determining the threshold for action or inaction must be done proactively, as part of the development process, prior to the moment of decision-making. Machine learning lets us know with increasing specificity when an event may occur, and as its use permeates further into daily life, leaders must understand that once something is known, it cannot be unknown.
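One way to set that threshold proactively is to weigh the cost of acting against the expected cost of not acting. Here is a minimal sketch in Python, assuming hypothetical figures (not from this article) of $10,000 to replace the part and $2,000,000 for an in-service failure:

```python
# A minimal sketch of setting an action threshold before deployment.
# The dollar figures below are hypothetical assumptions for illustration.

COST_REPLACEMENT = 10_000    # cost of proactively replacing the part
COST_FAILURE = 2_000_000     # cost of an in-service failure

# Expected-cost reasoning: act whenever the expected cost of doing
# nothing (p * COST_FAILURE) exceeds the cost of acting.
ACTION_THRESHOLD = COST_REPLACEMENT / COST_FAILURE  # 0.005, i.e., 0.5%

def decide(failure_probability: float) -> str:
    """Turn a model's predicted failure probability into an action."""
    if failure_probability >= ACTION_THRESHOLD:
        return "replace"
    return "keep in service"

for p in (0.01, 0.20):
    print(f"{p:.0%} chance of failure -> {decide(p)}")
```

Under these assumed costs, both the 1% and the 20% predictions cross the threshold. The point is that the threshold falls out of a deliberate cost analysis performed during development, not out of the model itself.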
Rarely, if ever, is data perfectly ready for data scientists to start applying machine learning algorithms; most of the time some cleaning and reorganizing, or “data munging,” is required. One trend that has dramatically increased the speed of development is the availability of “out of the box” algorithms. Previously, bleeding-edge algorithms were readily accessible only in academic or tech-giant research laboratories. The cultural shift toward making much of machine learning open source has helped decrease the cost of implementation. In addition to the tools being open to anyone with an internet connection, high-quality machine learning training has become accessible too. In short, predictions are cheap (the sketch after this section makes the point concrete).

Open source tools and training, married with open source data, lead one to think there is an army of machine learning experts (engineers, data scientists, citizens) solving pressing issues and making meaningful predictions that shape the world. While this is partially true, there remains a technical barrier, akin to watching a non-tech native use a smartphone. Because the formal training required to enter this field varies so widely, we are witnessing a blurry dividing line between individuals who create ethical, well-thought-out models with maintenance plans and those who create flash-in-the-pan models without longevity that tell leadership what it wants to hear. The near-future state of machine learning implores leaders to consider not only the quality of predictive models, but also the consequences of knowing an event has a certain probability of occurring.
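To make “predictions are cheap” concrete, here is a minimal sketch using scikit-learn, one illustrative open source library among many (the article names no specific tools), with one of its bundled open datasets standing in for curated business data:

```python
# A minimal sketch of how little code an "out of the box" model now takes.
# scikit-learn is one illustrative open source choice among many.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A bundled open dataset stands in for "perfectly curated" business data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a capable default model in a single line.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.2f}")

# The model also emits the per-example probabilities whose action
# thresholds leaders must decide upon in advance.
print(model.predict_proba(X_test[:3]))
```

The brevity is the point: nothing in these few lines encodes the thinking about data quality, ethics, or long-term maintenance that separates the two kinds of model builders described above.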