The Machine Takeover
We’ve been using machines for longer than we’ve even had a word for them. They’ve been wheelbarrows, spinning wheels, wagons, even abaci. When you break it down, machines are essentially different technologies that we, as the human race, have created to help make our day-to-day lives a little easier. It wasn’t actually until fairly recently that we began to associate “machines” with “electronics.” And when we do talk about machines today, we almost exclusively talk about them in terms of automobiles or electronics we use in our daily lives.
The use of machines as a replacement for people in the workplace is a growing trend, and in the world of business and data, it’s become crucial to business analysis. Every day, more and more business analysts are turning to something called automated machine learning to help them sift through mounds of data relevant to their business models.
What is automated machine learning?
Automated machine learning is the method of building a system, process, or application for the purpose of automatically creating, maintaining, and testing machine learning models with the use of as little human input as possible.
Okay, but what is machine learning?
Machine learning is the ability of a machine to make predictions based on patterns in data. It rests on the idea that computers don’t necessarily have to be explicitly programmed for every task they perform. As new data comes in, machine learning models are able to adapt from their previous computations to produce reliable results. If you’ve ever wondered how Netflix is able to suggest the perfect show for you to binge watch next, or how Amazon is able to offer products similar to the ones you’ve searched, all of that is based on machine learning.
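To make "predictions based on patterns in data" concrete, here is a minimal sketch in plain Python. It fits a straight line to a handful of made-up numbers and uses it to predict a new value. The data, the function names, and the choice of a least-squares line are all assumptions for illustration, not how Netflix or Amazon actually do it.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern to a value the model has not seen."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: episodes watched last week vs. this week.
last_week = [1.0, 2.0, 3.0, 4.0]
this_week = [3.0, 5.0, 7.0, 9.0]

# Nobody wrote the rule "double it and add one"; it is recovered from the data.
model = fit_line(last_week, this_week)
forecast = predict(model, 5.0)
```

When new observations arrive, calling fit_line again on the larger data set is the simplest form of the "adapting" described above.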
How does it work?
Analysts follow the CRISP-DM cycle to find patterns in large data sets. CRISP-DM stands for Cross-Industry Standard Process for Data Mining, which is just a fancy way of telling you what the cycle is for: a repeatable way to find patterns. It does this by breaking data mining down into six major phases:
Phase 1: Business Understanding
This phase is broken down into four tasks: determining the business objectives, assessing the situation, determining the data mining goals, and producing a project plan.
Business Objectives
Try to understand what the customer is trying to accomplish from the perspective of your business. Knowing what your customers are looking for can help you to better predict what the outcomes of your project will be. In order to accomplish this, you should set your goals, produce a plan for your project, and lay out the criteria for success in your business.
Assessing the Situation
You’ll need to take an inventory of your resources; list out the requirements, assumptions, constraints, risks and contingencies; put together a glossary of key terms; and construct a cost-benefit analysis for the project.
Determining Data Mining Goals
This is, in short, presenting your project goals using technical terms.
Producing a Project Plan
Discuss how you intend to reach your goals. This includes the stages of the project and their durations, and an initial assessment of the tools and techniques you are hoping to implement.
Phase 2: Data Understanding
The second phase of the CRISP-DM cycle is broken up into three sections: collecting data, describing data, and exploring data. Each step of this phase is finished out with a report.
Initial Data Collection Report
This first report lists the data that was acquired along with its locations, the methods that were used to acquire the data, and any complications that you may have encountered. This step is most important for making sure any future replications of the project can happen.
Data Description Report
This second report describes the data by its format, its quantity, and the identities of the fields. You should use these results to evaluate whether the data you’ve acquired satisfies your requirements.
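A data description report can start from a small summary like the one sketched below. It assumes the data is a list of dictionaries, and the records and the describe helper are hypothetical. For each field it records the quantity of rows, the value types seen, and how many values are missing.

```python
def describe(records):
    """Summarize row count, field names, observed types, and missing values."""
    fields = sorted({key for row in records for key in row})
    summary = {"rows": len(records), "fields": {}}
    for field in fields:
        values = [row.get(field) for row in records]
        present = [value for value in values if value is not None]
        summary["fields"][field] = {
            "types": sorted({type(value).__name__ for value in present}),
            "missing": len(values) - len(present),
        }
    return summary


# Hypothetical customer records with two gaps in them.
customers = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Boston"},
    {"id": 3, "age": 51},
]
report = describe(customers)
```

Comparing report against your requirements (for example, "age must never be missing") is one way to decide whether the acquired data is good enough.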
Data Exploration Report
This final report should describe the results of the data you’ve explored, including your initial findings or hypotheses and their potential impact on the remainder of the project.
Once you’ve completed all your reports, you should also take steps to measure the quality of your data and make one more brief report on that quality. If there are any problems, be sure to offer up some solutions.
Phase 3: Data Preparation
Stage three is all about deciding what data you will actually use for your analysis. It’s broken up into four segments: selecting data, cleaning data, constructing the required data, and integrating your data.
Selecting Data
The best practice for selecting your data is usually deciding how relevant it is to your initial data mining goals and understanding the quality and technical constraints of your data. Once you’ve determined which data you’ll be using, create a list of the data that will be included and excluded, and include a rationale for your decisions.
Cleaning Data
Cleaning your data is all about making sure the quality of your data matches the level of quality you will need for the analysis techniques you’ve selected. Once you’ve adjusted the quality of your data, draft up a report describing what actions you took to address any quality issues.
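As a rough sketch of cleaning plus reporting, the snippet below fills missing values and keeps a log of every action taken, which can feed the cleaning report. Filling with the median is only one possible strategy and is an assumption here, as are the clean helper and the sample orders.

```python
from statistics import median


def clean(records, field):
    """Fill missing values of `field` with the median and log each action."""
    known = [row[field] for row in records if row.get(field) is not None]
    fill = median(known)
    cleaned, actions = [], []
    for index, row in enumerate(records):
        row = dict(row)  # copy so the raw data stays untouched
        if row.get(field) is None:
            row[field] = fill
            actions.append(f"row {index}: filled missing {field} with median {fill}")
        cleaned.append(row)
    return cleaned, actions


# Hypothetical orders, one of which has no amount recorded.
orders = [{"amount": 10.0}, {"amount": None}, {"amount": 30.0}]
cleaned_orders, cleaning_log = clean(orders, "amount")
```

Keeping the original records unchanged makes it easier to revisit the decision if a later phase shows the fill strategy was a poor fit.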
Constructing Required Data
Constructing your data covers preparation operations such as deriving new attributes from one or more existing attributes in the same record, or creating entirely new records.
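For instance, a derived attribute might look like the sketch below. The field names (unit_price, quantity) and the bulk threshold of 10 are made up for illustration.

```python
def add_derived_attributes(record):
    """Return a copy of the record with attributes derived from existing ones."""
    enriched = dict(record)
    enriched["total"] = record["unit_price"] * record["quantity"]
    enriched["is_bulk"] = record["quantity"] >= 10  # assumed threshold
    return enriched


# Hypothetical line item from an order.
line_item = {"unit_price": 2.5, "quantity": 12}
enriched_item = add_derived_attributes(line_item)
```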
Integrating Data
Integration is the process of combining information from multiple databases, tables, or records in order to create new records. This can be done either by merging or aggregating information.
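Both flavors of integration, aggregating and merging, can be sketched in a few lines. The tables, field names, and helper functions below are hypothetical. The first function aggregates orders per customer and the second merges that result into the customer records.

```python
from collections import defaultdict


def aggregate_totals(orders):
    """Aggregate: sum order amounts per customer."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
    return dict(totals)


def merge(customers, totals):
    """Merge: attach each customer's total, defaulting to 0.0 when no orders exist."""
    return [
        {**customer, "total_spent": totals.get(customer["customer_id"], 0.0)}
        for customer in customers
    ]


# Two hypothetical source tables.
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.0},
]
merged = merge(customers, aggregate_totals(orders))
```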
Phase 4: Modelling
For the fourth stage, you need to select a modelling technique to use for your analysis. To do this, you need to consider what assumptions each model will make about your data, and choose at least one modelling technique accordingly. If you are using more than one technique, you will need to perform this task separately for each one. Once you’ve chosen your technique, you can move on to generating a test design for your model, building your model, and assessing your model.
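The three follow-up tasks (test design, building, assessing) can be sketched as below. This assumes a simple hold-out split as the test design, a least-squares line as the modelling technique, and mean absolute error as the assessment. All three are illustrative choices, and the data is synthetic.

```python
import random


def split(rows, test_fraction=0.25, seed=0):
    """Test design: shuffle once, then hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Build: least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, rows):
    """Assess: average absolute gap between prediction and truth."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


# Synthetic data that follows y = 3x + 2 exactly.
data = [(float(x), 3.0 * x + 2.0) for x in range(20)]
train, test = split(data)
model = fit_line(train)
error = mean_absolute_error(model, test)
```

Assessing on rows the model never saw during building is what makes the error a fair estimate. The line model assumes a linear relationship, which is exactly the kind of assumption this phase asks you to check.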
Phase 5: Evaluation
The fifth stage of the cycle is all about assessing the results of your model according to business success criteria. To do this, you should summarize your assessment and include a final statement about whether your project already meets the intended business objectives or not. After this, you should review the data mining process to determine if anything has been overlooked. Finally, you should make a list of your potential next steps, including the pros and cons of each option, and describe the decision you’ll be making to proceed, along with your rationale.
Phase 6: Deployment
The final stage is all about determining a strategy for the deployment of your evaluation results. In order to do this, you’ll need to plan the deployment, plan the monitoring and maintenance of your data mining strategy, produce your final report, and review the project as a whole.
These phases do not follow a fixed sequence, and moving back and forth between them is usually necessary.
Why use automated machine learning?
The purpose of automated machine learning is to empower business analysts to train a large number of models and produce the best one with as little configuration as possible. Automation doesn’t just apply to the training and selection of models - more tools are starting to appear for data wrangling and data visualization, too.
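The "train many, keep the best" idea can be sketched as a loop over candidate models scored on held-out data. The example below assumes a tiny nearest-neighbour predictor and tries several values of k. Real automated machine learning tools search across far more model types and settings, and every name and number here is illustrative.

```python
def knn_predict(train, x, k):
    """Predict y as the average of the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_error(train, valid, k):
    """Mean absolute error of a k-nearest-neighbour model on held-out rows."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)


def auto_select(train, valid, candidates):
    """Score every candidate and return the best one with all scores."""
    scores = {k: validation_error(train, valid, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data: a repeating pattern plus a slow upward trend.
points = [(x, (x % 5) + 0.1 * x) for x in range(30)]
train = [p for i, p in enumerate(points) if i % 3 != 0]
valid = [p for i, p in enumerate(points) if i % 3 == 0]

best_k, scores = auto_select(train, valid, candidates=[1, 3, 5, 7])
```

The analyst only supplies the data and the list of candidates; the comparison and the pick happen without further input.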
Can automated machine learning fully automate the Data Science Cycle?
The answer to this question isn’t so clear-cut. For the sake of simplicity: sometimes. According to expert Simon Schmid, automated machine learning can fully automate the cycle for standard data science problems. In standard data science problems, the data is general and there are no unbalanced classes, so you can generally expect there to be no surprises in the end.
When it comes to more complex data science problems, however, it’s not this simple. Though the goal of automated machine learning is to eliminate human error, more complex data science problems can turn out to be impossible for the machine to compute, which means human input is necessary. It may even be beneficial because it can allow for customization of the phases.
Most business analysts will do a combination of both of these through Guided Analytics. With Guided Analytics, you can intersperse your workflow with interaction points to steer the application in different directions, as you need.
What is Guided Analytics?
Guided analytics is the addition of interaction points in the data pipeline, usually placed between the sequence of steps data goes through during analysis. Data processing or analytics applications are for a wide range of people - not just the ones developing them. Interaction points along the pipeline give others the opportunity to tweak the model as needed for their own data.
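One way to picture an interaction point is as a hook that runs between pipeline steps. In the sketch below a plain function stands in for what would normally be a dashboard or form, and the step names, the cap of 100, and the data are all assumptions.

```python
def run_pipeline(data, steps, interact):
    """Run steps in order, calling `interact` after each one as an interaction point."""
    for name, step in steps:
        data = step(data)
        data = interact(name, data)
    return data


def drop_negatives(values):
    return [v for v in values if v >= 0]


def scale(values):
    top = max(values)
    return [v / top for v in values]


def analyst_choice(step_name, data):
    """Hypothetical analyst input: cap outliers at 100 right after cleaning."""
    if step_name == "clean":
        return [min(v, 100) for v in data]
    return data  # no change requested at other interaction points


result = run_pipeline(
    [-5, 20, 50, 400],
    [("clean", drop_negatives), ("scale", scale)],
    analyst_choice,
)
```

Swapping in a different analyst_choice changes the outcome without touching the pipeline itself, which is the point of putting the interaction between the steps.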
What is Guided Automation?
It is a combination of Guided Analytics and Automated Machine Learning. With it, you can ask the business analyst to add their expertise when you need it, but still automate the standard parts of the analysis.
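A minimal sketch of that hand-off: run the automated path when the problem looks standard, and call on the analyst when it does not. Here "standard" is judged only by class balance, with an assumed threshold of 3 to 1, and both the automated step and the analyst callback are stand-ins.

```python
from collections import Counter


def is_standard(labels, max_ratio=3.0):
    """Treat the problem as standard when no class outweighs another too heavily."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values()) <= max_ratio


def guided_automation(labels, automated, ask_analyst):
    """Automate the standard case; hand unusual cases to a person."""
    if is_standard(labels):
        return automated(labels)
    return ask_analyst(labels)


def majority_class(labels):
    """Stand-in for the automated modelling step."""
    return Counter(labels).most_common(1)[0][0]


def request_review(labels):
    """Stand-in for an interaction point where the analyst steps in."""
    return "needs review"


balanced = ["yes", "no", "yes", "no"]
skewed = ["no"] * 9 + ["yes"]

auto_result = guided_automation(balanced, majority_class, request_review)
manual_result = guided_automation(skewed, majority_class, request_review)
```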
Automated machine learning is an important tool that allows data analysts to sift through information more efficiently than ever before. Not sure how to leverage the power of automation to streamline your business operations? Reach out today for a little friendly advice - it's one of our specialties 😎