The Democratization of Data: Modern AI and Automated Machine Learning


Recognition of emotions is one of the great white whales of modern AI.

It’s innate for humans. We can look at someone’s face and understand how they’re feeling. Transferring that skill to machines has proven far trickier. Companies and academics worldwide are laboring to refine their data science techniques and produce more accurate emotion-recognition software.

So one can perhaps imagine Angela Beasley’s surprise when a group of her undergraduate students handed in a class project with their own, novel approach to emotion recognition. An assistant professor at the University of Texas at Austin, she was teaching an Intro to Data Mining course at the time. Many of the enrolled students had no previous data science experience, and none of the members of this particular group did.

Yet they’d produced work akin to that being published in research journals. How could this happen?

In truth, Beasley had been expecting to see unusual results from this particular project. Her class had been working with the Darwin automated machine learning (or autoML) software and was competing to see which group could use this technology to produce the most interesting problem—and most elegant solution.

Still, even armed with autoML, the students’ capabilities had exceeded her expectations. “This was their first data-science class,” she says. “Two-thirds of the students were computer-science minors, so they weren’t even majoring in this. For the non-majors, this was their first exposure to machine learning at all. But you could take an undergrad with no experience, and they could do this sophisticated emotion recognition work.”

Birth of the Citizen Data Scientist

Traditionally, only fully trained data scientists have been able to create machine-learning models. But data scientists are in short supply. As recently as 2016, it was reported that there were just 6,500 people listing themselves as data scientists on LinkedIn, but 6,600 job listings for data scientists in San Francisco alone. Since then, the demand for data scientists has continued to outstrip the supply.

Even companies that have data scientists often aren’t able to make full use of their talents. Instead, these teams are bogged down with the constant updating, tuning, and retraining of models, rather than being free to work on novel, creative problems that could create new value for the organization.

Even when data scientists are available, building an AI model by hand is a lengthy and difficult process. Hand-built models often struggle to scale across large operations and can be too brittle to handle edge cases or small changes in input variables.

This is precisely why automated machine learning matters: it makes data science genuinely accessible, and valuable, to people and organizations.

At its core, automated machine learning is software that allows non-data scientists to create machine learning models. It does so by automating and accelerating many of the time-consuming tasks involved in model creation, including data cleaning, feature generation, and architecture search. In this way, autoML has given rise to what many are calling the “citizen data scientist,” which is another way of saying that data science is now in the hands of subject matter experts, non-technical personnel—even undergraduate students.

“The feedback [from the students] was that this is super easy,” says Beasley of the autoML contest. “They were like, wow, I can really do this in one line of code? They were very impressed with that.”

“There was no difference in performance between majors and non-majors,” she adds. “The non-majors did just as well as the computer science majors.” In fact, all three winning groups were made up of students not majoring in computer science.

The Human Element

None of this is to say that automated machine learning can do all the work on its own, and data scientists certainly aren’t going to be rendered obsolete any time soon.

“This is not going to put data scientists out of their job,” Beasley says. “It’s going to help them do their job. This will enable them to build their models faster, maybe with a smaller team.”

After all, the data science process is more than just creating and optimizing models. To start with, there needs to be an actual business use case—a problem for the model to solve. Otherwise, what need is there for autoML at all? Naturally, no AI is going to be able to come up with a valuable project on its own. This requires human expertise.

Next, the problem needs to be translated into a machine-learning model. How will the data be explored? What needs to be predicted, and what insights need to be produced?

“Darwin actually does some basic feature engineering, like scaling,” Beasley says. “So some of that can be automated, but not all of it. A lot of it requires human knowledge of the data and application. You definitely need domain expertise.

“If you have automated machine learning building a model, the real thing you need to understand is the data, and what’s important about the data. Whatever’s important about the data, that’s what needs to get passed into the autoML.”

The students who created a model for emotion recognition, for instance, did so by mapping out the major coordinates of the face and then measuring distances between those coordinates. The distance between the top lip and bottom lip was found to be surprisingly useful, connecting how far open someone’s mouth is with what kind of emotion they’re likely to be expressing.

“That’s what’s meant by feature engineering,” says Beasley. “It’s pulling out the important information you need.”
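To make the idea concrete, here is a minimal sketch of turning facial coordinates into distance features. The article does not describe the students’ actual code, so this is an illustration under stated assumptions: the (x, y) landmark points are assumed to come from some face-landmark detector, and the landmark names and the choice to normalize every distance by the eye-to-eye distance (so features don’t depend on how large the face appears in the image) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical landmark names mapped to (x, y) pixel coordinates.
# A real landmark detector would typically supply many more points.
Landmarks = dict[str, tuple[float, float]]


def landmark_features(
    landmarks: Landmarks,
    scale_pair: tuple[str, str] = ("left_eye", "right_eye"),  # assumed normalizer
) -> dict[str, float]:
    """Turn raw landmark coordinates into pairwise-distance features.

    Every pairwise Euclidean distance is divided by the distance between the
    two `scale_pair` landmarks, so the features stay the same whether the face
    fills the whole photo or only a corner of it.
    """
    scale = math.dist(landmarks[scale_pair[0]], landmarks[scale_pair[1]])
    if scale == 0:
        raise ValueError("scale landmarks must not coincide")

    features = {}
    # Sorting the names gives each feature a stable, predictable key.
    for p, q in combinations(sorted(landmarks), 2):
        features[f"{p}__{q}"] = math.dist(landmarks[p], landmarks[q]) / scale
    return features


# Usage: a toy face where the mouth is open half as wide as the eyes are apart.
face = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "upper_lip": (50.0, 80.0),
    "lower_lip": (50.0, 100.0),
}
feats = landmark_features(face)
mouth_opening = feats["lower_lip__upper_lip"]  # 20 px / 40 px = 0.5
```

The resulting dictionary is the kind of table an autoML tool would then consume: each row a face, each column a distance such as the lip-to-lip “mouth opening” that the students found useful.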

A Revolution in Data Science

What makes autoML exciting, then, isn’t that it eliminates the need for data scientists. Rather, it allows data scientists, subject matter experts, and other skilled personnel to shine.

Beasley believes that autoML is going to allow for more creative approaches to problems going forward. “It could be used for anything. That was one thing that was interesting about using Darwin, was the variety of projects that came out of it.”

The first-place winner of her class competition used autoML to create a model that could scan any online article and determine the level of political bias in the text. The students pulled comments left on Reddit in November 2018, specifically from the dedicated forums for various political ideologies. Treating the forum each comment came from as, essentially, a pre-applied label, they trained their model on the type of language used by each political group. The finished model predicted the political bias of web articles with 84.5 percent accuracy, and the students envision turning their work into a browser plug-in to help users be more aware of biased information sources. “I don’t know of anyone else working on that,” Beasley says.

In second place was the group that worked on emotion recognition. Meanwhile, the third-place winners built a model to predict the likelihood of getting a right swipe on Tinder.

“I think [the students] were able to do these more advanced problems because they didn’t have to write the code to build the model,” Beasley says, “so they had time to focus on the feature engineering, and think about ‘How do I write the code for a face?’”

That kind of critical and creative thinking is the future of data science, where humans will come up with big ideas, and machines will execute on them. Beasley’s hope is that this is the lesson her students take away from their experience—that considering how, and why, to build a model is just as important as choosing the right model architecture. It may just be the single most important skill for data scientists, “citizen” or otherwise, in the years to come.

“I want [the students] to think about what they’re modeling and the ethical implications of what their model is going to do. How can you use this for good? So I really enjoyed seeing a couple of the teams do these feel-good projects.

“One team took data from the Austin animal shelter and built a model predicting how likely a dog was to get adopted … They proposed the shelter could market the less likely dogs more, like taking them to more events,” she recalls.

The question she asks of all her students: “What’s the ‘so what’ about what you just did?”
