Bringing AI to MLB

Ingrained in every player from Little League to the MLB is the idea that baseball is a mental game.

Yogi Berra famously proclaimed, “Baseball is 90% mental and the other half is physical.”

Of course, games are won on the backs of the incredibly talented players residing in Major League Baseball. But series, divisions, and ultimately World Series championships are won in the minds of the players, managers, and general managers of baseball organizations.

In a 162-game season, managers and general managers make countless decisions factoring in the opposing teams, upcoming schedule, and numerous other data points that affect the game of baseball.

The decision-making ability of the front office is a main factor in a team’s success, and strings of bad decisions often end in terminations (e.g., the Seattle Mariners’ disastrous front office from 2004–2015).

Ending the Mariners’ Playoff Drought

The MLB is at the forefront of the data analytics revolution in professional sports by housing entire analytics departments within front offices, hiring Ivy League data scientists, and replacing traditional, gut-feeling GMs with their analytical counterparts (like the Mariners replacing Jack Zduriencik with Jerry Dipoto after the 2015 season).

Collecting everything from Aaron Judge’s league-leading maximum batted ball exit velocity of 121.1 mph to Albert Pujols’ cellar-dwelling sprint speed of 23.0 ft/sec, the MLB has troves of information that its data analysts, statisticians, and mathematicians mine for insights and patterns.

The next step in the analytics progression of Major League Baseball is surely AI-powered technology that supports the front office in reaching better data-driven decisions. Artificial intelligence is already being used this way across a wide variety of industries, such as oil and gas and utilities, so the time is as opportune as ever for Major League Baseball.

A Data Game

To put it frankly, utilizing these technologies is all about the data. Whether it’s temperature, pump pressure, and vibration monitoring in an oil rig or batted ball exit velocity and player sprint speed, data is data to a machine powered by AI technology.

Artificial intelligence excels at finding subtle patterns and hidden insights in data sets of all shapes and sizes, particularly under complex or changing conditions.

Specifically, machine learning, deep learning, and automated model building are all AI-powered technologies able to influence baseball and the decisions made in America’s pastime. Together, these technologies could deliver a model that learns complex patterns, incorporates human feedback, derives rules, and displays the reasoning behind the proposals given by the cognitive system.

My Oh My, Help The M’s

A successful proof of concept with an MLB team would be needed before breaking into the entire MLB.

Like the Athletics in the mid-to-late ’90s, the Seattle Mariners could use a breakthrough from a string of disappointing seasons.

While Moneyball is no longer a novel principle, introducing cognitive analytics and AI technology into their front office to break free of their playoff drought could be the perfect fit.

A team plagued by historically bad luck in everything from injuries to blockbuster trades, the Seattle Mariners have enjoyed few successes since their record-tying 2001 season.

Since the 2000s, the Mariners have gone through four GMs, 10 managers, and nine losing seasons.

Although they were slated to break their playoff drought in three of their last four seasons, the 2017 Mariners once again fell below expectations.

With fears of stretching the drought to 17 seasons, the M’s are the perfect example of a team that could benefit from the incorporation of AI technologies in their front office’s decision-making.

Specifically, three areas that stand to be improved by AI include scouting potential players, preventing player injuries/optimizing player productivity, and player management.

Moneyball 2.0

Scouting has a lot of moving parts and certainly is not an exact science. For the Mariners, current GM Jerry Dipoto integrates a heavy dose of objective statistical analysis and sabermetrics (Moneyball 1.0) with subjective evaluations from scouts to constitute their scouting strategy.

When building an AI model for the M’s, Dipoto and the front office could develop ideas and parameters of what their ideal players look like for each position, a baseline that the model will try to attain.

For all MLB teams, scouts are held accountable for delivering reports on prospects’ current and potential mental, physical, and medical factors and advise if a player is worth the team’s resources.

Although the scouts’ reports are subjective, natural language processing, an AI-powered technology, could pull insights from them and enable a model to provide in-context answers regarding a specific player.

This combination would present a smooth integration between the subjective and objective aspects of the Mariners’ strategy.

Utilizing the wide range of databases that aggregate MLB data today such as Baseball Reference, Baseball Prospectus, or FanGraphs, an AI-powered model could also pull insights from databases that include player statistics, sabermetrics, medical information, injury reports, MLB payroll information, and more to give the Mariners players that fit their parameters.

As teams chase the best players for the best prices to stay within their payroll budgets (MLB has a luxury tax rather than a hard salary cap), this AI model could garner insights and patterns across data flows rather than relying only on undervalued statistics, essentially creating Moneyball on steroids.
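The idea of position-by-position parameters plus a payroll limit can be sketched as a simple filter. This is only an illustration: the `Candidate` fields, the `IDEAL` thresholds, and the budget figure are all hypothetical, and a real model would learn far richer criteria than two cutoffs.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    position: str
    on_base_pct: float
    sprint_speed: float   # ft/sec
    salary_musd: float    # asking salary in millions of dollars

# Hypothetical front-office parameters for one position (illustrative only).
IDEAL = {"CF": {"min_on_base_pct": 0.330, "min_sprint_speed": 28.0}}

def shortlist(candidates, position, budget_musd):
    """Return candidates that meet the position's thresholds and the budget, cheapest first."""
    rules = IDEAL[position]
    fits = [
        c for c in candidates
        if c.position == position
        and c.on_base_pct >= rules["min_on_base_pct"]
        and c.sprint_speed >= rules["min_sprint_speed"]
        and c.salary_musd <= budget_musd
    ]
    return sorted(fits, key=lambda c: c.salary_musd)
```

Sorting by salary is just one possible ranking; a front office might instead rank by projected value per dollar.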

Foreseeing the Season

For example, after digesting historical data on past and present MLB players and comparing it to the data gathered on a prospect, an AI model could find that prospect A has a throwing motion, stature, and Fielding Independent Pitching rating similar to those of player C. The model could also recognize these similarities as indicators of similar future performance.

With this in mind, player C excelled his first three years in the MLB but ultimately struggled with arm injuries and ineffectiveness until retiring two years later.

The Mariners would then have to decide if the string of similarities between the two is enough to steer them away from offering a contract to the prospect.

While analysis of these similarities is currently conducted by humans, AI-powered techniques can observe the data from every angle and surface insights across far more players and variables than a person could review.
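One simple way to picture the prospect comparison is a nearest-neighbor search over player features. The sketch below is an assumption about how such a comparison might work, not a description of any team’s system; the three features and every number in it are made up.

```python
import math

# Made-up feature rows: (height_in, arm_slot_deg, FIP). Not real player data.
HISTORICAL = {
    "player B": (74.0, 60.0, 4.40),
    "player C": (77.0, 38.0, 3.25),
    "player D": (71.0, 85.0, 3.90),
}
PROSPECT_A = (77.0, 40.0, 3.30)

def column_stats(rows):
    """Return (mean, std) per feature so all features share a common scale."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        stats.append((mean, math.sqrt(var) or 1.0))  # guard against zero spread
    return stats

def distance(a, b, stats):
    """Euclidean distance after dividing each feature difference by its std."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, (_, s) in zip(a, b, stats)))

def closest_comps(prospect, historical, k=1):
    """Return the names of the k historical players most similar to the prospect."""
    stats = column_stats(list(historical.values()))
    ranked = sorted(historical, key=lambda n: distance(prospect, historical[n], stats))
    return ranked[:k]
```

With these invented numbers, `closest_comps(PROSPECT_A, HISTORICAL)` returns `["player C"]`. The front office would then look at how player C’s career unfolded before deciding on the prospect.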

A Farewell to Tommy John?

Although scouting for quality players is important for building a good team, mitigating the effects of the biggest inefficiency in baseball — player health — deserves more attention from teams.

Teams such as the Mariners currently rely on medical personnel to predict and prevent the risk factors associated with major injuries. However, through the utilization of Statcast data and sabermetrics, AI could provide better prediction and prevention of injuries as well as performance optimization on an MLB team’s greatest assets: the players.

AI technologies are already being applied to the most important assets in industrial settings, so why should they not be applied to the multi-million dollar assets that MLB players are to their respective teams? The average salary for a Mariners player is over $2 million per year, with Robinson Cano exceeding $24 million a year as the team’s top earner.

With this type of payout, the M’s should be exhausting every tool and technology to ensure their players are at optimum performance and are not at risk for injury.

For AI-powered technologies to be used in injury prevention and performance optimization, access to Statcast data would be needed. Statcast, MLB’s relatively new analytics tool, tracks players on the field through a combination of radar technology and cameras.

The system generates roughly seven terabytes of data per game (the size of approximately 3,000 Netflix movies), and much of this data is accessible only to teams. The sheer amount of data generated means it would be impossible for a single analyst or even a team of analysts to sift through the entirety of player-movement data available over the course of a season.

A Little Leaguer wouldn’t be expected to hit Clayton Kershaw’s curveball; relying on humans to analyze one season’s worth of player-movement data (approximately 17,010 TB league-wide, for those interested) is equally ridiculous.
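The season total appears to follow from the per-game figure multiplied by the number of regular-season games league-wide. A quick check, assuming the rough seven-terabyte estimate holds for every game:

```python
TEAMS = 30
GAMES_PER_TEAM = 162
TB_PER_GAME = 7  # the rough per-game figure quoted above

# Each game involves two teams, so halve the product to avoid double counting.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
season_tb = games_per_season * TB_PER_GAME

print(games_per_season, season_tb)
```

That works out to 2,430 games and 17,010 TB, before counting the postseason.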

The Secret Stats

Uncovering hidden insights in player-movement data would show teams which anomalies in a player’s movements are connected to a higher risk of injury.
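As a rough illustration of spotting a movement anomaly, the sketch below flags any reading that strays far from a player’s own recent baseline. It is a deliberately simple stand-in for what an AI model would do; the function name, window size, threshold, and release-height numbers are all hypothetical.

```python
from statistics import mean, stdev

def flag_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up release-point heights (ft), one per outing. Not real Statcast data.
release_height = [6.10, 6.12, 6.09, 6.11, 6.10, 6.11, 5.80, 6.10]
```

Here `flag_anomalies(release_height)` returns `[6]`, the outing where the release point suddenly dropped. A flag like this proves nothing on its own, but it tells medical staff where to look.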

Similarly, data on players’ movements would allow an AI model to understand how to maximize the effectiveness of players and give teams a prescription on where the greatest inefficiencies lie.

For example, a model would potentially uncover that Jarrod Dyson, the Mariners’ current center fielder, could improve his batting average and on-base percentage by taking advantage of the opposing team’s defensive alignment with a drag bunt to a specific place on the infield.

While Dyson might not make the perfect bunt each time and teams may catch on to the strategy, the model will continue to learn and adjust based on the data.

Conventional Thinking Goes Out the Window

On top of making decisions that affect the team long term like scouting and injury prevention, an MLB manager must make numerous day-to-day decisions. Anything from the upcoming schedule to the historical matchups of a hitter versus the other team’s pitcher needs to be analyzed before deciding who to play and where they provide the biggest value add for the team on that given day.

Many factors can affect the outcome of a baseball game — rarely is there just one decision that can chalk up a win or be blamed for a loss. An AI model in player management could realistically factor in all analytical aspects of the game to provide a manager with an enhanced scouting report.

Accounting for all available analytics helps level out baseball’s unpredictability by giving insight into the predictable factors that can impact an MLB game.

An AI-powered model could conduct matchup analysis and prepare for different scenarios, allowing a manager to make better data-driven decisions such as delivering the optimum lineup against that day’s opponent.

Other game-day decisions that could benefit from a data-focused approach include ensuring a rested lineup, efficiently planning pitching assignments with insight on the upcoming schedule of games, and positioning players within the game.

Saving the Team

In this area, an AI-powered model could benefit the 2017 Mariners by allowing manager Scott Servais to see that Felix Hernandez is slated to pitch against the Los Angeles Angels, a team he has historically struggled against.

While the Mariners are short on quality starting pitchers due to injuries, the model could prescribe an unconventional strategy to start a relief pitcher against the Angels and switch relief pitchers every two or so innings to keep the Angels off-balance.

The model could provide insight into the predicted success of multiple pitchers versus King Felix against the Angels, and the manager could then make a better data-driven decision to give the M’s the best probability of winning.
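A toy version of that comparison might blend each pitcher’s head-to-head record with his overall ERA, since small samples against one opponent are noisy. Everything here is an assumption for illustration: the pitcher labels, the stat lines, and the `prior_ip` weight are invented, and this is not how any real team model is known to work.

```python
def matchup_estimate(vs_er, vs_ip, overall_era, prior_ip=50.0):
    """Estimate ERA against an opponent by treating the overall ERA as
    `prior_ip` extra innings of evidence (a hypothetical weight)."""
    return (9.0 * vs_er + overall_era * prior_ip) / (vs_ip + prior_ip)

# Made-up lines: (earned runs vs. opponent, innings vs. opponent, overall ERA).
OPTIONS = {
    "scheduled starter": (30, 40.0, 3.50),
    "reliever X": (4, 12.0, 3.20),
    "reliever Y": (6, 10.0, 3.00),
}

def rank_options(options):
    """Return (name, estimated ERA) pairs, lowest estimate first."""
    estimates = {name: matchup_estimate(*line) for name, line in options.items()}
    return sorted(estimates.items(), key=lambda item: item[1])
```

With these numbers, `rank_options(OPTIONS)` puts reliever X first and the scheduled starter last. The manager would still weigh that ranking against rest, bullpen depth, and everything else the numbers leave out.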

With winning at the front of everyone’s minds, MLB front offices are pulling out all the stops to help their teams to victory.

And if the Seattle Mariners are looking to finally end their 16-year playoff drought, reaching better data-driven decisions by bringing AI to the MLB on scouting, performance optimization, and player management will help them come out on top.
