Connor Heaton, Prasenjit Mitra
This paper draws upon recent advances in Natural Language Processing (NLP) and Computer Vision (CV) to learn to describe how players impact the game in Major League Baseball (MLB). In particular, this work views the game as a sequence of events, rather than as a set of summary statistics describing those events, and trains machine learning models to describe the impact that a given sequence of events has on the game. The models describe a sequence of events for a single player over a relatively small time period, so we refer to the model output as player form embeddings: descriptions of how a player has impacted the game in the short term. We demonstrate how these embeddings can be used to describe players over both the short and long term, and show that they contain signals useful for predicting the outcome of games.