What Is the Role of Machine Learning in Databases?

What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable recent introspection in the data management community, and the epicenter of this debate is the core problem of query optimization, where the system finds the best physical execution path for a query.

The au courant research direction, inspired by trends in Computer Vision, Natural Language Processing, and Robotics, is to apply deep learning: let the database learn the value of each execution strategy by executing different query plans repeatedly (an homage to the robot “arm farm”) rather than through a pre-programmed analytical cost model. This proposal is not as radical as it seems: relational database management systems have always used statistical estimation machinery in query optimization, such as histograms and sampling methods for cardinality estimation, and randomized query planning algorithms. Learning from prior planning instances is not new either.
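To make the histogram-based estimation mentioned above concrete, here is a minimal sketch of how an equi-width histogram can estimate the cardinality of a range predicate such as `col < x`. The function names `build_histogram` and `estimate_less_than` are hypothetical, chosen for illustration; real optimizers use more elaborate statistics, and this is not any particular system's implementation.

```python
def build_histogram(values, num_buckets):
    """Summarize a column as an equi-width histogram: (min, bucket width, counts)."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts


def estimate_less_than(hist, x):
    """Estimate how many rows satisfy `col < x`.

    Assumes values are spread uniformly inside each bucket, which is the
    simplifying assumption that makes the estimate cheap but approximate.
    """
    lo, width, counts = hist
    estimate = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if x >= bucket_hi:
            estimate += count  # whole bucket qualifies
        elif x > bucket_lo:
            # Only a fraction of the bucket qualifies.
            estimate += count * (x - bucket_lo) / width
    return estimate


if __name__ == "__main__":
    column = list(range(100))
    hist = build_histogram(column, num_buckets=10)
    print(estimate_less_than(hist, 49.5))
```

The optimizer never scans the column at planning time; it consults only the small summary, which is what makes such statistics easy to update and keep separate from the cost model.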

These techniques may not “feel” like modern AI, but are, in fact, statistical inference mechanisms that carefully balance generality, ease of update, and separation of modeling concerns.

Source: Sanjay Krishnan, Zongheng Yang, Joe Hellerstein, and Ion Stoica – RISE Lab at UC Berkeley