What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable recent introspection in the data management community, and the epicenter of this debate is the core database problem of query optimization, where the database system finds the best physical execution path for an SQL query.
The au courant research direction, inspired by trends in Computer Vision, Natural Language Processing, and Robotics, is to apply deep learning: let the database learn the value of each execution strategy by executing different query plans repeatedly (an homage to Google’s robot “arm farm”), rather than through a pre-programmed analytical cost model. This proposal is not as radical as it seems: relational database management systems have always used statistical estimation machinery in query optimization, such as histograms, sampling methods for cardinality estimation, and randomized query planning algorithms. Nor is learning from prior planning instances new.
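To make the classical side of this comparison concrete, here is a minimal sketch of histogram-based cardinality estimation, the kind of statistical machinery a traditional optimizer uses to estimate predicate selectivity. All names and the equi-width bucketing scheme are illustrative assumptions, not any specific system's implementation.

```python
def build_histogram(values, num_buckets=10):
    """Build an equi-width histogram: (low bound, bucket width, per-bucket counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in values:
        # Clamp the last value into the final bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts

def estimate_cardinality(hist, predicate_hi):
    """Estimate |{v : v < predicate_hi}|, assuming uniformity within each bucket."""
    lo, width, counts = hist
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if predicate_hi >= bucket_hi:
            total += count  # bucket lies entirely below the predicate
        elif predicate_hi > bucket_lo:
            # Partial overlap: take a linear fraction of the bucket's count.
            total += count * (predicate_hi - bucket_lo) / width
    return total
```

The uniformity assumption inside each bucket is exactly the kind of hand-coded statistical model that the learned approaches above propose to replace with estimates trained on observed executions.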
These techniques may not “feel” like modern AI, but are, in fact, statistical inference mechanisms that carefully balance generality, ease of update, and separation of modeling concerns.