Augmenting the PostgreSQL Planner with Machine Learning

On 27th May, we gave a talk to the Manchester PostgreSQL Meetup group on the work we have been doing as part of the AXLE project. The PostgreSQL users and enthusiasts in the North West of England gave some of their evening to hear the talk which was organised by 2ndQuadrant.

The talk was well received and an interesting discussion followed mostly around the issues of applying Machine Learning to this challenging problem, especially the existence of edge cases. A really useful suggestion from the audience was that the system could make use of system downtime to run its own training queries, targeting areas of the feature space that were sparsely populated.

The talk started with an introduction to AXLE, describing the overall aim, the work being done by each of the project partners and how the different parts will all fit together. The main body of the talk was focused on the problem of Query Performance Prediction – predicting how long a query will take to run before it actually runs. The past approaches to this problem were considered which required a brief introduction to the ideas of Machine Learning. I then presented an outline of the approach we are using as part of AXLE and the plugin we are producing to implement our solution.

The second part of the talk concentrated on the problem of augmenting the planner and helping it select the best plan more often. This is in part based on the need for the planner to know when to offload some of the execution of a query to GPUs or other processing units. In part, though, it is based on the existence of cases where the planner selects a “bad” plan after rejecting one that would have actually been faster. Robert Haas found that 5% of the complaints on the pgsqlperformance mailing list were the result of the planner rejecting a faster plan and selecting a significantly slower one – and that is after excluding cases where the planner cost variables are tuned poorly.

The solution we are proposing as part of AXLE is to augment the planner with cost hooks that allow us to write plugins to predict the execution time of operations with more accuracy than the analytical model used by PostgreSQL at the moment. The planner will then be better placed to select the fastest plan and not reject it because of faulty costing.

Leave a Reply

Your email address will not be published. Required fields are marked *

19 − eleven =