EDF 2015

The AXLE Project exhibited at the European Data Forum 2015 in Luxembourg on the 16th and 17th November 2015. The conference was an opportunity to meet researchers, industry and community members interested in the data economy and the challenges we face with Big Data.

EDF stand

Evert Jan Hoijtink (Portavita) and Simon Riggs (2ndQuadrant Ltd), the driving forces behind the two SMEs involved in the AXLE project, “manned the stand”, promoting the project’s results.

We were also invited to partake in the poster session, with a presentation entitled “Big Data, Can it be tamed?”

Although this was our last formal dissemination event for the project, which concluded on the 31st October 2015, the partners will continue to share the results of the project at academic and industry conferences in the future.


Final General Assembly – Barcelona, 1st – 2nd October 2015

The AXLE Project partners visited Barcelona for the third time for our last project face-to-face meeting. BSC proved to be excellent hosts during our final visit, and we took the opportunity to discuss evaluation (the focus of the last six months has been testing the enhancements we have made to hardware and software during the lifetime of the project) and to review our progress.

Lan and Anze from the University of Ljubljana produced a great image of how they see the AXLE project partners collaborating – it has been redrawn from their flipchart sketch, but hopefully gets the idea across just as well…

The story in a nutshell… Portavita produces extremely large datasets through their synthetic dataset generator (anonymised and de-identified) and dumps these into the PostgreSQL database. 2ndQuadrant order this data using the new features they have written for the PostgreSQL code (the majority of which is available in the latest release, 9.5) to greatly enhance security, scalability and performance. At the same time, Barcelona Supercomputing Center and the University of Manchester have experimented with and modified hardware to improve query speeds. And the University of Ljubljana updated the data mining and visualisation tool Orange to a more user-friendly version (Orange 3 was released in early October 2015) to produce immediate representations of sampled data.

Big Data Analytics: Tablesample, Orange, 2UDA

Umair Shahid, PostgreSQL and Business Intelligence expert, has been instrumental in the development of 2UDA – a data analytics application suite that unifies databases, spreadsheets, data mining and data visualisation in one seamless desktop installer – which exploits the collaboration between PostgreSQL and Orange 3. Read the blog here: http://blog.2ndquadrant.com/big-data-analytics-tablesample-orange-2uda/
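Under the hood, 2UDA pairs PostgreSQL’s new TABLESAMPLE clause (added in 9.5) with Orange 3, so analysts can explore a representative sample rather than scanning a whole table. As a rough illustration of the idea – not of 2UDA itself – here is a minimal Python sketch; the connection details, table and column names are assumptions, and pandas stands in for Orange purely to keep the example short.

    import pandas as pd
    import psycopg2

    # Illustrative connection details and table/column names.
    conn = psycopg2.connect(host="localhost", dbname="healthdata", user="analyst")

    # TABLESAMPLE SYSTEM (1) reads roughly 1% of the table's pages instead of
    # scanning the whole table; BERNOULLI (1) would sample individual rows.
    query = """
        SELECT patient_id, measurement, recorded_at
        FROM observations TABLESAMPLE SYSTEM (1)
    """
    sample = pd.read_sql(query, conn)
    conn.close()

    print(sample.describe())  # quick summary statistics over the sample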

ARMOR – a solution to prevent Row-Hammer data corruption in DRAMs

PhD student Mohsen Ghasempour has devised a solution to a problem that has been exercising industry specialists over the last couple of years.

Mohsen’s research (supervised by Dr Mikel Lujan and Dr Jim Garside): “ARMOR: A Run-time Memory hot-row detector to prevent Row-Hammer data corruption in DRAMs” is now the subject of a patent application filed on behalf of the University by UMIP, the University’s commercialisation arm. UMIP is also seeking partners to develop the technology into a full commercial offering.

What is ARMOR?
ARMOR is a hardware solution to prevent Row-Hammer errors in DRAMs, designed and developed in the School of Computer Science. Row hammering can occur when a specific wordline of a DRAM is activated repeatedly within a refresh interval (the row aggressor). In this situation the neighbouring cells leak charge at a faster rate than expected, so the retention time of those cells becomes shorter than the refresh cycle (e.g. 64 ms), which means they may lose their data (charge) before the refresh happens (the row victims). As a result, during the refresh process the corrupted data is read and written back to the DRAM cells. The main challenge in mitigating the Row-Hammer effect is monitoring the number of activations for each row in the DRAM, which imposes a significant storage overhead on the memory system. ARMOR monitors the activation stream at the memory interface level and detects at run-time which specific rows (i.e. hot rows) are at risk of being “hammered”. ARMOR is capable of detecting all the possible hot rows in a system with minimal storage overhead.
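The ARMOR detector itself is a hardware design (and the subject of the patent application), so its details are not reproduced here. Purely to illustrate the underlying idea of watching the activation stream and flagging rows that are activated suspiciously often within one refresh window, here is a small software sketch; the class name, threshold and window length are invented example values, not the ARMOR algorithm.

    from collections import defaultdict

    class HotRowDetector:
        """Toy illustration: count activations per row within one refresh
        window and flag rows whose count crosses a 'hammering' threshold."""

        def __init__(self, threshold=50_000, refresh_interval_ns=64_000_000):
            self.threshold = threshold                      # assumed example value
            self.refresh_interval_ns = refresh_interval_ns  # e.g. a 64 ms window
            self.window_start_ns = 0
            self.counts = defaultdict(int)

        def on_activate(self, row_address, time_ns):
            # Counters only matter within a single refresh interval.
            if time_ns - self.window_start_ns >= self.refresh_interval_ns:
                self.window_start_ns = time_ns
                self.counts.clear()
            self.counts[row_address] += 1
            # A real memory controller could react by refreshing the
            # neighbouring (victim) rows or throttling further activations.
            return self.counts[row_address] >= self.threshold

    # Example: one row activated repeatedly within a single refresh window.
    detector = HotRowDetector(threshold=5)
    for t in range(0, 600, 100):
        if detector.on_activate(0x1A2B, t):
            print(f"row {0x1A2B:#x} flagged as hot at t={t} ns")

Note that this naive version keeps a counter for every row it has seen – exactly the kind of storage overhead that the ARMOR design avoids.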

Why is ARMOR a Promising Solution?
• detects all possible Row-Hammer errors with a high level of confidence
• provides precise information about the hammered rows (addresses) and the number of activations with a high level of accuracy (e.g. 99.99%)
• does not need to know the logical-to-physical mapping of DRAMs in order to mitigate Row-Hammer errors (ARMOR Cache Solution)
• scales with the size of memory
• is technology independent and can easily support future device technologies.

For more information see: http://apt.cs.manchester.ac.uk/projects/ARMOR/RowHammer/index.html


Data Processing on FPGAs

by Geoffrey Ndu, University of Manchester

FPGAs (Field-Programmable Gate Arrays) are reprogrammable silicon chips that can be used to implement custom execution engines. FPGAs are becoming mainstream. For instance, the IBM POWER8 processor has the Coherent Accelerator Processor Interface (CAPI), which allows an FPGA accelerator to attach coherently to the fabric of a POWER8 chip and access up to 1 TB of system memory [1]. Microsoft is using FPGAs to accelerate workloads in the datacenter [2].

In AXLE, we are looking into how the potential power of FPGAs could be harnessed by everyone, especially small and medium firms without the resources of the technology giants, to speed up analytics on relational databases. We have identified programming FPGAs with software-like languages instead of traditional hardware description languages as a route for database kernel developers to access the power of FPGAs. Programming FPGAs using high-level software languages allows a designer to work more productively at a higher level of abstraction and achieve faster time-to-market.

We have explored the trade-offs involved in using high-level languages to program FPGAs in the context of relational databases. The results of our study are detailed in this paper [3]. We are also developing CHO [4], [5], a benchmark suite that can be used to evaluate FPGAs that support programming with OpenCL. OpenCL is an open, royalty-free, parallel programming framework for writing applications that execute across heterogeneous platforms consisting of Central Processing Units (CPUs), Graphics Processing Units (GPUs), FPGAs and other processors. This paper [4] provides more information about our benchmark suite as well as a gentle introduction to computing with FPGAs.
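To give a flavour of the OpenCL programming model, the sketch below runs a trivial relational selection (filter) kernel on whichever OpenCL device is available. It uses pyopencl to keep a single language across the examples in this issue; the kernel, column data and threshold are invented for illustration, and running it on an FPGA would additionally require a vendor OpenCL SDK and an offline-compiled kernel.

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()   # picks an available OpenCL device
    queue = cl.CommandQueue(ctx)

    values = np.random.rand(1024).astype(np.float32)   # a column of a relation
    mf = cl.mem_flags
    in_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=values)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, values.nbytes)

    # A trivial selection: keep values above a threshold, zero out the rest.
    program = cl.Program(ctx, """
        __kernel void select_gt(__global const float *in_col,
                                __global float *out_col,
                                const float threshold) {
            int gid = get_global_id(0);
            out_col[gid] = in_col[gid] > threshold ? in_col[gid] : 0.0f;
        }
    """).build()

    program.select_gt(queue, values.shape, None, in_buf, out_buf, np.float32(0.5))

    result = np.empty_like(values)
    cl.enqueue_copy(queue, result, out_buf)
    print("rows passing the filter:", int(np.count_nonzero(result)))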

[1]. OpenPOWER CAPI Developer Kit for POWER8

[2]. A. Putnam, A. Caulfield, et al. Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. In Proceedings of the 41st International Symposium on Computer Architecture (ISCA 2014).

[3]. O. Abella, G. Ndu, et al. An Empirical Evaluation of High-level Synthesis Languages and Tools for Database Acceleration. In Proceedings of the 24th International Conference on Field Programmable Logic and Applications (FPL 2014).

[4]. G. Ndu, J. Navaridas and M. Lujan. CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators. University of Manchester Technical Report UNIMAN-COMP-APT-TR-02-05-2014.

[5]. G. Ndu, J. Navaridas and M. Lujan. CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators. In Proceedings of the 3rd International Workshop on OpenCL.

Alpha Release of 2ndQuadrant Unified Data Analytics (2UDA)

2ndQuadrant is pleased to announce the alpha release of a new client platform for business intelligence – 2UDA (pronounced ‘tudor’).

2UDA is a data analytics application suite that unifies databases, spreadsheets, data mining and data visualisation in one seamless desktop installer.

Available to download now, 2UDA is packaged and ready to install on a choice of operating systems (Linux, OS X and Windows) at http://2ndquadrant.com/2uda

Although 2UDA is primarily aimed at Data Analysts and Data Scientists, we encourage developers and DBAs to try it at this stage of product development, to help us improve the functionality and user experience for all users.

More details on 2UDA at http://2ndquadrant.com/2uda

Augmenting the PostgreSQL Planner with Machine Learning

by Anthony Kleeroper, the University of Manchester

On 27th May, we gave a talk to the Manchester PostgreSQL Meetup group about the work we have been doing as part of the AXLE project. PostgreSQL users and enthusiasts from the North West of England gave up some of their evening to hear the talk, which was organised by 2ndQuadrant.

The talk was well received and an interesting discussion followed, mostly around the issues of applying machine learning to this challenging problem, especially the existence of edge cases. A really useful suggestion from the audience was that the system could make use of system downtime to run its own training queries, targeting areas of the feature space that were sparsely populated.

The talk started with an introduction to AXLE, describing the overall aim, the work being done by each of the project partners and how the different parts will all fit together. The main body of the talk focused on the problem of Query Performance Prediction – predicting how long a query will take to run before it actually runs. Past approaches to this problem were considered, which required a brief introduction to the ideas of machine learning. I then presented an outline of the approach we are using as part of AXLE and the plugin we are producing to implement our solution.
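To make the idea concrete – this is an illustration of the general approach, not the AXLE plugin – one can describe each previously executed query by a handful of features taken from its plan and fit a regression model that maps those features to observed runtime. The features, synthetic data and choice of model below are all assumptions for the sake of a runnable sketch.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    # Hypothetical training set: one row per previously executed query, with
    # features of the kind one might extract from EXPLAIN output.
    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.uniform(1e2, 1e7, n),    # planner-estimated rows
        rng.uniform(10, 1e6, n),     # planner total cost
        rng.integers(0, 6, n),       # number of joins
        rng.integers(0, 2, n),       # uses a sequential scan (0/1)
    ])
    # Synthetic "observed runtime" in milliseconds, purely so the sketch runs.
    y = 0.002 * X[:, 0] + 0.5 * np.sqrt(X[:, 1]) + 50 * X[:, 2] + rng.normal(0, 20, n)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_train, y_train)
    print("R^2 on held-out queries:", round(model.score(X_test, y_test), 3))

    # Predict how long a new query will take before running it.
    new_query = [[5e5, 1.2e4, 2, 1]]   # estimated rows, cost, joins, seq scan
    print("predicted runtime (ms):", model.predict(new_query)[0])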

The second part of the talk concentrated on the problem of augmenting the planner and helping it select the best plan more often. This is partly based on the need for the planner to know when to offload some of the execution of a query to GPUs or other processing units. Partly, though, it is based on the existence of cases where the planner selects a “bad” plan after rejecting one that would actually have been faster. Robert Haas found that 5% of the complaints on the pgsql-performance mailing list were the result of the planner rejecting a faster plan and selecting a significantly slower one – and that is after excluding cases where the planner cost variables are tuned poorly.

The solution we are proposing as part of AXLE is to augment the planner with cost hooks that allow us to write plugins to predict the execution time of operations more accurately than the analytical model currently used by PostgreSQL. The planner will then be better placed to select the fastest plan rather than reject it because of faulty costing.

A copy of the presentation can be seen here: Augmenting the PostgreSQL Planner with Machine Learning

Progress on online upgrade

In the last couple of months I have been working on online upgrade for very large databases as part of the AXLE project, and I would like to share my thoughts on the topic and the progress we have made recently.

Before joining 2ndQuadrant I used to work at Skype, where the business would not allow a maintenance window for our databases. This meant no downtime was allowed for deployments, upgrades, etc. That kind of rule makes you change the way you do things. Most changes are small, you avoid heavy locks, and you have replicas to allow for fast fail-over. But while you can make your releases small and non-blocking, what happens when you need to do a major version upgrade of the PostgreSQL database?

You might be in a different situation, as most companies do have an upgrade window, and so you might be able to afford some downtime during the upgrade. This however brings two problems. For one, no company actually likes downtime even if it is allowed. More importantly, once your database grows beyond gigabytes in size into the range of terabytes or hundreds of terabytes, the downtime can take days or even weeks, and nobody can afford to stop their operations for that long. The result is that many companies skip important upgrades, making the next one even more painful. The developers miss out on new features and performance improvements, and the companies sometimes even risk running a PostgreSQL version that is no longer supported and has known data corruption or security problems. In the following paragraphs I will talk a little about my work on making upgrades less time consuming and, as a result, less painful and hopefully more frequent.

Let me start with a little history first. Before PostgreSQL 9.0 the only way to do a major version upgrade was to run pg_dump and restore the dump into an instance running a newer version of PostgreSQL. This method required the structure and all the data to be read from the database and written into a file, then read back from the file and inserted into a new database, after which indexes have to be rebuilt, and so on.

As you can imagine this process can take quite some time. Performance improvements were made in 8.4 for pg_restore with the addition of the -j option, which lets you specify how many parallel jobs should be run. This makes it possible to restore several tables (indexes, etc.) in parallel, making the restore process faster for custom-format dumps. The 9.3 version added a similar option to pg_dump, improving performance even further. But given how fast data volumes are growing, parallelisation itself is not enough to make any serious gain in the time required for an upgrade.

Then in PostgreSQL 9.0 a utility called pg_upgrade arrived. pg_upgrade dumps just the structures and restores them into the new cluster, but it copies the data files as they are on disk, which is much faster than dumping them into a logical format and then reinserting them. This is good enough for small databases because it means a downtime in the range of minutes or hours, a time acceptable for many scenarios. There is also the link mode, which just creates hard links (junction points on Windows), making this process even faster. But from my personal point of view it is too dangerous to run such a setup on a production master server. I will briefly explain why. If something goes wrong once you start your new server that was upgraded using the link mode, you are suddenly without a production database and have to fail over, or worse, you have to restore from backup. That means you not only failed to upgrade but you just caused additional downtime! Good luck getting approval next time.

Now many people who can’t afford long downtime for upgrades use trigger-based replication solutions like Slony or Londiste to do the upgrade. This is a good solution because you can replicate your data while the original server is running and then switch over with minimal downtime. In practice there are several problems, however. One of them is that trigger-based solutions are often clunky to set up, especially if you are doing it only once every couple of years and only to do the upgrade. It is also easy to miss a table or to add tables in the wrong order and thus not get a full copy. I have witnessed this in practice, and the people doing the upgrade were working with trigger-based replication on a daily basis. Another issue is that trigger-based solutions add considerable load to the source database, sometimes making the upgrade impossible because the database server becomes overloaded once the replication is activated. And last but often not least, it can take a very long time for trigger-based replication to actually move the data to the new server. On the last occasion I was involved with an upgrade project, the trigger-based solution took around a month to copy the database and catch up with changes. Yes, one month.

PostgreSQL 9.4 brings the logical decoding feature, which offers a fresh start for designing a new and better solution to the online upgrade problem. What we did, as part of the AXLE project, is create a tool which combines logical decoding with the techniques described above. The solution solves most of the problems of the previous approaches. The Uni-Directional Replication PostgreSQL extension (UDR for short) does logical replication using logical decoding of the write-ahead log (WAL). Thanks to this, the impact on the master server is almost on par with physical streaming replication, so the additional load caused by an ongoing upgrade is minimal on the running system. It also provides tools to initialise new nodes, using either a physical or a logical backup. You can even turn an existing physical slave into a UDR slave. And because it is a logical replication system, it is possible to design it in a way that supports cross-version replication.

What all this means is that we can now use UDR in combination with pg_upgrade to do an online upgrade to a new major PostgreSQL version with minimal downtime, in a short amount of absolute time, and with minimal impact on the running system.

An example of how this can look in practice (a rough scripted sketch of the steps follows the list):

  • Do a pg_basebackup of the existing instance.
  • Set up UDR replication between the original instance and the one created by the base backup.
  • Run pg_upgrade on the new instance.
  • Let UDR replay the changes that happened in the meantime.
  • Switch the traffic to the new instance.
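As a rough sketch of how the non-UDR parts of these steps could be driven from a script: the paths, hostname and replication role below are placeholder assumptions, the UDR-specific steps are left as comments because the actual commands are described in the wiki guide referenced below, and a real run involves more care around stopping and starting the servers.

    import subprocess

    # Placeholder paths and connection details -- adjust for your environment.
    OLD_BIN = "/usr/lib/postgresql/9.4/bin"
    NEW_BIN = "/usr/lib/postgresql/9.5/bin"
    COPY_DATA = "/var/lib/postgresql/9.4/upgrade_copy"   # base backup lands here
    NEW_DATA = "/var/lib/postgresql/9.5/main"            # new-version cluster
    PRIMARY = "db-primary.example.com"

    def run(cmd):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Base backup of the existing instance.
    run([OLD_BIN + "/pg_basebackup", "-h", PRIMARY, "-U", "replication",
         "-D", COPY_DATA, "-X", "stream", "-P"])

    # 2. Set up UDR replication between the original instance and the copy
    #    (UDR-specific configuration -- see the wiki guide referenced below).

    # 3. pg_upgrade the copy into a new-version cluster.
    run([NEW_BIN + "/initdb", "-D", NEW_DATA])
    run([NEW_BIN + "/pg_upgrade", "-b", OLD_BIN, "-B", NEW_BIN,
         "-d", COPY_DATA, "-D", NEW_DATA])

    # 4. Let UDR replay the changes that happened in the meantime, then
    # 5. switch application traffic to the upgraded instance.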

For a howto with more detailed instructions, see the UDR Online Upgrade guide on the PostgreSQL wiki. The UDR sources are available in the 2ndquadrant_bdr repository on the PostgreSQL git server (bdr-plugin/next branch).

Finally, since UDR is not just an online upgrade tool but also a replication solution, it can be used for your normal replication needs instead of physical streaming replication. Furthermore, it provides several advantages, such as the ability to create temporary tables, replicating from multiple OLTP databases into one big data warehouse database, or replicating just part of a database.

My hope is that this effort will mean that downtime considerations are no longer a problem when it comes to upgrading from PostgreSQL 9.4 and above to a new major version.

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.