
PG-Strom is a PostgreSQL extension that plugs custom scan logic into the database so that queries run faster. It identifies CPU-intensive workloads and offloads them to the GPU, using the GPU's parallel execution capability to complete the task. Compared with a CPU, which has a small number of cores and limited RAM bandwidth, the GPU has a distinct advantage: GPUs typically have hundreds of processor cores and RAM bandwidth several times greater than that of a CPU. They can perform large numbers of computations in parallel, which makes them very efficient for this kind of work.


PG-Strom is based on two basic ideas:

  1. On-the-fly native GPU code generation.
  2. Asynchronous pipeline execution mode.
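
To make the first idea more concrete, here is a minimal sketch of on-the-fly code generation. It is not PG-Strom's actual generator: the expression classes, the `DEVICE_OPS` set, the `Row` struct and the `qual_kernel` name are all hypothetical, chosen only to show how a filter expression might be checked for device support and then turned into CUDA-like source text.

```python
# Hypothetical sketch: turn a tiny WHERE-clause expression tree into
# CUDA-like source text, or report that it cannot run on the device.
from dataclasses import dataclass
from typing import Optional, Union

# Operators this toy "device" supports (an assumption for illustration).
DEVICE_OPS = {"+", "-", "*", "/", "<", ">", "=", "AND", "OR"}
# SQL operators whose C spelling differs.
C_OPS = {"=": "==", "AND": "&&", "OR": "||"}


@dataclass
class Column:
    name: str


@dataclass
class Const:
    value: float


@dataclass
class Op:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Const, Op]


def to_device_expr(e: Expr) -> Optional[str]:
    """Return a C-style expression string, or None if not device-executable."""
    if isinstance(e, Column):
        return f"row.{e.name}"
    if isinstance(e, Const):
        return repr(e.value)
    if e.op not in DEVICE_OPS:
        return None  # unsupported operator: fall back to the CPU
    left, right = to_device_expr(e.left), to_device_expr(e.right)
    if left is None or right is None:
        return None
    return f"({left} {C_OPS.get(e.op, e.op)} {right})"


def generate_kernel(e: Expr) -> Optional[str]:
    """Wrap the expression in a kernel skeleton; None means 'keep it on the CPU'."""
    body = to_device_expr(e)
    if body is None:
        return None
    return (
        "__global__ void qual_kernel(const Row *rows, bool *out, int n) {\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) { Row row = rows[i]; out[i] = " + body + "; }\n"
        "}\n"
    )
```

In this sketch a `None` result stands for "leave the expression to the CPU". The generated text would still have to be compiled by a real GPU toolchain before it could run.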

The figure below shows how a query is submitted to the execution engine. During the query optimization phase, PG-Strom detects whether a given query is fully or partially executable on the GPU and then determines whether the query can be offloaded. If it can, PG-Strom generates the source code for the GPU native binary on the fly and starts just-in-time compilation before the execution phase. Next, PG-Strom loads the extracted row set into the DMA buffer (the default buffer size is 15MB) and asynchronously starts the DMA transfers and GPU kernel execution. The CUDA platform allows these tasks to run in the background, so PostgreSQL can go ahead with the current process in the meantime. Together with GPU acceleration, slicing the work into asynchronous chunks also hides general latency.
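
The asynchronous pipeline can be pictured with a small simulation. This is only a sketch under loud assumptions: a thread pool stands in for the GPU, `fake_gpu_filter` stands in for a compiled kernel, and chunks are counted in rows rather than bytes. None of these names come from PG-Strom.

```python
# Hypothetical simulation of chunked, asynchronous execution.
# A thread pool plays the role of the device; nothing here touches a real GPU.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable[int], chunk_rows: int) -> Iterator[List[int]]:
    """Slice the row stream into fixed-size chunks (the 'DMA buffers')."""
    buf: List[int] = []
    for r in rows:
        buf.append(r)
        if len(buf) == chunk_rows:
            yield buf
            buf = []
    if buf:
        yield buf  # last, partially filled chunk


def fake_gpu_filter(chunk: List[int], pred: Callable[[int], bool]) -> List[int]:
    """Stand-in for kernel execution: keep the rows that satisfy the predicate."""
    return [r for r in chunk if pred(r)]


def pipelined_scan(rows: Iterable[int], pred: Callable[[int], bool],
                   chunk_rows: int = 4, workers: int = 2) -> List[int]:
    results: List[int] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = []
        for chunk in chunked(rows, chunk_rows):
            # submit() returns immediately, so this loop keeps loading the
            # next chunk while earlier chunks are still being processed.
            pending.append(pool.submit(fake_gpu_filter, chunk, pred))
        for fut in pending:
            # Collect in submission order so the output order is stable.
            results.extend(fut.result())
    return results
```

For example, `pipelined_scan(range(10), lambda r: r % 2 == 0, chunk_rows=3)` filters the even rows while the producer loop and the workers overlap. The point of the design is that loading and processing never wait for each other chunk by chunk.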


After PG-Strom is loaded, running SQL on the GPU requires no special instructions. PG-Strom hooks into PostgreSQL's custom scan mechanism and provides alternative implementations of the scan/join logic that can run on the GPU. If the estimated cost is reasonable, the query planner places the custom scan node in the plan instead of the built-in query execution logic.
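
The cost-based choice described above can be sketched roughly as follows. The `Path` class, the cost numbers and the `quals_device_executable` flag are hypothetical and only illustrate the idea that a GPU path is considered when the work can run on the device and wins only when its estimated total cost is lowest.

```python
# Hypothetical sketch of cost-based path selection; all numbers are made up.
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    name: str
    startup_cost: float   # e.g. kernel build and DMA setup for a GPU path
    run_cost: float       # cost of actually processing the rows
    on_gpu: bool = False

    @property
    def total_cost(self) -> float:
        return self.startup_cost + self.run_cost


def choose_path(paths: List[Path], quals_device_executable: bool) -> Path:
    """Pick the cheapest path, ignoring GPU paths the device cannot run."""
    candidates = [p for p in paths if quals_device_executable or not p.on_gpu]
    return min(candidates, key=lambda p: p.total_cost)


# Large table: the GPU path's startup cost is paid back by a cheap run.
large = [Path("SeqScan", 0.0, 1000.0), Path("GpuScan", 300.0, 200.0, on_gpu=True)]
# Small table: the same startup cost makes the GPU path a poor choice.
small = [Path("SeqScan", 0.0, 100.0), Path("GpuScan", 300.0, 20.0, on_gpu=True)]
```

With these made-up numbers the GPU path is chosen for the large table and the built-in scan for the small one, which is why a fixed startup overhead matters when estimating cost.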

The graph below shows benchmark results for PG-Strom and PostgreSQL. The horizontal axis is the number of tables, and the vertical axis is the query execution time. In this test, all of the relevant inner relations can be loaded into GPU RAM at once, and pre-aggregation greatly reduces the number of rows the CPU needs to process. For more details, test code can be viewed

As can be seen from this figure, PG-Strom is much faster than PostgreSQL alone.
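
The effect of pre-aggregation mentioned above can be illustrated with a small two-stage average. This is a sketch of the general technique, not PG-Strom's implementation: `partial_aggregate` plays the role of the device-side step and `final_aggregate` the CPU-side step, and both names are hypothetical.

```python
# Hypothetical two-stage aggregation: per-chunk partial results are combined
# later, so the final stage sees one entry per group instead of every row.
from typing import Dict, Iterable, List, Tuple

Partial = Dict[str, Tuple[float, int]]  # group key -> (sum, count)


def partial_aggregate(chunk: Iterable[Tuple[str, float]]) -> Partial:
    """Device-side step in this sketch: reduce one chunk to (sum, count) per key."""
    acc: Partial = {}
    for key, value in chunk:
        s, c = acc.get(key, (0.0, 0))
        acc[key] = (s + value, c + 1)
    return acc


def final_aggregate(partials: Iterable[Partial]) -> Dict[str, float]:
    """CPU-side step: merge the partial results and compute the averages."""
    total: Partial = {}
    for part in partials:
        for key, (s, c) in part.items():
            ts, tc = total.get(key, (0.0, 0))
            total[key] = (ts + s, tc + c)
    return {key: s / c for key, (s, c) in total.items()}


chunks: List[List[Tuple[str, float]]] = [
    [("a", 1.0), ("b", 2.0), ("a", 3.0)],
    [("a", 5.0), ("b", 4.0)],
]
partials = [partial_aggregate(ch) for ch in chunks]
averages = final_aggregate(partials)
```

Here five input rows shrink to four partial entries before the final step. With many rows per group the reduction is far larger, which is the point of doing the first stage close to the data.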

Here are a few ways you can improve the performance of PostgreSQL:

1. Homogeneous scale-up

2. Heterogeneous scale-up

3. Scale-out


PG-Strom takes the heterogeneous scale-up approach, which matches the hardware to the characteristics of the workload to get the most benefit from it. In other words, PG-Strom offloads simple, high-volume numerical calculations to GPU devices before the data is handed to the CPU cores.

Evolution, Right...

New Contributor

Do you have any insight into the planner in this GPU database?

Basically, in GPU database operations the planner checks the different scan and join methods, finds the cheapest one, and creates a query plan tree. When doing the same thing on the GPU, checks should also be made for whether the work is device executable and whether the query plan tree from the CPU has been updated.

I just want to understand this at a high level, for my own knowledge.

  1. I am curious about the planning factors on the GPU.
  2. There can be more than one appropriate path in the query plan tree. How is the decision for a particular path made, given those planning factors?
Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 09:14 AM