RBC reference docs

RBC Documentation

Gitbook View, Documentation Sources, Project homepage.

Overview

The RBC project implements the concept of a Remote Backend Compiler (RBC): the compilation of user-provided program source code into machine-executable instructions is split between two computer systems, an RBC client and a JIT server, using the following workflow:

  • In the RBC client, the user-provided source code of a program is compiled to an LLVM IR string.

  • The LLVM IR string, together with the program metadata, is sent to the JIT server, where it is registered and made available for execution.

The RBC concept can be applied in various situations. For example, RBC enables executing client programs for analyzing or processing Big data stored on a remote server when retrieving the data over the network would be infeasible due to its large size or would be too inefficient.

  • LLVM IR is an intermediate representation of a compiled program used in the LLVM compiler toolchain. The low-level LLVM IR language is based on the Static Single Assignment (SSA) representation that many high-level languages can be compiled into. LLVM IR can be an input to a Just-in-time (JIT) compiler, which completes the compilation process and produces a machine-executable program.

In the RBC project, the client software is implemented in Python and uses Numba for compiling Python functions into LLVM IR. In addition, the RBC client software can use Clang compilers for compiling C/C++ functions into LLVM IR. The RBC project provides a Python/Numba based JIT server as a prototype of the RBC concept.

As an application, the RBC client software can be used in connection with OmniSciDB - an analytical database and SQL engine - for run-time registration of custom SQL functions: User-Defined Functions (UDFs) and User-Defined Table Functions (UDTFs). OmniSciDB uses JIT technology that enables compiling SQL queries into machine-executable programs to be run on modern CPU and GPU hardware.

Structure of RBC documentation

  • Getting Started

  • Developers Corner

Getting started

In this section we cover the following topics:

  • How to install RBC and OmniSciDB software into Conda environments

  • How to define UDFs and UDTFs and register them with an OmniSciDB server

User Defined Table Functions (UDTF)

Installation

In the following, we'll describe how to install the RBC and related software.

  • The RBC project is under active development. To get the latest updates and bug fixes as early as possible, use the approach described in Developers Corner.

Install RBC using conda

conda install -c conda-forge rbc

# or, to install rbc into a new environment, run
conda create -n rbc -c conda-forge rbc
conda activate rbc

# check that rbc was installed successfully
python -c 'import rbc; print(rbc.__version__)'

Install RBC using pip (alternative)

pip install rbc-project
# check that rbc was installed successfully
python -c 'import rbc; print(rbc.__version__)'

Install OmniSciDB using conda

  • It is recommended to install the rbc and omniscidb conda packages into separate conda environments.

conda create -n omniscidb omniscidb
conda activate omniscidb

# or for CUDA enabled omniscidb, use

conda create -n omniscidb-cuda omniscidb=*_cuda
conda activate omniscidb-cuda

To check that the omniscidb package is installed successfully, make sure that the omniscidb environment is activated and run:

omnisci_server --version

To start the omniscidb server with runtime UDF/UDTF support enabled, run the omnisci_server command with the --enable-runtime-udf and --enable-table-functions flags:

# Create DB, run only once
mkdir -p omnisci_data
omnisci_initdb omnisci_data

# Start server:
omnisci_server --data=omnisci_data --enable-runtime-udf --enable-table-functions

Install OmniSciDB using docker

# CPU version
# https://hub.docker.com/r/omnisci/core-os-cpu
docker run \
  -d \
  --name omnisci \
  -p 6274:6274 \
  -v /home/username/omnisci-storage:/omnisci-storage \
  omnisci/core-os-cpu

# GPU version
# https://hub.docker.com/r/omnisci/core-os-cuda
docker run \
  --runtime=nvidia \
  -d \
  --name omnisci \
  -p 6274:6274 \
  -v /home/username/omnisci-storage:/omnisci-storage \
  omnisci/core-os-cuda

Calling External Functions

Simple example (OmniSciDB)

The RBC package implements support for defining and registering user-defined functions for the OmniSciDB SQL engine. The following types of user-defined functions are supported:

  • UDFs that are applied to a DB table row-wise,

  • UDTFs (table functions) that are applied to the DB table columns.

In the following, we use the RBC Python package rbc to connect to an OmniSciDB server, define custom UDFs and a UDTF, register them with the server, and finally call the user-defined functions in SQL queries sent with rbc tools.
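The difference between the two kinds of functions listed above can be illustrated without a database. The following pure-Python sketch is only an emulation: the helpers apply_udf and apply_udtf and the function running_sum are hypothetical names chosen for illustration and are not part of RBC or OmniSciDB.

```python
def apply_udf(func, column):
    """Row-wise: call func once per row value and collect the results."""
    return [func(value) for value in column]


def apply_udtf(func, column):
    """Column-wise: call func once with the whole column."""
    return func(column)


def myincr(x):
    # A UDF sees a single value at a time.
    return x + 1


def running_sum(column):
    # A table function sees the whole column, so a row may depend on other rows.
    total, out = 0, []
    for value in column:
        total += value
        out.append(total)
    return out


i = [0, 1, 2, 3, 4]
print(apply_udf(myincr, i))        # [1, 2, 3, 4, 5]
print(apply_udtf(running_sum, i))  # [0, 1, 3, 6, 10]
```

A running sum cannot be written as a row-wise UDF because each output row depends on the preceding input rows; this is the kind of computation for which table functions are intended.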

Connecting to OmniSciDB server

Assuming that an OmniSciDB server is running, registering a new user-defined function requires establishing a connection to the server. This can be done directly:

from rbc.omniscidb import RemoteOmnisci
omnisci = RemoteOmnisci(user='admin', password='HyperInteractive',
                        host='127.0.0.1', port=6274, dbname='omnisci')

or using an existing connection session id (not implemented, see rbc issue 180):

# con stands for an existing connection (not implemented yet)
omnisci = RemoteOmnisci(connection=con)

For the sake of having a complete example, let's create a sample table:

omnisci.sql_execute('drop table if exists simple_table')
omnisci.sql_execute('create table if not exists simple_table (x FLOAT, i INT);')
omnisci.load_table_columnar('simple_table',
                            x = [1.1, 1.2, 1.3, 1.4, 1.5],
                            i = [0, 1, 2, 3, 4])

Defining UDFs using Python functions

We create two new UDFs that increment row values by 1, one UDF for FLOAT columns and another for INT columns:

@omnisci('float(float)', 'int(int)')
def myincr(x):
    return x + 1

Notice that the two UDFs can be defined as a single Python function myincr because RBC/OmniSciDB supports overloading user-defined function names.

Registering UDFs - row-wise function

To register these UDFs to OmniSciDB, one can call

omnisci.register()

but when the omnisci object is used for making SQL queries, the registration of any new UDFs is triggered automatically.

That's it! Now anyone connected to the OmniSciDB server can use the SQL function myincr in their queries.

SQL query using omnisci.sql_execute

For example, one can use the RBC provided omnisci object to send queries:

descr, result = omnisci.sql_execute('SELECT x, myincr(x) FROM simple_table')
for x, x1 in result:
    print(f'x={x:.4}, x1={x1:.4}')

that will output:

x=1.1, x1=2.1
x=1.2, x1=2.2
x=1.3, x1=2.3
x=1.4, x1=2.4
x=1.5, x1=2.5

Defining UDTFs - table functions

Table functions act on database table columns and their results are stored in so-called output columns of temporary tables. Let's implement a new SQL table function that computes a new table with all columns incremented by a user-specified value:

@omnisci('int(Cursor<float>, float, RowMultiplier, OutputColumn<float>)')
def incrby(x, dx, m, y):
    for i in range(len(x)):
        y[i] = x[i] + dx
    return len(x)

omnisci.register()

Before trying it out, let's explain some of the details here:

  • Cursor - The Cursor<...> represents the cursor over input table columns. For instance, Cursor<float, int> would correspond to two arguments of the UDTF definition, one being the input column containing float values and the other being the input column containing int values.

  • Return value - The return value of a UDTF definition defines the length of the output columns. The memory of the output column arguments is pre-allocated for the size m * len(x), where the column sizer parameter m is a literal constant specified by the user in the SQL query and len(x) is the size of the first input column. If the UDTF definition returns an output column size smaller than m * len(x), the memory of the output columns is re-allocated accordingly. The return type of a UDTF definition must be a 32-bit integer, and the type of a column sizer parameter can be RowMultiplier, ConstantParameter, or Constant.

One can call the new table function to increment the FLOAT column x by value 2.3 from a SQL query as follows:

descr, result = omnisci.sql_execute('''
  SELECT * FROM TABLE(INCRBY(CURSOR(SELECT x FROM simple_table),
                             CAST(2.3 AS FLOAT), 1))
''')
for y, in result:
    print(f'y={y:.4}')

that will output

y=3.4
y=3.5
y=3.6
y=3.7
y=3.8
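The output column sizing rule described in the Return value note can be emulated in pure Python. This is a minimal sketch and not the OmniSciDB implementation: call_table_function and incrby_positive are hypothetical names, a Python list stands in for the pre-allocated OutputColumn<float> buffer, and rejecting a returned size outside the pre-allocated range is an assumption of the sketch.

```python
def call_table_function(udtf, x, dx, m):
    """Emulate how a UDTF call sizes its output column."""
    capacity = m * len(x)        # pre-allocated size: m * len(x)
    y = [None] * capacity        # stand-in for the output column buffer
    n = udtf(x, dx, m, y)        # the UDTF returns the output column length
    if not 0 <= n <= capacity:   # assumption of this sketch, see lead-in
        raise ValueError(f'returned size {n} exceeds capacity {capacity}')
    return y[:n]                 # shrink the output to the returned length


def incrby(x, dx, m, y):
    # Same body as the table function defined above.
    for i in range(len(x)):
        y[i] = x[i] + dx
    return len(x)


def incrby_positive(x, dx, m, y):
    # Hypothetical variant: writes only positive inputs, so it may
    # return fewer rows than were pre-allocated.
    n = 0
    for value in x:
        if value > 0:
            y[n] = value + dx
            n += 1
    return n


print(call_table_function(incrby, [1.0, 2.0, 3.0], 0.5, 1))
print(call_table_function(incrby_positive, [1.0, -2.0, 3.0], 0.5, 2))
```

In the second call the buffer holds 2 * 3 = 6 slots but the function reports only two valid rows, so the emulated output column is cut down to that length.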

ColumnList

ColumnList support requires OmniSciDB 5.6 or newer

The Signature Class

User Defined Aggregate Function (UDAF)

Not supported

Column

Developing RBC and OmniSciDB

In this section we cover the following topics:

  • Getting software sources and setting up development environments

User Defined Functions (UDF)

Runtime UDF Support

The Remote Backend Compiler (RBC) package implements the OmniSciDB client support for defining so-called Runtime UDFs. That is, while the OmniSciDB server is running, one can register new SQL functions with the OmniSci Calcite server and provide their implementations in LLVM IR string form. The RBC package supports creating Runtime UDFs from Python functions.

A User-Defined Function (UDF) adds new SQL functionality that works in a row-wise manner. The figure below illustrates how a UDF works:

Figure: the function add1 is called for every row and produces a new row.

Example

First, we need to connect RBC to the OmniSci server using the RemoteOmnisci class:

from rbc.omniscidb import RemoteOmnisci
omnisci = RemoteOmnisci(user='admin', password='HyperInteractive',
                        host='127.0.0.1', port=6274, dbname='omnisci')

One can define UDFs using omnisci as a decorator:

@omnisci('int32(int32)')
def incr(i):
    return i + 1

Supported Types and Data Structures

OmniSciDB supports many data types, but not all of them are supported by the Remote Backend Compiler.

Scalar types

| Datatype | Size (bytes) | Notes |
|----------|--------------|-------|
| BOOLEAN  | 1 | TRUE: 'true', '1', 't'. FALSE: 'false', '0', 'f'. Text values are not case-sensitive. |
| TINYINT  | 1 | Minimum value: -127; maximum value: 127. |
| SMALLINT | 2 | Minimum value: -32,767; maximum value: 32,767. |
| INT      | 4 | Minimum value: -2,147,483,647; maximum value: 2,147,483,647. |
| BIGINT   | 8 | Minimum value: -9,223,372,036,854,775,807; maximum value: 9,223,372,036,854,775,807. |

Array

RBC also supports Array<T>, where T is one of the scalar types listed above. Both fixed-length and variable-length arrays are supported in RBC. For more information, see the Array page.

Column

ColumnList

Cursor

Array

In OmniSciDB, an array has the following internal representation:

typedef struct {
    T* data;  // contiguous memory block
    int64_t size;
    int8_t is_null;  // boolean values in omniscidb are represented as int8 variables
} Array;

Creating an Array programmatically

from numba import types as nb_types
from rbc.omnisci_backend import Array

@omnisci('double[](int64)')
def create_array(size):
    array = Array(size, nb_types.double)
    for i in range(size):
        array[i] = nb_types.double(i)
    return array

*Notice that returning an empty array is an invalid operation that might crash the server.

One can also use one of the array creation functions as specified in the Python Array API standard: full, full_like, zeros, zeros_like, ones and ones_like:

from numba import types as nb_types
import rbc.omnisci_backend as omni

@omnisci('double[](int64)')
def zero_array(size):
    return omni.zeros(size, nb_types.double)

Accessing a pointer member

@omnisci('double(double[], int64)')
def get_array_member(array, idx):
    return array[idx]

Getting the size of an array

@omnisci('int64(double[])')
def get_array_size(array):
    return len(array)

Checking for null

You can check either whether an array is null or whether an array value is null:

@omnisci('int8(double[])')
def is_array_null(array):
    return array.is_null()

@omnisci('int8(double[], int64)')
def is_array_null(array, idx):
    return array.is_null(idx)
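To make the internal Array representation shown in the Array section more concrete, its memory layout can be mirrored on the Python side with the standard ctypes module. This is only an illustrative sketch: the class name ArrayOfDouble and the helper make_array are hypothetical, T is fixed to double, and nothing here is part of RBC or OmniSciDB.

```python
import ctypes


class ArrayOfDouble(ctypes.Structure):
    """ctypes mirror of the Array struct with T = double (illustrative only)."""
    _fields_ = [
        ("data", ctypes.POINTER(ctypes.c_double)),  # contiguous memory block
        ("size", ctypes.c_int64),
        ("is_null", ctypes.c_int8),                 # boolean stored as int8
    ]


def make_array(values):
    """Build an ArrayOfDouble from a sequence; None yields a null array."""
    if values is None:
        return ArrayOfDouble(None, 0, 1)
    buffer = (ctypes.c_double * len(values))(*values)
    data = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_double))
    arr = ArrayOfDouble(data, len(values), 0)
    arr._buffer = buffer  # keep the memory block alive with the struct
    return arr


arr = make_array([1.5, 2.5, 3.5])
print(arr.size, [arr.data[i] for i in range(arr.size)], bool(arr.is_null))
print(bool(make_array(None).is_null))
```

The sketch shows why a null check and a size are both needed: a null array carries no data pointer at all, while a non-null array is described by the pointer together with its element count.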

Setup development environments

We develop and test the RBC software under Linux using the Conda packaging system, as it conveniently provides all the necessary dependencies for the RBC package as well as for the OmniSciDB software.

In the following, we explain how to set up Conda environments for developing RBC as well as OmniSciDB software, how to get the software, and how to run and test the software.

Setting up development environments

To create the needed development environments, follow the instructions below.

  • Install and set up conda (unless it is already installed):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source $HOME/miniconda3/bin/activate
conda init bash  # or zsh when on macOS Catalina

  • Create rbc-dev environment:

conda env create -n rbc-dev \
  -f https://raw.githubusercontent.com/Quansight/pearu-sandbox/master/conda-envs/rbc-dev.yaml

  • Create omniscidb-dev environment:

conda env create -n omniscidb-dev \
  -f https://raw.githubusercontent.com/Quansight/pearu-sandbox/master/conda-envs/omniscidb-dev.yaml

It is recommended to keep the rbc and omniscidb development environments separate to minimize the risk of software version conflicts.

CUDA

OmniSciDB can be built for CPU-only or CUDA-enabled mode.

For CUDA-enabled OmniSciDB development, create omniscidb-cuda-dev environment:

conda env create -n omniscidb-cuda-dev \
  -f https://raw.githubusercontent.com/Quansight/pearu-sandbox/master/conda-envs/omniscidb-dev.yaml

In addition, make sure that your system has the CUDA Toolkit installed and the CUDA driver functional (run nvidia-smi to verify). The highest CUDA Toolkit version that OmniSciDB supports is currently 11.0.

Here follow the instructions for installing the CUDA Toolkit version 11.0.3:

sudo mkdir -p /usr/local/src/cuda-installers
cd /usr/local/src/cuda-installers
sudo wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run
sudo bash cuda_11.0.3_450.51.06_linux.run --toolkit --toolkitpath=/usr/local/cuda-11.0.3/ --installpath=/usr/local/cuda-11.0.3/ --override --no-opengl-libs --no-man-page --no-drm --silent

sudo sh /usr/local/src/cuda-installers/cuda_11.0.3_450.51.06_linux.run
┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [X] Driver                                                                 │
│      [X] 450.51.06                                                           │
│ + [ ] CUDA Toolkit 11.0                                                      │
│   [ ] CUDA Samples 11.0                                                      │
│   [ ] CUDA Demo Suite 11.0                                                   │
│   [ ] CUDA Documentation 11.0                                                │
│   Options                                                                    │
│   Install                                                                    │
...
#   ^--- X-select only Driver and PRESS Install

Getting software

  • Checkout rbc sources:

mkdir -p ~/git/xnd-project
cd ~/git/xnd-project

# If you are a member of xnd-project organization, use

git clone git@github.com:xnd-project/rbc.git

# else fork https://github.com/xnd-project/rbc and clone your rbc fork,
# or use

git clone https://github.com/xnd-project/rbc.git

  • Checkout omniscidb sources:

mkdir -p ~/git/omnisci
cd ~/git/omnisci

# If you are a member of omnisci organization, use

git clone git@github.com:omnisci/omniscidb-internal.git

# Otherwise, fork https://github.com/omnisci/omniscidb and clone the fork.
#
# Or use

git clone https://github.com/omnisci/omniscidb.git

  • Although the omniscidb-dev environment contains all the prerequisites for building and running the OmniSciDB server within the conda environment, we slightly adjust the conda environment to make it easier to manage builds for different build targets. For that, use the following script for activating the omniscidb-dev environment:

cd ~/git/omnisci/
wget https://raw.githubusercontent.com/Quansight/pearu-sandbox/master/working-envs/activate-omniscidb-internal-dev.sh

The script activate-omniscidb-internal-dev.sh must be sourced in the existing terminal session (not run via bash or another shell). The activate script will show various information about how to develop the OmniSciDB software within a Conda environment.

  • The activate script above uses a custom script /usr/local/cuda/env.sh for setting CUDA environment variables for the conda environment. Use the following commands to install the env.sh script:

cd ~/git/omnisci/
wget https://raw.githubusercontent.com/Quansight/pearu-sandbox/master/set_cuda_env.sh
sudo ln -s ~/git/omnisci/set_cuda_env.sh /usr/local/cuda/env.sh

Development

RBC

The basic workflow for developing and testing the RBC package contains the following commands:

cd ~/git/xnd-project/rbc
conda activate rbc-dev
python setup.py develop

# Possible commands for running rbc tests

pytest -sv -r A rbc/tests
pytest -sv -r A rbc/tests -x -k <test function>

The omniscidb related rbc tests are run only when the omniscidb server is running. The server must be started with the options --enable-runtime-udf --enable-table-functions, see the OmniSciDB development section below.

By default, the omniscidb server is assumed to be accessible at localhost port 6274. Only if the server is running elsewhere does one need to configure the rbc testing environment as follows:

  • Create a configuration file with the following content (update credentials as needed):

# File: client.conf

[user]
# OmniSciDB user name
name: admin
# OmniSciDB user password
password: HyperInteractive

[server]
# OmniSciDB server host name or IP
host: localhost
# OmniSciDB server port
port: 6274

  • The default location of the configuration file depends on the system, but one can set the environment variable OMNISCI_CLIENT_CONF to the full path of the configuration file. Default locations of the configuration file are shown below:

# Linux, macOS:
OMNISCI_CLIENT_CONF=$HOME/.config/omnisci/client.conf
# Windows:
OMNISCI_CLIENT_CONF=%UserProfile%/.config/omnisci/client.conf
OMNISCI_CLIENT_CONF=%AllUsersProfile%/.config/omnisci/client.conf

OmniSciDB

The basic workflow for developing and testing OmniSciDB software in the context of RBC development is as follows:

cd ~/git/omnisci/omniscidb-internal  # or git/omnisci/omniscidb
source ~/git/omnisci/activate-omniscidb-internal-dev.sh

mkdir -p build && cd build
cmake -Wno-dev $CMAKE_OPTIONS_CUDA ..
make -j $NCORES

that will build a CUDA-enabled omniscidb server. See the instructions printed by the activate script for how to build, for instance, a CPU-only omniscidb server.

To execute the omniscidb test-suite, make sure that the current working directory is the build directory and run:

mkdir tmp && bin/initdb tmp
make sanity_tests

To start the omniscidb server with runtime UDF/UDTF support enabled, run:

mkdir data && bin/initdb data
bin/omnisci_server --enable-runtime-udf --enable-table-functions
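As a closing note on the rbc testing setup described in the RBC subsection, the client configuration file shown there uses a format that Python's standard configparser module can read. The sketch below is a hedged illustration of that format only: read_client_conf is a hypothetical helper, and this is not necessarily how rbc itself loads the file.

```python
import configparser
import os
import tempfile

# Same content as the client.conf sample shown in this section.
CLIENT_CONF = """\
[user]
name: admin
password: HyperInteractive

[server]
host: localhost
port: 6274
"""


def read_client_conf(path):
    """Return connection settings from a client.conf-style file as a dict."""
    # interpolation=None keeps '%' characters in passwords literal
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as f:
        parser.read_file(f)
    return dict(
        user=parser.get("user", "name"),
        password=parser.get("user", "password"),
        host=parser.get("server", "host"),
        port=parser.getint("server", "port"),
    )


# Write the sample to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    conf_path = os.path.join(tmp, "client.conf")
    with open(conf_path, "w") as f:
        f.write(CLIENT_CONF)
    settings = read_client_conf(conf_path)

print(settings)
```

The returned keys match the keyword arguments used with RemoteOmnisci elsewhere in this documentation (user, password, host, port), which makes it easy to compare a configuration file against the connection that a test run is expected to use.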

Cursor