splicemachine/splice-helm

Chart version: 0.1.7
Api version: v1
App version: 3.0
Splice Machine data platform is a scale-out SQL RDBMS, data war...
application
Chart Type
Active
Status
Unknown
License
178
Downloads
https://splicemachine.github.io/charts
Set me up:
helm repo add center https://repo.chartcenter.io
Install Chart:
helm install splice-helm center/splicemachine/splice-helm
Versions (0)

Splice Machine Database Helm Chart

Deploy a single instance of Splice Machine Database

This Helm Chart requires the use of Helm v2.x.

TL;DR

At a minimum you’ll want to have a domain name you can create DNS records for, and a working Kubernetes cluster.

DOMAIN_NAME={your domain name}  # dev.example.io
helm install --name splicedb --namespace splice splicemachine/splice-helm \
  --set-string global.certificateName=${DOMAIN_NAME}

Read the installation notes section below while your Splice Machine database is being provisioned.

Prerequisites

  • Kubernetes 1.14, 1.15, 1.16, 1.17
  • Kubernetes cluster that provides PV/PVC with read write access
  • Apply storage classes to the Kubernetes cluster
  • SSL Certificate (tls.crt and tls.key)

Installation Notes

While it IS possible to simply install Splice Machine (splice-helm) Chart without providing any parameters, there are some caveats that need to be taken into consideration:

  • The default domain name will be dev.splicemachine-dev.io
  • Web UI endpoints will not have a valid SSL certificate
  • All storage disks will be provisioned using the default storage class
    • Storage access speeds have a direct correlation to the performance of a Splice Machine database.

Please stop by our Slack channel #splice-community. We would be happy to assist in the deployment of our Helm chart.

Configuration

This table lists the most used configuration parameters from the various values.yaml files.

Parameter Description Default Value
global.environmentName Environment name for the cluster: dev/qa/prod dev1
global.dnsPrefix Value to prefix all DNS entries with splicedb
global.certificateName The domain of the certificate being used dev.splicemachine-dev.io
global.tls.enabled Set if you want to provide a certificate false
global.tls.crt The contents of a tls.crt file user provided
global.tls.key The contents of a tls.key file user provided
hadoop.dataNode.persistent.size Size of the hadoop volumes .2T
hadoop.dataNode.replicas Number of hadoop data nodes 3
hbase.region.replicas Number of hbase region nodes, recommend same as hadoop data nodes 3
jupyterhub.singleuser.extraEnv.SPARKEXECUTOR_COUNT Spark executors for JupyterHub notebooks 2
global.splice.password Password for the default splice account admin

Installation

Add Splice Machine Helm Repository

This needs to be run one time.

helm repo add splicemachine https://splicemachine.github.io/charts
helm repo update

Command line parameters method

See the Advanced Installation section for more configuration scenarios.

DOMAIN_NAME=dev.splicemachine-dev.io  # change this to a domain name you own and have access to create/update DNS records
ENVIRONMENT=dev                       # identifier for the environment, reflected in DNS host names
DNSPREFIX=splice                      # name to prefix DNS host name with "splice-...."
helm install --name splicedb --namespace splice splicemachine/splice-helm \
  --set global.environmentName=${ENVIRONMENT} \
  --set-string global.dnsPrefix=splice \
  --set-string global.certificateName=${DOMAIN_NAME}

The command to install using a valid SSL certificate.

DOMAIN_NAME=dev.splicemachine-dev.io  # change this to a domain name you own and have access to create/update DNS records
ENVIRONMENT=dev                       # identifier for the environment, reflected in DNS host names
DNSPREFIX=splice                      # name to prefix DNS host name with "splice-...."
helm install --name splicedb --namespace splice splicemachine/splice-helm \
  --set-string global.environmentName=${ENVIRONMENT} \
  --set-string global.dnsPrefix=splice \
  --set-string global.certificateName=${DOMAIN_NAME} \
  --set global.tls.enabled=true \
  --set-string global.tls.crt=$(cat tls.crt) \
  --set-string global.tls.key=$(cat tls.key)

Certificate Files

The tls.crt file will have contents that contain

-----BEGIN CERTIFICATE-----
{certificate encoded data}
-----END CERTIFICATE-----

And the tls.key file will have contents that contain

-----BEGIN PRIVATE KEY-----
{encoded private key}
-----END PRIVATE KEY-----

Edit values file method

There is a values_required.yaml file, this can be edited and each of the keys needs to have a value provided. Once the files is edited, the installation can be performed using:

# this will fetch and untar the chart into a directory named `splice-helm`
helm fetch splicemachine/splice-helm --untar
cd splice-helm
# edit the values_required.yaml file
helm install --name splicedb --namespace splice -f values_required.yaml .

Rancher Method

When choosing Launch from the Apps menu, locate the splice-helm application from the Rancher library.

Supply the release name and namespace, we generally make this the same.

In the General section, the environment name and dns prefix are both used to build host names that are used to route traffic via nginx and haproxy. The format can be found below in the information about Nginx Ingress Controller.

If you would like the Rancher application to install the TLS certificate, choose to Install TLS Certificate and paste the contents of your tls.crt and tls.key files into the Certificate Data and Certificate Private Key fields.

In the Database Credentials section, enter the password to be assignd to the default splice user.

The Optional section allows you to specify the persistent volume size for each of the Hadoop Data Nodes. Also selected here is the number of Hadoop Data Nodes and Hbase Region Servers. These latter two are usually set to the same as they have affinity rules to pair Data Nodes and Region Server Statefulset Pods on the same Kubernetes nodes.

Once the data is provided, click Launch and wait for the application to become active.

Post Install Configuration

DNS

There are multiple services that will have public facing IP addresses.

  • splicedb-haproxy-controller
  • splicedb-kafka-external-0
  • splicedb-nginx-ingress-controller

HA Proxy Controller

This service provides JDBC connectivity to the database on port 1527. Generally we configure the DNS host entry using the following format:

jdbc-{dnsPrefix}-{environmentName}.{domain}

Where {domain} is the domain specified in the global.certificateName setting.

Kafka External

This provides Kafka streaming on port 19092. Generally we configure the DNS host entry using the following format:

kafka-broker-0-{dnsPrefix}-{environmentName}.{domain}

Nginx Ingress Controller

This is our primary ingress for UI administrative pages. Unlike the JDBC and Kafka DNS host names, this format is required in order to properly route traffic to the correct pages.

Host Name Path Backend UI
{dnsPrefix}-{environmentName}.{domain} /splicejupyter/ Jupyter Hub Notebooks
/mlflow/ ML Flow UI
{dnsPrefix}-{environmentName}admin-hdfs.{domain} / HDFS Admin UI
{dnsPrefix}-{environmentName}admin-hbase.{domain} / HBASE Admin UI
{dnsPrefix}-{environmentName}-spark.{domain} / Spark UI
{dnsPrefix}-{environmentName}-jobtracker.{domain} / Job Tracker UI
{dnsPrefix}-{environmentName}-kafka.{domain} / Kafka UI

Using Splice Machine Database

The documentation for Splice Machine, doc.splicemachine.com. The documentation for Connecting to the database. Our CLI, sqlshell, used to connect to the database can be downloaded here

Our Jupyter Notebooks contain a walkthrough of our database functionality. Connect to the splicejupyter URL as described above, logging in with the splice user and database password provided during the provisioning of the database.

Uninstall

helm delete splicedb --purge

Advanced Setup

Like any advanced database system, there is advanced configuration that will allow you to get the most out of a Splice Machine database. Here we will cover the use of Node Selectors and Storage Classes.

Node Selector Definitions

The splice-helm Helm chart supports assigning specific workloads to Kubernetes Nodes that match a label. Our label name is components. Using this configuration we can specify workloads that support reading and writing data to the database to be placed on high-powered Nodes, and workloads that are in support of running the database engine to less expensive Nodes.

Our platform supports two different node groups in order to provide the correct sizing to each sub-component of our platform.

components Size of Node
db Large Sized Nodes
meta Medium Sized Nodes

Nodes can be tagged with a the components label using the following command:

NODE_NAME=node-name
# add the "db" label
kubectl patch node ${NODE_NAME} -p '{"metadata":{"labels":{"components":"db"}}}'
# add the "meta" label
kubectl patch node ${NODE_NAME} -p '{"metadata":{"labels":{"components":"meta"}}}'

We can enable the use of Node Selectors by enabling Node Selectors.

DOMAIN_NAME=dev.splicemachine-dev.io  # change this to a domain name you own and have access to create/update DNS records
ENVIRONMENT=dev                       # identifier for the environment, reflected in DNS host names
DNSPREFIX=splice                      # name to prefix DNS host name with "splice-...."
helm install --name splicedb --namespace splice splicemachine/splice-helm \
  --set global.environmentName=${ENVIRONMENT} \
  --set-string global.dnsPrefix=splice \
  --set-string global.certificateName=${DOMAIN_NAME} \
  --set global.nodeSelector.enabled=true

Storage Class Definitions

We have defined storage classes to help manage cost/performance balance.

Storage Class Name Usage Detail
splicedb-premium High IOPS, Retain upon PVC deletion
splice-file-storage General Use, Retain upon PVC deletion

See the Storage Class Examples section below for some example definitions. Create a yaml file storage_classes.yaml from one of the examples below and modify to fit your Kubernetes implementation. Create the new Kubernetes resources:

kubectl create -f storage_classes.yaml

Once your Kubernetes cluster has the storage classes added, you can activate their use by either including the storageclass_values.yaml file using the -f option or by specifying each option on the command line via --set-string options.

Additional values file option

helm fetch splicemachine/splice-helm --untar
cd splice-helm
DOMAIN_NAME=dev.splicemachine-dev.io  # change this to a domain name you own and have access to create/update DNS records
ENVIRONMENT=dev                       # identifier for the environment, reflected in DNS host names
DNSPREFIX=splice                      # name to prefix DNS host name with "splice-...."
helm install --name splicedb --namespace splice . \
  --set global.environmentName=${ENVIRONMENT} \
  --set-string global.dnsPrefix=splice \
  --set-string global.certificateName=${DOMAIN_NAME} \
  -f storageclass_values.yaml

set-string option

helm fetch splicemachine/splice-helm --untar
cd splice-helm
DOMAIN_NAME=dev.splicemachine-dev.io  # change this to a domain name you own and have access to create/update DNS records
ENVIRONMENT=dev                       # identifier for the environment, reflected in DNS host names
DNSPREFIX=splice                      # name to prefix DNS host name with "splice-...."
helm install --name splicedb --namespace splice . \
  --set global.environmentName=${ENVIRONMENT} \
  --set-string global.dnsPrefix=splice \
  --set-string global.certificateName=${DOMAIN_NAME} \
  --set-string hadoop.dataNode.persistence.storageClassName=splicedb-premium \
  --set-string hbase.master.persistence.storageClassName=splice-file-storage \
  --set-string hbase.region.persistence.storageClassName=splice-file-storage \
  --set-string hbase.olap.persistence.storageClassName=splice-file-storage \
  --set-string kafka.persistence.storageClassName=splicedb-premium

Storage Class Examples

Example Storage Classes for AWS

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splicedb-premium
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splice-file-storage
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Example Storage Classes for Azure

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splicedb-premium
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/azure-disk
parameters:
  cachingmode: ReadOnly
  kind: Managed
  storageaccounttype: Premium_LRS
reclaimPolicy: Retain
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splice-file-storage
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - mfsymlinks
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

Example Storage Classes for GCP

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splicedb-premium
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: none
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splice-file-storage
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: none
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Example Storage Classes for Openstack/Cinder

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splicedb-premium
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/cinder
reclaimPolicy: Retain
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: splice-file-storage
  labels:
    app: splice-storageclass
provisioner: kubernetes.io/cinder
reclaimPolicy: Retain
volumeBindingMode: Immediate