CRD reference

The following CRD fields can be defined by the user:

CRD field Remarks

apiVersion

spark.stackable.tech/v1alpha1

kind

SparkApplication

metadata.name

Application name

spec.version

Application version

spec.mode

Either cluster or client. Currently, only cluster is supported.

spec.image

User-supplied image containing spark-job dependencies that will be copied to the specified volume mount

spec.sparkImage

Spark image deployed to the driver and executor pods. It must contain the Spark environment needed by the job, e.g. docker.stackable.tech/stackable/spark-k8s:3.3.0-stackable0.3.0

spec.sparkImagePullPolicy

Optional enum (one of Always, IfNotPresent or Never) that determines the pull policy of the Spark job image

spec.sparkImagePullSecrets

An optional list of references to secrets in the same namespace to use for pulling any of the images used by a SparkApplication resource. Each reference has a single property (name) that must contain a reference to a valid secret

spec.mainApplicationFile

The actual application file that will be called by spark-submit

spec.mainClass

The main class, i.e. the entry point for JVM artifacts

spec.args

Arguments passed directly to the job artifact

spec.s3connection

S3 connection specification. See the S3 resources for more details.

spec.sparkConf

A map of key/value strings that will be passed directly to spark-submit

spec.deps.requirements

A list of Python packages that will be installed via pip

spec.deps.packages

A list of packages that is passed directly to spark-submit

spec.deps.excludePackages

A list of excluded packages that is passed directly to spark-submit

spec.deps.repositories

A list of repositories that is passed directly to spark-submit

spec.volumes

A list of volumes

spec.volumes.name

The volume name

spec.volumes.persistentVolumeClaim.claimName

The persistent volume claim backing the volume

spec.job.resources

Resources specification for the initiating Job

spec.driver.resources

Resources specification for the driver Pod

spec.driver.volumeMounts

A list of mounted volumes for the driver

spec.driver.volumeMounts.name

Name of mount

spec.driver.volumeMounts.mountPath

Volume mount path

spec.driver.affinity

Driver Pod placement affinity. See Pod Placement for details

spec.driver.logging

Logging aggregation for the driver Pod. See Logging for details

spec.executor.resources

Resources specification for the executor Pods

spec.executor.instances

Number of executor instances launched for this job

spec.executor.volumeMounts

A list of mounted volumes for each executor

spec.executor.volumeMounts.name

Name of mount

spec.executor.volumeMounts.mountPath

Volume mount path

spec.executor.affinity

Executor Pod placement affinity. See Pod Placement for details

spec.executor.logging

Logging aggregation for the executor Pods. See Logging for details

spec.logFileDirectory.bucket

S3 bucket definition where applications should publish events for the Spark History server.

spec.logFileDirectory.prefix

Prefix to use when storing events for the Spark History server.
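
To show how the fields above fit together, here is a minimal sketch of a SparkApplication manifest. Only the field names, the apiVersion, the kind and the sparkImage value are taken from the table; the application name, version, file path, arguments, Spark setting, pip requirement, volume name and claim name are hypothetical placeholders. Fields whose structure is described elsewhere (s3connection, resources, affinity, logging, logFileDirectory) are left out on purpose.

```yaml
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-pyspark-job                 # hypothetical application name
spec:
  version: "1.0"                            # hypothetical application version
  mode: cluster                             # only cluster is currently supported
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.3.0-stackable0.3.0
  sparkImagePullPolicy: IfNotPresent        # one of Always, IfNotPresent, Never
  mainApplicationFile: local:///data/jobs/example.py   # hypothetical path
  args:                                     # passed directly to the job artifact
    - "--input"
    - "/data/input"
  sparkConf:                                # key/value strings passed to spark-submit
    spark.sql.shuffle.partitions: "8"
  deps:
    requirements:                           # installed via pip
      - tabulate==0.8.9                     # hypothetical requirement
  volumes:
    - name: job-data                        # hypothetical volume name
      persistentVolumeClaim:
        claimName: job-data-pvc             # hypothetical claim name
  driver:
    volumeMounts:
      - name: job-data
        mountPath: /data
  executor:
    instances: 3
    volumeMounts:
      - name: job-data
        mountPath: /data
```

In this sketch, each volumeMounts.name refers to an entry in spec.volumes, so the driver and every executor see the same persistent volume claim under /data.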