Graceful shutdown

You can configure the graceful shutdown as described in Graceful shutdown.

Graceful shutdown only works if you enabled authorization using OPA. See Authorization requirements for details.

Coordinators

As a default, coordinators have 15 minutes to terminate gracefully.

The coordinator process receives a SIGTERM signal when Kubernetes wants to terminate the Pod. After the graceful shutdown timeout runs out, and the process still did not exit, Kubernetes issues a SIGKILL signal.

When a coordinator gets restarted, all currently running queries fail and cannot be recovered after the restart process is finished. As of Trino version 451 this can not be prevented (e.g. by using multiple coordinators).

Workers

As a default, workers have 60 minutes to terminate gracefully.

Trino supports gracefully shutting down workers. This operator always adds a PreStop hook to gracefully shut them down. No additional configuration is needed, this guide is intended for users that need to tweak this mechanism.

The default graceful shutdown period is 1 hour, but it can be configured as follows:

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: trino
spec:
  # ...
  workers:
    config:
      gracefulShutdownTimeout: 1h
    roleGroups:
      default:
        replicas: 1

Implementation

Once a worker Pod is asked to terminate, the PreStop hook is executed and the following timeline occurs:

The worker goes into SHUTTING_DOWN state.
The worker sleeps for 30 seconds to ensure that the coordinator has noticed the shutdown and stops scheduling new tasks on the worker.
The worker now waits till all tasks running on it complete. This takes as long as the longest running query takes.
The worker sleeps for 30 seconds to ensure that the coordinator has noticed that all tasks are complete
The PreStop hook will never return, but the JVM will be shut down by the graceful shutdown mechanism.
If the graceful shutdown doesn’t complete quick enough (e.g. a query runs longer than the graceful shutdown period), after <graceful shutdown period> + 30s of step 2 + 30s of step 4 + 10s safety overhead the Pod gets killed, regardless if it has shut down gracefully or not. This is achieved by setting terminationGracePeriodSeconds on the worker Pods. Currently running queries on the worker will fail and cannot be recovered.

Shutdown triggers

A shutdown operation can be triggered by multiple factors but a particularly important one is the renewal of the TLS certificates. This is important because it’s a recurring event at regular intervals in the lifecycle of a Trino cluster. It is also an automatic process initiated by the commons operator when the TLS certificates issued by the secret operator are about to expire. Historically the lifetimes of these certificates have changed but they are always kept short (around one day) by default.

It is recommended to set an explicit lifetime that is appropriate for your situation as described in Temporary credentials lifetime.

To set the certificate lifetime to a fortnight, thus having the Trino pods restarted every two weeks, use the example below:

spec:
  workers:
    config:
      requestedSecretLifetime: 14d

Implications

All queries that take less than the minimal graceful shutdown period of all roleGroups (1 hour as a default) are guaranteed to not be disturbed by regular termination of Pods. They can obviously still fail when, for example, a Kubernetes node dies or gets rebooted before it is fully drained.

Because of this, the operator automatically restricts the execution time of queries to the minimal graceful shutdown period of all roleGroups using the Trino configuration query.max-execution-time=3600s when fault tolerant execution is not configured. This causes all queries that take longer than 1 hour to fail with the error message Query failed: Query exceeded the maximum execution time limit of 3600s.00s.

However, when fault tolerant execution is enabled, the query.max-execution-time restriction is not applied since queries can be automatically retried in case of failures, allowing them to run indefinitely without being cancelled by worker restarts.

In case you need to execute queries that take longer than the configured graceful shutdown period and do not want to configure fault tolerant execution, you can increase the query.max-execution-time property as follows:

spec:
  coordinators:
    configOverrides:
      config.properties:
        query.max-execution-time: 24h

Keep in mind, that queries taking longer than the graceful shutdown period are now subject to failure when a Trino worker gets shut down.

Authorization requirements

When you are not using OPA for authorization, the user graceful-shutdown-user is not allowed to gracefully shut down workers. If you need graceful shutdown you need to use OPA or need to make sure graceful-shutdown-user is allowed to gracefully shut down workers (e.g. having you own authorizer or patching Trino).

In case you use OPA to authorize Trino requests, you need to make sure the user graceful-shutdown-user is authorized to trigger a graceful shutdown of the workers.

If you use rules provided by Stackable, this permission is automatically granted. If you use your own custom rego rules, you can achieve this by adding the following rule to grant graceful-shutdown-user the permissions to issue a graceful shutdown.

# Allow graceful shutdown, see https://docs.stackable.tech/home/stable/trino/usage-guide/operations/graceful-shutdown/#_authorization_requirements
allow if {
  input.action.operation == "WriteSystemInformation"
  input.context.identity.user == "graceful-shutdown-user"
}

In case the user graceful-shutdown-user does not have the permission to gracefully shut down a worker, the error message curl: (22) The requested URL returned error: 403 Forbidden will be shown in the worker log and the worker will shut down immediately.