Graceful shutdown
You can configure the graceful shutdown as described in Graceful shutdown.
Graceful shutdown only works if you have enabled authorization using OPA. See Authorization requirements for details.
Coordinators
By default, coordinators have 15 minutes to terminate gracefully.
The coordinator process receives a SIGTERM signal when Kubernetes wants to terminate the Pod.
If the process has still not exited once the graceful shutdown timeout runs out, Kubernetes issues a SIGKILL signal.
When a coordinator is restarted, all currently running queries fail and cannot be recovered after the restart has finished.
As of Trino version 451, this cannot be prevented (e.g. by using multiple coordinators).
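The coordinator timeout can presumably be adjusted in the same way as the worker timeout. The following is a minimal sketch, assuming the coordinators role accepts the same gracefulShutdownTimeout field under config that the workers role does; the value of 30m is purely illustrative.

```yaml
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: trino
spec:
  # ...
  coordinators:
    config:
      # Assumption: same field name as for workers; 30m is an example value
      gracefulShutdownTimeout: 30m
```

A longer timeout does not protect running queries from a coordinator restart. It only changes how long Kubernetes waits before sending SIGKILL.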
Workers
By default, workers have 60 minutes to terminate gracefully.
Trino supports gracefully shutting down workers.
This operator always adds a PreStop hook to gracefully shut them down.
No additional configuration is needed; this guide is intended for users who need to tweak this mechanism.
The default graceful shutdown period is 1 hour, but it can be configured as follows:
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: trino
spec:
  # ...
  workers:
    config:
      gracefulShutdownTimeout: 1h
    roleGroups:
      default:
        replicas: 1
Implementation
Once a worker Pod is asked to terminate, the PreStop hook is executed and the following timeline occurs:
1. The worker goes into the SHUTTING_DOWN state.
2. The worker sleeps for 30 seconds to ensure that the coordinator has noticed the shutdown and stops scheduling new tasks on the worker.
3. The worker waits until all tasks running on it complete. This takes as long as the longest-running query takes.
4. The worker sleeps for 30 seconds to ensure that the coordinator has noticed that all tasks are complete.
5. The PreStop hook never returns, but the JVM is shut down by the graceful shutdown mechanism.
6. If the graceful shutdown does not complete quickly enough (e.g. a query runs longer than the graceful shutdown period), the Pod gets killed after <graceful shutdown period> + 30s of step 2 + 30s of step 4 + 10s safety overhead, regardless of whether it has shut down gracefully or not. This is achieved by setting terminationGracePeriodSeconds on the worker Pods. Queries still running on the worker fail and cannot be recovered.
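As a worked example of the formula in the last step, the default graceful shutdown period of 1 hour gives 3600s + 30s + 30s + 10s = 3670s. The excerpt below sketches what the resulting worker Pod spec would be expected to contain under that assumption; it is an illustration of the arithmetic, not output copied from a cluster.

```yaml
# Illustrative excerpt of a worker Pod spec, assuming gracefulShutdownTimeout: 1h
#   3600s (graceful shutdown period)
# +   30s (sleep in step 2)
# +   30s (sleep in step 4)
# +   10s (safety overhead)
# = 3670s
spec:
  terminationGracePeriodSeconds: 3670
```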
Shutdown triggers
A shutdown operation can be triggered by multiple factors but a particularly important one is the renewal of the TLS certificates. This is important because it’s a recurring event at regular intervals in the lifecycle of a Trino cluster. It is also an automatic process initiated by the commons operator when the TLS certificates issued by the secret operator are about to expire. Historically the lifetimes of these certificates have changed but they are always kept short (around one day) by default.
It is recommended to set an explicit lifetime that is appropriate for your situation as described in Temporary credentials lifetime.
To set the certificate lifetime to a fortnight, thus having the Trino pods restarted every two weeks, use the example below:
spec:
  workers:
    config:
      requestedSecretLifetime: 14d
Implications
All queries that take less than the shortest graceful shutdown period of all roleGroups (1 hour by default) are guaranteed not to be disturbed by regular termination of Pods.
They can still fail when, for example, a Kubernetes node dies or gets rebooted before it is fully drained.
Because of this, the operator automatically restricts the execution time of queries to the shortest graceful shutdown period of all roleGroups using the Trino configuration query.max-execution-time=3600s.
This causes all queries that take longer than 1 hour to fail with the error message Query failed: Query exceeded the maximum execution time limit of 3600.00s.
If you need to execute queries that take longer than the configured graceful shutdown period, increase the query.max-execution-time property as follows:
spec:
  coordinators:
    configOverrides:
      config.properties:
        query.max-execution-time: 24h
Keep in mind that queries taking longer than the graceful shutdown period are now subject to failure when a Trino worker gets shut down.
This issue can be avoided by using fault-tolerant execution, which is not supported natively yet.
Until native support is added, you have to use configOverrides to enable it.
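A minimal sketch of such an override is shown below. It assumes that Trino's retry-policy property in config.properties is sufficient for your workload and that it has to be set on coordinators and workers alike. The TASK policy, and QUERY with larger result sets, additionally need an exchange manager, which is configured in a separate properties file and is not covered by this sketch.

```yaml
spec:
  coordinators:
    configOverrides:
      config.properties:
        # Assumption: QUERY retries whole queries; TASK is the other Trino option
        retry-policy: QUERY
  workers:
    configOverrides:
      config.properties:
        # Keep the value identical to the coordinator setting
        retry-policy: QUERY
```

Verify the exact requirements against the Trino documentation for your version before relying on this in production.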
Authorization requirements
When you are not using OPA for authorization, the user graceful-shutdown-user is not allowed to gracefully shut down workers.
If you need graceful shutdown, you need to either use OPA or make sure graceful-shutdown-user is allowed to gracefully shut down workers (e.g. by having your own authorizer or patching Trino).
If you use OPA to authorize Trino requests, you need to make sure the user graceful-shutdown-user is authorized to trigger a graceful shutdown of the workers.
If you use rules provided by Stackable, this permission is automatically granted.
If you use your own custom Rego rules, you can achieve this by adding the following rule, which grants graceful-shutdown-user the permission to issue a graceful shutdown.
allow {
    input.action.operation == "WriteSystemInformation"
    input.context.identity.user == "graceful-shutdown-user"
}
If the user graceful-shutdown-user does not have the permission to gracefully shut down a worker, the error message curl: (22) The requested URL returned error: 403 Forbidden is shown in the worker log and the worker shuts down immediately.