By default, coordinators have 15 minutes to terminate gracefully.
The coordinator process receives a SIGTERM signal when Kubernetes wants to terminate the Pod. If the process still hasn't exited after the graceful shutdown timeout runs out, Kubernetes issues a SIGKILL signal.
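The following Python sketch mimics this SIGTERM-then-SIGKILL sequence for a local child process. It assumes a POSIX system, where Popen.terminate() sends SIGTERM and Popen.kill() sends SIGKILL. The helper terminate_like_kubelet and the two child programs are hypothetical illustrations, not part of Kubernetes or the operator.

```python
import subprocess
import sys

# Hypothetical child that exits cleanly (exit code 0) as soon as it receives SIGTERM.
EXITS_ON_SIGTERM = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

# Hypothetical child that ignores SIGTERM, so only SIGKILL can stop it.
IGNORES_SIGTERM = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def terminate_like_kubelet(child_code: str, grace_seconds: float) -> str:
    """Send SIGTERM, then SIGKILL if the child is still running after grace_seconds.

    Returns "exited" if the child stopped within the grace period, else "killed".
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # wait until the child has installed its SIGTERM handler
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        outcome = "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; it cannot be caught or ignored
        proc.wait()
        outcome = "killed"
    proc.stdout.close()
    return outcome
```

With the first child, terminate_like_kubelet(EXITS_ON_SIGTERM, 5) returns "exited"; with the second, terminate_like_kubelet(IGNORES_SIGTERM, 1) returns "killed" after roughly one second.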
When a coordinator is restarted, all currently running queries fail and cannot be recovered after the restart is finished. As of Trino version 428, this cannot be prevented (for example, by running multiple coordinators).
By default, workers have 60 minutes to terminate gracefully.
Trino supports gracefully shutting down workers. The operator always adds a PreStop hook to shut them down gracefully. No additional configuration is needed; this guide is intended for users who need to tweak this mechanism.
The default graceful shutdown period is 1 hour, but it can be configured for each role group.
Once a worker Pod is asked to terminate, the PreStop hook is executed and the following timeline occurs:

1. The worker goes into the SHUTTING_DOWN state.
2. The worker sleeps for 30 seconds to ensure that the coordinator has noticed the shutdown and stops scheduling new tasks on the worker.
3. The worker waits until all tasks running on it are complete. This can take as long as the longest-running query.
4. The worker sleeps for 30 seconds to ensure that the coordinator has noticed that all tasks are complete.
5. The PreStop hook never returns, but the JVM is shut down by the graceful shutdown mechanism.
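To make the timing concrete, the following Python sketch computes when each step of the timeline starts, measured from the moment the PreStop hook runs. It is a simplified model, not operator code: the function name is hypothetical, the transition in step 1 is treated as instantaneous, and the graceful shutdown period is assumed to be long enough for all tasks to finish.

```python
from typing import List, Tuple

# Fixed sleeps from steps 2 and 4 of the timeline.
COORDINATOR_NOTICE_SECONDS = 30
COMPLETION_NOTICE_SECONDS = 30


def worker_shutdown_timeline(longest_task_seconds: int) -> List[Tuple[int, str]]:
    """Return (seconds since the PreStop hook started, event) pairs for one worker.

    Hypothetical model: step 1 is instantaneous and every task finishes
    within the graceful shutdown period.
    """
    drain_start = COORDINATOR_NOTICE_SECONDS  # end of step 2
    tasks_done = drain_start + longest_task_seconds  # end of step 3
    jvm_exit = tasks_done + COMPLETION_NOTICE_SECONDS  # end of step 4
    return [
        (0, "worker starts shutting down"),
        (drain_start, "coordinator no longer schedules tasks; waiting for running tasks"),
        (tasks_done, "all tasks on the worker are complete"),
        (jvm_exit, "JVM shuts down"),
    ]
```

For a worker whose longest task needs another 120 seconds, the events fall at 0, 30, 150 and 180 seconds.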
If the graceful shutdown doesn't complete quickly enough (for example, because a query runs longer than the graceful shutdown period), the Pod is killed after <graceful shutdown period> + 30s of step 2 + 30s of step 4 + 10s safety overhead, regardless of whether it has shut down gracefully. This is achieved by setting terminationGracePeriodSeconds on the worker Pods. Queries still running on the worker fail and cannot be recovered.
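The kill deadline is plain addition. The sketch below, with a hypothetical function name, computes the value that ends up in terminationGracePeriodSeconds according to the formula above, given the graceful shutdown period in seconds.

```python
# Overheads from the formula: the two 30 s sleeps (steps 2 and 4) plus a 10 s safety margin.
STEP_2_SLEEP_SECONDS = 30
STEP_4_SLEEP_SECONDS = 30
SAFETY_OVERHEAD_SECONDS = 10


def termination_grace_period_seconds(graceful_shutdown_seconds: int) -> int:
    """Seconds after which Kubernetes kills the worker Pod, graceful or not."""
    return (
        graceful_shutdown_seconds
        + STEP_2_SLEEP_SECONDS
        + STEP_4_SLEEP_SECONDS
        + SAFETY_OVERHEAD_SECONDS
    )
```

For the default period of 1 hour (3600 seconds), this yields 3670 seconds.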
As of SDP version
The TLS certificate lifetime can be configured using
All queries that take less than the shortest graceful shutdown period of all roleGroups (1 hour by default) are guaranteed not to be disturbed by regular termination of Pods.
They can still fail, for example, when a Kubernetes node dies or is rebooted before it is fully drained.
Because of this, the operator automatically restricts the execution time of queries to the shortest graceful shutdown period of all roleGroups, using the Trino configuration property query.max-execution-time.
With the default settings, this causes all queries that take longer than 1 hour to fail with the error message:
Query failed: Query exceeded the maximum execution time limit of 3600.00s.
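The limit is the minimum over all role groups. The following sketch computes it from a mapping of role group name to graceful shutdown period in seconds and checks whether a query of a given duration falls under the guarantee. The function names and the role group names in the example are hypothetical.

```python
from typing import Dict


def max_execution_time_seconds(graceful_shutdown_by_role_group: Dict[str, int]) -> int:
    """Shortest graceful shutdown period across all role groups, in seconds."""
    if not graceful_shutdown_by_role_group:
        raise ValueError("at least one role group is required")
    return min(graceful_shutdown_by_role_group.values())


def is_protected(query_seconds: int, graceful_shutdown_by_role_group: Dict[str, int]) -> bool:
    """True if a query of this duration is not disturbed by regular Pod termination."""
    return query_seconds < max_execution_time_seconds(graceful_shutdown_by_role_group)
```

For example, with role groups {"default": 3600, "large": 7200}, the limit is 3600 seconds, so a 30-minute query is protected while a 90-minute query is not.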
If you need to execute queries that take longer than the configured graceful shutdown period, you need to increase the query.max-execution-time property accordingly.
Keep in mind that queries taking longer than the graceful shutdown period are then subject to failure when a Trino worker is shut down.
This can be avoided by using fault-tolerant execution, which the operator does not support natively yet.
Until native support is added, you have to enable it using configOverrides.
When you are not using OPA for authorization, the user admin is not allowed to gracefully shut down workers.
If you need graceful shutdown, you need to either use OPA or make sure that admin is allowed to gracefully shut down workers (for example, by using your own authorizer or by patching Trino).
If you use OPA to authorize Trino requests, you need to make sure that the user admin is authorized to trigger a graceful shutdown of the workers.
You can achieve this, for example, by adding a rule with the following condition, which grants admin permission to do anything, including graceful shutdown:
input.context.identity.user == "admin"
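To make the scope of this condition explicit, the Python function below evaluates the same check against a dictionary shaped like the input path used in the rule. It is a hypothetical illustration, not the OPA or Trino API, and it assumes the policy denies by default when the rule does not match. The real input document sent by Trino contains more fields, which this sketch ignores.

```python
from typing import Any, Dict


def admin_rule_allows(opa_input: Dict[str, Any]) -> bool:
    """Python equivalent of the condition input.context.identity.user == "admin".

    The condition never looks at the requested action, so it allows every
    operation, graceful shutdown included, as long as the caller is admin.
    A missing path yields False, like an undefined rule body in Rego.
    """
    identity = opa_input.get("context", {}).get("identity", {})
    return identity.get("user") == "admin"
```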
If the user admin does not have permission to gracefully shut down a worker, the following error message is shown in the worker log and the worker shuts down immediately:
curl: (22) The requested URL returned error: 403 Forbidden
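The curl error comes from the HTTP call that asks the worker to shut down. Trino documents graceful worker shutdown as a PUT request to /v1/info/state with the JSON string "SHUTTING_DOWN" as body; the Python sketch below builds such a request with the standard library. The worker URL, the port, and the use of the X-Trino-User header to identify the user admin are assumptions, and the operator's actual PreStop hook may differ (for example, in TLS or authentication).

```python
import json
import urllib.error
import urllib.request


def build_shutdown_request(worker_url: str, user: str) -> urllib.request.Request:
    """Build a graceful shutdown request for one worker.

    worker_url and the X-Trino-User header are assumptions for this sketch.
    """
    return urllib.request.Request(
        url=worker_url.rstrip("/") + "/v1/info/state",
        data=json.dumps("SHUTTING_DOWN").encode("utf-8"),  # body: the JSON string "SHUTTING_DOWN"
        headers={"Content-Type": "application/json", "X-Trino-User": user},
        method="PUT",
    )


def request_shutdown(worker_url: str, user: str = "admin") -> bool:
    """Send the request; return False if the user is not authorized (HTTP 403)."""
    try:
        with urllib.request.urlopen(build_shutdown_request(worker_url, user)) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 403:  # the case curl reports as "(22) ... 403 Forbidden"
            return False
        raise
```

Note that urllib stores header names with only the first letter capitalized, so the headers are retrieved as "Content-type" and "X-trino-user".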
We plan to add custom resources so that you can define your Trino ACLs via Kubernetes objects. The trino-operator will then generate the Rego rules for you, including the rules needed for graceful shutdown. Until then, you need to grant the permission yourself.