Logging with a Vector log aggregator

This tutorial teaches you how to deploy a Vector aggregator together with a product (in this case ZooKeeper) and how to configure both so that logs are sent from the product to the aggregator. Logging on the Stackable Data Platform is always configured in the same way, so you can apply this knowledge to any product that you want to deploy.

Prerequisites:

  • a Kubernetes cluster available, or kind installed to create one

  • stackablectl installed

  • Helm installed to deploy Vector

  • basic knowledge of how to create resources in Kubernetes (e.g. kubectl apply -f <filename>.yaml) and inspect them (kubectl get or a tool like k9s)

Install the ZooKeeper operator

Install the Stackable Operator for Apache ZooKeeper and its dependencies, so you can deploy a ZooKeeper instance later.

stackablectl release install -i secret -i commons -i listener -i zookeeper 23.11

Install the Vector aggregator

Install the Vector aggregator using Helm. First, create a vector-aggregator-values.yaml file with the Helm values:

role: Aggregator
customConfig:
  sources:
    vector:  (1)
      address: 0.0.0.0:6000
      type: vector
      version: "2"
  sinks:
    console:  (2)
      type: console
      inputs:
        - vector
      encoding:
        codec: json
      target: stderr
1 Define a source of type vector which listens for incoming log messages on port 6000.
2 Define a console sink which writes all received logs to stderr.
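
The vector/vector chart reference in the install command below only resolves if Helm knows a repository named vector. A minimal sketch for registering it, assuming the charts are served from the official Vector repository at https://helm.vector.dev (the URL is not stated in this tutorial, and the helper function name is illustrative):

```shell
# Register the Vector Helm chart repository under the alias "vector"
# so that the chart reference "vector/vector" can be resolved.
# Assumption: the charts are served from https://helm.vector.dev.
add_vector_helm_repo() {
  helm repo add vector https://helm.vector.dev &&
    helm repo update
}
```

Call add_vector_helm_repo once before running helm install.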

Deploy Vector with these values using Helm:

helm install \
  --wait \
  --values vector-aggregator-values.yaml \
  vector-aggregator vector/vector

This is a minimal working configuration. The source should be defined in this way, but you can configure different sinks, depending on your needs. You can find an overview of all sinks in the Vector documentation. The Elasticsearch sink in particular might be useful; it also works when configured with OpenSearch.

To make the Vector aggregator discoverable to ZooKeeper, deploy a discovery ConfigMap called vector-aggregator-discovery. Create a file called vector-aggregator-discovery.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-aggregator-discovery
data:
  ADDRESS: vector-aggregator:6000

and apply it:

kubectl apply -f vector-aggregator-discovery.yaml
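
To check that the ConfigMap holds the expected address, you can read the ADDRESS key back. The sketch below wraps the lookup in a helper function whose name is illustrative; it assumes the ConfigMap was created in the current namespace.

```shell
# Print the aggregator address stored in the discovery ConfigMap.
# Assumption: the ConfigMap lives in the current kubectl namespace.
show_aggregator_address() {
  kubectl get configmap vector-aggregator-discovery \
    --output jsonpath='{.data.ADDRESS}'
}
```

Calling show_aggregator_address should print vector-aggregator:6000, the value set in the ConfigMap above.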

Install ZooKeeper

Now that the aggregator is running, you can install a ZooKeeper cluster which is configured to send logs to the aggregator.

Create a file called zookeeper.yaml with the following ZookeeperCluster definition:

---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.8.0
    stackableVersion: "0.0.0-dev"
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery  (1)
  servers:
    roleGroups:
      default:
        replicas: 3
        config:
          logging:  (2)
            enableVectorAgent: true
            containers:
              vector:
                file:
                  level: WARN
              zookeeper:
                console:
                  level: INFO
                file:
                  level: INFO
                loggers:
                  ROOT:
                    level: INFO
                  org.apache.zookeeper.server.NettyServerCnxn:
                    level: NONE
1 This is the reference to the discovery ConfigMap created in the previous step.
2 This is the logging configuration: the Vector agent is enabled first, then log levels are set per container and per logger.

and apply it:

kubectl apply -f zookeeper.yaml

You can learn more about how to configure logging in a product in the logging concept documentation.
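
Before looking at the logs, it can help to wait until the ZooKeeper Pods are running. A minimal sketch, assuming the operator creates a StatefulSet named simple-zk-server-default (inferred from the Pod name simple-zk-server-default-0 that appears in the log output of this tutorial; the helper name and the default timeout are illustrative):

```shell
# Block until the ZooKeeper StatefulSet has finished rolling out,
# or fail after the given timeout (default: 5 minutes).
# Assumption: the StatefulSet is named simple-zk-server-default.
wait_for_zookeeper() {
  kubectl rollout status statefulset/simple-zk-server-default \
    --timeout="${1:-5m}"
}
```

Call wait_for_zookeeper, or pass a different timeout such as wait_for_zookeeper 30s. If the StatefulSet has not been created yet, the command fails immediately; in that case run it again after a few seconds.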

Watch the logs

During startup, ZooKeeper already prints out log messages. Vector was configured to print the aggregated logs to stderr, so if you look at the logs of the Vector pod, you will see the ZooKeeper logs:

kubectl logs vector-aggregator-0 | grep "zookeeper.version=" | jq

You should see one JSON object per ZooKeeper replica, each looking like this:

{
  "cluster": "simple-zk",
  "container": "zookeeper",
  "file": "zookeeper.log4j.xml",
  "level": "INFO",
  "logger": "org.apache.zookeeper.server.ZooKeeperServer",
  "message": "Server environment:zookeeper.version=3.8.0-5a02a05eddb59aee6ac762f7ea82e92a68eb9c0f, built on 2022-02-25 08:49 UTC",
  "namespace": "default",
  "pod": "simple-zk-server-default-0",
  "role": "server",
  "roleGroup": "default",
  "source_type": "vector",
  "timestamp": "2023-11-06T10:30:40.223Z"
}

The JSON object contains a timestamp, the log message, log level and some additional information.
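
Since each replica should log its version at startup, counting the matching lines is a quick way to confirm that all replicas are shipping logs to the aggregator. A minimal sketch, with an illustrative helper name, that reads log lines from standard input:

```shell
# Count the log lines on stdin that contain the ZooKeeper version banner.
# With the three replicas configured above, a count of 3 is expected
# (more if a Pod restarted and logged the banner again).
count_version_lines() {
  grep -cF "zookeeper.version="
}
```

For example, pipe the aggregator logs into it with kubectl logs vector-aggregator-0 | count_version_lines.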

You can see the same log line in the log output of the ZooKeeper container:

kubectl logs \
  --container=zookeeper simple-zk-server-default-0 \
  | grep "zookeeper.version="

which prints:

2023-11-06 10:30:40,223 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:zookeeper.version=3.8.0-5a02a05eddb59aee6ac762f7ea82e92a68eb9c0f, built on 2022-02-25 08:49 UTC

Congratulations, this concludes the tutorial!

What’s next?

Look into different sink configurations that are better suited to production use in the sinks overview documentation, or learn more about how logging works on the platform in the concepts documentation.