Kubernetes Metrics Server + jtop on Jetson (Resource Monitoring)

Install Metrics Server

The Metrics Server is a Kubernetes add-on that collects CPU and memory usage data from nodes and pods. This data can then be viewed with kubectl top and is also used by Kubernetes for autoscaling decisions.
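
For example, once these metrics are flowing they can feed the Horizontal Pod Autoscaler. A minimal sketch (the deployment name my-app and the thresholds are placeholders for illustration, not part of this guide's cluster):

kubectl autoscale deployment my-app --cpu-percent=70 --min=1 --max=3
kubectl get hpa

Without the Metrics Server, the HPA would report <unknown> for current CPU utilization.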

Step 1. Deploy Metrics Server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

This command applies the official Metrics Server manifest from the kubernetes-sigs/metrics-server GitHub repository. It creates the Deployment and Service in the kube-system namespace, along with the cluster-scoped RBAC objects and the v1beta1.metrics.k8s.io APIService.
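
A quick way to confirm the objects were created (the k8s-app=metrics-server label comes from the official manifest):

kubectl -n kube-system get pods -l k8s-app=metrics-server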

Step 2. Check Node Metrics

kubectl top nodes

Displays CPU and memory usage for each node in the cluster. This helps you see the overall resource consumption at the node level.
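
On recent kubectl versions you can also sort the output to find the busiest node first:

kubectl top nodes --sort-by=cpu
kubectl top nodes --sort-by=memory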

Step 3. Check Pod Metrics

kubectl top pods --all-namespaces

Shows CPU and memory usage for each pod across all namespaces. Useful for identifying which workloads are consuming the most resources.
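
Two useful variations when hunting for heavy workloads, using standard kubectl flags:

kubectl top pods --all-namespaces --sort-by=memory
kubectl top pods -n kube-system --containers

The --containers flag breaks usage down per container instead of per pod.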

Troubleshooting: "error: Metrics API not available"

After installing Metrics Server, kubectl top may show Metrics API not available. Use the steps below to verify readiness and apply common fixes.

1) Check Metrics APIService status

kubectl get apiservice v1beta1.metrics.k8s.io -o wide
kubectl describe apiservice v1beta1.metrics.k8s.io

Expected: Available=True. If it is False or Unknown, the API aggregation layer cannot reach the Metrics Server service (pod not ready, DNS issue, or TLS issue).
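
For a quick scripted check of just the Available condition (expected output: True):

kubectl get apiservice v1beta1.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}{"\n"}'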

2) Check the Deployment & Pods

kubectl -n kube-system get deploy,svc,pod -l k8s-app=metrics-server -o wide
kubectl -n kube-system describe deploy metrics-server
kubectl -n kube-system logs deploy/metrics-server --tail=200

Look for errors like x509: certificate signed by unknown authority, no such host, or connection refused. These indicate TLS or reachability issues towards kubelets.
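
To separate network problems from certificate problems, you can probe a kubelet directly from a machine that can reach the node. Replace <node-ip> with one of your node addresses; an HTTP 401/403 response is fine here (it proves the port is reachable), while a timeout or connection refused points to networking:

curl -k https://<node-ip>:10250/metrics/resource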

3) Common fix: allow insecure kubelet TLS (lab/dev clusters)

If the logs show x509 errors, add the flags below. This is acceptable for lab/dev clusters; prefer properly signed kubelet certificates in production.

kubectl -n kube-system edit deploy metrics-server

Under spec.template.spec.containers[0].args, ensure you have:

- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
- --kubelet-insecure-tls

Explanation: the first flag makes Metrics Server contact nodes by their InternalIP first (often required on SBCs/Jetson), and the second skips kubelet certificate validation so scraping succeeds.

4) (Optional) Patch via one-liners

kubectl -n kube-system patch deploy metrics-server \
  --type='json' -p='[
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP"},
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}
  ]'

This adds the two args without opening an editor.

5) Wait for readiness and re-check

kubectl -n kube-system rollout status deploy/metrics-server
kubectl get apiservice v1beta1.metrics.k8s.io -o wide
kubectl top nodes
kubectl top pods --all-namespaces

Note: Metrics can take ~30–60 seconds to populate after the pod becomes Ready.
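
To watch the metrics appear, or to confirm the aggregated API itself is answering, these two checks help (the raw call queries the Metrics API directly and should return JSON once data is available):

watch -n 5 kubectl top nodes
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"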

After adding the required flags and waiting for the rollout, the Metrics API should be marked Available=True, and kubectl top nodes should display CPU and memory usage for each node.

6) If still failing, verify cluster pre-reqs

Metrics Server also expects the kube-apiserver aggregation layer to be enabled, kubelet webhook authentication/authorization to be turned on, and kubelet port 10250 to be reachable from the metrics-server pod. If those hold, re-running the fresh install steps is a useful reference point:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
kubectl top pods --all-namespaces

If you immediately see the x509 error on a new cluster, apply the flags from fix 3) above and try again.

Step 4. Detailed Troubleshooting: x509 Certificate Error

If you see an error like x509: cannot validate certificate for <node IP>, it means the Metrics Server cannot validate the kubelet's TLS certificate. The fix is the same as in 3) above: add the --kubelet-insecure-tls flag to the Metrics Server deployment, this time via the interactive editor.

kubectl edit deployment metrics-server -n kube-system

This opens the Metrics Server deployment manifest in your default editor. Under the pod template's container spec (spec.template.spec.containers), make sure the arguments look like this (unrelated fields omitted):

spec:
  containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server/metrics-server:v0.6.3
    args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
    - --kubelet-insecure-tls   # <-- Add this line

Explanation of args:

- --cert-dir=/tmp: directory where Metrics Server keeps its own serving certificate (a self-signed one is generated here by default).
- --secure-port=4443: HTTPS port on which Metrics Server serves its API.
- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP: contact nodes by InternalIP first, which is usually what works on Jetson/SBC clusters.
- --kubelet-insecure-tls: skip validation of the kubelet's serving certificate; fine for lab/dev, not recommended for production.

Save and exit (in vi/vim: press Esc, then type :wq and hit Enter).
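
Saving the edit triggers a rollout of a new metrics-server pod; waiting for it to complete avoids re-checking too early:

kubectl -n kube-system rollout status deploy/metrics-server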

Step 5. Verify Metrics Again

kubectl top nodes

Run the command again. If configured correctly, you should now see resource metrics instead of TLS errors.


Install jtop (Jetson Monitoring Tool)

jtop is a monitoring tool specifically designed for NVIDIA Jetson devices. It provides real-time insights into CPU, GPU, memory, power, temperature, and processes.

Step 1. Install pip (if not already installed)

sudo apt-get install python3-pip

Installs pip, the Python package manager, which is required to install jtop.
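
To confirm pip is available before proceeding:

python3 -m pip --version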

Step 2. Install jtop

sudo -H pip3 install -U jetson-stats

Installs the jetson-stats package (which provides jtop). The -U flag upgrades the package to the latest version if an older one is already present. The -H flag makes sudo set HOME to root's home directory, which keeps pip from writing root-owned cache and config files into your own home directory.
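
Depending on the jetson-stats version, the installer sets up a background service and may ask you to reboot or log out and back in before jtop can connect to it. The jetson_release command, which ships with jetson-stats, is a quick way to confirm the installation can read your board information:

jetson_release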

Step 3. Launch jtop

sudo jtop

This starts the jtop terminal UI. It displays a live dashboard with key hardware stats:

- CPU and GPU utilization
- Memory usage
- Power draw and temperatures
- Running processes and their resource usage

Step 4. Monitor System Performance

jtop gives you real-time monitoring of Jetson hardware. This is especially useful for:

- Watching CPU and GPU load while your Kubernetes workloads run
- Keeping an eye on temperatures and power draw to catch thermal throttling
- Confirming that pods scheduled on the Jetson are using the resources you expect

Since jtop is tailored for Jetson devices, it offers more detailed hardware information than standard Kubernetes monitoring tools.


Conclusion

Metrics Server provides Kubernetes with cluster-level CPU and memory metrics, enabling resource-aware scheduling and autoscaling. jtop gives Jetson-specific insights into hardware usage. Using both together helps ensure your workloads are well-balanced and your Jetson device runs efficiently without overheating.