Skip to content

Commit 968e5bb

Browse files
committed
NO-JIRA: add Loki readiness check and restart logic to observability tests
Loki's ingester can enter a shutdown state over time, causing it to reject all writes with HTTP 503 while still responding to queries. This adds a readiness check before running observability tests and automatically restarts the Loki container if it's unhealthy. Also removes unnecessary retry wrappers around Loki queries since the suite setup now ensures Loki is ready before tests begin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> pre-commit.check-secrets: ENABLED
1 parent e2d7416 commit 968e5bb

3 files changed

Lines changed: 34 additions & 2 deletions

File tree

test/bin/manage_loki.sh

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ DEFAULT_HOST_PORT="3100"
1414

1515
usage() {
1616
cat - <<EOF
17-
${BASH_SOURCE[0]} (start|stop) [port]
17+
${BASH_SOURCE[0]} (start|stop|restart) [port]
1818
1919
-h Show this help.
2020
@@ -25,6 +25,8 @@ start [port]: Start Loki.
2525
stop: Stop Loki.
2626
The container name is assumed to be loki.
2727
28+
restart: Restart the Loki container.
29+
2830
EOF
2931
}
3032

@@ -52,6 +54,12 @@ action_start() {
5254
"${LOKI_IMAGE}" > /dev/null
5355
}
5456

57+
action_restart() {
58+
local container_name="loki"
59+
echo "Restarting Loki container ${container_name}"
60+
podman restart "${container_name}"
61+
}
62+
5563
if [ $# -eq 0 ]; then
5664
usage
5765
exit 1
@@ -60,7 +68,7 @@ action="${1}"
6068
shift
6169

6270
case "${action}" in
63-
start|stop)
71+
start|stop|restart)
6472
"action_${action}" "$@"
6573
;;
6674
-h)

test/resources/loki.py

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,6 +68,15 @@ def _print_results(results: dict) -> None:
6868
_log(f"{log_line}")
6969

7070

71+
def check_loki_ready(host: str, port: int) -> None:
72+
url = f"http://{host}:{port}/ready"
73+
_log(f"Checking Loki readiness at {url}")
74+
response = requests.get(url, timeout=5)
75+
_log(f"Loki readiness response: {response.status_code} {response.text.strip()}")
76+
if response.status_code != 200:
77+
raise Exception(f"Loki is not ready: {response.status_code} {response.text.strip()}")
78+
79+
7180
def check_loki_query(host: str, port: int, query: str, limit: int = 10) -> None:
7281
try:
7382
from robot.libraries.BuiltIn import BuiltIn

test/suites/optional/observability.robot

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,7 @@ Setup Suite And Prepare Test Host
7272
Command Should Work sudo firewall-cmd --reload
7373
# Configure observability settings
7474
Check Required Observability Variables
75+
Ensure Loki Is Ready
7576
Set Test OTEL Configuration
7677
# We need to do something to the cluster to generate new kube events
7778
Create Hello MicroShift Pod
@@ -88,6 +89,20 @@ Check Required Observability Variables
8889
${string_value} Convert To String ${LOKI_HOST}
8990
Should Not Be Empty ${string_value} LOKI_HOST variable is required
9091

92+
Ensure Loki Is Ready
93+
[Documentation] Check if Loki's ingester is healthy, restart the container if not.
94+
... Loki's ingester can enter a shutdown state over time, causing it to
95+
... reject all writes with HTTP 503 while still responding to queries.
96+
FOR ${attempt} IN RANGE 5
97+
${status}= Run Keyword And Return Status
98+
... Check Loki Ready ${LOKI_HOST} ${LOKI_PORT}
99+
IF ${status} RETURN
100+
Log Loki is not ready (attempt ${attempt}), restarting container console=True
101+
Local Command Should Work ./bin/manage_loki.sh restart
102+
Sleep 2s
103+
END
104+
Fail Loki did not become ready after 5 restart attempts
105+
91106
Set Test OTEL Configuration
92107
[Documentation] Set Test OTEL Configuration
93108

0 commit comments

Comments
 (0)