docs+fix: bug_chat_long_conv dogfood artifacts + dashboard truncation fix#73

Merged
SoundMindsAI merged 3 commits into main from docs/dogfood-bug-chat-long-conv
May 13, 2026

@SoundMindsAI SoundMindsAI commented May 13, 2026

Owner

Summary

Two work products landed together:

  1. Dogfood artifacts from /idea-preflight + /bug-fix (Investigation mode) on bug_chat_long_conversation_truncation/.
  2. Fix for the pre-existing dashboard truncation bug that Gemini surfaced via F3/F4 — folded into this PR per the calibration discussion (~30-LOC bounded fix doesn't deserve idea-file deferral).

Files

| File | Source |
| --- | --- |
| `idea.md` (M) | /idea-preflight Audit & Patch |
| `bug_fix.md` (A) | /bug-fix Investigation mode |
| `scripts/build_mvp1_dashboard.py` (M) | Dashboard truncation fix |
| `backend/tests/unit/scripts/test_dashboard_truncation.py` (A) + `__init__.py` (A) | 13 unit tests |
| `MVP1_DASHBOARD.md` + `mvp1_dashboard.html` (M) | Regenerated with new truncator |
| `chore_mvp1_dashboard_truncation/idea.md` (D) | Removed; fix is no longer deferred |

Dashboard fix detail

Root cause: _extract_idea_problem at scripts/build_mvp1_dashboard.py:127-139 was capping prose at 240 chars via raw para[:237] + "..." with no awareness of markdown link / inline-code / word boundaries. Most cuts landed in plain prose; after /idea-preflight on bug_chat_long_conversation_truncation the cut shifted to land mid-[label](url) and broke the markdown.
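To make the failure mode concrete, here is a minimal sketch of a fixed-offset cut landing inside a markdown link. The sample paragraph, the `raw_truncate` name, and the small limit are illustrative assumptions; only the `para[:237] + "..."` shape comes from the root-cause description above.

```python
# Hypothetical paragraph; the real idea.md text is not reproduced here.
para = (
    "The service caps the history at the most recent 100 messages "
    "([agent_chat.py](../../backend/app/services/agent_chat.py)) and logs a warning."
)


def raw_truncate(text: str, max_len: int = 240) -> str:
    """Cut at a fixed character offset with no awareness of markdown syntax."""
    if len(text) <= max_len:
        return text
    # Same shape as para[:237] + "..." when max_len is 240.
    return text[: max_len - 3] + "..."


# A small limit is used only so the mid-link cut is easy to see.
cut = raw_truncate(para, max_len=70)
print(cut)  # the cut lands inside the link label, leaving "[" without "]"
```

Whether a given row breaks depends only on where the offset happens to fall, which is consistent with the bug staying hidden until the paragraph text changed.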

Fix: two new helpers, _safe_truncate_markdown(text, max_len) and _strip_unclosed_markdown(text), replace the raw character cut with a four-step pipeline: sentence-boundary preference (`.`, `!`, or `?` within the last 50 chars) → word-boundary fallback (last space) → strip unclosed `[` / `]` / `(` / `)` / backtick markdown → append the single-char ellipsis `…`.

Before / after on the visible row:

-| bug_chat_long_conversation_truncation | Bug | ...messages ([age | — | — |
+| bug_chat_long_conversation_truncation | Bug | ...messages… | — | — |

Why expanded scope instead of separate PR

Conversation calibration: the original capture was a chore_mvp1_dashboard_truncation idea file. The user pointed out we should lean toward inlining fixes when discovered if the work is bounded (~30 LOC + tests) rather than defer behind idea files. The rubric is being formalized in a separate CLAUDE.md PR.

Test plan

  • 13 unit tests pass locally (pytest backend/tests/unit/scripts/)
  • Dashboard regenerates cleanly with new truncator (verified visually)
  • No regressions on other rows (most are now slightly cleaner; none worse)
  • CI runs (PR no longer in docs-only paths-ignore — script + tests trigger backend job)

🤖 Generated with Claude Code

…n-mode bug_fix.md

Captures the work products from this session's dogfood runs:

* idea.md — /idea-preflight Audit & Patch (7 edits across 1 file):
  - Refreshed §Problem to accurately describe the tool-group-preserving
    truncation helper and added ~5K fixed-overhead from system prompt
    + 19 tool definitions to the token-budget math
  - Removed Story 5.1 docs-sweep deferral rationale (shipped in PR #60)
  - Locked the JSONB-vs-table fork in §Proposed scope
  - Added tool-call group invariant requirement +
    chat_history_summarization_failed WARN fallback
  - New §Open questions for /spec-gen with recommended defaults
  - New §CLAUDE.md rule touchpoints (Rules #3, #5, #8, #10)
  - Refreshed §Related work

* bug_fix.md — Investigation-mode /bug-fix output (149 lines):
  - Problem / Reproduction / Root cause filled in with file:line
    citations against agent_chat.py
  - Owning layer locked: service; fix is additive (wrap existing
    helper with summarization, don't replace)
  - Fix design / Regression test / Rollout TBD pending user calls
    on the 3 open forks

* MVP1_DASHBOARD.md + mvp1_dashboard.html — regenerated by the
  mvp1-dashboard-regen pre-commit hook to reflect the new bug_fix.md
  sibling (41 features total).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request documents the investigation and design for resolving chat history truncation issues. It introduces a detailed bug fix investigation report and updates the feature idea with technical specifications for implementing rolling summarization in MVP2. Feedback indicates that the auto-generated dashboard files contain truncated descriptions that break markdown links and HTML formatting, necessitating improvements to the generation script.

Comment thread docs/00_overview/MVP1_DASHBOARD.md Outdated
| [chore_test_both_engines](../02_product/planned_features/chore_test_both_engines/idea.md) | Chore | `backend/tests/integration/test_clusters_api.py` only registers an **Elasticsearch** cluster in every test: | — | Idea (deferred from `infra_adapter_elastic` — refactor sweep, 2026-05-09) |
| [chore_trial_summary_single_query](../02_product/planned_features/chore_trial_summary_single_query/idea.md) | Chore | [`backend/app/db/repo/trial.py:aggregate_trials_summary`](../../../../backend/app/db/repo/trial.py) currently issues two SQL statements: | — | — |
| [bug_chat_long_conversation_truncation](../02_product/planned_features/bug_chat_long_conversation_truncation/idea.md) | Bug | `backend/app/services/agent_chat.send_user_message` defensively caps the OpenAI history at the most recent `HISTORY_MAX_MESSAGES = 100` messages and emits a `chat_history_truncated` WARN structlog lin | — | — |
| [bug_chat_long_conversation_truncation](../02_product/planned_features/bug_chat_long_conversation_truncation/idea.md) | Bug | [`backend/app/services/agent_chat.send_user_message`](../../../../backend/app/services/agent_chat.py) defensively caps the OpenAI history at the most recent `HISTORY_MAX_MESSAGES = 100` messages ([age | — | — |

medium

The one-liner for this bug appears to be truncated, which breaks the markdown link and leaves the description incomplete. Since this file is auto-generated by the mvp1-dashboard-regen pre-commit hook, the generation script likely needs to be adjusted to handle long lines more gracefully, for example by truncating at a word boundary without breaking markdown syntax.

Comment thread docs/00_overview/mvp1_dashboard.html Outdated

</div>
<div class="one-liner">`backend/app/services/agent_chat.send_user_message` defensively caps the OpenAI history at the most recent `HISTORY_MAX_MESSAGES = 100` messages and emits a `chat_history_truncated` WARN structlog lin</div>
<div class="one-liner">[`backend/app/services/agent_chat.send_user_message`](../../../../backend/app/services/agent_chat.py) defensively caps the OpenAI history at the most recent `HISTORY_MAX_MESSAGES = 100` messages ([age</div>

medium

The content for this one-liner div is truncated, which makes the description incomplete and includes broken markdown syntax. As this is an auto-generated file, the script that creates it should be updated to prevent truncating content in a way that breaks formatting and readability.

…s mid-markdown

Idea file for the pre-existing bug in scripts/build_mvp1_dashboard.py
that Gemini surfaced via F3 + F4 on PR #73. _extract_idea_problem
caps prose at 240 chars via raw `para[:237] + "..."` with no awareness
of markdown link balance, inline-code spans, or word boundaries.

Includes regenerated MVP1_DASHBOARD.md + mvp1_dashboard.html (42
features total now that this folder is added).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SoundMindsAI

Owner Author

Gemini Code Assist review — adjudication

| # | Finding | Severity | Verdict | Notes |
| --- | --- | --- | --- | --- |
| F3 | MVP1_DASHBOARD.md:79 one-liner truncated, breaks markdown link | Medium | Defer | Pre-existing on main: every long idea description on main was already cut mid-sentence (verified across at least 5 other rows). My commit only shifted where the cut lands for one row: it is now mid-`[label](url)` instead of mid-text, which surfaces the bug more visibly. The actual fix is in scripts/build_mvp1_dashboard.py:127-139 (_extract_idea_problem does raw `para[:237] + "..."` with no markdown awareness). Captured as chore_mvp1_dashboard_truncation (commit b6f6504). |
| F4 | mvp1_dashboard.html:623 same truncation in HTML output | Medium | Defer | Same root cause as F3. Idea file covers both the markdown and HTML codepaths. |

Net: 2 findings → 0 accept + 0 reject + 2 defer (both captured in chore_mvp1_dashboard_truncation/idea.md).

Not blocking this PR's merge — the truncation behavior is pre-existing on main; this PR didn't introduce it.

🤖 Generated with Claude Code

Fold the chore_mvp1_dashboard_truncation idea into this PR per the
calibration discussion: ~30-LOC bounded fix + 13 unit tests is small
enough to land inline rather than defer behind an idea file.

Root cause: `_extract_idea_problem` was capping prose at 240 chars
via raw `para[:237] + "..."` with no awareness of markdown link /
inline-code / word boundaries.

Fix: two new helpers — `_safe_truncate_markdown(text, max_len)` and
`_strip_unclosed_markdown(text)` — replace the raw character cut
with sentence-boundary preference + word-boundary fallback + strip
unclosed [/]/(/)/backtick markdown + single-char ellipsis `…`.

Tests: 13 cases in backend/tests/unit/scripts/test_dashboard_truncation.py
(all pass locally). Regenerated MVP1_DASHBOARD.md + mvp1_dashboard.html
with the new truncator. Deletes chore_mvp1_dashboard_truncation/ since
the fix is no longer deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SoundMindsAI SoundMindsAI changed the title from "docs(bug-chat-long-conv): land /idea-preflight patches + Investigation-mode bug_fix.md" to "docs+fix: bug_chat_long_conv dogfood artifacts + dashboard truncation fix" May 13, 2026
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…coveries"

The existing rule ("either fix it now if inline-cheap or capture an
idea file") has been operating without a sharp definition of
"inline-cheap." Result: default lean toward capture, even for
medium-sized fixes that would have been faster to inline.

Adds a rubric table covering five discovery shapes — ≤20 LOC in same
file, ≤50 LOC scope-compatible, design-surface, CI-gate-protecting,
cross-subsystem — each mapped to one of four actions: inline / same-
branch adjacent commit / adjacent PR off main / idea file.

Plus a default-lean clarification: when borderline, lean inline. The
original rule was written when the failure mode was forgetting to
capture; this addresses the opposite failure mode.

Origin: dogfood discussion during PR #71/#72/#73 around the
chore_mvp1_dashboard_truncation discovery — captured as an idea
file initially, then user pushed back asking whether inlining would
have been faster. Folded the fix into PR #73 and codifying the
lesson here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SoundMindsAI

Owner Author

Updated adjudication — F3 + F4 now resolved by inline fix

The earlier verdict on F3/F4 was Defer (capture as chore_mvp1_dashboard_truncation/idea.md). Per the calibration discussion that followed, we folded the fix directly into this PR instead (commit 61cc59f).

| # | Original verdict | Updated verdict | What changed |
| --- | --- | --- | --- |
| F3 | Defer (idea file) | Resolved | _safe_truncate_markdown + _strip_unclosed_markdown in scripts/build_mvp1_dashboard.py produce balanced markdown / single-char ellipsis. The bug_chat_long_conversation_truncation row now reads `...messages…` (clean) instead of `...messages ([age` (broken). |
| F4 | Defer (idea file) | Resolved | Same fix: both the markdown and HTML codepaths use the new truncator. |

Tests: 13 unit cases in backend/tests/unit/scripts/test_dashboard_truncation.py covering boundary detection, unclosed-link strip, unclosed-code-span strip, ellipsis form, pathological no-space input. All pass.
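The properties these cases cover can be summarized as an invariant on any truncated one-liner. The checker below is a hypothetical illustration of that invariant, not code from test_dashboard_truncation.py; the `truncation_problems` name and the exact checks are assumptions.

```python
def truncation_problems(out: str, max_len: int) -> list[str]:
    """Return the reasons `out` would count as a bad truncation (empty if none)."""
    problems = []
    if len(out) > max_len:
        problems.append("exceeds max_len")
    if out.count("`") % 2 == 1:
        problems.append("unclosed inline-code span")
    if out.count("[") != out.count("]"):
        problems.append("unbalanced square brackets")
    if out.count("(") != out.count(")"):
        problems.append("unbalanced parentheses")
    if out.endswith("..."):
        problems.append("three-dot ellipsis instead of single-char ellipsis")
    return problems


# The two row endings quoted in this PR, used as sample inputs.
clean = "caps the OpenAI history at the most recent 100 messages…"
broken = "caps the OpenAI history at the most recent 100 messages ([age"

print(truncation_problems(clean, 240))
print(truncation_problems(broken, 240))
```

A check like this only counts delimiters, so it is a coarse guard; it would not notice brackets that are balanced in number but nested incorrectly.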

The chore_mvp1_dashboard_truncation/idea.md file is deleted in this PR since the fix is no longer deferred.

🤖 Generated with Claude Code

@SoundMindsAI SoundMindsAI merged commit 0dc1714 into main May 13, 2026
7 of 8 checks passed
@SoundMindsAI SoundMindsAI deleted the docs/dogfood-bug-chat-long-conv branch May 13, 2026 12:55
SoundMindsAI added a commit that referenced this pull request May 13, 2026
* docs: add inline-fix vs idea-file rubric to CLAUDE.md "Tangential discoveries"

The existing rule ("either fix it now if inline-cheap or capture an
idea file") has been operating without a sharp definition of
"inline-cheap." Result: default lean toward capture, even for
medium-sized fixes that would have been faster to inline.

Adds a rubric table covering five discovery shapes — ≤20 LOC in same
file, ≤50 LOC scope-compatible, design-surface, CI-gate-protecting,
cross-subsystem — each mapped to one of four actions: inline / same-
branch adjacent commit / adjacent PR off main / idea file.

Plus a default-lean clarification: when borderline, lean inline. The
original rule was written when the failure mode was forgetting to
capture; this addresses the opposite failure mode.

Origin: dogfood discussion during PR #71/#72/#73 around the
chore_mvp1_dashboard_truncation discovery — captured as an idea
file initially, then user pushed back asking whether inlining would
have been faster. Folded the fix into PR #73 and codifying the
lesson here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: tighten inline-fix rubric per Gemini F1+F2 on PR #74

F1 Accept: row 1 "no new tests needed" could be misread as a
shortcut to skip the Bug Fix Protocol's regression-test requirement.
Add explicit "NOT a bug fix; bug fixes always need a regression test"
carve-out so agents don't game the rubric.

F2 Accept: row 2 said ≤50 LOC, row 3 said >100 LOC — a 75 LOC
change fell into an undefined zone. Extend row 2 to ≤100 LOC so
the thresholds are continuous. (The principle is "bounded enough
that capture+resume costs exceed inline cost" — 100 LOC is
comfortably in that range for most scope-compatible fixes.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
1. Dashboard KPI label "Features done" → "Scoped items done" in
   scripts/build_mvp1_dashboard.py (HTML and Markdown outputs).
   Sub-text explains the feat_/infra_/chore_/epic_ scope. Gemini
   flagged this on PR #76. Regenerated dashboards now show 17/17
   (was 14/14 — incidental drift from Wave 1 archives).

2. Smoke test backend/tests/smoke/test_tutorial_path.py judgment-
   list deadline bumped 120s → 240s after two flake hits in one
   dev session (PRs #73 + #78, both passed on re-run). Doubled
   headroom, still well within the smoke job's 15-minute total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
* chore: CI hygiene sweep — dashboard KPI label + smoke timeout

1. Dashboard KPI label "Features done" → "Scoped items done" in
   scripts/build_mvp1_dashboard.py (HTML and Markdown outputs).
   Sub-text explains the feat_/infra_/chore_/epic_ scope. Gemini
   flagged this on PR #76. Regenerated dashboards now show 17/17
   (was 14/14 — incidental drift from Wave 1 archives).

2. Smoke test backend/tests/smoke/test_tutorial_path.py judgment-
   list deadline bumped 120s → 240s after two flake hits in one
   dev session (PRs #73 + #78, both passed on re-run). Doubled
   headroom, still well within the smoke job's 15-minute total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(smoke): use time.monotonic for deadline; accept Gemini PR #79

Wall-clock time.time() could shift mid-test if NTP adjusts the system
clock between deadline calculation and the while-loop check, causing
early termination or extended hang. monotonic is the right primitive
for duration measurement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
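The monotonic-deadline change described in the commit above can be sketched as a small polling helper. This is an assumption about the general pattern, not the code in test_tutorial_path.py; `wait_until` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float, poll_s: float = 0.5) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` seconds have elapsed."""
    # time.monotonic() never goes backwards and is unaffected by system clock
    # adjustments, so the deadline cannot move if the wall clock is corrected.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return False


# Example: a condition that is already true returns immediately.
print(wait_until(lambda: True, timeout_s=1.0))
```

With this shape, the 120s to 240s bump from the earlier commit would be a one-argument change, for example `wait_until(judgment_list_ready, timeout_s=240)`, where `judgment_list_ready` stands in for whatever readiness check the smoke test performs.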
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…skill (#81)

Records the 2026-05-13 backlog sweep + new skill in state.md per
the standard "update state.md whenever a feature lands or a priority
shifts" rule:

- Updates "Last updated" line to reflect the sweep
- Adds 4 entries to "Most recent meaningful changes":
  - the 9-item backlog sweep (PRs #75-#80)
  - the /bug-fix skill (PRs #71-#72) + Investigation-mode dogfood on
    bug_chat_long_conversation_truncation (PR #73)
  - the CLAUDE.md inline-fix vs idea-file rubric (PR #74)
  - the docs-only paths-ignore CI filter (PR #70)
- Expands "Active feature" line with the remaining-backlog breakdown
  (6 inline + 3 /bug-fix + 7 needs-second-pass + 4 keep-deferred)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 3, 2026
…HSA-qx2v-qp2m-jg93) (#430)

* chore(security): pin CI actions/images by digest + fix postcss vuln (GHSA-qx2v-qp2m-jg93)

Resolves OSSF Scorecard supply-chain findings from the public code-scanning
surface (security/code-scanning).

Vulnerabilities (#72): postcss < 8.5.10 (moderate XSS via unescaped </style>
in CSS stringify) was pulled in transitively as next@16.2.6's bundled
postcss@8.4.31. Add a pnpm `overrides` for `postcss@<8.5.10` -> ^8.5.15 so the
whole tree (including Next's copy) resolves to 8.5.15; regenerate the lockfile.
`pnpm build` + full vitest (1008) green.

PinnedDependencies (~60 alerts): pin every GitHub Action `uses:` ref to its
40-char commit SHA with a trailing `# vX` comment (56 refs across all 5
workflows), pin the 4 pr.yml service-container images (postgres, redis,
elasticsearch, opensearch) by manifest digest, and pin the Dockerfile base
images (node:26-bookworm-slim x3, python:3.14-slim via an ARG PYTHON_DIGEST,
ghcr.io/astral-sh/uv:0.5.7) by digest. Both Dockerfiles pass `buildx --check`.

Dependabot already covers github-actions + docker weekly, so it keeps the
`uses:` SHA pins and Dockerfile FROM digests fresh.

Deliberately left (impractical to hash-pin, low value, not "images"):
- npmCommand `npm install -g pnpm@9` in ui/Dockerfile (#64/#65) — global tool
  install; corepack swap is a build-behavior change out of scope here.
- pipCommand `pip install -r website/requirements.txt` in deploy-docs.yml (#73)
  — docs-site only; needs a hash-locked requirements file (--require-hashes).
- Workflow `services.*.image` digests aren't auto-bumped by Dependabot
  (github-actions ecosystem updates `uses:` only) — manual refresh.

Tier-3 Scorecard findings (branch protection, code-review ratio, project age,
fuzzing, OpenSSF badge, SAST) are intrinsic/intentional and untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* chore(security): consolidate digest-pinned base images into single BASE_IMAGE ARG

Adjudicate both Gemini Code Assist findings on PR #430 (both accepted):

- Dockerfile (high): a separate ARG PYTHON_VERSION + ARG PYTHON_DIGEST is a
  silent footgun — `--build-arg PYTHON_VERSION=3.13` would still pull 3.14
  because the digest wins at pull time. Collapse to one ARG BASE_IMAGE
  (tag+digest together) so overrides are unambiguous.
- ui/Dockerfile (medium): the node digest was triplicated across the deps/
  builder/runner FROM lines. Declare it once as a top-level ARG BASE_IMAGE
  and reference ${BASE_IMAGE} in each stage.

Both pass `docker buildx build --check`. Dependabot's docker ecosystem updates
the `ARG ...=image:tag@sha256` form, so the pins stay fresh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(state): record chore_scorecard_pin_deps_postcss merge (PR #430)

Prepend the PR #430 one-liner to state.md "Last 5 merges" (drop the now-6th
chore_template_library_expansion into the older-entries line) + branch/Last-
updated refresh; full narrative appended to state_history.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 5, 2026
…card) (#456)

Resolves two OSSF Scorecard Pinned-Dependencies findings (code-scanning
#89, #73): the deploy-docs and build-guides-freshness workflows ran
`pip install -r website/requirements.txt`, which pins exact versions but
not by sha256 hash.

- Add website/requirements.lock — a fully hash-pinned (sha256) lock
  compiled from the curated website/requirements.txt via
  `uv pip compile --generate-hashes`, covering the full transitive tree.
- Both workflows now `pip install --require-hashes -r
  website/requirements.lock` (and key the pip cache on the lock).
- website/requirements.txt stays the human-curated top-level source with
  a regenerate-the-lock pointer in its header; the lock carries a
  GENERATED / do-not-edit-by-hand header with the same regen command.

requirements.txt is hand-bumped (Dependabot's pip ecosystem only watches
the repo root, not website/), so there is no auto-update drift between
the two files. Verified the require-hashes install succeeds in a clean
venv and resolves mkdocs-material 9.7.6 / pymdown-extensions 10.21.3 /
mkdocs-glightbox 0.5.2.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>