base repository: apache/answer
base: main
head repository: apache/answer
compare: dev
- 5 commits
- 24 files changed
- 6 contributors
Commits on May 30, 2026
feat: add reasoning content to AI conversation records and update related components (#1530)

Fix #1524

## Root cause

DeepSeek's reasoning models stream `reasoning_content` alongside `content`. Answer ignored it, so follow-up requests failed with `400: The reasoning_content in the thinking mode must be passed back to the API`, and the thinking text was never shown or saved.

## Fix

- Capture `reasoning_content` from the stream and pass it back to the API on subsequent rounds.
- Persist it with the conversation (new DB column via migration v2.0.2).
- Render it in the chat UI as a collapsible "Thinking…/Thoughts" panel above the answer.

## Compatibility

Nullable column, `omitempty` field, and a UI that hides the panel when it is empty: old conversations and non-reasoning models behave exactly as before.

## Demo

https://github.com/user-attachments/assets/49b1a2a1-9133-4ac2-bbeb-860215a50285
Commit 7c210a4
Commits on Jun 3, 2026
Commit 68085ab
fix: avoid topic fallback for non-Latin titles via pragmatic ASCII transliteration (#1526)

# fix: avoid `topic` fallback for non-Latin titles via pragmatic ASCII transliteration

> **Scope update (in response to review):** this PR is intentionally broader than its original "Arabic-only" framing. The implementation changes URL slug generation for **every non-Latin, non-CJK script** that `slugify` previously stripped; see *Scope* below for the explicit list. The goal is *not* linguistically correct romanization; it is "avoid collapsing to `/topic` by producing a usable ASCII slug."

## What this PR is (and isn't)

**Goal:** when a question title contains characters outside Basic Latin / Latin Extended / CJK Han, generate a URL slug that is a deterministic ASCII approximation instead of letting `slugify` strip everything and falling back to the literal `"topic"`.

**Non-goal:** this is *not* a linguistically correct multi-language romanizer. The output is a machine-acceptable ASCII slug, not what a native speaker would choose. For example, `こんにちは` → `konnichiha` (not the more natural `kon'nichiwa`), `ไทย` → `aithy` (not `thai`). Treat the slug as an opaque, stable, indexable identifier: the path after `/questions/<id>/` is for SEO and shareability; the canonical reference is always the ID.

## The bug

Pure non-Latin titles previously got stripped by `slugify.Slugify`, hit the empty-result fallback in `htmltext.UrlTitle`, and collapsed to the literal slug `"topic"`. On a live multilingual site, every Arabic / Thai / Japanese-hiragana / Korean / Hebrew / Cyrillic question ended up at `/questions/<id>/topic`.

## The fix

`UrlTitle()` gets a `convertNonLatin` pre-step that mirrors the existing `convertChinese` pre-step pattern, using `github.com/mozillazg/go-unidecode` (same author as `go-pinyin` already in the repo, to minimise new-dep friction).

```
UrlTitle(title)
  → convertChinese(title)     // pre-existing: Han-block → pinyin
  → convertNonLatin(title)    // NEW: detect non-Latin letters → unidecode to ASCII
  → clearEmoji / slugify / url.QueryEscape / cutLongTitle  (unchanged)
```

The non-Latin detector skips ASCII, Latin-1 Supplement, Latin Extended-A/B, and CJK Han. Inputs that contain no letters outside those ranges short-circuit and are returned unchanged, so Latin-only and Chinese-only inputs remain byte-identical (pinned by tests).

## Scope: what scripts are affected

This PR changes behavior for **any** title containing letters in scripts that `slugify` doesn't handle. Confirmed by tests in `pkg/htmltext/htmltext_test.go`:

| Script | Example title | Before | After |
| --- | --- | --- | --- |
| Arabic | `كيف حالك` | `topic` | `kyf-hlk` |
| Mixed Latin + Arabic | `مرحبا hello` | `hello` | `mrhb-hello` |
| Thai | `ไทย ไทย` | `topic` | `aithy-aithy` |
| Japanese hiragana | `こんにちは` | `topic` | `konnichiha` |
| Korean | `안녕하세요` | `topic` | `annyeonghaseyo` |
| Hebrew | `שלום עולם` | `topic` | `shlvm-vlm` |
| Cyrillic | `Привет мир` | `topic` | `privet-mir` |

**Unchanged:**

| Case | Behavior |
| --- | --- |
| Pure Latin (`hello world`) | unchanged → `hello-world` |
| Pure Chinese (`这是一个,标题,title`) | unchanged → `zhe-shi-yi-ge-biao-ti` (pinyin path) |
| Japanese with Han-block kanji (`日本`) | unchanged → `ri-ben` (caught by pre-existing pinyin path; treated as Chinese reading, not Japanese; a pre-existing limitation, **not** introduced by this PR) |
| Emoji only (`😂😂😂`) | unchanged → `topic` |
| Empty / whitespace | unchanged → `topic` |

## Transliteration quality: explicit acknowledgement

`go-unidecode` is a generic Unicode → ASCII approximation. It is **not** a per-language romanization library. Specifically:

- It will pick *one* approximation per codepoint regardless of language context.
  For example, `ใ` → `ai` (Thai romanization is `i` or `ai` depending on standard), `한` → `han`, `語` → `Yu` (Chinese pinyin reading even when used in Japanese), etc.
- The result is *good enough* to be a stable, URL-safe, human-recognizable handle, but speakers of the source language will not consider it "correct."
- It is deterministic, so the same title always produces the same slug. This matters because `url_title` is recomputed on every request.

If maintainers prefer to scope this PR more narrowly (e.g. Arabic only, and reject Thai/Hebrew/Cyrillic/etc.), the detector in `containsNonLatin` can be tightened to specific Unicode blocks, but that means the other scripts continue to collapse to `topic`, which is the bug we're trying to fix. I'd argue the broader fix is preferable to a piecemeal one, but happy to narrow if you want.

## Live deployment / real-world verification

This patch has been running in production on **[ask.namasoft.com](https://ask.namasoft.com)** (an Apache Answer instance we operate) since deployment, built directly from this branch via `docker compose build`. The site hosts Arabic-language questions, so the fix exercises the affected code path on every page load.

Sample question URL on the deployed instance:

> `https://ask.namasoft.com/questions/10010000000000115`

The slug in the URL is the transliterated Arabic title rather than `topic`. No data migration was needed since `url_title` is computed on every request from `Title` and never persisted (see *Why this is safe to ship* below).

## Admin-configurable

The transliteration is gated by a package-level `atomic.Bool` (default **on**, since the current behavior is objectively broken for affected users):

- `htmltext.SetTransliterateNonLatin(enabled bool)`
- `htmltext.IsTransliterateNonLatinEnabled() bool`

This is deliberately the minimum surface needed to satisfy "the setting must be readable from `UrlTitle()`".
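A package-level toggle of the kind described here could be sketched as follows. The two function names and the `atomic.Bool` with a default of on come from the text; the variable name `transliterateNonLatin` is an assumption, and the real package is `htmltext` rather than `main`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// transliterateNonLatin is a hypothetical name for the package-level flag.
// atomic.Bool (Go 1.19+) allows lock-free reads from any request goroutine.
var transliterateNonLatin atomic.Bool

// init sets the default to on, as the commit message describes.
// Storing false here instead would make the feature opt-in.
func init() {
	transliterateNonLatin.Store(true)
}

// SetTransliterateNonLatin enables or disables the transliteration step.
func SetTransliterateNonLatin(enabled bool) {
	transliterateNonLatin.Store(enabled)
}

// IsTransliterateNonLatinEnabled reports the current setting.
func IsTransliterateNonLatinEnabled() bool {
	return transliterateNonLatin.Load()
}

func main() {
	fmt.Println(IsTransliterateNonLatinEnabled()) // default set in init
	SetTransliterateNonLatin(false)
	fmt.Println(IsTransliterateNonLatinEnabled())
}
```

Keeping the flag in a package-level atomic means a slug function can read it without a `context.Context` parameter, at the cost of the setting being process-global.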
A follow-up PR can add an admin UI section that calls `SetTransliterateNonLatin` on save and on startup, without having to re-plumb every `htmltext.UrlTitle` call site through `context.Context`.

**Default choice (please confirm):** I picked **default-on** because the existing `topic` behavior is a bug for affected users. If you'd prefer default-off for strict backward compat on existing installs, flip the `init()` in `pkg/htmltext/htmltext.go` to `Store(false)` and surface the toggle as opt-in.

## Why this is safe to ship

- `url_title` is **not** a persisted column. It's not on the `Question` entity in `internal/entity/question_entity.go`, no migration has ever added/dropped it, and every call site (`question_service.go`, `revision_service.go`, `vote_service.go`, search/report/review/rank/comment services, controllers, repos) recomputes it from `Title` at response-build time via `htmltext.UrlTitle(...)`.
- That means the fix is read-only: existing rows light up with correct slugs on the next request, with no migration and no data rewrite.
- Rollback is just redeploying the prior image; nothing on disk changes.

## Test coverage

`pkg/htmltext/htmltext_test.go`:

- **`TestUrlTitleTable`**: table-driven, one case per affected script (the full matrix above), plus:
  - `empty` → `topic`
  - `pure latin unchanged` → byte-identical to pre-fix
  - `pure chinese unchanged` → byte-identical to pre-fix (pins existing pinyin behavior)
  - `japanese kanji goes through pinyin path unchanged` → documents the pre-existing Han-block limitation
  - `emoji only falls back to topic` → unchanged
  - `long arabic truncates at cutLongTitle boundary` → exercises the 150-byte cap and UTF-8 boundary safety
- **`TestUrlTitleTransliterationToggle`**: with the toggle off, non-Latin titles collapse to `topic` (pre-fix behavior); with it on, they transliterate.
- Existing `TestUrlTitle` left untouched.

Test plan for reviewers:

- [ ] `go test ./pkg/htmltext/...`: all pass
- [ ] Visit the live sample URL above and confirm slug is transliterated, not `topic`
- [ ] Verify Chinese / Latin / emoji-only / empty behavior is byte-identical to `main` (covered by table tests)

## Out of scope (intentionally)

- No admin UI / site setting plumbing in this PR; see *Admin-configurable* above. Happy to do the React `Non-Latin Languages Handling` admin page + `SiteType` + service / controller / migration in a follow-up if maintainers want it.
- No change to the `"topic"` empty-result fallback.
- No plugin interface for slug generation; mirrored the existing `convertChinese` pre-step pattern instead.
- No per-language romanization library; this is an explicit non-goal, see *Transliteration quality* above.

## Issues / discussion

I didn't find an existing upstream issue covering this; happy to be pointed at one if there is.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: LinkinStars <linkinstar@foxmail.com>
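The detector rule stated in this commit message (skip ASCII, Latin-1 Supplement, Latin Extended-A/B, and CJK Han; flag any other letter) could be sketched with the standard library alone. The name `containsNonLatin` appears in the message, but this body is an assumption rather than the PR's code, and the `go-unidecode` transliteration step is deliberately left out.

```go
package main

import (
	"fmt"
	"unicode"
)

// lastLatinExtendedB is the final code point of the Latin Extended-B block.
// Everything up to it covers Basic Latin, Latin-1 Supplement, and
// Latin Extended-A/B.
const lastLatinExtendedB = 0x024F

// containsNonLatin reports whether s has at least one letter outside the
// Latin ranges above and outside the Han script. This body is an assumed
// reading of the rule described in the commit message.
func containsNonLatin(s string) bool {
	for _, r := range s {
		if !unicode.IsLetter(r) {
			continue // digits, spaces, punctuation, emoji
		}
		if r <= lastLatinExtendedB {
			continue // Latin letters, including accented ones
		}
		if unicode.Is(unicode.Han, r) {
			continue // left to the existing pinyin path
		}
		return true
	}
	return false
}

func main() {
	for _, title := range []string{"hello world", "日本", "こんにちは", "Привет мир", "😂😂😂"} {
		fmt.Printf("%q -> %v\n", title, containsNonLatin(title))
	}
}
```

Under this reading, Latin-only, Han-only, and emoji-only titles return false and skip transliteration, which is consistent with the "unchanged" rows in the table above.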
Commit e884bb6
Commits on Jun 5, 2026
Commit cece87f
Commit 682811f
You can try running this command locally to see the comparison on your machine:
git diff main...dev