⚡ System Architecture

NovaMed — System Workflow

System-wide execution flow: Web App & Voice AI Agent (Modal / Qwen)

① Standard Web App Flow

👤
User
Browser
⚛️
React App
Vite + TypeScript
🔄
React Router
Route matching
📦
AppContext
Global State
🌐
WordPress API
REST + WooCommerce
📄
Page Components
Render UI
Result displayed
User sees result
🛒 Shop / Product Flow

Navigate /shop

React Router render ShopPage component

fetchProducts()

GET /wp-json/wc/v3/products → parse JSON

applyFilters()

category / price / sort → filteredProducts state → re-render

addToCart()

CartItem[] → AppContext → localStorage persist
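The filter and cart steps above can be sketched as two pure helpers. This is a minimal TypeScript sketch: the `Product`/`CartItem` fields and the `Filters` shape are assumptions for illustration, not the app's actual types.

```typescript
// Minimal product/cart shapes; field names are assumptions for illustration.
interface Product { id: number; name: string; price: number; category: string; }
interface CartItem extends Product { quantity: number; }

interface Filters { category?: string; maxPrice?: number; sort?: "price-asc" | "price-desc"; }

// applyFilters(): category / price / sort → filteredProducts
function applyFilters(products: Product[], f: Filters): Product[] {
  let out = products.filter(
    (p) =>
      (!f.category || p.category === f.category) &&
      (f.maxPrice === undefined || p.price <= f.maxPrice)
  );
  if (f.sort) {
    out = [...out].sort((a, b) =>
      f.sort === "price-asc" ? a.price - b.price : b.price - a.price
    );
  }
  return out;
}

// addToCart(): merge by id so repeated adds bump quantity; the caller
// persists the resulting CartItem[] to AppContext + localStorage.
function addToCart(cart: CartItem[], product: Product, qty = 1): CartItem[] {
  const existing = cart.find((i) => i.id === product.id);
  return existing
    ? cart.map((i) => (i.id === product.id ? { ...i, quantity: i.quantity + qty } : i))
    : [...cart, { ...product, quantity: qty }];
}
```

Keeping both helpers pure makes the `filteredProducts` re-render and the localStorage persist straightforward: state in, state out.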

💳 Checkout Flow

Navigate /checkout

Load cart items from AppContext

Fill in the form

firstName / lastName / phone / address / city...

Choose payment method

COD / SePay QR / Bank transfer

POST order

/wp-json/wc/v3/orders → receive Order ID
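The order POST body might be assembled like this. Field names follow the public WooCommerce REST v3 order schema; the `CheckoutForm` shape and the payment-method slugs are assumptions for illustration.

```typescript
interface CartItem { id: number; quantity: number; }

interface CheckoutForm {
  firstName: string; lastName: string; phone: string; address: string; city: string;
  paymentMethod: "cod" | "sepay_qr" | "bank_transfer"; // slugs assumed
}

// Build the body for POST /wp-json/wc/v3/orders.
// billing / line_items keys follow the WooCommerce REST v3 order schema.
function buildOrderPayload(form: CheckoutForm, cart: CartItem[]) {
  return {
    payment_method: form.paymentMethod,
    set_paid: false, // COD / bank transfer: payment settled later
    billing: {
      first_name: form.firstName,
      last_name: form.lastName,
      phone: form.phone,
      address_1: form.address,
      city: form.city,
    },
    line_items: cart.map((i) => ({ product_id: i.id, quantity: i.quantity })),
  };
}
```

The response to this POST carries the Order ID the checkout flow displays to the user.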

🔐 Auth Flow

POST /jwt-auth/v1/token

username + password → WordPress JWT

Choose 2FA method

OTP (email) or Voice ID

Verify 2FA

6-digit OTP / Voice biometrics matching

JWT Token stored

userProfile set → Dashboard access
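The auth steps above can be modeled as a small state machine: the JWT alone is not enough, the user reaches the Dashboard only after 2FA passes. A hypothetical sketch; the stage and event names are invented for illustration.

```typescript
type AuthStage = "credentials" | "choose2fa" | "verify2fa" | "authenticated";

interface AuthState { stage: AuthStage; token?: string; method?: "otp" | "voice_id"; }

// Events mirror the diagram's steps (names are assumptions).
type AuthEvent =
  | { type: "JWT_OK"; token: string }                 // POST /jwt-auth/v1/token succeeded
  | { type: "PICK_2FA"; method: "otp" | "voice_id" }  // OTP (email) or Voice ID
  | { type: "2FA_OK" };                               // OTP / voice biometrics matched

function authReducer(state: AuthState, ev: AuthEvent): AuthState {
  switch (ev.type) {
    case "JWT_OK":
      // Token is held, but the session is not trusted until 2FA passes.
      return { stage: "choose2fa", token: ev.token };
    case "PICK_2FA":
      return state.stage === "choose2fa"
        ? { ...state, stage: "verify2fa", method: ev.method }
        : state;
    case "2FA_OK":
      return state.stage === "verify2fa"
        ? { ...state, stage: "authenticated" }
        : state;
  }
}
```

Guarding each transition on the current stage prevents, for example, a stray `2FA_OK` event from authenticating a session that never presented credentials.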

② Voice AI Agent Flow — End-to-End

① Audio Input
② STT
③ AI Processing
④ Intent Parsing
⑤ Action Execution
⑥ Output
🎙️Microphone
getUserMedia()
VAD Service
• Web Audio API FFT
• 85–3400 Hz band
• Wind/click filter
• Spectral analysis
Wake Word
"Nova ơi" ("hey Nova") → activate
Fuzzy matching + tonal
Exponential backoff
Auto resume
📝Web Speech API
SpeechRecognition
Noise Filter
isNoiseTranscript()
• ≤1 character → skip
• Filler words skip
• Repeat pattern skip
Error Handling
no-speech → retry
Max 6 retries
Exp. backoff
Auto-resume
🤖Modal / Qwen
Qwen3-30B-A3B
MAIN
SGLang Server
POST /v1/chat/completions
tool_choice: auto
SSE streaming
FastAPI endpoint
Modal Infra
GPU: L40S 48GB
max 5 containers
Idle timeout: 30 min
Concurrency: 10 req/container
🗺️Intent Parser
mapToolCallToIntent()
Tool Call → Intent
finish_reason: tool_calls
→ ActionName + args
→ ParsedIntent
→ AgentIntent
Multi-action
multi_action tool
→ actions[] array
→ sequential steps
(e.g. navigate+search)
Action Executor
actionExecutor.ts
30+ Actions
navigate / search
add_to_cart / filter
fill_form / submit
login / input_otp
ExecutorContext
navigate(), addToCart()
applyFilters()
logout(), showToast()
Fresh per step
🔊TTS + UI Update
speak() + render
TTS Chain
ElevenLabs (AI voice)
→ fallback Web Speech
vi / en / jp langs
Rate: 1.0x
UI Feedback
Chat panel update
Element highlight
Toast notifications
→ resume listening
🔁
After TTS completes → automatically resume startListening() after a 500 ms delay → a new loop iteration begins. If the user says "dừng lại" ("stop") → isStoppedRef = true → enter idle state and stop completely.
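The STT column's skip rules and retry backoff could look roughly like this. The filler-word list and the backoff constants are assumptions; only the three skip rules and the max-6-retries figure come from the diagram.

```typescript
// Assumed filler list; the real one would cover more Vietnamese/English fillers.
const FILLERS = new Set(["ừ", "à", "ờ", "uh", "um"]);

// isNoiseTranscript(): mirrors the three skip rules in the STT column.
function isNoiseTranscript(t: string, lastTranscript: string): boolean {
  const s = t.trim().toLowerCase();
  if ([...s].length <= 1) return true;                          // ≤1 character → skip
  if (FILLERS.has(s)) return true;                              // filler word → skip
  if (s === lastTranscript.trim().toLowerCase()) return true;   // repeat pattern → skip
  return false;
}

// Exponential backoff delay for no-speech retries (max 6 attempts per the
// diagram); base delay and cap are invented constants.
function retryDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Transcripts that pass the filter go on to `processCommand()`; anything caught here is dropped silently so the agent does not round-trip noise to the LLM.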

③ Modal Backend — Agentic Loop (Detailed)

🖥️
Modal Container
GPU: L40S 48GB
CUDA 12.1 + Python 3.11
Startup timeout: 600s
Max containers: 5
🤖
Qwen3-30B-A3B
SGLang server
127.0.0.1:30000
MoE: 3B active params
GPTQ-Int4 + moe_wna16
🗄️
ChromaDB
Persistent Volume
collections: products
collections: knowledge
Cosine similarity index
🔢
E5-large Embed
multilingual-e5-large
"query: " prefix
"passage: " prefix
Pre-baked in image
⟳ Agentic Loop — max 3 iterations
📨 API Request (Frontend)

POST /chat

{ message, history[-20], context }

Build page_context

page / cart / user / selectedProduct

Slice history

10 turns / 50 000 tokens max
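History slicing might be sketched as follows. The ~4-characters-per-token estimate stands in for the real tokenizer and is only an approximation; the 10-turn and 50,000-token limits come from the step above.

```typescript
interface Turn { role: "user" | "assistant"; content: string; }

// Rough token estimate (~4 chars/token); a placeholder for the real tokenizer.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Keep at most `maxTurns` most-recent turns (a turn = user + assistant
// message, hence history[-20]), then trim oldest-first until the slice
// fits under the token budget.
function sliceHistory(history: Turn[], maxTurns = 10, maxTokens = 50_000): Turn[] {
  let out = history.slice(-maxTurns * 2);
  while (
    out.length > 0 &&
    out.reduce((n, t) => n + estimateTokens(t.content), 0) > maxTokens
  ) {
    out = out.slice(1); // drop the oldest message first
  }
  return out;
}
```

Trimming oldest-first keeps the most recent context, which matters most for intent detection in LLM Call #1.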

🧠 LLM Call #1 — Intent Detection

tool_choice: auto

2 tools: search_products, search_knowledge

temp 0.1 — thinking OFF

max_tokens: 512 → fast response

finish_reason?

"tool_calls" → execute tools
"stop" → skip to synthesis

🔧 Parallel Tool Execution

asyncio.gather()

Max 2 tools per iteration

Timeout: 5s / tool

E5-large embed → ChromaDB query

Similarity ≥ 0.3

Top 5 products / Top 3 knowledge docs
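The backend runs this step with `asyncio.gather()`; the same pattern can be illustrated in TypeScript terms. Each tool races its own 5 s timeout, and a failure or timeout degrades to `null` rather than sinking the whole batch. Function names are invented for illustration.

```typescript
// Run each tool with its own per-tool timeout; failures and timeouts yield
// null so one slow tool cannot block or fail the whole iteration.
async function runToolsInParallel<T>(
  tools: Array<() => Promise<T>>,
  timeoutMs = 5000
): Promise<Array<T | null>> {
  const withTimeout = (run: () => Promise<T>): Promise<T | null> =>
    new Promise<T | null>((resolve) => {
      const timer = setTimeout(() => resolve(null), timeoutMs);
      run()
        .then((v) => { clearTimeout(timer); resolve(v); })
        .catch(() => { clearTimeout(timer); resolve(null); });
    });
  return Promise.all(tools.map(withTimeout));
}
```

With at most 2 tools per iteration, the worst case for this step is one 5 s timeout, after which synthesis proceeds with whatever results arrived.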

✍️ LLM Call #2 — Synthesis + SSE Stream

Inject tool_results

Build synthesis prompt with context from the tools

temp 0.7, stream: true

thinking: OFF, max_tokens: 1024

Stream text chunks

Stop streaming when the <json> tag appears

Emit SSE events

text_chunk → action → sources → done
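Stopping the visible stream at the `<json>` tag needs care, because the tag can arrive split across SSE chunks. A sketch of one way to buffer for that; the class and method names are invented, and a real consumer would also flush the held-back tail when the stream ends.

```typescript
// Streaming splitter: text before "<json>" is emitted as visible text_chunk
// events; everything from the tag onward is buffered for action parsing.
class JsonTagSplitter {
  private buffer = "";
  private inJson = false;
  visibleText = "";

  // Feed one stream chunk; returns the text that is safe to show/speak now.
  push(chunk: string): string {
    if (this.inJson) { this.buffer += chunk; return ""; }
    this.buffer += chunk;
    const idx = this.buffer.indexOf("<json>");
    if (idx >= 0) {
      const visible = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx); // keep "<json>…" for the parser
      this.inJson = true;
      this.visibleText += visible;
      return visible;
    }
    // Hold back a partial "<json" prefix that may complete in the next chunk.
    const hold = Math.min(this.buffer.length, "<json>".length - 1);
    const emit = this.buffer.slice(0, this.buffer.length - hold);
    this.buffer = this.buffer.slice(this.buffer.length - hold);
    this.visibleText += emit;
    return emit;
  }

  // The buffered "<json>…" payload once the tag has been seen.
  jsonTail(): string { return this.inJson ? this.buffer : ""; }
}
```

The held-back prefix is at most 5 characters, so the visible stream lags the model by a negligible amount.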

📋 Parse Action & SSE Response

_parse_json_response()

Extract from <json>…</json> in the LLM text

Validate action

30+ valid actions → fallback to "chat" if unknown

Multi-action support

"actions": [] → sequential execution plan

SSE to Frontend

data: {"type":"action","action":"search",...}
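`_parse_json_response()` lives in Python on the backend; the same extraction logic sketched in TypeScript, with a deliberately truncated action whitelist (the real list has 30+ entries).

```typescript
// Subset of the 30+ valid actions, for illustration only.
const VALID_ACTIONS = new Set(["search", "navigate", "add_to_cart", "chat"]);

interface ParsedAction { action: string; [k: string]: unknown; }

// Pull the <json>…</json> block out of the LLM text; fall back to a plain
// "chat" action when the tag is missing, the JSON is malformed, or the
// action name is unknown.
function parseActionFromText(text: string): ParsedAction {
  const m = text.match(/<json>([\s\S]*?)<\/json>/);
  if (!m) return { action: "chat" };
  try {
    const parsed = JSON.parse(m[1]) as ParsedAction;
    return VALID_ACTIONS.has(parsed.action) ? parsed : { action: "chat" };
  } catch {
    return { action: "chat" };
  }
}
```

The "chat" fallback means a malformed model response degrades to a plain spoken answer instead of a broken UI action.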

④ Data Ingest Pipeline (ingest.py)

📄product.csv
~600 KB, thousands of products
🧹BeautifulSoup
Strip HTML tags from descriptions
🔗Chunk Builder
Name + Category + Price + Description
🔢E5-large Embed
"passage: " prefix → vector
🗄️ChromaDB Write
Persistent Volume
Ready for Search
products + knowledge collections
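The chunk-building step might look like this. A regex tag-strip stands in for BeautifulSoup (which does this on the Python side), and the `|` separator and `ProductRow` field names are assumptions; only the field order (Name + Category + Price + Description) and the `"passage: "` prefix come from the pipeline above.

```typescript
interface ProductRow { name: string; category: string; price: string; descriptionHtml: string; }

// Crude tag strip: replace tags with spaces, then collapse whitespace.
// (BeautifulSoup plays this role in ingest.py.)
const stripHtml = (html: string): string =>
  html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();

// One chunk per product: Name + Category + Price + Description, with the
// "passage: " prefix that E5 embedding models expect for documents.
function buildPassage(row: ProductRow): string {
  return `passage: ${row.name} | ${row.category} | ${row.price} | ${stripHtml(row.descriptionHtml)}`;
}
```

Queries get the matching `"query: "` prefix at search time; E5 models are trained on this asymmetric prefixing, so skipping it degrades retrieval quality.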

⑤ Full Lifecycle — "Tìm thuốc ho" ("Find cough medicine") → Result

🎙️ Voice Input Phase

User says: "Tìm thuốc ho"

VAD detects voice in the 85–3400 Hz band

Web Speech API recognition

transcript = "Tìm thuốc ho" → isFinal = true

Noise filter passes

More than one character, not a filler word → OK

processCommand("Tìm thuốc ho")

VoiceAgentContext triggers the pipeline

🤖 AI Processing Phase

POST /chat → Modal FastAPI

{ message: "Tìm thuốc ho", history, context }

LLM #1: Intent Detection

Qwen → finish_reason: "tool_calls"
→ search_products(query="thuốc ho")

ChromaDB vector search

E5 embed → top 5 products, similarity ≥ 0.3

LLM #2: Synthesis + Stream

Inject results → generate the answer + action JSON

⚡ Action & Output Phase

SSE: action = "search"

payload: { query: "thuốc ho" } sent to the Frontend

mapToolCallToIntent()

→ AgentIntent { type: "search", entities: {...} }

executeAction("search")

setSearchQuery("thuốc ho") → filteredProducts update → ShopPage re-render

TTS + Chat panel

"Tôi đã tìm thấy X sản phẩm thuốc ho..." ("I found X cough-medicine products...")
→ resume listening after 500 ms
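`mapToolCallToIntent()` from step ④ can be sketched as a pure mapping: tool name → intent type, tool arguments → entities, with `multi_action` expanding into a sequential plan. The `ToolCall`/`AgentIntent` shapes are simplified guesses at the real types.

```typescript
interface ToolCall { name: string; arguments: Record<string, unknown>; }
interface AgentIntent { type: string; entities: Record<string, unknown>; }

// mapToolCallToIntent(): tool call → AgentIntent. A multi_action call
// flattens into an ordered list of intents (e.g. navigate + search),
// executed sequentially by the action executor.
function mapToolCallToIntent(call: ToolCall): AgentIntent[] {
  if (call.name === "multi_action") {
    const steps = (call.arguments.actions as ToolCall[]) ?? [];
    return steps.flatMap(mapToolCallToIntent);
  }
  return [{ type: call.name, entities: call.arguments }];
}
```

Returning a list even for single actions lets the executor treat every response as a plan and run each step with a fresh `ExecutorContext`.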

Estimated timings:
STT ~300ms
LLM #1 ~500ms
ChromaDB <50ms
LLM #2 (1st token) ~400ms
Action + TTS ~200ms
Total ~1.5s (warm)