⚡ System Architecture

NovaMed — System Workflow

System-wide execution flow: Web App & Voice AI Agent (Modal / Qwen)

① Standard Web App Flow

👤
User
Browser
⚛️
React App
Vite + TypeScript
🔄
React Router
Route matching
📦
AppContext
Global State
🌐
WordPress API
REST + WooCommerce
📄
Page Components
Render UI
Result displayed
User sees result
🛒 Shop / Product Flow

Navigate /shop

React Router render ShopPage component

fetchProducts()

GET /wp-json/wc/v3/products → parse JSON

applyFilters()

category / price / sort → filteredProducts state → re-render

addToCart()

CartItem[] → AppContext → localStorage persist
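The filter and cart steps above can be sketched as two pure helpers. This is a minimal TypeScript sketch: the `Product`/`CartItem` fields and the `Filters` shape are assumptions for illustration, not the app's actual types.

```typescript
// Minimal product/cart shapes; field names are assumptions for illustration.
interface Product { id: number; name: string; price: number; category: string; }
interface CartItem extends Product { quantity: number; }

interface Filters { category?: string; maxPrice?: number; sort?: "price-asc" | "price-desc"; }

// applyFilters(): category / price / sort → filteredProducts
function applyFilters(products: Product[], f: Filters): Product[] {
  let out = products.filter(
    (p) =>
      (!f.category || p.category === f.category) &&
      (f.maxPrice === undefined || p.price <= f.maxPrice)
  );
  if (f.sort) {
    out = [...out].sort((a, b) =>
      f.sort === "price-asc" ? a.price - b.price : b.price - a.price
    );
  }
  return out;
}

// addToCart(): merge by id so repeated adds bump quantity; the caller
// persists the resulting CartItem[] to AppContext + localStorage.
function addToCart(cart: CartItem[], product: Product, qty = 1): CartItem[] {
  const existing = cart.find((i) => i.id === product.id);
  return existing
    ? cart.map((i) => (i.id === product.id ? { ...i, quantity: i.quantity + qty } : i))
    : [...cart, { ...product, quantity: qty }];
}
```

Keeping both helpers pure makes the `filteredProducts` re-render and the localStorage persist straightforward: state in, state out.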

💳 Checkout Flow

Navigate /checkout

Load cart items from AppContext

Fill in the form

firstName / lastName / phone / address / city...

Choose payment method

COD / SePay QR / Bank transfer

POST order

/wp-json/wc/v3/orders → receive Order ID
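The order POST body might be assembled like this. Field names follow the public WooCommerce REST v3 order schema; the `CheckoutForm` shape and the payment-method slugs are assumptions for illustration.

```typescript
interface CartItem { id: number; quantity: number; }

interface CheckoutForm {
  firstName: string; lastName: string; phone: string; address: string; city: string;
  paymentMethod: "cod" | "sepay_qr" | "bank_transfer"; // slugs assumed
}

// Build the body for POST /wp-json/wc/v3/orders.
// billing / line_items keys follow the WooCommerce REST v3 order schema.
function buildOrderPayload(form: CheckoutForm, cart: CartItem[]) {
  return {
    payment_method: form.paymentMethod,
    set_paid: false, // COD / bank transfer: payment settled later
    billing: {
      first_name: form.firstName,
      last_name: form.lastName,
      phone: form.phone,
      address_1: form.address,
      city: form.city,
    },
    line_items: cart.map((i) => ({ product_id: i.id, quantity: i.quantity })),
  };
}
```

The response to this POST carries the Order ID the checkout flow displays to the user.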

🔐 Auth Flow

POST /jwt-auth/v1/token

username + password → WordPress JWT

Choose 2FA method

OTP (email) or Voice ID

Verify 2FA

6-digit OTP / Voice biometrics matching

JWT Token stored

userProfile set → Dashboard access
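The auth steps above can be modeled as a small state machine: the JWT alone is not enough, the user reaches the Dashboard only after 2FA passes. A hypothetical sketch; the stage and event names are invented for illustration.

```typescript
type AuthStage = "credentials" | "choose2fa" | "verify2fa" | "authenticated";

interface AuthState { stage: AuthStage; token?: string; method?: "otp" | "voice_id"; }

// Events mirror the diagram's steps (names are assumptions).
type AuthEvent =
  | { type: "JWT_OK"; token: string }                 // POST /jwt-auth/v1/token succeeded
  | { type: "PICK_2FA"; method: "otp" | "voice_id" }  // OTP (email) or Voice ID
  | { type: "2FA_OK" };                               // OTP / voice biometrics matched

function authReducer(state: AuthState, ev: AuthEvent): AuthState {
  switch (ev.type) {
    case "JWT_OK":
      // Token is held, but the session is not trusted until 2FA passes.
      return { stage: "choose2fa", token: ev.token };
    case "PICK_2FA":
      return state.stage === "choose2fa"
        ? { ...state, stage: "verify2fa", method: ev.method }
        : state;
    case "2FA_OK":
      return state.stage === "verify2fa"
        ? { ...state, stage: "authenticated" }
        : state;
  }
}
```

Guarding each transition on the current stage prevents, for example, a stray `2FA_OK` event from authenticating a session that never presented credentials.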

② Voice AI Agent Flow — End-to-End

① Audio Input
② STT
③ AI Processing
④ Intent Parsing
⑤ Action Execution
⑥ Output
🎙️Microphone
getUserMedia()
VAD Service
• Web Audio API FFT
• 85–3400 Hz band
• Wind/click filter
• Spectral analysis
Wake Word
"Nova ơi" ("hey Nova") → activate
Fuzzy matching + tonal
Exponential backoff
Auto resume
📝Web Speech API
SpeechRecognition
Noise Filter
isNoiseTranscript()
• ≤1 character → skip
• Filler words skip
• Repeat pattern skip
Error Handling
no-speech → retry
Max 6 retries
Exp. backoff
Auto-resume
🤖Modal / Qwen
Qwen3-30B-A3B
MAIN
SGLang Server
POST /v1/chat/completions
tool_choice: auto
SSE streaming
FastAPI endpoint
Modal Infra
GPU: L40S 48GB
max 5 containers
Idle timeout: 30 min
Concurrency: 10 req/container
🗺️Intent Parser
mapToolCallToIntent()
Tool Call → Intent
finish_reason: tool_calls
→ ActionName + args
→ ParsedIntent
→ AgentIntent
Multi-action
multi_action tool
→ actions[] array
→ sequential steps
(e.g. navigate+search)
Action Executor
actionExecutor.ts
30+ Actions
navigate / search
add_to_cart / filter
fill_form / submit
login / input_otp
ExecutorContext
navigate(), addToCart()
applyFilters()
logout(), showToast()
Fresh per step
🔊TTS + UI Update
speak() + render
TTS Chain
ElevenLabs (AI voice)
→ fallback Web Speech
vi / en / jp langs
Rate: 1.0x
UI Feedback
Chat panel update
Element highlight
Toast notifications
→ resume listening
🔁
After TTS completes → automatically resume startListening() after a 500 ms delay → a new loop iteration begins. If the user says "dừng lại" ("stop") → isStoppedRef = true → enter idle state and stop completely.
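The STT column's skip rules and retry backoff could look roughly like this. The filler-word list and the backoff constants are assumptions; only the three skip rules and the max-6-retries figure come from the diagram.

```typescript
// Assumed filler list; the real one would cover more Vietnamese/English fillers.
const FILLERS = new Set(["ừ", "à", "ờ", "uh", "um"]);

// isNoiseTranscript(): mirrors the three skip rules in the STT column.
function isNoiseTranscript(t: string, lastTranscript: string): boolean {
  const s = t.trim().toLowerCase();
  if ([...s].length <= 1) return true;                          // ≤1 character → skip
  if (FILLERS.has(s)) return true;                              // filler word → skip
  if (s === lastTranscript.trim().toLowerCase()) return true;   // repeat pattern → skip
  return false;
}

// Exponential backoff delay for no-speech retries (max 6 attempts per the
// diagram); base delay and cap are invented constants.
function retryDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Transcripts that pass the filter go on to `processCommand()`; anything caught here is dropped silently so the agent does not round-trip noise to the LLM.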

③ Modal Backend — Agentic Loop (Detailed)

🖥️
Modal Container
GPU: L40S 48GB
CUDA 12.1 + Python 3.11
Startup timeout: 600s
Max containers: 5
🤖
Qwen3-30B-A3B
SGLang server
127.0.0.1:30000
MoE: 3B active params
GPTQ-Int4 + moe_wna16
🗄️
ChromaDB
Persistent Volume
collections: products
collections: knowledge
Cosine similarity index
🔢
E5-large Embed
multilingual-e5-large
"query: " prefix
"passage: " prefix
Pre-baked in image
⟳ Agentic Loop — max 3 iterations
📨 API Request (Frontend)

POST /chat

{ message, history[-20], context }

Build page_context

page / cart / user / selectedProduct

Slice history

10 turns / 50 000 tokens max
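History slicing might be sketched as follows. The ~4-characters-per-token estimate stands in for the real tokenizer and is only an approximation; the 10-turn and 50,000-token limits come from the step above.

```typescript
interface Turn { role: "user" | "assistant"; content: string; }

// Rough token estimate (~4 chars/token); a placeholder for the real tokenizer.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Keep at most `maxTurns` most-recent turns (a turn = user + assistant
// message, hence history[-20]), then trim oldest-first until the slice
// fits under the token budget.
function sliceHistory(history: Turn[], maxTurns = 10, maxTokens = 50_000): Turn[] {
  let out = history.slice(-maxTurns * 2);
  while (
    out.length > 0 &&
    out.reduce((n, t) => n + estimateTokens(t.content), 0) > maxTokens
  ) {
    out = out.slice(1); // drop the oldest message first
  }
  return out;
}
```

Trimming oldest-first keeps the most recent context, which matters most for intent detection in LLM Call #1.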

🧠 LLM Call #1 — Intent Detection

tool_choice: auto

2 tools: search_products, search_knowledge

temp 0.1 — thinking OFF

max_tokens: 512 → fast response

finish_reason?

"tool_calls" → execute tools
"stop" → skip to synthesis

🔧 Parallel Tool Execution

asyncio.gather()

Max 2 tools per iteration

Timeout: 5s / tool

E5-large embed → ChromaDB query

Similarity ≥ 0.3

Top 5 products / Top 3 knowledge docs
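The backend runs this step with `asyncio.gather()`; the same pattern can be illustrated in TypeScript terms. Each tool races its own 5 s timeout, and a failure or timeout degrades to `null` rather than sinking the whole batch. Function names are invented for illustration.

```typescript
// Run each tool with its own per-tool timeout; failures and timeouts yield
// null so one slow tool cannot block or fail the whole iteration.
async function runToolsInParallel<T>(
  tools: Array<() => Promise<T>>,
  timeoutMs = 5000
): Promise<Array<T | null>> {
  const withTimeout = (run: () => Promise<T>): Promise<T | null> =>
    new Promise<T | null>((resolve) => {
      const timer = setTimeout(() => resolve(null), timeoutMs);
      run()
        .then((v) => { clearTimeout(timer); resolve(v); })
        .catch(() => { clearTimeout(timer); resolve(null); });
    });
  return Promise.all(tools.map(withTimeout));
}
```

With at most 2 tools per iteration, the worst case for this step is one 5 s timeout, after which synthesis proceeds with whatever results arrived.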

✍️ LLM Call #2 — Synthesis + SSE Stream

Inject tool_results

Build synthesis prompt with context from the tools

temp 0.7, stream: true

thinking: OFF, max_tokens: 1024

Stream text chunks

Stop streaming when the <json> tag appears

Emit SSE events

text_chunk → action → sources → done
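Stopping the visible stream at the `<json>` tag needs care, because the tag can arrive split across SSE chunks. A sketch of one way to buffer for that; the class and method names are invented, and a real consumer would also flush the held-back tail when the stream ends.

```typescript
// Streaming splitter: text before "<json>" is emitted as visible text_chunk
// events; everything from the tag onward is buffered for action parsing.
class JsonTagSplitter {
  private buffer = "";
  private inJson = false;
  visibleText = "";

  // Feed one stream chunk; returns the text that is safe to show/speak now.
  push(chunk: string): string {
    if (this.inJson) { this.buffer += chunk; return ""; }
    this.buffer += chunk;
    const idx = this.buffer.indexOf("<json>");
    if (idx >= 0) {
      const visible = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx); // keep "<json>…" for the parser
      this.inJson = true;
      this.visibleText += visible;
      return visible;
    }
    // Hold back a partial "<json" prefix that may complete in the next chunk.
    const hold = Math.min(this.buffer.length, "<json>".length - 1);
    const emit = this.buffer.slice(0, this.buffer.length - hold);
    this.buffer = this.buffer.slice(this.buffer.length - hold);
    this.visibleText += emit;
    return emit;
  }

  // The buffered "<json>…" payload once the tag has been seen.
  jsonTail(): string { return this.inJson ? this.buffer : ""; }
}
```

The held-back prefix is at most 5 characters, so the visible stream lags the model by a negligible amount.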

📋 Parse Action & SSE Response

_parse_json_response()

Extract from <json>…</json> in the LLM text

Validate action

30+ valid actions → fallback to "chat" if unknown

Multi-action support

"actions": [] → sequential execution plan

SSE to Frontend

data: {"type":"action","action":"search",...}
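`_parse_json_response()` lives in Python on the backend; the same extraction logic sketched in TypeScript, with a deliberately truncated action whitelist (the real list has 30+ entries).

```typescript
// Subset of the 30+ valid actions, for illustration only.
const VALID_ACTIONS = new Set(["search", "navigate", "add_to_cart", "chat"]);

interface ParsedAction { action: string; [k: string]: unknown; }

// Pull the <json>…</json> block out of the LLM text; fall back to a plain
// "chat" action when the tag is missing, the JSON is malformed, or the
// action name is unknown.
function parseActionFromText(text: string): ParsedAction {
  const m = text.match(/<json>([\s\S]*?)<\/json>/);
  if (!m) return { action: "chat" };
  try {
    const parsed = JSON.parse(m[1]) as ParsedAction;
    return VALID_ACTIONS.has(parsed.action) ? parsed : { action: "chat" };
  } catch {
    return { action: "chat" };
  }
}
```

The "chat" fallback means a malformed model response degrades to a plain spoken answer instead of a broken UI action.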

④ Data Ingest Pipeline (ingest.py)

📄product.csv
~600 KB, thousands of products
🧹BeautifulSoup
Strip HTML tags from descriptions
🔗Chunk Builder
Name + Category + Price + Description
🔢E5-large Embed
"passage: " prefix → vector
🗄️ChromaDB Write
Persistent Volume
Ready for Search
products + knowledge collections
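The chunk-building step might look like this. A regex tag-strip stands in for BeautifulSoup (which does this on the Python side), and the `|` separator and `ProductRow` field names are assumptions; only the field order (Name + Category + Price + Description) and the `"passage: "` prefix come from the pipeline above.

```typescript
interface ProductRow { name: string; category: string; price: string; descriptionHtml: string; }

// Crude tag strip: replace tags with spaces, then collapse whitespace.
// (BeautifulSoup plays this role in ingest.py.)
const stripHtml = (html: string): string =>
  html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();

// One chunk per product: Name + Category + Price + Description, with the
// "passage: " prefix that E5 embedding models expect for documents.
function buildPassage(row: ProductRow): string {
  return `passage: ${row.name} | ${row.category} | ${row.price} | ${stripHtml(row.descriptionHtml)}`;
}
```

Queries get the matching `"query: "` prefix at search time; E5 models are trained on this asymmetric prefixing, so skipping it degrades retrieval quality.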

⑤ Full Lifecycle — "Tìm thuốc ho" ("Find cough medicine") → Result

🎙️ Voice Input Phase

User says: "Tìm thuốc ho"

VAD detects voice in the 85–3400 Hz band

Web Speech API recognition

transcript = "Tìm thuốc ho" → isFinal = true

Noise filter passes

More than one character, not a filler word → OK

processCommand("Tìm thuốc ho")

VoiceAgentContext triggers the pipeline

🤖 AI Processing Phase

POST /chat → Modal FastAPI

{ message: "Tìm thuốc ho", history, context }

LLM #1: Intent Detection

Qwen → finish_reason: "tool_calls"
→ search_products(query="thuốc ho")

ChromaDB vector search

E5 embed → top 5 products, similarity ≥ 0.3

LLM #2: Synthesis + Stream

Inject results → generate the answer + action JSON

⚡ Action & Output Phase

SSE: action = "search"

payload: { query: "thuốc ho" } sent to the Frontend

mapToolCallToIntent()

→ AgentIntent { type: "search", entities: {...} }

executeAction("search")

setSearchQuery("thuốc ho") → filteredProducts update → ShopPage re-render

TTS + Chat panel

"Tôi đã tìm thấy X sản phẩm thuốc ho..." ("I found X cough-medicine products...")
→ resume listening after 500 ms
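`mapToolCallToIntent()` from step ④ can be sketched as a pure mapping: tool name → intent type, tool arguments → entities, with `multi_action` expanding into a sequential plan. The `ToolCall`/`AgentIntent` shapes are simplified guesses at the real types.

```typescript
interface ToolCall { name: string; arguments: Record<string, unknown>; }
interface AgentIntent { type: string; entities: Record<string, unknown>; }

// mapToolCallToIntent(): tool call → AgentIntent. A multi_action call
// flattens into an ordered list of intents (e.g. navigate + search),
// executed sequentially by the action executor.
function mapToolCallToIntent(call: ToolCall): AgentIntent[] {
  if (call.name === "multi_action") {
    const steps = (call.arguments.actions as ToolCall[]) ?? [];
    return steps.flatMap(mapToolCallToIntent);
  }
  return [{ type: call.name, entities: call.arguments }];
}
```

Returning a list even for single actions lets the executor treat every response as a plan and run each step with a fresh `ExecutorContext`.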

Estimated timings:
STT ~300ms
LLM #1 ~500ms
ChromaDB <50ms
LLM #2 (1st token) ~400ms
Action + TTS ~200ms
Total ~1.5s (warm)