How to Use Semantic Caching in AILANG
This guide explains the mental model, patterns, and best practices for using AILANG's semantic caching system effectively.
Mental Model
SharedMem + SharedIndex is the cognitive working memory for AI agents.
Think of it as:
- SharedMem: A persistent key-value store where agents park their thoughts
- SharedIndex: A semantic search layer that finds related thoughts by meaning
- sem_frame: The standard "thought" format - ID, version, timestamp, content, SimHash, and opaque payload
Unlike traditional caches (which store strings/bytes by exact keys), semantic caching lets agents:
- Find related content by meaning, not just exact matches
- Update state atomically with CAS (compare-and-swap)
- Share working memory across multiple agents without message passing
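As a minimal sketch of the round trip (the key name and payload below are illustrative):
import std/sem (make_frame_at, store_frame, load_frame)
-- Park a thought under a namespaced key, then recall it later by the same key
func remember_and_recall() -> string ! {SharedMem} {
let frame = make_frame_at("belief:sky-color", "The sky is blue", _bytes_from_string("rgb=135,206,235"), _clock_now(()));
let _ = store_frame("belief:sky-color", frame);
match load_frame("belief:sky-color") {
Some(f) => f.content,
None => ""
}
}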
Namespace Conventions
Organize your frames using domain prefixes. This keeps namespaces clean and enables targeted searches:
| Prefix | Purpose | Example |
|---|---|---|
| plan: | Goals, strategies, step-by-step plans | plan:build-feature-123 |
| belief: | Agent beliefs, world model assertions | belief:user-preference-dark-mode |
| doc: | Document summaries, parsed content | doc:readme-summary |
| cache: | Expensive computation results | cache:llm-response-hash-abc |
| session: | Per-session temporary state | session:user-42-context |
-- Good: Clear domain prefix
let _ = store_frame("plan:migrate-db", migration_plan)
let _ = store_frame("belief:api-rate-limit", rate_limit_info)
-- Bad: Ambiguous keys
let _ = store_frame("data123", something) -- What is this?
Canonical Patterns
Pattern 1: Cache-or-Compute
Check for cached result before expensive computation:
import std/sem (make_frame_at, store_frame, load_frame)
func get_or_compute(key: string, compute: () -> bytes) -> bytes ! {SharedMem} {
match load_frame(key) {
Some(frame) => frame.opaque,
None => {
let result = compute();
let frame = make_frame_at(key, "", result, _clock_now(()));
let _ = store_frame(key, frame);
result
}
}
}
Pattern 2: Semantic Deduplication
Before storing, check if a similar frame already exists:
import std/sem (make_frame_at, store_frame)
func store_if_novel(ns: string, content: string, payload: bytes) -> bool ! {SharedMem, SharedIndex} {
let hash = _simhash(content);
let similar = _sharedindex_find_simhash(ns, hash, 1, 50, true);
match similar {
[] => {
-- No similar content found, store new frame
let key = ns ++ ":" ++ _uuid(());
let frame = make_frame_at(key, content, payload, _clock_now(()));
let _ = store_frame(key, frame);
let _ = _sharedindex_upsert(ns, key, hash, 1, _clock_now(()));
true
},
_ => false -- Similar content exists, skip
}
}
Pattern 3: Multi-Agent Plan Convergence
Multiple agents refine a shared plan using CAS:
import std/sem (make_frame_at, store_frame, load_frame, update_frame, UpdateResult)
func refine_plan(goal_id: string, refiner: sem_frame -> sem_frame) -> sem_frame ! {SharedMem} {
let key = "plan:" ++ goal_id;
-- Retry loop for CAS conflicts
letrec attempt = \retries.
if retries <= 0 then
match load_frame(key) {
Some(f) => f,
None => make_frame_at(key, "initial", _bytes_from_string(""), _clock_now(()))
}
else
match update_frame(key, refiner) {
Updated(frame) => frame,
Conflict(current) => attempt(retries - 1), -- Retry with fresh state
Missing => {
-- Create initial frame
let initial = make_frame_at(key, "initial", _bytes_from_string(""), _clock_now(()));
let _ = store_frame(key, initial);
attempt(retries - 1)
}
}
in attempt(3)
}
Content vs Payload Guidance
A sem_frame carries two data fields:
| Field | Type | Purpose | Indexed? |
|---|---|---|---|
| content | string | Human-readable text for SimHash/search | Yes (SimHash) |
| opaque | bytes | Binary payload (serialized data, embeddings) | No |
Guidelines:
-- Good: Content summarizes what's stored, opaque holds the actual data
let frame = make_frame_at(
"cache:api-response",
"Weather forecast for Seattle: sunny, 72F", -- Searchable summary
_bytes_from_string(full_json_response), -- Full data
_clock_now(())
)
-- Bad: Content is empty or duplicates opaque
let frame = make_frame_at(
"cache:api-response",
"", -- No searchability!
_bytes_from_string(full_json_response),
_clock_now(())
)
Determinism Guidance
AILANG's semantic caching is designed for deterministic behavior:
SimHash Determinism
- Same input text always produces same 64-bit hash
- Hamming distance scoring is deterministic
- Tie-breaking: (score DESC, key ASC), so results always come back in the same order
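A quick check of this property (the values depend only on the input text):
-- Hashing the same text twice always yields the same 64-bit value
let h1 = _simhash("deploy service to staging")
let h2 = _simhash("deploy service to staging")
let dist = _hamming_distance(h1, h2) -- always 0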
Search Determinism
- Use deterministic=true in search calls for reproducible results
- Limit scan scope with the maxScan parameter
-- Deterministic search (recommended for agents)
let results = _sharedindex_find_simhash("beliefs", hash, 5, 100, true)
-- ^^^^ deterministic mode
-- Best-effort search (faster, but order may vary on ties)
let results = _sharedindex_find_simhash("beliefs", hash, 5, 100, false)
CAS Determinism
- Atomic updates prevent race conditions
- Version numbers ensure causal ordering
- Conflict detection enables retry patterns
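A minimal sketch of how this surfaces in code, using update_frame and the frame's ver field as shown elsewhere in this guide:
import std/sem (update_frame, UpdateResult)
-- Each successful CAS bumps the frame's version; a losing writer sees Conflict with the winner's frame
func bump_note(key: string) -> int ! {SharedMem} {
match update_frame(key, \f. {f | content: f.content ++ " (revised)"}) {
Updated(frame) => frame.ver, -- our write won; ver is now strictly greater than before
Conflict(current) => current.ver, -- another agent won the race; retry against this fresh state
Missing => 0 -- no frame yet; create one with store_frame first
}
}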
Trace Debugging
Enable tracing to see all SharedMem/SharedIndex operations:
# Trace SharedMem operations
ailang run --caps IO,SharedMem --trace-sharedmem --entry main file.ail
# Trace SharedIndex operations
ailang run --caps IO,SharedIndex --trace-sharedindex --entry main file.ail
# Trace both
ailang run --caps IO,SharedMem,SharedIndex --trace-sharedmem --trace-sharedindex --entry main file.ail
Trace output shows:
- Operation type (GET, PUT, CAS, UPSERT, FIND)
- Keys accessed
- Success/failure status
- Timing information
Two-Tier Search Architecture
AILANG provides two complementary search methods:
Tier 1: SimHash (Fast, Deterministic)
Best for:
- Near-duplicate detection
- Typo tolerance
- No external dependencies
let hash = _simhash("The quick brown fox")
let dist = _hamming_distance(hash1, hash2) -- 0-64 bits
-- Score: 1.0 - (distance / 64.0)
Tier 2: Neural Embeddings (Semantic, Accurate)
Best for:
- True semantic similarity ("car" ~ "automobile")
- Paraphrase detection
- Cross-lingual matching
-- Requires: ollama serve && ollama pull embeddinggemma
let emb = _ollama_embed("embeddinggemma", "The quick brown fox")
let results = _sharedindex_find_by_embedding("ns", emb, 5, 100, true)
Hybrid Search Pattern
Combine both for best results:
func hybrid_search(ns: string, query: string, top_k: int) -> list[string] ! {SharedMem, SharedIndex, IO} {
-- Phase 1: Fast SimHash pre-filter (100 candidates)
let hash = _simhash(query);
let candidates = _sharedindex_find_simhash(ns, hash, 100, 1000, true);
-- Phase 2: Re-rank with embeddings (top_k results)
let query_emb = _ollama_embed("embeddinggemma", query);
-- rerank_by_embedding is a user-supplied helper; resolve_best_match_hybrid below shows one bounded realization
rerank_by_embedding(candidates, query_emb, top_k)
}
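A usage sketch (the "docs" namespace and query string are illustrative; rerank_by_embedding is the user-supplied helper noted above):
-- Find the five best document keys for a natural-language query
func find_docs() -> list[string] ! {SharedMem, SharedIndex, IO} {
let keys = hybrid_search("docs", "quarterly revenue growth in APAC", 5);
let _ = print("hybrid matches: " ++ show(length(keys)));
keys
}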
Embeddings Doctrine
Why embeddings matter:
- SimHash catches near-duplicates (shared tokens/structure). It does NOT catch paraphrases.
- Embeddings catch paraphrases. "The sky is blue" ~ "The heavens are azure" won't match via SimHash but will via embeddings.
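A small sketch of the difference (the exact distances and similarities depend on the hash function and the model):
-- Paraphrase pair: few shared tokens, so SimHash puts them far apart
let h_blue = _simhash("The sky is blue")
let h_azure = _simhash("The heavens are azure")
let bits_apart = _hamming_distance(h_blue, h_azure) -- expect many differing bits
-- Embeddings capture the shared meaning, so nearest-neighbor search over them pairs these up
let e_blue = _ollama_embed("embeddinggemma", "The sky is blue")
let e_azure = _ollama_embed("embeddinggemma", "The heavens are azure")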
When to Compute Embeddings
Compute embeddings for frames that represent durable knowledge:
- Document summaries
- Plans and policies
- Entity facts
- Extracted structured state
Avoid embedding ephemeral "chatty" content. Prefer stable canonical text.
What to Embed
- Embed the frame's content field, NOT the full opaque payload
- content should be short, canonical, and discriminative (1-3 sentences plus key identifiers/constraints)
-- Good: content is short summary, opaque holds full data
let frame = make_frame_at(
"doc:report-123",
"Q3 sales report: revenue $1.2M, 15% growth, APAC expansion", -- Embed this
_bytes_from_string(full_report_json), -- Not this
timestamp
)
How to Store with Embeddings
import std/sem (make_frame_at, store_frame, with_embedding)
func store_with_embedding(ns: string, key: string, content: string, payload: bytes, ts: int)
-> unit ! {SharedMem, SharedIndex, IO} {
-- Compute embedding once at store time
let emb_floats = _ollama_embed("embeddinggemma", content);
let emb_bytes = _embedding_encode(emb_floats);
-- Build frame with embedding
let frame = with_embedding(make_frame_at(key, content, payload, ts), emb_bytes, 768);
-- Store in SharedMem
let _ = store_frame(key, frame);
-- Index with embedding for semantic search
_sharedindex_upsert_emb(ns, key, frame.simhash, emb_floats, frame.ver, frame.ts)
}
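For example (the key, summary, and full_report_json below are illustrative):
-- Index a document summary once; later semantic queries can find it by meaning
let _ = store_with_embedding("docs", "doc:report-123",
"Q3 sales report: revenue $1.2M, 15% growth, APAC expansion",
_bytes_from_string(full_report_json),
_clock_now(()))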
Recommended Hybrid Retrieval
The best pattern is:
- Use SimHash to get candidates (fast, bounded)
- Embed the query once
- Rerank candidates by embedding similarity
- Load the best key
This gives you:
- Deterministic cost bounds - SimHash limits candidate set
- Semantic accuracy - Embeddings find paraphrases
- Stable traces - Same query = same results
func resolve_best_match_hybrid(ns: string, query: string, top_k: int, max_scan: int)
-> Option[sem_frame] ! {SharedMem, SharedIndex, IO} {
-- 1) SimHash candidates (deterministic, bounded)
let hash = _simhash(query);
let candidates = _sharedindex_find_simhash(ns, hash, top_k, max_scan, true);
match candidates {
[] => None,
_ => {
-- 2) Embed query once
let qemb = _ollama_embed("embeddinggemma", query);
-- 3) Rerank with embedding search
let results = _sharedindex_find_by_embedding(ns, qemb, 1, 100, true);
-- 4) Load best match
match results {
[] => None,
[best, ..._] => load_frame(best.key)
}
}
}
}
Best Practices for Embeddings
| Practice | Why |
|---|---|
| Canonicalize content | Stable phrasing improves retrieval consistency |
| Embed once at store time | Embedding is expensive (~160ms); don't recompute |
| Don't embed giant text | Store summary in content, full text in opaque |
| Prefer bounded rerank | Rerank 20-200 candidates; don't scan everything with embeddings |
| Record decisions | Log model name, candidate count, scores for debugging |
Determinism with Embeddings
For "Strict-ish" behavior with embeddings:
- Keep candidate generation deterministic: find_similar_bounded(... deterministic=true)
- Keep candidate bounds fixed (top_k, max_scan)
- Use deterministic tie-breakers: (score DESC, key ASC)
If you later add ANN indexes, treat them as BestEffort by definition.
SimHash-only vs Hybrid vs Embedding-only
| Mode | Speed | Accuracy | Use When |
|---|---|---|---|
| SimHash-only | ~1ms | Near-duplicates only | High-throughput, exact/near matches |
| Hybrid | ~160ms | Paraphrases + near-duplicates | Recommended default |
| Embedding-only | ~160ms | Best semantic recall | Small namespaces, semantic search only |
Killer Examples
Example 1: Dedupe Expensive LLM Calls
module example/dedupe
import std/ai (call)
import std/sem (make_frame_at, store_frame, load_frame)
func cached_ai_call(prompt: string) -> string ! {AI, SharedMem, SharedIndex} {
let hash = _simhash(prompt);
let similar = _sharedindex_find_simhash("ai-cache", hash, 1, 50, true);
match similar {
-- Found similar prompt with high score (>0.95 = <4 bits different)
[hit, ..._] if hit.score > 0.95 => {
match load_frame(hit.key) {
Some(frame) => _bytes_to_string(frame.opaque),
None => call_and_cache(prompt, hash)
}
},
-- No match, call LLM and cache
_ => call_and_cache(prompt, hash)
}
}
func call_and_cache(prompt: string, hash: int) -> string ! {AI, SharedMem, SharedIndex} {
let response = call(prompt);
let key = "ai-cache:" ++ show(hash);
let frame = make_frame_at(key, prompt, _bytes_from_string(response), _clock_now(()));
let _ = store_frame(key, frame);
let _ = _sharedindex_upsert("ai-cache", key, hash, 1, _clock_now(()));
response
}
Example 2: Multi-Agent Belief Convergence
module example/beliefs
import std/sem (make_frame_at, store_frame, update_frame, UpdateResult)
type Belief = { assertion: string, confidence: float, evidence: [string] }
func add_evidence(belief_id: string, new_evidence: string) -> Belief ! {SharedMem} {
let key = "belief:" ++ belief_id;
match update_frame(key, \frame.
-- decode_belief / encode_belief are user-defined helpers that (de)serialize Belief values to/from bytes
let belief = decode_belief(frame.opaque);
let updated = {belief |
confidence: min(1.0, belief.confidence + 0.1),
evidence: new_evidence :: belief.evidence
};
{frame | opaque: encode_belief(updated)}
) {
Updated(frame) => decode_belief(frame.opaque),
Conflict(current) => {
-- Another agent updated concurrently; retry so our evidence is applied to the fresh state
add_evidence(belief_id, new_evidence) -- Retry
},
Missing => {
-- Create initial belief
let initial = { assertion: "", confidence: 0.5, evidence: [new_evidence] };
let frame = make_frame_at(key, "", encode_belief(initial), _clock_now(()));
let _ = store_frame(key, frame);
initial
}
}
}
Example 3: Debuggable Retrieval Pipeline
module example/debug_retrieval
func traced_search(query: string) -> [string] ! {IO, SharedMem, SharedIndex} {
-- Log query
let _ = print("SEARCH: " ++ query);
-- Tier 1: SimHash
let hash = _simhash(query);
let _ = print("SimHash: " ++ show(hash));
let tier1 = _sharedindex_find_simhash("docs", hash, 10, 100, true);
let _ = print("Tier1 results: " ++ show(length(tier1)));
-- Tier 2: Neural (if Tier 1 sparse)
if length(tier1) < 3 then {
let emb = _ollama_embed("embeddinggemma", query);
let _ = print("Falling back to neural search");
let tier2 = _sharedindex_find_by_embedding("docs", emb, 10, 100, true);
let _ = print("Tier2 results: " ++ show(length(tier2)));
map(\r. r.key, tier2)
} else {
map(\r. r.key, tier1)
}
}
Running Examples
# Basic SharedMem (no Ollama needed)
ailang run --caps IO,SharedMem --entry main examples/sharedmem_cache.ail
# Full semantic retrieval with SimHash
ailang run --caps IO,SharedMem,SharedIndex --entry main examples/semantic_retrieval.ail
# Neural semantic search (requires Ollama)
ollama serve &
ollama pull embeddinggemma
ailang run --caps IO,SharedMem,SharedIndex --entry main examples/neural_semantic_search.ail
Summary
| Concept | Key Point |
|---|---|
| Namespaces | Use prefixes: plan:, belief:, doc:, cache:, session: |
| Content field | Human-readable summary for search |
| Opaque field | Binary payload, not indexed |
| CAS updates | Use update_frame for atomic multi-agent coordination |
| SimHash | Fast, deterministic, good for near-duplicates |
| Neural | Semantic similarity, requires Ollama |
| Determinism | Always use deterministic=true for reproducible agent behavior |
Related Resources
- Semantic Caching vs Vector DBs - When to use which tool
- Roadmap - Why this matters and what's coming next
- Design Status - Implementation details
- Future Work - Redis, Firestore, hybrid search
- Agent Messaging - Semantic Search - Search CLI messages with SimHash and neural embeddings