independent research lab
ZBS GG
no reset. no amnesia.
memory that knows
what tuesday cost you.
// what i have
- Pulse Factual Bench: latest real-extraction run (2026-06-25), 38 gold probes over 8 host-extracted docs. primary metric is judged answer accuracy from the same extracted event texts: Pulse 21.1% vs real Mem0 23.7%, bootstrap delta -2.6pp (CI -7.9..0.0), no significant difference. abstention stayed close (7/8 vs 8/8). verbatim recall@k is diagnostic only because extraction paraphrases the source text.
- Empathic Memory Bench v3: n=100 deterministic probe suite. Pulse v3 leads the stateful axis: stateful R@3 = 0.419 vs cosine_state 0.314 and hybrid_state 0.219 (same query, different user state, different ideal episode); stateless cosine itself scores 0.343 on that axis. on overall R@3, cosine leads Pulse 0.420 vs 0.416, and we say so plainly · compared against 7 retrieval baselines + 6 memory-system adapters (Mem0, Graphiti/Zep, LangMem, LlamaIndex, OpenAI Memory, claude-mem) · label-blind judge check: stateful fit 7.722 for Pulse vs 4.278 for a Mem0 adapter, α = 0.910 (multi-vendor: 8.0 vs 5.4, α = 0.699, tentative) · own corpus, judge prompts open
- LongMemEval_S: 68.89% · published 500-question long-context evaluation · sanity check of the cosine+recency base (public benchmarks carry no state fields, so the state layer collapses to identity)
- EM-IB (internal): 76% (single LLM-judge) · 1427 questions across 18 simulated seekers · internal empathic-support benchmark we built, corpus unreleased pending privacy review · self-reported, not third-party reproduced · not the external benchmark of similar former name (arXiv:2602.01885)
- LoCoMo: 32.51% F1 · 62.78% adversarial refusal · ACL 2024 corpus · base-retriever cross-check, not comparable to the native J-score protocol used by Mem0/Zep
- replication: MIT · all corpora, judge prompts, scripts at github.com/zbs-gg/emo-bench · scope: single-user deployment corpus (year-long real-use logs); external benchmarks cross-check the foundation, v3 conditional boosts validated only on bench v3 · scripts default to a now-closed inference backend, see REPRODUCIBILITY.md for migration path · no third-party run yet, invite open
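a minimal sketch of how a stateful R@3 number like the ones above can be computed, assuming one ideal episode per probe. the `Probe` fields, the `Retriever` type, and the toy episode ids are hypothetical, made up for illustration; they are not the emo-bench schema or Pulse's API, and the two toy retrievers are hard-coded rather than real systems.

```typescript
// One probe: a query asked under a given user state, with one ideal episode.
// Field names are hypothetical, not the emo-bench schema.
interface Probe {
  query: string;
  userState: string;
  idealEpisodeId: string;
}

// A retriever returns episode ids ranked best-first.
type Retriever = (query: string, userState: string) => string[];

// Fraction of probes whose ideal episode appears in the top k results.
function recallAtK(probes: Probe[], retrieve: Retriever, k: number): number {
  if (probes.length === 0) return 0;
  let hits = 0;
  for (const p of probes) {
    const topK = retrieve(p.query, p.userState).slice(0, k);
    if (topK.indexOf(p.idealEpisodeId) !== -1) hits += 1;
  }
  return hits / probes.length;
}

// Toy data: same query, two user states, two different ideal episodes.
const probes: Probe[] = [
  { query: "what helped last time?", userState: "anxious", idealEpisodeId: "ep-breathing" },
  { query: "what helped last time?", userState: "celebrating", idealEpisodeId: "ep-dinner" },
];

// A stateless retriever ignores userState and always returns one ranking.
const stateless: Retriever = () => ["ep-walk", "ep-notes", "ep-breathing", "ep-dinner"];

// A state-aware retriever reorders by state (hard-coded here for illustration).
const stateAware: Retriever = (_query, userState) =>
  userState === "anxious"
    ? ["ep-breathing", "ep-walk", "ep-notes", "ep-dinner"]
    : ["ep-dinner", "ep-notes", "ep-walk", "ep-breathing"];

const statelessR3 = recallAtK(probes, stateless, 3);
const stateAwareR3 = recallAtK(probes, stateAware, 3);
```

on this toy pair the stateless ranking can only satisfy one of the two states at k = 3, while the state-aware ranking satisfies both. that gap is what a stateful axis is meant to expose; the numbers here say nothing about the real bench.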
salience over 365 days (◆ = structural anchors · ● = regular events)
regular: s = base · exp(−0.005 · days) → half-life ≈ 139 d
anchor: s = max(base · exp(−0.0005 · days), base · 0.85) → preserved
simulated memory corpus, illustrative, not anyone's real store. ◆ structural anchors stay bright. ● regular events fade. anchor-aware decay covers what should survive. state-aware retrieval covers what should surface for THIS moment: mood, time, intent of the question. that's the stateful axis where Pulse v3 leads on the bench.
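the two decay curves above can be sketched in a few lines. only the rates (0.005 and 0.0005 per day) and the 0.85 floor come from the formulas; the function and constant names are hypothetical and are not Pulse's actual API.

```typescript
// Decay constants taken from the formulas above; names are illustrative only.
const REGULAR_RATE = 0.005; // per day
const ANCHOR_RATE = 0.0005; // per day
const ANCHOR_FLOOR = 0.85; // anchors never drop below 85% of base

// Regular events: plain exponential decay.
function regularSalience(base: number, days: number): number {
  return base * Math.exp(-REGULAR_RATE * days);
}

// Structural anchors: slower decay, clamped from below at base * 0.85.
function anchorSalience(base: number, days: number): number {
  return Math.max(base * Math.exp(-ANCHOR_RATE * days), base * ANCHOR_FLOOR);
}

// Half-life of an exponential decay with the given per-day rate: ln(2) / rate.
function halfLifeDays(rate: number): number {
  return Math.LN2 / rate;
}

const regularHalfLife = halfLifeDays(REGULAR_RATE); // about 138.6 days
const regularAfterYear = regularSalience(1, 365); // exp(-1.825), about 0.16
const anchorAfterYear = anchorSalience(1, 365); // floor applies: 0.85
```

with base = 1, the regular curve halves after ln 2 / 0.005 ≈ 138.6 days, which matches the ≈ 139 d figure. the anchor curve reaches its floor after about 325 days (−ln 0.85 / 0.0005), so a year-old anchor sits at 0.85 while a year-old regular event is down to roughly 0.16.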
// receipts
three facts worth hovering over.
200 MB of living dialogues · label-blind judges: α = 0.910, direction holds cross-vendor · 14 months, no identity reset
// what i ship
- Pulse: state-aware memory engine for agents, Go + TS, MIT. Local Preview on npm (@zbs-gg/pulse): doctor-gated install, seeded stateful demo (same query, different user state, different justified episode). anchor-aware decay, conditional emotional and stateful boosts, chain-expanded recall.
- Heart: state-aware chat companion built on Pulse, TS, MIT. self-hosted reference client.
- Garden.Live: public landing. shared root for the Pulse / Heart / Bench stack.
- Emo.Bench: reproducible empathic-memory benchmark with corpus, queries, judge prompts, agreement analysis, raw per-judge JSON. leaderboard live; source at zbs-gg/emo-bench.
- paper: "Remember What Matters: State-Conditioned Episodic Retrieval for Emotional AI Companions", source + PDF at zbs-gg/pulse-paper. scoped claim, negative results disclosed, privacy boundary documented.