MemPalace, an open-source AI memory system, claimed a viral "100% on LoCoMo" and a "perfect score" on LongMemEval (500/500). The project's own BENCHMARKS.md, however, undermines those headlines: it documents dataset leakage, evaluation shortcuts, and simplified task setups that make the reported scores misleading. Key players include the MemPalace repo and the LongMemEval and LoCoMo benchmarks; the community reaction included rapid GitHub attention and skepticism from researchers. This matters because inflated benchmark claims can distort perceptions of progress in long-context language modeling and memory, mislead users and funders, and encourage gaming of evaluations rather than robust improvement. The episode highlights the need for transparent, reproducible benchmarks and scrutiny of viral tech claims.
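For context on what the headline numbers mean: LongMemEval's retrieval metric, recall@k (R@5 here), measures the fraction of queries for which at least one gold (relevant) memory appears among the top-k retrieved items. A minimal sketch with invented item IDs, not MemPalace's actual evaluation code:

```python
def recall_at_k(retrieved, gold, k=5):
    """Fraction of queries whose top-k retrieved items
    contain at least one gold (relevant) item."""
    hits = sum(
        1 for got, want in zip(retrieved, gold)
        if set(got[:k]) & set(want)
    )
    return hits / len(gold)

# Hypothetical run: 2 of 3 queries surface a gold item in the top 5.
retrieved = [
    ["m1", "m7", "m3", "m9", "m2"],       # gold m3 found
    ["m4", "m8", "m6", "m5", "m0"],       # gold m11 missed
    ["m12", "m13", "m14", "m15", "m16"],  # gold m14 found
]
gold = [["m3"], ["m11"], ["m14"]]
print(round(recall_at_k(retrieved, gold, k=5), 3))  # → 0.667
```

A score like 96.6% R@5 therefore says nothing about answer quality on its own, which is why evaluation details such as those in BENCHMARKS.md matter.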
MemPalace, an open-source, local-first AI memory system, claims the highest LongMemEval recall ever published (96.6% R@5, 100% with Haiku rerank) by storing entire conversations and documents rather than filtering them. It organizes memories into a hierarchical Palace (wings, halls, rooms) to boost retrieval by 34%, and introduces AAAK, a lossless, highly compressed textual dialect (about 30x) designed for fast model consumption across Claude, GPT, Gemini, Llama, and others without fine-tuning or external APIs. The tool runs entirely on-device, supports mining of projects and conversation exports, integrates with MCP-enabled assistants such as Claude, and works with local models via context injection or CLI search. The project is reproducible, free, and aimed at preserving full context for development and knowledge work.
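The wings/halls/rooms hierarchy can be pictured as a nested namespace over stored memories, where scoping a query to one branch narrows retrieval. A toy sketch with invented class and method names, not MemPalace's actual API:

```python
from collections import defaultdict

class Palace:
    """Toy wings -> halls -> rooms hierarchy (hypothetical structure)."""

    def __init__(self):
        # wings[wing][hall][room] holds a list of memory strings.
        self.wings = defaultdict(
            lambda: defaultdict(lambda: defaultdict(list)))

    def store(self, wing, hall, room, text):
        self.wings[wing][hall][room].append(text)

    def search(self, query, wing=None):
        # Naive substring scan; a real system would use embeddings
        # plus a reranker. Passing `wing` restricts the search scope,
        # illustrating how hierarchy can sharpen retrieval.
        q = query.lower()
        scope = {wing: self.wings[wing]} if wing else self.wings
        return [
            (f"{w}/{h}/{r}", m)
            for w, halls in scope.items()
            for h, rooms in halls.items()
            for r, mems in rooms.items()
            for m in mems
            if q in m.lower()
        ]

palace = Palace()
palace.store("work", "projects", "demo", "switched retriever to hybrid search")
palace.store("personal", "reading", "books", "notes on The Mythical Man-Month")
print(palace.search("retriever", wing="work"))
# → [('work/projects/demo', 'switched retriever to hybrid search')]
```

The design point is that a path like `work/projects/demo` acts as cheap metadata: filtering by branch before ranking shrinks the candidate set, which is one plausible mechanism behind a hierarchy-driven retrieval gain.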