Today, I implemented one of the new features proposed on the roadmap for my personal journal analysis app: a semantic search and synthesis engine. Instead of a traditional keyword search, I opted for an “Agentic RAG” (Retrieval-Augmented Generation) pipeline. This approach leverages a local LLM (`gemma3n:latest`) not just for generating text, but for orchestrating the entire search process. This post outlines the design thinking behind the implementation and its future potential.
The Problem: Bridging Human Intent and Database Queries
A user typically asks something like, “What was my progress on the API refactor?” A database, however, requires precise, literal terms. A direct search for “API” and “refactor” would miss relevant entries logged as “updated endpoints” or “fixed backend routes.”
The core challenge was to translate the user’s high-level intent into a low-level, effective database query without the overhead of maintaining a dedicated vector database.
The Solution: A Two-Agent Pipeline
The current solution splits the task between two specialized “agents,” each powered by a targeted LLM prompt.
1. The Query Planner Agent:
This is the first and most critical step. Its sole responsibility is to act as a “semantic bridge.”
- Input: The user’s raw, natural language query.
- Task: Transform the query into a structured JSON array of effective search keywords.
- Prompt Engineering: The prompt instructs the LLM to identify core concepts, generate synonyms and related technical terms (e.g., “API” -> `["endpoint", "route", "backend"]`), and discard conversational stop words.
- Output: A clean array of keywords, like `["api", "refactor", "endpoint", "route"]`.
This agent turns a fuzzy query into a high-signal set of search terms, dramatically improving retrieval recall.
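As a concrete illustration, here is a minimal sketch of what the planner call can look like, assuming the model is served through Ollama’s local HTTP API; the prompt wording and the `planKeywords` helper name are illustrative, not the exact production code:

```ts
// queryPlanner.ts -- illustrative sketch, not the exact production prompt
const PLANNER_PROMPT = (query: string) => `
You are a query planner for a personal journal search engine.
Identify the core concepts in the question, add close synonyms and related
technical terms, and drop conversational filler words.
Respond with ONLY a JSON array of lowercase keywords.

Question: "${query}"
`;

export async function planKeywords(query: string): Promise<string[]> {
  // Assumes the local model is exposed via Ollama's /api/generate endpoint.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma3n:latest",
      prompt: PLANNER_PROMPT(query),
      stream: false,
    }),
  });
  const data = await res.json();

  // Parse the model's text into a keyword array; fall back to the raw
  // query terms if the response isn't valid JSON.
  try {
    const keywords = JSON.parse(data.response);
    return Array.isArray(keywords) ? keywords.map(String) : [query];
  } catch {
    return query.toLowerCase().split(/\s+/);
  }
}
```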
2. The Synthesizer Agent:
Once the Planner provides the keywords, we perform a full-text search against the `description` and `notes` fields in our PostgreSQL database using Prisma. The top 20 most recent results are then passed to the second agent.
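The retrieval step itself is just a Prisma query. Here is a sketch that uses a simple case-insensitive `contains` match (Prisma’s dedicated full-text `search` operator could be swapped in); the `activity` model name and the `createdAt` field are assumptions, only `description` and `notes` come from the design above:

```ts
import { Prisma, PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Fetch the 20 most recent activities whose description or notes
// contain any of the planner's keywords (case-insensitive).
export async function retrieveActivities(keywords: string[]) {
  return prisma.activity.findMany({
    where: {
      OR: keywords.flatMap((kw) => [
        { description: { contains: kw, mode: Prisma.QueryMode.insensitive } },
        { notes: { contains: kw, mode: Prisma.QueryMode.insensitive } },
      ]),
    },
    orderBy: { createdAt: "desc" },
    take: 20,
  });
}
```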
- Input: The original user query and the context of the top 20 retrieved journal activities.
- Task: Generate a concise, markdown-formatted summary that directly answers the user’s query.
- Prompt Engineering: The prompt instructs the model to base its answer only on the provided context, which guards against hallucination, and to structure the output for clear presentation (see the sketch after this list).
- Output: A human-readable summary, ready for the UI.
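A minimal sketch of the synthesizer, again assuming Ollama’s local API; the `Activity` shape and the prompt wording are illustrative:

```ts
// synthesizer.ts -- illustrative sketch of the second agent
export type Activity = {
  createdAt: Date;
  description: string;
  notes: string | null;
};

const SYNTH_PROMPT = (query: string, activities: Activity[]) => `
You are a journal analyst. Using ONLY the context below, answer the user's
question as a concise, well-structured markdown summary. If the context does
not contain the answer, say so instead of guessing.

Question: "${query}"

Context:
${activities
  .map(
    (a) =>
      `- ${a.createdAt.toISOString().slice(0, 10)}: ${a.description}` +
      (a.notes ? ` (${a.notes})` : "")
  )
  .join("\n")}
`;

export async function synthesize(
  query: string,
  activities: Activity[]
): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma3n:latest",
      prompt: SYNTH_PROMPT(query, activities),
      stream: false,
    }),
  });
  const data = await res.json();
  return data.response; // markdown summary, ready for the UI
}
```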
This entire pipeline is exposed via a single `POST` endpoint at `/api/journal/search`, which is consumed by a new Next.js page with a simple React frontend.
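Wiring it together, the route handler is mostly glue. A sketch assuming the App Router and the hypothetical helper modules from the earlier sketches (the `@/lib/...` paths are placeholders):

```ts
// app/api/journal/search/route.ts -- illustrative wiring of the pipeline
import { NextResponse } from "next/server";
// Hypothetical module paths for the helpers sketched above.
import { planKeywords } from "@/lib/queryPlanner";
import { retrieveActivities } from "@/lib/retrieval";
import { synthesize } from "@/lib/synthesizer";

export async function POST(req: Request) {
  const { query } = await req.json();
  if (typeof query !== "string" || query.trim() === "") {
    return NextResponse.json({ error: "query is required" }, { status: 400 });
  }

  const keywords = await planKeywords(query);             // Agent 1: Query Planner
  const activities = await retrieveActivities(keywords);  // Prisma retrieval (take: 20)
  const summary = await synthesize(query, activities);    // Agent 2: Synthesizer

  return NextResponse.json({ keywords, summary });
}
```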
Design Trade-offs and Future Optimizations
This architecture represents a deliberate set of trade-offs:
- Simplicity over Precision: We avoided the complexity of `pgvector` and embedding backfills by using the LLM’s reasoning for the initial search. This is simpler to implement but may be less precise than a true vector similarity search.
- Speed over Comprehensiveness: By limiting the retrieval step to the top 20 activities (`take: 20`), we ensure a fast response and keep the LLM context manageable. However, this means the summary is based on a subset of all possible relevant data.
This leads to a clear path for future optimization:
Map-Reduce Summarization for Full-Database Analysis:
To overcome the `take: 20` limitation, the next optimization step is to implement a batch-processing pattern (sketched after the list below).
- Full Retrieval: Fetch all matching activities from the database.
- Map: Break the results into manageable batches (e.g., of 20). Send each batch to the LLM to generate an intermediate summary.
- Reduce: Combine all the intermediate summaries and send them to the LLM (a new agent) one final time to produce a single, cohesive, and highly accurate final answer.
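A rough sketch of that pattern, reusing the hypothetical `synthesize` helper from above as the map step; the batch size, prompt wording, and module path are assumptions:

```ts
// mapReduceSummarize.ts -- illustrative sketch of the future batching pattern
import { synthesize, type Activity } from "@/lib/synthesizer"; // hypothetical path

const BATCH_SIZE = 20;

export async function mapReduceSummarize(
  query: string,
  activities: Activity[]
): Promise<string> {
  // Map: summarize each batch independently. Batches run sequentially to
  // keep the load on the local model predictable.
  const partials: string[] = [];
  for (let i = 0; i < activities.length; i += BATCH_SIZE) {
    const batch = activities.slice(i, i + BATCH_SIZE);
    partials.push(await synthesize(query, batch));
  }

  // Reduce: merge the intermediate summaries into one final answer.
  const reducePrompt = `
Combine the partial summaries below into a single, cohesive markdown answer
to the question "${query}". Use only information from the summaries.

${partials.map((p, i) => `## Partial summary ${i + 1}\n${p}`).join("\n\n")}
`;
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma3n:latest",
      prompt: reducePrompt,
      stream: false,
    }),
  });
  const data = await res.json();
  return data.response;
}
```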
This “Map-Reduce” approach would trade latency for a far more comprehensive analysis, making the tool even more powerful. It’s an exciting next step for the project.