Friar's Quill
Friar's Quill: fully local processing of video and audio into timestamped text and structured Markdown, designed to be private and offline-first.
This project was built entirely solo — from product idea to implementation. I handled everything: UX, system design, frontend, backend, and integration of local AI models.
AI tools were used as assistants, not as a replacement for decision-making. The architecture, product logic, and overall direction were defined and executed by me.
The goal was not just to "make a tool," but to explore how far a fully local AI product can go when treated as a complete user-facing system rather than a technical demo.
The core idea was to move away from generic "AI dashboard" aesthetics and build something with a strong narrative identity.
The product is designed around a medieval theme, in which processing content is framed as documentation, interpretation, and judgment, much like a monk recording knowledge or a court analyzing testimony.
A key reference point was the Teenage Engineering EP-1320: not for its functionality, but for the way it reimagines modern technology through a historical lens. That contrast felt fresh and memorable.
From that foundation, the main modes naturally emerged:
Chronicle — a monk-like figure carefully documenting events into structured knowledge
Inquisition (The Tribunal) — a system that questions, evaluates, and searches for inconsistencies or "heresy" in the material
Ballad — a bard extracting the most vivid moments and turning them into something memorable and shareable
Instead of abstract feature names, each mode represents a role with intent, which makes the interaction more intuitive and gives the product a distinct personality.
Friar's Quill is a desktop application for fully local processing of video and audio content — from transcription to structured summaries.
The product is designed around a simple idea: long-form content is hard to consume, but most existing tools either rely on cloud APIs or produce raw, unstructured transcripts. Friar's Quill solves both problems by combining offline-first processing with semantic analysis.
Users can input either local files or URLs, including platforms like YouTube. The system converts speech into timestamped text and then transforms it into structured Markdown documents that are immediately usable — whether for reading, searching, or further analysis.
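To make "timestamped text turned into Markdown" concrete, here is a minimal Python sketch. It assumes the speech recognizer yields segments with start and end times in seconds; the `Segment` class and the helper names are hypothetical and are not taken from Friar's Quill itself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized span of speech (hypothetical structure)."""
    start: float  # seconds from the beginning of the media
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_markdown(title: str, segments: list[Segment]) -> str:
    """Build a Markdown transcript with one timestamped bullet per segment."""
    lines = [f"# {title}", ""]
    for seg in segments:
        lines.append(f"- **[{format_timestamp(seg.start)}]** {seg.text.strip()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Segment(0.0, 4.2, "Welcome to the lecture."),
        Segment(65.5, 70.0, " Today we cover map-reduce."),
    ]
    print(segments_to_markdown("Lecture 1", demo))
```

Keeping the timestamp on every line is what later lets summaries and answers point back to an exact moment in the source.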
The summarization layer is built around three primary modes:
Chronicle (default, notes) produces a structured breakdown of the content — overview, key ideas, themes, terminology, and quotes
Inquisition (inquisition) evaluates the material based on user-defined criteria, identifying required elements, red flags, and scoring dimensions
Ballad (bard) focuses on highlights — extracting the most important or engaging moments based on a chosen focus, with optional timestamps and quotes
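Treating the modes as roles maps naturally onto configuration. The sketch below shows one possible shape in Python: the mode ids `notes`, `inquisition`, and `bard` and the Chronicle sections come from the list above, while `SummaryRequest`, `build_instructions`, and the instruction wording are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryRequest:
    """Hypothetical request object; mode ids mirror the names in parentheses above."""
    mode: str = "notes"                                 # "notes", "inquisition", or "bard"
    criteria: list[str] = field(default_factory=list)   # used by Inquisition
    focus: str = ""                                     # used by Ballad
    include_timestamps: bool = True                     # Ballad option
    include_quotes: bool = True                         # Ballad option


def build_instructions(req: SummaryRequest) -> str:
    """Turn a request into plain-language instructions for the local language model."""
    if req.mode == "notes":
        return ("Write structured notes with sections: Overview, Key Ideas, "
                "Themes, Terminology, Quotes.")
    if req.mode == "inquisition":
        if not req.criteria:
            raise ValueError("Inquisition mode needs at least one criterion")
        listed = "; ".join(req.criteria)
        return (f"Evaluate the material against these criteria: {listed}. "
                "Report required elements found or missing, red flags, "
                "and a score per dimension.")
    if req.mode == "bard":
        extras = []
        if req.include_timestamps:
            extras.append("timestamps")
        if req.include_quotes:
            extras.append("direct quotes")
        suffix = f" Include {' and '.join(extras)}." if extras else ""
        topic = req.focus or "anything notable"
        return f"Extract the most vivid moments about: {topic}.{suffix}"
    raise ValueError(f"unknown mode: {req.mode!r}")
```

For example, `build_instructions(SummaryRequest(mode="bard", focus="product launches", include_quotes=False))` yields an instruction asking for highlights on that focus with timestamps but no quotes.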
Under the hood, transcripts are processed in chunks with a map-reduce approach: each chunk is summarized on its own, and the partial results are then merged into the final document. This lets the system scale to long videos while keeping the final output coherent.
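The map-reduce idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: chunks are built greedily by character count, `summarize` is a hypothetical stand-in for a call to the local language model, and the reduce step is a single pass (very long inputs might need the reduce repeated hierarchically).

```python
from typing import Callable


def chunk_text(lines: list[str], max_chars: int) -> list[list[str]]:
    """Greedily group transcript lines into chunks of roughly max_chars characters
    (newline separators are not counted). An oversized line becomes its own chunk."""
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for line in lines:
        if current and size + len(line) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks


def map_reduce_summarize(
    lines: list[str],
    summarize: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Map: summarize each chunk independently.
    Reduce: summarize the joined partial summaries into one result."""
    partials = [summarize("\n".join(chunk)) for chunk in chunk_text(lines, max_chars)]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return summarize("\n\n".join(partials))
```

Because each map call sees only one chunk, memory and context use stay bounded regardless of video length; coherence comes from the reduce pass, which sees every partial summary together.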
The application also includes The Oracle, a constrained Q&A interface that allows users to interact with the processed content. Unlike typical chat systems, it answers strictly based on the transcript and summary, often referencing timestamps and quotes. This turns passive content into something interactive.
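One way to keep answers grounded is to hand the model only the summary plus the most relevant transcript excerpts, with explicit instructions to cite timestamps and to say so when the material is silent. The sketch below assumes a crude word-overlap ranking as a stand-in for whatever retrieval the app actually uses; `build_grounded_prompt` and its parameters are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_grounded_prompt(
    question: str,
    summary: str,
    segments: list[tuple[str, str]],  # (timestamp, text) pairs
    top_k: int = 3,
) -> str:
    """Pick the segments sharing the most words with the question and wrap them,
    with the summary, in instructions that forbid outside knowledge."""
    q_words = tokenize(question)
    # sorted() is stable, so ties keep their original transcript order.
    ranked = sorted(
        segments,
        key=lambda seg: len(q_words & tokenize(seg[1])),
        reverse=True,
    )
    excerpts = "\n".join(f"[{ts}] {text}" for ts, text in ranked[:top_k])
    return (
        "Answer ONLY from the summary and transcript excerpts below. "
        "Cite timestamps in [HH:MM:SS] form and quote the transcript where possible. "
        "If the answer is not in the material, say so.\n\n"
        f"Summary:\n{summary}\n\n"
        f"Transcript excerpts:\n{excerpts}\n\n"
        f"Question: {question}"
    )
```

Restricting the context window to the processed material, rather than trusting the model's general knowledge, is what makes cited timestamps checkable by the user.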
From a technical perspective, the product is built as a desktop application using Electron for the interface and Python for the processing pipeline. Speech recognition and language models run locally, and the system is optimized to avoid loading heavy components into memory simultaneously, making it usable on consumer hardware.
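The "one heavy component at a time" rule can be expressed as a helper that loads a model, runs a single task, and drops the model before the next stage starts. In this sketch the loaders, `transcribe`, and `summarize` are hypothetical placeholders; with a GPU framework, freeing device memory may also need a framework-specific call such as PyTorch's `torch.cuda.empty_cache()`.

```python
import gc
from typing import Callable, TypeVar

M = TypeVar("M")  # model type
R = TypeVar("R")  # task result type


def run_with_model(loader: Callable[[], M], task: Callable[[M], R]) -> R:
    """Load a heavy model, run one task with it, and release it before returning.

    Only this frame (and the task while it runs) holds the model, so after
    `del` plus gc.collect() it can be reclaimed, provided the task did not
    stash a reference elsewhere.
    """
    model = loader()
    try:
        return task(model)
    finally:
        del model
        gc.collect()


def process(
    media_path: str,
    load_asr: Callable[[], object],
    load_llm: Callable[[], object],
    transcribe: Callable[[object, str], str],
    summarize: Callable[[object, str], str],
) -> str:
    """Run speech recognition, then summarization, never holding both models at once."""
    transcript = run_with_model(load_asr, lambda asr: transcribe(asr, media_path))
    # The speech model has been released here; only the transcript text survives.
    return run_with_model(load_llm, lambda llm: summarize(llm, transcript))
```

Confining the model reference to one function frame means releasing it does not depend on callers remembering to `del` their own variables.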
Overall, Friar's Quill is positioned not just as a utility, but as a self-contained local AI system — one that prioritizes privacy, control, and meaningful output over raw capability.