Personal Indexing Service


As much as I love my emacs setup, I can’t take my laptop with me everywhere, and that is my biggest complaint. Investing in personal infrastructure makes sense for me as I build more one-of-one software that improves my life. More specifically, over the years I’ve built up ways of searching for information that I’ve come to rely on. To search that information consistently across devices, I need a personal indexing service.

What would a personal indexing service look like?

I built a prototype indexer and search UI, and you can see a demo here.

I want to build a personal indexer for different kinds of documents that stays up to date, so that I can query it securely in one unified place.

Querying should be multi-modal: it should combine results from similarity search, full text search with BM25 scoring, exact match, graph relationships, and SQL, so that I can always get the best result depending on how I want to find things.
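One common way to merge ranked results from several search methods is reciprocal rank fusion (RRF). The sketch below is not necessarily what the prototype does. It assumes each backend returns a best-first list of document IDs. `k=60` is the conventional RRF constant, and the result lists are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of document IDs into one ranking.

    A document earns 1 / (k + rank) from every list it appears in (rank
    starts at 1). Documents that rank well across several search methods
    therefore float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from three search backends.
similarity = ["note-a", "note-b", "note-c"]
bm25 = ["note-b", "note-d", "note-a"]
exact = ["note-b"]

print(reciprocal_rank_fusion([similarity, bm25, exact]))
# ['note-b', 'note-a', 'note-d', 'note-c']
```

RRF only uses ranks, not raw scores. That means BM25 scores, cosine similarities and exact-match hits never need to be normalized onto a common scale.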

Search should be exposed through an API that I can call securely. Then it can be used from anywhere: an iOS shortcut, emacs, org-ai, and other one-of-one software.
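Here is a minimal sketch of such an API using only Python’s standard library. It assumes a shared bearer token served behind HTTPS. The `/search` path, the `q` parameter, `API_TOKEN`, and the `search` placeholder are all hypothetical names rather than the prototype’s actual interface.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical shared secret; in practice load it from an env var or keychain.
API_TOKEN = "change-me"


def is_authorized(auth_header, token=API_TOKEN):
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    supplied = auth_header[len("Bearer "):]
    return hmac.compare_digest(supplied.encode(), token.encode())


def search(query):
    """Placeholder for the real index lookup."""
    return [{"id": "example-id", "title": f"Result for {query!r}"}]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        if not is_authorized(self.headers.get("Authorization")):
            self.send_error(401)
            return
        query = parse_qs(url.query).get("q", [""])[0]
        body = json.dumps(search(query)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Listen on localhost only; put TLS (e.g. a reverse proxy) in front before exposing it."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

Because the interface is plain HTTP and JSON, an iOS shortcut ("Get Contents of URL") and emacs (`url-retrieve`) can both call it without any client library.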

Overall, I’ll use the following principles to make decisions about what to build:

  1. Minimize maintenance: I don’t have time for TLC. It needs to work and not take any of my attention unless I’m improving it (e.g. always able to rebuild from source, no npm).
  2. Extensible to more sources: I should be able to add new ways of searching without having to redo the whole thing. Similarly, I want to use different methods of search depending on what I’m doing, or collate them together into one (e.g. exact search, similarity search, LLM, structured).
  3. Fast as hell: I can’t stand slowness in things I use constantly. I search for things all the time and I’m very sensitive to milliseconds of latency. Speed is undervalued.

Sources

Tier 1

  • Notes
  • Journal
  • Meetings
  • Task lists and project files

Nice to have

  • Email
  • Calendar
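To satisfy the “extensible to more sources” principle, each source could be a small function registered by name. Adding email or calendar later would then mean writing one more function. This is a sketch under assumptions: the `Document` shape, the `source` decorator, and the `~/org/notes` default are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, Iterator, Optional


@dataclass
class Document:
    source: str  # which source produced it, e.g. "notes"
    doc_id: str  # stable ID within the index (here: the file path)
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Registry mapping a source name to a function that yields its documents.
SOURCES: Dict[str, Callable[..., Iterator[Document]]] = {}


def source(name):
    """Decorator that registers a document source under `name`."""
    def register(fn):
        SOURCES[name] = fn
        return fn
    return register


@source("notes")
def org_notes(root: Optional[Path] = None) -> Iterator[Document]:
    """Yield every .org file under `root` (hypothetical default: ~/org/notes)."""
    root = root or Path.home() / "org" / "notes"
    for path in sorted(root.rglob("*.org")):
        yield Document("notes", str(path), path.read_text(encoding="utf-8"))


def collect_all() -> Iterator[Document]:
    """Pull documents from every registered source."""
    for fn in SOURCES.values():
        yield from fn()
```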

Index

Should this all live in Postgres?

  • Vector similarity search
  • Trigram full text search
  • Graph relationship search
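If the index lives in Postgres, trigram search would come from the pg_trgm extension and its `similarity()` function. To make the scoring concrete, here is a small Python approximation of how pg_trgm extracts trigrams. It lowercases the text, keeps only alphanumeric words, and pads each word with two leading spaces and one trailing space. It then scores two strings by shared trigrams over all distinct trigrams. This is a simplification for illustration, not a port of the extension.

```python
import re


def trigrams(text):
    """Trigram set in the style of pg_trgm: lowercase, split on non-alphanumerics,
    pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (a Jaccard index in [0, 1])."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("indexer", "indexing"), 3))  # 0.417
```

Trigram matching is forgiving of typos and partial words, which complements BM25. BM25 ranks by whole-term frequency instead.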

Indexer

Write it in Rust to go super fast and multi-threaded? Speed might matter given the volume of file I/O and text processing that needs to happen. Is Python good enough?
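Before reaching for Rust, one way to check whether Python is good enough is to make indexing incremental and parallelize the I/O, then measure. This sketch assumes change detection by modification time and `.org` files only. `reindex` and `seen_mtimes` are hypothetical names, and deleted files are not handled.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def reindex(root, seen_mtimes, workers=8):
    """Re-read only files that changed since the last run.

    `seen_mtimes` maps path -> mtime (ns) from the previous run and is updated
    in place once the reads succeed. Returns {path: text} for changed files.
    """
    stale = []
    for path in Path(root).rglob("*.org"):
        mtime = path.stat().st_mtime_ns
        if seen_mtimes.get(str(path)) != mtime:
            stale.append((path, mtime))

    # Threads help because file reads release the GIL; CPU-heavy text
    # processing would need processes (or Rust) to use multiple cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(lambda item: item[0].read_text(encoding="utf-8"), stale))

    for path, mtime in stale:
        seen_mtimes[str(path)] = mtime
    return {str(path): text for (path, _), text in zip(stale, texts)}
```

With incremental updates, the steady-state cost is a directory walk plus `stat` calls, so raw language speed matters mostly for the first full build.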

Tools

For use with LLMs

  • Pre-process the request for search
  • Search notes with similarity search backed by a vector DB
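Similarity search ranks notes by how close their embedding vectors are to the query’s vector. A brute-force version is short enough to show. The embeddings here are made-up 3-dimensional vectors. Real ones would come from an embedding model, and a vector DB (or pgvector in Postgres) would replace the linear scan with an index.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=5):
    """Brute-force nearest neighbours: score every note, keep the k best."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items())
    return heapq.nlargest(k, scored)


# Made-up embeddings for illustration only.
docs = {
    "rust-notes": [1.0, 0.0, 0.0],
    "python-notes": [0.7, 0.7, 0.0],
    "journal": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
print([doc for _, doc in top_k(query, docs, k=2)])  # ['rust-notes', 'python-notes']
```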

For use with other programs

  • Language server for notes to query for related notes as you go, suggest links, see backlinks, auto tag, grammar check
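Backlinks fall out of the index almost for free: scan each note for org-id links and invert the mapping. This sketch assumes each note is keyed by its own `:ID:`. It only handles `[[id:...]]` and `[[id:...][description]]` links, not file or heading links.

```python
import re
from collections import defaultdict

# Matches [[id:SOME-ID]] and [[id:SOME-ID][description]], capturing SOME-ID.
ID_LINK = re.compile(r"\[\[id:([^\]]+)\](?:\[[^\]]*\])?\]")


def build_backlinks(notes):
    """Map each target ID to the sorted IDs of notes that link to it.

    `notes` maps a note's own org ID to its raw text; self-links are ignored.
    """
    backlinks = defaultdict(set)
    for source_id, text in notes.items():
        for target_id in ID_LINK.findall(text):
            if target_id != source_id:
                backlinks[target_id].add(source_id)
    return {target: sorted(sources) for target, sources in backlinks.items()}


notes = {
    "a": "See [[id:b][Note B]] and [[id:c]].",
    "b": "Back to [[id:c][C]].",
    "c": "No links here.",
}
print(build_backlinks(notes))  # {'b': ['a'], 'c': ['a', 'b']}
```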

Mobile

  • Write a note and link other notes to it by inserting an org-id link
    • Shortcut -> web view -> type -> search API -> select result -> paste link
    • Can a shortcut insert text at point? It might be annoying to have to paste after searching. (Sadly, no: iOS restrictions prevent it.)
  • Search for a task and jump to it to work on it

Speed up org-roam and org-ql

After adding thousands of notes, searching with them has become very slow (seconds). I’d like to replace both searches with the indexer so it’s consistent and so I can add new features on top, like omni search, backlinks, and related items.
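To take over task search from org-ql, the indexer needs each heading’s TODO state, its `:ID:`, and a position to jump to. Here is a minimal line-based sketch. It assumes the keywords are TODO/NEXT/DONE (org lets you configure these) and ignores tags, priorities, and many edge cases that a real org parser such as Emacs’s org-element handles. `Heading` and `parse_headings` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword set; real org configs vary (org-todo-keywords).
HEADLINE = re.compile(r"^(\*+)\s+(?:(TODO|NEXT|DONE)\s+)?(.*?)\s*$")
ID_PROP = re.compile(r"^\s*:ID:\s+(\S+)\s*$", re.IGNORECASE)


@dataclass
class Heading:
    level: int
    todo: Optional[str]
    title: str
    line: int  # 1-based line number, used to jump to the heading
    org_id: Optional[str] = None


def parse_headings(text: str) -> List[Heading]:
    """Collect headings and attach the first :ID: line that follows each one.

    Simplification: any ':ID:' line under a heading counts, not only one
    inside a :PROPERTIES: drawer; file-level IDs before the first heading
    are skipped.
    """
    headings: List[Heading] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = HEADLINE.match(line)
        if m:
            stars, todo, title = m.groups()
            headings.append(Heading(len(stars), todo, title, lineno))
            continue
        m = ID_PROP.match(line)
        if m and headings and headings[-1].org_id is None:
            headings[-1].org_id = m.group(1)
    return headings


doc = "* TODO Write post\n:PROPERTIES:\n:ID: abc-123\n:END:\n** Notes\n* DONE Ship it\n"
for h in parse_headings(doc):
    print(h.line, h.todo, h.title, h.org_id)
```

Storing the line number (or byte offset) alongside the ID covers both “jump to the task” and building the `[[id:...]]` link for the mobile flow.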