As much as I love my emacs setup, I can’t take my laptop with me everywhere, and that is my biggest complaint. For me, investing in personal infrastructure makes sense as I build more one-of-one software that improves my life. More specifically, there are ways of searching for information that I’ve built up over the years and come to rely on. To search for information consistently across devices, I need a personal indexing service.
What would a personal indexing service look like?
I built a prototype indexer and search UI, and you can see a demo here.
I want to build a personal indexer for different documents that stays up-to-date so that I can query it securely in one unified place.
Querying should be multi-modal. It should combine results from similarity search, full text search with BM25 scoring, exact match, graph relationships, and SQL so that I can always get the best result depending on how I want to find things.
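One way to collate ranked results from several search modes is reciprocal rank fusion (RRF). This is a minimal sketch of that technique, not a description of how the prototype merges results; the function name, the document IDs, and the constant `k=60` (a commonly used default) are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is a common default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three of the search modes.
similarity = ["note-a", "note-b", "note-c"]
full_text = ["note-b", "note-a", "note-d"]
exact = ["note-b"]

fused = reciprocal_rank_fusion([similarity, full_text, exact])
```

RRF only needs each mode's ordering, not comparable scores, which is convenient when mixing BM25 scores with vector distances.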
Querying should be available through an API that I can access securely. Then it can be used from anywhere: an iOS shortcut, emacs, org-ai, and other one-of-one software.
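As a rough idea of what a client call might look like, here is a sketch that builds an authenticated search request using only the standard library. The `/search` path, the `q` and `limit` parameters, the host name, and the bearer-token scheme are all hypothetical; the note does not specify the API's shape.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, token, query, limit=10):
    """Build (but do not send) a GET request for a hypothetical search endpoint."""
    params = urlencode({"q": query, "limit": limit})
    return Request(
        f"{base_url}/search?{params}",
        # Assumed auth scheme: a bearer token sent with every request.
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_search_request("https://indexer.example", "secret-token", "org roam")
```

Sending it would be a call to `urllib.request.urlopen(req)`; any client that can set a header (an iOS shortcut, emacs) could make the same request.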
Overall, I’ll use the following principles to make decisions about what to build:
- Minimize maintenance: I don’t have time for TLC. It needs to work and not take any of my attention unless I’m improving it (e.g. always able to rebuild from source, no npm).
- Extensible to more sources: I should be able to add new ways of searching without having to redo the whole thing. Similarly, I want to use different methods of search depending on what I’m doing, or collate them together into one (e.g. exact search, similarity search, LLM, structured).
- Fast as hell: I really can’t stand slow tools that I use constantly. I search for things all the time and I’m very sensitive to milliseconds of latency. Speed is undervalued.
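The extensibility principle could be sketched as a small registry where each search method is a plain function registered under a name. Everything here (the `Hit` type, the `register` decorator, the toy `NOTES` corpus, and the two backends) is hypothetical and only illustrates the shape, assuming each backend returns ranked hits for a query string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


# A backend is any callable that takes a query string and returns ranked hits.
SearchBackend = Callable[[str], List[Hit]]

BACKENDS: Dict[str, SearchBackend] = {}


def register(name: str):
    """Decorator that adds a search backend to the registry under a name."""
    def wrap(fn: SearchBackend) -> SearchBackend:
        BACKENDS[name] = fn
        return fn
    return wrap


# Toy in-memory corpus standing in for the real index.
NOTES = {
    "note-1": "Speed up org-roam with a personal indexer",
    "note-2": "Setting up dokku for personal infrastructure",
}


@register("exact")
def exact_search(query: str) -> List[Hit]:
    """Match documents that contain the query as a literal substring."""
    return [Hit(doc_id, 1.0) for doc_id, text in NOTES.items() if query in text]


@register("keyword")
def keyword_search(query: str) -> List[Hit]:
    """Score documents by the fraction of query terms they contain."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in NOTES.items():
        words = text.lower().split()
        matched = sum(1 for term in terms if term in words)
        if matched:
            hits.append(Hit(doc_id, matched / len(terms)))
    return sorted(hits, key=lambda hit: -hit.score)


def search(query: str, modes: List[str]) -> Dict[str, List[Hit]]:
    """Run only the requested backends and return results keyed by mode."""
    return {mode: BACKENDS[mode](query) for mode in modes}


results = search("personal indexer", ["exact", "keyword"])
```

Adding a new way of searching is then one more decorated function, and the caller picks which modes to run per query.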
Sources
Tier 1
- Notes
- Journal
- Meetings
- Task lists and project files
Nice to have
- Calendar
Index
Should this all live in postgres?
- Vector similarity search
- Trigram full text search
- Graph relationship search
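To make the trigram option concrete, here is a pure-Python approximation of how the Postgres `pg_trgm` extension scores similarity: lowercase the text, pad each word with two leading spaces and one trailing space, and divide the number of shared trigrams by the number of distinct trigrams across both strings. It is a simplified sketch for intuition only; in practice the extension does this inside Postgres with index support, and the function names below are my own.

```python
import re


def trigrams(text):
    """Return the set of padded, lowercased trigrams for every word in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after, as pg_trgm does
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = trigram_similarity("word", "two words")
```

Here "word" and "two words" share 4 of 11 distinct trigrams, so the score is 4/11.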
Indexer
Write it in Rust to go super fast and multi-threaded? Speed might matter given the volume of file I/O and text processing that needs to happen. Is Python good enough?
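Whichever language wins, one way to keep the work small is to reprocess only files whose contents changed since the last run. The sketch below hashes files in a thread pool and compares them against a saved manifest. The function names, the `*.org` pattern, and the manifest format are assumptions, and it does not settle whether Python is fast enough.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(root, previous, pattern="*.org", workers=8):
    """Find files under root whose contents differ from the previous manifest.

    previous maps a path string to the digest recorded on the last run.
    Returns (changed_paths, new_manifest).
    """
    paths = sorted(str(p) for p in Path(root).rglob(pattern))
    # Hashing is mostly file I/O, so threads can overlap the reads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(file_digest, paths))
    manifest = dict(zip(paths, digests))
    changed = [p for p in paths if previous.get(p) != manifest[p]]
    return changed, manifest
```

On the first run, pass an empty dict as `previous` so every file counts as changed; afterwards, persist the returned manifest and pass it back in.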
Tools
For use with LLMs
- Pre-process the request for search
- Search notes with similarity search backed by a vector DB
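At its core, similarity search ranks notes by how close their embedding vectors are to the query's embedding. This brute-force sketch uses cosine similarity over a tiny in-memory dict. The vectors are made up, and a real vector DB would use an approximate index rather than scanning everything.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Return the k document IDs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    # Most similar first; ties broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Made-up 3-dimensional embeddings; real ones have hundreds of dimensions.
INDEX = {
    "note-a": [1.0, 0.0, 0.0],
    "note-b": [0.7, 0.7, 0.0],
    "note-c": [0.0, 0.0, 1.0],
}

nearest = top_k([1.0, 0.1, 0.0], INDEX, k=2)
```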
For use with other programs
- Language server for notes to query for related notes as you go, suggest links, see backlinks, auto tag, grammar check
Mobile
- Write a note and link other notes to it by inserting an org-id link
  - Shortcut -> web view -> type -> search API -> select result -> paste link
  - Can a shortcut insert text at point? It might be annoying to have to paste after searching. (Sadly no, you can’t, due to iOS restrictions.)
- Search for a task and jump to it to work on it
  - Can Working Copy jump to a specific heading from x-url? According to the docs, it can, but it would need to know the line number
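Both mobile flows come down to looking up a heading in the index: linking needs its ID, and jumping needs its line number. Here is a minimal sketch of extracting both from an org file, assuming headings start with one or more `*` and that the `:ID:` property sits in the drawer right after its heading. The function names and sample file are hypothetical, and the sketch stops short of building a Working Copy URL since the note doesn't give the exact parameters.

```python
import re

HEADING = re.compile(r"^(\*+)\s+(.*\S)\s*$")
ID_PROPERTY = re.compile(r"^\s*:ID:\s+(\S+)\s*$", re.IGNORECASE)


def index_headings(org_text):
    """Return a dict with title, 1-based line, and id for each heading."""
    headings = []
    for line_no, line in enumerate(org_text.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            # The title keeps TODO keywords and tags as written.
            headings.append({"title": match.group(2), "line": line_no, "id": None})
            continue
        id_match = ID_PROPERTY.match(line)
        # Attach the first :ID: found after a heading to that heading.
        if id_match and headings and headings[-1]["id"] is None:
            headings[-1]["id"] = id_match.group(1)
    return headings


def org_id_link(heading):
    """Format an org link of the form [[id:...][description]]."""
    return f"[[id:{heading['id']}][{heading['title']}]]"


SAMPLE = """\
#+title: Projects
* TODO Build personal indexer
:PROPERTIES:
:ID: 1b2c3d4e-0000-4000-8000-123456789abc
:END:
Some notes.
* Calendar sync
"""

headings = index_headings(SAMPLE)
link = org_id_link(headings[0])
```

If the indexer stored the line number alongside each heading, the search API could return it directly and the client would not need to parse the file.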
Speed up org-roam and org-ql
After adding thousands of notes, searching is very slow (seconds). I’d like to replace both searches with the indexer so it’s consistent and I can add new features on top like omni search, backlinks, and related items.
Links to this note
-
I’m setting up dokku as a personal infrastructure PaaS for running services like the personal indexing service.
-
On Mobile Safari, text inputs cannot be autofocused by design. Apple expects the user to initiate the input every time.
-
Rather than score search results on the probability that the query is relevant to a document, BM25 provides a ranking of probability. That’s because the probability the query appears in the document doesn’t actually matter to the results. This is a heuristic that makes the algorithm efficient and provides excellent results.
-
Rust Memory Profiling on Macos
Working on my personal indexing service, I noticed that large files were getting OOM killed. That’s surprising because rust makes it fairly difficult to do bad things with memory (you can roughly approximate where memory is dropped just by reading code).
-
Why You Still Need an SSL Certificate With Tailscale
I have a private network using Tailscale that runs a few local websites and services. Accessing the websites happens via the Tailscale client which connects nodes in the tailnet directly (e.g. my phone and a dokku hosted website) encrypting data from end to end. While this is a great way to secure the session it’s not validating the identity of the website.
-
Using Github Actions to Access Tailnet
I want to access a private network behind Tailscale so that I can make an API call to update my personal indexing service when a GitHub repo changes.