• Mushy Systems

    As large language models proliferate into every service and ultimately replace business logic, we will be left with the horrible burden of maintaining mush.

    Mush happens when a system can’t quite be understood by looking at it. LLMs and abstractions like AI agents cause us to lose read access: one can no longer read code to understand what’s going on. Even if you could read it, code generated by LLMs makes a codebase harder to reason about.

    My biggest fear with large, complex, AI-powered systems is that debugging starts to look more like psychiatry.


    Published

  • Rust Build Caching With Docker

    Compiling Rust dependencies every time a Docker image is built can take a very long time. To cache dependencies so that they don’t need to be compiled every time, you can use/abuse how Docker caching works using stages.

    The following example uses two stages to build and then run my_app. By generating a fake main.rs and compiling it, Docker is tricked into caching all dependencies. We then bust the cache to trigger building the app by copying the actual app code and touch-ing main.rs.

    # Build stage
    FROM rust:bookworm AS builder
    
    WORKDIR /
    
    ## Cache rust dependencies
    RUN mkdir ./src && echo 'fn main() { println!("Dummy!"); }' > ./src/main.rs
    COPY ./Cargo.toml .
    RUN cargo build --release
    
    ## Actually build the app
    RUN rm -rf ./src
    COPY ./src ./src
    RUN touch -a -m ./src/main.rs
    RUN cargo build --release
    
    # Run stage
    FROM debian:bookworm-slim AS runner
    COPY --from=builder /target/release/my_app /my_app
    ENTRYPOINT ["./my_app"]
    

    I adapted this from the StackOverflow thread here.


    Published

  • Gameboy Color OLED Mod

    The Gameboy Color has a new (to me) mod to replace the screen with a repurposed Blackberry AMOLED screen.

    Here’s the AMOLED screen kit from the Hispeedido official store on AliExpress and the Gameboy Color shell from the eXtremeRate official store on AliExpress.

    From my research, the shell should have fit the OLED screen kit even though it says it’s cut for “IPS v2 screen kits”. This turned out not to be correct.

    Parts

    • Shell #1 from eXtremeRate on AliExpress $20.77
    • OLED kit from Hispeedido on AliExpress $46.60
    • Shell #2 from FunnyPlaying $9.90
    • Buttons in three different colors from FunnyPlaying, $1.90 x 3 plus $8.50 for shipping (ugh)

    Tutorial I used.

    Gotchas

    • The screen doesn’t fit an IPS shell; the shell has to be cast for the laminated screen
    • Screws are not all the same size, and if you use the wrong length you will screw right through the front plate (there are three tri head screws that are shorter and normal Phillips head screws that are longer)

    Published

  • When Does a Service-as-Software Model Make Sense?

    The service-as-software model is nascent but expected to be experimented with in different fields as artificial intelligence techniques improve and enable new applications.

    However, slapping AI + {category} + service-as-software should draw reasonable skepticism. There are market constraints that will make adoption more difficult. There are capability gaps that will make solutions incomplete.

    So when does service-as-software make sense?

    Completely replaces a function or role

    Of course taking an established function and selling a service that will replace someone’s job is not going to sell, but supplementing high-demand areas is viable. For example, there are more job openings for engineers than there are qualified people to fill them which creates demand for AI employees that can do the job fully.

    Performs work that wasn’t done before

    There are only so many hours in one day and there is work that is not financially viable to do but people want. For example, not every business can afford 24/7 support that can resolve customer issues but they certainly would like to. This latent demand could be tapped into at the right price point which would be infeasible even for a low-cost offshore vendor operation (which is notoriously hard to get right).

    Other examples:

    • Penetration testing which typically happens annually
    • Monitoring and reviewing logs of critical systems for insights
    • Hard-to-compile reports like annual business reviews

    The outcome is clearly defined

    The unit of work the service is delivering is ideally measurable and matches how the customer defines success. When the unit of the work is the outcome of the intent, outcome-based pricing aligns incentives clearly. This probably wouldn’t work if you define the outcome too generally (what would be the unit of work of HR?) or the job is a negative art. You could use a proxy measurement, but the further away from the real value, the less clear it becomes.

    (Some ideas drawn from A System of Agents brings Service-as-Software to life)


    Published

  • Outcome-Based Pricing

    Outcome-based pricing (or result-based pricing) is becoming popular again due to services powered by artificial intelligence that enable intent-based outcome specification. That means charging per unit of value, which is the desired outcome.

    Examples

    Intercom’s new AI chat service charges $0.99 per successful resolution (customer indicates issue resolved or they stop responding). In most SaaS business models, pricing would be per seat where the measurement of value is how many people the service enables to do their work. Now that AI, in some cases, can perform the work autonomously, revenue models for these companies can more closely resemble the actual job to be done—resolved support tickets in this case.

    11x provides an autonomous SDR AI agent. Rather than charge per seat, they started by charging per task: identifying accounts, researching accounts, writing email and LinkedIn messages, scheduling meetings, and so on. Charging per completed task makes the outcome clear: “you pay us money, we do these tasks for you that you can easily verify and attribute as real work a person would otherwise have to do.” An even more intent-based pricing plan would be to charge per qualified lead, but I can see how that wouldn’t work. Many of the variables that determine whether someone books a meeting are out of 11x’s control, so charging by output rather than outcome probably works best.
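
To make the difference between the two revenue models concrete, here is a rough sketch. Only the $0.99 per resolution comes from the Intercom example above; the seat price, team size, and ticket volumes are made-up numbers for illustration.

```python
# Compare per-seat pricing with outcome-based pricing.
# Only PRICE_PER_RESOLUTION comes from the post; every other number is a
# hypothetical value chosen for illustration.

PRICE_PER_RESOLUTION = 0.99  # Intercom's stated price per successful resolution
PRICE_PER_SEAT = 50.00       # hypothetical monthly price per support seat


def per_seat_cost(seats: int) -> float:
    """Monthly cost when paying per seat, regardless of work done."""
    return round(seats * PRICE_PER_SEAT, 2)


def outcome_cost(resolved_tickets: int) -> float:
    """Monthly cost when paying only for successfully resolved tickets."""
    return round(resolved_tickets * PRICE_PER_RESOLUTION, 2)


if __name__ == "__main__":
    seats = 10  # hypothetical team size
    for resolved in (100, 500, 1000):  # hypothetical monthly volumes
        print(
            f"{resolved:>5} resolutions: "
            f"per-seat ${per_seat_cost(seats):.2f}, "
            f"per-outcome ${outcome_cost(resolved):.2f}"
        )
```

The per-seat bill stays flat no matter how much work gets done, while the per-outcome bill tracks resolved tickets, which is the job the customer actually cares about.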


    Published

  • When to Be Directive

    I was at a high-end clothing store the other day. I saw one of the workers on an iPad. Curious, I looked over his shoulder to see what he was doing.

    He was making sure the clothing rack he was standing in front of matched the picture on his iPad exactly. He checked the order of each garment. He spaced each hanger exactly. He checked and then rechecked before moving on to the next one.

    This was clearly a process someone thought important enough to make each store follow precisely. Someone designed each detail intentionally so that it fit together as a pleasing whole.

    It’s okay to be directive where the details matter.


    Published

  • Rust Memory Profiling on macOS

    Working on my personal indexing service, I noticed that the process was getting OOM killed on large files. That’s surprising because Rust makes it fairly difficult to do bad things with memory (you can roughly approximate where memory is dropped just by reading code).

    After struggling to find a memory profiler for macOS (and not even being able to install Xcode for some reason), I settled on a stupid solution using Activity Monitor, which comes pre-installed on every Mac. First, I added logging to see which file was being worked on before the process got OOM killed, then changed the main function to execute just the code path I suspected of using too much memory (calculating embeddings). Next, I opened Activity Monitor to the Memory tab and typed the name of the Rust crate in the search box. Since the process name is consistent when running cargo run, I could watch the memory used, which gets sampled every second or so. I tried a few code changes, reran it each time, and voila, fixed! Sometimes all you need is a fast feedback loop.


    Published

  • Why I Like Incidents

    A lot of tech company workers dread incidents. They are a high-pressure and often high-stakes ordeal that requires urgent attention. I’ve witnessed some incidents that lasted for months.

    I’ve come to like incidents because I see them as moments of progress. At a minimum, the team just learned how something works (or doesn’t work). At its best, the incident process guarantees that improvements are made which prevent systems from failing the same way twice.

    So while I have that same feeling of dread whenever an incident is called, I remind myself that we are going to learn something important and this is the price of entry.


    Published

  • Use a Scenario Table to Organize Complicated Situations

    When thinking in scenarios, I find it useful to lay them out as a table. A table is the most compact way of sharing definitions with a team. A table helps you explain what’s going on in a way that is mutually exclusive and collectively exhaustive. This keeps the team organized and provides the ability to refer to each case.

    What goes into a scenario table?

    Columns and values

    The columns you choose determine which dimensions of the problem matter. If you are responding to an incident, for example, the goal of the table is to summarize all of the ways a bad thing happened, so each column would be a contributing factor.

    Picking the right columns is a thinking aid to help you and the team reason about the problem. What other possible combinations of these columns exist in the system? Do we have examples and does it matter? Which can you rule out entirely?

    Labels for each case

    Since we’re in a table format, it’s easy to add a column for labeling each case (a row in the table) so they can be referred to easily. If your situation is very complicated (i.e. many columns), simply number the rows and refer to the case by number. If it’s not as complicated, use a very short name that is distinct and memorable for each case (I use the former for incidents and the latter for product briefs).
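
As a rough sketch of the idea, the full set of cases can be generated from the columns and their possible values, then numbered so each row can be referred to. The column names and values below are hypothetical and only stand in for whatever contributing factors matter in your situation.

```python
from itertools import product

# Hypothetical columns for an incident; each maps to its possible values.
columns = {
    "client": ["web", "mobile"],
    "cache": ["hit", "miss"],
    "region": ["us", "eu"],
}


def scenario_table(columns: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one numbered row per combination of column values."""
    names = list(columns)
    rows = []
    # product() yields every combination exactly once, so the rows are
    # mutually exclusive and collectively exhaustive for these values.
    for number, values in enumerate(product(*columns.values()), start=1):
        row = {"case": str(number)}
        row.update(zip(names, values))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in scenario_table(columns):
        print(" | ".join(row.values()))
```

Starting from every combination makes it easier to ask which cases have real examples and which can be ruled out entirely, rather than only listing the cases you already know about.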


    Published

  • Emacs Sticky Buffer

    Sometimes I want an Emacs buffer to always be visible but I want to ignore it when navigating between buffers.

    For example, I want a list of org-mode tasks I need to do today and I want it at the top of the window so it’s highly visible. Since I heavily use other-window and previous-multiframe-window to switch between buffers, it would slow me down if I have to visit the sticky buffer every time I’m cycling between buffers (I like to think of it as clockwise and counterclockwise).

    Here’s my first attempt at doing just that: sticky-buffer-mode.el.

    Now I can go to the buffer with the list of tasks and sticky-buffer-add and future navigation (I have previous and next bound to C-x p and C-x o) won’t visit it.


    Published

  • Manual Spam Filter

    I block every unwanted email I receive to keep my email inbox as signalful as possible (“block” is Gmail speak for “create a filter for this one email address and always send it to spam”).

    Isn’t that a lot of work?

    It’s certainly more than zero effort, but the damage of a low-signal email inbox is very high for me. Gmail’s built-in spam filtering is good at catching the obvious stuff, but it flat out misses every cold email and recruiting agency. Missing a single important email can make the difference in closing a customer, fixing an issue, or making a hire.

    My manual email spam filter list


    Published

  • Four Levels of Product Market Fit

    First Round Capital has a helpful guide to product market fit that helps to orient founders so they can focus on the right things. Rather than a binary, yes/no, evaluation of product market fit, the guide discusses different levels.

    Level 1 - Nascent PMF is when you have some initial customers mostly from warm intros. Churn, gross margin, and burn multiples don’t really matter at this point so much as quickly discovering who the right customer is and solving their problem well.

    Level 2 - Developing PMF is when you have customers, the solution delivered is more repeatable, you have up to $1MM ARR, you are generating your own leads, and customers would be very disappointed if you went away. Some metrics around go-to-market efficiency start to matter, like net revenue retention, gross margin, and burn multiples, because you are proving the business can effectively drive demand.

    Level 3 - Strong PMF is when demand is flooding in and this is often what founders refer to as “feeling the pull” and everything feels easier. There is strong growth of 3x revenue or higher, more leads are coming from word of mouth, and marketing/sales is very efficient with low CAC/burn multiples/churn and high gross margin and sales conversion rate.

    Level 4 - Extreme PMF is when companies earn permission to build new products that expand TAM, the brand starts to become synonymous with the product category, and the business is growing fast. Burn multiple is < 1 and there are more scalable customer acquisition channels.

    Read Levels of PMF by First Round Capital.


    Published

  • Don't Combine Domain Name Hacks

    I see a lot of startups with domain names like getbarai.co and trymspledword.io which combine multiple domain name hacks. These look untrustworthy, are hard to search, difficult to spell, and long. It’s hard enough to get anyone interested in what you are doing that introducing any friction can cause people to forget about it.

    Here’s my rule for startup domains:

    Use only one domain name hack (misspelling of a real word, made-up word, prefix before a real word, non-dotcom top-level domain, etc.) AND always use .com. Then, when you have the money, buy the clean .com domain name.

    For example, when I started Mosey, we started with getmosey.com (prefix hack) then mosey.so (non-dotcom TLD), then eventually bought mosey.com.


    Published

  • Net Revenue Retention

    NRR measures the ability of a business to expand revenue over time. It’s only really useful if the company sells multiple products or one of them has some sort of usage-based scaling factor that implies customers use it more over time.

    One of the misleading ways to look at NRR is when there is a single product that is usage-based and scales with the growth of its customers. If the customers’ growth stops due to an economic condition outside of anyone’s control, the NRR becomes stagnant or lower than 100%. All that measures is the growth of the economy rather than the effectiveness of the business.
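
A small worked example may help here. The sketch below assumes the commonly used definition of NRR for a fixed cohort of customers (starting recurring revenue plus expansion, minus contraction and churn, divided by starting recurring revenue); the post does not spell out a formula, and all dollar figures are hypothetical.

```python
def net_revenue_retention(
    starting: float, expansion: float, contraction: float, churned: float
) -> float:
    """NRR for one customer cohort over a period, as a percentage.

    Assumes the common definition (not taken from the post):
    (starting + expansion - contraction - churned) / starting * 100
    """
    if starting <= 0:
        raise ValueError("starting revenue must be positive")
    return (starting + expansion - contraction - churned) / starting * 100


if __name__ == "__main__":
    # Hypothetical cohort starting the year at $100k of recurring revenue.
    growing = net_revenue_retention(
        100_000, expansion=30_000, contraction=5_000, churned=5_000
    )
    # Same product and same churn, but customers stopped growing,
    # so usage-based expansion drops to zero.
    stalled = net_revenue_retention(
        100_000, expansion=0, contraction=5_000, churned=5_000
    )
    print(f"growing customers: {growing:.0f}%")
    print(f"stalled customers: {stalled:.0f}%")
```

With these made-up numbers, NRR falls from 120% to 90% even though nothing about the product or the churn changed. The only difference is whether the customers themselves were growing.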


    Published

  • You Can Reach Further Than You Think

    One of the interesting lessons from rock climbing is that people can reach much further than they think they can. When looking at the next hold, it can look so far away that we forget how long our arms and legs actually are. I’m no mountain climber but having gone climbing a few times, this stuck with me.

    In other aspects of life where performance matters a great deal, it’s useful to remember the things that seem out of reach might be within your grasp after all.


    Published