Recent projects — Marc Kendal

What pulls me into a project, almost every time, is that the design doesn't yet exist. Figuring out what the thing should be, weighing the trade-offs, deciding what is worth building. That is the part of the work I am drawn to, and the code follows from it. The projects below are a selection of that work: my recent independent work, and some of what I built over my previous five years at Fortress Technology.

The independent work includes an automated trading platform, two plugins for AI coding agents (one open-source, one private), an AI platform I built around my own job search, and three smaller open-source Python tools.

The Fortress Technology work spans the five years I spent there: the real-time image-processing stack, the camera monitor screen, and the data-ingestion and labelling tool, all for the flagship ICON X-ray inspection machine. Alongside those, I built a pill-dispensing control application for an external customer and the translation tool that keeps Fortress's product interfaces running across more than forty languages.

A note on workflow: I am actively developing my spec-driven workflow with AI agents. I write the specifications, Claude generates the implementation against them, and I review every change before it is committed. The architectural judgment and the design decisions stay with me, and the spec-writing practice itself keeps evolving as I learn what is most effective.

Independent builds

Automated trading platform (2025–present)

Since the summer of 2025 I have been building an automated trading platform. The starting question was a simple one: could I write a trading system that could outperform existing investment opportunities? The answer, so far, is yes.

I have developed a unique methodology for algorithmically identifying and assessing opportunities. Combined with extensive performance testing and machine-learning filters, this has the potential for a large upside: at a minimum, to beat the S&P 500, with the goal of many multiples of that performance.

A working belief I have come to is that a trading system lives or dies on its execution layer as much as on the strategies running inside it. A clever algorithm running on a brittle execution pipeline is not profitable; a disciplined, fail-safe execution layer can compound even a moderate strategy. Therefore a lot of the engineering work in this project has gone into making the execution side performant, reliable, and fail-safe, rather than into the algorithms alone.

The architecture

The system is organised in layers. A composition root wires every component together at startup; an opportunity scanner continuously evaluates each enabled strategy on live market data; and whenever a strategy's entry conditions trigger, a trade cycle is spun up as an asynchronous task to manage that one trade end-to-end: opening positions, watching them while they run, and closing them cleanly.

Trade-record updates from across the system (opens, fills, closes, adjustments) are sequenced through a single in-process pipeline, so every write lands in order regardless of which part of the system produced it. Snapshot broadcasting and health monitoring run as their own background tasks, so a slow consumer never holds up a live trade. Each layer has a defined responsibility and a defined interface, and that is what lets the project keep growing without one piece destabilising the next.
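One common way to get ordered writes from many producers is a single queue drained by a single writer task. The sketch below illustrates that idea only; it is not the platform's code, and `TradeUpdate`, `TradeRecordPipeline`, and the in-memory `written` list are hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeUpdate:
    trade_id: str
    kind: str  # e.g. "open", "fill", "adjust", "close"


class TradeRecordPipeline:
    """Funnels every trade-record update through one queue and one writer."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self.written: list = []  # stand-in for the real trade store

    async def submit(self, update: TradeUpdate) -> None:
        await self._queue.put(update)

    async def run(self) -> None:
        # The only coroutine that writes, so writes land in queue order.
        while (update := await self._queue.get()) is not None:
            self.written.append(update)

    async def close(self) -> None:
        await self._queue.put(None)  # sentinel: stop once the queue is drained


async def demo() -> list:
    pipeline = TradeRecordPipeline()
    writer = asyncio.create_task(pipeline.run())
    for kind in ("open", "fill", "close"):
        await pipeline.submit(TradeUpdate("T1", kind))
    await pipeline.close()
    await writer
    return [u.kind for u in pipeline.written]
```

Because only `run` touches the store, producers never need a lock of their own; they just enqueue and move on.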

Plug-and-play strategies and brokers

Strategies plug into the controller through a strict contract: a strategy declares when to enter, when to exit, what instruments to trade, how to size the position, and (optionally) a pre-trade and post-entry filter. Once those are declared, the system handles the rest: execution, persistence, risk, and lifecycle.

Adding a new strategy means writing those few methods and registering it. The controller then scans for opportunities, routes execution, manages the trade, and persists the result, all without the strategy author touching execution or risk code.
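A contract of this kind can be expressed as a `typing.Protocol`. The following is a minimal sketch under assumed names (`Strategy`, `Tick`, `BreakoutStrategy`, `REGISTRY`, and every method signature are illustrative, not the platform's real interface), and the optional pre-trade and post-entry filters are left out for brevity.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Tick:
    instrument: str
    price: float


class Strategy(Protocol):
    """What a strategy declares; everything else belongs to the platform."""

    name: str

    def instruments(self) -> Sequence[str]: ...
    def should_enter(self, tick: Tick) -> bool: ...
    def should_exit(self, tick: Tick, entry_price: float) -> bool: ...
    def position_size(self, equity: float) -> float: ...


class BreakoutStrategy:
    """Toy example: enter above a fixed level, exit on a 1% move either way."""

    name = "breakout"

    def __init__(self, level: float) -> None:
        self.level = level

    def instruments(self) -> Sequence[str]:
        return ["EURUSD"]

    def should_enter(self, tick: Tick) -> bool:
        return tick.price > self.level

    def should_exit(self, tick: Tick, entry_price: float) -> bool:
        return abs(tick.price - entry_price) / entry_price >= 0.01

    def position_size(self, equity: float) -> float:
        return equity * 0.02  # illustrative: commit 2% of equity per trade


REGISTRY: dict[str, Strategy] = {}


def register(strategy: Strategy) -> None:
    REGISTRY[strategy.name] = strategy
```

The toy strategy never imports anything from an execution or risk module, which is the point of the contract.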

Brokers follow the same decoupling pattern. A single exchange-abstraction protocol expresses what the system needs from a venue; pointing the platform at a different broker is a matter of writing a thin adapter, and Interactive Brokers, IG, OANDA, and a fully-mocked venue for tests all sit behind the same interface today.
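To illustrate the adapter idea, here is a small sketch of a venue protocol with an in-memory mock behind it. The method names and the `MockExchange` class are assumptions made for the example; the real protocol is likely to be broader.

```python
from typing import Protocol


class Exchange(Protocol):
    """The narrow surface the platform needs from any venue (illustrative)."""

    def place_market_order(self, instrument: str, units: float) -> str: ...
    def open_positions(self) -> dict[str, float]: ...
    def close_position(self, instrument: str) -> None: ...


class MockExchange:
    """In-memory venue for tests: fills every order instantly."""

    def __init__(self) -> None:
        self._positions: dict[str, float] = {}
        self._next_id = 0

    def place_market_order(self, instrument: str, units: float) -> str:
        self._next_id += 1
        self._positions[instrument] = self._positions.get(instrument, 0.0) + units
        return f"mock-{self._next_id}"

    def open_positions(self) -> dict[str, float]:
        return dict(self._positions)  # copy, so callers cannot mutate our state

    def close_position(self, instrument: str) -> None:
        self._positions.pop(instrument, None)


def flatten_all(exchange: Exchange) -> None:
    """Platform code depends only on the protocol, never on a concrete broker."""
    for instrument in exchange.open_positions():
        exchange.close_position(instrument)
```

A real adapter would implement the same three methods by translating them into that broker's own API calls.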

Operational safety

A central focus of the engineering in this project has been the failure modes that only surface once real capital is moving. A position reconciler watches for drift between what the broker says is open and what the system has recorded (for instance, when an order fills but the fill notification is delayed) and resolves the discrepancy by re-querying the original order and issuing a corrective adjustment.
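The detection half of a reconciler reduces to a diff between two views of the same positions. The sketch below shows only that step, with a hypothetical `find_drift` helper; re-querying the original order and issuing the corrective adjustment are broker-specific and not shown.

```python
def find_drift(broker: dict[str, float], recorded: dict[str, float],
               tolerance: float = 1e-9) -> dict[str, float]:
    """Return instrument -> (broker units - recorded units) where the two disagree."""
    drift = {}
    for instrument in broker.keys() | recorded.keys():
        delta = broker.get(instrument, 0.0) - recorded.get(instrument, 0.0)
        if abs(delta) > tolerance:
            drift[instrument] = delta
    return drift
```

A positive delta means the broker holds more than the record shows (for example, a fill whose notification has not arrived yet); a negative delta means the record shows a position the broker no longer has.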

A timezone-aware overnight guard freezes new openings before each broker's daily swap cutoff and liquidates remaining positions in time to avoid the charge.
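A guard like this comes down to measuring the time left until the next cutoff in the broker's own timezone. This is a minimal sketch with assumed names and assumed default windows (30 minutes to freeze, 5 minutes to liquidate); the real thresholds are not stated above.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def guard_state(now: datetime, cutoff: time, broker_tz: tzinfo,
                freeze_before: timedelta = timedelta(minutes=30),
                liquidate_before: timedelta = timedelta(minutes=5)) -> str:
    """Return "open", "frozen", or "liquidate" relative to the next swap cutoff."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(broker_tz)
    next_cutoff = datetime.combine(local.date(), cutoff, tzinfo=broker_tz)
    if next_cutoff <= local:  # today's cutoff has passed, so use tomorrow's
        next_cutoff = datetime.combine(local.date() + timedelta(days=1),
                                       cutoff, tzinfo=broker_tz)
    # Subtract in UTC so a DST change before the cutoff is counted correctly.
    remaining = next_cutoff.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    if remaining <= liquidate_before:
        return "liquidate"
    if remaining <= freeze_before:
        return "frozen"
    return "open"
```

In practice `broker_tz` would be something like `zoneinfo.ZoneInfo("America/New_York")`; a fixed-offset `timezone` works the same way for testing.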

An adaptive rate limiter watches the actual error surface coming back from the broker. If a rate-limit response appears, the system reduces concurrency and remembers the decision rather than blindly retrying against the same ceiling.
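The "reduce and remember" behaviour can be sketched in a few lines. The class name, the halving policy, and the use of HTTP 429 as the rate-limit signal are all assumptions for illustration; a real broker may signal limits differently.

```python
class AdaptiveConcurrency:
    """Lowers the allowed concurrency on a rate-limit response and keeps it lowered."""

    def __init__(self, initial: int, floor: int = 1) -> None:
        self.limit = initial
        self.floor = floor

    def record(self, status_code: int) -> int:
        # Assumption: 429 marks a rate-limit response from the venue.
        if status_code == 429:
            self.limit = max(self.floor, self.limit // 2)
        # Successes do not raise the limit again: the lower ceiling is remembered.
        return self.limit
```

A caller would size its worker pool or semaphore from `limit` before issuing the next batch of requests.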

At launch, the system sweeps for any orphaned positions and reconciles them against the trade record before going live, so an unhandled shutdown, however unlikely, cannot quietly leave a position open.

Shadow and paper modes

In paper mode, the same strategy code that runs in production runs against a real market feed with orders routed to a mocked venue. Shadow mode goes a step further: a separate scanner evaluates strategies on live ticks but never enters the live trade path, recording what would have happened to a separate store.

Both modes use the same entry-signal, exit-signal, and exit-loop machinery as the live system; the only difference is at the order-placement boundary. This symmetry between live and shadow modes is what makes it possible to evaluate a new strategy with experimental control: the only variable is the strategy itself.

The monitor

Alongside the core, a separate FastAPI application provides a live view of the system: PnL, Sharpe, drawdown, snapshots of trades as they run, and per-strategy and per-instrument breakdowns, all streamed to the browser over a WebSocket.

The monitor is intentionally read-only and decoupled from the trading core. It pulls trade and health data over SSH from the trading machine, rather than the trading core needing to know the monitor exists, so a monitor outage can never affect live trading. Pushover alerts fire on health-state transitions rather than threshold trips, so only one alert is produced per issue.
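Alerting on transitions rather than thresholds means remembering the last state seen for each check. The sketch below shows the idea with a hypothetical `HealthAlerter`; the `notify` callback is a stand-in for the real alert sender, and treating an unseen check as "ok" is an assumption.

```python
from typing import Callable


class HealthAlerter:
    """Sends one message per state change, however often the state is observed."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify
        self._last: dict[str, str] = {}

    def observe(self, check: str, state: str) -> None:
        previous = self._last.get(check, "ok")  # assumption: checks start healthy
        if state != previous:
            self._notify(f"{check}: {previous} -> {state}")
        self._last[check] = state
```

Observing "degraded" a hundred times in a row produces a single message; the next one fires only when the state changes again, for instance on recovery.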

A demo login opens the real dashboard with sensitive fields anonymised, so the architecture and the live behaviour are visible without exposing the underlying portfolio. It is hosted on my own cloud and is not publicly linked. You can request access here.

The machine manager

The platform is designed to run on machines hosted by several cloud providers, including DigitalOcean, Paperspace, and Vultr. This means trading is never dependent on any single provider's infrastructure. The machine manager is the single tool that runs all of these machines from one place. It can set up a new machine, install the trading software on it, and update that software cleanly. Each update carries with it the exact version of the software being installed, so there is never any uncertainty about what is running on any given machine.

The manager is also aware of what is happening on each machine while it is running. Before installing a new version of the software, it first asks the machine to stop opening any new trades. It then waits while the trades that are already in progress finish. Only once every trade has closed does the new version of the software go in. There is an option to skip this wait if needed, but by default the system is given enough time to finish what it has started. A trade in progress is never interrupted by an update.

When the system's keys and passwords are changed, the new value is sent to the machine, the software is restarted, and the result is then tested directly with the broker to confirm it works correctly. If that check fails, the manager restores everything to its previous state, including the earlier keys, the earlier encryption, and the software running on its earlier version. A failed change therefore leaves the machine in exactly the condition it was in beforehand.

To set up secure remote access to each machine, the manager creates a digital key, installs it on the machine, and then turns off password access so that only the key can be used to log in. After doing so, it tests the key one final time to make sure it still works. If the test fails, password access is restored, so that an unsuccessful setup never leaves the user locked out of their own machine.

When records of trades are copied from each machine for review, every record is given a unique fingerprint on the machine itself, and that fingerprint is checked again once the record has arrived. If the two fingerprints do not match, the record is flagged as suspect rather than quietly accepted. The copy is taken using the database's own internal backup procedure, so the result is a consistent picture of the data even as the live system continues to update it.
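The fingerprint check can be illustrated with a content hash computed at both ends. This sketch assumes SHA-256 over a canonical JSON encoding, which is one reasonable choice rather than a statement of what the manager actually uses; `fingerprint` and `verify` are hypothetical names.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a canonical encoding so key order cannot change the result."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(received: list[tuple[dict, str]]) -> tuple[list[dict], list[dict]]:
    """Split (record, claimed fingerprint) pairs into accepted and suspect."""
    accepted, suspect = [], []
    for record, claimed in received:
        if fingerprint(record) == claimed:
            accepted.append(record)
        else:
            suspect.append(record)  # flagged, never silently accepted
    return accepted, suspect
```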

What is next

An ML-gated signal filter is under active development, so a trained classifier can veto a strategy's signals before they ever become orders.

Localytics

Localytics and Docugit (below) are two small tools that pull signal from a project's whole history in a structured, measured way. When you ask an AI agent to review a codebase, it is not always clear whether it has covered the full git history with the same rigour; these tools do, every time.

While I was building the trading platform I wanted the kind of activity dashboard that GitHub Insights provides (commit rhythm, function-level complexity, churn heatmaps) but not at the cost of sending private source code to a hosted platform that might quietly absorb it into training data. So I built my own.

Localytics is a small local FastAPI server that you run on your own machine and point at a codebase. It analyses the code in place: commit activity, function-level cyclomatic complexity via Radon, the file-type mix of the project, per-file churn, and weekly, monthly, and yearly activity heatmaps.

A second FastAPI dashboard, hosted on Render and backed by Redis, shows the results. Only aggregated numbers ever cross the wire (your source code itself never leaves your machine), and the link between the two is TLS-encrypted so the API-key headers cannot be sniffed off the network in transit. Dependencies are self-contained via PEP 723, which means uv is the only thing you need on the host. Live demo →

Docugit

Companies can claim R&D tax credits for the engineering work they do, but the report itself is laborious to produce: it requires reviewing months of commit history and translating it into narrative form, project by project and period by period. Docugit automates precisely that workflow. Multiplied across an engineering team, the time it returns to each engineer can compound into meaningful productivity gains for the organisation as a whole.

It is a small command-line tool that pulls detailed git changes from one or more repositories across a configurable time range (weekly, fortnightly, monthly, quarterly, or yearly) with file-type and folder filters so it focuses on real engineering work and skips generated files, dependency directories, and documentation. The output is shaped for LLM ingestion: hand it to ChatGPT or a similar model and it produces a draft report you can edit, rather than one you have to write from scratch.

Even as LLMs grow more capable, the tool's value endures. It traverses every commit deterministically and reconstructs the trajectory of engineering work across long horizons. An LLM asked to summarise the same history unaided may not match it for detail, nor faithfully capture the order in which the work unfolded.

The next two projects are plugins for AI coding agents, primarily Claude. Both run inside the agent and add specialised capabilities (clean-architecture review and quantitative-finance methodology, respectively) that the agent can call on as it works.

Python Clean Architecture Plugin

I built this Claude Code plugin so that Python clean-architecture practices can be applied to real code as it is being written, rather than living only in textbooks and the occasional review. The plugin is open-source on GitHub and runs on both Claude Code and Codex CLI. You can install it with /plugin marketplace add MKToronto/python-clean-architecture.

My motivation in building this plugin was to take the work an architecturally-minded engineer would already be doing (checking patterns, weighing trade-offs, looking for improvements) and amplify it. The plugin acts as a tireless second opinion on the design: it confirms when the right pattern has been applied, suggests refinements that often go unseen from inside the work, and surfaces issues at the moment they are introduced rather than weeks later in review. It is built on Arjan Codes' courses on the designer mindset, Pythonic patterns, and FastAPI clean architecture, which the rules and recipes inside it operationalise.

There are twelve slash commands inside it, between them covering the full lifecycle of a Python service: scaffold-api, add-endpoint, scaffold-tests, refactor-legacy, extract-god-class, decouple, make-pythonic, review-architecture, review-api-design, check-quality, diagnose-smells, and suggest-patterns. Each one returns its findings ordered by severity, with file-and-line references and a suggested fix.

Underneath the commands sits a skill pack of more than fifty reference files (seven design principles paired with refactoring recipes, twenty-two code-quality rules, and twenty-five Pythonic patterns) along with a working FastAPI hotel-booking example that demonstrates the full architecture with tests around it. Example output →

The plugin was architected and directed by me, with Claude writing most of the prose under iterative direction. I curated the skill boundaries and validated every reference against the source material it draws on. Arjan's courses are attributed explicitly in the README.

Quant Finance Plugin

I built the Quant Finance Plugin for use inside my trading research; it is currently private. It distils the methodology from Marcos López de Prado's three books on quantitative machine learning into seventeen named skills and a dedicated quant-expert subagent, giving me a research surface I can call on as I work rather than re-reading the source material each time. I architected the plugin and directed its development, with Claude writing most of the prose under iterative direction; I curated the skill boundaries and validated the references against the source books.

Contact me for more information.

Bespoke Career Ops

Bespoke Career Ops is a private AI platform I designed and built around a candidate-first premise. Most products in this space optimise for the employer or the job board; this one optimises for me. It runs locally on my own machine and drives my own job search end-to-end.

A set of AI agents handles the repetitive work of every application. One agent searches the market for roles that match my profile and the location strategy I have set. Another agent tailors my CV per opening, working from a structured base variant rather than starting from scratch. Another agent locates named hiring contacts at each target company. Other agents draft outreach (LinkedIn DMs and cold emails), generate strategy notes for the role, write phone-screen and technical-interview prep, produce study plans for the technologies I will be tested on, and scaffold take-home exercises. A separate email scanner watches my inbox and classifies replies into the right pipeline state. The platform tracks every application through its full lifecycle, from first sighting to offer or close.

Because the platform runs locally, the agents work from a depth of context that a LinkedIn or Otta-style tool cannot reach: every CV variant I have written, a detailed profile of my goals, interests, skills, target archetypes, and the kinds of roles I am ruling out, notes on every project I have shipped, deep code reviews of the systems I have built, and the company, recruiter, and channel lists I maintain. That depth is what separates this tool from a generic AI match. Whether the agents are searching for roles, tailoring a CV, drafting outreach, answering a form question, or generating interview prep, the output is built from me, from my history, my goals, and what I am looking for, not from a single CV compared against a single job ad.

The platform runs locally so the agents can use my Claude subscription instead of paying per-token on the API. The trade-off is that nothing on my machine is visible from elsewhere, so a small push step ships a sanitised snapshot of the pipeline to a separate password-protected dashboard on Render. What goes out is the status of each application: company, role, status, route, last action, next action. What stays local is everything sensitive: notes, contact details, fit scores, CV variants, the underlying JD content. Anyone I share the link with can see where the search is at any moment, without me having to expose the working files.

A central challenge is coordinating long-running AI work across many agents at once. A dozen can be active at any moment, each writing to the same shared file, each producing output that needs review before it is sent.

Each agent's progress streams live as it runs. Closing the panel mid-run does not stop the agent: it minimises into a small pill at the bottom of the screen and keeps going, and clicking the pill restores the full view. The platform locks the shared file during each write, so two agents finishing at the same moment cannot overwrite each other. Closing the browser stops every agent cleanly, so nothing keeps running unattended.

The source is private. I am happy to walk through the architecture and live-demo it on a screen-share. Contact me if you would like to see it.

Anaconda Bootstrap

I built this small bash utility to bootstrap Anaconda on macOS with a single command. It detects an existing install and skips that step if there is one; otherwise it downloads and installs Anaconda from scratch, without needing the binary pre-shipped alongside the script, and then sets up a named environment from a YAML file. It handles the macOS permission prompts up front so the user never has to visit System Settings to click "open anyway". Anaconda has mostly been superseded by uv as a default Python environment manager, but its ecosystem is older and more tried-and-tested, and I still reach for it on projects where I want precise control over which package versions are being used.

Fortress Technology (2019–2024)

The five projects below are a selection from the work I did at Fortress, rather than a complete inventory. The ICON is running on production lines worldwide. The through-line was end-to-end ownership. Each of these projects started as a problem that needed solving. I designed the solution, built it from start to finish, and performance-tested it heavily before it shipped. Every piece of code I wrote at Fortress went out with zero post-ship bugs and no customer complaints. Product demo → · Fortress ICON product page →

Real-time image-processing stack

The Python stack sits at the visible end of a complex pipeline. The pre-existing C, OpenGL, and Perl stack performed the hardware control, image acquisition, and contaminant detection. It exposed a structured block of data to Python through shared memory: the raw sixteen-bit X-ray image as a pixel array, a separate array of pixel indices for each contaminant class (bone, metal, ceramic, glass), geometry arrays describing the outlines and pixel blobs used in cropping and rendering, and metadata such as the image width. Python read those inputs, composed them into a single live image, and streamed it to the user-interface screen in real time.

I built the Python side: the rendering algorithms, the performance engineering, the cross-language interface with the C layer, the async web server that ties everything together, and the live operator-facing API for tuning the system in production.

Crossing into Python from C

The two languages communicated through a single shared-memory region. On every cycle, the C side wrote a fresh frame's payload along with a set of handshake flags into the block, and Python attached to that region with the sysv_ipc library and read directly from it.

What made this design resilient to changes on the C side was that no byte offsets were hardcoded on the Python side. A parser read the C header definitions at build time and derived the matching numpy structured dtype automatically; the result was cached to JSON so production deployments could load the layout at startup without the C source present.

The C struct remained the single source of truth on both sides of the language boundary, and a change on the C side propagated through to Python on the next build. The two codebases could evolve in parallel, with the build pipeline absorbing the cross-language synchronisation.
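As a rough illustration of deriving a layout from a header, the sketch below parses a toy struct with a regular expression and emits a JSON-friendly field list. The header, the type map, and `layout_from_header` are all invented for the example; a production parser would need to handle far more of C (nested structs, macros, padding rules).

```python
import json
import re

# C type -> NumPy dtype string; only the types used in the toy header below.
C_TO_NUMPY = {"uint8_t": "u1", "uint16_t": "u2", "uint32_t": "u4",
              "int32_t": "i4", "float": "f4"}

# Matches lines such as "uint16_t width;" or "uint8_t flags[4];".
FIELD = re.compile(r"^\s*(\w+)\s+(\w+)(?:\[(\d+)\])?\s*;", re.MULTILINE)

HEADER = """
typedef struct {
    uint32_t frame_id;
    uint16_t width;
    uint16_t height;
    uint8_t  flags[4];
} frame_header_t;
"""


def layout_from_header(header: str) -> list:
    """Return [[name, dtype], ...] with an element count appended for arrays."""
    fields = []
    for ctype, name, count in FIELD.findall(header):
        entry = [name, C_TO_NUMPY[ctype]]
        if count:
            entry.append(int(count))
        fields.append(entry)
    return fields


layout = layout_from_header(HEADER)
cached = json.dumps(layout)  # what a build step could write to disk
# With NumPy available, the cached layout becomes a structured dtype via:
#   np.dtype([tuple(f) for f in json.loads(cached)], align=True)
```

Passing `align=True` asks NumPy to pad fields the way a C compiler typically would, which matters once field sizes are mixed.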

Performance

Optimising Python to render in real time required compiling the slowest parts of the rendering pipeline down to native code with Numba. The compiled steps cover bounding-box computation, pixel-coordinate unpacking, mask building, hole filling, and image conversion, and most of them run in parallel across CPU cores.

Numba does not have coverage for all of NumPy's functions, and the one I needed for unpacking pixel coordinates from flat indices was unsupported. To work around this, I wrote my own version, computing each y and x coordinate explicitly inside a Numba-compiled parallel loop.
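The arithmetic behind such a replacement is short: for a row-major image of a given width, a flat index splits into a row and a column by integer division and remainder. The pure-Python sketch below shows only that arithmetic, with a hypothetical function name; it assumes row-major layout, and a compiled version would run the same loop body over NumPy arrays under `numba.njit(parallel=True)` with `numba.prange`.

```python
def unravel_flat_indices(flat: list[int], width: int) -> tuple[list[int], list[int]]:
    """Convert row-major flat pixel indices into separate y and x coordinates."""
    ys = [0] * len(flat)
    xs = [0] * len(flat)
    for i in range(len(flat)):  # each iteration is independent, so it parallelises
        ys[i] = flat[i] // width
        xs[i] = flat[i] % width
    return ys, xs
```

Because no iteration reads another's result, the loop is safe to split across cores without any synchronisation.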

In rare cases, the area of high-density contaminants exceeded what the normal rendering pipeline could process within the real-time frame budget. I designed an innovative recursive algorithm for that case: a dynamic fill that adapts to the shape of each mask in order to keep pace with the production line. The result is that every image renders quickly enough to be displayed in real time, including those with large areas of high-density contaminants.

Atomic publishing

Output images were written to disk in a way that prevented the frontend from ever seeing a half-written file. Each new frame was first written to a temporary path and then atomically swapped into place. The frontend always saw either the previous frame or the new one, never the bytes in between.
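The write-then-swap pattern is commonly built on `os.replace`, which is atomic when source and destination are on the same filesystem. A minimal sketch, with a hypothetical `publish_atomically` helper:

```python
import os
import tempfile


def publish_atomically(data: bytes, destination: str) -> None:
    """Write to a temp file beside the destination, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Same directory as the destination, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, destination)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

A reader that opens `destination` at any moment gets either the complete old file or the complete new one.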

Async coordination

Underneath the renderer sits a constellation of long-running asynchronous tasks. The main loop watches the C-side handshake flags in shared memory, reads each new frame as it becomes available, and queues it for rendering before acknowledging the C side so it can produce the next one. A separate connection-recovery loop watches for shared-memory drops and reattaches the segment if it goes away, so the rest of the system can pause and resume without restarting.

Auxiliary loops feed the renderer with sensor data, and a set of broadcast loops push WebSocket updates out to the connected clients.

Holding all of this together is a small set of narrowly scoped asyncio.Lock instances, each guarding a single region of shared state.

Polling shared memory is blocking I/O, so it runs inside a thread-pool executor and feeds the async server through an asyncio queue, which keeps the event loop free to serve HTTP and WebSocket traffic without ever stalling on a hardware read.
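The executor-plus-queue arrangement can be shown in miniature. In this sketch `blocking_read` merely sleeps to imitate a blocking poll, and all function names are illustrative; the real reader attaches to shared memory instead.

```python
import asyncio
import time


def blocking_read(frame_no: int) -> bytes:
    time.sleep(0.01)  # stand-in for a blocking shared-memory poll
    return f"frame-{frame_no}".encode()


async def producer(queue: asyncio.Queue, frames: int) -> None:
    loop = asyncio.get_running_loop()
    for n in range(frames):
        # The blocking call runs on a worker thread; the event loop stays free.
        frame = await loop.run_in_executor(None, blocking_read, n)
        await queue.put(frame)
    await queue.put(None)  # sentinel: no more frames


async def consumer(queue: asyncio.Queue) -> list:
    seen = []
    while (frame := await queue.get()) is not None:
        seen.append(frame)
    return seen


async def main(frames: int = 3) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # bounded, to apply backpressure
    _, seen = await asyncio.gather(producer(queue, frames), consumer(queue))
    return seen
```

While a read is in flight on the worker thread, the loop can keep serving other coroutines, which is what keeps HTTP and WebSocket handlers responsive.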

Live tuning from the operator's screen

The system was designed so that the operator could change the way contaminants were drawn without ever stopping the machine. A small set of HTTP endpoints accepts updates to anomaly colours, mask shapes, the layer order in which contaminant classes are stacked, and the various thresholds that control pixel skipping and recursive hole-filling. Each incoming change is type-checked against a validator map before being applied, written through to disk so it survives a restart, and (if the operator has asked for it) used to immediately re-render the most recent frame so the change is visible on screen within a fraction of a second.
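A validator map pairs each setting name with a function that accepts or rejects a candidate value. The sketch below uses made-up setting names and rules, and shows only the validation step; persistence to disk and the re-render are not shown.

```python
from typing import Any, Callable

CLASSES = ["bone", "ceramic", "glass", "metal"]


def is_rgb(value: Any) -> bool:
    return (isinstance(value, list) and len(value) == 3
            and all(type(c) is int and 0 <= c <= 255 for c in value))


def is_layer_order(value: Any) -> bool:
    # Must be a permutation of the four contaminant classes.
    return (isinstance(value, list)
            and all(isinstance(name, str) for name in value)
            and sorted(value) == CLASSES)


def is_threshold(value: Any) -> bool:
    return type(value) is int and value >= 0


# Hypothetical setting names; the real endpoint fields are not listed above.
VALIDATORS: dict[str, Callable[[Any], bool]] = {
    "metal_colour": is_rgb,
    "layer_order": is_layer_order,
    "hole_fill_threshold": is_threshold,
}


def apply_update(settings: dict, update: dict) -> dict:
    """Validate every key first, then return a new settings dict."""
    for key, value in update.items():
        check = VALIDATORS.get(key)
        if check is None:
            raise KeyError(f"unknown setting: {key}")
        if not check(value):
            raise ValueError(f"invalid value for {key}: {value!r}")
    return {**settings, **update}
```

Validating the whole update before applying any of it means a bad field can never leave the settings half-changed.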

The result is that the screen behaves less like a passive viewer and more like a working surface that the operator can adjust in place, on the production line, while the line is running.

Batch persistence

As well as serving the live image, Python writes a structured archive of every image the machine produces, grouped by batch. The C side hands Python a destination folder for each new image; Python takes the binary data already in memory from the live render and writes it to disk in numpy binary format, alongside a JSON file capturing the rendering instructions used.

Python is also the source of truth for the archive at runtime. It walks the on-disk batches directory on startup to build an in-memory model of every batch and its images, updates that model as new images arrive, and serves it to the front end through a REST API. The archive itself is kept in a deterministic layout, indexed by batch number and timestamp, and is durable across restarts, so any image can be reconstructed after the fact with the same inputs and the same parameters, for offline review or for use by other tools.

Performance testing

The renderer's behaviour under load was studied empirically. During performance testing, every frame logged its render time alongside the inputs most likely to drive cost: the image dimensions, and the count and size distribution of the contaminant masks within it.

A separate analysis layer then ran Pearson correlations between render time and each of those input properties, computed the distribution of render times across the run including tail percentiles up to the 99.9th, and identified outlier frames using interquartile-range bounds. Taken together, this gave a clear picture of which characteristics of an incoming image actually drove rendering time, where the worst-case frames sat, and how often they appeared, so that optimisation effort could be aimed at the cases that genuinely mattered in production.

Offline dev tool

Before the live shared-memory hookup with the C side existed, I built a Python layer that bound to compiled C extensions through ctypes to convert raw sixteen-bit detector binaries into rendered images offline. That gave me a way to develop and iterate on the rendering work using stored binaries from disk, without needing the rest of the machine to be live. Even after the live path was working, the offline tool kept its place: detector binaries are compact storage on disk and reconstituting them into rendered images is inexpensive, so the binaries were kept as the archival source of truth and turned into images on demand. The conversion layer underneath was reused across three pieces of work: the offline tool itself, the live machine (for re-rendering past images on demand), and the data-ingestion and labelling tool (for displaying ingested images).

The rendering stack ships on field hardware inside a Docker image I extended from the existing Fortress platform base, with its own conda environment baked in.

Camera monitor screen

The ICON has two operator screens. The first is the main X-ray graphics UI, which is part of the image-processing stack described above. The second is a separate screen that gives the operator a live multi-camera view of the conveyor belt running through the machine, so they can follow each product as it passes through.

The frontend is a SvelteKit application (Svelte 3, Vite, TypeScript, the static adapter) running inside kiosk-mode Chromium on the Raspberry Pi. The cameras attached to the Pi are read directly by the browser through the native navigator.mediaDevices API: the app enumerates the connected devices, requests their streams, and lays them out as a primary view with thumbnail strips and a workflow for switching the main camera. A small Python service runs alongside the browser to collect logs from the frontend, so anything that goes wrong in the field can be inspected after the fact.

I owned the project end-to-end (the Svelte UI, the supporting Python service, and the on-device deploy pipeline), and the design that shipped was the result of seven prior iterations of Python-based streaming backends. I tried WebRTC, aiortc, HLS, GStreamer, FFmpeg, threading-driven bounded queues, and an asyncio producer-consumer architecture. Each had its own particular failure mode, but they all shared the same root problem: any Python layer between the camera and the browser added enough latency that the operator's view stopped being truly real-time, and a live production-line view has to be real-time.

The only architecture that met the latency requirement was a single-layer one: the browser reading the cameras directly and rendering them, with nothing in between. The shipping design dropped the Python streaming layer entirely and let the browser's native mediaDevices API do the work. The seven iterations were not wasted: they were how I knew no two-layer design would beat a single-layer one for this use case.

Field deployment

The whole system is delivered to a new machine as a single self-contained tarball. The tarball includes the Mambaforge installer, a conda-packed Python environment, the bundled npm packages for the SvelteKit frontend, and the application code itself: everything needed to run, with no internet connection required at any point. On the field Pi, an install script extracts the tarball, installs Mambaforge from the bundled installer, unpacks the environment with conda-unpack, builds the SvelteKit app from the bundled npm packages, and registers the startup script with crontab. From that point on, the Pi reboots into a screen session that launches the log server, builds the Svelte preview, and finally launches Chromium in kiosk mode pointing at the local app, so a field technician can deliver a working machine to a customer site without ever touching the network.

Data-ingestion and labelling tool

The problem here was upstream of the engineering itself. Any serious ML work on X-ray inspection would need labelled training data of a kind that did not yet exist in a usable form. I proposed building a tool to address that, R&D leadership backed the proposal, and I designed and built it.

The result was an internal web application that pulled time-windowed image batches from across Fortress's installed base of X-ray machines, fed the raw binaries through the existing C image-processing pipeline, and presented the converted images in a Svelte single-page application where operators recorded contaminant labels and the per-image context that came with them. The transport layer ran over SSH and was designed around the realities of pulling large image batches from machines on unreliable factory networks, with the kind of timeout, keep-alive, and recovery behaviour that long-running file pulls actually need to survive.

A FastAPI backend held the collection lifecycle together, persisted state to disk after every meaningful change, and exposed a simulation mode so that the same workflow could be exercised locally against pre-staged binaries with nothing live attached.

What an operator actually did, on a working day, was move through batches of recently captured images, identify whether a contaminant was present, and (if one was) say what kind, what size, and how it related to the product being inspected. The labelling schema was structured around food-X-ray realities: contaminant categories, size brackets, and subtypes where a category had distinct varieties an ML model might need to tell apart. A batch could carry multiple independent contaminants, because in production lines they often do.

Alongside the contaminant labels, every image carried the machine-state context it was captured under, which the system extracted automatically from the metadata the X-ray machine shipped next to the binary. None of that context had to be entered by the operator; it travelled with the data.

Building a labelling tool is one thing; building one whose output is actually useful as machine-learning training data is a different problem entirely, and the constraint the rest of the design had to be organised around. I drew on my data-science master's training to design the software around the failure modes — fuzzy labels, class drift, inconsistent units — that would otherwise sink a classifier trained on its output. As an operator labelled, the screen showed a running count of images with and without contaminants, colour-coded green when the two sides remained in proportion and red when they began to drift outside a tolerance, so the dataset did not tip silently towards one class while attention was elsewhere.
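A minimal sketch of the balance indicator described above. The function name, the even 50/50 target, and the tolerance value are assumptions made for illustration; the thresholds the tool actually used are not stated here.

```python
def balance_status(with_contaminant: int, without_contaminant: int,
                   tolerance: float = 0.15) -> str:
    """Return "green" while the two classes stay in proportion, else "red".

    Assumption: "in proportion" means the contaminant share stays within
    `tolerance` of an even split. The real rule may differ.
    """
    total = with_contaminant + without_contaminant
    if total == 0:
        return "green"  # nothing labelled yet, so nothing has drifted
    share = with_contaminant / total
    return "green" if abs(share - 0.5) <= tolerance else "red"
```

Called after every label is saved, a function of this shape gives the frontend a single value to colour the running count with.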

Operators were free to enter weights and dimensions in whatever units felt natural for the product in front of them (grams or kilograms, ounces or pounds, centimetres or metres, inches or feet), but at export time the system normalised every entry into a single canonical form (grams for weight, centimetres for length), so the dataset that fed the training pipeline never carried inconsistent units into training.
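The export-time normalisation can be sketched as a pair of lookup tables. The unit abbreviations and the function name are illustrative assumptions; only the canonical targets (grams and centimetres) come from the description above.

```python
# Conversion factors to the canonical units: grams and centimetres.
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "oz": 28.349523125, "lb": 453.59237}
TO_CENTIMETRES = {"cm": 1.0, "m": 100.0, "in": 2.54, "ft": 30.48}


def normalise(value: float, unit: str) -> tuple:
    """Convert one operator entry into (value, canonical_unit)."""
    unit = unit.strip().lower()
    if unit in TO_GRAMS:
        return value * TO_GRAMS[unit], "g"
    if unit in TO_CENTIMETRES:
        return value * TO_CENTIMETRES[unit], "cm"
    # Refuse to guess: an unknown unit should never reach the dataset.
    raise ValueError(f"unknown unit: {unit!r}")
```

For example, `normalise(2, "kg")` returns `(2000.0, "g")`. Raising on an unknown unit, rather than passing the value through, keeps a mistyped entry from reaching the training data unnoticed.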

The directory structure that came out the other end was already split along the contaminant / no-contaminant lines that downstream training and evaluation actually use, with each image present in the formats a model would commonly need: sixteen-bit PGM, eight-bit PNG and JPEG, and a Sobel-edge variant for models that work on edge features rather than raw pixels. The deliverable was the tool itself, not any one dataset that came out of it: new products and new contaminant types come through the inspection line continuously, and the tool was built to make it possible to assemble a fresh training dataset for any of them, in the same shape and with the same per-image context, on demand.

Background processing that did not freeze the app

Running the C image-processing pipeline against a full batch of images is heavy work, and getting it to run alongside a live user-facing server in a single self-contained process was harder than it first appeared. FastAPI's built-in BackgroundTasks fires only after the response is returned and offers no way to surface progress or know when the work has finished. Distributed task queues like RQ and Celery would have introduced operational complexity that was not worth taking on for a tool meant to run as a single process on a developer's or operator's machine.

The approach that actually worked was to push the heavy converting work onto a worker thread, leaving the rest of the application free to keep responding. The operator could carry on labelling, browsing past collections, or starting a new batch, while the earlier batch finished converting behind the scenes. A second thread ran alongside, keeping the tool's status fresh so the frontend always had something current to show whenever it came back to ask.
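A simplified sketch of the worker-thread pattern, using only the standard library. The class and method names are hypothetical, and the second status-refreshing thread is reduced here to a lock-protected counter that a request handler can read at any time.

```python
import threading


class ConversionJob:
    """Runs a batch conversion on a worker thread and exposes its progress."""

    def __init__(self, items, convert):
        self._items = list(items)
        self._convert = convert          # stand-in for the heavy C pipeline call
        self._lock = threading.Lock()
        self._done = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        for item in self._items:
            self._convert(item)          # heavy work stays off the request path
            with self._lock:
                self._done += 1

    def status(self) -> dict:
        """Cheap snapshot that a request handler can return immediately."""
        with self._lock:
            done = self._done
        total = len(self._items)
        return {"done": done, "total": total, "finished": done == total}

    def wait(self, timeout=None) -> None:
        self._thread.join(timeout)
```

A handler would call `start()` when the batch arrives and `status()` whenever the frontend polls. A plain thread is enough in this sketch on the assumption that the conversion spends its time inside the C pipeline rather than in Python bytecode.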

Field options that grew with the dataset

A labelling tool used over years and across many operators has a quiet design problem: how to keep the categorical labels consistent without forcing every new contaminant type, product, customer site, or operator name to be added through code. I solved this by making the field options self-extending. On startup, the server walks every collection that has ever been saved and merges every contaminant type, product, company, operator, and machine location that has ever been entered into the autocomplete dropdowns the next operator will see. As soon as a collection containing a new entry is saved, that entry becomes part of the available options the next time the server starts. Over the lifetime of a labelling programme the dataset emerges with consistent string labels rather than the proliferation of typos and near-duplicates ("ferrous", "Ferrous", "ferrous metal", "ferous metal") that an unconstrained free-text field would produce. The tool keeps pace with whatever new categories the business encounters in the field, without the engineering team having to ship a release for every new product or contaminant variant.
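The startup walk can be sketched as follows. The field keys, the `collection.json` file name, and the one-directory-per-collection layout are all assumptions made for the example; the text above names the categories but not how they are stored.

```python
import json
from pathlib import Path

# Hypothetical field keys for the five categories named in the text.
FIELDS = ("contaminant_type", "product", "company", "operator", "machine_location")


def collect_field_options(root) -> dict:
    """Union every value ever saved under `root` into per-field option lists."""
    options = {field: set() for field in FIELDS}
    # Assumed layout: one sub-directory per collection holding collection.json.
    for path in sorted(Path(root).glob("*/collection.json")):
        collection = json.loads(path.read_text(encoding="utf-8"))
        for batch in collection.get("batches", []):
            for field in FIELDS:
                value = batch.get(field)
                if value:
                    options[field].add(value)
    # Sorted lists give the dropdowns a stable order between restarts.
    return {field: sorted(values) for field, values in options.items()}
```

Because the options are rebuilt from the saved collections themselves, there is no separate list of categories to maintain or to fall out of step with the data.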

Surviving unreliable networks

The X-ray machines I was pulling from sat on factory networks I had no control over, and links could drop in the middle of a batch. I built a continuous background loop that watched the SSH connection, ping-tested it at a regular interval, and re-established it whenever it went down, plus a wrapper in front of every SSH-using operation that ran the same check just before using the link. The labelling tool stayed up through network outages, and when the link came back the next batch pull just worked.
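The wrapper half of that design can be sketched as a decorator. `Link` is a stand-in for a real SSH session, and its `is_alive` and `connect` methods are hypothetical names, not the API of any particular SSH library.

```python
import functools


class Link:
    """Stand-in for an SSH session; the method names are hypothetical."""

    def __init__(self):
        self.alive = False
        self.connects = 0

    def is_alive(self) -> bool:
        return self.alive

    def connect(self) -> None:
        self.connects += 1
        self.alive = True


def ensure_connected(method):
    """Check the link just before the wrapped operation, reconnecting if needed."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self.link.is_alive():
            self.link.connect()
        return method(self, *args, **kwargs)
    return wrapper


class Puller:
    def __init__(self, link: Link):
        self.link = link

    @ensure_connected
    def pull_batch(self, name: str) -> str:
        return f"pulled {name}"   # a real version would copy files over the link
```

With the check living in the decorator, each SSH-using method stays free of reconnection logic, and a new operation only needs the decorator to get the same protection.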

Field-aware disk monitoring

The bottleneck on a long collection is rarely the labelling tool. It's disk space on a machine in the field. Before each batch pull I ran a disk-usage check on the remote X-ray machine over SSH, and stopped the collection cleanly if the machine was running low rather than letting the operator initiate a batch that would fail mid-pull and leave a half-pulled folder behind.
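One way to sketch the pre-pull check is to parse the output of `df -P`, whose POSIX column layout is fixed (filesystem, total blocks, used, available, capacity, mount point). The function names and the ten-percent threshold are assumptions, and the command the tool actually ran is not stated above.

```python
def free_fraction(df_output: str) -> float:
    """Parse `df -P <path>` output and return the free fraction of the filesystem."""
    lines = [line for line in df_output.strip().splitlines() if line.strip()]
    fields = lines[-1].split()            # last line is the data row
    total_blocks = int(fields[1])
    available_blocks = int(fields[3])
    return available_blocks / total_blocks


def can_start_pull(df_output: str, min_free: float = 0.10) -> bool:
    """Refuse to start a batch pull when the remote disk is nearly full."""
    return free_fraction(df_output) >= min_free
```

The string passed in would be whatever the remote command printed over SSH. Keeping the parsing separate from the transport makes the check easy to exercise without a machine attached.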

Three ways to take the data away

Different downstream consumers want the same collection in different shapes, so I built three download modes: raw (binaries only, for storage), processed (with the rendered images, for review), and with-evaluation-split (pre-stratified into contaminant and no-contaminant halves, ready to drop into a training pipeline). The same collection serves an archive, a reviewer, and a training run without anyone re-shaping the directory layout in between.

Reports for non-engineering stakeholders

While engineering teams happily read JSON and directories of images, project managers and customers want something more readable. Alongside every collection, the tool emits a Word document that brings the whole record into a single file: collection ID, dates, per-batch context (notes, contaminants, product dimensions, machine location), and thumbnails of every X-ray image rendered inline. Anyone in the business can see what was collected without touching the file system.

Multiple contributors per collection

In this tool, a collection is the labelled dataset for one product: the captured images, the labels recorded against them, and what the machine was doing at the time. A collection is built up batch by batch, and it is rarely the work of one person at one moment in time.

What the first operator entered into a collection is kept untouched, and nothing they recorded can be quietly overwritten later. The design still allows for additions: another operator can extend an existing collection with new batches, and a reviewer working through the dataset can leave their own notes alongside the original ones. The original record stays intact; the new notes carry everything added by later contributors.

Operator-customisable display

On a busy screen full of batches, different operators care about different details within each batch. The settings panel exposes around two dozen batch-card fields that can be toggled independently, so that each operator can shape the batch view around what actually matters in a given pass through the data.

Working with cross-version field hardware

X-ray machines in the field run a mix of firmware versions, and the autosave mechanism they expose differs between them. The labelling tool detects which version it is connected to and, on older firmware, emulates the newer behaviour itself, so deployments do not need to wait for the machines to be upgraded.

The labelling tool ships inside the same Docker image as the rendering stack, in its own conda environment, with the C extensions compiled and the frontend npm-built into the image at build time.

Pill-dispensing control

A customer-facing product Fortress shipped to an external pharmaceutical client. The brief was substantial in scope: an automated tablet dispenser the customer was building needed a control application that could run on an operator's PC, talk to the dispensing hardware reliably, expose a usable interface for counting and calibration, and ship as a versioned product into a regulated pharmacy environment. I designed and built it, working directly with the customer's technical leadership and head of R&D to land on a shared plan.

The application drives the dispenser over its own low-level binary protocol. I chose Flask for the web layer because it kept the whole application in Python, which gave me direct and ergonomic access to the lower-level networking the dispenser required. Python's selectors module provided non-blocking I/O over a single event loop; I used it to keep several connections open to the dispenser at once and to dispatch each incoming message to its handler based on a preassigned message-type identifier, all while serving the operator's browser without stalling on either side.
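A minimal sketch of type-based dispatch over the selectors module. The frame layout (a one-byte type followed by a two-byte length) is invented for the example, since the dispenser's real protocol is not described here, and the class and function names are likewise hypothetical.

```python
import selectors
import socket
import struct

HEADER = struct.Struct("!BH")  # hypothetical frame: 1-byte type, 2-byte payload length


def make_frame(msg_type: int, payload: bytes) -> bytes:
    return HEADER.pack(msg_type, len(payload)) + payload


class Dispatcher:
    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.handlers = {}   # message type -> callable(payload)
        self.buffers = {}    # socket -> bytearray of bytes not yet parsed

    def on(self, msg_type: int, handler) -> None:
        self.handlers[msg_type] = handler

    def register(self, sock: socket.socket) -> None:
        sock.setblocking(False)
        self.buffers[sock] = bytearray()
        self.selector.register(sock, selectors.EVENT_READ)

    def poll(self, timeout: float = 0.1) -> None:
        """Service every readable connection once, then return."""
        for key, _ in self.selector.select(timeout):
            sock = key.fileobj
            data = sock.recv(4096)
            if not data:                       # peer closed the connection
                self.selector.unregister(sock)
                del self.buffers[sock]
                continue
            buf = self.buffers[sock]
            buf.extend(data)
            while len(buf) >= HEADER.size:
                msg_type, length = HEADER.unpack_from(buf)
                end = HEADER.size + length
                if len(buf) < end:
                    break                      # wait for the rest of the frame
                payload = bytes(buf[HEADER.size:end])
                del buf[:end]
                handler = self.handlers.get(msg_type)
                if handler is not None:
                    handler(payload)
```

Each connection keeps its own buffer, so a frame split across two reads is reassembled before its handler runs, and several connections can share the one loop.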

The application itself has four main capabilities.

The pill-counting workflow lets the operator pick a drug from the database, enter how many pills they want counted, and click Count; the application sends the relevant configuration to the dispenser, asks it to dispense, and reports back the count along with any pills the machine flagged as questionable.

The calibration workflow is how a new drug is added to the system: the operator picks a calibration template and a sample size, and the application walks the dispenser through a sequence (reset the machine, send the chosen parameters, wait for the dispenser to measure and report its own metrics, merge those metrics into the drug's record, send the finalised configuration back to the hardware), reporting progress at each step as it happens.

The parameter-database editor lets the operator edit the records that counting and calibration both depend on, either one field at a time or in bulk through a CSV upload that the application normalises and merges automatically.

The live message log records every message that has crossed between the application and the machine, in order, with timestamps, so the operator or an engineer reviewing afterwards can see exactly what was said and when.

The drug database is an in-memory pandas DataFrame loaded from a CSV at startup and persisted back to disk after every meaningful change. Counting reads the drug's row; calibration writes the dispenser's measurements into it. The same DataFrame backs the parameter-database editor: every edit, single-field or bulk CSV upload, goes through the same merge path with column-name normalisation applied first.

State updates from the dispenser are pushed to the browser through a Flask Server-Sent Events stream, so the screen reflects the state of the physical machine in close to real time without the page having to refresh or repeatedly ask. The dispenser's numeric state codes are decoded into plain English on the application side before they reach the browser, so an error from the machine arrives at the operator's screen as a sentence they can act on rather than as a code they have to look up. The application also enforces safety on the operator's behalf: it refuses to send a command to a machine that is in the middle of another operation, and marks the dispenser as disconnected if it has been silent for more than a minute.
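The decoding, the busy guard, and the silence timeout can be sketched together. The state codes and their wording below are invented placeholders, since the dispenser's real codes are not given here. The sixty-second limit comes from the description above, and `sse_event` follows the standard Server-Sent Events framing of a `data:` line terminated by a blank line.

```python
import time

# Hypothetical codes and messages, for illustration only.
STATE_TEXT = {
    0: "Idle and ready.",
    1: "Counting in progress.",
    2: "Calibration in progress.",
    9: "The machine reported a fault. Reset it before continuing.",
}
BUSY_STATES = {1, 2}
SILENCE_LIMIT_S = 60.0


class DispenserState:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.code = 0
        self.last_heard = clock()

    def update(self, code: int) -> str:
        """Record a state message and return its plain-English form."""
        self.code = code
        self.last_heard = self._clock()
        return STATE_TEXT.get(code, f"Unknown state code {code}.")

    @property
    def connected(self) -> bool:
        return self._clock() - self.last_heard <= SILENCE_LIMIT_S

    def may_send_command(self) -> bool:
        """Refuse commands while the machine is busy or has gone silent."""
        return self.connected and self.code not in BUSY_STATES


def sse_event(text: str) -> str:
    """Format one Server-Sent Events message (single-line data only)."""
    return f"data: {text}\n\n"
```

Passing the clock in as a parameter keeps the timeout testable without waiting a real minute.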

Multilingual HMI pipeline

Each Fortress machine has a touchscreen, the human-machine interface (HMI), which must remain translated across more than forty languages, and that translation work is done by professional translators who are not engineers and whose working tool is Excel. The source of truth for the translations sits in a deeply nested JSON file the machine reads at runtime: six different content shapes inside one file, each with its own structure of arrays, nested dictionaries, language-keyed sub-objects, and per-record fields like name and description. Editing this file directly is not a workable proposition for a non-engineer.

The first design decision was to build the editing experience around the translator's existing toolset. The pipeline takes each of the six content shapes and exports it as a separate Excel file, with a column layout that mirrors the underlying record. Each file is locked at the sheet level so the only thing the translator can edit is the column they are translating; the English reference text and the structural identifiers stay read-only, and a translator cannot accidentally damage the JSON's structure no matter what they do. When the files come back, the pipeline reads each one, compares every cell against the existing JSON, applies only the changes that are actually different, records each change as it goes, and writes the updated JSON out as a new file alongside the original.
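The import step (compare, apply only real changes, record each one, leave the original untouched) can be sketched on a deliberately flattened record shape. The real JSON is deeply nested, so the `id` key and the flat language-keyed dictionaries here are simplifying assumptions.

```python
def apply_changes(records: dict, rows: list, language: str):
    """Apply translator edits to a copy of `records`; return (updated, change_log)."""
    # Work on a copy so the original structure is never modified in place.
    updated = {key: dict(texts) for key, texts in records.items()}
    log = []
    for row in rows:
        key, new_text = row["id"], row[language]
        if key not in updated:
            continue                      # unknown identifiers are ignored
        old_text = updated[key].get(language)
        if new_text != old_text:          # apply only genuine differences
            updated[key][language] = new_text
            log.append((key, old_text, new_text))
    return updated, log
```

The returned log is what makes a run auditable: it lists exactly which identifiers changed, and from what to what.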

The next challenge was making it plain to the translator, at a glance, what was still untranslated inside the Excel file they were already working in. The starting point was a simple per-cell language check. Through iteration it became clear that no single detector holds up against real HMI content: single-letter cells, numbers and units, all-caps acronyms, abbreviations, partly-translated rows where only a word or two has changed, and cells where the translator has chosen a valid synonym rather than the obvious word all look indistinguishable from untouched English.

What emerged was a four-pass detector. The first checks the cell's language directly. The second translates it back into English and compares the result to the original. The third inspects whatever still looks ambiguous, word by word. The fourth checks against known synonyms of the English word, in case the translator has chosen a perfectly valid alternative. A cell is flagged for the translator only when every pass agrees it still needs translating.

Cells that survive all of those passes light up in bold in the Excel file, and the translator can see at a glance what is actually outstanding.
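The combining rule, where a cell is flagged only when every pass agrees, can be sketched independently of the passes themselves. The two stub passes below are placeholders that illustrate the interface; they are not the four detectors described above, which rely on external services.

```python
from typing import Callable, Sequence

# A pass returns True when the cell still looks untranslated to that pass.
Pass = Callable[[str, str], bool]


def needs_translation(english: str, cell: str, passes: Sequence[Pass]) -> bool:
    """Flag the cell only if every pass says it still needs translating."""
    # all() stops at the first pass that clears the cell, so later
    # (typically more expensive) passes are skipped for that cell.
    return all(check(english, cell) for check in passes)


def identical_to_source(english: str, cell: str) -> bool:
    """Placeholder pass: the cell is unchanged from the English reference."""
    return english.strip().casefold() == cell.strip().casefold()


def make_allow_list_pass(accepted: set) -> Pass:
    """Placeholder pass: clear any cell found in a set of accepted values."""
    def check(english: str, cell: str) -> bool:
        return cell.strip() not in accepted
    return check
```

Ordering the passes from cheapest to most expensive means most cells are cleared before any network-backed check has to run.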

The challenge here was volume. A full HMI contains thousands of pieces of text, and preparing the translator's working files for a new language means checking every one of them. The detector relies on several external services: Google Translate for language detection and translation, and Wordhoard for synonyms. Without caching, a single run would issue many thousands of network requests and hit the rate limits on those services long before finishing. The answer was a layered on-disk cache sitting between the pipeline and the network, with one layer for language detections, one for translations, and one for synonym lookups. The cache grows with every run. Each new word and translation it encounters is saved on first contact, so every subsequent run finds more of what it needs already there and finishes faster than the one before.
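One layer of such a cache can be sketched as a JSON file that is read at startup and rewritten on every miss. The class name, the file-per-layer layout, and the `fetch` callback are assumptions; `fetch` stands in for whichever network call the layer fronts.

```python
import json
from pathlib import Path


class DiskCache:
    """One JSON file per layer, for example detections, translations, synonyms."""

    def __init__(self, directory, layer: str):
        self.path = Path(directory) / f"{layer}.json"
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def get_or_fetch(self, key: str, fetch):
        if key not in self.data:
            self.data[key] = fetch(key)   # the network is only hit on first contact
            # Persist immediately so an interrupted run keeps what it learned.
            self.path.write_text(
                json.dumps(self.data, ensure_ascii=False), encoding="utf-8")
        return self.data[key]
```

Rewriting the whole file on every miss is the simplest approach that survives an interrupted run. A larger cache might batch its writes or move to a small database, at the cost of more moving parts.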

The remaining design constraint was the user themselves: a professional translator who wasn't necessarily technical, and might not even have Python installed on their machine. The install-and-run experience had to work for that user, unaided. Today this kind of problem is largely solved by uv; at the time it was not. I bridged the gap with a one-step installer plus a launcher the translator can double-click from the Finder. The installer detects whether Miniforge is already on the machine, downloads and installs it if not, and creates the conda environment from the bundled YAML file. The launcher activates the environment and starts a step-by-step prompt that walks the translator through every choice (which JSON file, which target language, which output format, whether to flag cells that still need translating) and validates each input as it goes. The result is a tool a translator can install, launch, and use without engineering support.