Has it been a year already? Wow.
I started working on PixelBox in 2021 as a way of managing my gamedev assets and memes. In short, there were too many times where I found myself scrounging through the bottomless pile of photos on my hard drive for the perfect response, often forgetting the very thing for which I was searching in the process. A similar issue was one of “have I already downloading this asset from the store”?
To solve both of these issues I started working on PixelBox. It’s a simple tool designed to provide quick search for related images, plus title and, recently, exif search.
This post is mostly a retrospective on the things that went nicely, things that didn’t work out, things that are making me nervous, and places I’d like to take the application.
What Went Well
PixelBox was build with Rust, egui, Tract-ONNX, and PyTorch. I am quite happy with all of these options, as they’ve proven to add very little in the way of head-to-wall percussion and a lot of functionality. In particular, I am very much a fan of egui now that I’ve gotten used to it. The Rust ecosystem doesn’t have very many full-fledged cross-platform gui toolkits. There are several which _claim_ to be cross-platform but will fail to build on Windows or lack some pretty basic functionality. The two that I know of which live up to the hype are FLTK-RS and egui. Neither is really native look-and-feel, but ultimately I decided this was an acceptable trade-off for ease of use and memory-footprint. Building a UI in egui takes some getting used to, and I don’t have quite as much control over the layout as I might want, but Godot this is not, and ultimately I would have made a mess of it given the option.
What Didn’t Go Well
Fearless Concurrency Only Applies to Rust
The longest running and most painful issue by far was an issue with SQLite that would crash the application once every twenty or so searches. To understand why, we need to touch on how the image search works.
When PixelBox indexes an image, it passes the RGB info through a neural network (via ONNX) and gets back a low-dimensional embedding on the unit hypersphere. If that’s too complicated, just imagine two points, x and y, on the surface of a ball. If that’s too simple, just imagine a vector of N digits from -1 to 1 whose squared sums total to one. When we want to search for a similar image, we iterate over our index* and perform cosine_similarity on each image, returning a distance**. Cosine similarity is a user-defined function written in Rust and exposed to the SQLite driver. It’s quite fast, searching over ~100k images in about 100ms on my system.
* Yes, technically iterating over a database isn’t best practice, but we’re not here to solve the general ANN problem and this is plenty fast at the scale that we’re running.
** Technically it’s returning a similarity, but we can remap -1 to 1 into a 0 to infinity range.
Once every twenty searches, however, the function would disappear.
How does one _lose_ a function? Excellent question. For a while I thought that perhaps Rust was reclaiming and shuffling around a pointer to the method since I didn’t have anything pinned, but that didn’t make sense. After a lot of avoiding the problem, I found a solution purely by happenstance. Chalk one up for procrastination.
The application was running in the background and indexing my dev directory. I left for a cup of coffee and when I came back to run a search, the application crashed. Eyebrows raised. I started backgrounding the app for longer and longer periods, eventually finding that the application would trigger the exception and fault after about one minute.
It turned out that when I was adding the function definition to SQLite, it was available only to the current connection in the pool. I had gone with a connection pool because I wanted any thread to be able to write to the index — easier to parallel spider the directories. This was the problem.
It took a bit of rewriting. I stripped out the SQLite connection pooling and added a single thread which would read from the inbound image stream and index the images. The issue went away and the application was even faster.
Image Search is Hard
PixelBox was built based on the idea that one could train a network to learn the representation of an image via contrastive learning. We take a bunch of images and randomly match them with themselves or with another image. We end up with a big list of (img_x, img_y, same=Yes/No). Before we push the images through the network, we randomly corrupt each image so they look different. This means assigning a random rotation or changing colors or drawing blocks on top. Then we have our network learn a function which takes two images and spits out same=Yes/No. Voila: contrastive learning. The trick, though, is when it’s finally time to use this network, we drop that last layer which takes our embedding and spits out the ‘yes/no’ value, leaving only the raw numbers in the representation. The assumption is this latent representation should be close for two images which are also similar.
Does it work? Sorta’. There’s a lot of room for improvement. I could use an out-of-the-box solution like EfficientNet or ImageNet, but I’m stubborn and want to do my own thing. It works well enough for finding near and exact duplicates.
Things That Make Me Nervous
Towards the end of the last development cycle, more and more of the internals of the app ended up getting passed around the UI as something of a global state. The final commit before shipping even exposed two engine-internal values, max_search_distance and min_image_similarity. Exposing internal methods tends to make for tight coupling of pieces and opens the door to spaghetti-code in the future. I want to be mindful of this with further development, but I still need to strike a balance of pragmatism and good practice. I’ve seen too many cases of people fetishizing OOP and building out software with a practice in mind that neglects the intention of the practice.
Lastly, I’m not worried about the possibility that nobody uses PixelBox. It’s useful to me and I’m sharing it in the hopes it will be useful to others. The thing that worries me is that a _lot_ of people find it useful and I myself can’t muster the enthusiasm to continue to support it.
Things That Make Me Hopeful
The application is in a good state. Search and indexing are fairly robust and the UI is not completely terrible. I’m looking forward to implementing three key features which will make my life easier:
Search by Text in Image
OCR is a tricky problem. In 2020, Du et. al. created Paddle-Paddle OCR — an open-source multi-stage OCR model. It used synthetic data to train the detector, angle detector, and recognition system. I tried this approach in January of 2018 and couldn’t make it work because I’m dumb: https://github.com/JosephCatrambone/ocr_experiment/blob/master/Generated%20Character%20Recognition.ipynb
Age has made me more stubborn, however, and I’m determined to make a useful OCR library for Python and Rust. I’ll dust off that code and give it my all to make it work.
Search by Description
Modern machine learning is beautiful. We can take an arbitrary image and, from it, generate a plaintext description. I want the user to be able to search on the contents of an image using plain English or, barring that, search on automatically generated tags.
Search Inside Zip Files
This one’s big for my use case. I have plenty of asset files downloaded from stores and most of them are compressed and chilling on my disk somewhere. If PixelBox could search inside of these so I don’t end up re-downloading a zip file for the n-th time it would save me a lot of headache.
Wrapping Up
I’m relieved to make this release. There’s still work to be done but things are in a good enough state for me to take some time and relax. I hope people find it useful and I hope this writeup provides some motivation or inspiration to work on your own creative endeavors.