Mhess is a stupid messy chess game about giving your pieces superpowers and fighting robot masters. It’s been the center of my attention for the past month or so, and it seemed fitting to share a few of the things I’m learning about the good, the bad, and the ugly of chess engine programming.

As a caveat, I deliberately did not look at any existing implementations. I was hoping that avoiding them would keep me free of preconceived notions about the best approach and lead to a more novel, interesting implementation. If you’re interested in writing your own chess game, I would recommend looking at The Chess Programming Wiki.

Implementing a chess game and implementing a chess engine are very different problems. A chess game needs a chess engine, but the design constraints of a fully functional game are slightly different from those of a pure engine. For starters, something I had not thought about when building out the engine component was how difficult it would be to keep the internal board representation and the external board representation in sync. I spent a significant amount of time debugging why things in the game world would disappear or refuse to move. This challenge was compounded, or perhaps made more subtle, by what I think were, in hindsight, bad decisions.

Bad Decision 1: Game State representation.

I use the following to represent a game state. It’s designed to be lightweight and quick to clone so that when we start looking at trees of depth 14 we don’t blow out our compute time.

#[derive(Clone, PartialEq, Eq, Hash)]
pub struct GameState {
	pub board_width: u8,
	pub board_height: u8,
	pub board_holes: Vec<(u8, u8)>, // (x, y) squares carved out of the board
	pub board_state: Vec<(u8, u8, Piece)>,
	pub en_passant_space: Option<(u8, u8)>,
	pub black_can_queenside_castle: bool,
	pub white_can_queenside_castle: bool,
	pub black_can_kingside_castle: bool,
	pub white_can_kingside_castle: bool,
	pub halfmove_count: u8,
	pub fullmove_count: u8,
	pub current_player: PlayerColor,
}

The board_state is worth highlighting. I had started with a vector of (u16, Piece) tuples. That proved to be an unnecessary optimization, since we always immediately converted the index into x,y. Note that there’s a deliberate choice to have x,y in the board state and NOT inside the piece. This is done to keep cloning the board state a shallow operation and really fast: generating 11 million moves takes less than 700ms in debug mode.
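
A consequence of this sparse layout is that finding a piece is a linear scan over the tuples. A hypothetical lookup helper (not Mhess’s actual code) might look like:

impl GameState {
	// Hypothetical helper: scan the sparse (x, y, Piece) list for a square.
	pub fn piece_at(&self, x: u8, y: u8) -> Option<&Piece> {
		self.board_state
			.iter()
			.find(|(px, py, _)| *px == x && *py == y)
			.map(|(_, _, piece)| piece)
	}
}

Linear scans sound slow, but with only a few dozen pieces a scan over a contiguous Vec tends to be perfectly fine, and it keeps the clone shallow.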

A piece does not know its position in the world. Instead, a move generator takes a game state and produces a bunch of “gamemove” items, and each gamemove takes a game state and produces from it a new state. If this whole thing seems roundabout, that’s because it is. The complexity of defining moves is generally localized to the gamemove.rs code, but there’s a LOT of complexity to be localized. Implementing castling has been a mess because checking whether pieces are being attacked involves generating all of the possible moves for the next state. Similarly, checking for checkmate (or check) involves first generating the next moves and then determining which of them would allow the king to be captured. This is not a great solution.
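
In rough strokes, the shape of the system is something like the following sketch (names are illustrative; the real gamemove.rs is messier):

pub trait GameMove {
	// Apply this move to a state, producing a brand-new state rather
	// than mutating in place. Cheap thanks to the shallow clone above.
	fn apply(&self, state: &GameState) -> GameState;
}

// A move generator needs only the current state to enumerate moves.
pub fn generate_moves(state: &GameState) -> Vec<Box<dyn GameMove>> {
	let mut moves: Vec<Box<dyn GameMove>> = Vec::new();
	for (_x, _y, _piece) in &state.board_state {
		// ...push one GameMove per legal destination of this piece...
	}
	moves
}

Check detection then becomes: apply a candidate move, call generate_moves on the resulting state, and see whether any reply captures the king. That nested generation is exactly where the castling and checkmate mess comes from.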

The next experiment will be a lengthy one and will probably involve trying to switch from the current gamestate + move generator to a gamestate + heavy piece system. If that works out, or if it doesn’t, I’ll be sure to post an update here.

Team Dogpit is hosting Cultivation Jam! Full details are available on the itch.io page.

The special hidden theme is ‘Household Goods’.

The idea that keeps sticking in my head is a clone of Stardew Valley or Animal Crossing, but aboard a generation ship. You need to take care of the plants and fix pieces of the ship using only the tools you can scrounge up locally. Your goal is to prepare enough food to feed a crew of 100 people for one season.

MVP:

  • Player can click to move around.
  • Clicking on nearby plants will give the option to tend to them, which means checking hydroponics levels, looking for signs of rot/blight, taking samples, and pollinating.
  • Plants have real life stages: seedling, sapling, grown, flowering, dying, dead.
  • Plants have real life needs, like sun, nutrients, space, and temperature. (A rough data sketch follows this list.)
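
Sketching that plant model, in Rust just because that’s what the rest of this blog uses (the jam game itself lives in Godot, so this is only the conceptual data shape, not real project code):

enum PlantStage { Seedling, Sapling, Grown, Flowering, Dying, Dead }

struct PlantNeeds {
	sun: f32,         // hours of light per day
	nutrients: f32,   // concentration in the hydroponics feed
	space: f32,       // square meters per plant
	temperature: f32, // target degrees Celsius
}

struct Plant {
	stage: PlantStage,
	needs: PlantNeeds,
	hydration: f32, // checked when the player tends the plant
	blighted: bool, // rot/blight flag surfaced during tending
}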

Part of the challenge might be in balancing time taking care of the plants with time spent exploring the ship’s waste and parts recycling in the hopes of finding new and interesting components. Diving deeper into those depths will let you find more stuff but increases the risk of getting squished by falling debris or maybe eaten by a sentient dishwasher or something. But that’s too far ahead. For now, let’s get a player walking around.

Day 1-3:

Well, the click-to-move-around scheme turned out to be a little dumb. Not only does it require that we figure out pathing (not hard, but architecturally tricky to decide on), it doesn’t make sense! If someone has to click on something to move to it, why can’t they just click on it!?

I replaced that with a character controller and the normal WASD + controller support.

I started and stopped the inventory a dozen times because every solution felt hacky. I have a single Inventory scene which holds a bunch of items. Each item has an InventoryItem sub-object. When something is in the inventory, the InventoryItem node gets pulled out and the parent (the actual world item) is shoved into the InventoryItem’s world object slot. I had each InventoryItem also inherit from NinePatchRect because I wanted a nice background, but this is a hassle, so I’m making them TextureButtons. Instead, the inventory itself will draw the boxes and will highlight the destination on mouse moves. That’s more sensible as far as I’m concerned.

And I made a robot:

Day 4-5:

Only had about 30 minutes per night to work on things. Spent most of the time working on drag-and-drop and cross-inventory dragging. There are still some bugs, but now one can drag items from one inventory into another. The inventory object emits a signal when the cursor moves out and when it moves back in. It feels messy, but it’s self-contained at least and makes proper use of signals.

Left: Player inventory. Right: Some other inventory.

Day 6-8:

An evening setback from an unexpected social outing (no regrets) was followed by a productive stint of fixing things and changing around how the UI is connected together. My efforts to detach the UI display from the UI contents didn’t pan out because the UI contents still need position info, which means they need to be inside a tree, which means they need to be owned. Instead, I have a UI Exchange interface which takes two inventories and allows a user to click and drag things between them. This turned out to be really nice. I also added a Global which tracks which UI screen is open and prevents the player from moving if they’re looking at an inventory. I started work on saving and loading, since that will be required eventually anyway. While I’d intended to start on the dungeon-crawling aspect, I got a little sidetracked with stashing and unstashing the different parts of the level.

Has it been a year already? Wow.

I started working on PixelBox in 2021 as a way of managing my gamedev assets and memes. In short, there were too many times when I found myself scrounging through the bottomless pile of photos on my hard drive for the perfect response, often forgetting the very thing for which I was searching in the process. A similar issue was one of “have I already downloaded this asset from the store?”

To solve both of these issues I started working on PixelBox. It’s a simple tool designed to provide quick search for related images, plus title search and, recently, EXIF search.

This post is mostly a retrospective on the things that went nicely, things that didn’t work out, things that are making me nervous, and places I’d like to take the application.

What Went Well

PixelBox was built with Rust, egui, Tract-ONNX, and PyTorch. I am quite happy with all of these choices, as they’ve added very little in the way of head-to-wall percussion and a lot of functionality. In particular, I am very much a fan of egui now that I’ve gotten used to it. The Rust ecosystem doesn’t have very many full-fledged cross-platform GUI toolkits. There are several which _claim_ to be cross-platform but will fail to build on Windows or lack some pretty basic functionality. The two that I know of which live up to the hype are FLTK-RS and egui. Neither has a truly native look-and-feel, but ultimately I decided this was an acceptable trade-off for ease of use and memory footprint. Building a UI in egui takes some getting used to, and I don’t have quite as much control over the layout as I might want, but Godot this is not, and ultimately I would have made a mess of it given the option.

What Didn’t Go Well

Fearless Concurrency Only Applies to Rust

The longest-running and most painful problem by far was an issue with SQLite that would crash the application once every twenty or so searches. To understand why, we need to touch on how the image search works.

When PixelBox indexes an image, it passes the RGB info through a neural network (via ONNX) and gets back a low-dimensional embedding on the unit hypersphere. If that’s too complicated, just imagine two points, x and y, on the surface of a ball. If that’s too simple, just imagine a vector of N numbers from -1 to 1 whose squares sum to one. When we want to search for a similar image, we iterate over our index* and perform cosine_similarity on each image, returning a distance**. Cosine similarity is a user-defined function written in Rust and exposed to the SQLite driver. It’s quite fast, searching over ~100k images in about 100ms on my system.
* Yes, technically iterating over a database isn’t best practice, but we’re not here to solve the general ANN problem and this is plenty fast at the scale that we’re running.
** Technically it’s returning a similarity, but we can remap -1 to 1 into a 0 to infinity range.
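
For the curious, registering a UDF like this through rusqlite looks roughly like the following. This is a sketch, not PixelBox’s exact code: the function name and the blob layout (little-endian f32s) are assumptions, and it needs rusqlite’s “functions” feature.

// Sketch: a cosine-similarity scalar function for SQLite via rusqlite.
use rusqlite::functions::FunctionFlags;
use rusqlite::{Connection, Result};

fn register_cosine_similarity(conn: &Connection) -> Result<()> {
	conn.create_scalar_function(
		"cosine_similarity",
		2, // args: stored embedding blob, query embedding blob
		FunctionFlags::SQLITE_UTF8 | FunctionFlags::SQLITE_DETERMINISTIC,
		|ctx| {
			let a: Vec<u8> = ctx.get(0)?;
			let b: Vec<u8> = ctx.get(1)?;
			// Embeddings sit on the unit hypersphere, so the dot
			// product alone IS the cosine similarity.
			let dot: f32 = to_f32s(&a).zip(to_f32s(&b)).map(|(x, y)| x * y).sum();
			Ok(dot as f64)
		},
	)
}

// Decode a blob of little-endian f32s (the assumed storage format).
fn to_f32s(blob: &[u8]) -> impl Iterator<Item = f32> + '_ {
	blob.chunks_exact(4)
		.map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
}

A search then boils down to something like SELECT path, cosine_similarity(embedding, ?1) AS sim FROM images ORDER BY sim DESC LIMIT 20.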

Once every twenty searches, however, the function would disappear.

How does one _lose_ a function? Excellent question. For a while I thought that perhaps Rust was reclaiming and shuffling around a pointer to the method since I didn’t have anything pinned, but that didn’t make sense. After a lot of avoiding the problem, I found a solution purely by happenstance. Chalk one up for procrastination.

The application was running in the background and indexing my dev directory. I left for a cup of coffee and when I came back to run a search, the application crashed. Eyebrows raised. I started backgrounding the app for longer and longer periods, eventually finding that the application would trigger the exception and fault after about one minute.

It turned out that when I added the function definition to SQLite, it was registered only on the current connection in the pool. I had gone with a connection pool because I wanted any thread to be able to write to the index, which made it easier to spider the directories in parallel. This was the problem: a search that landed on a pooled connection without the function registered would crash.

It took a bit of rewriting. I stripped out the SQLite connection pooling and added a single thread which would read from the inbound image stream and index the images. The issue went away and the application was even faster.
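
The shape of the fix, as a minimal sketch (struct and table names here are mine, not PixelBox’s):

// One thread owns the one and only SQLite connection, so the
// registered UDF can never vanish with a recycled pooled connection.
use std::sync::mpsc;
use std::thread;

struct IndexedImage {
	path: String,
	embedding_bytes: Vec<u8>, // serialized f32 embedding
}

fn spawn_indexer(conn: rusqlite::Connection) -> mpsc::Sender<IndexedImage> {
	let (tx, rx) = mpsc::channel::<IndexedImage>();
	thread::spawn(move || {
		// Drain the inbound image stream and write each entry to the index.
		for img in rx {
			if let Err(e) = conn.execute(
				"INSERT INTO images (path, embedding) VALUES (?1, ?2)",
				rusqlite::params![img.path, img.embedding_bytes],
			) {
				eprintln!("Failed to index {}: {}", img.path, e);
			}
		}
	});
	tx
}

Any number of spider threads can clone the Sender and shove images at it; only the indexing thread ever touches SQLite.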

Image Search is Hard

PixelBox was built on the idea that one could train a network to learn the representation of an image via contrastive learning. We take a bunch of images and randomly pair each one with itself or with another image, ending up with a big list of (img_x, img_y, same=Yes/No). Before we push the images through the network, we randomly corrupt each one so the two copies look different: assigning a random rotation, changing colors, or drawing blocks on top. Then we have our network learn a function which takes two images and spits out same=Yes/No. Voila: contrastive learning. The trick, though, is that when it’s finally time to use this network, we drop the last layer, the one which takes our embedding and spits out the yes/no value, leaving only the raw numbers of the representation. The assumption is that two similar images should end up with latent representations that are close together.

Does it work? Sorta. There’s a lot of room for improvement. I could use an out-of-the-box solution like EfficientNet or a network pretrained on ImageNet, but I’m stubborn and want to do my own thing. It works well enough for finding near and exact duplicates.

Things That Make Me Nervous

Towards the end of the last development cycle, more and more of the app’s internals ended up getting passed around the UI as something of a global state. The final commit before shipping even exposed two engine-internal values, max_search_distance and min_image_similarity. Exposing internals like this tends to tightly couple pieces and opens the door to spaghetti code in the future. I want to be mindful of this in further development, but I still need to strike a balance between pragmatism and good practice. I’ve seen too many cases of people fetishizing OOP and building out software with a practice in mind while neglecting the intention of the practice.

Lastly, I’m not worried about the possibility that nobody uses PixelBox. It’s useful to me and I’m sharing it in the hopes it will be useful to others. The thing that worries me is that a _lot_ of people find it useful and I myself can’t muster the enthusiasm to continue to support it.

Things That Make Me Hopeful

The application is in a good state. Search and indexing are fairly robust and the UI is not completely terrible. I’m looking forward to implementing three key features which will make my life easier:

Search by Text in Image

OCR is a tricky problem. In 2020, Du et al. created PaddleOCR, an open-source multi-stage OCR model. It used synthetic data to train the detector, angle classifier, and recognition system. I tried this approach in January of 2018 and couldn’t make it work because I’m dumb: https://github.com/JosephCatrambone/ocr_experiment/blob/master/Generated%20Character%20Recognition.ipynb

Age has made me more stubborn, however, and I’m determined to make a useful OCR library for Python and Rust. I’ll dust off that code and give it my all to make it work.

Search by Description

Modern machine learning is beautiful. We can take an arbitrary image and, from it, generate a plaintext description. I want the user to be able to search on the contents of an image using plain English or, barring that, search on automatically generated tags.

Search Inside Zip Files

This one’s big for my use case. I have plenty of asset files downloaded from stores, and most of them are compressed and chilling on my disk somewhere. If PixelBox could search inside of these so I don’t end up re-downloading a zip file for the n-th time, it would save me a lot of headache.
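
As a sketch of how that might look with the zip crate (entirely speculative; none of this is in PixelBox yet):

// Enumerate image entries inside a zip archive without extracting it.
use std::fs::File;

fn image_entries(path: &str) -> zip::result::ZipResult<Vec<String>> {
	let mut archive = zip::ZipArchive::new(File::open(path)?)?;
	let mut names = Vec::new();
	for i in 0..archive.len() {
		let entry = archive.by_index(i)?;
		let lowered = entry.name().to_lowercase();
		if lowered.ends_with(".png") || lowered.ends_with(".jpg") || lowered.ends_with(".jpeg") {
			names.push(entry.name().to_string());
		}
	}
	Ok(names)
}

Each matching entry could then be decoded in memory and pushed through the same indexing pipeline as a loose file on disk.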

Wrapping Up

I’m relieved to make this release. There’s still work to be done but things are in a good enough state for me to take some time and relax. I hope people find it useful and I hope this writeup provides some motivation or inspiration to work on your own creative endeavors.