Remote Database Schema Migration

I was sure that we could not get the schema for WyCash Plus right on the first try. I was familiar with Smalltalk-80's object migration mechanisms, so I designed a version that could serve us in a commercial software distribution environment.

I chose to version each class independently and write that version as a sequential integer into each serialized object. Objects would be mutated to the current version on read. We supported every version we had ever shipped, forever.
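The original system was Smalltalk; a minimal sketch of the migrate-on-read idea, in Python with hypothetical field names, might look like this: each record carries its class's sequential version number, and a chain of migrations upgrades it to the present on deserialization.

```python
CURRENT_VERSION = 3

# One migration step per historical version, applied in sequence.
# Every version ever shipped remains supported forever.
MIGRATIONS = {
    1: lambda d: {**d, "currency": "USD"},                            # v1 -> v2: add a field
    2: lambda d: {k: v for k, v in d.items() if k != "legacy_flag"},  # v2 -> v3: drop a field
}

def read_object(record):
    """Mutate a serialized record up to the current version on read."""
    data = dict(record)
    version = data.pop("_version", 1)
    while version < CURRENT_VERSION:
        data = MIGRATIONS[version](data)
        version += 1
    data["_version"] = CURRENT_VERSION
    return data
```

Because migration happens lazily on read, old records can sit untouched in storage indefinitely; they are only rewritten when next saved.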

I recorded mutation vectors for each version up to the present. These could add, remove, and reorder fields within an object. One-off mutation methods handled the rare cases where the vectors alone could not describe the change.
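One plausible reading of a mutation vector, sketched here with invented field names, is a list that maps the new layout onto the old one: each entry names an old field to carry over, or is empty for a field added in the new version; old fields not listed are dropped, and listing old names in a new order reorders them.

```python
def apply_vector(old_fields, old_values, vector, default=None):
    """Produce values in the new layout from an old serialized object.

    vector: for each slot in the new layout, the old field to copy,
    or None for a field newly added in this version.
    """
    old = dict(zip(old_fields, old_values))
    return [old.get(name, default) if name is not None else default
            for name in vector]

# v1 layout: (face, rate, issuer); v2 reorders, drops 'issuer', adds a slot.
new_values = apply_vector(["face", "rate", "issuer"], [1000, 0.05, "ACME"],
                          ["rate", "face", None])
```

A one-off mutation method would take over only when a change could not be expressed as such a mapping, for example when one field's value must be computed from several others.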

We shipped migrations to our customers with each release, to be performed on their own machines when needed, without any intervention on our part.

Occasionally we would send specific customers field patches that included migrations specific to their needs. These then folded into our ongoing development with no further attention. Different users might migrate in different orders; so long as each abstraction migrated in sequence, there was no problem.


I wrote a program that would manage a database of mutation vectors independent of our source code. This turned out to be hard for me to operate correctly. I was convinced fatal mistakes would be made, so I discarded it in favor of hand-crafted vectors stored as an array per object in the running program.

Alan Darlington became the most skillful of our group at managing these resources. We would change whatever we wanted in development and then design the migrations to be delivered, consulting Alan when necessary.

Alan Darlington eventually discovered a way to keep the primary data in source code comments and use a "do it" to generate correct mutation vector arrays when needed. This was genius. It kept our version history in source code management, where it belonged.
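The exact comment format is not recorded here; the sketch below assumes one layout per version in a comment block, from which a small script (the "do it") derives the vector from any old layout to the current one.

```python
# Assumed comment format: one historical field layout per version,
# kept beside the class definition in source control.
HISTORY = '''
" v1: face rate issuer "
" v2: rate face basis "
'''

def parse_history(text):
    """Extract {version: [field, ...]} from the comment block."""
    layouts = {}
    for line in text.splitlines():
        line = line.strip().strip('"').strip()
        if line.startswith("v"):
            tag, fields = line.split(":")
            layouts[int(tag[1:])] = fields.split()
    return layouts

def vector(layouts, old, new):
    """For each current field, name its old field, or None if newly added."""
    old_fields = set(layouts[old])
    return [f if f in old_fields else None for f in layouts[new]]
```

Generating the arrays from a single authoritative record, rather than maintaining them by hand, is what kept the version history and the migration machinery from drifting apart.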


We were replacing spreadsheets, so we chose the same open/save approach to persistence.

We supplemented this with a transaction log that could be read on startup to recover unsaved data.
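A minimal sketch of such a recovery log, assuming a JSON-lines format not specified in the original: each user action is appended as it happens, and on startup any logged actions are replayed against the freshly opened document.

```python
import json
import os
import tempfile

def append(log_path, action):
    """Append one user action to the transaction log as it happens."""
    with open(log_path, "a") as log:
        log.write(json.dumps(action) + "\n")

def recover(log_path, apply_action):
    """On startup, replay logged actions to restore unsaved work."""
    try:
        with open(log_path) as log:
            for line in log:
                apply_action(json.loads(line))
    except FileNotFoundError:
        pass  # no unsaved work to recover

# Log two actions, "crash" without saving, then replay on restart.
path = os.path.join(tempfile.mkdtemp(), "recovery.log")
append(path, {"op": "buy", "qty": 100})
append(path, {"op": "sell", "qty": 40})
recovered = []
recover(path, recovered.append)
```

A save would truncate the log, which also explains the customer story that follows: a log that is never truncated grows without bound, and replaying it makes startup slower and slower.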

One customer didn't know that saving was expected and only ever turned off their PC when their work was done. When startup started getting slow we suggested a save. All was better.

We concocted a small-scale sharing mechanism that worked by reading each other's recovery logs. This delayed the "high volume" implementation on top of a shared database.

We rewrote our serialization to use binary data rather than parsing text strings. We stored these records in a database with four tables: instruments, transactions, portfolios, and other. Our mutation mechanisms survived this conversion.
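The routing of records to the four tables might be sketched as follows, with the class names invented for illustration: known kinds go to their own table, everything else falls through to "other".

```python
# Hypothetical class-name-to-table routing for the four-table database.
TABLES = {
    "Instrument": "instruments",
    "Transaction": "transactions",
    "Portfolio": "portfolios",
}

def table_for(class_name):
    """Pick the table for a serialized record; unknown kinds go to 'other'."""
    return TABLES.get(class_name, "other")
```

Keeping an explicit "other" table means new kinds of objects can be persisted without a schema change, which fits the migrate-on-read approach described earlier.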

We found that interning symbols was slowing our reads. We tried rewriting the interning as a custom primitive with no significant improvement. Success came from inserting our own symbol table in front of the system symbol table. With only our own symbols in it, we achieved near-perfect hashing.
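The structure of that fix can be sketched as a small application-level table consulted before the much larger system-wide one (Python standing in for the original Smalltalk; the class and its API are illustrative):

```python
class SymbolTable:
    """Interns strings to canonical symbol objects, with optional fallback."""

    def __init__(self, fallback=None):
        self._symbols = {}          # string -> canonical symbol
        self._fallback = fallback   # e.g. the big system-wide table

    def intern(self, name):
        sym = self._symbols.get(name)
        if sym is None:
            # Miss: resolve through the fallback table if there is one,
            # then cache the result locally for fast future lookups.
            sym = self._fallback.intern(name) if self._fallback else name
            self._symbols[name] = sym
        return sym

system_table = SymbolTable()
app_table = SymbolTable(fallback=system_table)
```

With only the application's own, much smaller symbol population in the front table, collisions all but disappear, which is the "near-perfect hashing" the original describes.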


We let powerful database structures emerge. For example, some instruments contained the transaction that provided collateral for the investment. This bundle would be saved in the instrument table without touching the transaction table unless the collateral were redeemed.

I've described to others how we let our implementation evolve. They would claim that such flexibility was impossible and cite database migration as an overwhelming cost. When I described our solution I was told that I had cheated by changing the rules. Imagine that.