Data Wrangling Font Files

Data appeared in many forms, and it always required reformatting before use. I found lex, the Unix scanner generator and companion to yacc, to be an efficient tool that made development straightforward.

We acquired heavily encoded font files for potential use in the Projection Kanji Keyboard project.

I studied the file format, which was nowhere near the format we needed in microprocessor memory. There was a logic to it, though: it used key codes and modes, which translate naturally to the regular expressions and start states that lex supports.

A lex program is composed of regular expressions that trigger action routines coded in C. I hadn't written a C program before, but I knew regular expressions and could figure out printf, which handled my output. Lex took care of supplying a main program for me.

For the early versions of the converter, I could just collect and format statistics on what I would eventually process.
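
To make the idea concrete, here is a minimal sketch in C of a mode-driven scan that only gathers statistics. It is not the original converter and it is not lex output; it is a hand-written analogue of what a few lex rules with start states would do. Everything about the input is an assumption made for illustration: the "@H" and "@G" key codes, the header and glyph modes, the hex-pair encoding of bitmap bytes, and the names scan_stats and print_stats are all hypothetical, since the real font format is not described here.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Scanner modes, playing the role that start states play in lex. */
enum mode { MODE_INITIAL, MODE_HEADER, MODE_GLYPH };

/* Counters gathered during a scan (all hypothetical categories). */
struct stats {
    unsigned long mode_switches; /* "@H" or "@G" key codes seen     */
    unsigned long header_chars;  /* characters read in header mode  */
    unsigned long glyph_bytes;   /* hex pairs read in glyph mode    */
    unsigned long other_chars;   /* anything no rule recognized     */
};

/* Scan len bytes of buf, adding to the counters in *st. */
void scan_stats(const char *buf, size_t len, struct stats *st)
{
    enum mode m = MODE_INITIAL;
    size_t i = 0;

    while (i < len) {
        unsigned char c = (unsigned char)buf[i];
        int has_next = (i + 1 < len);
        unsigned char next = has_next ? (unsigned char)buf[i + 1] : 0;

        if (c == '@' && has_next && (next == 'H' || next == 'G')) {
            /* Key code, honored in every mode: switch modes. */
            m = (next == 'H') ? MODE_HEADER : MODE_GLYPH;
            st->mode_switches++;
            i += 2;
        } else if (m == MODE_GLYPH && has_next &&
                   isxdigit(c) && isxdigit(next)) {
            /* Glyph mode: two hex digits make one bitmap byte. */
            st->glyph_bytes++;
            i += 2;
        } else if (m == MODE_HEADER && c != '\n') {
            /* Header mode: count every character except newline. */
            st->header_chars++;
            i++;
        } else {
            /* Fallback, like a catch-all rule at the end of a lex file. */
            st->other_chars++;
            i++;
        }
    }
}

/* Format the counters with printf, one per line. */
void print_stats(const struct stats *st)
{
    printf("mode switches: %lu\n", st->mode_switches);
    printf("header chars:  %lu\n", st->header_chars);
    printf("glyph bytes:   %lu\n", st->glyph_bytes);
    printf("other chars:   %lu\n", st->other_chars);
}

#ifdef SCAN_DEMO_MAIN
/* Optional driver: scans at most 64 KiB read from standard input. */
int main(void)
{
    static char buf[65536];
    struct stats st = {0, 0, 0, 0};
    size_t n = fread(buf, 1, sizeof buf, stdin);

    scan_stats(buf, n, &st);
    print_stats(&st);
    return 0;
}
#endif
```

Compiling with -DSCAN_DEMO_MAIN gives a filter that reads standard input and prints the four counts. In lex, each branch of the if chain would be a single pattern and action line, and the mode variable would be replaced by start states.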