This week at work I had to deal with fixed-width column data. That
is: a plain text file where each line holds a single record with a predefined order of
columns/fields. Each column has a predefined length (in characters), and, if the value is shorter, it’s
padded with spaces. Every line contains every column, and, therefore, every line has exactly the
same length. For example, the fields
Field A | Field B | Field C |…
having the lengths of 10, 12, and 14 respectively, would be
01234567890123456789AB0123456789ABCD…
X         YYY         ZZZ           …
…
While parsing the data, I came up with a very natural approach that relies on Kotlin/Java
enums. Nothing groundbreaking or novel here; I’m sure this trick is familiar to many. However,
I liked how naturally the tool suits the problem, and decided to share.
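The idea can be sketched roughly as follows. This is a minimal illustration rather than the exact code from my project: the enum name `Field`, its constants, and the `parse` helper are made up for the example, and the widths are the 10, 12, and 14 from the sample above. Each constant carries its column length, and `offset()` sums the lengths of the constants declared before it.

```kotlin
// Hypothetical layout: one enum constant per column, in file order.
enum class Field(val length: Int) {
    A(10),
    B(12),
    C(14);

    // Start position of this column: total width of all columns declared before it.
    fun offset(): Int = Field.values().take(ordinal).sumOf { it.length }

    // Cuts this column out of a full record line and drops the space padding.
    fun parse(line: String): String =
        line.substring(offset(), offset() + length).trimEnd()
}

fun main() {
    // Build a sample record with the same padding the data source would use.
    val line = "X".padEnd(10) + "YYY".padEnd(12) + "ZZZ".padEnd(14)

    println(Field.B.offset())     // 10
    println(Field.A.parse(line))  // X
    println(Field.C.parse(line))  // ZZZ
}
```

Because the offsets are derived from declaration order, inserting or resizing a column only means touching the corresponding enum constant.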
Note that we may need just a subset of the fields present in the data
source. Nevertheless, obviously, we’d have to enum-erate all the columns anyway, up to the
rightmost one relevant to us. I particularly like that it’s so easy to adapt to changes in the data
source format, and to support additional columns.
One caveat: if you have a lot of relevant fields packed into each data line, and a lot
of lines, you may wish to memoize the offset() function for better performance.
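One possible way to do that, assuming a Kotlin enum where each constant knows its column length, is to back `offset()` with a `lazy` property so the sum is computed once per constant instead of once per parsed value. The names `Field` and `cachedOffset` and the widths below are illustrative only.

```kotlin
// Hypothetical layout with illustrative widths; one constant per column.
enum class Field(val length: Int) {
    A(10),
    B(12),
    C(14);

    // Computed on first access, after all constants exist, then reused.
    private val cachedOffset: Int by lazy {
        Field.values().take(ordinal).sumOf { it.length }
    }

    // Same contract as a non-memoized offset(), but without re-summing on every call.
    fun offset(): Int = cachedOffset
}

fun main() {
    println(Field.C.offset())  // 22
    println(Field.C.offset())  // 22, served from the cached value
}
```

The `lazy` delegate matters here: computing the sum eagerly in the constructor would call `values()` before the enum has finished initializing.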