
filing

It's impossible to design without immersing yourself in the problem set. That was one of my problems in working on jukebox: I had a particularly small CD collection and had to rely on others to validate that the genre/artist/album concept could scale. If I ever worked on Windows, I'd expect to get busted, because I keep a flat list of files inside My Documents and use a file-naming schema for organization. No folder villages here. But now I know better, and would happily change my own filing style to mimic others and meet the challenges they face. This is harder than it sounds, I know: eventually you start thinking of data in a certain shape, the brain takes shortcuts, and the next thing you know you're on the wrong bus because you forgot to switch from "bus stop - bus arrival time - get on" to "bus stop - bus number - get on."

I'm reviewing a service for my family's use that stores files for you. Paper files, the ones that come in boxes. They take your boxes, store them, and pull anything you ask for. You use those cardboard banker's boxes that hold hanging files and manila folders, which are easier to lift than a file cabinet. I got to this decision point because a) we are suddenly spending $60 a month to store this stuff, and b) I've given up on the fantasy of a T-rex-size scanner parading into my home on little legs, swallowing all my files, and spitting out accurately categorized CDs while burping. There is some indication that the file service in question would cost less than a storage bin, and it would certainly be more functional if done the right way.

Here's the catch about moving forward with this: for retrieval purposes, the service supports only four levels of categorization. You can label the box. Then, inside the box, you can label nestings up to three deep. If you have a box marked 2003 finances, the interior green hanging folders could be Retirement, Taxes, Income, and Expenses. Underneath one of those you can have a manila folder such as Fidelity. That leaves only one level: you could use it if you have two Fidelity accounts in the Retirement folder, perhaps Roth and Traditional. This seems suitably rich for a limitation, but I wanted to put it out there as a requirement.
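To make the constraint concrete, here is a minimal sketch of the service's retrieval path as a data structure: one box label plus at most three nested folder labels. The names `FilePath` and `MAX_INTERIOR_LEVELS` are hypothetical, chosen for illustration; the service itself exposes no such API as far as I know.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the service allows at most three labeled nestings inside a labeled box.
MAX_INTERIOR_LEVELS = 3


@dataclass(frozen=True)
class FilePath:
    """Hypothetical retrieval path: a box label plus up to three folder labels."""
    box: str
    folders: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject anything deeper than the service can label.
        if len(self.folders) > MAX_INTERIOR_LEVELS:
            raise ValueError(
                f"{len(self.folders)} interior levels requested; "
                f"the service allows at most {MAX_INTERIOR_LEVELS}"
            )

    def label(self) -> str:
        # Render the full path, box first, as it might be read back on a request.
        return " / ".join((self.box, *self.folders))


# Box -> hanging folder -> manila folder -> last remaining level.
roth = FilePath("2003 finances", ("Retirement", "Fidelity", "Roth"))
print(roth.label())  # 2003 finances / Retirement / Fidelity / Roth

try:
    FilePath("2003 finances", ("Retirement", "Fidelity", "Roth", "Q1"))
except ValueError as err:
    print(err)  # 4 interior levels requested; the service allows at most 3
```

The fourth label is the last one available, so any further distinction (a quarter, a statement month) has to be folded into one of the existing labels rather than nested below them.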

The quality of any filing system (in other words, metadata schema) can be measured by the accuracy of retrieval (in other words, search results). We can't just bring our boxes over as-is and expect every query to the service to work. A sample query we're facing right now: "Pull up all receipts for the car in the months between June 2003 and April 2004. We're particularly interested in the fuel pump and the engine blocks. We forgot the bank we ran the purchase through, as well as the name of the service station and the month in question. Also bring up a couple of check imprints for the water damage for the house. We forgot the amount. The payee will be sdf sdfrtv." That's an actual query. Our accountant wants it. If we find this, it means real money to us. The question is, how much work do we have to do on the existing messy file boxes to make this sort of query answerable?
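One way to see why the boxes need work is to treat each pulled document as a labeled path plus a date, and run the accountant's first request against those labels. This is a minimal sketch under stated assumptions: the `Document` type, the `matches` helper, and the sample documents are all hypothetical, invented for illustration. The sketch matches keywords against labels and checks a date range; it does not attempt to reflect how the service actually searches.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Document:
    path: Tuple[str, ...]   # box label followed by up to three folder labels
    filed: Optional[date]   # date recorded on the label, if anyone wrote one


def matches(doc: Document, keywords, start: date, end: date) -> bool:
    """True if every keyword appears in some label and the date falls in range."""
    labels = " ".join(doc.path).lower()
    if not all(k.lower() in labels for k in keywords):
        return False
    # An undated document can never satisfy a date-range query, however relevant.
    return doc.filed is not None and start <= doc.filed <= end


# Hypothetical documents, for illustration only.
docs = [
    Document(("2003 finances", "Expenses", "Car", "Fuel pump"), date(2003, 8, 14)),
    Document(("2004 finances", "Expenses", "Car", "Engine block"), date(2004, 2, 3)),
    Document(("2004 finances", "Expenses", "Misc"), date(2004, 3, 9)),  # car receipt filed as Misc
    Document(("Old receipts", "Car"), None),                            # car receipt never dated
]

# "All receipts for the car between June 2003 and April 2004."
hits = [d for d in docs if matches(d, ["car"], date(2003, 6, 1), date(2004, 4, 30))]
for d in hits:
    print(" / ".join(d.path))
```

Only the two consistently labeled receipts come back. The one filed under Misc and the undated one are real car receipts, but no query can find them. That is the cleanup cost in miniature: retrieval can only be as accurate as the labels are consistent about the facets (subject, date, payee) someone will later ask for.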

Perhaps the scanner is a better idea after all...

The silver lining is that in solving this real-world, immersive problem, I can hopefully carry the concepts over to other systems I design.