Wednesday, December 02, 2009

Accounting for Temperatures

I’ve had a good deal of experience with accounting systems, and it has just struck me that the whole global temperature database should be constructed in a transactional fashion.

To wit:

Each station in the record has an ‘account’ – a GUID if you will, and all temperature values and adjustments for a given station are recorded as transactions for that account, assigned to a date (and even time), and classified as some Type of transaction.

Types of transactions (trx) could usefully be codified globally, since a potent source of confusion is clearly just which adjustment was applied to which data, and when. The analogy here is to Generally Accepted Accounting Principles (GAAP), which rule the accounting world. Something similar is sorely needed in the world's temperature records.

Accounting systems that use the ‘single-table’ approach, and are in essence just one big bucket of transactions, could in fact be adapted for this sort of recording. I've worked with SunSystems, Coda and Kypera; all are of this type. There are probably many others.

Trx types would obviously include:

- RAW measurement
- UHI adjustment
- EQP equipment change/calibration etc
- LOC location change adjustment

and so on.
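As a rough illustration of what a codified set of trx types and a single trx record might look like, here is a minimal Python sketch. The names `TrxType`, `Trx`, `station_id`, `obs_date` and `value_c` are my own placeholders, not part of any existing system, and I am assuming that an adjustment is stored as a signed delta in degrees C rather than as a replacement value.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class TrxType(Enum):
    """Hypothetical global codes for transaction types."""
    RAW = "RAW"  # raw measurement
    UHI = "UHI"  # urban heat island adjustment
    EQP = "EQP"  # equipment change / calibration
    LOC = "LOC"  # location change adjustment


@dataclass(frozen=True)  # frozen: a trx is never edited once written
class Trx:
    station_id: str    # the station's 'account', e.g. a GUID
    obs_date: date     # the day the value applies to
    trx_type: TrxType
    value_c: float     # reading for RAW, signed delta for adjustments (assumption)


# One station, one day: a raw reading plus one adjustment.
ledger = [
    Trx("station-0001", date(2009, 12, 1), TrxType.RAW, 4.2),
    Trx("station-0001", date(2009, 12, 1), TrxType.UHI, -0.3),
]
```

Making the record immutable is a design choice that mirrors the ledger idea: corrections are made by adding another trx, not by editing one.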

Then every step in any process that adjusts the value of a data point (account/station, day) could be traced and, most importantly, added as a new trx, leaving the raw data strictly alone.
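To show what tracing a single data point could look like, here is a small Python sketch that replays every trx for one (station, day) in ledger order and keeps a running total. The `Trx` tuple and the `trace` function are hypothetical names, and the sketch again assumes adjustments are stored as signed deltas.

```python
from typing import NamedTuple


class Trx(NamedTuple):
    station_id: str
    obs_date: str    # ISO date string, e.g. "2009-12-01"
    trx_type: str    # "RAW", "UHI", "EQP", "LOC", ...
    value_c: float   # reading for RAW, signed delta for adjustments (assumption)


def trace(ledger, station_id, obs_date):
    """Return (trx_type, value_c, running_total) for one data point, in ledger order."""
    running = 0.0
    history = []
    for t in ledger:
        if t.station_id == station_id and t.obs_date == obs_date:
            running += t.value_c
            history.append((t.trx_type, t.value_c, round(running, 6)))
    return history


ledger = [
    Trx("station-0001", "2009-12-01", "RAW", 4.2),
    Trx("station-0002", "2009-12-01", "RAW", 7.5),
    Trx("station-0001", "2009-12-01", "UHI", -0.3),
    Trx("station-0001", "2009-12-01", "LOC", 0.1),
]

for step in trace(ledger, "station-0001", "2009-12-01"):
    print(step)
```

The raw reading is still sitting in the ledger, untouched, as the first step of the trace.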

Auditors love this sort of approach, and the SQL database engines that store and handle the data make short work of the heavy lifting: summing trx, adding trx, and consolidating data. Of course, the database itself is typically chock full of compliance features (thanks to Sarbanes-Oxley).
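To give a feel for how little work the summing is, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The single `trx` table, its column names and the sample values are all assumptions made for illustration; a real system would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The 'single-table' approach: one big bucket of transactions.
conn.execute("""
    CREATE TABLE trx (
        station_id TEXT NOT NULL,
        obs_date   TEXT NOT NULL,
        trx_type   TEXT NOT NULL,
        value_c    REAL NOT NULL
    )
""")

rows = [
    ("station-0001", "2009-12-01", "RAW", 4.2),
    ("station-0001", "2009-12-01", "UHI", -0.3),
    ("station-0001", "2009-12-01", "EQP", 0.1),
    ("station-0001", "2009-12-02", "RAW", 3.8),
]
conn.executemany("INSERT INTO trx VALUES (?, ?, ?, ?)", rows)


def adjusted_values(conn):
    """Return (station_id, obs_date, raw_c, adjusted_c) per data point."""
    cur = conn.execute("""
        SELECT station_id,
               obs_date,
               SUM(CASE WHEN trx_type = 'RAW' THEN value_c ELSE 0 END) AS raw_c,
               SUM(value_c) AS adjusted_c
        FROM trx
        GROUP BY station_id, obs_date
        ORDER BY station_id, obs_date
    """)
    return cur.fetchall()


for row in adjusted_values(conn):
    print(row)
```

The raw and adjusted figures come out of the same table side by side, so nobody has to guess which one they are looking at.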

As a final aid to traceability, each trx can be stamped with the name of the process that put it there, and even a description. And who…

And the beauty of this is that the existing data-generation routines can still operate, but instead of altering arrays of values, they would be adding trx per single data point.
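One way this could work is a thin wrapper that compares what a routine was given with what it returned and adds one trx per changed data point, stamped with the process name, a description and the user. The sketch below is only an illustration: `record_adjustments`, `legacy_uhi_routine`, the field names and the 0.3 degree rule are all invented, and the legacy routine is a stand-in, not anything from a real codebase.

```python
from datetime import datetime, timezone


def record_adjustments(ledger, station_id, dates, before, after,
                       trx_type, process, user, description=""):
    """Append one stamped trx per data point whose value the routine changed."""
    stamped_at = datetime.now(timezone.utc).isoformat()
    added = 0
    for obs_date, old, new in zip(dates, before, after):
        delta = round(new - old, 6)
        if delta == 0:
            continue  # untouched data point: nothing to record
        ledger.append({
            "station_id": station_id,
            "obs_date": obs_date,
            "trx_type": trx_type,
            "value_c": delta,        # signed delta, not a replacement value
            "process": process,      # which routine put it there
            "user": user,            # and who ran it
            "description": description,
            "stamped_at": stamped_at,
        })
        added += 1
    return added


def legacy_uhi_routine(values):
    """Stand-in for an existing routine that returns an adjusted array."""
    return [v - 0.3 if v > 4.0 else v for v in values]


dates = ["2009-12-01", "2009-12-02"]
raw = [4.2, 3.8]
ledger = []

adjusted = legacy_uhi_routine(raw)  # the old routine runs as before
n = record_adjustments(ledger, "station-0001", dates, raw, adjusted,
                       trx_type="UHI", process="legacy_uhi_routine",
                       user="analyst-1", description="illustrative UHI step")
print(n, ledger)
```

The existing routine does not need to know anything about the ledger; only the code that calls it changes.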

As the start of a Global Temperature Dataset, a transactional accounting-style structure would begin to address the ‘amateur hour’ data storage and versioning techniques we see in so much of the CRU dataset, thanks to Harry.