Time is money.
When managing costs and performance in a cloud-based system, one of the most important factors can be the size of hard drive storage. SSDs are extremely fast, but as of 2015 their capacities are generally in the 20-40 GB range, and that can fill up rapidly with time series data. I have been acutely aware of the need to manage space for my simulation model data for quite some time, but only recently did it dawn on me that a large portion of my space was NOT being used to store my data; it was being used to INDEX my data.
In one install I have a time series table (named 'dh_timeseries') with a modest 1.8 million records. The table stores 3 integer columns (one of which is used as a Unix timestamp) and a single floating point column (float8). There are 3 indexes on this table: 1 small-integer index (dh_ts_vix_idx) and 2 big-integer indexes (dh_ts_fix_idx & dh_ts_tix_idx). The table itself takes up 208 MB, while these 3 indexes take up 40 MB each, for a total of 120 MB (see Query 1 & Table 1), so the indexes account for more than a third of the total footprint (120 MB of 328 MB). That means my 20 GB drive will fill up about 1/3 sooner than I'd like, but the speed benefit of these indexes is worth it: time is money.
Query 1: SELECT relname, round(sum(8.0 * relpages / 1024)) AS size_mb FROM pg_class WHERE relname LIKE 'dh_timeseries_%' GROUP BY relname ORDER BY relname;
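As an aside, if you are on PostgreSQL 9.0 or later, the built-in size functions report the same information without the 8 KB * relpages arithmetic. A sketch, assuming the table name from this post:

```sql
-- Table, index, and combined sizes for the example table
-- (pg_size_pretty formats bytes into human-readable units).
SELECT pg_size_pretty(pg_relation_size('dh_timeseries'))       AS table_size,
       pg_size_pretty(pg_indexes_size('dh_timeseries'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('dh_timeseries')) AS total_size;

-- Per-index breakdown, useful for spotting which index is the space hog.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'dh_timeseries';
```

Note that pg_total_relation_size also counts TOAST storage, so it can come out slightly larger than table size plus index size from the relpages math.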
Table 1: Output of the query showing storage for a single table and three integer indexes on that table's data.