Automatic Data Integrity Checks and Reaggregation

The aggregation algorithm runs semi-continuously and attempts to keep aggregated data up to date. It checks whether any new records have been added to the database since it last aggregated data for the given device. It can also be manually notified of "back-loading" events, where past values that have already been aggregated are updated in the database.
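As a rough illustration of the staleness check described above, the following Python sketch flags devices whose newest record was inserted after their last aggregation run, plus any devices reported as back-loaded. The function name, its parameters and the in-memory dictionaries are hypothetical stand-ins; the real routine works against the database and may differ.

```python
from datetime import datetime


def devices_needing_aggregation(latest_insert_at, last_aggregated_at, back_loaded=()):
    """Return the sorted ids of devices whose aggregates may be stale.

    latest_insert_at:   {device_id: datetime the newest record was inserted}
    last_aggregated_at: {device_id: datetime of the last aggregation run}
    back_loaded:        device ids explicitly reported as back-loaded
    (All three inputs are hypothetical stand-ins for database lookups.)
    """
    # Back-loaded devices are always re-aggregated, regardless of timestamps.
    stale = set(back_loaded)
    for device_id, newest in latest_insert_at.items():
        last_run = last_aggregated_at.get(device_id)
        # Stale if never aggregated, or if records arrived after the last run.
        if last_run is None or newest > last_run:
            stale.add(device_id)
    return sorted(stale)
```

A device that was back-loaded shows up in the result even when its newest record predates the last run, which is why the manual notification matters.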

However, it is still possible for aggregate data to be incorrect, for example if previously aggregated readout data is manually altered, or rewritten by the back-end without notifying the aggregation algorithms of the "back-load" event. There may also be undiscovered bugs in the aggregation algorithm that lead to incorrect aggregate values. To mitigate such issues, one algorithm (sp_identify_full_reaggregate_candidates) runs once every night and attempts to identify any incorrectly aggregated values or missing aggregate data. It then flags the associated devices to be completely re-processed by the aggregation routines.

Currently this routine checks for the following issues; it could be improved upon and further checks could be added.

The first round of checks is fairly simple and therefore completes quickly:

  1. First device_readout_daily does not match date of first readout: we have readout data in DR or DER, but values are missing from DRD completely, or the first day that we have values for is after the day of the first energy readout. This check also covers the case where we have data in DPR but no aggregated power values for that day in DRD.
  2. First system_production does not match first readout: same as the previous check (First device_readout_daily does not match date of first readout), except that instead of checking DRD, we are checking SPD.
  3. First system_consumption does not match first readout: same as above, except that we are checking SCD for PRIMARY SITE LOAD type devices and associated systems.
  4. First system_net does not match first readout: same as above, except that we are checking SND for PRIMARY GRID NET type devices and associated systems.

Abbreviations used:

  • DR: device_readout
  • DER: device_energy_readout
  • DRD: device_readout_daily
  • DPR: device_power_readout
  • SPD: system_production_daily
  • SCD: system_consumption_daily
  • SND: system_net_daily
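The first of the quick checks can be sketched as a single query that compares each device's first readout day with its first aggregated day. The example below uses Python with an SQLite connection purely for illustration: the table names come from this page, but the column names (device_id, readout_time, day), the ISO-formatted text timestamps and the function name are assumptions, and the actual stored procedure may be written quite differently.

```python
import sqlite3

# Devices with DER data whose first DRD day is missing or later than the
# first readout day. Column names are assumed for illustration only.
FIRST_DAY_MISMATCH_SQL = """
SELECT r.device_id
FROM (SELECT device_id, MIN(DATE(readout_time)) AS first_day
      FROM device_energy_readout
      GROUP BY device_id) AS r
LEFT JOIN (SELECT device_id, MIN(day) AS first_day
           FROM device_readout_daily
           GROUP BY device_id) AS d
       ON d.device_id = r.device_id
WHERE d.first_day IS NULL          -- no daily aggregates at all
   OR d.first_day > r.first_day    -- aggregates start after the first readout
ORDER BY r.device_id
"""


def find_first_day_mismatches(conn):
    """Return ids of devices that should be flagged for full reaggregation."""
    return [row[0] for row in conn.execute(FIRST_DAY_MISMATCH_SQL)]
```

The same shape would apply to the SPD, SCD and SND checks by swapping the daily table and adding the relevant device-type filter.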

A second round of tests loops over any devices not identified in the first round. These tests take a long time to run, so they are not set to run automatically; however, parameters can be set in the routines to force the checks to run. Here we loop through each device and first identify whether there are any "gaps" in the daily aggregated values:

  1. device_energy_readout exists but device_readout_daily does not
  2. - or - device_readout exists but device_readout_daily does not

  3. device_energy_readout exists but system_production_daily does not
  4. - or - device_readout exists but system_production_daily does not

  5. device_energy_readout exists but system_consumption_daily does not
  6. - or - device_readout exists but system_consumption_daily does not

  7. device_energy_readout exists but system_net_daily does not
  8. - or - device_readout exists but system_net_daily does not

  9. device_celltemp_readout exists but device_celltemp_readout_daily does not

  10. device_irradiance_readout exists but device_irradiance_readout_daily does not

  11. device_power_readout exists but device_power_readout_daily does not
    • sp_identify_full_reaggregate_candidates
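    The per-device gap checks listed above all follow the same pattern: a day has raw readouts but no matching daily aggregate. A minimal Python sketch of that comparison, assuming the readout timestamps and aggregated days for one device have already been loaded into memory (the function and parameter names are hypothetical):

```python
from datetime import date, datetime


def missing_aggregate_days(readout_times, aggregated_days):
    """Return the sorted days that have readouts but no daily aggregate.

    readout_times:   iterable of datetime values from a raw readout table
    aggregated_days: iterable of date values from the matching daily table
    """
    # Collapse raw readouts to the distinct calendar days they fall on.
    readout_days = {ts.date() for ts in readout_times}
    # Any readout day without a daily row is a gap.
    return sorted(readout_days - set(aggregated_days))
```

    A non-empty result for any readout/daily table pair would be a reason to flag the device for complete reprocessing.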

    Revision #2
    Created Tue, Feb 11, 2020 10:30 AM by Lieszkovszky László
    Updated Tue, Feb 11, 2020 11:38 AM by Lieszkovszky László