Experiment 08 — NAB Benchmark

Δ.72 vs Numenta Anomaly Benchmark

Validating the coherence framework against 53 time series across 6 NAB categories. Point-level precision, recall, and F1 comparison against a variance-based detector.

Δ = (P · A · R) / (D + N)
Coherence-based anomaly detection: ALERT when Δ < 0.3 AND M < 0.4 AND W < 0.4
Rolling window: 168 points, step 24
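
The gating rule and window schedule above can be sketched as follows. This is a minimal illustration, not the experiment's code: the function names are hypothetical, the Δ, M, and W scores are assumed to be computed elsewhere, and dropping a trailing partial window is an assumption.

```python
from typing import Iterator, Tuple

WINDOW = 168  # points per window (from the text)
STEP = 24     # points between consecutive window starts (from the text)


def window_bounds(n_points: int, window: int = WINDOW,
                  step: int = STEP) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs for every full rolling window.

    Assumption: a trailing partial window is skipped.
    """
    for start in range(0, n_points - window + 1, step):
        yield start, start + window


def is_alert(delta: float, m: float, w: float,
             delta_max: float = 0.3, m_max: float = 0.4,
             w_max: float = 0.4) -> bool:
    """Gated alert: all three scores must fall below their thresholds."""
    return delta < delta_max and m < m_max and w < w_max


if __name__ == "__main__":
    bounds = list(window_bounds(4032))
    print(len(bounds), bounds[-1])   # 162 (3864, 4032)
    print(is_alert(0.2, 0.3, 0.3))   # True
    print(is_alert(0.2, 0.5, 0.3))   # False: M is above its threshold
```

Under these assumptions a 4,032-point file yields 162 windows, and a single score above its threshold is enough to suppress the alert.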
Δ F1 Score: 0.049
Variance F1 Score: 0.346
Δ Precision: 0.075
Δ Recall: 0.083
Time Series: 53
Δ Wins (F1): 6/53

Δ Coherence

Precision: 0.075

Recall: 0.083

F1: 0.049

Wins on 6 of 53 files

Variance Detector

Precision: 0.336

Recall: 0.406

F1: 0.346

Wins on 32 of 53 files (the remaining 15 files are ties, with F1 = 0.000 for both detectors)

NAB Benchmark — Overall Summary

Precision, recall, and F1 are macro-averaged across all 53 NAB time series. The coherence framework uses gated detection (Δ, M, and W must all fall below their thresholds), while the variance detector flags windows where rolling variance exceeds 2.5x the global residual variance.
[Figure: NAB Summary]
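
Point-level scoring and macro-averaging can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and returning 0.0 for an undefined ratio is an assumption. The experiment's own convention for files without labeled anomalies may differ (the per-file table reports a variance F1 of 1.000 on a file with 0 anomalies).

```python
from typing import Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def point_prf(pred: Sequence[bool], truth: Sequence[bool]) -> Scores:
    """Point-level precision, recall, and F1 for one time series."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Assumption: undefined ratios are scored as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(per_file: Sequence[Scores]) -> Scores:
    """Unweighted mean of each metric across files."""
    n = len(per_file)
    return (sum(s[0] for s in per_file) / n,
            sum(s[1] for s in per_file) / n,
            sum(s[2] for s in per_file) / n)


if __name__ == "__main__":
    a = point_prf([True, True, False, False], [True, False, True, False])
    b = point_prf([False, False, False], [True, False, False])
    print(a)                      # (0.5, 0.5, 0.5)
    print(b)                      # (0.0, 0.0, 0.0)
    print(macro_average([a, b]))  # (0.25, 0.25, 0.25)
```

Because the macro F1 is the mean of per-file F1 scores rather than the harmonic mean of the averaged precision and recall, the reported coherence F1 (0.049) can sit below the harmonic mean of 0.075 and 0.083, which is about 0.079.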

F1 Score Comparison by NAB Category

Macro-averaged F1 per category. Each category contains real or synthetic time series with different anomaly characteristics.
[Figure: F1 by Category]
Category               Files  Coh. P  Coh. R  Coh. F1  Var. P  Var. R  Var. F1  Winner
artificialWithAnomaly      6   0.045   0.419    0.081   0.278   0.284    0.278  variance
realAWSCloudwatch         17   0.085   0.072    0.074   0.397   0.549    0.436  variance
realAdExchange             6   0.000   0.000    0.000   0.146   0.287    0.183  variance
realKnownCause             7   0.038   0.048    0.042   0.155   0.189    0.168  variance
realTraffic                7   0.000   0.000    0.000   0.171   0.286    0.208  variance
realTweets                10   0.197   0.031    0.053   0.624   0.545    0.554  variance
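
As a consistency check, the artificialWithAnomaly F1 values in this table can be reproduced from the per-file F1 scores listed under All 53 Time Series. The snippet below is only a worked arithmetic example; the variable names are illustrative.

```python
from statistics import fmean

# Per-file F1 for the six artificialWithAnomaly series, in table order:
# flatmiddle, jumpsdown, jumpsup, nojump, increase_spike_density,
# load_balancer_spikes.
coh_f1 = [0.142, 0.069, 0.182, 0.093, 0.000, 0.000]
var_f1 = [0.000, 0.000, 0.839, 0.000, 0.000, 0.831]

coh_macro = fmean(coh_f1)
var_macro = fmean(var_f1)

print(round(coh_macro, 3))  # 0.081
print(round(var_macro, 3))  # 0.278
```

Both results match the category row, which is consistent with the category F1 being an unweighted mean over files.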

Precision & Recall Breakdown

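Side-by-side precision and recall by category. The coherence detector reaches higher recall than the variance detector only on artificialWithAnomaly (0.419 vs 0.284), and there at the cost of precision (0.045 vs 0.278); in the five real-data categories it trails on both precision and recall.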
[Figure: Precision & Recall]

Representative Time Series with Alerts vs Labels

Selected examples from different categories showing coherence alerts (blue shading) vs NAB ground-truth anomaly windows (red shading). Signal in blue, rolling-mean baseline in white.
[Figure: Example Time Series]

All 53 Time Series

Point-level precision, recall, and F1 for each NAB file. Sorted by category.
Category File Points Anomalies Coh. P Coh. R Coh. F1 Var. F1
artificialWithAnomaly art_daily_flatmiddle 4,032 1 0.078 0.762 0.142 0.000
artificialWithAnomaly art_daily_jumpsdown 4,032 1 0.038 0.345 0.069 0.000
artificialWithAnomaly art_daily_jumpsup 4,032 1 0.100 0.945 0.182 0.839
artificialWithAnomaly art_daily_nojump 4,032 1 0.052 0.464 0.093 0.000
artificialWithAnomaly art_increase_spike_density 4,032 1 0.000 0.000 0.000 0.000
artificialWithAnomaly art_load_balancer_spikes 4,032 1 0.000 0.000 0.000 0.831
realAWSCloudwatch ec2_cpu_utilization_24ae8d 4,032 2 0.000 0.000 0.000 0.616
realAWSCloudwatch ec2_cpu_utilization_53ea38 4,032 2 0.000 0.000 0.000 0.000
realAWSCloudwatch ec2_cpu_utilization_5f5533 4,032 2 0.000 0.000 0.000 0.000
realAWSCloudwatch ec2_cpu_utilization_77c1ca 4,032 1 0.027 0.027 0.027 0.000
realAWSCloudwatch ec2_cpu_utilization_825cc2 4,032 1 0.000 0.000 0.000 0.693
realAWSCloudwatch ec2_cpu_utilization_ac20cd 4,032 1 1.000 0.596 0.747 0.562
realAWSCloudwatch ec2_cpu_utilization_c6585a 4,032 0 0.000 0.000 0.000 1.000
realAWSCloudwatch ec2_cpu_utilization_fe7f93 4,032 3 0.226 0.388 0.285 0.525
realAWSCloudwatch ec2_disk_write_bytes_1ef3de 4,730 1 0.192 0.205 0.199 0.321
realAWSCloudwatch ec2_disk_write_bytes_c0d644 4,032 3 0.000 0.000 0.000 0.269
realAWSCloudwatch ec2_network_in_257a54 4,032 1 0.000 0.000 0.000 0.873
realAWSCloudwatch ec2_network_in_5abac7 4,730 2 0.000 0.000 0.000 0.552
realAWSCloudwatch elb_request_count_8c0756 4,032 2 0.000 0.000 0.000 0.000
realAWSCloudwatch grok_asg_anomaly 4,621 3 0.000 0.000 0.000 0.376
realAWSCloudwatch iio_us-east-1_i-a2eb1cd9_NetworkIn 1,243 2 0.000 0.000 0.000 0.473
realAWSCloudwatch rds_cpu_utilization_cc0c53 4,032 2 0.000 0.000 0.000 0.528
realAWSCloudwatch rds_cpu_utilization_e47b3b 4,032 2 0.000 0.000 0.000 0.623
realAdExchange exchange-2_cpc_results 1,624 1 0.000 0.000 0.000 0.000
realAdExchange exchange-2_cpm_results 1,624 2 0.000 0.000 0.000 0.000
realAdExchange exchange-3_cpc_results 1,538 3 0.000 0.000 0.000 0.480
realAdExchange exchange-3_cpm_results 1,538 1 0.000 0.000 0.000 0.000
realAdExchange exchange-4_cpc_results 1,643 3 0.000 0.000 0.000 0.279
realAdExchange exchange-4_cpm_results 1,643 4 0.000 0.000 0.000 0.342
realKnownCause ambient_temperature_system_failure 7,267 2 0.000 0.000 0.000 0.419
realKnownCause cpu_utilization_asg_misconfiguration 18,050 1 0.000 0.000 0.000 0.000
realKnownCause ec2_request_latency_system_failure 4,032 3 0.000 0.000 0.000 0.511
realKnownCause machine_temperature_system_failure 22,695 4 0.266 0.335 0.297 0.246
realKnownCause nyc_taxi 10,320 5 0.000 0.000 0.000 0.000
realKnownCause rogue_agent_key_hold 1,882 2 0.000 0.000 0.000 0.000
realKnownCause rogue_agent_key_updown 5,315 2 0.000 0.000 0.000 0.000
realTraffic TravelTime_387 2,500 3 0.000 0.000 0.000 0.000
realTraffic TravelTime_451 2,162 1 0.000 0.000 0.000 0.000
realTraffic occupancy_6005 2,380 1 0.000 0.000 0.000 0.000
realTraffic occupancy_t4013 2,500 2 0.000 0.000 0.000 0.486
realTraffic speed_6005 2,500 1 0.000 0.000 0.000 0.000
realTraffic speed_7578 1,127 4 0.000 0.000 0.000 0.287
realTraffic speed_t4013 2,495 2 0.000 0.000 0.000 0.685
realTweets Twitter_volume_AAPL 15,902 4 0.973 0.206 0.340 0.509
realTweets Twitter_volume_AMZN 15,831 4 0.000 0.000 0.000 0.662
realTweets Twitter_volume_CRM 15,902 3 0.000 0.000 0.000 0.498
realTweets Twitter_volume_CVS 15,853 3 0.000 0.000 0.000 0.809
realTweets Twitter_volume_FB 15,833 2 0.000 0.000 0.000 0.491
realTweets Twitter_volume_GOOG 15,842 3 0.000 0.000 0.000 0.618
realTweets Twitter_volume_IBM 15,893 2 0.000 0.000 0.000 0.483
realTweets Twitter_volume_KO 15,851 3 0.000 0.000 0.000 0.642
realTweets Twitter_volume_PFE 15,858 4 0.000 0.000 0.000 0.510
realTweets Twitter_volume_UPS 15,866 5 1.000 0.106 0.192 0.315
Parameters: window=168, step=24, Δ<0.3, M<0.4, W<0.4, var_z=2.5. Runtime: 2.8 s.

The coherence detector uses the full gated alert: Δ < 0.3 AND M (Memory-of-Attractor) < 0.4 AND W (Windowed Recovery) < 0.4. The baseline is a 168-point centered rolling mean. The variance comparator flags windows where rolling variance exceeds 2.5x the global residual variance.
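
The variance comparator described above might look like the following sketch. Several details are assumptions rather than facts from the experiment: the residual is taken as signal minus the centered rolling mean, the baseline window is truncated at the series edges, population variance is used, and every point inside a flagged window is marked as an alert. The function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def centered_rolling_mean(x: Sequence[float], window: int = 168) -> List[float]:
    """Centered rolling mean; the window is truncated at the edges (assumption)."""
    half = window // 2
    n = len(x)
    return [fmean(x[max(0, i - half):min(n, i + half)]) for i in range(n)]


def variance_alerts(x: Sequence[float], window: int = 168, step: int = 24,
                    ratio: float = 2.5) -> List[bool]:
    """Flag points in windows whose residual variance exceeds ratio x global."""
    baseline = centered_rolling_mean(x, window)
    resid = [v - b for v, b in zip(x, baseline)]
    global_var = pvariance(resid)
    flags = [False] * len(x)
    for start in range(0, len(x) - window + 1, step):
        if pvariance(resid[start:start + window]) > ratio * global_var:
            # Assumption: the whole window counts as alerted points.
            for i in range(start, start + window):
                flags[i] = True
    return flags


if __name__ == "__main__":
    # Toy series: flat signal with a short oscillating burst at 100..109.
    series = [0.0] * 200
    for k in range(10):
        series[100 + k] = 10.0 if k % 2 == 0 else -10.0
    flags = variance_alerts(series, window=20, step=5)
    print(flags[105], flags[0], flags[199])  # True False False
```

A constant series produces no alerts here, since no window variance can exceed a global variance of zero under a strict comparison.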