Semantics and Anomaly Preserving Sampling Strategy for Large-Scale Time Series Data

By: Shibbir Ahmed, Md Johirul Islam, and Hridesh Rajan


Abstract

We propose PASS, an O(n) data reduction algorithm aimed specifically at preserving the semantics of time series data visualized as line charts. Visualizing large trend line data is a challenge: current sampling approaches reduce the data but lose its semantics and anomalous behavior. We evaluated PASS on 7 large, well-vetted datasets (Taxi, Temperature, the DEBS Challenge 2012-2014 datasets, New York Stock Exchange data, and Integrated Surface Data) and found several benefits over existing state-of-the-art time series data reduction techniques. First, it preserves the semantics of the trend. Second, visualizations of the data reduced by PASS are very close to the original visualization. Third, anomalous behavior is preserved and clearly observable in visualizations created from the reduced data. We conducted two user surveys, collecting 3000+ users’ responses on visual preference and perceptual effectiveness, and found that users prefer PASS over other techniques across different datasets. We also compare PASS using visualization metrics, where it outperforms other techniques on 5 of the 7 datasets.

ACM Reference

Ahmed, S. et al. 2022. Semantics and Anomaly Preserving Sampling Strategy for Large-Scale Time Series Data. ACM/IMS Transactions on Data Science. 1, 1 (Jan. 2022). DOI:https://doi.org/10.1145/3511918.

BibTeX Reference

@article{Ahmed2022,
  author = {Shibbir Ahmed and Md Johirul Islam and Hridesh Rajan},
  title = {Semantics and Anomaly Preserving Sampling Strategy for Large-Scale Time Series Data},
  journal = {ACM/IMS Transactions on Data Science},
  volume = {1},
  number = {1},
  article = {1},
  month = {January},
  year = {2022},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  abstract = {We propose PASS, an O(n) data reduction algorithm aimed specifically at preserving the semantics of time series data visualized as line charts. Visualizing large trend line data is a challenge: current sampling approaches reduce the data but lose its semantics and anomalous behavior. We evaluated PASS on 7 large, well-vetted datasets (Taxi, Temperature, the DEBS Challenge 2012-2014 datasets, New York Stock Exchange data, and Integrated Surface Data) and found several benefits over existing state-of-the-art time series data reduction techniques. First, it preserves the semantics of the trend. Second, visualizations of the data reduced by PASS are very close to the original visualization. Third, anomalous behavior is preserved and clearly observable in visualizations created from the reduced data. We conducted two user surveys, collecting 3000+ users’ responses on visual preference and perceptual effectiveness, and found that users prefer PASS over other techniques across different datasets. We also compare PASS using visualization metrics, where it outperforms other techniques on 5 of the 7 datasets.},
  doi = {10.1145/3511918},
}