Fast regression with an ℓ∞ guarantee
Eric Price, Zhao Song, et al.
ICALP 2017
On a stream (Formula Presented.) of two dimensional data items (x, y) where x is an item identifier and y is a numerical attribute, a correlated aggregate query (Formula Presented.) asks to first apply a selection predicate σ along the y dimension, followed by an aggregation AGG along the x dimension. For selection predicates of the form (Formula Presented.) or (Formula Presented.), where parameter c is provided at query time, we present new streaming algorithms and lower bounds for estimating correlated aggregates. Our main result is a general method that reduces the estimation of a correlated aggregate AGG to the streaming computation of AGG over an entire stream, for an aggregate that satisfies certain conditions. This results in the first sublinear space algorithms for the correlated estimation of a large family of statistics, including frequency moments. Our experimental validation shows that the memory requirements of these algorithms are significantly smaller than existing linear storage solutions, and that these achieve a fast per-record processing time. We also study the setting when items have weights. In the case when weights can be negative, we give a strong space lower bound which holds even if the algorithm is allowed up to a logarithmic number of passes over the data. We complement this with a small space algorithm which uses a logarithmic number of passes.
Eric Price, Zhao Song, et al.
ICALP 2017
Michael Crouch, Andrew McGregor, et al.
ESA 2016
David P. Woodruff
NeurIPS 2014
Piotr Indyk, Eric Price, et al.
FOCS 2011