Donald D. Chamberlin, Morton M. Astrahan, et al.
CACM
Counts of unique values are frequently needed information in database systems. Especially, they are essential in query optimization and physical database design. Traditionally, exact counts were obtained by sorting, which is an expensive operation. In this paper we present three algorithms for counting unique values by probabilistic methods. These algorithms require only one pass over the data, and produce approximations to the true count with certain standard deviations. For deviations acceptable in practical environments (~10%), the algorithms require only modest amounts of memory space and computation time. We have implemented all three algorithms in System R. We also present the results of the experiments on accuracy and performance of these algorithms. © 1987.
Donald D. Chamberlin, Morton M. Astrahan, et al.
CACM
Morton M. Astrahan, John F. Jacobs
Annals of the History of Computing
Mario Schkolnick
ACM Transactions on Database Systems (TODS)
Mario Schkolnick, Paolo Tiberio
COMPSAC 1979