Stefan Berger, Scott McFaddin, et al.
MDM 2004
Cloud infrastructures promise high-performance, cost-effective solutions to large-scale data processing problems. In this paper, we identify a common class of data-intensive applications for which the latency of uploading data into the cloud before it can be processed may erode the cloud's linear-scalability advantage. For such applications, we propose a "stream-as-you-go" approach that accesses and processes data incrementally, based on a stream data management architecture. We describe our approach in the context of a DNA sequence analysis use case and compare it against the state of the art in MapReduce-based DNA sequence analysis and in incremental MapReduce frameworks. We provide experimental results from an implementation of our approach on the IBM InfoSphere Streams computing platform deployed on Amazon EC2, showing an order-of-magnitude improvement in total processing time over the state of the art. © 2012 IEEE.
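The abstract's argument is that uploading all data before processing adds the full transfer time to the job, while streaming lets transfer and processing overlap. The back-of-the-envelope sketch below illustrates that intuition under simplifying assumptions of our own, not the paper's model: a fixed total upload time, ideal parallel speedup over `nodes` workers, and data split into equal-size chunks flowing through a two-stage (upload, then compute) pipeline. The functions `batch_time` and `streamed_time` are hypothetical names chosen for illustration.

```python
def batch_time(upload_s: float, compute_s: float, nodes: int) -> float:
    """Upload everything first, then process it on `nodes` workers.

    Assumes ideal (linear) speedup of the compute stage.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return upload_s + compute_s / nodes


def streamed_time(upload_s: float, compute_s: float, nodes: int, chunks: int) -> float:
    """Two-stage pipeline: chunk i is computed while chunk i+1 is uploading.

    Assumes equal-size chunks and ideal speedup of the compute stage.
    """
    if nodes < 1 or chunks < 1:
        raise ValueError("nodes and chunks must be >= 1")
    u = upload_s / chunks             # upload time per chunk
    c = compute_s / nodes / chunks    # compute time per chunk
    # The first chunk fills the pipeline (u + c); every later chunk
    # adds only the duration of the slower stage.
    return u + c + (chunks - 1) * max(u, c)


if __name__ == "__main__":
    # 100 s of upload, 1000 s of single-node compute, 10 nodes.
    print(batch_time(100, 1000, 10))          # 200.0
    print(streamed_time(100, 1000, 10, 100))  # 101.0
```

In this toy model the batch cost is always the sum of transfer and compute, whereas the streamed cost approaches the larger of the two, which is why adding nodes helps less when the upload must complete first.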
Pooja Aggarwal, Ajay Gupta, et al.
ICSOC 2020
Seetharami Seelam, Apoorve Mohan, et al.
ISCA 2023
David Wolpert, Gerry Strevig, et al.
ISSCC 2025