Understanding digital documents
J. Cooper
HICSS 1999
Summarization technologies today work, in essence, by performing data reduction over the original document source. Document fragments, identified as particularly representative of content, are extracted and offered to the user; typically, such fragments are sentence-sized, and the summary is nothing more than a concatenation of these sentences. We argue that for content characterization, phrasal units with certain discourse properties are more representative than sentences. From such a position, we outline a model of document content abstraction based on a notion of topically prominent topic stamps. For such abstractions to be useful, they need to retain contextual highlights of their occurrences in the documents; to be usable, they further need to be able to function as windows into the full documents, with suitably designed interfaces for navigation into areas of particular interest. This paper proposes a way of contextualizing document highlights, relates this to our model of salience-based content characterization, and demonstrates how the document abstractions derived from such principles facilitate dynamic document content presentation. We argue that dynamic document abstractions effectively mediate different levels of granularity of analysis, from terse document highlights to full contextualized foci of particular interest. We close by describing a range of dynamic document viewers which embody novel presentation metaphors for delivery of document content.