Lecture Wednesday 06 April 2005
Summarization
Mani, I. and Bloedorn, E. (1999). Summarizing similarities and differences among related documents. Information Retrieval, 1(1-2), 35-67.
automatic text summarization
analysis phase
refinement phase
synthesis phase
why salient information?
establish similarities & differences
easier to compare salient items than the entire text body
by identifying both commonalities and differences, we can see what's novel
(very different from the notion of centroids)
represent each document as a graph, with each node a word instance and edges of multiple types: ADJACENCY, SAME, ALPHA, PHRASE, NAME, COREFERENTIAL
weights of graph nodes: activation vector
phrase extraction:
WordNet as HP
topic-related text region spreading activation (think of nodes being lit up by queries)
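The graph representation and spreading-activation idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the edge types match those listed in the notes, but the specific weights, decay factor, and example words are assumptions made up for the sketch.

```python
from collections import defaultdict

# Each node is a word instance; each edge carries one of the typed
# relations from the notes. The example graph itself is invented.
edges = [
    ("acid_1", "rain_1", "ADJACENCY"),
    ("rain_1", "rain_2", "SAME"),        # same word, another instance
    ("rain_2", "storm_1", "ALPHA"),      # thesaural (e.g. WordNet) relation
    ("acid_1", "acid_rain_1", "PHRASE"),
]

# Assumed propagation weight per edge type (illustrative values only).
TYPE_WEIGHT = {"ADJACENCY": 0.5, "SAME": 0.9, "ALPHA": 0.4, "PHRASE": 0.7}

def spread(edges, seeds, iterations=3, decay=0.8):
    """Propagate activation outward from query-matched seed nodes."""
    graph = defaultdict(list)
    for a, b, t in edges:
        w = TYPE_WEIGHT[t]
        graph[a].append((b, w))
        graph[b].append((a, w))          # treat edges as undirected here
    activation = defaultdict(float, {n: 1.0 for n in seeds})
    for _ in range(iterations):
        nxt = defaultdict(float, activation)
        for node, act in activation.items():
            for nbr, w in graph[node]:
                nxt[nbr] += act * w * decay
        activation = nxt
    return dict(activation)

# A query "lights up" rain_1; activation then spreads along typed edges,
# so nodes nearer (and more strongly connected) to the seed score higher.
scores = spread(edges, seeds=["rain_1"])
```

The node activations that result are one way to realize the "activation vector" of node weights mentioned above: topic-relevant text regions are the ones whose word-instance nodes end up highly activated.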
two types of evaluation: extrinsic evaluation & intrinsic evaluation
extrinsic: how summary affects outcome of some other task
intrinsic: judgements of informativeness
what is it you should evaluate?
what if users disagree?
force disagreement?
compare it to system....
Okurowski, M. et al. (2000). Text summarizer in use: lessons learned from real world deployment and evaluation. Proceedings of the ANLP/NAACL Workshop on Automatic Summarization, 49-58.
Mark Pope presenting
Question: we seem to be making the assumption that we are improving upon relevance rather than leaving it behind, that we are trying to get things "faster"
maybe we already know what our most relevant documents are, and instead of getting a "more efficient" representation, maybe we want to learn something remarkable
maybe we pick our favorite papers on a topic, we've read them and have grokked them well, but maybe we feel we might be missing something.
this idea of the technology suggesting the task is good
but also being more creative with problem identification than "information overload" in the "millions of relevant documents" sense
let's IMAGINE some uses that are not currently part of any professional's task