Lecture Monday 28 February 2005
Information Extraction
----------------------
Two general approaches to information extraction (IE) systems:
1. knowledge engineering
- hand-constructed grammars
- human experts design rules
- e.g., Paice & Jones, Blaschke et al.
2. trained systems
- use stats where possible to learn rules
- e.g., Riloff (needs less information, but not fully learned); Califf & Mooney
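To make the knowledge-engineering side concrete, here is a minimal sketch of one hand-written extraction rule of the kind a human expert might design. The pattern, the "interacts with" relation and the function name `extract_interactions` are hypothetical illustrations, not rules from Paice & Jones or Blaschke et al.

```python
import re

# Hypothetical hand-written rule: "<Name> interacts with <Name>",
# where a name is a capitalised alphanumeric token.
INTERACTS = re.compile(r"\b([A-Z][A-Za-z0-9]+) interacts with ([A-Z][A-Za-z0-9]+)\b")


def extract_interactions(text: str) -> list:
    """Return (agent, target) pairs matched by the hand-written pattern."""
    return INTERACTS.findall(text)


print(extract_interactions("Evidence shows that Ras interacts with Raf1 in vivo."))
# [('Ras', 'Raf1')]
```

Every such rule has to be written and maintained by hand; a trained system aims to learn rules like this from annotated examples instead.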
Even trained system approaches require background knowledge of some sort
knowledge vs. data trade-off
1.2 million words needed to learn statistically
levels of info
- text
- words, e.g., POS (part-of-speech) tags
- noun phrases, e.g., phrase units
- sentence level
- inter-sentence level, e.g., anaphora resolution & discourse analysis
- template level: changes the format to the output required
effort increases as you move up these levels
AAAI: Appelt & Hobbs
IE techniques:
- KB (knowledge-based)
- semi-learned
- learned
Representation for Learning
---------------------------
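- pre-filler pattern: what must match the text just before the filler
- filler pattern: matches the item you want to extract
- post-filler pattern: what must match the text just after the filler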
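A minimal sketch of this three-part rule representation applied to a tokenised sentence. It is a simplification under stated assumptions: each pattern element here is just a predicate on one token, whereas RAPIER's real patterns can also test POS tags and semantic classes. The names `Rule`, `apply_rule`, `word` and `location_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Token = str
Constraint = Callable[[Token], bool]


@dataclass
class Rule:
    pre: List[Constraint]    # must match the tokens immediately before the filler
    filler: Constraint       # every filler token must satisfy this
    max_filler_len: int      # longest filler span considered
    post: List[Constraint]   # must match the tokens immediately after the filler


def matches(seq: List[Token], start: int, pattern: List[Constraint]) -> bool:
    """True if pattern matches seq[start:start+len(pattern)] element by element."""
    if start < 0 or start + len(pattern) > len(seq):
        return False
    return all(c(tok) for c, tok in zip(pattern, seq[start:start + len(pattern)]))


def apply_rule(rule: Rule, tokens: List[Token]) -> List[str]:
    """Return every filler span preceded by `pre` and followed by `post`."""
    found = []
    for i in range(len(tokens)):
        if not matches(tokens, i - len(rule.pre), rule.pre):
            continue
        for j in range(i + 1, min(i + rule.max_filler_len, len(tokens)) + 1):
            if not all(rule.filler(t) for t in tokens[i:j]):
                break  # a longer span would contain the same bad token
            if matches(tokens, j, rule.post):
                found.append(" ".join(tokens[i:j]))
    return found


def word(w: str) -> Constraint:
    """Constraint matching one exact (lower-cased) word."""
    return lambda t: t.lower() == w


# Hypothetical rule: extract a location of up to two capitalised tokens.
location_rule = Rule(
    pre=[word("located"), word("in")],
    filler=lambda t: t[:1].isupper(),
    max_filler_len=2,
    post=[word(".")],
)

tokens = "The job is located in Austin Texas .".split()
print(apply_rule(location_rule, tokens))  # ['Austin Texas']
```

The pre- and post-filler patterns supply the context; the filler constraint alone ("capitalised") would also fire on "The".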
RAPIER: Robust Automated Production of Information Extraction Rules
covering algorithm: starts from sentences that are positive examples
while uncovered positive examples remain, create a rule that covers (removes) the majority of the remaining positive examples
(seems like it would have pretty good precision but not so good recall)
RAPIER is one good example of a covering algorithm
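A minimal sketch of the generic covering loop, assuming a fixed, hypothetical pool of candidate rules. It does not reproduce RAPIER's own rule generalization; it only shows the loop: keep the rule that covers the most still-uncovered positives without firing on any negative, remove those positives, repeat. The function name, example data and rule names are all illustrative assumptions.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Example = Hashable
Predicate = Callable[[Example], bool]


def sequential_covering(
    positives: Sequence[Example],
    negatives: Sequence[Example],
    candidates: Dict[str, Predicate],
) -> Tuple[List[str], Set[Example]]:
    """Return (names of learned rules, positives left uncovered)."""
    # Only rules that fire on no negative example are allowed (precision first).
    consistent = {name: r for name, r in candidates.items()
                  if not any(r(n) for n in negatives)}
    remaining = set(positives)
    learned: List[str] = []
    while remaining and consistent:
        # Greedy choice: the rule covering the most remaining positives.
        name, rule = max(consistent.items(),
                         key=lambda kv: sum(1 for p in remaining if kv[1](p)))
        covered = {p for p in remaining if rule(p)}
        if not covered:  # no consistent rule helps any more: stop
            break
        learned.append(name)
        remaining -= covered  # "remove" the positives this rule covers
    return learned, remaining


# Hypothetical examples: (previous word, candidate filler) pairs.
positives = [("in", "Austin"), ("in", "Dallas"), ("near", "Houston")]
negatives = [("on", "Monday"), ("in", "May")]
candidates = {
    "prev=in": lambda e: e[0] == "in",  # fires on ("in", "May"): rejected
    "prev=near": lambda e: e[0] == "near",
    "prev=in & len>3": lambda e: e[0] == "in" and len(e[1]) > 3,
    "capitalised": lambda e: e[1][:1].isupper(),  # fires on "Monday": rejected
}

learned, uncovered = sequential_covering(positives, negatives, candidates)
print(learned, uncovered)  # ['prev=in & len>3', 'prev=near'] set()
```

Requiring each rule to cover no negatives is what favours precision; any positive that no consistent rule covers is simply left uncovered, which is where recall suffers.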