Monday, February 28, 2005

Lecture Monday 28 February 2005

Information Extraction
----------------------

Two general approaches to information extraction systems:

1. knowledge engineering
- hand-constructed grammars
- human experts design rules
- e.g., Paice & Jones; Blaschke et al.

2. trained systems
- use statistics where possible to learn rules
- e.g., Riloff (less info, not fully learned); Califf & Mooney


Even trained systems require background knowledge of some sort


knowledge vs. data trade-off
1.2 million words needed to learn statistically


levels of info

text
words, e.g., POS
noun phrase, e.g., phrase units
sentence level
inter-sentence level, e.g., anaphora resolution & discourse analysis
template level: changes format to the required output

effort increased as


AAAI Appelt & Hobbs


IE techniques:
KB (knowledge-based)
Semi-learned
Learned



Representation for Learning
---------------------------

pre filler pattern
filler: what you want
post filler pattern
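
A minimal sketch of the three-part representation above, assuming each pattern is just a list of literal lowercase tokens. The class name `ExtractionRule`, its field names, and the `max_filler_len` limit are illustrative assumptions, not from the lecture; real systems would likely also constrain POS tags or semantic classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionRule:
    # Hypothetical names; patterns are literal lowercase tokens only.
    pre_filler: List[str]     # tokens required immediately before the filler
    post_filler: List[str]    # tokens required immediately after the filler
    max_filler_len: int = 3   # longest filler (in tokens) the rule may extract

    def extract(self, tokens: List[str]) -> Optional[List[str]]:
        """Return the first filler span whose context matches, else None."""
        lowered = [t.lower() for t in tokens]
        n_pre = len(self.pre_filler)
        n_post = len(self.post_filler)
        for start in range(n_pre, len(tokens)):
            # The pre-filler pattern must match just before `start`.
            if lowered[start - n_pre:start] != self.pre_filler:
                continue
            for length in range(1, self.max_filler_len + 1):
                end = start + length
                # The post-filler pattern must match just after the filler.
                if lowered[end:end + n_post] == self.post_filler:
                    return tokens[start:end]
        return None


rule = ExtractionRule(pre_filler=["located", "in"], post_filler=[","])
sentence = "The firm is located in Salt Lake City , Utah".split()
print(rule.extract(sentence))
```

Here the filler is whatever sits between "located in" and the comma; the rule never inspects the filler itself, only its context.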


RAPIER: Robust Automated Production of Information Extraction Rules


covering algorithm: starts from a sentence that is a positive example
while positive examples remain, create a rule that covers a majority of them, then remove the examples it covers

(seems like it would have pretty good precision but not so good recall)
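
The covering loop described above can be sketched as follows. This is a generic greedy covering loop, not RAPIER's actual rule learner. It assumes a fixed pool of candidate rules to choose from (a real system would construct rules from the examples), and the names `Rule` and `learn_by_covering` are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    name: str
    matches: Callable[[str], bool]


def learn_by_covering(
    positives: List[str],
    negatives: List[str],
    candidates: List[Rule],
) -> Tuple[List[Rule], List[str]]:
    """Greedily pick rules until no candidate covers a remaining positive."""
    # Assumption: only rules that fire on no negative example are usable.
    consistent = [r for r in candidates
                  if not any(r.matches(n) for n in negatives)]
    uncovered = list(positives)
    learned: List[Rule] = []
    while uncovered:
        def coverage(rule: Rule) -> int:
            return sum(1 for ex in uncovered if rule.matches(ex))

        best = max(consistent, key=coverage, default=None)
        if best is None or coverage(best) == 0:
            break  # no remaining rule covers anything new
        learned.append(best)
        # Remove the positives the new rule covers.
        uncovered = [ex for ex in uncovered if not best.matches(ex)]
    return learned, uncovered


positives = ["located in Boston", "based in Austin",
             "located in Denver", "offices near Reno"]
negatives = ["interested in Java"]
candidates = [
    Rule("located-in", lambda s: "located in" in s),
    Rule("based-in", lambda s: "based in" in s),
    Rule("any-in", lambda s: " in " in s),
]
learned, uncovered = learn_by_covering(positives, negatives, candidates)
print([r.name for r in learned], uncovered)
```

In this toy run the over-general "any-in" rule is rejected because it fires on the negative example, and "offices near Reno" is left uncovered, which fits the precision/recall remark above.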

RAPIER is one good example of a covering algorithm
