Wednesday, March 23, 2005

Lecture Wednesday 23 March 2005

Clifton, C, Cooley, R, , JM, and Rauch, J; TopCat: data mining for topic identification in a text corpus. in Principles of Data Mining and Knowledge Discovery. Third European

Experimental Design:
Factor: something you are changing
keywords being assigned
interest instead of support-confidence
person-org-place: using this representation other entities,
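
A minimal sketch of how the interest factor differs from plain support-confidence, assuming interest is defined as P(A and B) / (P(A) P(B)), the usual "lift" definition. The function name, the toy corpus, and the entity labels are all hypothetical and not taken from the TopCat paper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, interest) for antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    # Count documents containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    if n == 0 or n_a == 0 or n_c == 0:
        return 0.0, 0.0, 0.0
    support = n_ac / n
    confidence = n_ac / n_a
    # Assumed definition: observed co-occurrence over what independence predicts.
    interest = (n_ac / n) / ((n_a / n) * (n_c / n))
    return support, confidence, interest


# Hypothetical toy corpus: each document is the set of entities found in it.
docs = [
    {"person:A", "org:X"},
    {"person:A", "org:X"},
    {"person:A"},
    {"org:X"},
    {"place:P"},
]
support, confidence, interest = rule_measures(docs, {"person:A"}, {"org:X"})
```

An interest value above 1 suggests the two entities co-occur more often than independence would predict, which confidence alone does not show.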

Level: what you are setting the factor to
It is important to give your reasons for selecting the factors you vary and for fixing the ones you hold constant

e.g., setting minimum and maximum term frequencies to get documents with a minimum of five terms in both docs, and to retain as close as possible to half of the original document corpus
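
A rough sketch of this kind of filtering, assuming "term frequency" here means the number of documents a term appears in. The function name, parameters, and thresholds are hypothetical; the point is that the fraction of the corpus retained can be reported alongside the chosen levels.

```python
from collections import Counter


def filter_corpus(docs, min_df, max_df, min_terms=5):
    """Keep terms with min_df <= document frequency <= max_df, then keep
    documents that still have at least min_terms distinct terms."""
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term for term, count in df.items() if min_df <= count <= max_df}
    kept = []
    for doc in docs:
        terms = [term for term in doc if term in vocab]
        if len(set(terms)) >= min_terms:
            kept.append(terms)
    retained = (len(kept) / len(docs)) if docs else 0.0
    return kept, retained


# Hypothetical toy corpus of tokenized documents.
toy = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["a"]]
kept_docs, retained_fraction = filter_corpus(toy, min_df=2, max_df=3, min_terms=1)
```

Trying a few threshold settings and recording the retained fraction for each gives a concrete reason to cite for the levels finally chosen.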

e.g., why ten-fold cross-validation? because most people use it....
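
For reference, a minimal sketch of what ten-fold cross-validation does to the data, using only the standard library. The function name and the seed parameter are assumptions for illustration; any real experiment would likely use an existing library routine.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
```

Each item is used for testing exactly once and for training in the other nine folds, so k is itself a level that deserves a stated reason.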

Blake, C. & Pratt, W. (2001). Better rules fewer features: A semantic approach to selecting features from text. In Proceedings of the Institute of Electrical and Electronics Engineers Data Mining Conference (IEEE DM 2001), San Jose, CA.

