Monday, January 31, 2005

Monday 31 January 2005 Lecture

KD: Association

Typically used for recommendations
Also called market basket analysis
What do people buy together? What attributes are associated with one another?

left hand side: antecedent
right hand side: consequent

An association rule must have an associated population P
- the population consists of a set of instances
e.g., each sale at a store is an instance
- the set of all transactions is the population


set of items: I = {I1, I2, ..., Im}
transactions: D = {t1, t2, ..., tn}
Itemset
a set of items that satisfies some criterion or other
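A minimal sketch of these definitions (the specific items and transactions are made up for illustration, not from the lecture): representing each transaction as a Python set, the support of an itemset is the fraction of transactions that contain it.

```python
# Toy transaction database; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

print(support({"bread", "milk"}, transactions))  # 2 of 4 transactions -> 0.5
```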


association rule algorithms
we are generally only interested in association rules with high support

Apriori algorithm
If {ACD} is frequent, then all subsets of {ACD} are frequent ({AC}, {AD}, {CD})
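The contrapositive of that property is what Apriori actually exploits: if any (k-1)-subset of a candidate k-itemset is infrequent, the candidate can be pruned without counting it. A hedged sketch of just the pruning step (item names and the `prune` helper are assumptions for illustration):

```python
from itertools import combinations

def prune(candidates, frequent_prev):
    """Apriori pruning: keep a candidate k-itemset only if every
    (k-1)-subset is already known to be frequent (downward closure)."""
    return [c for c in candidates
            if all(frozenset(s) in frequent_prev
                   for s in combinations(c, len(c) - 1))]

# Example: {A, C} turned out infrequent, so {A, C, D} is pruned
# even though {A, D} and {C, D} are frequent.
frequent_2 = {frozenset("AD"), frozenset("CD")}
candidates_3 = [frozenset("ACD")]
print(prune(candidates_3, frequent_2))  # -> []
```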

Two questions: why is unidirectional causality implied by the terminology (e.g., antecedent, consequent)? Isn't it bidirectional by nature? Direction in the graph, as we speak of it now, is not temporal

Also, why aren't we interested in low support? Do we want only the best association rules in all cases, or do we sometimes want to describe the population space as completely as possible? Isn't that determined to some extent by how we plan on using the results?

Re: different feature representations yield different rules

Confidence vs. support: interestingness!
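The distinction can be shown with a worked example (toy data, same hypothetical basket style as above): support measures how often the whole rule occurs, while confidence measures how often the consequent follows given the antecedent.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: {bread} -> {milk}
antecedent, consequent = {"bread"}, {"milk"}
sup = support(antecedent | consequent)   # 2/4 = 0.5
conf = sup / support(antecedent)         # 0.5 / 0.75, i.e. 2 of 3 bread sales
print(sup, round(conf, 3))
```

High support alone isn't enough: a consequent that is frequent everywhere gives high confidence even when the antecedent tells you nothing, which is why "interestingness" measures beyond these two exist.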

We may have that info already present in a DB...

Another algorithm: Instance-Based Learning

Decision trees, clustering, and association rules are created from historical data; the model is then used to predict/describe the class of a new instance

Instance-based: no model is created ahead of time
- learning happens when a new instance arrives
- identify historical data that is similar
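The canonical instance-based method is k-nearest neighbors; a minimal sketch (the historical data and feature vectors are hypothetical, and Euclidean distance is assumed for the numeric features):

```python
import math
from collections import Counter

def knn_predict(history, query, k=3):
    """Classify query by majority vote among the k nearest
    historical instances (Euclidean distance on numeric features)."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical data: (feature vector, class label).
history = [((1.0, 1.0), "a"), ((1.1, 0.9), "a"),
           ((5.0, 5.0), "b"), ((5.2, 4.8), "b"), ((0.9, 1.2), "a")]
print(knn_predict(history, (1.0, 1.1)))  # nearest neighbors are all "a"
```

Note that nothing is "learned" until the query arrives; all the work is deferred to lookup time, which is exactly why efficient similarity search in the database matters.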

Similar challenges as clustering with respect to distance
symbolic distances are particularly difficult
instance-based learning is effective when the database design is efficient
similarity is being pushed into the db--next-generation dbs will enable similarity-based queries
