Monday 31 January 2005 Lecture
KD: Association
Typically used for recommendations
Also called market basket analysis
What do people buy together? What attributes are associated?
left hand side: antecedent
right hand side: consequent
association rule must have an associated population P
- pop consists of a set of instances
e.g., each sale at a store is an instance
- set of all transactions is the population
set of items: I = {I1, I2, ..., Im}
transactions: D = {t1, t2, ..., tn}
Itemset
set of items that satisfy some criteria or other
association rule algo
we are generally only interested in association rules w/ high support
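To make "support" concrete: a minimal sketch, assuming support means the fraction of transactions in D that contain every item of an itemset. The toy transactions and the function name `support` are made up for illustration, not taken from the lecture.

```python
# Hypothetical toy population D: each transaction is the set of items in one sale.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)


print(support({"diapers", "beer"}, D))  # 0.6 (3 of the 5 sales)
```

A minimum-support threshold then just filters itemsets by this number.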
Apriori algorithm
If {ACD} is frequent, then all subsets of {ACD} are frequent (e.g., {AC}, {AD}, {CD})
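The subset property above is what lets the algorithm prune: if any subset of a candidate is not frequent, the candidate cannot be frequent either. A rough sketch of the level-wise search, assuming a fractional minimum-support threshold; the function name `apriori`, the candidate-generation shortcut, and the toy data are illustrative assumptions, not necessarily the version from lecture.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate against the population and keep the frequent ones.
        kept = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                kept[c] = s
        return kept

    items = {item for t in transactions for item in t}
    level = frequent({frozenset([i]) for i in items})  # frequent 1-itemsets
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical transactions over items A, B, C, D.
sales = [{"A", "C", "D"}, {"A", "C", "D"}, {"A", "B"}, {"C", "D"}]
result = apriori(sales, min_support=0.5)
```

With this toy data {ACD} comes out frequent, and so do {AC}, {AD} and {CD}; {B} falls below the threshold, so nothing containing B is ever counted.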
Two questions: why is unidirectional causality implied by the terminology (e.g., antecedent, consequent)? Isn't it bidirectional by nature? Direction on the graph as we speak of it now is not temporal
Also, why aren't we interested in low support? Do we want to get only the best association rules in all cases, or do we sometimes want to describe the population space as completely as possible? Isn't that determined to some extent by how we plan on using the results?
Re: different feature representations yield different
Confidence vs. support: interestingness!
We may have that info already present in a DB...
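A small sketch contrasting confidence with support, assuming the usual definition: confidence(X -> Y) = count(X and Y together) / count(X). The data and function names are hypothetical. Note that the measure is not symmetric: swapping antecedent and consequent changes the denominator.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    lhs = support_count(antecedent, transactions)
    if lhs == 0:
        raise ValueError("antecedent never occurs in the population")
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / lhs


# Hypothetical sales records.
D = [
    {"beer", "diapers"},
    {"diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"milk"},
]

print(confidence({"beer"}, {"diapers"}, D))  # 1.0
print(confidence({"diapers"}, {"beer"}, D))  # 0.5
```

Both rules have the same support (2 of 5 sales contain beer and diapers together), but their confidence differs.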
Another algo: Instance-Based Learning
Decision trees, clustering and association rules are created on historical data; the model is then used to predict/describe the class of a new instance
Instance based: no model is created ahead of time
- learned when a new instance arrives
- identify historical data that is similar
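One common form of instance-based learning is k-nearest-neighbour classification; the sketch below uses it as a stand-in, on the assumption that it matches what the notes describe. Nothing is built ahead of time: all the work happens when the query arrives. The function names, Euclidean distance, and the toy history are illustrative choices.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, history, k=3, distance=euclidean):
    """Label `query` by majority vote among its k most similar historical records.

    `history` is a list of (features, label) pairs; no model is trained in advance.
    """
    neighbours = sorted(history, key=lambda rec: distance(query, rec[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical historical data: (features, class label).
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
]

label = knn_classify((5.1, 5.0), history, k=3)  # two "high" neighbours outvote one "low"
```

The cost is paid at query time, since every lookup scans the stored history.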
Similar challenges as clustering with respect to distance
symbolic distances are particularly difficult
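One simple (and crude) way to put a distance on symbolic attributes is to count mismatches. The sketch below assumes that approach; `overlap_distance` and the example records are hypothetical. It also shows why this is hard: every mismatch counts the same, so the measure cannot tell that "suv" might be closer to "truck" than to "sedan".

```python
def overlap_distance(a, b):
    """Fraction of attribute positions where two symbolic records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical records: (colour, body style, transmission).
car1 = ("red", "suv", "manual")
car2 = ("red", "truck", "manual")
car3 = ("red", "sedan", "manual")

# Both comparisons differ in exactly one attribute, so they score the same.
d12 = overlap_distance(car1, car2)
d13 = overlap_distance(car1, car3)
```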
instance-based learning is effective when the db design is efficient
similarity is being pushed into the db--next-generation dbs will enable similarity-based queries