Friday, February 11, 2005

Friday CRADLE Talk: Mining Spatial Patterns from Protein Structures

Luke Huan

Seq->Struct->Function

Structure indicates function

Mine frequent subgraphs; retrieve spatial motifs from protein structure data

Global vs. local alignments of protein structures
local alignments are motifs

Mining for subgraphs/spatial motifs is a challenging problem for data mining

How to model protein w/ set of points
each aa is represented by a point in 3D space; protein structure is a point set
LCP: largest common point set problem
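
A minimal sketch of the point-set model, to make the link to subgraph mining concrete: each residue is one 3D point, and two residues are joined by an edge when they lie within a distance cutoff. The function name `contact_graph`, the 8.0 angstrom cutoff, and the coordinates are assumptions for illustration only; the talk notes do not record how edges were actually defined.

```python
from itertools import combinations
from math import dist


def contact_graph(points, cutoff=8.0):
    """Return (i, j, distance) for every pair of points within `cutoff`.

    `points` is a list of (x, y, z) tuples, one per amino acid.
    The cutoff value is an assumption, not taken from the talk.
    """
    edges = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = dist(p, q)
        if d <= cutoff:
            edges.append((i, j, d))
    return edges


# Hypothetical coordinates (angstroms) for four residues.
residues = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
edges = contact_graph(residues)
```

With the graph in this form, a spatial motif corresponds to a small labeled subgraph that recurs across many protein graphs.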


Is clique hashing a useful pattern finding approach for other domains?
Heavy combinatorial approach


SCOP: Structural Classification of Proteins DB
10 classes
800 folds
1294 superfamilies
2327 families

every protein entered into this db is annotated using this scheme


hypergeometric distribution:
The problem of finding the probability of such a picking problem is sometimes called the "urn problem," since it asks for the probability that i out of N balls drawn are "good" from an urn that contains n "good" balls and m "bad" balls. It therefore also describes the probability of obtaining exactly i correct balls in a pick-N lottery from a reservoir of r balls (of which n = N are "good" and m = r - N are "bad"). (from MathWorld http://mathworld.wolfram.com/HypergeometricDistribution.html)
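
For reference, the standard closed form for this probability, written in the urn notation of the quote above (n "good" balls, m "bad" balls, N drawn without replacement, i of them "good"). The random variable name X is introduced here only for illustration:

```latex
P(X = i) = \frac{\binom{n}{i}\,\binom{m}{N-i}}{\binom{n+m}{N}}
```

The numerator counts the ways to pick i of the good balls and N - i of the bad ones; the denominator counts all ways to draw N balls from the urn.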

Searching for biological relevance:
reading papers: key word search, drawing from experience
website search
talking to people who know the subjects


presented an algo for finding frequently occurring spatial motifs
discovered motifs are specific, measured by low P-values using the hypergeometric distribution
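
A rough sketch of how such a P-value could be computed, assuming the usual enrichment setup: out of `total` proteins in the database, `family` belong to one SCOP family, the motif occurs in `with_motif` proteins overall, and `hits` of those are in the family. The P-value is then the hypergeometric upper tail, the chance of seeing at least that many hits at random. The function name, the parameter names, and the numbers below are hypothetical; the notes do not record the speaker's exact formulation.

```python
from math import comb


def hypergeom_pvalue(total, family, with_motif, hits):
    """Upper-tail probability P(X >= hits) under the hypergeometric model.

    X is the number of family members among `with_motif` proteins
    drawn without replacement from `total`, of which `family` are
    in the family of interest.
    """
    upper = min(family, with_motif)
    favorable = sum(
        comb(family, i) * comb(total - family, with_motif - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(total, with_motif)


# Made-up example: 10 proteins, 4 in the family, motif found in 3, all 3 in the family.
p = hypergeom_pvalue(total=10, family=4, with_motif=3, hits=3)
```

A small value means the motif is concentrated in the family far more than chance would predict, which is the sense of "specific" used here.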


motifs they discover are highly specific, measured by low P-values

mapping structures to highly specific motifs
how about mapping sequence
