Physics Colloquium
Friday, Feb. 15th, 2008, 4:00 P.M.
E300 Math/Science Center; Refreshments at 3:30 P.M. in Room E200
Department of Physics
University of Calgary
Mutual Information and Applications: from Sequence Alignment to Heart Beat Analysis
I will first recall the main features of mutual information (MI) as a measure
of similarity or statistical dependence. In particular, I will discuss its
embodiments in two versions of information theory: probabilistic (Shannon)
versus algorithmic (Kolmogorov). I will compare two different strategies for
estimating the latter, one involving sequence alignment and file compression
("zipping"), the other just zipping alone. Next, I will show how MI in general
leads to very simple hierarchical clustering (construction of dendrograms).
The last part of the talk will be devoted to estimating Shannon MI from
real-valued data, and two applications thereof: so-called "independent
component analysis" (ICA) and microarray gene expression. In ICA (a blind
source separation technique) a composite signal is linearly "de-mixed" into
its least dependent components -- without knowing a priori how the mixing was
done and what the components should look like. Our main example here deals with
the ECG (electrocardiogram) of a pregnant woman, where the goal is to separate
the heart beat of the fetus from that of the mother (and from noise). Finally,
in microarray gene expression we shall see how differences between linear
dependency measures (such as the Pearson correlation coefficient) and nonlinear
measures (MI) allow us to find structural features in gene regulation networks.
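
As a rough illustration of the "zipping alone" strategy mentioned in the abstract, algorithmic MI can be approximated by replacing the (uncomputable) Kolmogorov complexity with a compressed file size C and computing C(x) + C(y) - C(xy). The sketch below is only an assumption about how such an estimate might look, not the estimator used in the talk; the function names and the choice of zlib as the compressor are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data, used here as a crude
    stand-in for Kolmogorov complexity (an assumption of this sketch)."""
    return len(zlib.compress(data, 9))


def compression_mi(x: bytes, y: bytes) -> int:
    """Estimate algorithmic MI as C(x) + C(y) - C(xy), where C is the
    compressed size and xy is the concatenation of x and y."""
    return compressed_size(x) + compressed_size(y) - compressed_size(x + y)


def random_bytes(n: int, seed: int) -> bytes:
    """Reproducible pseudo-random byte string of length n."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


# Two unrelated pseudo-random sequences (illustrative data only).
x = random_bytes(2000, seed=1)
z = random_bytes(2000, seed=2)

# A sequence shares nearly all of its information with itself,
# and almost none with an unrelated random sequence.
print("MI(x, x):", compression_mi(x, x))
print("MI(x, z):", compression_mi(x, z))
```

The estimate is only as good as the compressor: zlib finds repeats within a 32 KB window, so for longer inputs a compressor with a larger window would be needed for the concatenation term to register shared content.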