Physics Colloquium - Friday, Feb. 15th, 2008, 4:00 P.M.

E300 Math/Science Center; Refreshments at 3:30 P.M. in Room E200

Peter Grassberger
Department of Physics
University of Calgary

Mutual Information and Applications: from Sequence Alignment to Heart Beat Analysis

I will first recall the main features of mutual information (MI) as a measure of similarity or statistical dependence. In particular, I will discuss its embodiments in two versions of information theory: probabilistic (Shannon) versus algorithmic (Kolmogorov). I will compare two different strategies for estimating the latter, one involving sequence alignment plus file compression ("zipping"), the other zipping alone. Next, I will show how MI leads quite generally to very simple hierarchical clustering (the construction of dendrograms). The last part of the talk will be devoted to estimating Shannon MI from real-valued data, and to two applications thereof: so-called "independent component analysis" (ICA) and microarray gene expression. In ICA (a blind source separation technique), a composite signal is linearly "de-mixed" into its least dependent components -- without knowing a priori how the mixing was done or what the components should look like. Our main example here deals with the ECG (electrocardiogram) of a pregnant woman, where the goal is to separate the heartbeat of the fetus from that of the mother (and from noise). Finally, for microarray gene expression we shall see how differences between linear dependence measures (such as the Pearson correlation coefficient) and nonlinear measures (MI) allow us to find structural features in gene regulation networks.