Any information generating process can be viewed as a source that emits a sequence of symbols chosen from a finite alphabet.

- ASCII symbols (text)

- $n$-bit image values ($2^{n}$ symbols)

The simplest form of an information source is the so called Discrete Memoryless Source (DMS). Successive symbols produced by such a source are statistically independent.

A DMS is completely specified by the source alphabet $S=\{s_{1}, s_{2}, \ldots, s_{n}\}$ and the associated probabilities $P=\{p_{1}, p_{2}, \ldots, p_{n}\}$.

The Self-Information of a symbol $s_{i}$ with probability $p_{i}$ is defined as:

$$

I\left(s_{i}\right)=\log_{2} \frac{1}{p_{i}}=-\log_{2} p_{i}

$$

The occurrence of a less probable event provides more information.

The information of a sequence of independent events, taken as a single event, equals the sum of their individual information. An event can be the occurrence of a symbol.
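The two properties above can be checked numerically; a minimal sketch (the probabilities below are illustrative, not from the text):

```python
import math

def self_information(p):
    """Self-information I(s) = -log2(p), in bits."""
    return -math.log2(p)

# A less probable symbol carries more information.
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits

# Additivity: for independent events the joint probability is the
# product, so the information of the joint event is the sum.
print(self_information(0.5 * 0.125))  # 4.0 bits = 1.0 + 3.0
```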

The Average Information per Symbol or Entropy of a DMS is:

$$

H(S)=\sum_{i=1}^{n} p_{i} I\left(s_{i}\right)=-\sum_{i=1}^{n} p_{i} \log_{2} p_{i}

$$

The Entropy is measured in bits/symbol.

Interpretation of entropy:

- By definition it is the average amount of information that symbols of a specific source carry.

- Entropy is also a measure of the “disorder” (uncertainty) of a system.

When we design a coding scheme, the average number of bits per symbol we can achieve is always greater than or equal to the entropy. Therefore, the entropy is the best we can do in terms of bits/symbol!

## Extension of a DMS

Given a DMS of size $n$, it might be beneficial to group the original symbols of the source into blocks of $N$ symbols. Each block can now be considered as a single source symbol generated by a source $S^{N}$ which has $n^{N}$ symbols.

In this case the entropy of the new source is

$$

H\left(S^{N}\right)=N \times H(S)

$$

We observe that when the source is extended, the entropy increases; however, the symbols also increase in length. The entropy per original symbol remains the same.
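The relation $H(S^{N}) = N \times H(S)$ can be verified numerically by enumerating all $n^{N}$ blocks; a small sketch with an illustrative two-symbol source:

```python
import math
from itertools import product

def entropy(probs):
    """Entropy in bits/symbol: -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

P = [0.9, 0.1]  # illustrative DMS
N = 3

# By independence, the probability of a block is the product of the
# probabilities of its symbols; enumerate all n**N blocks.
block_probs = [math.prod(block) for block in product(P, repeat=N)]

print(entropy(P))            # H(S)
print(entropy(block_probs))  # H(S^N), equal to N * H(S)
```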

## Noiseless source coding theorem

Let $S$ be a source with alphabet size $n$ and entropy $H(S)$.

Consider coding blocks of $N$ source symbols into binary codewords. For any $\delta>0$ (with $\delta$ a small number), it is possible by choosing $N$ large enough to construct a code in such a way that the average number of bits per original source symbol $l_{\text {avg }}$ satisfies the following:

$$

H(S) \leq l_{\text{avg}}<H(S)+\delta

$$

We observe that for $\delta$ small enough, the average number of bits per symbol converges to the entropy of the source. This is the best coding we can achieve.

The above is not practical, since the alphabet size $n^{N}$ grows exponentially with $N$.
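One way to see the theorem in action is to assign each block a Shannon codeword length $\lceil -\log_2 p \rceil$ (the Kraft inequality guarantees a prefix code with these lengths exists; this is an illustrative construction, not the only one). The average bits per original symbol then satisfies $H(S) \leq l_{\text{avg}} < H(S) + 1/N$, and the exponential blow-up in the number of codewords is also visible:

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def shannon_avg_length(P, N):
    """Average bits per original symbol when each block of N symbols
    is given a Shannon codeword length ceil(-log2 p_block)."""
    blocks = [math.prod(b) for b in product(P, repeat=N)]
    avg_block_bits = sum(p * math.ceil(-math.log2(p)) for p in blocks)
    return avg_block_bits / N

P = [0.9, 0.1]  # illustrative source
print(entropy(P))
for N in (1, 2, 4, 8):
    # l_avg approaches H(S) from above, but the code has 2**N words.
    print(N, shannon_avg_length(P, N))
```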

## Course Description

This course introduces students to information theory, a mathematical formalism for quantifying and reasoning about communication. While traditionally a part of electrical engineering, it has found several powerful applications in the theory of algorithms and complexity and adjacent fields such as combinatorics and game theory. The first third of the course will teach students the basics of information theory (Shannon entropy, mutual information, Kullback-Leibler divergence). The rest of the course will sample topics from error correcting codes, communication complexity, data structures, and optimization, in each case highlighting applications of information theory.

## Prerequisites

This course requires a good background in mathematics (especially beginning probability theory) and exposure to formal analysis of algorithms (Dartmouth’s CS 31 or an equivalent course). Preparedness will be assessed in Week 1 using a diagnostic test; see the schedule for details.

According to the ORC, the prerequisites are “COSC 31 or COSC 30 plus permission of the instructor.”

## Course Staff

**Professor:** Amit Chakrabarti

There is no teaching assistant for this course.

## Textbook

The book *Elements of Information Theory, 2nd edition*, by Cover and Thomas is recommended (though not required) for the first half of the course. No existing textbook covers all the material of the second half. Other useful online resources:

- Information Theory, Inference, and Learning Algorithms, by David MacKay, is another very nice textbook for the first half of the course.
- Lecture Notes on Information Theory, by Polyanskiy and Wu, is an excellent modern resource, more mathematically deep than what we’ll do and with a statistics slant.

As we head into specialized topics, registered students will work on scribing lecture notes as the course runs, and these will form our own in-house resource.

## Work and Assessment

The assessed work in the course consists of (a) solving several homework problems, spaced out throughout the term; (b) scribing notes for up to two lectures; (c) a final project. Many homework problems will be challenging. For details, please see the assessment tab.