**Definition.**

Poisson random variable: Here, we briefly review some properties of the Poisson random variable that we have discussed in the previous chapters. Remember that a discrete random variable $X$ is said to be a Poisson random variable with parameter $\mu$, shown as $X \sim \operatorname{Poisson}(\mu)$, if its range is $R_{X}=\{0,1,2,3, \ldots\}$, and its PMF is given by

$$
P_{X}(k)=\frac{e^{-\mu} \mu^{k}}{k !}, \quad k=0,1,2, \ldots
$$

Here are some useful facts that we have seen before:

- If $X \sim \operatorname{Poisson}(\mu)$, then $E[X]=\mu$ and $\operatorname{Var}(X)=\mu$.
- If $X_{i} \sim \operatorname{Poisson}\left(\mu_{i}\right)$ for $i=1,2, \ldots, n$, and the $X_{i}$'s are independent, then

$$

X_{1}+X_{2}+\cdots+X_{n} \sim \operatorname{Poisson}\left(\mu_{1}+\mu_{2}+\cdots+\mu_{n}\right)

$$

- The Poisson distribution can be viewed as the limit of the binomial distribution: $\operatorname{Binomial}(n, p)$ converges to $\operatorname{Poisson}(\mu)$ as $n \rightarrow \infty$ and $p \rightarrow 0$ with $np \rightarrow \mu$.
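The binomial-limit fact is easy to check numerically: for fixed $\mu$, the $\operatorname{Binomial}(n, \mu/n)$ PMF approaches the $\operatorname{Poisson}(\mu)$ PMF as $n$ grows. A quick pure-Python sketch (the choice $\mu = 2$ is arbitrary):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return mu**k * exp(-mu) / factorial(k)

mu = 2.0
for n in (10, 100, 10_000):
    # Binomial(n, mu/n) should approach Poisson(mu) as n grows.
    max_gap = max(abs(binom_pmf(k, n, mu / n) - poisson_pmf(k, mu))
                  for k in range(10))
    print(f"n = {n:6d}: max |binomial - poisson| = {max_gap:.2e}")
```

The discrepancy shrinks roughly like $1/n$, consistent with the limit statement.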

If $N(t)$ is a Poisson process with rate $\lambda$, then $N(t)$ has the Poisson distribution with parameter $\lambda t$; that is to say,

$$

\mathbb{P}(N(t)=j)=\frac{(\lambda t)^{j}}{j !} e^{-\lambda t}, \quad j=0,1,2, \ldots .

$$

Condition $N(t+h)$ on $N(t)$ to obtain

$$
\begin{aligned}
\mathbb{P}(N(t+h)=j) &=\sum_{i} \mathbb{P}(N(t)=i)\, \mathbb{P}(N(t+h)=j \mid N(t)=i) \\
&=\sum_{i} \mathbb{P}(N(t)=i)\, \mathbb{P}((j-i) \text { arrivals in }(t, t+h]) \\
&=\mathbb{P}(N(t)=j-1)\, \mathbb{P}(\text { one arrival })+\mathbb{P}(N(t)=j)\, \mathbb{P}(\text { no arrivals })+o(h) .
\end{aligned}
$$

Thus $p_{j}(t)=\mathbb{P}(N(t)=j)$ satisfies

$$
\begin{aligned}
p_{j}(t+h) &=\lambda h\, p_{j-1}(t)+(1-\lambda h)\, p_{j}(t)+o(h) \quad \text { if } \quad j \neq 0, \\
p_{0}(t+h) &=(1-\lambda h)\, p_{0}(t)+o(h) .
\end{aligned}
$$

Subtract $p_{j}(t)$ from each side of the first of these equations, divide by $h$, and let $h \downarrow 0$ to obtain

(3)

$$
p_{j}^{\prime}(t)=\lambda p_{j-1}(t)-\lambda p_{j}(t) \quad \text { if } \quad j \neq 0
$$

likewise

(4)

$$
p_{0}^{\prime}(t)=-\lambda p_{0}(t)
$$

The boundary condition is

$$
p_{j}(0)=\delta_{j 0}=\left\{\begin{array}{ll}
1 & \text { if } j=0 \\
0 & \text { if } j \neq 0
\end{array}\right.
$$

Equations (3) and (4) form a collection of differential-difference equations for the $p_{j}(t)$. Here are two methods of solution, both of which have applications elsewhere.

Method A. Induction. Solve (4) subject to the condition $p_{0}(0)=1$ to obtain $p_{0}(t)=e^{-\lambda t}$. Substitute this into (3) with $j=1$ to obtain $p_{1}(t)=\lambda t e^{-\lambda t}$ and iterate, to obtain by induction that

$$

p_{j}(t)=\frac{(\lambda t)^{j}}{j !} e^{-\lambda t}

$$
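As a numerical sanity check on Method A, one can integrate the differential-difference equations (3) and (4) by forward Euler and compare against this closed form. A pure-Python sketch (the rate, horizon, step size, and truncation level are arbitrary choices):

```python
from math import exp, factorial

lam = 1.5        # rate lambda (arbitrary choice)
T, h = 2.0, 1e-4  # time horizon and Euler step size
J = 30           # truncate the infinite system at j = J

# Boundary condition p_j(0) = delta_{j0}.
p = [1.0] + [0.0] * J

for _ in range(round(T / h)):
    # p_0'(t) = -lam p_0(t);  p_j'(t) = lam p_{j-1}(t) - lam p_j(t)
    new = [p[0] - h * lam * p[0]]
    for j in range(1, J + 1):
        new.append(p[j] + h * (lam * p[j - 1] - lam * p[j]))
    p = new

# Compare with the closed form p_j(T) = (lam T)^j e^{-lam T} / j!
for j in range(5):
    exact = (lam * T) ** j * exp(-lam * T) / factorial(j)
    print(f"j = {j}: euler = {p[j]:.6f}, exact = {exact:.6f}")
```

The Euler solution agrees with $(\lambda t)^{j} e^{-\lambda t} / j!$ up to the $O(h)$ discretization error.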

Method B. Generating functions. Define the generating function

$$

G(s, t)=\sum_{j=0}^{\infty} p_{j}(t) s^{j}=\mathbb{E}\left(s^{N(t)}\right)

$$

Multiply (3) by $s^{j}$ and sum over $j$ to obtain

$$

\frac{\partial G}{\partial t}=\lambda(s-1) G

$$

with the boundary condition $G(s, 0)=1$. The solution is, as required,

(6)

$$

G(s, t)=e^{\lambda(s-1) t}=e^{-\lambda t} \sum_{j=0}^{\infty} \frac{(\lambda t)^{j}}{j !} s^{j} .

$$
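For completeness, the summation step that produced the partial differential equation can be written out: multiplying (3) by $s^{j}$, summing over $j \geq 1$, and adding (4) for the $j=0$ term gives

$$
\frac{\partial G}{\partial t}=\sum_{j=0}^{\infty} p_{j}^{\prime}(t) s^{j}=\lambda \sum_{j=1}^{\infty} p_{j-1}(t) s^{j}-\lambda \sum_{j=0}^{\infty} p_{j}(t) s^{j}=\lambda s\, G(s, t)-\lambda G(s, t)=\lambda(s-1) G(s, t) .
$$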

## Second Definition of the Poisson Process

Let $N(t)$ be a Poisson process with rate $\lambda$. Consider a very short interval of length $\Delta$. Then the number of arrivals in this interval has the same distribution as $N(\Delta)$. In particular, we can write

$$
\begin{aligned}
P(N(\Delta)=0) &=e^{-\lambda \Delta} \\
&=1-\lambda \Delta+\frac{\lambda^{2}}{2} \Delta^{2}-\cdots \quad \text { (Taylor series) }
\end{aligned}
$$

Note that if $\Delta$ is small, the terms that include second or higher powers of $\Delta$ are negligible compared to $\Delta$. We write this as

$$

P(N(\Delta)=0)=1-\lambda \Delta+o(\Delta)

$$

Here $o(\Delta)$ denotes a function that is negligible compared to $\Delta$ as $\Delta \rightarrow 0$. More precisely, $g(\Delta)=o(\Delta)$ means that

$$

\lim _{\Delta \rightarrow 0} \frac{g(\Delta)}{\Delta}=0

$$

Now, let us look at the probability of having one arrival in an interval of length $\Delta$.

$$
\begin{aligned}
P(N(\Delta)=1) &=e^{-\lambda \Delta} \lambda \Delta \\
&=\lambda \Delta\left(1-\lambda \Delta+\frac{\lambda^{2}}{2} \Delta^{2}-\cdots\right) \quad \text { (Taylor series) } \\
&=\lambda \Delta+\left(-\lambda^{2} \Delta^{2}+\frac{\lambda^{3}}{2} \Delta^{3}-\cdots\right) \\
&=\lambda \Delta+o(\Delta) .
\end{aligned}
$$

We conclude that

$$

P(N(\Delta)=1)=\lambda \Delta+o(\Delta)

$$

Similarly, we can show that

$$

P(N(\Delta) \geq 2)=o(\Delta)

$$
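These three approximations are straightforward to confirm numerically: divide each error term by $\Delta$ and watch the ratio vanish as $\Delta \rightarrow 0$. A pure-Python sketch (the rate $\lambda=2$ is an arbitrary choice):

```python
from math import exp

lam = 2.0  # arbitrary rate

def p0(d):      # P(N(d) = 0) = e^{-lam d}
    return exp(-lam * d)

def p1(d):      # P(N(d) = 1) = lam d e^{-lam d}
    return lam * d * exp(-lam * d)

def p2plus(d):  # P(N(d) >= 2)
    return 1.0 - p0(d) - p1(d)

for d in (1e-1, 1e-3, 1e-5):
    # Each error term should be o(d): error / d -> 0 as d -> 0.
    e0 = (p0(d) - (1 - lam * d)) / d
    e1 = (p1(d) - lam * d) / d
    e2 = p2plus(d) / d
    print(f"d = {d:.0e}: errors/d = {e0:.2e}, {e1:.2e}, {e2:.2e}")
```

All three ratios shrink proportionally to $\Delta$, as the $o(\Delta)$ notation requires.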

In fact, equations 11.1, 11.2, and 11.3 give us another way to define a Poisson process.

## Sum of Two Poisson Processes

You might sometimes want to add together two (or more) independent Poisson processes, where by independent I mean that any two increments, one from each process, are independent random variables. For example, $N_{1}$ might be counting the number of retirements from Company #1, which occur at the Poisson rate 10 per year, and $N_{2}$ might be counting those from Company #2, which occur at the Poisson rate 15 per year; for some reason you are interested in the total retirements from the two companies combined. It seems plausible that the combined retirements occur at the rate $10+15=25$ per year. What is not at all obvious, however, nor simple to prove, is that 25 is a Poisson rate – that is, that the combined number of retirements constitutes a Poisson process with rate 25 per year. Since the proof of this result is beyond the scope of this note, I’ll simply state the result.

**Fact** (sum of Poisson processes) Suppose that the Poisson processes $N_{1}$ with rate function $\lambda_{1}$ and $N_{2}$ with rate function $\lambda_{2}$ are independent. Then the counting process $N$ defined by $N(t)=$ $N_{1}(t)+N_{2}(t)$ is a Poisson process with rate function $\lambda$ given by $\lambda(t)=\lambda_{1}(t)+\lambda_{2}(t)$.
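For constant rates, the distributional part of this fact can be checked at any fixed $t$: convolving the two Poisson PMFs reproduces the $\operatorname{Poisson}\left((\lambda_{1}+\lambda_{2}) t\right)$ PMF exactly. A Python sketch (rates 10 and 15 borrowed from the retirement example; note this verifies only the one-dimensional distributions, not the full process property):

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return mu**k * exp(-mu) / factorial(k)

lam1, lam2, t = 10.0, 15.0, 0.5
mu1, mu2 = lam1 * t, lam2 * t

for k in range(5):
    # P(N1(t) + N2(t) = k): convolve the two independent Poisson PMFs.
    conv = sum(poisson_pmf(i, mu1) * poisson_pmf(k - i, mu2)
               for i in range(k + 1))
    direct = poisson_pmf(k, mu1 + mu2)
    print(f"k = {k}: convolution = {conv:.6e}, Poisson(25t) = {direct:.6e}")
```

The two columns agree to floating-point precision, reflecting the identity $\operatorname{Poisson}(\mu_{1}) * \operatorname{Poisson}(\mu_{2}) = \operatorname{Poisson}(\mu_{1}+\mu_{2})$.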

New stochastic processes can be created as a mixture of Poisson processes. Since mixture distributions are important in other contexts as well and may not be familiar to you, please allow me to ramble a bit about mixtures in general. It might help to have a simple example first.

Example 1.16 Imagine a box full of fair dice; $20 \%$ of the dice are four-sided with the faces numbered one through four, and $80 \%$ are six-sided with the faces numbered one through six. If you repeatedly roll a four-sided die, you'll be seeing values from a Uniform random variable on the integers $1,2,3,4$, all equally likely. And if you repeatedly roll a six-sided die, you'll be seeing values from a Uniform random variable on the integers $1,2,3,4,5,6$, all equally likely.

But suppose that you reach into the box, grab a die at random (so a $20 \%$ chance of grabbing a four-sided die), roll it once, record the number, and return the die to the box. And then you repeat that process over and over, grabbing and rolling a random die each time. The results you will see are not from a Uniform random variable on $1,2,3,4$, nor from a Uniform random variable on $1,2,3,4,5,6$. Rather, they are from a $20 / 80$ mixture of those two distributions.

Suppose that you want to compute the probability of rolling a 3 with a randomly chosen die – that is, $\operatorname{Pr}[N=3]$ where $N$ follows the mixture distribution. The rigorous mathematical computation of this proceeds as follows:

$$
\begin{aligned}
\operatorname{Pr}[N=3] &=\operatorname{Pr}[N=3 \text { and four-sided }]+\operatorname{Pr}[N=3 \text { and six-sided }] \\
&=\operatorname{Pr}[\text { four-sided }] \operatorname{Pr}[N=3 \mid \text { four-sided }]+\operatorname{Pr}[\text { six-sided }] \operatorname{Pr}[N=3 \mid \text { six-sided }] \\
&=(0.2)\left(\frac{1}{4}\right)+(0.8)\left(\frac{1}{6}\right)=\frac{11}{60}=0.18333 .
\end{aligned}
$$

The more intuitive Mixing Method approach is as follows: pretend you know which case you are in and compute the answers [$1/4$ and $1/6$ in this case], and then compute the expected value of those answers as the cases vary [so $(0.2)(1/4)+(0.8)(1/6)=11/60$ as above]. Let's use this approach to compute the expected value of $N$. If you knew you had a four-sided die, the expected value would be $2.5$, while it would be $3.5$ for a six-sided die. So what is the expected value for a random die? Just the expected value of those two answers as the die-type varies: $(0.2)(2.5)+(0.8)(3.5)=3.3$.
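The arithmetic in Example 1.16 can be reproduced exactly with Python's standard `fractions` module (a sketch; the `weights` dictionary encoding the 20/80 mixture is my own naming):

```python
from fractions import Fraction

# 20/80 mixture of a fair four-sided and a fair six-sided die.
weights = {4: Fraction(1, 5), 6: Fraction(4, 5)}

def pmf(n):
    """Mixture probability Pr[N = n]: weighted sum of the two uniform PMFs."""
    return sum(w * (Fraction(1, sides) if 1 <= n <= sides else 0)
               for sides, w in weights.items())

p3 = pmf(3)
mean = sum(Fraction(n) * pmf(n) for n in range(1, 7))
second = sum(Fraction(n) ** 2 * pmf(n) for n in range(1, 7))
var = second - mean**2

print(p3)      # 11/60
print(mean)    # 33/10  (= 3.3)
print(second)  # 409/30
print(var)     # 823/300 (= 2.7433...)
```

Because `Fraction` does exact rational arithmetic, these match the hand computations with no rounding.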

For the Mixing Method used in Example $1.16$ to be valid, the quantity you are computing for the mixture distribution must involve only a linear computation with the probability function (or density function for a continuous-type random variable) of the mixture – that is, no squaring of probabilities, no dividing by probabilities, et cetera. This is certainly true for an unconditional probability for the mixture, as just illustrated with $\operatorname{Pr}[N=3]$ above. It's also valid for the expected value, which is just a sum of probabilities times values of the variable. Likewise for, say, the second moment, since that is just a sum of probabilities times squared values of the variable; in Example 1.16, for example, $\mathrm{E}\left[N^{2}\right]=(0.2)(30/4)+(0.8)(91/6)=409/30=13.6333$. But not so for the variance, since the variance involves the square of the mean and thus the squaring of probabilities. In Example $1.16$, for instance, $\operatorname{Var}[N]$ is correctly computed as the mixture's second moment $409/30$ minus the square of the mixture's mean $3.3$: $\operatorname{Var}[N]=409/30-3.3^{2}=2.7433$. Mixing the two variances $15/12$ and $35/12$ in the $20/80$ proportion gives the incorrect value $2.5833$.