3.2.2 Expectation
If you have a collection of numbers $a_1,a_2,...,a_N$, their average is a single number that describes the whole collection. Now, consider a random variable $X$. We would like to define its average, or as it is called in probability, its expected value or mean. The expected value is defined as the weighted average of the values in the range of $X$, where each value is weighted by its probability.
Definition
Let $X$ be a discrete random variable with range $R_X=\{x_1,x_2,x_3, ...\}$ (finite or countably infinite). The expected value of $X$, denoted by $EX$, is defined as $$EX=\sum_{x_k \in R_X} x_k P(X=x_k)=\sum_{x_k \in R_X} x_k P_X(x_k).$$
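For a finite range, the definition translates directly into a weighted sum. The sketch below is only an illustration: the function name `expectation` and the representation of the PMF as a dictionary are our own choices, and exact `Fraction` arithmetic is used so that the fair-die answer comes out exactly.

```python
import math
from fractions import Fraction

def expectation(pmf):
    """Return EX = sum of x_k * P_X(x_k) over a finite range R_X.

    pmf: dict mapping each value x_k in R_X to its probability P_X(x_k).
    """
    if not math.isclose(sum(pmf.values()), 1):
        raise ValueError("the probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: R_X = {1, 2, ..., 6} and P_X(k) = 1/6 for each k.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(expectation(die))  # 7/2
```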
To understand the concept behind $EX$, consider a discrete random variable with range $R_X=\{x_1,x_2,x_3, ...\}$. This random variable is the result of a random experiment. Suppose that we repeat this experiment a very large number of times $N$, and that the trials are independent. Let $N_1$ be the number of times we observe $x_1$, $N_2$ be the number of times we observe $x_2$, ..., $N_k$ be the number of times we observe $x_k$, and so on. Since $P(X=x_k)=P_X(x_k)$ is the long-run fraction of trials in which we observe $x_k$, we expect that $$P_X(x_1)\approx \frac{N_1}{N},$$ $$P_X(x_2)\approx \frac{N_2}{N},$$ $$\vdots$$ $$P_X(x_k)\approx \frac{N_k}{N},$$ $$\vdots$$ In other words, we have $N_k \approx N P_X(x_k)$. Now, if we take the average of the observed values of $X$, we obtain
$$\begin{aligned}
\textrm{Average} &= \frac{N_1 x_1+N_2 x_2+N_3 x_3+\cdots}{N} \\
&\approx \frac{x_1 N P_X(x_1)+x_2 N P_X(x_2)+x_3 N P_X(x_3)+\cdots}{N} \\
&= x_1 P_X(x_1)+x_2 P_X(x_2)+x_3 P_X(x_3)+\cdots \\
&= EX.
\end{aligned}$$
Thus, the intuition behind $EX$ is that if you repeat the random experiment independently $N$ times and take the average of the observed data, the average gets closer and closer to $EX$ as $N$ gets larger and larger. We sometimes denote $EX$ by $\mu_X$.
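The frequency argument above can be checked by simulation. The sketch below is a minimal illustration, assuming a fair die as the random experiment; `empirical_average` is a hypothetical helper built on Python's standard `random` module. The gap between the sample average and $EX=3.5$ tends to shrink as $N$ grows, although in any particular run it need not shrink monotonically.

```python
import random

def empirical_average(values, probs, n, seed=0):
    """Average of n independent draws of X, where P(X = values[i]) = probs[i]."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=probs, k=n)
    return sum(draws) / n

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
ex = sum(x * p for x, p in zip(values, probs))  # EX = 3.5, up to float rounding

for n in (10, 1_000, 100_000):
    avg = empirical_average(values, probs, n)
    print(n, avg, abs(avg - ex))  # N, sample average, distance from EX
```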
Let's compute the expected values of some well-known distributions.
Example
Let $X \sim Bernoulli(p)$. Find $EX$.
 Solution

For the Bernoulli distribution, the range of $X$ is $R_X=\{0,1\}$, and $P_X(1)=p$
and $P_X(0)=1-p$. Thus,
$$EX=0 \cdot P_X(0)+1 \cdot P_X(1)=0 \cdot (1-p)+ 1 \cdot p=p.$$
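As a sanity check of $EX=p$, one can simulate many independent Bernoulli trials and compute the fraction of ones, which is exactly the sample average of the observed values of $X$. This is a sketch; the helper `bernoulli_sample_mean` and the value $p=0.3$ are arbitrary illustrative choices.

```python
import random

def bernoulli_sample_mean(p, n, seed=1):
    """Fraction of ones among n independent Bernoulli(p) draws."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so rng.random() < p happens with probability p.
    ones = sum(1 for _ in range(n) if rng.random() < p)
    return ones / n

p = 0.3
print(bernoulli_sample_mean(p, 200_000))  # should be close to EX = p = 0.3
```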

For a Bernoulli random variable, finding the expectation $EX$ was easy. However, for some random variables, to find the expectation sum, you might need a little algebra. Let's look at another example.
Example
Let $X \sim Geometric(p)$. Find $EX$.
 Solution

For the geometric distribution, the range is $R_X=\{1,2,3,... \}$ and the PMF is given by
$$P_X(k) = q^{k-1}p, \hspace{20pt} \text{ for } k=1,2,...$$
where $0 < p < 1$ and $q=1-p$. Thus, we can write
$$EX=\sum_{x_k \in R_X} x_k P_X(x_k)=\sum_{k=1}^{\infty} k q^{k-1}p=p\sum_{k=1}^{\infty} k q^{k-1}.$$
Now, we already know the geometric sum formula $$\sum_{k=0}^{\infty} x^k= \frac{1}{1-x}, \hspace{20pt} \textrm{ for } |x| < 1.$$ But we need to find the sum $\sum_{k=1}^{\infty} k q^{k-1}$. Luckily, we can convert the geometric sum to the form we want by taking the derivative of both sides with respect to $x$ (a power series can be differentiated term by term inside its interval of convergence), i.e., $$\frac{d}{dx} \sum_{k=0}^{\infty} x^k= \frac{d}{dx} \frac{1}{1-x}, \hspace{20pt} \textrm{ for } |x| < 1.$$ Thus, we have $$\sum_{k=1}^{\infty} k x^{k-1}= \frac{1}{(1-x)^2}, \hspace{20pt} \textrm{ for } |x| < 1,$$ where the sum starts at $k=1$ because the $k=0$ term is zero. To finish finding the expectation, we set $x=q$, which is allowed since $0<q<1$, and write $$EX=p\sum_{k=1}^{\infty} k q^{k-1}=p \frac{1}{(1-q)^2}=p \frac{1}{p^2}=\frac{1}{p}.$$
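The series identity used above can also be checked numerically by truncating the sum. This sketch assumes $p=0.25$ (so $q=0.75$) purely for illustration; `partial_sum_k_q_pow` is a hypothetical helper, and 500 terms is far more than enough because the terms decay geometrically.

```python
def partial_sum_k_q_pow(q, terms):
    """Partial sum of k * q**(k - 1) for k = 1, ..., terms."""
    return sum(k * q ** (k - 1) for k in range(1, terms + 1))

p = 0.25
q = 1 - p
closed_form = 1 / (1 - q) ** 2  # 1 / (1 - q)^2 = 1 / p^2 = 16
approx = partial_sum_k_q_pow(q, 500)

print(approx, closed_form)
print("EX is approximately", p * approx, "and 1/p =", 1 / p)
```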
So, for $X \sim Geometric(p)$, $EX=\frac{1}{p}$. Note that this makes sense intuitively. The random experiment behind the geometric distribution was that we tossed a coin until we observed the first heads, where $P(H)=p$. Here, we found that on average you need to toss the coin $\frac{1}{p}$ times in this experiment. In particular, if $p$ is small (heads are unlikely), then $\frac{1}{p}$ is large, so you need to toss the coin a large number of times before you observe heads. Conversely, for large $p$ a few coin tosses usually suffice.
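The coin-tossing interpretation can be simulated directly: toss a coin with $P(H)=p$ until the first heads, record the number of tosses, and average over many repetitions. The helper names and the values of $p$ below are illustrative assumptions, not part of the text.

```python
import random

def tosses_until_first_heads(p, rng):
    """Number of tosses of a coin with P(H) = p up to and including the first heads."""
    count = 1
    while rng.random() >= p:  # this toss is tails, which happens with probability 1 - p
        count += 1
    return count

def average_tosses(p, n, seed=2):
    """Average number of tosses over n independent repetitions of the experiment."""
    rng = random.Random(seed)
    return sum(tosses_until_first_heads(p, rng) for _ in range(n)) / n

for p in (0.5, 0.1, 0.05):
    print(p, average_tosses(p, 50_000), 1 / p)  # sample average vs. EX = 1/p
```

Smaller $p$ gives a visibly larger average, matching the intuition in the paragraph above.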

Example
Let $X \sim Poisson(\lambda)$. Find $EX$.
 Solution

Before doing the math, we suggest that you try to guess what the expected value would be.
It might be a good idea to think about the examples where the Poisson distribution is used. For
the Poisson distribution, the range is $R_X=\{0,1,2,\cdots \}$ and the PMF is given by
$$P_X(k) = \frac{e^{-\lambda} \lambda^k}{k!}, \hspace{20pt} \text{ for } k=0,1,2,...$$
Thus, we can write
$$\begin{aligned}
EX &= \sum_{x_k \in R_X} x_k P_X(x_k) = \sum_{k=0}^{\infty} k \frac{e^{-\lambda} \lambda^k}{k!} \\
&= e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} \quad (\textrm{the } k=0 \textrm{ term is zero, and } \tfrac{k}{k!}=\tfrac{1}{(k-1)!}) \\
&= e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^{j+1}}{j!} \quad (\textrm{by letting } j=k-1) \\
&= \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \\
&= \lambda e^{-\lambda} e^{\lambda} \quad (\textrm{Taylor series for } e^{\lambda}) \\
&= \lambda.
\end{aligned}$$
So the expected value is $\lambda$. Remember, when we first talked about the Poisson distribution, we introduced its parameter $\lambda$ as the average number of events. So it is not surprising that the expected value is $EX=\lambda$.
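Numerically, the Poisson series can be truncated once its terms are negligible. This sketch (hypothetical helper `poisson_mean_truncated`) builds each PMF value from the previous one using $P_X(k+1)=P_X(k)\cdot\frac{\lambda}{k+1}$, which avoids computing $\lambda^k$ and $k!$ separately and so avoids overflow for large $k$. The cutoff of 200 terms is an assumption that is ample for the values of $\lambda$ shown.

```python
import math

def poisson_mean_truncated(lam, terms):
    """Sum of k * e^(-lam) * lam^k / k! for k = 0, ..., terms - 1."""
    total = 0.0
    pmf = math.exp(-lam)  # P_X(0)
    for k in range(terms):
        total += k * pmf
        pmf *= lam / (k + 1)  # P_X(k + 1) = P_X(k) * lam / (k + 1)
    return total

for lam in (0.5, 3.0, 20.0):
    print(lam, poisson_mean_truncated(lam, 200))  # each result should be close to lam
```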

Before looking at more examples, we would like to talk about an important property of expectation, which is linearity. Note that if $X$ is a random variable, any function of $X$ is also a random variable, so we can talk about its expected value. For example, if $Y=aX+b$, we can talk about $EY=E[aX+b]$. Or if you define $Y=X_1+X_2+\cdots+X_n$, where $X_i$'s are random variables, we can talk about $EY=E[X_1+X_2+\cdots+X_n]$. The following theorem states that expectation is linear, which makes it easier to calculate the expected value of linear functions of random variables.
Theorem
We have
 $E[aX+b]=aEX+b$, for all $a,b \in \mathbb{R}$;
 $E[X_1+X_2+\cdots+X_n]=EX_1+EX_2+\cdots+EX_n$, for any set of random variables $X_1, X_2,\cdots,X_n$.
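Linearity does not require independence. The sketch below checks both parts of the theorem on a small joint PMF for two dependent random variables $X_1$ and $X_2$. The joint probabilities are made up for illustration, and each expectation is computed by weighting the value of the random variable at each outcome $(x_1,x_2)$ by that outcome's probability.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X1, X2); keys are (x1, x2) pairs.
# X1 and X2 are dependent: P(X1=0, X2=0) = 1/2, but P(X1=0) * P(X2=0) = 1/4.
joint = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

def expect(g):
    """E[g(X1, X2)]: each outcome's value g(x1, x2) weighted by its probability."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

EX1 = expect(lambda x1, x2: x1)
EX2 = expect(lambda x1, x2: x2)
a, b = 3, -2

# Part 1: E[aX1 + b] = a EX1 + b
assert expect(lambda x1, x2: a * x1 + b) == a * EX1 + b
# Part 2: E[X1 + X2] = EX1 + EX2, even though X1 and X2 are dependent
assert expect(lambda x1, x2: x1 + x2) == EX1 + EX2

print(EX1, EX2, expect(lambda x1, x2: x1 + x2))  # 1/2 3/4 5/4
```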
We will prove this theorem later on in Chapter 5, but here we would like to emphasize its importance with an example.
Example
Let $X \sim Binomial(n,p)$. Find $EX$.
 Solution

We provide two ways to solve this problem. One way is as before: we do the math and
calculate $EX=\sum_{x_k \in R_X} x_k P_X(x_k)$, which will be a little tedious. A much
faster way is to use linearity of expectation. In particular, remember that if
$X_1, X_2, ...,X_n$ are independent $Bernoulli(p)$ random variables, then the random
variable $X$ defined by $X=X_1+X_2+...+X_n$ has a $Binomial(n,p)$ distribution. Thus,
we can write
$$\begin{aligned}
EX &= E[X_1+X_2+\cdots+X_n] \\
&= EX_1+EX_2+\cdots+EX_n \hspace{20pt} \textrm{by linearity of expectation} \\
&= p+p+\cdots+p \\
&= np.
\end{aligned}$$
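To see that the shortcut and the definition agree, the direct sum $\sum_{k=0}^{n} k\binom{n}{k}p^k(1-p)^{n-k}$ can be evaluated exactly for a few small values of $n$. This is a numerical check under illustrative choices ($p=\frac{1}{3}$, hypothetical helper `binomial_mean_direct`), not a proof; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial_mean_direct(n, p):
    """EX = sum over k of k * C(n, k) * p^k * (1 - p)^(n - k), in exact arithmetic."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

p = Fraction(1, 3)
for n in (1, 5, 12):
    direct = binomial_mean_direct(n, p)
    print(n, direct, n * p)
    assert direct == n * p  # agrees with the linearity shortcut EX = np
```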
We will provide the direct calculation of $EX=\sum_{x_k \in R_X} x_k P_X(x_k)$ in the Solved Problems section, and as you will see, it needs a lot more algebra than the calculation above. The bottom line is that linearity of expectation can sometimes make our calculations much easier. Let's look at another example.

Example
Let $X \sim Pascal(m,p)$. Find $EX$. (Hint: Try to write $X=X_1+X_2+\cdots+X_m$, such that you already know $EX_i$.)
 Solution

We claim that if the $X_i$'s are independent and $X_i \sim Geometric(p)$, for $i=1,2,\cdots,m$, then the random variable $X$ defined by $X=X_1+X_2+\cdots+X_m$ has a $Pascal(m,p)$ distribution. To see this, you can look at Problem 5 in Section 3.1.6 and the discussion there. Now, since we already know $EX_i=\frac{1}{p}$, we conclude
$$\begin{aligned}
EX &= E[X_1+X_2+\cdots+X_m] \\
&= EX_1+EX_2+\cdots+EX_m \hspace{20pt} \textrm{by linearity of expectation} \\
&= \frac{1}{p}+\frac{1}{p}+\cdots+\frac{1}{p} \\
&= \frac{m}{p}.
\end{aligned}$$
Again, you can try to find $EX$ directly, and as you will see, you need much more algebra compared to using the linearity of expectation.
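A similar numerical check works for the Pascal distribution, assuming its PMF $P_X(k)=\binom{k-1}{m-1}p^m(1-p)^{k-m}$ for $k=m,m+1,\ldots$ (the number of tosses needed to observe the $m$-th heads). The sketch truncates the infinite sum at a hypothetical cutoff `max_k`; the parameter pairs are arbitrary, and $m=1$ reduces to the geometric case.

```python
from math import comb

def pascal_mean_truncated(m, p, max_k):
    """Sum of k * C(k-1, m-1) * p^m * (1-p)^(k-m) for k = m, ..., max_k."""
    return sum(
        k * comb(k - 1, m - 1) * p**m * (1 - p) ** (k - m)
        for k in range(m, max_k + 1)
    )

for m, p in ((1, 0.5), (3, 0.4), (5, 0.2)):
    print(m, p, pascal_mean_truncated(m, p, 1000), m / p)  # truncated sum vs. m/p
```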
