Problem 1.

P3.1 (10 pts) Generate $N=200$ samples of the input variable $X \sim \mathcal{U}(-1,1)$ and corresponding samples of $Y$ following the rule
$$Y=X\left(1-X^{2}\right)+\varepsilon, \quad \varepsilon \sim \mathcal{N}(0,0.01)$$
Note: you can use NumPy routines to generate samples $Z \sim \mathcal{N}\left(\mu, \sigma^{2}\right)$ by running

```python
N = 200
mu = 0.0
sigma = 0.5
z = sigma * np.random.randn(N) + mu
```
and similarly generate samples $Z \sim \mathcal{U}(-1,1)$ using the following example
```python
N = 200
ul = -1.0
ur = 1.0
z = (ur - ul) * np.random.rand(N) + ul
```
Using these samples $\left(x_{i}, y_{i}\right)$ for $i=1, \ldots, N$ and the feature variables
$$h_{1}(X)=X, \quad h_{2}(X)=X^{2}, \quad h_{3}(X)=X^{3}$$
fit a linear model
$$f(X)=\beta_{0}+\sum_{j=1}^{3} \beta_{j} h_{j}(X)$$
by finding the least squares estimate $\hat{\beta}$.
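A minimal sketch of one way to carry this out with NumPy (the seed and variable names are illustrative, not prescribed by the problem):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Samples: X ~ U(-1, 1), Y = X(1 - X^2) + eps with eps ~ N(0, 0.01)
N = 200
x = 2.0 * rng.random(N) - 1.0
y = x * (1.0 - x**2) + 0.1 * rng.standard_normal(N)  # sigma = sqrt(0.01) = 0.1

# Design matrix: intercept column plus features h_1 = X, h_2 = X^2, h_3 = X^3
H = np.column_stack([np.ones(N), x, x**2, x**3])

# Least squares estimate: beta_hat = argmin ||H beta - y||^2
beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
print(beta_hat)  # should be close to [0, 1, 0, -1] up to noise
```

Since $X(1-X^{2}) = X - X^{3}$, the fitted coefficients should recover roughly $\hat{\beta} \approx (0, 1, 0, -1)$.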

Problem 2.

P3.2 (6 pts) Generate $M=100$ sets of $N=200$ samples of the input variable $X \sim \mathcal{U}(-1,1)$ and corresponding samples of $Y$ following the rule
$$Y=X+\varepsilon, \quad \varepsilon \sim \mathcal{N}(0,0.01)$$
Then fit a linear model $f(X)=\beta_{0}+\beta_{1} X$ to each set by finding the least squares estimate $\hat{\beta}^{(i)}$. Estimate the mean and the variance
$$\mathbb{E}[\hat{\beta}]=\frac{1}{M} \sum_{i=1}^{M} \hat{\beta}^{(i)}, \quad \operatorname{Var}[\hat{\beta}]=\mathbb{E}\left[|\hat{\beta}-\mathbb{E}[\hat{\beta}]|^{2}\right]$$
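The repeated-fit experiment can be sketched as follows. Note one reading choice: the variance is computed with the Euclidean norm, as in the definition above, so it is a single scalar (the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 100, 200

betas = np.empty((M, 2))  # row i holds (beta_0, beta_1) for sample set i
for i in range(M):
    x = 2.0 * rng.random(N) - 1.0            # X ~ U(-1, 1)
    y = x + 0.1 * rng.standard_normal(N)     # Y = X + eps, eps ~ N(0, 0.01)
    H = np.column_stack([np.ones(N), x])
    betas[i] = np.linalg.lstsq(H, y, rcond=None)[0]

# Monte Carlo estimates of E[beta_hat] and Var[beta_hat]
beta_mean = betas.mean(axis=0)
# Var[beta_hat] = E[|beta_hat - E[beta_hat]|^2], Euclidean norm over (beta_0, beta_1)
beta_var = np.mean(np.sum((betas - beta_mean) ** 2, axis=1))
print(beta_mean, beta_var)  # mean near (0, 1), variance close to zero
```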

Problem 3.

P3.3 (14 pts) We will consider a data set with a lot of random features that contain no information about the output variable. That is,
$$\begin{aligned} Y &= X_{j}+\varepsilon \quad \text{for } j=1, \ldots, k, \\ X_{j} &= \varepsilon \quad \text{for } j=k+1, \ldots, p, \end{aligned}$$
where $X_{j} \sim \mathcal{N}(0,1)$ and $\varepsilon \sim \mathcal{N}(0,1)$.
That is, the first $k$ components of $X$ carry information about $Y$, but the later ones do not. We will construct a ridge regression model for these sets of features.
(a) Write a Python routine generating the data set $\left(\mathbf{X}_{\text{train}}, \mathbf{y}_{\text{train}}\right)$ of size $N_{\text{train}}=200$ with number of feature variables $p=40$ and number of relevant feature variables $k=20$.
(b) Compute a ridge regression model by estimating $\hat{\beta}^{\text {ridge }}$ with regularization parameter $\lambda=1 / 20$.
(c) Compute the test error of the regression model by generating the test set of size $N_{\text {test }}=20$ and computing the MSE for the test set
$$\mathrm{MSE}_{\text{test}}=\frac{1}{\sqrt{N_{\text{test}}}}\left\|\mathbf{X}_{\text{test}} \hat{\beta}^{\text{ridge}}-\mathbf{y}_{\text{test}}\right\|_{\ell^{2}}$$
(d) Compute the effective degrees of freedom of the inverse of $\mathbf{X}_{\text{train}}^{T} \mathbf{X}_{\text{train}}$ given by
$$\operatorname{df}(\lambda)=\sum_{j=1}^{p} \frac{\sigma_{j}^{2}}{\sigma_{j}^{2}+1 / \lambda}$$
where $\sigma_{j}$ is the $j$-th singular value of the matrix $\left(\mathbf{X}_{\text{train}}^{T} \mathbf{X}_{\text{train}}\right)^{-1}$.
(e) Compute the MSE for the training set, the MSE for the test set, and the effective degrees of freedom defined above, for varying total number of features $p=40, 50, 60, \ldots, 400$. Create a plot of these values as a function of $p$.
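Parts (a) through (d) might be sketched as below for the fixed size $p=40$. Two assumptions are made here: the model statement is read as $Y$ being the sum of the first $k$ (informative) features plus noise, and the ridge estimate uses the standard closed form $(\mathbf{X}^{T}\mathbf{X}+\lambda I)^{-1}\mathbf{X}^{T}\mathbf{y}$; the helper name `make_data` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, lam = 40, 20, 1 / 20

def make_data(n, p, k, rng):
    # Part (a): features X_j ~ N(0, 1); Y is read as the sum of the first
    # k informative features plus N(0, 1) noise (an assumption).
    X = rng.standard_normal((n, p))
    y = X[:, :k].sum(axis=1) + rng.standard_normal(n)
    return X, y

Xtr, ytr = make_data(200, p, k, rng)
Xte, yte = make_data(20, p, k, rng)

# Part (b): ridge estimate beta = (X^T X + lam I)^{-1} X^T y
beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)

# Part (c): test error as defined in the problem statement
mse_test = np.linalg.norm(Xte @ beta - yte) / np.sqrt(len(yte))

# Part (d): effective degrees of freedom, using the singular values of
# (X^T X)^{-1} exactly as the formula in part (d) specifies
s = np.linalg.svd(np.linalg.inv(Xtr.T @ Xtr), compute_uv=False)
df = np.sum(s**2 / (s**2 + 1 / lam))

print(mse_test, df)
```

For part (e), the same steps would be repeated inside a loop over $p$, recording the training MSE, test MSE, and $\operatorname{df}(\lambda)$ at each value before plotting.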
