##### A brief walkthrough of the EE5904/ME5404 Neural Networks Homework #3 problems

Important note: the due date is 22/03/2021. You should submit your scripts to the folder in LumiNus. Late submission is not allowed unless it is well justified. Please include the MATLAB or Python code as an attachment if a computer experiment is involved.
Please note that the MATLAB toolboxes for RBFN and SOM are not well developed. Please write your own codes to implement RBFN and SOM instead of using the MATLAB toolbox.

##### EE5904/ME5404 Neural Networks Homework #3: Problem 1
Problem 1.

Q1. Function Approximation with RBFN (10 Marks) Consider using RBFN to approximate the following function:
$$y=1.2 \sin (\pi x)-\cos (2.4 \pi x), \quad \text { for } x \in[-1,1]$$
The training set is constructed by dividing the range $[-1,1]$ using a uniform step length of $0.05$, while the test set is constructed by dividing the range $[-1,1]$ using a uniform step length of $0.01$. Assume that the observed outputs in the training set are corrupted by random noise as follows.
$$y(i)=1.2 \sin (\pi x(i))-\cos (2.4 \pi x(i))+0.3 n(i)$$
where the random noise $n(i)$ is Gaussian noise with zero mean and standard deviation of one, which can be generated by the MATLAB command randn. Note that the test set is not corrupted by noise. Perform the following computer experiments:
a) Use the exact interpolation method (as described on pages 16-21 in the slides of lecture five) and determine the weights of the RBFN. Assume the RBF is a Gaussian function with standard deviation of $0.1$. Evaluate the approximation performance of the resulting RBFN using the test set. (3 Marks)
b) Follow the strategy of “Fixed Centers Selected at Random” (as described on page 37 in the slides of lecture five) and randomly select 20 centers among the sampling points. Determine the weights of the RBFN. Evaluate the approximation performance of the resulting RBFN using the test set, and compare it to the result of part a). (4 Marks)
c) Use the same centers and widths as those determined in part a) and apply the regularization method as described on pages $42-45$ in the slides for lecture five. Vary the value of the regularization factor and study its effect on the performance of RBFN. (3 Marks)
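As a minimal sketch of part a), assuming NumPy (variable names are illustrative, not prescribed by the assignment): exact interpolation places one Gaussian center at every training point and solves the square interpolation system $\Phi \mathbf{w} = \mathbf{y}$ directly.

```python
import numpy as np

# Training set: x in [-1, 1] with step 0.05; outputs corrupted by Gaussian noise.
x_train = np.arange(-1, 1 + 1e-9, 0.05)
y_train = (1.2 * np.sin(np.pi * x_train) - np.cos(2.4 * np.pi * x_train)
           + 0.3 * np.random.randn(x_train.size))

# Test set: step 0.01, noise-free.
x_test = np.arange(-1, 1 + 1e-9, 0.01)
y_test = 1.2 * np.sin(np.pi * x_test) - np.cos(2.4 * np.pi * x_test)

sigma = 0.1  # standard deviation of the Gaussian RBF, as specified in part a)

def gaussian_matrix(x, centers, sigma):
    """Phi[i, j] = exp(-(x_i - c_j)^2 / (2 sigma^2))."""
    d = x[:, None] - centers[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

# Exact interpolation: one center per training point; Phi is square and
# positive definite for distinct points, so it can be solved directly.
Phi = gaussian_matrix(x_train, x_train, sigma)
w = np.linalg.solve(Phi, y_train)

# Evaluate on the (noise-free) test set.
y_pred = gaussian_matrix(x_test, x_train, sigma) @ w
mse = np.mean((y_pred - y_test)**2)
```

Because the weights interpolate the noisy training outputs exactly, the test-set error reflects how much the network overfits the noise; part c)'s regularization replaces the solve with $(\Phi^{\top}\Phi + \lambda I)^{-1}\Phi^{\top}\mathbf{y}$ to trade this off.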

Solution.

Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The input can be modeled as a vector of real numbers $\mathbf{x} \in \mathbb{R}^{n}$. The output of the network is then a scalar function of the input vector, $\varphi: \mathbb{R}^{n} \rightarrow \mathbb{R},$ and is given by
$$\varphi(\mathbf{x})=\sum_{i=1}^{N} a_{i} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)$$
where $N$ is the number of neurons in the hidden layer, $\mathbf{c}_{i}$ is the center vector for neuron $i$, and $a_{i}$ is the weight of neuron $i$ in the linear output neuron. Functions that depend only on the distance from a center vector are radially symmetric about that vector, hence the name radial basis function. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance (although the Mahalanobis distance has been reported to perform better in pattern recognition [4][5]), and the radial basis function is commonly taken to be the Gaussian
$$\rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)=\exp \left[-\beta\left\|\mathbf{x}-\mathbf{c}_{i}\right\|^{2}\right]$$
The Gaussian basis functions are local to the center vector in the sense that
$$\lim _{\|x\| \rightarrow \infty} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)=0$$
i.e. changing the parameters of one neuron has only a small effect on input values that are far away from the center of that neuron. Given certain mild conditions on the shape of the activation function, RBF networks are universal approximators on a compact subset of $\mathbb{R}^{n}$ [6]. This means that an RBF network with enough hidden neurons can approximate any continuous function on a closed, bounded set with arbitrary precision.
The parameters $a_{i}, \mathbf{c}_{i}$, and $\beta_{i}$ are determined in a manner that optimizes the fit between $\varphi$ and the data.
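When the centers $\mathbf{c}_{i}$ and width are fixed in advance, as in part b)'s “Fixed Centers Selected at Random” strategy, only the linear weights $a_{i}$ remain and they can be found by least squares. A sketch assuming NumPy, with the width set by the common heuristic $\sigma = d_{\max}/\sqrt{2m}$ (where $d_{\max}$ is the maximum distance between the $m$ chosen centers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training data on [-1, 1] with step 0.05, as in the problem statement.
x_train = np.arange(-1, 1 + 1e-9, 0.05)
y_train = (1.2 * np.sin(np.pi * x_train) - np.cos(2.4 * np.pi * x_train)
           + 0.3 * rng.standard_normal(x_train.size))

# Fixed centers selected at random: 20 centers among the sampling points.
centers = rng.choice(x_train, size=20, replace=False)

# Width heuristic: sigma = d_max / sqrt(2 m), so that no Gaussian is
# too peaked or too flat relative to the spread of the centers.
d_max = np.max(np.abs(centers[:, None] - centers[None, :]))
sigma = d_max / np.sqrt(2 * centers.size)

# Design matrix: 41 training points x 20 centers (over-determined).
Phi = np.exp(-(x_train[:, None] - centers[None, :])**2 / (2 * sigma**2))

# Least-squares weights via the pseudo-inverse.
w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
```

With fewer centers than training points the system is over-determined, so the least-squares fit smooths over the noise rather than interpolating it, which is typically what the comparison with part a) illustrates.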

Theoretical motivation for normalization
There is theoretical justification for this architecture in the case of stochastic data flow. Assume a stochastic kernel approximation for the joint probability density
$P(\mathbf{x} \wedge y)=\frac{1}{N} \sum_{i=1}^{N} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right) \sigma\left(\left|y-e_{i}\right|\right)$
where the weights $\mathbf{c}_{i}$ and $e_{i}$ are exemplars from the data and we require the kernels to be normalized
$$\int \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right) d^{n} \mathbf{x}=1$$
and
$$\int \sigma\left(\left|y-e_{i}\right|\right) d y=1$$
The probability densities in the input and output spaces are
$$P(\mathbf{x})=\int P(\mathbf{x} \wedge y) d y=\frac{1}{N} \sum_{i=1}^{N} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)$$
and
$$P(y)=\int P(\mathbf{x} \wedge y) d^{n} \mathbf{x}=\frac{1}{N} \sum_{i=1}^{N} \sigma\left(\left|y-e_{i}\right|\right)$$
The expectation of $y$ given an input $\mathbf{x}$ is
$$\varphi(\mathbf{x}) \stackrel{\text { def }}{=} E(y \mid \mathbf{x})=\int y P(y \mid \mathbf{x}) d y$$
where $P(y \mid \mathbf{x})$ is the conditional probability of $y$ given $\mathbf{x}$. The conditional probability is related to the joint probability through Bayes' theorem,
$$P(y \mid \mathbf{x})=\frac{P(\mathbf{x} \wedge y)}{P(\mathbf{x})}$$
which yields
$$\varphi(\mathbf{x})=\int y \frac{P(\mathbf{x} \wedge y)}{P(\mathbf{x})} d y$$
This becomes
$$\varphi(\mathbf{x})=\frac{\sum_{i=1}^{N} e_{i} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)}{\sum_{i=1}^{N} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)}=\sum_{i=1}^{N} e_{i} u\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)$$
when the integrations are performed, where $u\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right) \stackrel{\text { def }}{=} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right) / \sum_{j=1}^{N} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{j}\right\|\right)$ denotes the normalized radial basis function.
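Carrying out the two integrations using the kernel normalizations above gives the normalized RBF network, whose output is a convex combination of the exemplar targets $e_{i}$ and therefore always stays within their range. A small sketch assuming NumPy and Gaussian kernels, with a hypothetical width parameter `beta`:

```python
import numpy as np

def normalized_rbf_predict(x, centers, targets, beta):
    """phi(x) = sum_i e_i rho_i(x) / sum_i rho_i(x), Gaussian kernels."""
    rho = np.exp(-beta * (x[:, None] - centers[None, :])**2)
    return (rho * targets[None, :]).sum(axis=1) / rho.sum(axis=1)

# Toy exemplars (hypothetical values): centers c_i and targets e_i.
centers = np.array([-0.5, 0.0, 0.5])
targets = np.array([1.0, -1.0, 2.0])

x = np.linspace(-1, 1, 5)
y = normalized_rbf_predict(x, centers, targets, beta=4.0)
```

Because each output weight $u_{i}$ is non-negative and the weights sum to one, the prediction is bounded by the minimum and maximum of the $e_{i}$, which is one practical difference from the unnormalized form.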

Local linear models 
It is sometimes convenient to expand the architecture to include local linear models. In that case the architectures become, to first order,
$$\varphi(\mathbf{x})=\sum_{i=1}^{N}\left(a_{i}+\mathbf{b}_{i} \cdot\left(\mathbf{x}-\mathbf{c}_{i}\right)\right) \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)$$
and
$$\varphi(\mathbf{x})=\sum_{i=1}^{N}\left(a_{i}+\mathbf{b}_{i} \cdot\left(\mathbf{x}-\mathbf{c}_{i}\right)\right) u\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right)$$
in the unnormalized and normalized cases, respectively. Here $\mathbf{b}_{i}$ are weights to be determined. Higher order linear terms are also possible.
This result can be written
$$\varphi(\mathbf{x})=\sum_{i=1}^{2 N} \sum_{j=1}^{n} e_{i j} v_{i j}\left(\mathbf{x}-\mathbf{c}_{i}\right)$$
where
$$e_{i j}=\left\{\begin{array}{ll} a_{i}, & \text { if } i \in[1, N] \\ b_{i j}, & \text { if } i \in[N+1,2 N] \end{array}\right.$$
and
$$v_{i j}\left(\mathbf{x}-\mathbf{c}_{i}\right) \stackrel{\text { def }}{=}\left\{\begin{array}{ll} \delta_{i j} \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right), & \text { if } i \in[1, N] \\ \left(x_{i j}-c_{i j}\right) \rho\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right), & \text { if } i \in[N+1,2 N] \end{array}\right.$$
in the unnormalized case and
$$v_{i j}\left(\mathbf{x}-\mathbf{c}_{i}\right) \stackrel{\text { def }}{=}\left\{\begin{array}{ll} \delta_{i j} u\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right), & \text { if } i \in[1, N] \\ \left(x_{i j}-c_{i j}\right) u\left(\left\|\mathbf{x}-\mathbf{c}_{i}\right\|\right), & \text { if } i \in[N+1,2 N] \end{array}\right.$$
in the normalized case.
Here $\delta_{i j}$ is a Kronecker delta function defined as
$$\delta_{i j}=\left\{\begin{array}{ll} 1, & \text { if } i=j \\ 0, & \text { if } i \neq j \end{array}\right.$$
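The unnormalized and normalized local linear variants above can be evaluated with a single routine for scalar inputs; a sketch assuming NumPy, with hypothetical parameter arrays `a` and `b`:

```python
import numpy as np

def local_linear_rbf(x, centers, a, b, beta, normalized=False):
    """phi(x) = sum_i (a_i + b_i (x - c_i)) * basis_i(x), scalar inputs."""
    d = x[:, None] - centers[None, :]   # pairwise offsets x - c_i
    rho = np.exp(-beta * d**2)          # Gaussian basis values
    if normalized:
        # u_i = rho_i / sum_j rho_j (the normalized case)
        rho = rho / rho.sum(axis=1, keepdims=True)
    return ((a[None, :] + b[None, :] * d) * rho).sum(axis=1)

# Hypothetical parameters: two centers, constant terms a_i, slopes b_i.
centers = np.array([-0.5, 0.5])
a = np.array([1.0, -1.0])
b = np.array([0.5, 0.5])

x = np.linspace(-1, 1, 7)
y_plain = local_linear_rbf(x, centers, a, b, beta=2.0)
y_norm = local_linear_rbf(x, centers, a, b, beta=2.0, normalized=True)
```

In practice the parameters $a_{i}$ and $\mathbf{b}_{i}$ are still linear in the output, so once the centers and widths are fixed they can be fitted by the same least-squares machinery as the constant-weight model, just with an enlarged design matrix.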
