(1) This is a question about optimization with inequality constraints. The problem we will investigate arises in the context of Support Vector Machines (SVMs), a topic in Machine Learning. You do not need to know any machine learning to solve this problem, but here is some discussion in case you are interested. If you wish, you can ignore the discussion and go directly to the details of the problem.

Suppose you are given $m$ points $x_{i} \in \mathbb{R}^{n}, i=1, \ldots, m$ (these points are called ‘data points’). Each point $x_{i}$ is labeled by a number $y_{i}$ which is either $+1$ or $-1$. You can think of the data $\{(x_{i}, y_{i}) \mid i=1, \ldots, m\}$ as a collection of points in $\mathbb{R}^{n}$, some of which are labeled $+$ and some labeled $-$. The goal is to separate the $+$ points from the $-$ points by a hyperplane. In general this may not be possible, but here we assume that it is possible to separate the $+$ points from the $-$ points by a hyperplane. For example, in one application the $+$ data points are pictures of dogs and the $-$ data points are pictures of cats, and we would like to find an affine function that can tell the difference between these cats and dogs, which is equivalent to finding a separating hyperplane.

Recall that any hyperplane in $\mathbb{R}^{n}$ can be specified by a vector $w \in \mathbb{R}^{n}$ and a number $b$: $\{x \in \mathbb{R}^{n} \mid w \cdot x+b=0\}$. Note that
the hyperplanes separating the $+$ points from the $-$ points are the ones satisfying: $y_{i}\left(w \cdot x_{i}+b\right) \geq 0$ for all $i=1, \ldots, m$. For any such separating hyperplane $(w, b)$, the distance to the closest data point is given by:
$$\min_{i=1, \ldots, m} \frac{\left|w \cdot x_{i}+b\right|}{|w|}$$
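If you find a numerical sanity check helpful (entirely optional), here is a short Python sketch that computes this margin; the hyperplane $(w, b)$ and the data points below are made up for illustration:

```python
# Sketch: the margin of a separating hyperplane (w, b) on made-up toy data.
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0],    # '+' points
              [0.0, 0.0], [-1.0, 0.0]])  # '-' points
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])
b = -2.0

# Check (w, b) actually separates: y_i (w . x_i + b) >= 0 for all i.
assert np.all(y * (X @ w + b) >= 0)

# Distance from the hyperplane to the closest data point:
# min_i |w . x_i + b| / |w|.
margin = np.min(np.abs(X @ w + b)) / np.linalg.norm(w)
```

For this particular choice the closest points sit at distance $\sqrt{2}$ from the hyperplane.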
When building an SVM, people want to pick, among all the many separating hyperplanes, one that is optimal in the following sense:
$$\max_{\substack{w, b \\ y_{i}\left(w \cdot x_{i}+b\right) \geq 0}} \; \min_{i=1, \ldots, m} \frac{\left|w \cdot x_{i}+b\right|}{|w|} .$$
The solution to this optimization problem is a hyperplane that maximizes the distance to the closest data point, which is the most “stable” separating hyperplane. For example, if you choose a hyperplane that is very close to one of the data points, then a new nearby data point of the same class might be misclassified by your SVM.

It is not difficult to show that the above problem is equivalent to the following problem, which is the problem $(*)$ we will be investigating in this question:
$$\begin{array}{cl} \min & \frac{1}{2}|w|^{2} \\ & w \in \mathbb{R}^{n},\ b \in \mathbb{R} \\ \text{subject to:} & 1-y_{i}\left(w \cdot x_{i}+b\right) \leq 0, \quad \text{for } i=1, \ldots, m. \end{array}$$
(a) Write the $1^{\text{st}}$ order conditions for the problem $(*)$ above. Note that the unknowns here are the vector $w \in \mathbb{R}^{n}$ and the number $b \in \mathbb{R}$. Use $\mu_{i}$ as Lagrange multipliers.
(b) Suppose the data you are given has only two points $x_{1}$ and $x_{2}$, where $y_{1}=+1$ and $y_{2}=-1$. Use your conditions from part (a) to find a candidate for the optimal separating hyperplane for the two points $x_{1}$ and $x_{2}$. Hint: your answer should be a vector $w$ and a number $b$ expressed in terms of the points $x_{1}$ and $x_{2}$.
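Optional: once you have a candidate from part (a), you can test it numerically. The sketch below checks a natural geometric guess, the perpendicular bisector of the segment from $x_1$ to $x_2$; the formulas for $w$ and $b$ below are this guess (an assumption to compare against your own conditions, not a derivation), tried on made-up points with the sign convention $y_i(w \cdot x_i + b) \geq 1$:

```python
# Sketch for part (b): test the perpendicular-bisector guess on made-up points.
import numpy as np

x1 = np.array([3.0, 1.0])   # label y1 = +1
x2 = np.array([1.0, -1.0])  # label y2 = -1

d = x1 - x2
w = 2.0 * d / (d @ d)        # guessed normal vector (assumption to verify)
b = -w @ (x1 + x2) / 2.0     # hyperplane passes through the midpoint

# Both constraints of (*) should be active: y_i (w . x_i + b) = 1.
assert np.isclose(w @ x1 + b, 1.0)
assert np.isclose(w @ x2 + b, -1.0)
```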
(2) This question is about the meaning of Lagrange multipliers. Let $F, g: \mathbb{R}^{3} \rightarrow \mathbb{R}$ be two $C^{1}$ functions and let $S:=\{x \in \mathbb{R}^{3} \mid g(x)=0\} \neq \emptyset$ be the surface defined by $g$. Assume $S$ is a regular surface, that is, every point of $S$ is a regular point. At every point $p \in S$ define the vector $\nabla_{S} F(p):=\nabla F(p)-\left(\frac{\nabla F(p) \cdot \nabla g(p)}{|\nabla g(p)|^{2}}\right) \nabla g(p)$. Note that
$\nabla_{S} F(p)$ is the projection of $\nabla F(p)$ onto $T_{p} S$.
(a) Show that $\nabla_{S} F(p)$ is tangent to $S$ at $p$: that is, show that $\nabla_{S} F(p) \cdot \nabla g(p)=0$. Remark: this means that $\nabla_{S} F$ is a tangent vector field on $S$.
(b) Show that for every curve $\gamma$ on $S$, $\left.\frac{d}{d t}\right|_{t=0} F(\gamma(t))=\nabla_{S} F(\gamma(0)) \cdot \gamma^{\prime}(0)$.
(c) Show that $\nabla_{S} F(p)=0$ iff $\nabla F(p)+\left(-\frac{\nabla F(p) \cdot \nabla g(p)}{|\nabla g(p)|^{2}}\right) \nabla g(p)=0$.
Remark: in the notation of Lagrange multipliers $-\frac{\nabla F(p) \cdot \nabla g(p)}{|\nabla g(p)|^{2}}$ is $\lambda$. In other words, saying that the point $p$ satisfies the Lagrange multiplier condition means that $\nabla_{S} F$ is zero at $p .$
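Optional numerical sanity check of part (a): for the unit sphere $g(x,y,z) = x^2 + y^2 + z^2 - 1$ and a made-up test function $F$ (the choice $F = xy + z$ below is arbitrary; any $C^1$ function would do), the projected gradient $\nabla_S F$ is orthogonal to $\nabla g$ at sample points of $S$:

```python
# Check grad_S F . grad g = 0 at a few points of the unit sphere.
import numpy as np

def grad_F(p):          # gradient of the made-up F(x, y, z) = x*y + z
    x, y, z = p
    return np.array([y, x, 1.0])

def grad_g(p):          # gradient of g(x, y, z) = x^2 + y^2 + z^2 - 1
    return 2.0 * p

def grad_S_F(p):        # the projection formula from the problem statement
    gF, gg = grad_F(p), grad_g(p)
    return gF - ((gF @ gg) / (gg @ gg)) * gg

# A few points on the unit sphere S = {g = 0}:
for p in [np.array([1.0, 0.0, 0.0]),
          np.array([0.0, 0.6, 0.8]),
          np.array([1.0, 1.0, 1.0]) / np.sqrt(3)]:
    assert np.isclose(grad_S_F(p) @ grad_g(p), 0.0)
```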
(3) Referring to the notation in question (2) above, let $g(x, y, z)=x^{2}+y^{2}+z^{2}-1$ and $F(x, y, z)=-z$ be two $C^{1}$ functions on $\mathbb{R}^{3}$. Compute $\nabla_{S} F$ and find all the points $p \in S$ where $\nabla_{S} F(p)=0$.
(4) Recall that in lecture I proved that at a regular point $p$, $T_{p} M \subseteq T_{p}$, and then said that equality follows from the Implicit Function Theorem. In this problem you are asked to prove $T_{p} \subseteq T_{p} M$ in a special case.

Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ be a $C^{1}$ function. Recall from HW4 that the graph of $f$ is the surface $M:=\{(x, f(x)) \in \mathbb{R}^{n} \times \mathbb{R} \mid x \in \mathbb{R}^{n}\}$ in $\mathbb{R}^{n} \times \mathbb{R}$.
(a) Let $p:=\left(x_{0}, f\left(x_{0}\right)\right) \in M$. Find the space $T_{p}$.
(b) Show that $T_{p} \subseteq T_{p} M$. That is, show that any vector in the space $T_{p}$ is a tangent vector to $M$ at $p$. Note: unlike the case on HW4 Q7, here you are not allowed to assume that $T_{p}=T_{p} M$.
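Optional: the key computation in part (b) can be previewed numerically. For a made-up $f$, the curve $t \mapsto (x_0 + tv,\, f(x_0 + tv))$ lies on $M$, and its velocity at $t=0$ is $(v,\, \nabla f(x_0) \cdot v)$; the sketch below checks this with a finite difference:

```python
# Velocity of a curve on the graph of f, checked by central differences.
import numpy as np

def f(x):                       # made-up C^1 function on R^2
    return x[0] ** 2 + np.sin(x[1])

def grad_f(x):
    return np.array([2.0 * x[0], np.cos(x[1])])

x0 = np.array([1.0, 0.5])       # base point in R^2
v = np.array([0.3, -0.7])       # direction in R^2

def gamma(t):                   # curve on M through p = (x0, f(x0))
    x = x0 + t * v
    return np.append(x, f(x))

h = 1e-6
velocity = (gamma(h) - gamma(-h)) / (2 * h)   # approximates gamma'(0)
expected = np.append(v, grad_f(x0) @ v)       # the claimed tangent vector
assert np.allclose(velocity, expected, atol=1e-6)
```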
(5) Let $f: \mathbb{R}^{3} \rightarrow \mathbb{R}$ and $g: \mathbb{R}^{2} \rightarrow \mathbb{R}$ be $C^{1}$ functions. Use Lagrange multipliers ($1^{\text{st}}$ order conditions) to show that the first (constrained) optimization problem and the second (unconstrained) problem have the same candidates for minimizers:
$$\begin{array}{cl} \min & f(x, y, z) \\ \text{subject to} & h(x, y, z)=z-g(x, y)=0, \\ & (x, y, z) \in \mathbb{R}^{3}, \end{array}$$
and
$$\begin{array}{cl} \min & f(x, y, g(x, y)) \\ & (x, y) \in \mathbb{R}^{2}. \end{array}$$
Note: the first problem involves the variables $x, y, z$, while the second involves only $x, y$.
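Optional: the chain-rule identity behind this equivalence says that the gradient of the reduced objective $f(x, y, g(x, y))$ has components $f_x + f_z g_x$ and $f_y + f_z g_y$, which is what the Lagrange conditions produce with multiplier $\lambda = -f_z$. The sketch below checks this numerically for made-up $f$ and $g$:

```python
# Finite-difference check of the chain rule for the reduced objective.
import numpy as np

def f(x, y, z):                 # made-up C^1 function on R^3
    return x * z + y ** 2 + np.cos(z)

def g(x, y):                    # made-up C^1 function on R^2
    return x ** 2 + x * y

x, y = 0.7, -0.4
z = g(x, y)

# Partial derivatives, computed by hand for these particular f and g:
f_x, f_y, f_z = z, 2 * y, x - np.sin(z)
g_x, g_y = 2 * x + y, x

# Finite-difference gradient of the reduced objective F(x, y) = f(x, y, g(x, y)):
h = 1e-6
F = lambda x, y: f(x, y, g(x, y))
dF_dx = (F(x + h, y) - F(x - h, y)) / (2 * h)
dF_dy = (F(x, y + h) - F(x, y - h)) / (2 * h)

assert np.isclose(dF_dx, f_x + f_z * g_x, atol=1e-6)
assert np.isclose(dF_dy, f_y + f_z * g_y, atol=1e-6)
```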
(6) Let $Q$ be a $2 \times 2$ positive definite symmetric matrix. Let $f \in C^{2}$ be an increasing, non-negative, and convex function on $\mathbb{R}$. Let $F(s)$ denote the area of the sublevel set $\{x^{T} Q x \leq f(s)^{2}\}$:
$$F(s)=\operatorname{area}\{x \in \mathbb{R}^{2} \mid x^{T} Q x \leq f(s)^{2}\}$$
Prove that the function $F(s)$ is convex.
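Optional: here is a numerical sanity check of the claim. It uses the standard ellipse-area fact $\operatorname{area}\{x^{T} Q x \leq r^{2}\} = \pi r^{2} / \sqrt{\det Q}$ (a fact your proof will likely need to justify), together with made-up choices of $Q$ and $f$:

```python
# Midpoint-convexity check of F(s) = pi * f(s)^2 / sqrt(det Q).
import numpy as np

Q = np.array([[2.0, 1.0], [1.0, 3.0]])   # made-up positive definite matrix
f = np.exp                                # increasing, non-negative, convex

def F(s):
    # Area of the ellipse {x^T Q x <= f(s)^2}, via the standard formula.
    return np.pi * f(s) ** 2 / np.sqrt(np.linalg.det(Q))

# Midpoint convexity at a few made-up pairs (a, b):
for a, b in [(-1.0, 2.0), (0.0, 5.0), (-3.0, -1.0)]:
    assert F((a + b) / 2) <= (F(a) + F(b)) / 2
```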
(7) Let $R>0$ and assume the following problem has a solution:
$$\begin{array}{cl} \min & f(x, y, z) \\ \text{subject to} & x^{2}+y^{2}+R^{2} \leq z^{2}, \\ & z>R. \end{array}$$
Note that the Kuhn-Tucker conditions $\left(^{*}\right)$ are:
$$\begin{gathered} \nabla f(x, y, z)+\mu\left[\begin{array}{c} 2 x \\ 2 y \\ -2 z \end{array}\right]=\left[\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right], \\ \mu\left(x^{2}+y^{2}-z^{2}+R^{2}\right)=0, \quad \mu \geq 0 . \end{gathered}$$

(a) Write the Kuhn-Tucker conditions for the following problem using Lagrange multipliers $\mu_{1}$ and $\mu_{2}$ :
$$\begin{array}{cl} \min & f\left(r \sqrt{z^{2}-R^{2}} \cos \theta,\ r \sqrt{z^{2}-R^{2}} \sin \theta,\ z\right) \\ \text{subject to} & r-1 \leq 0, \\ & -r \leq 0, \\ & R-z<0, \\ & \theta \in \mathbb{R}. \end{array}$$
(b) Show that the conditions you found in part (a) for a point $(r, \theta, z)$ imply the Kuhn-Tucker conditions $\left(^{*}\right)$ for the corresponding point $(x, y, z)=\left(r \sqrt{z^{2}-R^{2}} \cos \theta, r \sqrt{z^{2}-R^{2}} \sin \theta, z\right)$.
Start by expressing $\mu$ in terms of $\mu_{1}$ and $\mu_{2}$. There should be three cases to consider: when $r=0,0<r<1$, and $r=1$.
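Optional: a helpful algebraic identity for part (b) is that, under the substitution $x = r\sqrt{z^2 - R^2}\cos\theta$, $y = r\sqrt{z^2 - R^2}\sin\theta$, the constraint function factors as $x^2 + y^2 - z^2 + R^2 = (r^2 - 1)(z^2 - R^2)$. The sketch below verifies this numerically at made-up parameter values:

```python
# Numerical check of x^2 + y^2 - z^2 + R^2 = (r^2 - 1)(z^2 - R^2)
# under the polar-style substitution from part (a).
import numpy as np

R = 1.5
for r, theta, z in [(0.0, 0.3, 2.0), (0.5, 1.1, 3.0), (1.0, -0.7, 1.6)]:
    s = np.sqrt(z ** 2 - R ** 2)
    x, y = r * s * np.cos(theta), r * s * np.sin(theta)
    lhs = x ** 2 + y ** 2 - z ** 2 + R ** 2
    rhs = (r ** 2 - 1.0) * (z ** 2 - R ** 2)
    assert np.isclose(lhs, rhs)
```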

# Linear and Non-Linear Programming Winter 2020

• Lectures: Mon 6-9pm in LM 162
• Office hours begin the week of Jan 15.
• The suggested textbook for this course is Linear and Non-Linear Programming (4th Edition) by David Luenberger & Yinyu Ye. Publisher: Springer. Should be available at the UT bookstore. The textbook is suggested and not required: I will try to make the lectures and HW self-contained. Before classes begin, I suggest reviewing some multivariable calculus, especially the part about partial derivatives, gradients, Taylor expansion, and the Jacobian matrix. Refreshing basic linear algebra would also be helpful.
• Course Syllabus (approximate)
• Linear and Non-Linear Programming: parts of chapters 7,8,9,11.
• The Calculus of Variations.
• Supplementary material from lectures.
• The Course Grade will be calculated as follows:
• There will be 5 or 6 homework assignments. Assignments are to be handed in on assigned dates at the beginning of class.
• Please staple your homeworks. Unstapled homeworks will be penalized 2 points.
• NO late homework will be accepted.
• Important: There will be no make up term test! If you have a valid reason for missing the term test, the corresponding portion of the final exam will count as your term test grade.
• Exam dates:
• Term Test: Mon, Feb 26.
• Final Exam: Mon, Apr 30, 9am-12.