The Hessian matrix gives an excellent sufficient condition for a bivariate function to be convex. Before discussing it, we give a general result about symmetric #(2\times2)#-matrices. Recall that a #(2\times2)#-matrix is symmetric if the two off-diagonal entries are equal.
A point #u# of the plane will be viewed as a row vector, with #u^{\top}# as the corresponding column vector.
The following three statements regarding a symmetric #2\times2#-matrix #H=\matrix{h_{11}&h_{12}\\ h_{12} &h_{22}}# are equivalent:
- For every row vector #u# of length two, #{u} H\,{u}^{\top}\ge0#.
- #h_{11}\ge0#, #h_{22}\ge0#, and #\det(H) =h_{11}\cdot h_{22}-h_{12}^2\ge0#.
- For all numbers #x#, #y#, we have #h_{11}x^2+2h_{12}x\cdot y + h_{22} y^2\ge0#.
A symmetric matrix with these properties is called positive semidefinite.
The inequality in the first statement is strict for all nonzero vectors #{u}# if and only if all other inequalities in statements 2 and 3 are strict. In this case, the matrix #H# is called positive definite.
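As an illustration (not part of the theorem), the equivalence can be checked numerically: the criterion of statement 2 agrees with the eigenvalue characterization of statement 1 for symmetric matrices. The function names below are our own choices.

```python
# Sketch: compare the criterion of statement 2 with an eigenvalue test.
# For a symmetric matrix, u H u^T >= 0 for all u is equivalent to all
# eigenvalues of H being nonnegative.
import numpy as np

def is_positive_semidefinite(h11, h12, h22):
    """Statement 2: h11 >= 0, h22 >= 0 and det(H) = h11*h22 - h12**2 >= 0."""
    return h11 >= 0 and h22 >= 0 and h11 * h22 - h12**2 >= 0

def is_positive_semidefinite_eig(h11, h12, h22):
    """Statement 1, via eigenvalues of the symmetric matrix H."""
    H = np.array([[h11, h12], [h12, h22]])
    return bool(np.all(np.linalg.eigvalsh(H) >= -1e-12))

# The two tests agree, as the theorem asserts:
for (a, b, c) in [(2, 1, 3), (0, 1, 0), (1, 2, 1), (1, 0, 0)]:
    assert is_positive_semidefinite(a, b, c) == is_positive_semidefinite_eig(a, b, c)
```

Note that #(0,1,0)# is rejected by both tests: this is exactly the case #h_{11}=h_{22}=0#, #h_{12}\ne0# treated in the proof below.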
We first prove the equivalence of statements 1 and 2.
Both statements hold if #h_{11}=h_{12} =h_{22} = 0#. Therefore, we assume that this is not the case. Suppose #h_{11}=h_{22} = 0#, so #h_{12}\ne0#. Then statement 1 is not true, as the vector #u=\rv{h_{12}, -1}# is a counterexample, and neither is statement 2, since #\det(H) = h_{11}\cdot h_{22}-h_{12}^2 = -h_{12}^2\lt0#.
Therefore, we may assume that at least one of #h_{11}#, #h_{22}# is nonzero. Since the statements remain the same if we interchange the roles of the two coordinates #x# and #y#, we may and will assume #h_{11} \ne0#.
We will use the following identity for all #x# and #y#:
\[\matrix {x&y} H\,\cv{x\\ y} = \frac{(h_{11}x+h_{12}y)^2+(h_{11}h_{22}-h_{12}^2)\cdot y^2}{h_{11}}\]
It can be verified by rewriting both sides to #h_{11}x^2+2h_{12}x\cdot y + h_{22} y^2#.
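This verification can also be carried out symbolically; the following sketch (an illustration, not part of the proof) checks that both sides expand to #h_{11}x^2+2h_{12}x\cdot y + h_{22} y^2#.

```python
# Symbolic check of the identity used in the proof, for h11 != 0.
import sympy as sp

x, y, h11, h12, h22 = sp.symbols('x y h11 h12 h22')

# Left-hand side: the 1x1 matrix product (x y) H (x y)^T.
lhs = sp.Matrix([[x, y]]) * sp.Matrix([[h11, h12], [h12, h22]]) * sp.Matrix([x, y])
# Right-hand side of the identity.
rhs = ((h11*x + h12*y)**2 + (h11*h22 - h12**2)*y**2) / h11

# Both sides expand to h11*x**2 + 2*h12*x*y + h22*y**2:
difference = sp.simplify(lhs[0] - rhs)
assert difference == 0
```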
2 implies 1: First suppose the second statement holds. Then #h_{11}\gt0# and #h_{11}h_{22}-h_{12}^2\ge0#, so, for arbitrary numbers #x# and #y#:
\[ \matrix{x&y} H\,\cv{x\\ y} =\frac{(h_{11}x+h_{12}y)^2+(h_{11}h_{22}-h_{12}^2)\cdot y^2}{h_{11}}\ge0\]
which proves the first statement.
1 implies 2: Conversely, suppose the first statement holds. Then, substituting #\rv{x,y} = \rv{1,0}#, we find
\[\begin{array}{rcl}0&\le&\matrix{1& 0} H\,\cv{1\\ 0} \\ &=& \dfrac{ (h_{11})^2+(h_{11}h_{22}-h_{12}^2)\cdot 0^2}{h_{11}}\\ &=& h_{11} \end{array}\]
so #h_{11}\gt0# (recall #h_{11}\ne0#). Substituting #\rv{x,y} = \rv{-h_{12},h_{11}}# we find
\[\begin{array}{rcl}0&\le&\matrix{-h_{12} & h_{11}} H\,\cv{-h_{12}\\ h_{11}} \\ &=& \dfrac{ (-h_{11}h_{12}+h_{12}h_{11})^2+(h_{11}h_{22}-h_{12}^2)\cdot h_{11}^2}{h_{11}}\\ &=& \dfrac{h_{11}h_{22}-h_{12}^2}{h_{11}} \end{array}\]
which establishes #h_{11}h_{22}-h_{12}^2\ge0# since #h_{11}\gt0#.
This proves two out of the three inequalities that need to be proven. The remaining inequality, #h_{22}\ge0#, follows from the two other inequalities: #h_{22} \ge h_{22}-\frac{h_{12}^2}{h_{11}} = \frac{h_{11}h_{22}-h_{12}^2}{h_{11}}\ge0#.
Having shown that each statement implies the other, we conclude that statements 1 and 2 are equivalent. Because the quadratic function of statement 3 was already seen to be the same as #{u}H\,{u}^{\top} # with #{u} = \matrix{x & y}#, the equivalence of statements 3 and 1 is immediate. This ends the proof of the equivalence of the three statements.
By going over the above proof and replacing the inequalities #\le# and #\ge# by the strict inequalities #\lt# and #\gt#, respectively, we find a proof of the concluding statement of the theorem.
In order to establish a convexity criterion for #f# in terms of second partial derivatives, we use the Hessian matrix \[ H_f = \matrix{ f_{xx} &f_{xy}\\ f_{yx}&f_{yy}}\] In fact, #H_f# is a bivariate function. For a point #v=\rv{v_1,v_2}# of #\mathbb{R}^2#, we will write \[\left.H_f\right|_{v} = \matrix{ f_{xx}(v) &f_{xy}(v)\\ f_{yx}(v)&f_{yy}(v)}\] for its value at #v#.
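The Hessian matrix and its value at a point can be computed symbolically. Below is a sketch for the sample function #f(x,y)=x^3+xy+y^2# (our own choice, not from the text):

```python
# Computing the Hessian matrix H_f of a sample bivariate function
# f(x, y) = x**3 + x*y + y**2, and evaluating it at the point v = (1, 2).
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + x*y + y**2

# H_f as a matrix of second partial derivatives (f_xy = f_yx here,
# since the second partials are continuous).
H_f = sp.Matrix([[sp.diff(f, x, x), sp.diff(f, x, y)],
                 [sp.diff(f, y, x), sp.diff(f, y, y)]])

# The value of H_f at v = (1, 2):
H_at_v = H_f.subs({x: 1, y: 2})   # Matrix([[6, 1], [1, 2]])
```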
Since the boundary of a domain may contain global minima of a function that are not stationary points, we restrict ourselves to domains consisting of the interior of the domain together with points on the boundary of that interior. The interior of a domain consists of all points that are the center of a disk fully contained in the domain. Domains in which each point is the center of a disk lying in the domain are called open. But we also allow domains to contain points on the boundary of the interior: these are points #p# outside the interior with the property that every disk with center #p# contains points of the interior of the domain. A typical example of an open domain is the positive quadrant, consisting of all points #\rv{x,y}# such that #x\gt0# and #y\gt0#. If we add all points on the boundary of this domain, we obtain the domain consisting of all #\rv{x,y}# such that #x\ge0# and #y\ge0#.
Suppose that #f# is a function on a convex domain #S# all of whose first and second partial derivatives exist and are continuous. Then #f# is convex on #S# if and only if the Hessian matrix \(\left.H_f\right|_{v}\) of #f# at each point #v# of #S# is positive semidefinite.
Assume that the Hessian matrix of #f# is positive semidefinite at each point of #S#. Let #u# and #v# be points of #S#. We need to prove that, for \(0\le t\le 1\), we have \[f(t \cdot u+(1-t)\cdot v) \le t\cdot f(u)+ (1-t)\cdot f(v)\]
Write #g(t) =f(t \cdot u+(1-t)\cdot v)#. In terms of the univariate function #g# we need to show for each #t# with #0\le t\le 1#:
\[g(t)\le t\cdot g(1)+(1-t)\cdot g(0)\]
The chain rule for partial differentiation gives
\[\begin{array}{rcl} g'(t) &=& ({u}-{v})\boldsymbol{\cdot} \cv{f_x(t{u}+(1-t){v})\\ f_y(t{u}+(1-t){v})}\\ g''(t) &=& ({u}-{v}) \left.{H_ f}\right|_{t{u}+(1-t){v}} ({u}-{v})^{\top}\\ \end{array}\]
Since the Hessian matrix is positive semidefinite at the point #t{u}+(1-t){v}#, we have #g''(t)\ge0# for each #t\in\ivcc{0}{1}#. We now use the Taylor estimate, which states that, for all #s,t\in\ivcc{0}{1}#, there is a number #z# between #s# and #t# such that
\[g(s) = g(t)+g'(t)\cdot (s-t) +\frac{1}{2}g''(z)\cdot (s-t)^2\]
Substituting #s=0# and #s=1#, respectively, and using #g''(z)\ge0#, this leads to
\[\begin{array}{rcl} g(0)&\ge& g(t)+g'(t)(-t)\\ g(1)&\ge& g(t)+g'(t)(1-t)\end{array}\]
Combining these two inequalities we find
\[\begin{array}{rcl}g(t)&=& (1-t)g(t)+t g(t)\\ &\le& (1-t)g(0)-(1-t)(-t) g'(t)+t g(1)-t(1-t)g'(t)\\ &=& (1-t)g(0)+t g(1)\end{array}\]
which gives the required inequality for showing that #f# is convex on #S#.
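The chain rule computation of #g''(t)# above can be verified symbolically for a sample function and sample points (all chosen freely for illustration):

```python
# Check that for g(t) = f(t*u + (1-t)*v), the second derivative equals
# (u - v) H_f|_{t*u + (1-t)*v} (u - v)^T, for a sample f, u and v.
import sympy as sp

x, y, t = sp.symbols('x y t')
f = x**4 + x**2*y + y**3          # a sample f, chosen freely
u = sp.Matrix([[2, 1]])           # sample points u and v as row vectors
v = sp.Matrix([[0, 3]])

p = t*u + (1 - t)*v               # the point t*u + (1-t)*v
g = f.subs({x: p[0], y: p[1]})    # g(t) = f(t*u + (1-t)*v)

# The Hessian of f, evaluated at the point p.
H = sp.Matrix([[sp.diff(f, x, x), sp.diff(f, x, y)],
               [sp.diff(f, y, x), sp.diff(f, y, y)]])
H_at_p = H.subs({x: p[0], y: p[1]})

lhs = sp.diff(g, t, 2)
rhs = ((u - v) * H_at_p * (u - v).T)[0]
assert sp.simplify(lhs - rhs) == 0
```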
Conversely, assume that #f# is convex on #S# and let #u# be a point of #S# at which the Hessian matrix #\left.H_f\right|_u# is not positive semidefinite. Then there is a point #v# of #S# such that #({u}-{v})\left.H_f\right|_u({u}-{v})^{\top}\lt0#. As before, the left hand side equals #g''(1)#, where #g(t) =f(t \cdot u+(1-t)\cdot v)#, so we have #g''(1) \lt0#. This implies that #g# is strictly concave (that is, #-g# is strictly convex) in a small neighbourhood of #1# in #\ivcc{0}{1}#, so the restriction of #f# to the line segment between #u# and #v# is not convex, contradicting that #f# is convex on #S#. This shows that the Hessian matrix is positive semidefinite at each point of #S#.
The univariate analogue of this result states that a function #f(x)# of the single variable #x# on an interval, whose first and second derivatives exist and are continuous, is convex if and only if its second derivative is nonnegative on the whole interval.
The requirement that the Hessian matrix be positive semidefinite on the domain of #f# can be verified by means of the theorem Positive semidefinite 2 by 2 matrices above. Combining these results with the theorem From stationary points to global extrema, we find
Let #f(x,y)# be a twice differentiable bivariate function with continuous second order derivatives defined on an open convex domain #S# in #\mathbb{R}^2#.
- If for all #\rv{x,y}# in #S#, \[f_{xx}(x,y)\leq 0, f_{yy}(x,y)\leq 0, \text{ and } f_{xx}(x,y)\cdot f_{yy}(x,y)-(f_{xy}(x,y))^2\geq 0\] then every stationary point of #f# is a global maximum.
- If for all #\rv{x,y}# in #S#, \[f_{xx}(x,y)\geq 0, f_{yy}(x,y)\geq 0, \text{ and } f_{xx}(x,y)\cdot f_{yy}(x,y)-(f_{xy}(x,y))^2\geq 0\] then every stationary point of #f# is a global minimum.
It follows from the above results that #f# is concave in the first case and convex in the second. Therefore, the conditions of the theorem From stationary points to global extrema are satisfied and we conclude that every stationary point of #f# is a global extremum.
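To see the second bullet in action, consider the sample function #f(x,y)=x^2+xy+y^2-x-2y# on #\mathbb{R}^2# (our own example, not from the text). Here #f_{xx}=2\ge0#, #f_{yy}=2\ge0#, and #f_{xx}\cdot f_{yy}-f_{xy}^2=4-1=3\ge0# everywhere, so the unique stationary point is a global minimum:

```python
# Applying the corollary to f(x, y) = x**2 + x*y + y**2 - x - 2*y.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + x*y + y**2 - x - 2*y

# The conditions of the corollary; the second partials are constant here,
# so checking them once checks them on all of R^2.
fxx, fxy, fyy = sp.diff(f, x, x), sp.diff(f, x, y), sp.diff(f, y, y)
assert fxx >= 0 and fyy >= 0 and fxx*fyy - fxy**2 >= 0

# The stationary point: solve f_x = 0 and f_y = 0.
stationary = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
# stationary == {x: 0, y: 1}; the global minimum value is f(0, 1) = -1.
min_value = f.subs(stationary)
```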