
Partial Derivatives

Functions of Several Variables

Multivariable calculus is the extension of calculus in one variable to calculus in more than one variable.

Learning Objectives

Identify areas of application of multivariable calculus

Key Takeaways

Key Points

  • Multivariable calculus can be applied to analyze deterministic systems that have multiple degrees of freedom.
  • Unlike a single-variable function f(x), for which limits and continuity need only be checked as x varies along a line (the x-axis), a multivariable function can be approached along infinitely many paths to a single point.
  • In multivariable calculus, the gradient, Stokes', divergence, and Green's theorems are specific incarnations of a more general theorem: the generalized Stokes' theorem.

Key Terms

  • deterministic: having exactly predictable time evolution
  • divergence: a vector operator that measures the magnitude of a vector field's source or sink at a given point, in terms of a signed scalar
Multivariable calculus (also known as multivariate calculus) is the extension of calculus in one variable to calculus in more than one variable: the differentiated and integrated functions involve multiple variables, rather than just one. Multivariable calculus can be applied to analyze deterministic systems that have multiple degrees of freedom. Functions with independent variables corresponding to each of the degrees of freedom are often used to model these systems, and multivariable calculus provides tools for characterizing the system dynamics.
image

A Scalar Field: A scalar field shown as a function of (x, y). Extensions of concepts used for single-variable functions may require caution.

Multivariable calculus is used in many fields of natural and social science and engineering to model and study high-dimensional systems that exhibit deterministic behavior. Non-deterministic, or stochastic, systems can be studied with a different kind of mathematics, such as stochastic calculus. Quantitative analysts in finance also often use multivariable calculus to predict future trends in the stock market.

As we will see, multivariable functions may yield counter-intuitive results when applied to limits and continuity. Unlike a single-variable function f(x), for which limits and continuity need only be checked as x varies along a line (the x-axis), a multivariable function can be approached along infinitely many paths to a single point. Likewise, the path taken to evaluate a derivative or integral should always be specified when multivariable functions are involved. We have also studied theorems linking derivatives and integrals of single-variable functions: the gradient theorem, Stokes' theorem, the divergence theorem, and Green's theorem. In a more advanced study of multivariable calculus, it is seen that these four theorems are specific incarnations of a more general theorem, the generalized Stokes' theorem, which applies to the integration of differential forms over manifolds.

Limits and Continuity

A study of limits and continuity in multivariable calculus yields counter-intuitive results not demonstrated by single-variable functions.

Learning Objectives

Describe the relationship between multivariate continuity and continuity in each argument

Key Takeaways

Key Points

  • The function f(x,y) = \frac{x^2y}{x^4+y^2} has different limit values at the origin, depending on the path taken for the evaluation.
  • Continuity in each argument does not imply multivariate continuity.
  • When taking different paths toward the same point yields different values for the limit, the limit does not exist.

Key Terms

  • continuity: lack of interruption or disconnection; the quality of being continuous in space or time
  • limit: a value to which a sequence or function converges
  • scalar function: any function whose domain is a vector space and whose value is its scalar field
A study of limits and continuity in multivariable calculus yields many counter-intuitive results not demonstrated by single-variable functions. For example, there are scalar functions of two variables with points in their domain which give a particular limit when approached along any arbitrary line, yet give a different limit when approached along a parabola. The function f(x,y) = \frac{x^2y}{x^4+y^2} approaches zero along any line through the origin. However, when the origin is approached along the parabola y = x^2, the function has a limit of 0.5. Since taking different paths toward the same point yields different values for the limit, the limit does not exist.
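A quick numeric sketch (in Python; the function name is ours, not from the text) makes the path dependence concrete:

```python
# Numerically probe the limit of f(x, y) = x^2*y / (x^4 + y^2) at the origin
# along the straight line y = 2x and along the parabola y = x^2.

def f(x, y):
    return x**2 * y / (x**4 + y**2)

# Along a line y = m*x, the values shrink toward 0 as x -> 0.
along_line = [f(x, 2 * x) for x in (0.1, 0.01, 0.001)]

# Along the parabola y = x^2, the values equal 1/2 (up to rounding) for every x.
along_parabola = [f(x, x**2) for x in (0.1, 0.01, 0.001)]

print(along_line)      # values shrinking toward 0
print(along_parabola)  # each value is 0.5
```

Since the two paths disagree, no single limit value exists at the origin.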
image

Continuity: Continuity in single-variable function as shown is rather obvious. However, continuity in multivariable functions yields many counter-intuitive results.

Continuity in each argument does not imply multivariate continuity. For instance, in the case of a real-valued function with two real-valued parameters, f(x,y), continuity of f in x for fixed y and continuity of f in y for fixed x do not imply continuity of f. As an example, consider

f(x,y) = \begin{cases} \frac{y}{x} - y & \text{if } 1 \geq x > y \geq 0 \\ \frac{x}{y} - x & \text{if } 1 \geq y > x \geq 0 \\ 1 - x & \text{if } x = y > 0 \\ 0 & \text{otherwise}. \end{cases}

It is easy to check that all real-valued functions (with one real-valued argument) given by f_y(x) = f(x,y) are continuous in x (for any fixed y). Similarly, all f_x are continuous, as f is symmetric with respect to x and y. However, f itself is not continuous, as can be seen by considering the sequence f\left(\frac{1}{n}, \frac{1}{n}\right) (for natural n), which should converge to f(0,0) = 0 if f were continuous. However, \lim_{n \to \infty} f\left(\frac{1}{n}, \frac{1}{n}\right) = 1.
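The counterexample above can be checked directly. A short Python transcription of the piecewise definition (our naming) shows the diagonal sequence converging to 1 rather than to f(0, 0) = 0:

```python
# The piecewise function from the text: continuous in each argument separately,
# yet discontinuous at (0, 0) as a function of two variables.

def f(x, y):
    if 1 >= x > y >= 0:
        return y / x - y
    if 1 >= y > x >= 0:
        return x / y - x
    if x == y > 0:
        return 1 - x
    return 0

assert f(0, 0) == 0
# Along the diagonal, f(1/n, 1/n) = 1 - 1/n, which tends to 1, not 0.
diagonal = [f(1 / n, 1 / n) for n in (10, 100, 1000)]
print(diagonal)  # [0.9, 0.99, 0.999]
```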

Partial Derivatives

A partial derivative of a function of several variables is its derivative with respect to a single variable, with the others held constant.

Learning Objectives

Identify proper ways to express the partial derivative

Key Takeaways

Key Points

  • The partial derivative of a function f with respect to the variable x is variously denoted by f^\prime_x, f_{,x}, \partial_x f, or \frac{\partial f}{\partial x}.
  • To every point on the surface describing a multivariable function, there is an infinite number of tangent lines. Partial differentiation is the act of choosing one of these lines and finding its slope.
  • Like an ordinary derivative, the partial derivative is defined as a limit: \frac{\partial}{\partial a_i} f(\mathbf{a}) = \lim_{h \rightarrow 0} \frac{f(a_1, \dots, a_{i-1}, a_i + h, a_{i+1}, \dots, a_n) - f(a_1, \dots, a_i, \dots, a_n)}{h}.

Key Terms

  • differential geometry: the study of geometry using differential calculus
  • Euclidean: adhering to the principles of traditional geometry, in which parallel lines are equidistant
A partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Partial derivatives are used in vector calculus and differential geometry. The partial derivative of a function f with respect to the variable x is variously denoted by f^\prime_x, f_{,x}, \partial_x f, or \frac{\partial f}{\partial x}. Suppose that f is a function of more than one variable, for instance z = f(x, y) = x^2 + xy + y^2. The graph of this function defines a surface in Euclidean space. To every point on this surface, there is an infinite number of tangent lines. Partial differentiation is the act of choosing one of these lines and finding its slope. Usually, the lines of most interest are those which are parallel to the xz-plane and those which are parallel to the yz-plane (which result from holding either y or x constant, respectively).
image

Graph of z = x^2 + xy + y^2: For the partial derivative at (1, 1, 3) that leaves y constant, the corresponding tangent line is parallel to the xz-plane.

To find the slope of the line tangent to the function at P(1, 1, 3) that is parallel to the xz-plane, the variable y is treated as constant. Differentiating the equation while holding y constant gives the slope of f at the point (x, y, z): \frac{\partial z}{\partial x} = 2x + y. So at (1, 1, 3), by substitution, the slope is 3. That is to say, the partial derivative of z with respect to x at (1, 1, 3) is 3.
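A finite-difference check (a Python sketch of ours, not part of the original) confirms the slope of 3 at (1, 1):

```python
# Finite-difference check of dz/dx for z = x^2 + x*y + y^2 at (1, 1),
# holding y fixed: the analytic value 2x + y gives 3 at that point.

def z(x, y):
    return x**2 + x * y + y**2

def dz_dx(x, y, h=1e-6):
    # central difference in x only; y is held constant
    return (z(x + h, y) - z(x - h, y)) / (2 * h)

print(dz_dx(1.0, 1.0))  # ~3.0
```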
image

Graph of z = x^2 + xy + y^2 at y = 1: A slice of the graph at y = 1.

Formal Definition

Like ordinary derivatives, the partial derivative is defined as a limit. Let U be an open subset of \mathbb{R}^n and f: U \rightarrow \mathbb{R} a function. The partial derivative of f at the point \mathbf{a} = (a_1, \cdots, a_n) \in U with respect to the i-th variable is defined as: \frac{\partial}{\partial a_i} f(\mathbf{a}) = \lim_{h \rightarrow 0} \frac{f(a_1, \cdots, a_{i-1}, a_i + h, a_{i+1}, \cdots, a_n) - f(a_1, \cdots, a_i, \cdots, a_n)}{h}
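The limit definition can be read off numerically: shrink h and watch the difference quotient settle. The helper `partial` and the sample function below are ours, chosen for illustration (for f(x_1, x_2) = x_1 x_2^2 at (2, 3), the analytic partial with respect to x_1 is x_2^2 = 9):

```python
# Difference quotient from the limit definition of the partial derivative.

def partial(f, a, i, h):
    ahead = list(a)
    ahead[i] += h
    return (f(*ahead) - f(*a)) / h

f = lambda x1, x2: x1 * x2**2
quotients = [partial(f, (2.0, 3.0), 0, h) for h in (1e-2, 1e-4, 1e-6)]
print(quotients)  # settles at 9.0 as h -> 0
```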

Tangent Planes and Linear Approximations

The tangent plane to a surface at a given point is the plane that "just touches" the surface at that point.

Learning Objectives

Explain why the tangent plane can be used to approximate the surface near the point

Key Takeaways

Key Points

  • For a surface given by a differentiable multivariable function z = f(x,y), the equation of the tangent plane at (x_0, y_0, z_0) is given as f_x(x_0,y_0)(x - x_0) + f_y(x_0,y_0)(y - y_0) - (z - z_0) = 0.
  • Since a tangent plane is the best approximation of the surface near the point where the two meet, the tangent plane can be used to approximate the surface near the point.
  • The plane describing the linear approximation for a surface described by z = f(x,y) is given as z = z_0 + f_x(x_0,y_0)(x - x_0) + f_y(x_0,y_0)(y - y_0).

Key Terms

  • differentiable: having a derivative, said of a function whose domain and co-domain are manifolds
  • differential geometry: the study of geometry using differential calculus
  • slope: also called gradient; slope or gradient of a line describes its steepness
The tangent line (or simply the tangent) to a plane curve at a given point is the straight line that "just touches" the curve at that point. Similarly, the tangent plane to a surface at a given point is the plane that "just touches" the surface at that point. The concept of a tangent is one of the most fundamental notions in differential geometry and has been extensively generalized.
image

Tangent Plane to a Sphere: The tangent plane to a surface at a given point is the plane that "just touches" the surface at that point.

Equations

When the curve is given by y = f(x), the slope of the tangent is \frac{dy}{dx}, so by the point-slope formula the equation of the tangent line at (x_0, y_0) is: \frac{dy}{dx}(x_0, y_0) \cdot (x - x_0) - (y - y_0) = 0 where (x, y) are the coordinates of any point on the tangent line, and where the derivative is evaluated at x = x_0. The tangent plane to a surface at a given point p is defined in an analogous way to the tangent line in the case of curves. It is the best approximation of the surface by a plane at p, and can be obtained as the limiting position of the planes passing through three distinct points on the surface close to p as these points converge to p. For a surface given by a differentiable multivariable function z = f(x,y), the equation of the tangent plane at (x_0, y_0, z_0) is given as: f_x(x_0,y_0)(x - x_0) + f_y(x_0,y_0)(y - y_0) - (z - z_0) = 0 where (x_0, y_0, z_0) is a point on the surface. Note the similarity of the equations for the tangent line and the tangent plane.

Linear Approximation

Since a tangent plane is the best approximation of the surface near the point where the two meet, the tangent plane can be used to approximate the surface near that point. The approximation works well as long as the point (x, y, z) under consideration is close enough to (x_0, y_0, z_0), where the tangent plane touches the surface. The plane describing the linear approximation for a surface described by z = f(x,y) is given as: z = z_0 + f_x(x_0,y_0)(x - x_0) + f_y(x_0,y_0)(y - y_0).
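A numeric sketch (ours, reusing the surface z = x^2 + xy + y^2 from the previous atom) shows the approximation is tight near the base point and degrades away from it:

```python
# Tangent-plane (linear) approximation of f(x, y) = x^2 + x*y + y^2 near
# (x0, y0) = (1, 1), where z0 = 3, f_x = 2x + y = 3 and f_y = x + 2y = 3.

def f(x, y):
    return x**2 + x * y + y**2

x0, y0 = 1.0, 1.0
z0, fx, fy = f(x0, y0), 2 * x0 + y0, x0 + 2 * y0

def tangent_plane(x, y):
    return z0 + fx * (x - x0) + fy * (y - y0)

print(f(1.1, 1.05), tangent_plane(1.1, 1.05))  # close near (1, 1)
print(f(2.0, 2.0), tangent_plane(2.0, 2.0))    # the error grows farther away
```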

The Chain Rule

For a function U with two variables x and y, the chain rule is given as \frac{dU}{dt} = \frac{\partial U}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial U}{\partial y} \cdot \frac{dy}{dt}.

Learning Objectives

Express a chain rule for a function with two variables

Key Takeaways

Key Points

  • The chain rule can be easily generalized to functions with more than two variables.
  • For single-variable functions, the chain rule is a formula for computing the derivative of the composition of two or more functions. For example, the chain rule for f \circ g(x) \equiv f[g(x)] is \frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}.
  • The chain rule can be used when we want to calculate the rate of change of the function U(x,y) as a function of time t, where x = x(t) and y = y(t).

Key Terms

  • potential energy: the energy possessed by an object because of its position (in a gravitational or electric field), or its condition (as a stretched or compressed spring, as a chemical reactant, or by having rest mass)
The chain rule is a formula for computing the derivative of the composition of two or more functions. That is, if f is a function and g is a function, then the chain rule expresses the derivative of the composite function f \circ g(x) \equiv f[g(x)] in terms of the derivatives of f and g. For example, the chain rule for f \circ g is \frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}. The chain rule above is for single-variable functions f(x) and g(x). However, the chain rule can be generalized to functions with multiple variables. For example, consider a function U with two variables x and y: U = U(x,y). U could be the electric potential energy at a location (x,y). The motion of a test charge on the xy-plane can be described by x = x(t), y = y(t), where t is a parameter representing time. What we want to calculate is the rate of change of the potential energy U as a function of time t. Assuming x = x(t), y = y(t), and U = U(x,y) are all differentiable at t and (x,y), the chain rule is given as: \frac{dU}{dt} = \frac{\partial U}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial U}{\partial y} \cdot \frac{dy}{dt} This relation can be easily generalized for functions with more than two variables.
image

Scalar Field: The chain rule can be used to take derivatives of multivariable functions with respect to a parameter.

Example

For z = (x^2 + xy + y^2)^{1/2} where x = x(t) and y = y(t), express \frac{dz}{dt} in terms of \frac{dx}{dt} and \frac{dy}{dt}:

\frac{dz}{dt} = \frac{d}{dt}(x^2 + xy + y^2)^{1/2}
= \frac{1}{2}(x^2 + xy + y^2)^{-1/2} \frac{d}{dt}(x^2 + xy + y^2)
= \frac{1}{2}(x^2 + xy + y^2)^{-1/2} \left( \frac{d}{dt}(x^2) + \frac{d}{dt}(xy) + \frac{d}{dt}(y^2) \right)
= \frac{\left( x + \frac{1}{2}y \right) \frac{dx}{dt} + \left( y + \frac{1}{2}x \right) \frac{dy}{dt}}{\sqrt{x^2 + xy + y^2}}
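The result can be cross-checked numerically. The particular paths x(t) = cos t, y(t) = sin t below are our own illustrative choice, not from the text:

```python
# Verify the multivariable chain rule for z = (x^2 + x*y + y^2)^(1/2)
# along the sample paths x(t) = cos(t), y(t) = sin(t), at t = 0.7.

import math

def z(x, y):
    return math.sqrt(x**2 + x * y + y**2)

t, h = 0.7, 1e-6
x, y = math.cos(t), math.sin(t)
dxdt, dydt = -math.sin(t), math.cos(t)

# chain-rule value from the worked example above
chain = ((x + y / 2) * dxdt + (y + x / 2) * dydt) / z(x, y)

# direct value: differentiate t -> z(x(t), y(t)) by a central difference
direct = (z(math.cos(t + h), math.sin(t + h))
          - z(math.cos(t - h), math.sin(t - h))) / (2 * h)

print(chain, direct)  # nearly identical values
```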

Directional Derivatives and the Gradient Vector

The directional derivative represents the instantaneous rate of change of the function, moving through \mathbf{x} with a velocity specified by \mathbf{v}.

Learning Objectives

Describe properties of a function represented by the directional derivative

Key Takeaways

Key Points

  • The directional derivative is defined by the limit \nabla_{\mathbf{v}} f(\mathbf{x}) = \lim_{h \rightarrow 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}.
  • If the function f is differentiable at \mathbf{x}, then the directional derivative exists along any vector \mathbf{v}, and one has \nabla_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}.
  • Many of the familiar properties of the ordinary derivative hold for the directional derivative.

Key Terms

  • chain rule: a formula for computing the derivative of the composition of two or more functions.
  • gradient: of a function y = f(x) or the graph of such a function, the rate of change of y with respect to x; that is, the amount by which y changes for a certain (often unit) change in x.
The directional derivative of a multivariate differentiable function along a given vector \mathbf{v} at a given point \mathbf{x} intuitively represents the instantaneous rate of change of the function, moving through \mathbf{x} with a velocity specified by \mathbf{v}. It therefore generalizes the notion of a partial derivative, in which the rate of change is taken along one of the coordinate curves, all other coordinates being held constant.

Definition

The directional derivative of a scalar function f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) along a vector \mathbf{v} = (v_1, \ldots, v_n) is the function defined by the limit: \nabla_{\mathbf{v}} f(\mathbf{x}) = \lim_{h \rightarrow 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h} If the function f is differentiable at \mathbf{x}, then the directional derivative exists along any vector \mathbf{v}, and one has \nabla_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}, where \nabla f(\mathbf{x}) is the gradient vector and \cdot is the dot product. At any point \mathbf{x}, the directional derivative of f intuitively represents the rate of change of f with respect to time when moving at a speed and direction given by \mathbf{v} at the point \mathbf{x}. The name "directional derivative" is a bit misleading, since it depends on both the length and direction of \mathbf{v}. We can imagine the directional derivative \nabla_{\mathbf{v}} f(\mathbf{x}) as the slope of the tangent line to the 2-dimensional slice of the graph of f that lies parallel to the vector \mathbf{v}. However, this slice will be stretched or compressed horizontally unless \mathbf{v} is a unit vector (\|\mathbf{v}\| = 1).
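Both forms, the limit definition and the dot product \nabla f \cdot \mathbf{v}, can be compared on a small example of our own choosing:

```python
# Compare the limit definition of the directional derivative with the
# dot-product formula grad(f) . v, for the sample function
# f(x, y) = x^2 + 3*x*y at x = (1, 2), along v = (0.6, 0.8).

def f(x, y):
    return x**2 + 3 * x * y

x0, y0 = 1.0, 2.0
v = (0.6, 0.8)

# gradient of f: (2x + 3y, 3x) = (8, 3) at (1, 2)
grad = (2 * x0 + 3 * y0, 3 * x0)
dot_formula = grad[0] * v[0] + grad[1] * v[1]   # 8*0.6 + 3*0.8 = 7.2

h = 1e-6
limit_def = (f(x0 + h * v[0], y0 + h * v[1]) - f(x0, y0)) / h

print(dot_formula, limit_def)  # both ~7.2
```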
image

Gradient of a Function: The gradient of the function f(x,y) = -\left((\cos x)^2 + (\cos y)^2\right) depicted as a projected vector field on the bottom plane. The directional derivative represents the rate of change of the function along any direction specified by \mathbf{v}.

Properties

Many of the familiar properties of the ordinary derivative hold for the directional derivative.

The Sum Rule

\nabla_\mathbf{v} (f + g) = \nabla_\mathbf{v} f + \nabla_\mathbf{v} g

The Constant Factor Rule

For any constant c, \nabla_\mathbf{v} (cf) = c\,\nabla_\mathbf{v} f.

The Product Rule (or Leibniz Rule)

\nabla_\mathbf{v} (fg) = g\,\nabla_\mathbf{v} f + f\,\nabla_\mathbf{v} g

The Chain Rule

If g is differentiable at p and h is differentiable at g(p), then \nabla_\mathbf{v} (h \circ g)(p) = h'(g(p))\, \nabla_\mathbf{v} g(p).
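The sum and product rules can be spot-checked numerically; the functions, point, and direction below are our own illustrative choices:

```python
# Spot-check the sum and product rules for directional derivatives, using a
# forward-difference approximation of the limit definition.

def ddir(func, p, v, h=1e-6):
    # approximate directional derivative of func at point p along vector v
    return (func(p[0] + h * v[0], p[1] + h * v[1]) - func(p[0], p[1])) / h

f = lambda x, y: x * y
g = lambda x, y: x + y**2
p, v = (0.5, 1.5), (1.0, 2.0)

sum_lhs = ddir(lambda x, y: f(x, y) + g(x, y), p, v)
sum_rhs = ddir(f, p, v) + ddir(g, p, v)

prod_lhs = ddir(lambda x, y: f(x, y) * g(x, y), p, v)
prod_rhs = g(*p) * ddir(f, p, v) + f(*p) * ddir(g, p, v)

print(abs(sum_lhs - sum_rhs), abs(prod_lhs - prod_rhs))  # both ~0
```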

Maximum and Minimum Values

The second partial derivative test is a method used to determine whether a critical point is a local minimum, maximum, or saddle point.

Learning Objectives

Apply the second partial derivative test to determine whether a critical point is a local minimum, maximum, or saddle point

Key Takeaways

Key Points

  • For a function of two variables, the second partial derivative test is based on the signs of M(x,y) = f_{xx}(x,y) f_{yy}(x,y) - \left( f_{xy}(x,y) \right)^2 and f_{xx}(a,b), where (a,b) is a critical point.
  • There are substantial differences between the functions of one variable and the functions of more than one variable in the identification of global extrema.
  • The maximum and minimum of a function, known collectively as extrema, are the largest and smallest values that the function takes at a point either within a given neighborhood (local or relative extremum) or on the function domain in its entirety (global or absolute extremum).

Key Terms

  • critical point: a maximum, minimum, or point of inflection on a curve; a point at which the derivative of a function is zero or undefined
  • intermediate value theorem: a statement that claims that, for each value between the least upper bound and greatest lower bound of the image of a continuous function, there is a corresponding point in its domain that the function maps to that value
  • Rolle's theorem: a theorem stating that a differentiable function which attains equal values at two distinct points must have a point somewhere between them where the first derivative (the slope of the tangent line to the graph of the function) is zero
The maximum and minimum of a function, known collectively as extrema, are the largest and smallest values that the function takes at a point either within a given neighborhood (local or relative extremum) or on the function domain in its entirety (global or absolute extremum).

Finding Maxima and Minima of Multivariable Functions

The second partial derivative test is a method in multivariable calculus used to determine whether a critical point (a, b, \cdots) of a function f(x, y, \cdots) is a local minimum, maximum, or saddle point.
image

Saddle Point: A saddle point on the graph of z = x^2 - y^2 (in red).

For a function of two variables, suppose that M(x,y) = f_{xx}(x,y) f_{yy}(x,y) - \left( f_{xy}(x,y) \right)^2.
  1. If M(a,b) > 0 and f_{xx}(a,b) > 0, then (a,b) is a local minimum of f.
  2. If M(a,b) > 0 and f_{xx}(a,b) < 0, then (a,b) is a local maximum of f.
  3. If M(a,b) < 0, then (a,b) is a saddle point of f.
  4. If M(a,b) = 0, then the second derivative test is inconclusive.
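The test can be applied numerically to the saddle surface pictured above. This sketch (ours) approximates the second partials by central differences:

```python
# Apply the second partial derivative test to z = x^2 - y^2 at its critical
# point (0, 0), approximating the second partials by central differences.

def f(x, y):
    return x**2 - y**2

def fxx(x, y, h=1e-4):
    return (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2

def fyy(x, y, h=1e-4):
    return (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2

def fxy(x, y, h=1e-4):
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)

M = fxx(0, 0) * fyy(0, 0) - fxy(0, 0)**2
print(M)  # ~-4, and M < 0 means (0, 0) is a saddle point
```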
There are substantial differences between functions of one variable and functions of more than one variable in the identification of global extrema. For example, if a bounded differentiable function f defined on a closed interval in the real line has a single critical point, which is a local minimum, then it is also a global minimum (use the intermediate value theorem and Rolle's theorem). In two and more dimensions, this argument fails, as the function f(x,y) = x^2 + y^2(1-x)^3, \; x, y \in \mathbb{R} shows. Its only critical point is at (0,0), which is a local minimum with f(0,0) = 0. However, it cannot be a global one, because f(4,1) = -11.
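A two-line check confirms the counterexample's values:

```python
# The single critical point of f(x, y) = x^2 + y^2*(1 - x)^3 is a local but
# not a global minimum: f(0, 0) = 0, yet f(4, 1) = 16 + (1 - 4)^3 = -11 < 0.

def f(x, y):
    return x**2 + y**2 * (1 - x)**3

assert f(0, 0) == 0
print(f(4, 1))  # -11
```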

Lagrange Multipliers

The method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints.

Learning Objectives

Describe application of the method of Lagrange multipliers

Key Takeaways

Key Points

  • To maximize f(x,y) subject to g(x,y) = c, we introduce a new variable \lambda, called a Lagrange multiplier, and study the Lagrange function (or Lagrangian) defined by \Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \left( g(x,y) - c \right).
  • Only when the contour line for g = c meets the contour lines of f tangentially do we neither increase nor decrease the value of f; that is, when the contour lines touch but do not cross. This will often be the situation where a solution to the constrained maximum problem exists.
  • Solving \nabla_{x,y,\lambda} \Lambda(x,y,\lambda) = 0 yields a necessary condition for extrema under the given constraint.

Key Terms

  • gradient: of a function y = f(x) or the graph of such a function, the rate of change of y with respect to x; that is, the amount by which y changes for a certain (often unit) change in x
  • contour: a line on a map or chart delineating those points which have the same altitude or other plotted quantity: a contour line or isopleth
In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) is a strategy for finding the local maxima and minima of a function subject to equality constraints. For instance, consider the following optimization problem: maximize f(x,y) subject to g(x,y) = c. We need both f and g to have continuous first partial derivatives. We introduce a new variable \lambda, called a Lagrange multiplier, and study the Lagrange function (or Lagrangian) defined by: \Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \left( g(x,y) - c \right) where the \lambda term may be either added or subtracted. If f(x_0, y_0) is a maximum of f(x,y) for the original constrained problem, then there exists \lambda_0 such that (x_0, y_0, \lambda_0) is a stationary point for the Lagrange function (stationary points are those points where the partial derivatives of \Lambda are zero, i.e., \nabla \Lambda = 0). However, not all stationary points yield a solution of the original problem. Thus, the method of Lagrange multipliers yields a necessary condition for optimality in constrained problems. Sufficient conditions for a minimum or maximum also exist.
image

Maximizing f(x,y): Find x and y to maximize f(x,y) subject to a constraint (shown in red) g(x,y)=c.

Introduction

One of the most common problems in calculus is that of finding maxima or minima (in general, "extrema") of a function, but it is often difficult to find a closed form for the function being extremized. Such difficulties often arise when one wishes to maximize or minimize a function subject to fixed outside conditions or constraints. The method of Lagrange multipliers is a powerful tool for solving this class of problems without the need to explicitly solve the constraints and use them to eliminate extra variables. Consider the two-dimensional problem introduced above: maximize f(x,y) subject to g(x,y) = c. We can visualize contours of f given by f(x,y) = d for various values of d, and the contour of g given by g(x,y) = c. Suppose we walk along the contour line with g = c. In general, the contour lines of f and g may be distinct, so following the contour line for g = c, one could intersect with or cross the contour lines of f. This is equivalent to saying that while moving along the contour line for g = c, the value of f can vary. Only when the contour line for g = c meets the contour lines of f tangentially do we neither increase nor decrease the value of f; that is, when the contour lines touch but do not cross. The contour lines of f and g touch when the tangent vectors of the contour lines are parallel. Since the gradient of a function is perpendicular to its contour lines, this is the same as saying that the gradients of f and g are parallel. Thus, we want points (x,y) where g(x,y) = c and \nabla_{x,y} f = -\lambda \nabla_{x,y} g, where \nabla_{x,y} f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) and \nabla_{x,y} g = \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y} \right) are the respective gradients.
The constant \lambda is required because, although the two gradient vectors are parallel, their magnitudes are generally not equal. When \lambda \neq 0, the condition asserts that the two gradients are parallel. To incorporate these conditions into one equation, we introduce an auxiliary function, \Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \left( g(x,y) - c \right), and solve \nabla_{x,y,\lambda} \Lambda(x,y,\lambda) = 0. This is the method of Lagrange multipliers. Note that \nabla_{\lambda} \Lambda(x,y,\lambda) = 0 implies g(x,y) = c. Where the Lagrange multiplier \lambda = 0, we can have a local extremum even though the two contours cross instead of meeting tangentially. Consider the following example: minimize f(x,y) = \sin(x), given that g(x,y) = x^2 + y^2 = 9. Every point \left( -\frac{\pi}{2}, y \right) is a global minimum of f with value -1. Therefore, where the constraint g = c crosses the contour line f = -1, we have a local minimum of f on the constraint. The constraint and the contour f = -1 cross at the minimum, as we can see in the figure. It is easy to verify that f_x = 0 and f_y = 0 when x = -\frac{\pi}{2}. Since both g_x \neq 0 and g_y \neq 0 there, the Lagrange multiplier \lambda = 0 at the minimum.
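The numbers in this example are easy to verify directly:

```python
# Check the example: minimize f(x, y) = sin(x) on the circle x^2 + y^2 = 9.
# At the constrained minimum x = -pi/2 the gradient of f vanishes, so the
# stationarity condition grad(f) + lambda*grad(g) = 0 holds with lambda = 0.

import math

x = -math.pi / 2
y = math.sqrt(9 - x**2)       # a point on the constraint with x = -pi/2

grad_f = (math.cos(x), 0.0)   # gradient of sin(x) with respect to (x, y)
grad_g = (2 * x, 2 * y)       # gradient of x^2 + y^2, nonzero here

print(grad_f)       # (~0, 0): lambda = 0 makes the stationarity condition hold
print(math.sin(x))  # -1.0, the minimum value of f on the constraint
```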
image

Example where the contour and constraint cross at an extremum.

Optimization in Several Variables

To solve an optimization problem, formulate the function f(x, y, \cdots) to be optimized and find all critical points first.

Learning Objectives

Solve a simple problem that requires optimization of several variables

Key Takeaways

Key Points

  • Mathematical optimization is the selection of a best element (with regard to some criteria) from some set of available alternatives.
  • An optimization process that involves only a single variable is rather straightforward. After finding the function f(x) to be optimized, local maxima or minima at critical points can easily be found. End points may have maximum/minimum values as well.
  • For a rectangular cuboid shape, given the fixed volume, a cube is the geometric shape that minimizes the surface area.

Key Terms

  • optimization: the design and operation of a system or process to make it as good as possible in some defined sense
  • cuboid: a parallelepiped having six rectangular faces
Mathematical optimization is the selection of a best element (with regard to some criteria) from some set of available alternatives. An optimization process that involves only a single variable is rather straightforward. After finding the function f(x) to be optimized, local maxima or minima at critical points can easily be found. (Of course, end points may have maximum/minimum values as well.) The same strategy applies for optimization with several variables. In this atom, we will solve a simple example to see how optimization involving several variables can be achieved.

Cardboard Box with a Fixed Volume

A packaging company needs cardboard boxes in a rectangular cuboid shape with a given volume of 1000 cubic centimeters and would like to minimize the material cost for the boxes. What should the dimensions x, y, z of a box be? First of all, the material cost is proportional to the surface area S of the cuboid. Therefore, the goal of the optimization is to minimize the function S(x,y,z) = 2(xy + yz + zx). The constraint in this case is that the volume is fixed: V = xyz = 1000.
image

Rectangular Cuboid: Mathematical optimization can be used to solve problems that involve finding the right size of a volume such as a cuboid.

We will first eliminate z from S(x, y, z) using the constraint z = \frac{1000}{xy}. Substituting this expression for z yields a function of two variables:

\displaystyle{S(x, y) = 2\left(xy + \frac{1000}{x} + \frac{1000}{y}\right)}

To find the critical points, set both partial derivatives equal to zero:

\displaystyle{\frac{\partial S}{\partial x} = 2\left(y - \frac{1000}{x^2}\right) = 0 \quad\therefore y = \frac{1000}{x^2}}

and

\displaystyle{\frac{\partial S}{\partial y} = 2\left(x - \frac{1000}{y^2}\right) = 0 \quad\therefore x = \frac{1000}{y^2}}

Substituting y = \frac{1000}{x^2} into the second equation gives x = \frac{x^4}{1000}, and therefore:

x^3 = 1000

so we find that:

x = y = z = 10

That is to say, the box that minimizes the cost of materials while maintaining the desired volume is a 10-by-10-by-10 cube.
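As a sanity check, the same minimum can be found numerically. The sketch below (plain Python, not from the original text) does a brute-force grid search over S(x, y) = 2(xy + 1000/x + 1000/y) with an arbitrarily chosen step size of 0.1:

```python
# Surface area of the box with z eliminated via the constraint z = 1000/(x*y)
def surface_area(x, y):
    return 2 * (x * y + 1000 / x + 1000 / y)

# Grid search over 0.1 <= x, y <= 30.0 in steps of 0.1 (step size is an
# arbitrary choice for this sketch, not part of the original problem)
best = min(
    ((surface_area(x / 10, y / 10), x / 10, y / 10)
     for x in range(1, 301) for y in range(1, 301)),
    key=lambda t: t[0],
)
print(best)  # (600.0, 10.0, 10.0)
```

The search agrees with the calculus: the minimum surface area is 600 square centimeters, attained at x = y = 10 (and hence z = 1000/(10 · 10) = 10).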

Applications of Minima and Maxima in Functions of Two Variables

Finding extrema can be a challenge with regard to multivariable functions, requiring careful calculation.

Learning Objectives

Identify steps necessary to find the minimum and maximum in multivariable functions

Key Takeaways

Key Points

  • The second derivative test is a criterion for determining whether a given critical point of a real function of one variable is a local maximum or a local minimum using the value of the second derivative at the point.
  • To find minima/maxima for functions of two variables, we must first find the first partial derivatives of the function with respect to x and y.
  • The function z = f(x, y) = (x+y)(xy + xy^2) has saddle points at (0, -1) and (1, -1) and a local maximum at \left(\frac{3}{8}, -\frac{3}{4}\right).

Key Terms

  • multivariable: concerning more than one variable
  • critical point: a maximum, minimum, or point of inflection on a curve; a point at which the derivative of a function is zero or undefined
We have seen how to find minima and maxima of single-variable functions; as previously mentioned, finding extrema can be a challenge with regard to multivariable functions. In particular, we learned about the second derivative test, a criterion for determining whether a given critical point is a local maximum or a local minimum using the values of the second derivatives at that point. In this atom, we will find and classify the extrema of a function of two variables.

Example

Find and label the critical points of the following function: z = f(x, y) = (x+y)(xy + xy^2)
 
Plot of z = (x+y)(xy + xy^2): The maxima and minima of this plot cannot be found without extensive calculation.
To solve this problem we must first find the first partial derivatives of the function with respect to x and y:

\displaystyle{\frac{\partial z}{\partial x} = y(2x + y)(y+1)}

\displaystyle{\frac{\partial z}{\partial y} = x\left(3y^2 + 2y(x+1) + x\right)}

Setting \frac{\partial z}{\partial x} = 0, we see that y must equal 0, -1, or -2x. Plugging the first solution, y = 0, into the second equation gives:

\displaystyle{\frac{\partial z}{\partial y} = x\left(3y^2 + 2y(x+1) + x\right) = x^2}

so x must equal 0. For y = -1 we have:

\displaystyle{\frac{\partial z}{\partial y} = x\left(3 - 2(x+1) + x\right) = x(1-x) = 0}

so x must equal 0 or 1. Finally, for y = -2x:

\displaystyle{\frac{\partial z}{\partial y} = x\left(3(-2x)^2 + 2(-2x)(x+1) + x\right) = x^2(8x - 3) = 0}

so x must equal 0 or \frac{3}{8}, giving y = 0 and y = -\frac{3}{4}, respectively. Listing all the critical points:

\displaystyle{(x,y) \in \left\{(0,0),\ (0,-1),\ (1,-1),\ \left(\frac{3}{8}, -\frac{3}{4}\right)\right\}}

Now we classify the critical points using the second derivative test, which evaluates the discriminant D = f_{xx}f_{yy} - (f_{xy})^2 at each point: a critical point is a saddle point if D < 0 and a local extremum if D > 0. Plugging in all the critical values we found:
  • D(0, 0) = 0
  • D(0, -1) = -1
  • D(1, -1) = -1
  • D\left(\frac{3}{8}, -\frac{3}{4}\right) = 0.210938
We can now label some of the points:
  • at (0, -1), f(x, y) has a saddle point
  • at (1, -1), f(x, y) has a saddle point
  • at \left(\frac{3}{8}, -\frac{3}{4}\right), f(x, y) has a local maximum, since f_{xx} = -\frac{3}{8} < 0
At the remaining point, (0, 0), we have D = 0, so the test is inconclusive and higher-order tests are needed to find out what exactly the function is doing there.
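The discriminant values above can be checked with a short Python sketch using exact rational arithmetic. The closed-form second partials are derived by hand from the first partials in the text (they are an assumption of this sketch, not given in the original):

```python
from fractions import Fraction as F

# Second partial derivatives of z = (x + y)(xy + xy^2), obtained by
# differentiating the first partials given in the text:
#   f_xx = 2y(1 + y)
#   f_yy = x(6y + 2x + 2)
#   f_xy = (2y + 1)(2x + y) + y(y + 1)
def discriminant(x, y):
    f_xx = 2 * y * (1 + y)
    f_yy = x * (6 * y + 2 * x + 2)
    f_xy = (2 * y + 1) * (2 * x + y) + y * (y + 1)
    return f_xx * f_yy - f_xy ** 2

critical_points = [(F(0), F(0)), (F(0), F(-1)), (F(1), F(-1)), (F(3, 8), F(-3, 4))]
for x, y in critical_points:
    print((x, y), discriminant(x, y))
```

Running this reproduces the list above: D = 0, -1, -1, and 27/128 ≈ 0.210938, confirming the two saddle points, the local maximum, and the inconclusive point at the origin.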

Licenses & Attributions