Derivative rules
(upd. )

The slope of a “secant” line, i.e. the slope between two given points on a function, is

$$
\begin{aligned}
\frac{\Delta y}{\Delta x} &= \frac{y_2 - y_1}{x_2 - x_1} \\
\frac{\Delta y}{\Delta x} &= \frac{f(x + h) - f(x)}{h}
\end{aligned}
$$
[Figure: secant]

If we make the difference between the x-values of the points really, really small, shrinking $h$ (the distance between $x_2$ and $x_1$) towards an infinitesimal (almost $0$), we can find the function’s derivative. This tells us the instant slope at any given point of the function.

$$\frac{d}{dx}f(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
[Figure: the secant at an instant point is the “tangent” line]

This “difference formula” is the first half of calculus.
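To see the limit at work numerically, here is a minimal Python sketch. The function $f(x) = x^2$, the point $x = 3$, and the list of step sizes are illustrative choices, and `difference_quotient` is a hypothetical helper name. For this $f$ the secant slope works out to exactly $6 + h$, so it should creep towards the tangent slope 6 as $h$ shrinks.

```python
def difference_quotient(f, x, h):
    """Hypothetical helper: slope of the secant through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def f(x):
    return x ** 2  # illustrative function; its exact derivative is 2x


# Shrink h towards 0: the secant slope at x = 3 approaches the tangent slope 2 * 3 = 6.
for h in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    print(h, difference_quotient(f, 3.0, h))
```

In floating point, $h$ can't be shrunk forever. Once $h$ is tiny, rounding error in $f(x+h) - f(x)$ dominates, so the printed slopes stop improving.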

constant rule

Let’s skip the formula and go right to the picture.

If $f(x)$ is equal to a constant $C$, such as 3, it’s obvious that the formula for instant slope at every point of $f(x)$ would just be $f'(x) = 0$.

[Figure: instant slope of a constant]

Hence,

$$\frac{d}{dx} C = 0$$

coefficient rule

We love pictures here.

For a linear function $y=mx+b$, it’s obvious that the formula for instant slope at every point would just be $m$. By definition, that’s the slope!

[Figure: the instant slope of a linear function stays constant]

Hence,

$$\frac{d}{dx} mx = m$$

sum and difference rule

When we take the derivative of a sum of two functions, it’s the same as adding their derivatives.

$$\frac{d}{dx} \bigl(f(x) + g(x)\bigr) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$$

This is just splitting the difference formula’s fraction, like $\frac{a+b}{x} = \frac{a}{x} + \frac{b}{x}$, and then using the fact that the limit of a sum is the sum of the limits (when both limits exist):

$$
\begin{aligned}
\frac{d}{dx} f(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
\frac{d}{dx} \bigl(f(x) + g(x)\bigr) &= \lim_{h \to 0} \frac{f(x+h) + g(x+h) - f(x) - g(x)}{h} \\
\frac{d}{dx} \bigl(f(x) + g(x)\bigr) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \\
\frac{d}{dx} \bigl(f(x) + g(x)\bigr) &= \frac{d}{dx} f(x) + \frac{d}{dx} g(x)
\end{aligned}
$$

And of course, all this jazz applies to subtraction too: subtraction is just adding a negative.

$$\frac{d}{dx} \bigl(f(x) + (-g(x))\bigr) = \frac{d}{dx} f(x) - \frac{d}{dx} g(x)$$
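Here is a quick numerical check of the sum/difference rule in Python, as a sketch. It swaps the limit for a central difference $\frac{f(x+h) - f(x-h)}{2h}$ with a small fixed $h$. That swap is an assumption made for accuracy, not the one-sided formula above. The choices $f = \sin$, $g(x) = x^3$, and $x = 0.7$ are arbitrary.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x) for a small fixed h."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return math.sin(x)  # illustrative choice


def g(x):
    return x ** 3  # illustrative choice


x = 0.7
derivative_of_difference = derivative(lambda t: f(t) - g(t), x)
difference_of_derivatives = derivative(f, x) - derivative(g, x)
exact = math.cos(x) - 3 * x ** 2  # cos x - 3x^2, from the rules

print(derivative_of_difference, difference_of_derivatives, exact)
```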

product rule

This is where things start to get less trivial.

Before algebra, let’s visualize what $y=f(x)g(x)$ would look like [3b1b].

[Figure: a rectangle with sides f(x) and g(x), whose area is y]

Let’s make a rectangle whose area is

$$y=f(x)g(x)$$

Now… let’s extend each side to find the next instant point.

  • Imagine that the difference between $f(x)$ and $f(x+h)$ is $df$
  • Imagine that the difference between $g(x)$ and $g(x+h)$ is $dg$

[Figure: the rectangle's sides grow to f(x+h) and g(x+h)]

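    The two thin strips that we added, with areas $f(x)\,dg$ and $g(x)\,df$, plus the $\text{tiny}\times\text{tiny}$ nub $df\,dg$ in the bottom-right corner, expand our area by a Tiny Bit: $dy$. Once we divide by $dx$, the nub’s share $\frac{df\,dg}{dx}$ shrinks to 0 along with $dx$, so we drop it.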

    $$
    \begin{aligned}
    dy &= f(x)\,dg + g(x)\,df + df\,dg \\
    \frac{dy}{dx} &= f(x)\frac{dg}{dx} + g(x)\frac{df}{dx}
    \end{aligned}
    $$

    Cool, right? Of course, we could’ve just done algebra from the start.

    $$
    \begin{aligned}
    \frac{d}{dx} f(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
    \frac{d}{dx} f(x)g(x) &= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} \\
    \frac{d}{dx} f(x)g(x) &= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x+h)g(x) + f(x+h)g(x) - f(x)g(x)}{h} \quad \text{add and subtract } f(x+h)g(x) \\
    \frac{d}{dx} f(x)g(x) &= \lim_{h \to 0} \frac{f(x+h)\bigl(g(x+h) - g(x)\bigr) + g(x)\bigl(f(x+h) - f(x)\bigr)}{h} \\
    \frac{d}{dx} f(x)g(x) &= f(x)g'(x) + g(x)f'(x) \quad \text{since } f(x+h) \to f(x) \\
    \frac{d}{dx} f(x)g(x) &= f'g + fg'
    \end{aligned}
    $$
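The rectangle picture can be checked with finite nudges, as a minimal Python sketch. The choices $f = e^x$, $g = \cos x$, and $x = 0.5$ are arbitrary. For any finite $h$, the growth $dy$ splits exactly into the two strips plus the corner nub. After dividing by $h$, the strips approach $f'g + fg'$ while the nub’s share shrinks towards 0.

```python
import math


def f(x):
    return math.exp(x)  # illustrative choice


def g(x):
    return math.cos(x)  # illustrative choice


x = 0.5
exact = math.exp(x) * math.cos(x) - math.exp(x) * math.sin(x)  # f'g + fg'

for h in [0.1, 0.01, 0.001]:
    df = f(x + h) - f(x)
    dg = g(x + h) - g(x)
    dy = f(x + h) * g(x + h) - f(x) * g(x)
    # Exact bookkeeping: dy = f dg + g df + df dg (up to floating-point rounding).
    assert math.isclose(dy, f(x) * dg + g(x) * df + df * dg, rel_tol=1e-9, abs_tol=1e-12)
    strips = (f(x) * dg + g(x) * df) / h  # tends to f g' + g f'
    nub = (df * dg) / h                   # tends to 0
    print(h, strips, nub, exact)
```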

    power rule

    The power rule is just a special case of the product rule*. Don’t believe me? Let’s knock your socks off…

    *if we look at integer powers only
    [Figure: power rule as n-dimensional cubes of length x]

    When we apply product rule logic to an $x^2$ square, growing each side by $dx$, the $(dx)^2$ corner becomes a negligible $\text{tiny} \times \text{tiny}$.

    When we generalize that to an $x^3$ cube, we get three $x^2 \times dx$ growths from the faces; the edge growths $x(dx)^2$ and the corner growth $(dx)^3$ become negligible due to $\text{tiny} \times \text{tiny}$.

    $$
    \begin{aligned}
    y &= x^2 \\
    y + dy &= (x+dx)^2 \\
    dy &= x(dx) + (dx)x + (dx)^2 \quad \text{Foolish square.} \\
    \frac{dy}{dx} &= 2x \\[1em]
    y &= x^3 \\
    y + dy &= (x+dx)^3 \\
    dy &= xx(dx) + x(dx)x + (dx)xx + 3x(dx)^2 + (dx)^3 \\
    \frac{dy}{dx} &= 3x^2 \quad \text{GET CUBED.}
    \end{aligned}
    $$

    So our pattern is, for any multi-dimensional square or cube or hypercube:

    • Extrude every side by $dx$
      • e.g. Cube: $xxx$ becomes $(x+dx)(x+dx)(x+dx)$
    • Multiply this $dx$ by every other side (lengths of $x$) to find the added area/volume/hypervolume of that extrusion
      • e.g. Faces: $(dx)xx + x(dx)x + xx(dx) = 3x^2\,dx$
    • The growths on the extrusions’ outskirts, like the corner growth in a square, or the edges and corners of a cube, have multiple $dx$’s and are thus negligible
      • e.g. $(dx)(dx)x + (dx)x(dx) + x(dx)(dx) + (dx)(dx)(dx)$

    $$\frac{d}{dx} x^a = ax^{a-1}$$

    The algebra isn’t as satisfying, because it requires the binomial coefficient. Why? Let’s look at what’s going on when you power up, first.

    $$
    \begin{aligned}
    (x+h)^5 &= (x+h)(x+h)(x+h)(x+h)(x+h) \\
    (x+h)^5 &= xxxxx + xxxxh + xxxhx + xxhxx + xhxxx + hxxxx + \dots \\
    &\quad + xxxhh + xxhxh + xxhhx + xhxhx + xhhxx + \dots \\
    &\quad + hhhhx + hhhxh + hhxhh + hxhhh + xhhhh + hhhhh
    \end{aligned}
    $$

    So, for every count of $x$’s (e.g. 3 of the 5 factors are $x$ and the other 2 are $h$), the expansion contains every possible arrangement of that many $x$’s among the five positions.

    To count the ways of picking 3 of the 5 positions, where the order we pick them in doesn’t matter, we use the binomial coefficient $\binom{5}{3}$… which I’ll cover in a later article ¯\_(⌐■_■)_/¯

    $$
    \begin{aligned}
    f'(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
    \frac{d}{dx} x^a &= \lim_{h \to 0} \frac{(x+h)^a - x^a}{h} \\
    \frac{d}{dx} x^a &= \lim_{h \to 0} \frac{\binom{a}{0}x^0 h^a + \binom{a}{1}x^1 h^{a-1} + \dots + \binom{a}{a}x^a h^0 - x^a}{h} \\
    &\quad \text{Divide out the } h. \\
    \frac{d}{dx} x^a &= \lim_{h \to 0} \left[\binom{a}{0}x^0 h^{a-1} + \dots + \binom{a}{a-1}x^{a-1} h^{0} + \binom{a}{a}x^a h^{-1} - \frac{x^a}{h}\right] \\
    &\quad \text{Every term that still has a positive power of } h \text{ goes to } 0. \\
    \frac{d}{dx} x^a &= \lim_{h \to 0} \left[\binom{a}{a-1}x^{a-1} h^{0} + \binom{a}{a}x^a h^{-1} - \frac{x^a}{h}\right] \\
    \frac{d}{dx} x^a &= \lim_{h \to 0} \left[a \cdot x^{a-1} h^{0} + 1 \cdot x^a h^{-1} - \frac{x^a}{h}\right] \\
    \frac{d}{dx} x^a &= \lim_{h \to 0} \left[a \cdot x^{a-1} + \left(x^a h^{-1} - x^a h^{-1}\right)\right] \\
    \frac{d}{dx} x^a &= ax^{a-1}
    \end{aligned}
    $$
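The binomial bookkeeping can be sketched in Python with `math.comb` (standard library, Python 3.8+). The helper names and the sample values $x = 2$, $a = 5$ are illustrative assumptions. The sketch cancels the $\binom{a}{a}x^a h^0$ term exactly and divides the rest by $h$. It then shows that only the $\binom{a}{a-1}x^{a-1} = a x^{a-1}$ term survives as $h \to 0$.

```python
from math import comb


def expand_binomial(x, h, a):
    """Hypothetical helper: (x + h)**a for an integer a >= 0, summed as C(a, k) x^k h^(a-k)."""
    return sum(comb(a, k) * x ** k * h ** (a - k) for k in range(a + 1))


def power_quotient(x, h, a):
    """Hypothetical helper: ((x + h)**a - x**a) / h with the x**a term cancelled exactly.

    The k = a term, C(a, a) x^a h^0 = x^a, cancels against -x^a. Every other
    term keeps at least one factor of h, so dividing by h just lowers its power.
    """
    return sum(comb(a, k) * x ** k * h ** (a - k - 1) for k in range(a))


x, a = 2.0, 5
print(expand_binomial(x, 0.1, a), (x + 0.1) ** a)  # same value, up to rounding
for h in [0.1, 0.001, 1e-6]:
    print(h, power_quotient(x, h, a))  # approaches a * x**(a - 1) = 80
print(comb(a, a - 1))  # 5: the surviving coefficient is just a
```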

    But what about non-integers?

    The chain rule (discussed later) generalizes the power rule to the rationals [Math SE]:

    $$
    \begin{aligned}
    \text{Let } p &= \text{an integer}, \quad q = \text{a positive integer}, \quad x > 0 \\
    y &= x^{p/q} \\
    y^q &= x^p \quad \text{only integer powers on both sides} \\
    \frac{d}{dx} y^q &= \frac{d}{dx} x^p \\
    q y^{q-1} \frac{dy}{dx} &= p x^{p-1} \quad \text{chain rule on the left} \\
    \frac{dy}{dx} &= \frac{p}{q} \cdot \frac{x^{p-1}}{y^{q-1}} = \frac{p}{q} \cdot \frac{x^{p-1}}{x^{p - p/q}} \\
    \frac{d}{dx} x^{p/q} &= \frac{p}{q} x^{p/q-1}
    \end{aligned}
    $$

    and the reals:

    $$
    \begin{aligned}
    \text{Let } r &= \text{a real number}, \quad x > 0 \\
    \frac{d}{dx} x^r &= \frac{d}{dx} e^{r\ln x} \\
    &= e^{r \ln x} \left(r \frac{d}{dx} \ln x\right) \\
    &= e^{r \ln x} \times r \times \frac{1}{x} \\
    &= x^r \times \frac{r}{x} \\
    &= rx^{r-1}
    \end{aligned}
    $$
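Here is a quick Python check of $\frac{d}{dx}x^r = rx^{r-1}$ for a few non-integer exponents, as a sketch. It uses a central-difference estimate (a hypothetical `derivative` helper) and an arbitrary sample point $x = 1.7$. Keeping $x > 0$ matters, because $x^r$ for negative $x$ and non-integer $r$ is not a real number.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.7  # x > 0 keeps x**r real for every real r
for r in [2 / 3, -1 / 2, math.sqrt(2), math.pi]:
    numeric = derivative(lambda t: t ** r, x)  # the lambda is called right away, so r is current
    print(r, numeric, r * x ** (r - 1))
```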

    quotient rule

    Just reverse that product rule diagram from earlier. In $\frac{d}{dx}\frac{f(x)}{g(x)}$, the area is $f(x)$, and the sides are $g(x)$ and the output $y = \frac{f(x)}{g(x)}$.

    [Figure: a rectangle with sides g(x) and f(x)/g(x), whose area is f(x); the area changes from f(x) to f(x+h)]

    Remember our definitions:

    • $df = f(x+h) - f(x)$
    • $dg = g(x+h) - g(x)$
    • $dy =$ the “resultant nudge” we hope to find (with respect to the nudge $dx$)

    $$
    \begin{aligned}
    g(x)\,dy + \frac{f(x)}{g(x)}\,dg + \text{tiny}^2 &= df \\
    g(x)\,dy &= df - \frac{f(x)}{g(x)}\,dg \\
    dy &= \frac{df - \frac{f(x)}{g(x)}\,dg}{g(x)} \\
    dy &= \frac{(df)(g(x)) - (f(x))(dg)}{g(x)^2} \\
    \frac{dy}{dx} &= \frac{\frac{df}{dx}g(x) - f(x)\frac{dg}{dx}}{g(x)^2} \\
    \frac{d}{dx} \frac{f(x)}{g(x)} &= \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \\
    \frac{d}{dx} \frac{f(x)}{g(x)} &= \frac{f'g - fg'}{g^2}
    \end{aligned}
    $$
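As a sketch, the quotient rule can be compared against a direct numerical derivative in Python. The numerator $x^2 + 1$, the denominator $e^x$ (chosen because it is never zero), and $x = 0.8$ are illustrative. For this pair, the rule simplifies by hand to $-(x-1)^2 e^{-x}$, which gives a third value to compare.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return x ** 2 + 1  # illustrative numerator


def g(x):
    return math.exp(x)  # illustrative denominator; never zero


x = 0.8
numeric = derivative(lambda t: f(t) / g(t), x)
quotient_rule = (derivative(f, x) * g(x) - f(x) * derivative(g, x)) / g(x) ** 2  # (f'g - fg') / g^2
by_hand = -((x - 1) ** 2) * math.exp(-x)

print(numeric, quotient_rule, by_hand)
```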

    chain rule

    Let’s say we want to find $\frac{dy}{dx}$ of the function composition $f(g(x))$. To do this, we’ll have to remember that a derivative is just

    $$\frac{y-y_1}{x-x_1} = \frac{\text{resultant nudge in output } dy}{\text{nudge in input } dx}$$

    at very tiny intervals of $dx$.

    [Figure: the chain rule as two number lines; the output change in the first number line is the input change of the other]

    A nudge in $x$ will result in a nudge in $y_1=g(x)$. We’ll call this resultant nudge $dy_1$.

    $$\frac{dy_1}{dx_1}=\frac{\Delta y_1}{\Delta x_1}$$

    Since $g(x)$ is the “input” of $f$, a nudge in $y_1$ will result in a nudge in $y_2=f(g(x))$. We’ll call this resultant nudge $dy_2$.

    $$\frac{dy_2}{dy_1}=\frac{\Delta y_2}{\Delta y_1}$$

    Because these are ratios of ordinary (tiny but nonzero) nudges, it is 100% sound to cancel $\Delta y_1$ and relate $y_2$ to $x_1$, as long as $\Delta y_1 \neq 0$:

    $$\frac{\Delta y_2}{\Delta y_1} \times \frac{\Delta y_1}{\Delta x_1} = \frac{\Delta y_2}{\Delta x_1}$$

    That equality is often written as $\frac{dy}{du} \times \frac{du}{dx} = \frac{dy}{dx}$, which means the same thing [3b1b].

    $$\frac{d}{dx} f(g(x)) = f'(g(x)) \times g'(x)$$
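Here is a small Python sketch of the chain rule. The inner function $g(x) = x^2$, the outer function $f = \sin$, and the point $x = 1.3$ are all chosen arbitrarily. The sketch estimates each link of the chain numerically and multiplies them. It then compares that product with a direct estimate of $\frac{d}{dx}f(g(x))$ and with the exact $\cos(x^2)\cdot 2x$.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def g(x):
    return x ** 2  # inner function: x -> y1


def f(y1):
    return math.sin(y1)  # outer function: y1 -> y2


x = 1.3
outer_link = derivative(f, g(x))  # dy2/dy1, evaluated at y1 = g(x)
inner_link = derivative(g, x)     # dy1/dx
direct = derivative(lambda t: f(g(t)), x)
exact = math.cos(x ** 2) * 2 * x

print(outer_link * inner_link, direct, exact)
```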
    A note on implicit differentiation

    Implicit differentiation, an application of the chain rule, is when we differentiate some variable (like $y$ or $z$) with respect to another (like $x$). Here’s an example:

    $$
    \begin{aligned}
    \frac{d}{dx} (x^2 + y^2) &= \frac{d}{dx} 1 \\
    2x + 2y \frac{dy}{dx} &= 0
    \end{aligned}
    $$

    Why does this happen? Well, if we differentiate $y$ with respect to $x$, we imply* that $y$ is a “function” of $x$, in the sense that there must be a “way” to map $x$ to $y$ on a graph (like $y=\pm\sqrt{x}$ or $y=\ln x$).

    (* not actually why it’s called this ;_; but it should be)

    Consider $y = f(x)$.

    The equation earlier would now look like

    $$
    \begin{aligned}
    \frac{d}{dx} (x^2 + f(x)^2) &= \frac{d}{dx} 1 \\
    2x + 2f(x) \frac{df}{dx} &= 0
    \end{aligned}
    $$

    Think about it with our previous chain rule logic. Since $y$ is a “function” of $x$, shifting the input $x$ by $dx$ causes a resultant shift in the output $y$ by $dy$. Then, that shift in the new input $y$ causes a shift in the next output, $y^2$. It’s the same exact idea as differentiating $g(f(x))$ with $g(u) = u^2$:

    $$
    \begin{aligned}
    \frac{d}{dx} (x^2 + g(f(x))) &= \frac{d}{dx} 1 \\
    2x + \frac{dg}{df} \frac{df}{dx} &= 0
    \end{aligned}
    $$

    Technically, implicit differentiation is applied to every variable in an equation, even $x$ itself, which just picks up a factor of $\frac{dx}{dx} = 1$:

    $$
    \begin{aligned}
    \frac{d}{dx} y &= \frac{d}{dx} 3x \\
    1 \times \frac{dy}{dx} &= 3 \times \frac{dx}{dx}
    \end{aligned}
    $$
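For the circle example $x^2 + y^2 = 1$, implicit differentiation gives $\frac{dy}{dx} = -\frac{x}{y}$ without ever solving for $y$. A minimal Python sketch (with the sample point $x = 0.6$ chosen arbitrarily) checks this against the two explicit branches $y = \pm\sqrt{1 - x^2}$. The single implicit formula covers both.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def upper(x):
    return math.sqrt(1 - x ** 2)  # top half of the circle


def lower(x):
    return -math.sqrt(1 - x ** 2)  # bottom half of the circle


x = 0.6
for branch in (upper, lower):
    y = branch(x)
    implicit = -x / y  # from 2x + 2y dy/dx = 0
    print(branch.__name__, implicit, derivative(branch, x))
```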

    inverse functions

    This is just an application of the chain rule on the definition of an inverse function.

    Let’s say that $f(x) = y$. For any input $x$, we can get an output $f(x)=y$.
    Let’s then say that function $g$ is the inverse of function $f$. For any output $y$, we can find the input $g(y)=x$.

    Formally, $f(g(x)) = x$.

    Just differentiate that!

    $$
    \begin{aligned}
    f(g(x)) &= x \\
    \frac{d}{dx} f(g(x)) &= \frac{d}{dx} x \\
    f'(g(x))\,g'(x) &= 1 \\
    g'(x) &= \frac{1}{f'(g(x))} \quad \text{wherever } f'(g(x)) \neq 0
    \end{aligned}
    $$
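The inverse-function rule is handy even when $g$ has no neat formula. In this Python sketch, the hypothetical example $f(x) = x^3 + x$ (strictly increasing, so it has an inverse) is inverted numerically by bisection. Because $2^3 + 2 = 10$, we have $g(10) = 2$, so the rule predicts $g'(10) = \frac{1}{f'(2)} = \frac{1}{13}$.

```python
def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return x ** 3 + x  # strictly increasing, so it has an inverse


def f_prime(x):
    return 3 * x ** 2 + 1


def g(x, lo=-100.0, hi=100.0):
    """Hypothetical helper: inverse of f by bisection, the t in [lo, hi] with f(t) = x."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = 10.0  # g(10) = 2
print(g(x), derivative(g, x), 1 / f_prime(g(x)))  # g'(x) = 1 / f'(g(x))
```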

    trig derivatives

    There’s a very useful pattern in the sine and cosine derivatives.

    $$
    \begin{aligned}
    f(x) &= \sin x \\
    f'(x) &= \cos x \\
    f''(x) &= -\sin x \\
    f'''(x) &= -\cos x \\
    f^{(4)}(x) &= \sin x
    \end{aligned}
    $$

    This is because differentiating sine and cosine causes a “phase shift”: the entire graph shifts a quarter period ($\frac{\pi}{2}$) to the left. Let’s look at this below:

    [Figure: sin, cos, −sin, and −cos; the peaks, troughs, and steep midpoints of each function lead to the values of the function below it]

    Observe $f(x)=\sin x$.

    The instant slope at the highest and lowest points of $\sin x$ is 0.

    Halfway between any two neighboring “zero-slope” points, the rate of change is at its fastest: $\frac{dy}{dx}$ reaches its extreme of $\pm 1$.

    Between said points, we can also discern how the rate of change itself is changing. For example, in $\sin x$, when we go from $\frac{dy}{dx}=0$ to $\frac{dy}{dx}=-1$, the rate of lowering gets faster and faster.

    With these ideas, we can sketch out a graph for $\frac{d}{dx}\sin x$. And every time we differentiate, we just shift left…
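The “shift left” claim can be checked numerically. In this Python sketch, `shifted_sin(k)` is a hypothetical helper for $\sin$ shifted left by $k$ quarter periods ($k\frac{\pi}{2}$), so $k = 0, 1, 2, 3$ give $\sin$, $\cos$, $-\sin$, $-\cos$. Differentiating shift $k$ should reproduce shift $k+1$ at every sample point. The sample points on $[-3, 3]$ are an arbitrary choice.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def shifted_sin(k):
    """Hypothetical helper: sin shifted left by k quarter periods (k * pi/2).

    k = 0, 1, 2, 3, 4 give sin, cos, -sin, -cos, and sin again.
    """
    return lambda x: math.sin(x + k * math.pi / 2)


xs = [i * 0.25 for i in range(-12, 13)]  # sample points on [-3, 3]
for k in range(4):
    # Differentiating shift k should reproduce shift k + 1 at every sample point.
    worst = max(abs(derivative(shifted_sin(k), x) - shifted_sin(k + 1)(x)) for x in xs)
    print(k, worst)
```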

    Now that we know this, we can differentiate every other trig function by writing it in terms of $\sin x$ and $\cos x$. For example, using the quotient rule, we can find the derivative of $\tan x$ as such:

    $$
    \begin{aligned}
    \frac{d}{dx} \tan x &= \frac{d}{dx} \frac{\sin x}{\cos x} \\
    &= \frac{\sin' x \cos x - \sin x \cos' x}{\cos x \cos x} \\
    &= \frac{\cos x \cos x - \sin x (-\sin x)}{\cos x \cos x} \\
    &= \frac{(\cos x)^2 + (\sin x)^2}{(\cos x)^2} \quad \text{Pyth. Identity} \\
    &= \frac{1}{(\cos x)^2} \\
    \frac{d}{dx} \tan x &= (\sec x)^2
    \end{aligned}
    $$

    If you’re in a rush and forgot the other trig derivatives, you can use the same quotient rule trick, too! The quotient rule is the most direct path to every single one.

    $$
    \begin{aligned}
    \frac{d}{dx} \sec x &= \frac{d}{dx} \frac{1}{\cos x} \\
    &= \frac{1'\cos x - 1 \cdot \cos' x}{\cos x \cos x} \\
    &= \frac{0 \cdot \cos x - 1 (-\sin x)}{\cos x \cos x} \\
    &= \frac{\sin x}{\cos x \cos x} \\
    \frac{d}{dx} \sec x &= \sec x \tan x
    \end{aligned}
    $$

    To prank yourself, try saying “sec x tan x” out loud, as if it were one word. It might help you remember it…

    $$
    \begin{aligned}
    \frac{d}{dx} \csc x &= \frac{d}{dx} \frac{1}{\sin x} \\
    &= \frac{1'\sin x - 1 \cdot \sin' x}{\sin x \sin x} \\
    &= \frac{0 \cdot \sin x - 1 (\cos x)}{\sin x \sin x} \\
    &= \frac{-\cos x}{\sin x \sin x} \\
    \frac{d}{dx} \csc x &= -\csc x \cot x \\[1em]
    \frac{d}{dx} \cot x &= \frac{d}{dx} \frac{\cos x}{\sin x} \\
    &= \frac{\cos' x \sin x - \cos x \sin' x}{\sin x \sin x} \\
    &= \frac{(-\sin x) \sin x - \cos x \cos x}{\sin x \sin x} \\
    &= \frac{-\left((\sin x)^2 + (\cos x)^2\right)}{(\sin x)^2} \quad \text{Pyth. Identity} \\
    &= \frac{-1}{(\sin x)^2} \\
    \frac{d}{dx} \cot x &= -(\csc x)^2
    \end{aligned}
    $$
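Here is a numerical spot check of the four quotient-rule results, as a Python sketch. `sec`, `csc`, `cot`, and `derivative` are hypothetical helpers defined in the block, since Python's `math` module only provides `sin`, `cos`, and `tan`. The point $x = 0.9$ is arbitrary, chosen to stay away from the zeros of $\sin x$ and $\cos x$.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def sec(x):
    return 1 / math.cos(x)


def csc(x):
    return 1 / math.sin(x)


def cot(x):
    return math.cos(x) / math.sin(x)


# name -> (function, derivative claimed by the quotient-rule derivations)
rules = {
    "tan": (math.tan, lambda x: sec(x) ** 2),
    "sec": (sec, lambda x: sec(x) * math.tan(x)),
    "csc": (csc, lambda x: -csc(x) * cot(x)),
    "cot": (cot, lambda x: -csc(x) ** 2),
}

x = 0.9  # away from the zeros of sin and cos
for name, (func, claimed) in rules.items():
    print(name, derivative(func, x), claimed(x))
```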

    A lot of you might be unsatisfied with the proof of the sine, cosine, negative sine, negative cosine chain.

    Just because it looks like it’s true, doesn’t mean it’s 100% true.

    That’s right. It’s a convenient (but artistic) leap to differentiate the peaks, troughs, and quickest velocities of $\sin x$, and to furthermore eyeball every other point on $\sin x$ to guess that $\frac{dy}{dx}=\cos x$.

    For those who reject this handwavy intuition, let’s look at a slightly bigger diagram.

    $\sin x$: a difference formula diagram
    [Figure: within a unit circle, a triangle of angle θ and a triangle of angle slightly bigger than θ]

    This unit circle diagram (UNSW, 2009) shows what actually happens when we turn $\cos\theta$ into $\cos(\theta+d\theta)$ and $\sin\theta$ into $\sin(\theta+d\theta)$.

    For the bottom-right $-d(\cos\theta)$, remember that

    $$\cos(\theta+d\theta)-\cos\theta=d(\cos\theta),$$

    so

    $$\cos\theta - \cos(\theta+d\theta)=-d(\cos\theta)$$

    When we move from $\triangle\theta$ to $\triangle(\theta+d\theta)$, a new, tiny triangle appears at the top right.

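    The purple sides are the tiny resultant shifts in the angle functions’ outputs, and the purple hypotenuse gets closer and closer to the arc $d\theta$, making its length basically $d\theta$. (Radians let us do this: on a unit circle, an angle of $d\theta$ radians cuts off an arc of length exactly $d\theta$.)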

    The isosceles triangle $\triangle d\theta$ (two radii of length 1 with the angle $d\theta$ between them) has angles of nearly $(0^\circ, 90^\circ, 90^\circ)$, so we can make a small trip of $\theta, 90^\circ-\theta, \theta$ to find all of the purple triangle’s angles…

    [Figure: the similar triangles in the previous diagram; the similarity appears as dθ approaches zero]

    Wowzers: the purple triangle shares all the angles of the old $\triangle\theta$. They’re similar! So:

    $$
    \begin{aligned}
    \frac{\cos\theta}{1} &= \frac{d(\sin \theta)}{d\theta} \\
    \frac{\sin\theta}{1} &= -\frac{d(\cos\theta)}{d\theta}
    \end{aligned}
    $$

    For the even deeper nonbelievers, there is also a purely algebraic proof straight from the difference formula. However, it takes two lemmas (baby proofs) and the angle addition formula, so I’m skipping it here.

    euler and the logarithm

    You remember when you learned about $\pi$?

    When you divide circumference by diameter, you get $\pi$.

    It’s not exactly “a special number that has that property”; it’s more like, mathematicians imagined “I want this property to give me a special number,” and then they found it afterwards.

    To make this sound less like nonsense, let’s come back to Euler and do some inspection by calculator.

    • $\frac{d}{dx} 2^x = 2^x \times 0.69314...$
    • $\frac{d}{dx} 3^x = 3^x \times 1.09861...$
    • $\frac{d}{dx} 2.5^x = 2.5^x \times 0.91629...$
    • $\frac{d}{dx} 2.7^x = 2.7^x \times 0.99325...$

    By inspection, the weird multiplier tends to 1 as the base tends to 2.7182818…; mathematicians call this transcendental number $e$.

    • $\frac{d}{dx} e^x = e^x \times 1$

    This is one of the many definitions of $e$. So, in a maybe ugly way, the derivative of $e^x$ is itself “by definition”.
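The “inspection by calculator” can be reproduced in Python as a sketch. Because $a^{x+h} = a^x a^h$, the multiplier in $\frac{d}{dx}a^x = a^x \times k$ does not depend on $x$. So the hypothetical helper `multiplier` measures it at $x = 0$ with a central difference. A bisection over bases between 2 and 3 then searches for the base whose multiplier is exactly 1, in the spirit of “I want this property to give me a special number.”

```python
import math


def multiplier(a, h=1e-5):
    """Hypothetical helper: the k in d/dx a**x = a**x * k.

    Since a**(x + h) = a**x * a**h, k is the same for every x, so it is
    measured at x = 0 with a central difference (a**h - a**-h) / (2h).
    """
    return (a ** h - a ** (-h)) / (2 * h)


for a in [2, 3, 2.5, 2.7]:
    print(a, multiplier(a))  # compare with the calculator values listed earlier

# Bisection on the base: find the a in [2, 3] whose multiplier is exactly 1.
lo, hi = 2.0, 3.0
for _ in range(50):
    mid = (lo + hi) / 2
    if multiplier(mid) < 1:
        lo = mid  # multiplier too small: the special base is bigger
    else:
        hi = mid
special_base = (lo + hi) / 2
print(special_base, math.e)
```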

    Let’s do some algebra on $e$’s other definitions.

    1. $e$ as continuous compound interest.

      $$
      \begin{aligned}
      e &= \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n \\
      e &= \lim_{h\to 0} (1 + h)^{1/h} \quad \text{Same thing, with } h = \tfrac{1}{n}. \\
      \frac{d}{dx} e^x &= \lim_{h\to 0}\frac{e^{x + h} - e^x}{h} \quad \text{Difference formula} \\
      &= \lim_{h\to 0}\frac{e^x (e^h - 1)}{h} \quad \text{Factor out } e^x \\
      &= e^x\lim_{h\to 0}\frac{e^h - 1}{h} \\
      &= e^x\lim_{h\to 0}\frac{\left((1 + h)^{1/h}\right)^h - 1}{h} \quad \text{Substitute definition} \\
      &= e^x\lim_{h\to 0}\frac{1 + h - 1}{h} \\
      \frac{d}{dx} e^x &= e^x \times 1
      \end{aligned}
      $$
    2. $e^x$ as a Taylor series whose derivative is itself.

      For reference, a Taylor series (here, one centered at 0) is just an infinitely long polynomial, based on the form $a_0x^0 + a_1x^1 + a_2x^2 + a_3x^3 + \dots$

      $$
      \begin{aligned}
      e^x &= a_0 + a_1x^1 + a_2x^2 + a_3x^3 + \dots \\
      e^0 &= 1 \\
      e^x &= 1 + a_1x^1 + a_2x^2 + a_3x^3 + \dots \\
      \frac{d}{dx} e^x &= a_1 + 2a_2x + 3a_3x^2 + \dots \\
      1 + a_1x^1 + a_2x^2 + \dots &= a_1 + 2a_2x^1 + 3a_3x^2 + \dots
      \end{aligned}
      $$

      Matching the coefficients of each power of $x$ on both sides gives $a_0=1,\quad a_1=a_0,\quad a_2=\frac{1}{2}a_1,\quad a_3=\frac{1}{3}a_2, \dots$

      so we see that $a_0=1, \quad a_1=1, \quad a_2= \frac{1}{2} \times 1,\quad a_3 = \frac{1}{3}\left(\frac{1}{2} \times 1\right), \dots$

      $$
      \begin{aligned}
      e^x &= 1 + 1x + \frac{1}{2}x^2 + \frac{1}{2 \times 3}x^3 + \frac{1}{2 \times 3 \times 4}x^4 + \dots \\
      e^x &= 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \dots \\
      \frac{d}{dx}e^x &= 0 + 1 + 2\cdot\frac{1}{2!}x + 3\cdot\frac{1}{3!}x^2 + 4\cdot\frac{1}{4!}x^3 + \dots \\
      \frac{d}{dx}e^x &= 1 + 1x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \dots
      \end{aligned}
      $$

      so the equality holds.
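Both definitions can be explored numerically in a Python sketch:

- the compound-interest limit $(1 + \frac{1}{n})^n$ for growing $n$;
- the key limit $\frac{e^h - 1}{h} \to 1$, computed with `math.expm1`, which evaluates $e^h - 1$ without cancellation;
- partial sums of the series $1 + x + \frac{x^2}{2!} + \dots$ together with their term-by-term derivatives.

The cutoff of 30 terms and the sample points are arbitrary choices, and the two series functions are hypothetical helpers.

```python
import math

# 1. Continuous compounding: (1 + 1/n)^n creeps up towards e.
for n in [1, 10, 1000, 10 ** 6]:
    print(n, (1 + 1 / n) ** n, math.e)

# The limit used in the derivation: (e^h - 1) / h -> 1 as h -> 0.
for h in [0.1, 0.001, 1e-6]:
    print(h, math.expm1(h) / h)


# 2. Taylor series: partial sums, and their term-by-term derivatives.
def exp_series(x, terms=30):
    """Hypothetical helper: partial sum 1 + x + x^2/2! + ... using the first `terms` terms."""
    return sum(x ** k / math.factorial(k) for k in range(terms))


def exp_series_derivative(x, terms=30):
    """Hypothetical helper: term-by-term derivative, k * x^(k-1) / k! for k = 1 .. terms - 1."""
    return sum(k * x ** (k - 1) / math.factorial(k) for k in range(1, terms))


for x in [-1.0, 0.5, 2.0]:
    print(x, exp_series(x), exp_series_derivative(x), math.exp(x))
```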

    That’s a lot. Thankfully, once we know $e$, finding the derivative of $\ln x$ (the inverse of $y = e^x$, aka $x = \log_e(y) = \ln(y)$) is a simple chain rule:

    $$
    \begin{aligned}
    x &= x \\
    e^{\ln(x)} &= x \\
    \frac{d}{dx} e^{\ln(x)} &= \frac{d}{dx} x \\
    e^{\ln(x)} \times \frac{d}{dx} \ln(x) &= 1 \\
    x \times \frac{d}{dx} \ln(x) &= 1 \\
    \frac{d}{dx} \ln(x) &= \frac{1}{x}
    \end{aligned}
    $$

    and finally, the derivative of any exponential $a^x$ (with $a > 0$) is a simple rearrangement into terms of $e$:

    $$
    \begin{aligned}
    \text{Let } a &= \text{some positive constant} \\
    \frac{d}{dx} a^x &= \frac{d}{dx} \left(e^{\ln a}\right)^x \\
    &= \frac{d}{dx} e^{x \ln a} \\
    &= e^{x \ln a} \times \ln a \quad \text{chain rule} \\
    &= \left(e^{\ln a}\right)^x \ln a \\
    \frac{d}{dx} a^x &= a^x \ln a
    \end{aligned}
    $$
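Both results can be spot-checked with a central difference, as a Python sketch. The hypothetical `derivative` helper, the sample point $x = 1.5$, and the bases $0.5$, $2$, and $10$ are illustrative choices. A base below 1 is included to show that $\ln a < 0$ correctly makes the slope negative.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.5  # arbitrary sample point (ln needs x > 0)
print(derivative(math.log, x), 1 / x)  # d/dx ln x = 1/x

for a in [0.5, 2.0, 10.0]:  # any positive base
    numeric = derivative(lambda t: a ** t, x)
    print(a, numeric, a ** x * math.log(a))  # d/dx a^x = a^x ln a
```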

    inverse trig derivatives

    [Figure: default triangle]

    An inverse trig function reverses some

    $$\sin\theta=\frac{\text{opposite}}{\text{hypotenuse}}$$

    into

    $$\arcsin\left(\frac{\text{opposite}}{\text{hypotenuse}}\right) = \theta$$

    Of course, this means that in an $\arcsin x$ triangle, aka an $\arcsin(\frac{x}{1})$ triangle, the opposite has length $x$ and the hypotenuse has length 1.

    NOTE: The inverse trig functions are very man-made, in order to pass the vertical line test!

    Vertical what now?

    If the graph of a relation can be crossed at two points by a vertical line, then the relation has two outputs $y$ for one single input $x$. This makes it not a function.

    To prevent relations such as

    $$
    \begin{aligned}
    \sin(0) &= 0 \\
    \sin(2\pi) &= 0
    \end{aligned}
    $$

    from turning into the multiple-output

    $$\arcsin(0)=\{\dots, -\pi, 0, \pi, 2\pi, \dots\}$$

    we need to restrict both $\sin x$ and $\arcsin x$ into “one-to-one” functions.

    • Restrict $\sin x$’s domain, and therefore $\arcsin x$’s range, to

      $$\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$$
    • This restricts $\sin x$’s range, and therefore $\arcsin x$’s domain, to $[-1, 1]$

    Ready? Let’s go!

    arcsin triangle

    Let’s look at $\arcsin \frac{x}{1}$, where

    $$\frac{\text{opposite}}{\text{hypotenuse}}=\frac{x}{1}$$

    Remember our thought process for inverse functions:

    $$
    \begin{aligned}
    \frac{d}{dx} \sin(\arcsin x) &= \frac{d}{dx} x \\
    \cos(\arcsin x) \arcsin' x &= 1 \\
    \arcsin' x &= \frac{1}{\cos(\arcsin x)}
    \end{aligned}
    $$

    Suddenly, we have a $\cos$ in there! Of course, using $a^2 + b^2 = c^2$, we can deduce the adjacent side to be $\sqrt{1-x^2}$ (positive, since $\theta$ stays between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$).

    Hence,

    $$
    \begin{aligned}
    \arcsin' x &= \frac{1}{\cos(\arcsin x)} \\
    &= \frac{1}{\cos\theta} \\
    \arcsin' x &= \frac{1}{\sqrt{1 - x^2}}
    \end{aligned}
    $$

    Looking at the graph of $\arcsin x$, this always-positive derivative checks out!
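Both steps of the arcsin derivation can be checked numerically in Python, as a sketch. First, the triangle step $\cos(\arcsin x) = \sqrt{1 - x^2}$ is verified. Then a central-difference slope of `math.asin` is compared with $\frac{1}{\sqrt{1-x^2}}$. The sample points inside $(-1, 1)$ and the `derivative` helper are illustrative assumptions.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in [-0.9, -0.3, 0.0, 0.5, 0.8]:  # inside (-1, 1)
    theta = math.asin(x)
    # Triangle step: adjacent side over hypotenuse 1 is cos(theta) = sqrt(1 - x^2).
    assert math.isclose(math.cos(theta), math.sqrt(1 - x ** 2), rel_tol=1e-12)
    print(x, derivative(math.asin, x), 1 / math.sqrt(1 - x ** 2))
```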

    arccos triangle

    The general process is the same for the other functions. Build a triangle, Pythagorize, differentiate, check.

    Let’s move to $\arccos(\frac{x}{1})$, where

    $$\frac{\text{adjacent}}{\text{hypotenuse}}=\frac{x}{1}$$

    First, we rawly differentiate the inverse function:

    $$
    \begin{aligned}
    \frac{d}{dx} \cos(\arccos x) &= \frac{d}{dx} x \\
    -\sin(\arccos x) \arccos' x &= 1 \\
    \arccos' x &= \frac{1}{-\sin(\arccos x)}
    \end{aligned}
    $$

    then we find our missing side $\sqrt{1-x^2}$ using Pythagoras.

    After that, substitute:

    $$
    \begin{aligned}
    \arccos' x &= \frac{1}{-\sin(\arccos x)} \\
    &= \frac{1}{-\sin\theta} \\
    \arccos' x &= \frac{1}{-\sqrt{1 - x^2}}
    \end{aligned}
    $$

    Just like the picture, this derivative is always negative.

    arctan triangle

    For $\arctan(\frac{x}{1})$, we have

    $$\frac{\text{opposite}}{\text{adjacent}}=\frac{x}{1}$$

    Recall that $\tan'\theta=(\sec\theta)^2$.

    Rawly differentiate:

    $$
    \begin{aligned}
    \frac{d}{dx} \tan(\arctan x) &= \frac{d}{dx} x \\
    \sec^2(\arctan x) \arctan' x &= 1 \\
    \arctan' x &= \frac{1}{\sec^2(\arctan x)}
    \end{aligned}
    $$

    then Pythagorize for the missing hypotenuse length, $\sqrt{1 + x^2}$.

    After that, substitute:

    $$
    \begin{aligned}
    \arctan' x &= \frac{1}{\sec^2(\arctan x)} \\
    &= \frac{1}{\sec^2\theta} \\
    &= (\cos\theta)^2 \\
    &= \left(\frac{1}{\sqrt{1 + x^2}}\right)^2 \\
    \arctan' x &= \frac{1}{1 + x^2}
    \end{aligned}
    $$

    And finally, check the graph. Our always-positive derivative checks out!

    arcsec triangle

    For $\text{arcsec}(\frac{x}{1})$, we have

    $$\frac{\text{hypotenuse}}{\text{adjacent}} = \frac{x}{1}$$

    Recall that $\sec'\theta = \sec\theta\tan\theta$.

    Rawly differentiate:

    $$
    \begin{aligned}
    \frac{d}{dx} \sec(\text{arcsec } x) &= \frac{d}{dx} x \\
    \sec(\text{arcsec } x)\tan(\text{arcsec } x)\,\text{arcsec}' x &= 1 \\
    \text{arcsec}' x &= \frac{1}{\sec(\text{arcsec } x)\tan(\text{arcsec } x)}
    \end{aligned}
    $$

    then Pythagorize for the missing opposite length, which is $\sqrt{x^2 - 1}$.

    Finally, substitute:

    $$
    \begin{aligned}
    \text{arcsec}' x &= \frac{1}{\sec(\text{arcsec } x)\tan(\text{arcsec } x)} \\
    &= \frac{1}{\sec\theta \tan\theta} \\
    &= \frac{1}{x \sqrt{x^2 - 1}} \quad \text{for } x > 1 \\
    \text{arcsec}' x &= \frac{1}{|x|\sqrt{x^2 - 1}} \quad \text{for } |x| > 1
    \end{aligned}
    $$

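    Note that both branches of $\text{arcsec } x$ that mathematicians prefer to use (range $[0, \frac{\pi}{2}) \cup (\frac{\pi}{2}, \pi]$) have an eternally positive derivative. The triangle picture covers $x > 1$. For $x < -1$, the angle $\theta$ is past $\frac{\pi}{2}$, where $\tan\theta = -\sqrt{x^2 - 1}$ flips sign right along with $x$. To stay positive, we slap an absolute value around $x$: $|x|$.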

    arccsc triangle

    For $\text{arccsc}(\frac{x}{1})$, we have

    $$\frac{\text{hypotenuse}}{\text{opposite}}=\frac{x}{1}$$

    So our hypotenuse is $x$ and our opposite is 1. Recall that $\csc'\theta = -\csc\theta\cot\theta$.

    Rawly differentiate:

    $$
    \begin{aligned}
    \frac{d}{dx} \csc(\text{arccsc } x) &= \frac{d}{dx} x \\
    -\csc(\text{arccsc } x)\cot(\text{arccsc } x)\,\text{arccsc}' x &= 1 \\
    \text{arccsc}' x &= \frac{1}{-\csc(\text{arccsc } x)\cot(\text{arccsc } x)}
    \end{aligned}
    $$

    then Pythagorize for the missing adjacent length, which is $\sqrt{x^2 - 1}$.

    Finally, substitute:

    $$
    \begin{aligned}
    \text{arccsc}' x &= \frac{1}{-\csc(\text{arccsc } x)\cot(\text{arccsc } x)} \\
    &= \frac{1}{-\csc\theta \cot\theta} \\
    &= \frac{1}{-x \sqrt{x^2 - 1}} \quad \text{for } x > 1 \\
    \text{arccsc}' x &= \frac{1}{-|x|\sqrt{x^2 - 1}} \quad \text{for } |x| > 1
    \end{aligned}
    $$

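    Since the branches of $\text{arccsc } x$ we use in practice (range $[-\frac{\pi}{2}, 0) \cup (0, \frac{\pi}{2}]$) have a negative slope at every point, we slap an absolute value on $x$ to get $-|x|$. (For $x < -1$, $\theta$ is negative and $\cot\theta = -\sqrt{x^2 - 1}$, which flips the sign back.)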

    arccot triangle

    For $\text{arccot}(\frac{x}{1})$, we have

    $$\frac{\text{adjacent}}{\text{opposite}} = \frac{x}{1}$$

    Recall that $\cot'\theta = -(\csc\theta)^2$.

    Rawly differentiate:

    $$
    \begin{aligned}
    \frac{d}{dx} \cot(\text{arccot } x) &= \frac{d}{dx} x \\
    -\csc^2(\text{arccot } x)\,\text{arccot}' x &= 1 \\
    \text{arccot}' x &= \frac{1}{-\csc^2(\text{arccot } x)}
    \end{aligned}
    $$

    then Pythagorize for the missing hypotenuse length, which is $\sqrt{x^2 + 1}$.

    Finally, substitute:

    $$
    \begin{aligned}
    \text{arccot}' x &= \frac{1}{-\csc^2(\text{arccot } x)} \\
    &= \frac{1}{-\csc^2\theta} \\
    &= \frac{1}{-\left(\sqrt{x^2 + 1}\right)^2} \\
    \text{arccot}' x &= -\frac{1}{x^2 + 1}
    \end{aligned}
    $$

    Just like the picture, this derivative is always negative.
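The remaining inverse-trig results can be spot-checked together, as a Python sketch. Python's `math` module has no arcsec, arccsc, or arccot. So `asec`, `acsc`, and `acot` are hypothetical helpers built from the usual principal-branch identities $\text{arcsec } x = \arccos\frac{1}{x}$, $\text{arccsc } x = \arcsin\frac{1}{x}$, and $\text{arccot } x = \frac{\pi}{2} - \arctan x$. The sample points are arbitrary. They include negative $x$, where the $|x|$ in the arcsec and arccsc formulas matters.

```python
import math


def derivative(func, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Hypothetical helpers: principal branches built from math.acos, math.asin, math.atan.
def asec(x):
    return math.acos(1 / x)  # range [0, pi] without pi/2


def acsc(x):
    return math.asin(1 / x)  # range [-pi/2, pi/2] without 0


def acot(x):
    return math.pi / 2 - math.atan(x)  # range (0, pi)


checks = [
    # (name, function, derivative formula from the derivations, sample points)
    ("arccos", math.acos, lambda x: -1 / math.sqrt(1 - x ** 2), [-0.7, 0.2, 0.6]),
    ("arctan", math.atan, lambda x: 1 / (1 + x ** 2), [-3.0, 0.0, 2.0]),
    ("arcsec", asec, lambda x: 1 / (abs(x) * math.sqrt(x ** 2 - 1)), [-3.0, -1.5, 1.5, 3.0]),
    ("arccsc", acsc, lambda x: -1 / (abs(x) * math.sqrt(x ** 2 - 1)), [-3.0, -1.5, 1.5, 3.0]),
    ("arccot", acot, lambda x: -1 / (1 + x ** 2), [-3.0, 0.0, 2.0]),
]

for name, func, formula, points in checks:
    # Largest gap between the numerical slope and the formula over the sample points.
    worst = max(abs(derivative(func, x) - formula(x)) for x in points)
    print(name, worst)
```

Dropping the `abs()` from the arcsec formula would make it negative at $x = -1.5$, while the numerical slope there stays positive. That mismatch is the reason for the $|x|$.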