The slope of a “secant” line, i.e. the slope between two given
points on a function, is
$$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{f(x+h) - f(x)}{h}$$
If we make the difference between the x-values of the two points really, really small, lowering $h$ (the distance between
$x_2$ and $x_1$) towards an infinitesimal (almost 0), we
can find the function’s derivative, which tells us the
instantaneous slope at any given point of the function.
$$\frac{d}{dx}f(x) = \lim_{h\to 0}\frac{f(x+h) - f(x)}{h}$$
This “difference formula” is the first half of calculus.
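To see the limit at work, here is a minimal numeric sketch. The helper names `difference_quotient` and `square` are made up for this illustration; the idea is just to shrink $h$ and watch the secant slope settle down.

```python
def difference_quotient(f, x, h):
    """Slope of the secant through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# Shrink h and watch the secant slope of x^2 at x = 3 approach 6.
for h in (1.0, 0.1, 0.001, 1e-6):
    print(h, difference_quotient(square, 3.0, h))
```

With $h = 1$ the slope is 7; as $h$ shrinks it closes in on 6, which is what $2x$ gives at $x = 3$.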
Before algebra, let’s visualize what $y = f(x)g(x)$
would look like. (3b1b)
Let’s make a rectangle whose area is
$$y = f(x)g(x)$$
Now… let’s extend each side to find the next instant point.
Imagine that the difference between $f(x)$ and $f(x+h)$ is $df$.
Imagine that the difference between $g(x)$ and $g(x+h)$ is $dg$.
The tiny differential bits that we added, i.e. $df$, $dg$,
and the tiny×tiny negligible-area nub on
the bottom right, now expand our area a Tiny Bit… by $dy$.
$$
\begin{aligned}
dy &= f(x)\,dg + g(x)\,df + df\,dg \\
\frac{dy}{dx} &= f(x)\frac{dg}{dx} + g(x)\frac{df}{dx}
\end{aligned}
$$
Cool, right? Of course, we could’ve just done algebra from the start.
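If you want to poke at the rectangle numerically, here is a rough sketch. It assumes $f = \sin$ and $g = \exp$ purely as example functions, and `area_growth` is a made-up helper that splits the added area into the two strips and the corner nub.

```python
import math


def area_growth(f, g, x, h):
    """Split the growth of the rectangle f(x) * g(x) into strips and corner."""
    df = f(x + h) - f(x)
    dg = g(x + h) - g(x)
    strips = f(x) * dg + g(x) * df  # the two thin strips
    corner = df * dg                # the tiny-by-tiny nub
    return strips, corner


x, h = 1.0, 1e-5
strips, corner = area_growth(math.sin, math.exp, x, h)
total = math.sin(x + h) * math.exp(x + h) - math.sin(x) * math.exp(x)

print(total, strips + corner)  # the pieces add up to the whole growth
print(corner / total)          # the nub is a vanishing share of it
print(strips / h)              # approximately f * g' + g * f'
```

The corner’s share of the growth shrinks along with $h$, which is why it gets dropped from the final formula.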
The power rule is just a special case of the product rule*. Don’t believe me?
Let’s knock your socks off…
*if we look at integer powers only
When we apply product rule logic to an $x^2$
square, growing each side by $dx$, the $(dx)^2$ corner becomes
a negligible tiny×tiny.
When we generalize that to an $x^3$ cube, we get three
$x^2 \times dx$ growths from the faces; the edge growths
$x(dx)^2$ and the corner growth $(dx)^3$ become
negligible due to tiny×tiny.
So our pattern is, for any multi-dimensional square or cube or hypercube:
Extrude every side by $dx$
e.g. Cube: $x \cdot x \cdot x$ becomes $(x+dx)(x+dx)(x+dx)$
Multiply this $dx$ by every other side (lengths of $x$) to
find the added area/volume/hypervolume of that extrusion
e.g. Faces: $(dx)\,x\,x + x\,(dx)\,x + x\,x\,(dx) = 3x^2\,dx$
The growths on the extrusions’ outskirts,
like the corner growth in a square, or the edges and
corners of a cube, have multiple $dx$’s and are thus
negligible
e.g. $(dx)(dx)\,x + (dx)\,x\,(dx) + x\,(dx)(dx) + (dx)(dx)(dx)$
$$\frac{d}{dx}x^a = ax^{a-1}$$
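Here is a small sketch of the cube case with actual numbers. `cube_growth` is a hypothetical helper, and the values $x = 2$, $dx = 10^{-4}$ are arbitrary picks for illustration.

```python
def cube_growth(x, dx):
    """Break (x + dx)^3 - x^3 into face, edge and corner growth."""
    faces = 3 * x**2 * dx   # one dx, two x's
    edges = 3 * x * dx**2   # two dx's, one x
    corner = dx**3          # three dx's
    return faces, edges, corner


x, dx = 2.0, 1e-4
faces, edges, corner = cube_growth(x, dx)
total = (x + dx) ** 3 - x**3

print(total, faces + edges + corner)  # same growth, counted two ways
print(faces / dx)                     # close to 3 * x**2
print((edges + corner) / faces)       # the tiny-times-tiny share
```

Only the faces survive once you divide by $dx$ and let it shrink, leaving $3x^2$.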
The algebra isn’t as satisfying, because it requires the binomial
coefficient. Why? Let’s look at what’s going on when you power up, first.
So, for every count of $x$’s, e.g. 3 of the 5 factors being $x$ (so the other 2 are $h$),
we also have every possible unique arrangement of that many $x$’s.
If we want to pick out 3 of 5 objects from a Big Ordered Set, where the Small Set order
doesn’t matter, we use the binomial coefficient… which I’ll cover in a later
article ¯\_(⌐■_■)_/¯
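$$
\begin{aligned}
f'(x) &= \lim_{h\to 0}\frac{f(x+h) - f(x)}{h} \\
\frac{d}{dx}x^a &= \lim_{h\to 0}\frac{(x+h)^a - x^a}{h} \\
&= \lim_{h\to 0}\frac{\binom{a}{0}x^0h^a + \binom{a}{1}x^1h^{a-1} + \dots + \binom{a}{a}x^ah^0 - x^a}{h} \\
&\quad\text{Divide out the } h. \\
&= \lim_{h\to 0}\binom{a}{0}x^0h^{a-1} + \dots + \binom{a}{a-1}x^{a-1}h^0 + \binom{a}{a}x^ah^{-1} - \frac{x^a}{h} \\
&\quad\text{Numerators with } h \text{ remaining are pretty much 0, negligible} \\
&= \lim_{h\to 0}\binom{a}{a-1}x^{a-1}h^0 + \binom{a}{a}x^ah^{-1} - \frac{x^a}{h} \\
&= \lim_{h\to 0}a \cdot x^{a-1}h^0 + 1 \cdot x^ah^{-1} - \frac{x^a}{h} \\
&= \lim_{h\to 0}a \cdot x^{a-1} + \left(x^ah^{-1} - x^ah^{-1}\right) \\
&= ax^{a-1}
\end{aligned}
$$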
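The expansion step can be sanity-checked with Python’s `math.comb`. This is only a sketch: `expand_power` is a made-up helper, and $a = 5$ with $x = 2$ is an arbitrary example.

```python
from math import comb


def expand_power(x, h, a):
    """Terms comb(a, k) * x**k * h**(a - k) of (x + h)**a, for k = 0..a."""
    return [comb(a, k) * x**k * h ** (a - k) for k in range(a + 1)]


# With whole numbers the expansion matches (x + h)**a exactly.
print(sum(expand_power(2, 3, 5)) == (2 + 3) ** 5)

# The coefficient that survives the limit is comb(a, a - 1), which is a.
print(comb(5, 4))

# With a tiny h, the difference quotient lands near a * x**(a - 1).
h = 1e-6
quotient = (sum(expand_power(2.0, h, 5)) - 2.0**5) / h
print(quotient, 5 * 2.0**4)
```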
But what about non-integers?
The chain rule (discussed later) generalizes (Math SE) the power
rule to the rationals:
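$$
\begin{aligned}
\text{Let } p &= \text{an integer} \\
\frac{d}{dx}x^{p/q} &= \frac{d}{dx}(x^p)^{1/q} \\
&= \frac{1}{q}(x^p)^{1/q-1}\frac{d}{dx}(x^p) \\
&= \frac{p}{q}(x^p)^{1/q-1}x^{p-1} \\
&= \frac{p}{q}x^{p/q-p}x^{p-1} \\
&= \frac{p}{q}x^{p/q-1}
\end{aligned}
$$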
and the reals:
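$$
\begin{aligned}
\text{Let } r &= \text{a real number} \\
\frac{d}{dx}x^r &= \frac{d}{dx}e^{r\ln x} \\
&= e^{r\ln x}\left(r\frac{d}{dx}\ln x\right) \\
&= e^{r\ln x} \times r \times \frac{1}{x} \\
&= x^r \times \frac{r}{x} \\
&= rx^{r-1}
\end{aligned}
$$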
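A quick numeric check of the real-exponent case, as a sketch only. It assumes $r = \pi$ and $x = 2$ as arbitrary test values, and the made-up `slope` helper uses a symmetric difference (a swap from the one-sided difference formula, purely for better numerical accuracy).

```python
import math


def power_via_exp(x, r):
    """x**r rewritten as e**(r * ln x), valid for x > 0."""
    return math.exp(r * math.log(x))


def slope(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for d/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


x, r = 2.0, math.pi
numeric = slope(lambda t: power_via_exp(t, r), x)
formula = r * x ** (r - 1)
print(numeric, formula)
```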
Let’s say we want to find $\frac{dy}{dx}$ of
the function composition $f(g(x))$. To do this,
we’ll have to remember that a derivative is just
$$\frac{y - y_1}{x - x_1} = \frac{\text{resultant nudge in output } dy}{\text{nudge in input } dx}$$
at very tiny intervals of $dx$.
A nudge in $x$ will result in a nudge in
$y_1 = g(x)$.
We’ll call this resultant nudge $dy_1$.
$$\frac{dy_1}{dx_1} = \frac{\Delta y_1}{\Delta x_1}$$
Since $g(x)$ is the “input” of $f$, a nudge in
$y_1$ will result in a nudge in $y_2 = f(g(x))$.
We’ll call this resultant nudge $dy_2$.
$$\frac{dy_2}{dy_1} = \frac{\Delta y_2}{\Delta y_1}$$
Because of these definitions, it is 100% sound to
do the following to relate $y_2$ to $x_1$:
$$\frac{\Delta y_2}{\Delta y_1} \times \frac{\Delta y_1}{\Delta x_1} = \frac{\Delta y_2}{\Delta x_1}$$
That above equality is often represented as
$$\frac{dy}{du} \times \frac{du}{dx} = \frac{dy}{dx},$$
which means the same thing. (3b1b)
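The nudge-by-nudge story can be played out with real numbers. This is a sketch under assumed example functions, $g(x) = x^2$ on the inside and $f = \sin$ on the outside; neither comes from the walkthrough itself.

```python
import math


def g(x):
    return x * x         # inner function, y1 = g(x)


def f(y1):
    return math.sin(y1)  # outer function, y2 = f(y1)


x, dx = 1.5, 1e-6
dy1 = g(x + dx) - g(x)         # nudge in y1 caused by the nudge dx
dy2 = f(g(x) + dy1) - f(g(x))  # nudge in y2 caused by the nudge dy1

product = (dy2 / dy1) * (dy1 / dx)  # the two ratios, multiplied
exact = math.cos(g(x)) * 2 * x      # outer slope times inner slope
print(product, exact)
```

The `dy1` factors cancel, so the product is just `dy2 / dx`, the overall nudge ratio.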
$$\frac{d}{dx}f(g(x)) = f'(g(x)) \times g'(x)$$
A note on implicit differentiation
Implicit differentiation, an application of the
chain rule, is when we differentiate
some variable (like y or z) with respect to
another (like x). Here’s an example:
$$
\begin{aligned}
\frac{d}{dx}\left(x^2 + y^2\right) &= \frac{d}{dx}1 \\
2x + 2y\frac{dy}{dx} &= 0
\end{aligned}
$$
Why does this happen? Well, if we differentiate
y with respect to x, we imply* that y is
a “function” of x, in the sense that there must be
a “way” to map x to y on a graph
(like y=±x or y=lnx).
(* not actually why it’s called this ;_; but it should be)
Consider y=f(x).
The equation earlier would now look like
$$
\begin{aligned}
\frac{d}{dx}\left(x^2 + f(x)^2\right) &= \frac{d}{dx}1 \\
2x + 2f(x)\frac{df}{dx} &= 0
\end{aligned}
$$
Think about it with our previous chain rule logic.
Since $y$ is a “function” of $x$, when we
shift the input $x$ by $dx$, we will cause a
resultant shift in the output $y$, by $dy$. Furthermore, this shift in the new input $y$, by $dy$,
will cause a shift in the next output $y_2$. It’s
the same exact idea as differentiating $g(f(x))$.
$$
\begin{aligned}
\frac{d}{dx}\left(x^2 + g(f(x))\right) &= \frac{d}{dx}1 \\
2x + \frac{dg}{df}\frac{df}{dx} &= 0
\end{aligned}
$$
Technically, implicit differentiation is applied on
every variable-that-is-a-function in an equation.
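For the circle example $x^2 + y^2 = 1$, solving $2x + 2y\frac{dy}{dx} = 0$ gives $\frac{dy}{dx} = -\frac{x}{y}$. The sketch below compares that against a direct numeric slope. It assumes the upper half of the circle as the “way” to map $x$ to $y$; both helper names are made up.

```python
import math


def implicit_slope(x, y):
    """Solve 2x + 2y * dy/dx = 0 for dy/dx."""
    return -x / y


def upper_circle(x):
    """One explicit way to map x to y on the unit circle."""
    return math.sqrt(1 - x * x)


x, h = 0.6, 1e-6
y = upper_circle(x)
numeric = (upper_circle(x + h) - upper_circle(x)) / h
print(implicit_slope(x, y), numeric)  # both close to -0.75
```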
This is just an application of the chain rule on the definition
of an inverse function.
Let’s say that f(x)=y. For any input x, we can get an output f(x)=y.
Let’s then say that function g was the inverse of function f. For any output
y, we can find the input g(y)=x.
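Since $g(f(x)) = x$, the chain rule gives $g'(f(x)) \times f'(x) = 1$. Here is a rough numeric check of that, assuming $f = \exp$ and $g = \ln$ as an example inverse pair (my pick, not something fixed by the setup above). `slope` is a made-up symmetric-difference helper.

```python
import math


def slope(func, t, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for d/dt."""
    return (func(t + h) - func(t - h)) / (2 * h)


f = math.exp  # f(x) = y
g = math.log  # g(y) = x, the inverse of f

x = 0.7
y = f(x)
print(g(y))                       # recovers the input x
print(slope(g, y) * slope(f, x))  # close to 1
```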
This is because differentiating sine and cosine
causes a “phase shift,” a shift in the entire form of the
graph, towards the left. Let’s look at this below:
Observe $f(x) = \sin x$.
The instantaneous slope at the highest and lowest points of $\sin x$ is 0.
Halfway between any two of these “zero-slope” points, the rate of change is at its fastest, and
$\frac{dy}{dx}$ reaches its peak of $\pm 1$.
Between said points, we can also discern how the rate of change itself
is changing. For example, in $\sin x$, when we go from
$\frac{dy}{dx} = 0$ to $\frac{dy}{dx} = -1$, the rate of lowering gets faster and faster.
With these ideas, we can sketch out a graph for $\frac{d}{dx}\sin x$.
And every time we differentiate, we just shift left…
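The “shift left” claim is easy to test numerically. In this sketch, `slope` and `shifted_left` are made-up helpers, and the shift amount of a quarter turn ($\pi/2$) is the assumption being checked.

```python
import math


def slope(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for d/dx."""
    return (func(x + h) - func(x - h)) / (2 * h)


def shifted_left(func, amount):
    """The graph of func moved `amount` to the left."""
    return lambda x: func(x + amount)


quarter_turn = math.pi / 2
for x in (0.0, 1.0, 2.5):
    # slope of sin, sin shifted left by a quarter turn, and cos
    print(slope(math.sin, x), shifted_left(math.sin, quarter_turn)(x), math.cos(x))
```

All three columns agree up to rounding, so differentiating $\sin x$ behaves like sliding its graph a quarter turn to the left.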
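Now that we know this, we can differentiate every other angle formula
in terms of $\sin x$ and $\cos x$. For example, using the quotient rule,
we can differentiate $\tan x$ as such:
$$\frac{d}{dx}\tan x = \frac{d}{dx}\frac{\sin x}{\cos x} = \frac{\cos x\cos x + \sin x\sin x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x$$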
If you’re in a rush and forgot the other angle formulas, you can do the
same quotient rule, too! The quotient rule is the straightest
path to solving every single one.
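As a sketch of that quotient-rule route for $\tan x = \frac{\sin x}{\cos x}$: the helper `quotient_rule` below is made up for illustration and takes each function together with its derivative.

```python
import math


def quotient_rule(f, df, g, dg, x):
    """Derivative of f(x) / g(x): (f' * g - f * g') / g^2."""
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2


def neg_sin(x):
    return -math.sin(x)


x = 0.4
d_tan = quotient_rule(math.sin, math.cos, math.cos, neg_sin, x)
print(d_tan, 1 / math.cos(x) ** 2)  # agree up to rounding: sec^2(x)
```

Swapping in other numerator and denominator pairs (say 1 over $\cos x$ for $\sec x$) works the same way.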
A lot of you might be unsatisfied with the proof of the sine, cosine,
negative sine, negative cosine chain.
Just because it looks like it’s true, doesn’t mean it’s 100% true.
That’s right. It’s a convenient (but artistic) leap to differentiate
the peaks, troughs, and quickest velocities of $\sin x$, and to furthermore
eyeball every other point on $\sin x$ to guess that $\frac{dy}{dx} = \cos x$.
For those who reject this handwavy intuition, let’s look at a somewhat
bigger diagram.
sin x: a difference formula diagram
This unit circle diagram (UNSW, 2009)
shows what actually happens when we turn
$\cos\theta$ to $\cos(\theta + d\theta)$ and
$\sin\theta$ to $\sin(\theta + d\theta)$.
For the bottom right $-d(\cos\theta)$, remember that
$\cos(\theta + d\theta) - \cos\theta = d(\cos\theta)$,
so
$\cos\theta - \cos(\theta + d\theta) = -d(\cos\theta)$
When we move from $\triangle\theta$ to
$\triangle(\theta + d\theta)$, a new, tiny triangle
appears at the top right.
The purple sides are the tiny resultant shifts in the angle functions’ outputs, and the purple
hypotenuse tends closer to the arc $d\theta$,
making its length basically $d\theta$. (Radians
let us do this.)
The angles of isosceles $\triangle d\theta$ are nearly $(0, 90, 90)$, so we can make a small trip
of $\theta$, $90 - \theta$, $\theta$ to find all of the
purple triangle’s angles…
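Wowzers: the purple triangle shares all the angles of the old $\triangle\theta$. They’re similar! So:
$$
\begin{aligned}
\frac{\cos\theta}{1} &= \frac{d(\sin\theta)}{d\theta} \\
\frac{\sin(\theta)}{1} &= -\frac{d(\cos\theta)}{d\theta}
\end{aligned}
$$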
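The tiny triangle can be measured directly. A minimal sketch, assuming $\theta = 0.9$ radians and $d\theta = 10^{-5}$ as arbitrary sample values:

```python
import math

theta, dtheta = 0.9, 1e-5

d_sin = math.sin(theta + dtheta) - math.sin(theta)      # vertical purple side
neg_d_cos = math.cos(theta) - math.cos(theta + dtheta)  # horizontal purple side
hypotenuse = math.hypot(d_sin, neg_d_cos)               # chord, nearly the arc

print(hypotenuse / dtheta)                  # close to 1: chord is basically d(theta)
print(d_sin / dtheta, math.cos(theta))      # similar-triangle ratio for sine
print(neg_d_cos / dtheta, math.sin(theta))  # similar-triangle ratio for cosine
```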
For the even deeper nonbelievers, here’s the
algebra. However, it takes two lemmas (baby proofs)
and the angle addition formula, so I’m ignoring it.
When you divide circumference by diameter, you get π.
It’s not exactly “a special number that has that property”; it’s
more like, mathematicians imagined “I want this property to give
me a special number,” and then they found it afterwards.
To make this sound less like nonsense, let’s come back to Euler
and do some inspection by calculator.
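$$
\begin{aligned}
\frac{d}{dx}2^x &= 2^x \times 0.69314\ldots \\
\frac{d}{dx}3^x &= 3^x \times 1.09861\ldots \\
\frac{d}{dx}2.5^x &= 2.5^x \times 0.91629\ldots \\
\frac{d}{dx}2.7^x &= 2.7^x \times 0.99325\ldots
\end{aligned}
$$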
By inspection, the weird multiplier tends to 1 as the base of the
exponent tends to 2.7182818… Mathematicians call this transcendental
number $e$.
$$\frac{d}{dx}e^x = e^x \times 1$$
This is one of the many definitions of $e$. So, in a maybe ugly
way, the derivative of $e^x$ is itself “by definition”.
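You can redo the calculator inspection in a few lines. This sketch leans on the difference formula: factoring $b^x$ out of $\frac{b^{x+h} - b^x}{h}$ leaves the multiplier $\frac{b^h - 1}{h}$. The function name `multiplier` and the step $h = 10^{-6}$ are my own choices.

```python
import math


def multiplier(base, h=1e-6):
    """Estimate (base**h - 1) / h, the factor in d/dx base**x = base**x * factor."""
    return (base**h - 1) / h


for base in (2, 3, 2.5, 2.7, math.e):
    print(base, multiplier(base))
```

The printed factors line up with the table to several decimal places, and the one for `math.e` sits right next to 1.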
Let’s do some algebra on $e$’s other definitions.
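$e$ as continuous compound interest.
$$
\begin{aligned}
e &= \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n \\
e &= \lim_{h\to 0}(1 + h)^{\frac{1}{h}} && \text{Same thing.} \\
\frac{d}{dx}e^x &= \lim_{h\to 0}\frac{e^{x+h} - e^x}{h} && \text{Difference formula} \\
\frac{d}{dx}e^x &= \lim_{h\to 0}\frac{e^x\left(e^h - 1\right)}{h} && \text{Factor out } e^x \\
&= e^x\lim_{h\to 0}\frac{e^h - 1}{h} \\
&= e^x\lim_{h\to 0}\frac{\left((1 + h)^{\frac{1}{h}}\right)^h - 1}{h} && \text{Substitute definition} \\
&= e^x\lim_{h\to 0}\frac{1 + h - 1}{h} \\
&= e^x \times 1
\end{aligned}
$$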
$e^x$ as a Taylor series whose derivative is itself.
For reference, a Taylor series is just an infinitely long
polynomial, based on the form
$$a_0x^0 + a_1x^1 + a_2x^2 + a_3x^3 + \dots$$
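As a sketch of how such a polynomial can be its own derivative: assume the coefficients are $a_n = \frac{1}{n!}$ (the standard series for $e^x$, not spelled out above) and apply the power rule to each term. The helper names are made up, and the series is cut off at 15 terms.

```python
from fractions import Fraction
from math import e, factorial


def exp_coefficients(n_terms):
    """a_n = 1/n! for n = 0 .. n_terms - 1 (assumed series for e**x)."""
    return [Fraction(1, factorial(n)) for n in range(n_terms)]


def differentiate(coeffs):
    """Power rule on each term: a_n * x**n becomes n * a_n * x**(n - 1)."""
    return [n * a for n, a in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Plug x into the truncated polynomial."""
    return sum(float(a) * x**n for n, a in enumerate(coeffs))


coeffs = exp_coefficients(15)
print(differentiate(coeffs) == coeffs[:-1])  # same coefficients, one term shorter
print(evaluate(coeffs, 1.0), e)              # truncated series at x = 1 versus e
```

The lost final term is only an artifact of truncating; with infinitely many terms the differentiated series is the same series.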
That’s a lot. Thankfully, once we say that we know $e$,
acquiring the derivative of $\ln x$ (the inverse of $e^x = y$, aka $\log_e(y) = \ln(y) = x$) is a simple chain rule:
Note that the lines on $\operatorname{arcsec} x$ that
mathematicians prefer to use have an eternally
positive derivative. To stay positive,
we slap an absolute value around $x$, giving $|x|$.
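For $\operatorname{arccsc}\left(\frac{x}{1}\right)$, we have
$$\frac{\text{hypotenuse}}{\text{opposite}} = \frac{x}{1}$$
so our hypotenuse is $x$ and
our opposite is 1. Recall that $\csc'\theta = -\csc\theta\cot\theta$.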