Worlds Revolve Around U

The planets and other bodies in our solar system orbit the sun along elliptical trajectories. For a given body, its elliptical orbit has the sun located at one of the two foci. For most of the planets, the eccentricity of their orbits is low, so they are approximately circular, and other bodies like comets have high eccentricity. This still begs the question: how are stable orbits asymmetric if the gravitational field from the sun is spherically symmetric? The reason is the periodic exchange of energy between kinetic and potential. You see this phenomenon occur in a pendulum or a spring. As these objects oscillate, energy is changing form while the total remains constant (neglecting friction). In a perfectly circular orbit, however, there is no exchange, since kinetic and potential energy each remain constant.

The physics of this can be worked out using Lagrangian mechanics. The first step is writing a Lagrangian describing an orbiting body. This is composed of the kinetic and potential energy of the body. The kinetic energy is dependent on its velocity. The potential energy is dependent on its distance from the sun. These can be put together to give us the Lagrangian L:
L = \frac{1}{2} m v^2 - U(r)
where v is the velocity of the planet, m is the mass of the planet, and U is the potential energy from the sun’s gravity, which is dependent on radial distance r. The explicit form of U is not relevant yet.

It is easier to work in polar coordinates rather than Cartesian coordinates for this problem. We can simplify things by noting that the planet is not leaving the plane of its orbit. So the z coordinate is zero. Then the x and y coordinates can be converted to polar coordinates r and \theta.
\vec{r} =(x,y,z) = r(\cos\theta, \sin\theta, 0)
Take the derivative with respect to time to get the velocity vector, and then square it for the kinetic energy term.
\vec{v} = \dot{r} (\cos\theta, \sin\theta, 0) + r \dot{\theta} (-\sin\theta, \cos\theta,0)
v^2 = \dot{r}^2 + r^2 \dot{\theta}^2
We are then left with a Lagrangian in polar coordinates:
L = \frac{1}{2} m (\dot{r}^2 + r^2 \dot{\theta}^2) - U(r)

There are two degrees of freedom here: radial and angular. To get the equations of motion, we need to evaluate the Euler-Lagrange equation for each degree of freedom.
For angular:
\frac{\partial L} {\partial \theta} = 0
since there is no explicit dependence on angle \theta.
\frac{\partial L}{\partial \dot{\theta}} = \ell = m r^2 \dot{\theta}
This variable \ell is a constant with respect to time, and is the angular momentum of the orbiting body.
We can then express the equation of angular velocity in terms of r:
\dot{\theta} = \frac{\ell}{mr^2}

For the radial degree of freedom:
\frac{\partial L}{\partial r} = mr \dot{\theta}^2 - \frac{\partial U}{\partial r}
\frac{d}{dt} \frac{\partial L}{\partial \dot{r}} = m \ddot{r}
Equate these together and you get the equation of motion:
m\ddot{r} = mr \dot{\theta}^2 - \frac{\partial U}{\partial r}
Plug in the equation for angular velocity, and the equation of motion only depends on r:
m\ddot{r} = \frac{\ell^2}{m r^3} - \frac{\partial U}{\partial r}
This differential equation can then be solved to get the trajectory of an orbiting body.

To solve this differential equation, do a change of variables: distance r to inverse distance u such that:
r = \frac{1}{u}
We then have:
m \frac{d^2}{dt^2} \frac{1}{u} = \frac{\ell^2}{m} u^3 - \frac{\partial U}{\partial r}

Because there is no explicit dependence on angle \theta in the Lagrangian, time dependence can be converted to angular dependence.
\frac{d\theta}{dt} = \frac{\ell}{mr^2} = \frac{\ell}{m} u^2
\frac{d}{dt} = \frac{\ell}{m} u^2 \frac{d}{d\theta}
\frac{d^2}{dt^2} \frac{1}{u} = \frac{\ell^2}{m^2} u^2 \frac{d}{d\theta} u^2 \frac{d}{d\theta} \frac{1}{u}=-\frac{\ell^2}{m^2} u^2 \frac{d^2u}{d\theta^2}

The equation of motion then becomes:
-\frac{\ell^2}{m} u^2 \frac{d^2u}{d\theta^2} = \frac{\ell^2}{m} u^3 - \frac{\partial U}{\partial r}

We now need the explicit form of the potential energy from the sun’s gravity to continue solving this. The equation is:
U(r) =-\frac{GMm}{r}
\frac{\partial U}{\partial r} = \frac{GMm}{r^2} = GMmu^2
Where G is Newton’s constant and M is the mass of the sun.

The equation of motion can then be simplified further:
-\frac{\ell^2}{m} u^2 \frac{d^2u}{d\theta^2} = \frac{\ell^2}{m} u^3 - GMmu^2
\frac{d^2 u}{d\theta^2} = -u + \frac{GMm^2}{\ell^2}
u^{\prime\prime} = -u + c

Solving this is relatively easy with a variable substitution:
w = u - c
w^{\prime\prime} = u^{\prime\prime}
w^{\prime\prime} = -w
The general solution is then:
w = A \sin\theta + B \cos\theta
However, without loss of generality, this can be expressed with just one trigonometric function.
=\frac{A}{2i} (\exp(i\theta) - \exp(-i\theta)) + \frac{B}{2} (\exp(i\theta) - \exp(-i\theta))
=\frac{1}{2} (\exp(i\theta) (B-iA) +\exp(-i\theta) (B+iA))
Note that:
B-iA = C \exp(i\delta)
w = \frac{C}{2} (\exp(i(\theta +\delta)) + \exp(-i(\theta +\delta)) )
= C\cos(\theta+\delta)

So now we have the general solution:
u = w + c = C\cos(\theta+\delta) + c
The constant C can be shown to be equal to \frac{GMm^2}{\ell^2} e.
where e is the eccentricity of the trajectory, not Euler’s number.
u = \frac{GMm^2}{\ell^2} (e \cos(\theta+\delta) + 1)
Put back in terms of distance, we have a function for a trajectory in terms of angle and radial distance:
r = \frac{\ell^2}{GMm^2} \frac{1}{(1+ e \cos(\theta+\delta))}

Note that different values of eccentricity give different trajectory shapes:
Circle e=0
Ellipse 1>e>0
Parabola e=1
Hyperbola e > 1
Each of these is a conic section: a 2D cross-section of a 3D cone.

This image has an empty alt attribute; its file name is ellipsehyp.gif

For simplicity, we can choose an axis where \theta+\delta = 0 such that r = r_0.
r_0 = \frac{\ell^2}{GMm^2}
So then we have:
r= \frac{r_0}{1+e\cos(\theta+\delta)}

We can verify that this trajectory is a conic section by transforming it into Cartesian coordinates to look something like this:
\frac{(x-d)^2}{a^2} + \frac{y^2}{b^2} = 1

Here the transformation is done step by step:
r= \frac{r_0}{1+e\cos(\theta+\delta)}
r + e x = r_0
r = r_0 - ex
x^2 + y^2 = r_0^2 - 2e r_0 x + e^2 x^2
(1-e^2) x^2 + 2er_0 x + y^2 = r_0^2
(1-e^2) x^2 + 2er_0 x + \frac{r_0^2 e^2}{1-e^2} + y^2 = r_0^2 + \frac{r_0^2 e^2}{1-e^2}
(1-e^2) \left( x+ \frac{r_0 e}{1-e^2} \right)^2 + y^2 = \frac{r_0^2}{1-e^2}
The last step is then putting the trajectory equation into the form shown above.
\frac{(1-e^2)^2}{r_0^2} \left( x + \frac{r_0 r}{1-e^2} \right) + \frac{(1-e^2)}{r_0^2} y^2 = 1
Where the variables here and relate to the variables a, b and d:
a = \frac{r_0}{1-e^2}
b = \frac{r_0}{\sqrt{1-e^2}}
d = \frac{r_0 e}{1-e^2} = ae
b/a = \sqrt{1-e^2}
So the trajectory can be written in terms of a and e:
\frac{(x-ae)^2}{a^2} + \frac{y^2}{a^2(1-e^2)} = 1
(x-ae)^2 + \frac{y^2}{1-e^2} = a^2
Or back in polar coordinates:
r= \frac{a(1-e^2)}{1+e\cos(\theta)}


A bound orbit with 0\leq e<1 has a min and max distance from the sun, a semi-major axis, and a semi-minor axis.
a = semi-major axis
b = semi-minor axis
r_\text{min} = \frac{r_0}{1+e} = a(1-e) perihelion (minimum distance to sun)
r_\text{max} = \frac{r_0}{1-e} = a(1+e) aphelion (maximum distance to sun)
Using this is actually how you solve for the constant C in the general trajectory formula.
\frac{1}{r} = C\cos(\theta) + \frac{GMm^2}{\ell^2}
For \theta = 0, we can say the planet is nearest to the sun, so r = a(1-e), and for \theta = \pi, we can say it is at its farthest from the sun, so r = a(1+e).
This gives a system of two equations that can be solved:
\frac{1}{a(1-e)} = C + \frac{GMm^2}{\ell^2}
\frac{1}{a(1+e)} = -C + \frac{GMm^2}{\ell^2}
C = \frac{GMm^2}{\ell^2}

From here, we can arrive at Kepler’s law of equal areas. This is the observation that an orbiting body sweeps out the same area between it and the sun for any given duration of time.
Start with an infinitesimal piece of swept area:
dA = \frac{1}{2} r(\theta)^2 d\theta
Divide by an infinitesimal piece of time:
\frac{dA}{dt} = r(\theta)^2 \frac{d\theta}{dt} = \frac{\ell}{2m}
And there you have it, the rate of the area swept over time is constant.
We can express this constant in terms of the semi-major axis, and arrive at another of Kepler’s laws relating the orbital period to the semi-major axis.
r_0 = \frac{\ell^2}{GMm^2}
r_0 = a (1-e^2)
\frac{\ell}{2m} = \frac{\sqrt{GM r_0}}{2} = \frac{\sqrt{GM a (1-e^2)}}{2}
The full orbit area and period are then related by:
A = \frac{\sqrt{GM a (1-e^2)}}{2} T
The area of the ellipse in terms of the semi-major axis is:
A = \pi ab = \pi a^2 \sqrt{1-e^2}
So we then have the law relating the orbital period with the semi-major axis:
T^2 = \frac{4\pi^2}{GM} a^3

Take the log of both sides, and we can see how planets and other objects fall on the curve.
\frac{2}{3} \log(T) - \frac{1}{3} \log\left(\frac{4\pi^2}{GM}\right) = \log(a)

The orbital period and semi-major axis of the planets were gathered from Wikipedia. However, if you were to measure these quantities yourself, you could use regression to check Kepler’s law and measure the mass of the sun.
From the numbers given in Wikipedia, we compute the ratio between exponents on the period and semi-major axis to be 0.6667 \pm 0.0001 and the mass of the sun to be (1.9952\pm 0.0140) \times 10^{30} \text{kg}.

From Logic to Math

This entry is for those who may be taking a symbolic logic or Boolean algebra course and find it hard to keep track of things. Symbolic logic has an abundance of symbols representing logical operations, or conjunctions, between premises to form statements or arguments. The symbols you are likely to see are listed below with the generic premises P and Q:
Not P = \, \sim P
P and Q = P \cdot Q
P or Q (inclusive or) = P \vee Q
If P then Q = P \rightarrow Q
P If and only if Q = P \leftrightarrow Q
P xor Q (exclusive or) = P \oplus Q

In addition to keeping track of several symbols, you have to remember the corresponding truth tables. The premises P and Q can both be either true T or false F, and the truth value of an argument consisting of the conjunction of P and Q is typically represented in a truth table which lists all the combinations of values and the resulting argument values.

For example, the truth table for “P and Q” is:
\begin{matrix} P & Q & P\cdot Q\\ T & T & T \\ T & F & F \\  F & T & F \\ F & F & F \end{matrix}

When I took the class, I wondered if there was a more succinct way to capture the structure of these operations. All arguments can be constructed in terms of “and”, “inclusive or” and “not” operations. Even the “if-then”, “if-and-only-if”, and “exlusive or” can be written in terms of “and”, “inclusive or” and “not”. The reduction does not have to stop there. All statements can be constructed in terms of “nand” (not-and). The symbol for this is the Sheffer stroke \uparrow.
P nand Q = P \uparrow Q
The other operations can be worked out in terms of nand:
Not P =\,\sim P = P\uparrow P
P and Q = P \cdot Q = (P\uparrow Q) \uparrow (P\uparrow Q)
P or Q = P \vee Q = (P\uparrow P) \uparrow (Q \uparrow Q)

However, I felt like this removes a lot of the intuition. So I present another approach here. The nand operation can be expressed in terms of arithmetic operations if true and false are mapped to 1 and 0, respectively: T = 1 and F = 0
P \uparrow Q = 1 - P Q

The continuous surface aligns with the discrete logic points.


This isn’t a unique arithmetic representation, since there are other formulas that can represent nand, but this one seems like the simplest.

The truth table for “P nand Q” is:
\begin{matrix} P & Q & P\uparrow Q\\ T & T & F \\ T & F & T \\  F & T & T \\ F & F & T \end{matrix}

Or rather, since T = 1 and F = 0:
\begin{matrix} P & Q & 1-PQ\\ 1 & 1 & 0 \\ 1 & 0 & 1 \\  0 & 1 & 1 \\ 0 & 0 & 1 \end{matrix}

We can then work out the arithmetic formulae for each logical operation.
For “not P” we have:
\sim P = P\uparrow P = 1 - P^2
Premises values P \in (0,1) are idempotent P^2 = P
Therefore:
\sim P = 1-P

“P and Q” can be worked out the same way, by using the P \uparrow Q = 1 - P Q to replace all the Sheffer stroke operations with arithmetic.
P \cdot Q = (P\uparrow Q) \uparrow (P\uparrow Q)
=(1-PQ)\uparrow (1-PQ) = 1 - (1-PQ)^2
= 1 - (1-2PQ + P^2 Q^2) = 1 - (1- PQ) = PQ

P or Q :
P \vee Q = (P\uparrow P) \uparrow (Q \uparrow Q) = (1-P) \uparrow (1-Q)
= P + Q - PQ

If P then Q.
P \rightarrow Q = P \uparrow (Q\uparrow Q) = P \uparrow (1-Q)
= 1 - P + PQ
(Note the asymmetry in the if-then truth table is represented in the formula.)
\begin{matrix} P & Q & 1-P+PQ\\ 1 & 1 & 1 \\ 1 & 0 & 0 \\  0 & 1 & 1 \\ 0 & 0 & 1 \end{matrix}

P if and only if Q:
P \leftrightarrow Q = (P \uparrow Q) \uparrow ((P\uparrow P) \uparrow (Q \uparrow Q))
=(1-PQ) \uparrow ((1-P)\uparrow (1-Q)) = (1-PQ)\uparrow(P+Q-PQ)
=1 - (1-PQ)(P+Q-PQ) = 1 - P-Q + 2PQ

By this point, you can use the above relations to replace other operators with arithmetic. For example, the exclusive or operation “P xor Q” can be expressed in terms of “or” and “and”:
P\oplus Q = (P \vee Q) \cdot \sim(P \cdot Q)
=(P+Q -PQ) \cdot (1-PQ)
=P+Q - 2PQ

To put it all together, here are the arithmetic formulae for logical operations:
Not P = \, \sim P = 1-P
P and Q = P \cdot Q = PQ
P or Q = P \vee Q = P + Q - PQ
P xor Q = P \oplus Q = P+Q - 2PQ
if P then Q = P \rightarrow Q  = 1 - P + PQ
P if and only if Q = P \leftrightarrow Q = 1 - P-Q + 2PQ
You can memorize these in place of truth tables.

Heat and Quantum Physics

In a previous blog entry, I show the “derivation” of the Schrödinger equation, which governs the behavior of the probabilistic wave function of a quantum particle. In this entry, I will show how this is related to the heat flow and temperature gradients we experience in our everyday lives.

Picture of me at Joseph Fourier’s grave in Père Lachaise Cemetery.

Deriving the Heat Equation

We have an intuitive understanding that heat flows from hot objects to colder objects nearby. This is the second law of thermodynamics. It also applies to a single object like a sheet of metal, a cooking pan, or a wire. If there is a spot on that object that is hotter than the rest of the object, perhaps from a heat source, heat from that spot will spread out through the object over time. Here I will derive the heat equation, which was developed by Joseph Fourier.

First, start with Fourier’s law of heat transfer, which is an experimental result. It models the relationship between heat conduction or flow rate in a material and the temperature gradient along the direction of heat flow through that material. Imagine we are dealing with a roughly 1-dimensional object like a wire. The law of heat transfer in that wire will be:
\frac{\partial Q}{\partial t} = -K \frac{\partial T}{\partial x}
Where:
dQ/dt = Is the heat flow.
dT/dx = Is the temperature T gradient along the wire length x.
K = Is the thermal conductivity of a material.

Visualization of heat flow through a segment of material.

In the visualization above, the heat going into segment at x is Q_\text{in}. The heat going out of segment at x +\Delta x is Q_\text{out}
Energy conservation (the first law of thermodynamics) demands the net amount of heat flow in the segment is the difference between the flow in (at x) and flow out (at x+\Delta x).
\frac{\partial Q}{\partial t} = \frac{\partial Q}{\partial t}_\text{in} - \frac{\partial Q}{\partial t}_\text{out}
Applying Fourier’s law of heat transfer to this equation we have:
\frac{\partial Q}{\partial t} = -K \frac{\partial T}{\partial x}(x) + K \frac{\partial T}{\partial x}(x+\Delta x)
The amount of heat flowing in/out of the material will raise/lower its temperature \Delta T according to the material’s mass m and heat capacity c.
c m \Delta T = Q
For a flow rate \frac{\partial Q}{\partial t}, a total amount of heat Q flows through after a time duration \Delta t.
Q = \Delta t \frac{\partial Q}{\partial t}
Applying these substitutions to the energy conservation equation we have:
cm \Delta T = K \left( \frac{\partial T}{\partial x}(x+\Delta x) - \frac{\partial T}{\partial x}(x) \right) \Delta t
Note that the material here is a wire, so we can consider a 1-D density \lambda:
m = \lambda \Delta x
c \lambda \Delta x \Delta T = K \left( \frac{\partial T}{\partial x}(x+\Delta x) - \frac{\partial T}{\partial x}(x) \right) \Delta t
Doing some rearranging we have:
\frac{\Delta T}{\Delta t} = \frac{K}{c\lambda} \frac{\left( \frac{\partial T}{\partial x}(x+\Delta x) - \frac{\partial T}{\partial x}(x) \right)}{\Delta x}
Shrinking these \Delta differences to infinitesimal d differences gives us a partial differential equation: namely the heat equation!
\frac{\partial T}{\partial t} = \frac{K}{c\lambda} \frac{\partial^2 T}{\partial x^2}
With a heat source term \mathcal{Q} we have:
\frac{\partial T}{\partial t} = \alpha \frac{\partial^2 T}{\partial x^2} + \mathcal{Q}
Note thermal diffusivity of the material here: \alpha =  \frac{K}{c\lambda}
This can be extended to include more dimensions to model how the temperature gradient changes as heat flows through a 2-d (plate) or 3-d (block) material.
\frac{\partial T}{\partial t} = \alpha \nabla^2 T + \mathcal{Q}

Transformation

We can transform the Heat equation into the Schrödinger equation by making some variable substitutions. Both equations are partial differential equations with first-order derivatives with respect to time and second-order derivatives with respect to space. To perform this transformation follow the steps below:
Turn temperature T into quantum wave function \Psi (probability density) multiplied by Planck’s constant \hbar:
T \rightarrow \hbar\Psi
So one can imagine quantum waves behaving like temperature.
We can turn thermal diffusivity into quantum spin unit \hbar/2 divided by mass m:
\alpha \rightarrow \frac{\hbar}{2m}
Turn the heat source term \mathcal{Q} into the potential energy times the quantum wave function multiplied by -1:
\mathcal{Q} \rightarrow -V \Psi
So this is interesting, it means the “heat source” is dependent on the “temperature”, which is distinct from what we encounter in our day-to-day.
Turn time into imaginary time (This is known as a Wick rotation):
t \rightarrow it
The presence of imaginary numbers here implies oscillation. This is how real “temperature” becomes complex probability waves, with real and imaginary components. We then have the Schrödinger Equation:
i\hbar \frac{\partial \Psi}{\partial t} =- \frac{\hbar^2}{2m} \frac{\partial^2 \Psi}{\partial x^2} +V\Psi

1-D Case

We can see how the behavior of temperature and quantum waves compare. We will consider the sourceless 1-D case for simplicity: the 1-D wire and 1-D square well.
The Heat/Schrödinger equation is:
\frac{\partial u}{\partial t} = k \frac{\partial^2 u}{\partial x^2}
Specify an initial condition (t=0)
u(x,0) = f(x)
and a boundary condition at (x=0 and x=a)
u(0,t) = u(a,t) = 0
The key to this comparison is the Wick rotation factor k:
k = e^{i\theta}
By changing \theta we can go from real to imaginary, from temperature-like to wave-like, and the strange realm in-between.

Solve using separation of variables: split time and space dependence.
u(x,t) = X(x) T(t)
\frac{1}{kT} \frac{\partial T}{\partial t} = \frac{1}{X} \frac{\partial^2 X}{\partial x^2} = C
We then have two equations to solve:
\frac{\partial T}{\partial t} = k C T
\frac{\partial^2 X}{\partial x^2} = C X
For the space component:
X(x) = Ae^{\sqrt{C} x} + B e^{-\sqrt{C} x}
X(0) = A + B = 0
X(x) = A(e^{\sqrt{C} x} - e^{-\sqrt{C} x})
X(a) = A (e^{\sqrt{C} a} - e^{-\sqrt{C} a}) = 0
e^{2\sqrt{C} a} = 1 = e^{2i \pi n}
\sqrt{C} a = i\pi n
\sqrt{C} = i\frac{\pi n}{a}
X(x) = A(e^{i \pi n x/a} - e^{-i \pi n x/a}) = 2iA\sin(\pi n x/a)
C = -\frac{\pi^2 n^2}{a^2}
Time component
T(t) = D e^{Ck t}
T(t) = D e^{- \pi^2 n^2 k t/a^2}
Put time and space dependence back together:
u(x,t) = X(x)T(t) = 2iAD e^{- \pi^2 n^2 k t/a^2} \sin(\pi n x/a)
u_n(x,t) = A_n e^{- \pi^2 n^2 k t/a^2} \sin(\pi n x/a)
There is a solution for each n, so put them together for completeness:
u(x,t) = \sum\limits_{n=1}^\infty A_n e^{- \pi^2 n^2 k t/a^2} \sin(\pi n x/a)
Initial condition can be written more explicitly:
u(x,0) = \sum\limits_{n=1}^\infty A_n \sin(\pi n x/a) = f(x)
Use Fourier transformation to model the evolution of temperature/wave over time:
\sum\limits_{n=1}^\infty A_n \sin(\pi n x/a) \sin(\pi m x/a) = f(x) \sin(\pi m x/a)
\int\limits_0^a \sum\limits_{n=1}^\infty A_n \sin(\pi n x/a) \sin(\pi m x/a) dx = \int\limits_0^a f(x) \sin(\pi m x/a) dx
A_m = \frac{2}{a} \int\limits_0^a f(x) \sin(\pi m x/a) dx
u(x,t) = \frac{2}{a} \sum\limits_{n=1}^\infty e^{- \pi^2 n^2 k t/a^2} \sin(\pi n x/a) \int\limits_0^a f(y) \sin(\pi n y/a) dy
In the visualizations below, the initial condition f(y) is delta function at a point qa between 0 and a. Both the real and imaginary components are shown.
u(x,t) = \frac{2}{a} \sum\limits_{n=1}^\infty e^{- \pi^2 n^2 k t/a^2} \sin(\pi n x/a) \sin(\pi n q)
The black line is temperature-like with \theta = 0. Notice it flattens out and stays real.
The red line is quantum wave-like with \theta = \pi/2. Notice the oscillations spread out and leak into the imaginary component.
The other lines, magenta, cyan and blue, are hybrids with 0 < \theta < \pi/2. You can see a mix of the behaviors.

real
imaginary
total

Merry Christoffelmas

In an earlier blog entry, I discuss spacetime curvature and how it manifests as gravity in Einstein’s theory of general relativity. The elegance of a fundmental force of nature being described by the geometry of space and time has motivated people to investigate whether or not this is the case for the other fundamental forces: electromagnetism and the nuclear forces.

In general, for any number of dimensions, the line element of a particle’s trajectory through spacetime is:
ds^2 = g_{AB} (x) dx^A dx^B
Where the indicies of A and B span over all dimensions of spacetime. The metric g_{AB}(x) describes the curvature of spacetime. The Lagrangian is then:
L = g_{AB} (x) \dot{x}^A \dot{x}^B
We can then find both sides of the Euler-Lagrange equation, which characterizes the geodesic.
The left-hand side is:
\frac{\partial L}{\partial x^C} = g_{AB,C} (x) \dot{x}^A \dot{x}^B
The right-hand side is:
\frac{\partial L}{\partial \dot{x}^C} = g_{AB}(x) (\delta^A_C \dot{x}^B + \dot{x}^A \delta^B_C)
Put them together:
\frac{d}{ds}\frac{\partial L}{\partial \dot{x}^C} = g_{AB,D}(x) \dot{x}^D (\delta^A_C \dot{x}^B + \dot{x}^A \delta^B_C) + g_{AB}(x) (\delta^A_C \ddot{x}^B + \ddot{x}^A \delta^B_C)
Put together we get:
g_{CB,A}(x) \dot{x}^A\dot{x}^B + g_{AC,B}(x) \dot{x}^B \dot{x}^A - g_{AB,C} (x) \dot{x}^A\dot{x}^B + 2 g_{CB} (x) \ddot{x}^B =0
This gives us the geodesic equation:
\frac{1}{2} g^{CD} (g_{DB,A} + g_{AD,B} - g_{AB,D}) \dot{x}^A \dot{x}^B + \ddot{x}^C = 0
The generalized Christoffel symbol is \Upsilon^C_{AB}.
\Upsilon^C_{AB} \dot{x}^A \dot{x}^B + \ddot{x}^C = 0

5-D Kaluza-Klein Theory
It turns out that adding another spatial dimension to the spacetime metric will give us the electromagnetic force in addition to gravity. Since, we only see three spatial dimensions, this additional fourth dimension must be microscopic. This idea is known as the Kaluza-Klein theory.

The metric of 4-D spacetime is a 4\times 4 matrix. If we add another spatial coordinate, the metric for this 5-D spacetime becomes a 5\times 5 matrix. This is done by concatenating a 4\times 4 matrix with a vertical 4-vector, a horizontal 4-vector and a scalar.
g_{\mu\nu}^{(4D)}= \begin{pmatrix} . & . & . & . \\  . & . & . & . \\ . & . & . & . \\ . & . & . & . \end{pmatrix} \rightarrow g_{AB}^{(5D)} = \left( \begin{array} {cccc|c} .&. & . & . & . \\.&. & . & . & . \\.&. & . & . & . \\.&. & . & . & . \\ \hline .&. & . & . & . \\ \end{array} \right)

The Kaluza-Klein metric is constructed this way. The 4-D electromagnetic vector potential A_\mu, which is a 4-vector, is concatenated to a displaced 4-D metric, along with a scalar field factor \phi. The metric is then:
g_{AB} = \begin{pmatrix} g_{\mu\nu} +\phi^2 A_\mu A_\nu & \phi^2 A_\mu \\ \phi^2 A_\nu & \phi^2  \end{pmatrix}
The inverse of this metric is:
g^{AB} = \begin{pmatrix} g^{\mu\nu} & - A^\mu \\ -A^\nu & g_{\sigma\rho} A^\sigma A^\rho + \frac{1}{\phi^2}  \end{pmatrix}

In this theory, electromagnetism arises from the spacetime curvature between 4-D spacetime and the fifth extra dimension. The 5-D line element can then be expressed in terms of the 4-D metric, the vector potential and the scalar field.
ds^2 = g_{AB} dx^A dx^B = g^{(4)}_{\mu\nu} dx^\mu dx^\nu + \phi^2 (A_\nu dx^\nu + dx^5)^2

The geodesic equation can then be evaluated in terms of 4-D quantities by finding each element of the 5-D Christoffel symbol \Upsilon in terms of the 4-D Christoffel symbol \Gamma. This allows us to view forces from this fifth dimension in more familiar 4-D quantities. Look at the 4-D subset index \sigma = \{1,2,3,4\} of the 5-D index C = \{1,2,3,4,5\}:
\Upsilon^\sigma_{AB} \dot{x}^A \dot{x}^B + \ddot{x}^\sigma = 0
This can also be done for the fifth index of C:
\Upsilon^5_{AB} \dot{x}^A \dot{x}^B + \ddot{x}^5 = 0

Break up the terms of the geodesic equation by dimensions:
\Upsilon^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^\sigma_{\mu 5} \dot{x}^\mu \dot{x}^5 + \Upsilon^\sigma_{5 \nu} \dot{x}^\nu \dot{x}^5 + \Upsilon^\sigma_{5 5} \dot{x}^5 \dot{x}^5 + \ddot{x}^\sigma = 0
\Upsilon^5_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^5_{\mu 5} \dot{x}^\mu \dot{x}^5 + \Upsilon^5_{5 \nu} \dot{x}^\nu \dot{x}^5 + \Upsilon^5_{5 5} \dot{x}^5 \dot{x}^5 + \ddot{x}^5 = 0
These equations give us the info we need to see how a fifth spatial dimension impacts physics. We want to evaluate the form of each Christoffel symbol factor \Upsilon in these equations.
It is assumed that no quantities in the metric are dependent on the 5th dimensional coordinate: \partial_5 f =0. This is known as the cylinder condition. We can use the definition of the Christoffel symbol to evaluate each element.
\Upsilon^C_{AB} =\frac{1}{2} g^{CD} (g_{DB,A} + g_{AD,B} - g_{AB,D})

Evaluating the Christoffel Symbols
Use the Kaluza-Klein metric elements to evaluate the Christoffel symbol factors. The first set of symbols are evaluated here:
\Upsilon^\sigma_{\mu\nu} = \frac{1}{2} g^{\sigma\rho} (g_{\rho \mu,\nu} + g_{\rho \nu,\mu} -g_{\mu\nu,\rho}) + \frac{1}{2} g^{\sigma 5} (g_{5 \mu,\nu} + g_{5 \nu,\mu} -g_{\mu\nu,5})
= \frac{1}{2} g^{\sigma\rho} (g_{\rho \mu,\nu} + g_{\rho \nu,\mu} -g_{\mu\nu,\rho}) + \frac{1}{2} g^{\sigma 5} (g_{5 \mu,\nu} + g_{5 \nu,\mu})
=\frac{1}{2} g^{\sigma\rho} (g^{(4)}{\rho \mu,\nu} + g^{(4)}{\rho \nu,\mu} -g^{(4)}{\mu\nu,\rho}) + \frac{\phi^2}{2} g^{\sigma\rho} (A\mu(A_{\rho,\nu} - A_{\nu,\rho})\\ + A_\nu (A_{\rho,\mu} - A_{\mu,\rho}) + A_\rho (A_{\mu,\nu} + A_{\nu,\mu})) -\frac{1}{2} A^\sigma (\phi^2 A_{\mu,\nu} + \phi^2 A_{\nu,\mu})
The first three terms here are the 4-D Christoffel symbol \Gamma!
= \Gamma_{\mu\nu}^\sigma + \frac{\phi^2}{2} g^{\sigma\rho} (A_\mu(A_{\rho,\nu} - A_{\nu,\rho}) + A_\nu (A_{\rho,\mu} - A_{\mu,\rho}))
=\Gamma^\sigma_{\mu\nu} + \frac{\phi^2}{2} g^{\sigma\rho} (A_\mu F_{\nu\rho} + A_\nu F_{\mu\rho})
=\Gamma^\sigma_{\mu\nu} + \frac{\phi^2}{2} (A_\mu {F_\nu}^{\sigma} + A_\nu {F_\mu}^{\sigma})
Note that the electromagnetic field strength tensor F_{\mu\nu} that appears here is related to the vector potential by:
F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu = A_{\nu,\mu} - A_{\mu,\nu}

\Upsilon^\sigma_{\mu 5} = \frac{1}{2} g^{\sigma\rho} (g_{\rho \mu,5} + g_{\rho 5,\mu} - g_{\mu 5,\rho}) + \frac{1}{2} g^{\sigma 5} (g_{5 \mu,5} + g_{5 5,\mu} - g_{\mu 5,5})
= \frac{1}{2} g^{\sigma\rho} (g_{\rho 5,\mu} - g_{\mu 5,\rho})
=\frac{1}{2} g^{\sigma\rho} \phi^2 (A_{\rho,\mu} - A_{\mu,\rho})
= \frac{\phi^2}{2} g^{\sigma\rho} F_{\mu\rho}
= \frac{\phi^2}{2} {F_\mu}^{\sigma}

Use symmetry to evaluate this one:
\Upsilon^\sigma_{5\nu} = \frac{1}{2} g^{\sigma \rho} (g_{\rho 5,\nu} + g_{\rho \nu, 5} - g_{5 \nu,\rho}) + \frac{1}{2} g^{\sigma 5} (g_{5 5,\nu} + g_{5\nu, 5} - g_{5 \nu,5})
= \frac{\phi^2}{2} {F_\nu}^\sigma

This next one is easy due to the cylinder condition:
\Upsilon^\sigma_{55} = 0

This completes the first set. Onto the second set:
\Upsilon^5_{\mu\nu} = \frac{1}{2} g^{5\rho} (g_{\rho \mu,\nu} + g_{\rho \nu,\mu} -g_{\mu\nu,\rho}) + \frac{1}{2} g^{5 5} (g_{5 \mu,\nu} + g_{5 \nu,\mu} -g_{\mu\nu,5})
= \frac{1}{2} g^{5\rho} (g_{\rho \mu,\nu} + g_{\rho \nu,\mu} -g_{\mu\nu,\rho}) + \frac{1}{2} g^{5 5} (g_{5 \mu,\nu} + g_{5 \nu,\mu})
=-\frac{1}{2}A^\rho (g_{\rho \mu,\nu} + g_{\rho \nu,\mu} -g_{\mu\nu,\rho} +\phi^2 (A_\mu(A_{\rho,\nu} - A_{\nu,\rho}) + A_\nu (A_{\rho,\mu} - A_{\mu,\rho}) \\+ A_\rho (A_{\mu,\nu} + A_{\nu,\mu})) + \frac{1}{2} \left(g_{\alpha\beta} A^\alpha A^\beta + \frac{1}{\phi^2}\right) (\phi^2 A_{\mu,\nu} + \phi^2 A_{\nu,\mu})
=-\frac{1}{2} A^\rho (2\Gamma_{\mu\nu\rho} + \phi^2 (A_\mu F_{\nu\rho} + A_\nu F_{\mu\rho} + A_\rho (A_{\mu,\nu} + A_{\nu,\mu})) \\+ \frac{1}{2} (A_{\mu,\nu} + A_{\nu,\mu}) + \frac{\phi^2}{2} g_{\alpha\beta} A^\alpha A^\beta (A_{\mu,\nu} + A_{\nu,\mu})
=-A^\rho \Gamma_{\mu\nu\rho} - \frac{1}{2} \phi^2 A^\rho (A_\mu F_{\nu\rho} + A_\nu F_{\mu\rho}) -\phi^2 \frac{1}{2} A^\rho A_\rho (A_{\mu,\nu} + A_{\nu,\mu}) \\+ \frac{1}{2} (A_{\mu,\nu} + A_{\nu,\mu}) + \frac{\phi^2}{2} g_{\alpha\beta} A^\alpha A^\beta (A_{\mu,\nu} + A_{\nu,\mu})
=\frac{1}{2}(A_{\mu,\nu} + A_{\nu,\mu}) - A_\sigma \Gamma^\sigma_{\mu\nu} - \frac{\phi^2}{2} A^\rho (A_\mu F_{\nu\rho} + A_\nu F_{\mu\rho})
=\frac{1}{2} (A_{\mu;\nu} + A_{\nu;\mu}) - \frac{\phi^2}{2} A^\rho (A_\mu F_{\nu\rho} + A_\nu F_{\mu\rho})
Note the covariant derivative of the 4-vector potential appears: A_{\mu;\nu} = A_{\mu,\nu} - \Gamma_{\mu\nu}^\sigma A_\sigma

\Upsilon_{\mu 5}^5 = \frac{1}{2} g^{5\rho} (g_{\rho \mu,5} + g_{\rho 5,\mu} - g_{\mu 5,\rho}) + \frac{1}{2} g^{5 5} (g_{5 \mu,5} + g_{5 5,\mu} - g_{\mu 5,5})
=\frac{1}{2} g^{5\rho}(g_{\rho 5,\mu} - g_{\mu 5,\rho})
=-\frac{\phi^2}{2} A^\rho (A_{\rho,\mu} - A_{\mu,\rho})
= -\frac{\phi^2}{2} A^\rho F_{\mu\rho}

Use symmetry again:
\Upsilon^5_{5\nu} = \frac{1}{2} g^{5\rho} (g_{\rho 5,\nu} + g_{\rho \nu, 5} - g_{5\nu,\rho}) + \frac{1}{2} g^{5 5} (g_{5 5,\nu} + g_{5 \nu, 5} - g_{5\nu,5})
=-\frac{\phi^2}{2} A^\rho F_{\nu\rho}

Use the cylinder condition again:
\Upsilon^5_{5 5} = 0

The Lorentz Force Emerges
We can then plug in our findings into the geodesic equations. For the 4-D one we have:
\Upsilon^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^\sigma_{\mu 5} \dot{x}^\mu \dot{x}^5 + \Upsilon^\sigma_{5 \nu} \dot{x}^\nu \dot{x}^5 + \Upsilon^\sigma_{5 5} \dot{x}^5 \dot{x}^5 + \ddot{x}^\sigma = 0
(\Gamma^\sigma_{\mu\nu} + \frac{\phi^2}{2} (A_\mu {F_\nu}^{\sigma} + A_\nu {F_\mu}^{\sigma})) \dot{x}^\mu \dot{x}^\nu + \frac{\phi^2}{2} {F_\mu}^{\sigma} \dot{x}^\mu \dot{x}^5 + \frac{\phi^2}{2} {F_\nu}^\sigma \dot{x}^\nu \dot{x}^5 + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 A_\nu {F_\mu}^\sigma \dot{x}^\mu \dot{x}^\nu + \phi^2 {F_\mu}^\sigma \dot{x}^\mu \dot{x}^5 + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 {F_{\mu}}^\sigma \dot{x}^\mu (A_\nu \dot{x}^\nu + \dot{x}^5) + \ddot{x}^\sigma = 0 (equation 1)

For the 5th dimension one we have:
\Upsilon^5_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^5_{\mu 5} \dot{x}^\mu \dot{x}^5 + \Upsilon^5_{5 \nu} \dot{x}^\nu \dot{x}^5 + \Upsilon^5_{5 5} \dot{x}^5 \dot{x}^5 + \ddot{x}^5 = 0
\left(\frac{1}{2} (A_{\mu;\nu} + A_{\nu;\mu}) - \frac{\phi^2}{2} A^\rho (A_\mu F_{\nu\rho} + A_\nu F_{\mu\rho})\right) \dot{x}^\mu \dot{x}^\nu -\frac{\phi^2}{2} A^\rho F_{\mu\rho} \dot{x}^\mu \dot{x}^5 -\frac{\phi^2}{2} A^\rho F_{\nu\rho} \dot{x}^\nu \dot{x}^5 + \ddot{x}^5 = 0
A_{\mu;\nu} \dot{x}^\mu \dot{x}^\nu - \phi^2 A^\rho A_\nu F_{\mu\rho} \dot{x}^\mu\dot{x}^\nu - \phi^2 A^\rho F_{\mu\rho} \dot{x}^\mu \dot{x}^5 + \ddot{x}^5 = 0
A_{\mu;\nu} \dot{x}^\mu \dot{x}^\nu - \phi^2 A^\rho F_{\mu\rho} \dot{x}^\mu (A_\nu \dot{x}^\nu + \dot{x}^5) + \ddot{x}^5 = 0
- A_\sigma \Gamma_{\mu\nu}^\sigma \dot{x}^\mu \dot{x}^\nu +A_{\mu,\nu} \dot{x}^\mu \dot{x}^\nu - \phi^2 A^\rho F_{\mu\rho} \dot{x}^\mu (A_\nu \dot{x}^\nu + \dot{x}^5) + \ddot{x}^5 = 0 (equation 2)

We then have a system of two equations! Equation 1 is multiplied by the vector potential to get:
A_\sigma \Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 A_\sigma {F_{\mu}}^\sigma \dot{x}^\mu (A_\nu \dot{x}^\nu + \dot{x}^5) + A_\sigma \ddot{x}^\sigma = 0
Summing this with equation 2 gives us:
A_{\mu,\nu} \dot{x}^\mu \dot{x}^\nu + A_\mu \ddot{x}^\mu + \ddot{x}^5 = 0
This can be expressed as a derivative with respect to proper time:
\frac{d}{ds} (A_\mu \dot{x}^\mu + \dot{x}^5) = 0
Which implies the argument is constant, since the derivative is equal to zero:
Q = (A_\mu \dot{x}^\mu + \dot{x}^5)

We can then uncover the nature of the scalar field by looking at the line-element again:
ds^2 = {g}_{\mu\nu} dx^\mu dx^\nu + \phi^2 (A_\nu dx^\nu + dx^5)^2
Note the first term is the line-element, or proper time, in 4-D spacetime, and the second term looks like Q from above:
ds^2 =d\tau^2 + \phi^2 (A_\nu \dot{x}^\nu + \dot{x}^5)^2 ds^2 = d\tau^2 + \phi^2 Q^2 ds^2
This allows us to relate the line-elements or proper times from 5-D and 4-D spacetimes:
ds^2 (1-\phi^2 Q^2) = d\tau^2
\frac{d\tau}{ds} = \sqrt{1-\phi^2 Q^2}
For this derivative to be a real number, an upper bound is placed on the scalar field \phi.
\phi^2 Q^2 < 1

The geodesic equation 1 terms are differentiated with respect to the 5-D proper time s:
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 Q {F_{\mu}}^\sigma \dot{x}^\mu + \ddot{x}^\sigma = 0
To see how this would appear in 4-D spacetime, translate them into 4-D proper time \tau derivatives using the chain rule:
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu \frac{d\tau^2}{ds^2} + \phi^2 Q {F_\mu}^\sigma \dot{x}^\mu \frac{d\tau}{ds} + \ddot{x}^\sigma \frac{d\tau^2}{ds^2} = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 Q {F_\mu}^\sigma \dot{x}^\mu \frac{ds}{d\tau} + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \frac{\phi^2 Q}{\sqrt{1-\phi^2 Q^2}} {F_\mu}^\sigma \dot{x}^\mu + \ddot{x}^\sigma = 0
This geodesic looks like one subject to a gravitational field and an electromagnetic field, a la the Lorentz force equation:
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \frac{q}{m} {F_\mu}^\sigma \dot{x}^\mu + \ddot{x}^\sigma = 0
The ratio of charge to mass is then:
\frac{q}{m} = \frac{\phi^2 Q}{\sqrt{1-\phi^2 Q^2}}

You can further evaluate the Ricci tensor and Ricci scalar and get Maxwell’s equations. This unifies gravity with electromagnetism. This is a remarkable result from just adding one dimension to spacetime. There are some problems though. This theory does not predict the correct masses of particles, which means there’s something wrong with it. However, it is still compelling. The dive into this idea is what gave rise to String theory.

Nuclear Forces
A theory that unifies gravity and electromagnetism should also unify with the nuclear forces. The weak nuclear force has already been unified with electromagnetism into what is known as the electroweak force. The strong nuclear force and electroweak force have fields with non-abelian (non-commutative) properies.
We saw with electromagnetism that the field tensor is:
F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu
For non-abelian fields, the field tensor is:
\mathcal{F}^a_{\mu\nu} = F^a_{\mu\nu}+ f^{a}_{bc}  A_\mu^b A_\nu^c = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + f^{a}_{bc}  A_\mu^b A_\nu^c
Antisymmetry of the tensor is preserved by the structure constants f^{a}_{bc} being antisymmetric. The indices abc span the different fields that compose a force. For electroweak this maps to SU(2) group elements, and for the strong force this maps to SU(3) group elements.

To see if we can get these forces from spacetime curvature too, start with an arbitrary number of spacetime dimensions. Here we will treat the Greek indices as 4-D spacetime, and the Latin indices as higher/extra dimensions.
We can then extend the Kaluza-Klein metric g_{AB} \rightarrow \gamma_{AB}:
\gamma_{AB} = \begin{pmatrix} g_{\mu\nu} +\phi^2 h^{kl} A_{\mu k} A_{\nu l} & \phi^2 A_{\mu i} \\ \phi^2 A_{\nu j} & \phi^2 h_{ij}  \end{pmatrix}
Where there are now a variety of 4-vector potentials A_{\mu}^i, where \mu is the index in 4-D spacetime, and i is the index for extra dimensions. This implies that the different fields in the electroweak and strong force are different because they each occupy a different extra dimension.
The extra dimension metric h_{ij} describes the curvature between the extra dimensions.
The inverse of this metric is:
\gamma^{AC} = \begin{pmatrix} g^{\nu\beta} & - A^{\nu n} \\ -A^{\beta i} & g_{\sigma\rho} A^{\sigma n} A^{\rho i} + \frac{h^{n i}}{\phi^2}  \end{pmatrix}

Like the cylinder condition from 5-D, this N-D metric has some assertions. The 4-D metric elements are not dependent on the extra dimensions:
\gamma_{\mu\nu,i} = 0
The extra dimension elements are not dependent on the extra dimensions:
\gamma_{ij,k} = 0
The extra dimension element are not dependent on 4-D spacetime:
\gamma_{ij,\mu} = 0
Keep in mind, the off-diagonal elements are not restricted this way. So we can have:
\gamma_{\mu j, D} \neq 0
The way the off-diagonal elements are handled here will be to facilitate the aforementioned non-abelian behavior.
\gamma_{\mu k, i} = \phi^2 A_{\mu k, i}

Here we can impose a gauge choice involving the structure constant by setting the associated covariant derivative equal zero: A_{\mu k;i} = 0
A_{\mu k,i} = b f^j_{ik} A_{\mu j}
An off-diagonal field strength tensor G_{ik\mu} will become handy later, and can be evaluated in terms of the structure constant:
G_{ik\mu} = A_{\mu k,i} -A_{\mu i,k} = b f^j_{ik} A_{\mu j} - b f^j_{ki} A_{\mu j} = 2 b f^j_{ik} A_{\mu j}
We also assert that the extra dimension metric is related to the structure constant by:
h^{mj} = \epsilon f^m_{ab} f^{jab}
The extra dimension metric can be used to raise and lower indices too:
h^{mj} = \epsilon h^{mn} f_{nab} f^{jab} = \epsilon h^{mn} (D-1)! \delta^j_n = \epsilon (D-1)! h^{mj}
Therefore: \epsilon = \frac{1}{(D-1)!}
Where D is the number of extra dimensions. Therefore we have:
h^{mj} = \frac{1}{(D-1)!} f^m_{ab} f^{jab}

The Geodesic
The form of the line-element can then be worked out:
ds^2 =  \begin{pmatrix} dx^\mu & dx^j \end{pmatrix} \begin{pmatrix} g_{\mu\nu} +\phi^2 h^{kl} A_{\mu k} A_{\nu l} & \phi^2 A_{\mu i} \\ \phi^2 A_{\nu j} & \phi^2 h_{ij}  \end{pmatrix} \begin{pmatrix} dx^\nu \\ dx^i \end{pmatrix}
=g_{\mu\nu} (dx^\nu) (dx^\mu) + \phi^2 h^{kl} A_{\mu k} A_{\nu l} (d x^\nu ) (dx^\mu) \\+ \phi^2 A_{\mu i} (dx^i)(dx^\mu) + \phi^2 A_{\nu j} (dx^\nu) (dx^j) + \phi^2 h_{i j} (dx^i) (dx^j)
= g_{\mu\nu} dx^\nu dx^\mu + \phi^2 ( h^{kl} A_{\mu k} A_{\nu l} (dx^\nu) (dx^\mu) + 2A_{\mu i} (dx^i) (dx^\mu) + h_{ij} (dx^i)(dx^j))
=g_{\mu\nu} dx^\nu dx^\mu + \phi^2 (h_{ij} A_{\mu}^j A_{\nu}^i dx^\nu dx^\mu + 2 h_{ij} A_{\mu}^j dx^i dx^\mu + h_{ij}dx^i dx^j)
We then have:
ds^2=g_{\mu\nu} dx^\nu dx^\mu + \phi^2 h_{ij} (A_\mu^j dx^\mu + dx^j) (A_\nu^i dx^\nu + dx^i)

The geodesic equation can be broken up the same way as before.
\Upsilon^C_{AB} \dot{x}^A \dot{x}^B + \ddot{x}^C = 0
For the 4-D terms we have:
\Upsilon^\sigma_{AB} \dot{x}^A \dot{x}^B + \ddot{x}^\sigma = 0
\Upsilon^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^\sigma_{\mu i} \dot{x}^\mu \dot{x}^i + \Upsilon^\sigma_{j \nu} \dot{x}^j \dot{x}^\nu + \Upsilon^\sigma_{ij} \dot{x}^i \dot{x}^j + \ddot{x}^\sigma = 0
And for the extra dimension terms we have:
\Upsilon^k_{AB} \dot{x}^A \dot{x}^B + \ddot{x}^k = 0
\Upsilon^k_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^k_{\mu i} \dot{x}^\mu \dot{x}^i + \Upsilon^k_{j \nu} \dot{x}^j \dot{x}^\nu + \Upsilon^k_{ij} \dot{x}^i \dot{x}^j + \ddot{x}^k = 0

Evaluate the Christoffel symbols from the 4-D terms:
\Upsilon^\sigma_{\mu\nu} = \frac{1}{2} \gamma^{\sigma D} (\gamma_{\mu D,\nu} + \gamma_{\nu D,\mu} - \gamma_{\mu\nu,D})
= \frac{1}{2} \gamma^{\sigma \rho} (\gamma_{\mu \rho,\nu} + \gamma_{\nu \rho,\mu} - \gamma_{\mu\nu,\rho}) + \frac{1}{2} \gamma^{\sigma i} (\gamma_{\mu i,\nu} + \gamma_{\nu i,\mu} - \gamma_{\mu\nu,i})
=\frac{1}{2} g^{\sigma \rho} (\gamma_{\mu \rho,\nu} + \gamma_{\nu \rho,\mu} - \gamma_{\mu\nu,\rho}) - \frac{1}{2} A^{\sigma i} (\gamma_{\mu i,\nu} + \gamma_{\nu i,\mu})
=\frac{1}{2} g^{\sigma \rho} (g_{\mu\rho,\nu} + g_{\nu \rho,\mu} - g_{\mu\nu,\rho} + \phi^2 h^{kl} (A_{\mu k ,\nu} A_{\rho l} + A_{\mu k} A_{\rho l ,\nu} + A_{\nu k, \mu} A_{\rho l} + A_{\nu k} A_{\rho l, \mu}\\ - A_{\mu k,\rho} A_{\nu l} -A_{\mu k} A_{\nu l ,\rho} )) - \frac{\phi^2}{2} A^{\sigma i} (A_{\mu i, \nu} + A_{\nu i,\mu})
=\Gamma_{\mu\nu}^\sigma + \frac{\phi^2}{2} g^{\sigma\rho} h^{kl} (A_{\rho l} (A_{\mu k,\nu} + A_{\nu k,\mu}) + A_{\mu k} (A_{\rho l, \nu} - A_{\nu l,\rho}) + A_{\nu l} (A_{\rho k, \mu} \\- A_{\mu k, \rho})) - \frac{\phi^2}{2} g^{\sigma\rho} h^{kl} A_{\rho l} (A_{\mu k, \nu} + A_{\mu k,\nu})
=\Gamma^\sigma_{\mu\nu} + \frac{\phi^2}{2} g^{\sigma\rho} h^{kl} (A_{\mu k} F_{\nu\rho l} + A_{\nu l} F_{\mu \rho k})

\Upsilon^\sigma_{\mu i} = \frac{1}{2} \gamma^{\sigma D} (\gamma_{\mu D,i} + \gamma_{i D,\mu} - \gamma_{\mu i, D})
=\frac{1}{2} \gamma^{\sigma \rho} (\gamma_{\mu \rho,i} + \gamma_{i \rho,\mu} - \gamma_{\mu i, \rho}) + \frac{1}{2} \gamma^{\sigma k} (\gamma_{\mu k,i} + \gamma_{i k,\mu} - \gamma_{\mu i, k})
=\frac{1}{2} g^{\sigma\rho} (\gamma_{i\rho,\mu} - \gamma_{\mu i,\rho}) - \frac{1}{2} A^{\sigma k} (\gamma_{\mu k,i} + \gamma_{i k,\mu} - \gamma_{\mu i, k})
=\frac{\phi^2}{2} g^{\sigma \rho} ( A_{\rho i,\mu} - A_{\mu i, \rho} ) - \frac{\phi^2}{2} A^{\sigma k} (A_{\mu k, i} + h_{ik, \mu} - A_{\mu i,k})
=\frac{\phi^2}{2} g^{\sigma\rho} F_{\mu\rho i} - \frac{\phi^2}{2} A^{\sigma k} h_{ik, \mu} - \frac{\phi^2}{2} A^{\sigma k} G_{ik\mu}
=\frac{\phi^2}{2} g^{\sigma\rho} F_{\mu\rho i} - \frac{\phi^2}{2} A^{\sigma k} G_{ik\mu}
=\frac{\phi^2}{2} g^{\sigma\rho} F_{\mu\rho i} - \frac{\phi^2}{2} A^{\sigma k} G_{ik\mu}
=\frac{\phi^2}{2} g^{\sigma\rho} F_{\mu\rho i} - \phi^2 b f^j_{ik} A^{\sigma k} A_{\mu j}

\Upsilon^\sigma_{ij} = \frac{1}{2} \gamma^{\sigma D} (\gamma_{iD,j} + \gamma_{j D,i} -\gamma_{ij,D})
=\frac{1}{2} \gamma^{\sigma \rho} (\gamma_{i\rho,j} + \gamma_{j \rho,i} -\gamma_{ij,\rho}) +\frac{1}{2} \gamma^{\sigma k} (\gamma_{ik,j} + \gamma_{j k,i} -\gamma_{ij,k})
=\frac{\phi^2}{2} g^{\sigma\rho} (A_{\rho i,j} + A_{\rho j,i} - h_{ij,\rho}) - \frac{\phi^2}{2} A^{\sigma k} (h_{ik,j} + h_{jk,i} - h_{ij,k})
=\frac{\phi^2}{2} g^{\sigma\rho} (A_{\rho i,j} + A_{\rho j,i})
=\frac{\phi^2}{2} g^{\sigma \rho} b (f_{ij}^k A_{\rho k} + f_{ji}^k A_{\rho k}) = 0

Plug these into the 4-D geodesic equation:
\Upsilon^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^\sigma_{\mu i} \dot{x}^\mu \dot{x}^i + \Upsilon^\sigma_{j \nu} \dot{x}^j \dot{x}^\nu + \Upsilon^\sigma_{ij} \dot{x}^i \dot{x}^j + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \frac{\phi^2}{2} g^{\sigma\rho} h^{kl} (A_{\mu k} F_{\nu\rho l} + A_{\nu l} F_{\mu \rho k}) \dot{x}^\mu \dot{x}^\nu + (\phi^2 g^{\sigma\rho} F_{\mu\rho i} - 2\phi^2 b f^j_{ik} A^{\sigma k} A_{\mu j}) \dot{x}^\mu \dot{x}^i + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 g^{\sigma \rho} A_{\nu}^k F_{\mu \rho k} \dot{x}^\mu \dot{x}^\nu + \phi^2 g^{\sigma\rho} (F_{\mu\rho k} - 2 b f^j_{ki} A_\rho^i A_{\mu j}) \dot{x}^\mu \dot{x}^k + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} + \phi^2 g^{\sigma\rho} \dot{x}^\mu (F_{\mu\rho k} + 2 b f_{ik}^j A_\rho^i A_{\mu j}) (A_\nu^k \dot{x}^\nu + \dot{x}^k) + \ddot{x}^\sigma = 0 (equation 3)
Note that a symmetric and antisymmetric tensor being multiplied results in zero:
(f_{ij}^k A_\mu^i A_\rho^j) (g^{\sigma\rho} A_{\nu k} \dot{x}^\mu \dot{x}^\nu)=f^{ijk} A_{\nu k} A_{\mu i} A_{\rho j} \dot{x}^\mu \dot{x}^\nu g^{\sigma \rho} = 0

Evaluate the Christoffel symbols for the extra dimensions:
\Upsilon^k_{\mu\nu} = \frac{1}{2} \gamma^{k D} (\gamma_{\mu D,\nu} + \gamma_{\nu D,\mu}-\gamma_{\mu\nu, D})
=\frac{1}{2} \gamma^{k \rho} (\gamma_{\mu \rho,\nu} + \gamma_{\nu \rho,\mu}-\gamma_{\mu\nu, \rho}) + \frac{1}{2} \gamma^{k i} (\gamma_{\mu i,\nu} + \gamma_{\nu i,\mu}-\gamma_{\mu\nu, i})
=-\frac{1}{2} A^{\rho k} (\gamma_{\mu \rho,\nu} + \gamma_{\nu \rho,\mu}-\gamma_{\mu\nu, \rho}) + \frac{1}{2} (g_{\alpha\beta} A^{\alpha k} A^{\beta i} + h^{k i}/\phi^2) (\phi^2 A_{\mu i,\nu} + \phi^2 A_{\nu i,\mu})
=-\frac{1}{2} A^{\rho k} (g_{\mu \rho,\nu} + g_{\nu \rho,\mu} - g_{\mu\nu,\rho}) - \frac{\phi^2}{2} A^{\rho k} h^{ij} (A_{\mu i,\nu} A_{\rho j} + A_{\mu i} A_{\rho j,\nu} \\+ A_{\nu i,\mu} A_{\rho j} + A_{\nu i} A_{\rho j,\mu} - A_{\mu i,\rho} A_{\nu j} - A_{\mu i} A_{\nu j,\rho}) \\+ \frac{\phi^2}{2} (g_{\alpha \beta} A^{\alpha k} A^{\beta i} + h^{ki}/\phi^2 )(A_{\mu i,\nu} + A_{\nu i,\mu})
=-\frac{1}{2} A_\sigma^k g^{\sigma \rho} (g_{\mu\rho,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho}) - \frac{\phi^2}{2} A^{\rho k} h^{ij} ( A_{\rho j} (A_{\mu i,\nu} + A_{\nu i, \mu}) \\+ A_{\mu i} (A_{\rho j,\nu} - A_{\nu j,\rho}) + A_{\nu j} (A_{\rho i,\mu} - A_{\mu i,\rho})) \\+ \frac{\phi^2}{2} g_{\alpha\beta} A^{\alpha k} A^{\beta i} (A_{\mu i,\nu} + A_{\nu i,\mu}) + \frac{1}{2} h^{ki} (A_{\mu i,\nu} + A_{\nu i,\mu})
=-A^k_\sigma \Gamma^\sigma_{\mu\nu} - \frac{\phi^2}{2} A^{\rho k} h^{ij} (A_{\rho j} (A_{\mu i, \nu} + A_{\nu i,\mu}) + A_{\mu i} F_{\nu \rho j} + A_{\nu j} F_{\mu \rho i}) \\+ \frac{\phi^2}{2} g_{\alpha\beta} A^{\alpha k} A^{\beta i} (A_{\mu i,\nu} + A_{\nu i,\mu}) + \frac{1}{2} h^{ki} (A_{\mu i,\nu} + A_{\nu i,\mu})
=-A^k_\sigma \Gamma^\sigma_{\mu\nu} + \frac{1}{2} (A_{\mu,\nu}^k + A_{\nu,\mu}^k) - \frac{\phi^2}{2} A^{\rho k} h^{ij} (A_{\mu i} F_{\nu\rho j} + A_{\nu j} F_{\mu \rho i})\\ - \frac{\phi^2}{2} A^{\rho k} h^{ij} A_{\rho j} (A_{\mu i,\nu} + A_{\nu i,\mu}) + \frac{\phi^2}{2} g_{\alpha\beta} A^{\alpha k} A^{\beta i} (A_{\mu i,\nu} + A_{\nu i,\mu})
=\frac{1}{2} (A^k_{\mu;\nu} + A^k_{\nu;\mu}) - \frac{\phi^2}{2} A^{\rho k} (A_\mu^j F_{\nu \rho j} + A_\nu^i F_{\mu\rho i}) \\ + \frac{\phi^2}{2} (A_{\mu i,\nu} + A_{\nu i,\mu}) (g_{\alpha\beta} A^{\alpha k} A^{\beta i} - A^{\rho k} A_{\rho j} h^{ij})
=\frac{1}{2} (A^k_{\mu;\nu} + A^k_{\nu;\mu}) - \frac{\phi^2}{2} A^{\rho k} (A_\mu^j F_{\nu \rho j} + A_\nu^i F_{\mu\rho i})

\Upsilon^k_{\mu i} = \frac{1}{2} \gamma^{k D} ( \gamma_{\mu D, i} + \gamma_{i D,\mu} -\gamma_{\mu i,D})
=\frac{1}{2} \gamma^{k \rho} ( \gamma_{\mu \rho, i} + \gamma_{i \rho,\mu} -\gamma_{\mu i,\rho}) + \frac{1}{2} \gamma^{k j} ( \gamma_{\mu j, i} + \gamma_{i j,\mu} -\gamma_{\mu i,j})
=-\frac{1}{2} A^{\rho k} (\gamma_{i \rho,\mu} - \gamma_{\mu i,\rho}) + \frac{1}{2} (g_{\alpha\beta} A^{\alpha k} A^{\beta j} + h^{kj}/\phi^2) \phi^2 (h_{ij,\mu} + A_{\mu j,i} - A_{\mu i,j})
=-\frac{1}{2} A^{\rho k} (\gamma_{i \rho,\mu} - \gamma_{\mu i,\rho}) + \frac{1}{2} (g_{\alpha\beta} A^{\alpha k} A^{\beta j} + h^{kj}/\phi^2) \phi^2 (G_{i j \mu})
=-\frac{\phi^2}{2} A^{\rho k} (A_{\rho i,\mu} - A_{\mu i,\rho}) + \frac{\phi^2}{2} g_{\alpha\beta} A^{\alpha k} A^{\beta j} G_{ij\mu}+ \frac{1}{2} h^{kj} G_{ij \mu}
=-\frac{\phi^2}{2} A^{\rho k} F_{\mu\rho i} + \frac{\phi^2}{2} A^k_\beta A^{\beta j} G_{ij\mu} + \frac{1}{2} h^{kj} G_{ij\mu}
=-\frac{\phi^2}{2} A^{\rho k} F_{\mu\rho i} + \phi^2 A^k_\beta A^{\beta j} b f^l_{ij} A_{\mu l} + h^{kj} b f^l_{ij} A_{\mu l}

\Upsilon^k_{ij} = \frac{1}{2} \gamma^{k D} (\gamma_{iD,j} + \gamma_{jD,i} -\gamma_{ij,D})
= \frac{1}{2} \gamma^{k \rho} (\gamma_{i\rho,j} + \gamma_{j\rho,i} -\gamma_{ij,\rho}) + \frac{1}{2} \gamma^{k l} (\gamma_{il,j} + \gamma_{jl,i} -\gamma_{ij,l})
= -\frac{\phi^2}{2} A^{\rho k} (A_{\rho i,j} + A_{\rho j,i}) = 0

Put these together in the extra dimension geodesic:
\Upsilon^k_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \Upsilon^k_{\mu i} \dot{x}^\mu \dot{x}^i + \Upsilon^k_{j \nu} \dot{x}^j \dot{x}^\nu + \Upsilon^k_{ij} \dot{x}^i \dot{x}^j + \ddot{x}^k = 0

(A^k_{\mu,\nu} - \Gamma_{\mu\nu}^\sigma A_\sigma^k - \phi^2 A^{\rho k} A^j_\mu F_{\nu\rho j}) \dot{x}^\mu \dot{x}^\nu \\+ (-\phi^2 A^{\rho k} F_{\mu\rho i} + 2\phi^2 A^k_\beta A^{\beta j} b f^l_{ij} A_{\mu l} + 2 h^{kj} b f^l_{ij} A_{\mu l}) \dot{x}^\mu \dot{x}^i + \ddot{x}^k = 0 (equation 4)

Sum equation 3 and 4 the same way we summed equations 1 and 2 for the 5-D case. Scale equation 3 by the vector potential:
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 g^{\sigma \rho} A_{\nu}^k F_{\mu \rho k} \dot{x}^\mu \dot{x}^\nu + \phi^2 g^{\sigma\rho} (F_{\mu\rho k} - 2 b f^j_{ki} A_\rho^i A_{\mu j}) \dot{x}^\mu \dot{x}^k + \ddot{x}^\sigma = 0 (equation 3)

A^m_\sigma \Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \phi^2 g^{\sigma \rho} A^m_\sigma A_{\nu}^k F_{\mu \rho k} \dot{x}^\mu \dot{x}^\nu \\+ \phi^2 g^{\sigma\rho} A^m_\sigma (F_{\mu\rho k} - 2 b f^j_{ki} A_\rho^i A_{\mu j}) \dot{x}^\mu \dot{x}^k + A^m_\sigma \ddot{x}^\sigma = 0 (equation 3, scaled)

(A^m_{\mu,\nu} - \Gamma_{\mu\nu}^\sigma A_\sigma^m - \phi^2 A^{\rho m} A^j_\mu F_{\nu\rho j}) \dot{x}^\mu \dot{x}^\nu \\+ (-\phi^2 A^{\rho m} F_{\mu\rho k} + 2\phi^2 A^m_\beta A^{\beta j} bf^l_{kj} A_{\mu l} + 2 h^{mj} b f^l_{kj} A_{\mu l}) \dot{x}^\mu \dot{x}^k + \ddot{x}^m = 0 (equation 4)

The sum is then:
A^m_{\mu,\nu} \dot{x}^\mu \dot{x}^\nu + 2h^{mj} b f_{kj}^l A_{\mu l} \dot{x}^\mu \dot{x}^k + A^m_\sigma \ddot{x}^\sigma + \ddot{x}^m =0
Note the 4-D Christoffel symbols cancel out.

Unification
Suppose b=1/2 and we get:
A^m_{\mu,\nu} \dot{x}^\mu \dot{x}^\nu + h^{mj} f_{kj}^l A_{\mu l} \dot{x}^\mu \dot{x}^k + A^m_\sigma \ddot{x}^\sigma + \ddot{x}^m =0
A^m_{\mu,\nu} \dot{x}^\mu \dot{x}^\nu + A_{\mu,k}^m \dot{x}^\mu \dot{x}^k + A^m_\mu \ddot{x}^\mu + \ddot{x}^m =0
This is a derivative with respect to proper time being equal to zero. We saw this before.
\frac{d}{ds} (A^m_\mu \dot{x}^\mu + \dot{x}^m) = 0
We then get a set of constants for each extra dimension.
A_\mu^m \dot{x}^\mu + \dot{x}^m = Q^m
equation 3 can then be written in terms of these constants:
\Gamma^\sigma_{\mu\nu} + \phi^2 g^{\sigma\rho} \dot{x}^\mu (F_{\mu\rho k} + f_{ik}^j A_\rho^i A_{\mu j}) Q^k + \ddot{x}^\sigma = 0

From the line element, we can take advantage of the chain rule to put the equation in terms of 4-D proper time \tau:
ds^2=g_{\mu\nu} dx^\nu dx^\mu + \phi^2 h_{ij} (A_\mu^j dx^\mu + dx^j) (A_\nu^i dx^\nu + dx^i)
ds^2 = d\tau^2 + \phi^2 h_{ij} (A^j_\mu \dot{x}^\mu + \dot{x}^j) (A^i_\nu \dot{x}^\nu + \dot{x}^i) ds^2
ds^2 = d\tau^2 + \phi^2 h_{ij} Q^j Q^i ds^2
ds^2 (1-\phi^2 h_{ij} Q^i Q^j) = d\tau^2
\frac{ds}{d\tau} = \frac{1}{\sqrt{1- \phi^2 h_{ij} Q^i Q^j}}

This gives us the Lorentz Force equation for charges in a non-abelian field!
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \frac{\phi^2 Q^l}{\sqrt{1- \phi^2 h_{ij} Q^i Q^j}} ({F_{\mu l}}^\sigma + f_{jkl} A^k_\mu A^{\sigma j})\dot{x}^\mu + \ddot{x}^\sigma = 0
\Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu + \frac{\phi^2 Q^l}{\sqrt{1- \phi^2 h_{ij} Q^i Q^j}} {\mathcal{F}_{\mu l}}^\sigma\dot{x}^\mu + \ddot{x}^\sigma = 0
Where the charge mass ratio for a particle is given by:
\frac{q^l}{m} = \frac{\phi^2 Q^l}{\sqrt{1- \phi^2 h_{ij} Q^i Q^j}}
These multiple charges could be electric, weak nuclear flavors, and color charge. They are all interrelated by how the hidden extra dimensions are shaped in the metric h_{ij}.

A theory like this could accurately describe how all of the fundamental forces of nature may actually be the product of the same underlying phenomenon. This is the hunt for a Grand Unified Theory or a Theory of Everything. This has yet to be done successfully in physics.

Happy Holidays!
Matt

All the forces emerging from spacetime curvature, but as a Christmas tree.

Light and Matter

In my previous blog entry, I discuss Snell’s law and the refraction of light within matter. The refractive index of a substance is a metric of how much light will bend upon entering or exiting that substance. So what is the refractive index in terms of other physical variables? To address this question, we can start with Maxwell’s equations, which describe electromagnetism.

Maxwell’s equations:
\nabla \cdot \vec{E} = \frac{\rho}{\epsilon}
\nabla \cdot \vec{B} = 0
\nabla \times \vec{E} = - \frac{\partial \vec{B}}{\partial t}
\nabla \times \vec{B} = \mu \left(\vec{J} + \epsilon \frac{\partial \vec{E}}{\partial t} \right)

Charge density is \rho, current density is \vec{J}, the electric field strength is \vec{E} and the magnetic field strength is \vec{B}.
Besides the charge and current densities, the other places matter enters into the equations here is in dielectric permittivity \epsilon and magnetic permeability \mu.
Permittivity and permeability are related to the impedance and capacitance of substances. For instance the wave impedence of a substance is: Z = \sqrt{\frac{\mu}{\epsilon}}
Permittivity is a metric of how polarizable a substance is. This relates to electric dipoles within a substance. The related electric displacement field is defined by: \vec{D}= \epsilon \vec{E}.
Permeability is a metric of how much influence an external field has on substance. This relates to the magnetic dipoles within a substance. The related Auxiliary Magnetic Field is defined as: \vec{H} = \frac{1}{\mu} \vec{B}.
Keep in mind that the vacuum has permittivity and permeability as well.

If no charge or current:
One remarkable result from Maxwell’s equations is that even if there are no charges or currents, \rho = 0, \, \vec{J} = 0, you can still have non-zero values for electric and magnetic field strength!
The Maxwell’s equations without sources look like:
\nabla \cdot \vec{E} = 0
\nabla \cdot \vec{B} = 0
\nabla \times \vec{E} = - \frac{\partial \vec{B}}{\partial t}
\nabla \times \vec{B} = \mu \epsilon \frac{\partial \vec{E}}{\partial t}

From here, we can solve for field strength \vec{E},\,\vec{B}.
\nabla \times \nabla \times \vec{E} = - \mu\epsilon \frac{\partial^2}{\partial t^2} \vec{E}
\nabla (\nabla \cdot \vec{E}) - \nabla^2 \vec{E} = - \mu \epsilon\frac{\partial^2}{\partial t^2} \vec{E}
We then get partial differential equations for both fields.
\nabla^2 \vec{E} = \mu \epsilon \frac{\partial^2}{\partial t^2} \vec{E}
\nabla^2 \vec{B} = \mu \epsilon \frac{\partial^2}{\partial t^2} \vec{B}
These are wave equations! They are identical for both the electric and magnetic field, so we only need to solve it once.
These wave equations describe light! These are waves of electic and magnetic fields, hence why they are called electromagnetic waves. Note the speed of these waves will be related to the permittivity and permeability of the medium.
v = \frac{1}{\sqrt{\mu\epsilon}}

Solutions to the Wave Equations
Solve for the fields’ vector components, which themselves are dependent on space and time coordinates. For example, the x-component of the electric field is dependent on time and space. Assume dependence on spatial coordinates and time are seperable.
E_x = E_x(x,y,z,t) = X(x) Y(y) Z(z) T(t)
Plug this into the partial differential equations to get an equation, where each term depends on one dimension of time or space.
YZT \frac{\partial^2}{\partial x^2} X + XZT \frac{\partial^2}{\partial y^2} Y + XYT \frac{\partial^2}{\partial z^2} Z = \mu \epsilon XYZ \frac{\partial^2}{\partial t^2} T
\frac{1}{X} \frac{\partial^2}{\partial x^2} X + \frac{1}{Y} \frac{\partial^2}{\partial y^2} Y + \frac{1}{Z} \frac{\partial^2}{\partial z^2} Z = \frac{\mu \epsilon}{T} \frac{\partial^2}{\partial t^2} T
We then assert harmonic time dependence with frequency \omega:
T = T_0 e^{-i\omega t}
And assert spatial symmetry, so the left hand side must have negative eigenvalues:
-k_x^2 -k_y^2 -k_z^2 = -\mu \epsilon \omega^2
These eigenvalues are the squared components of the wave number \vec{k} in terms of frequency \omega and permittivity and permeability.
k^2 = \mu\epsilon \omega^2
k = \sqrt{\mu\epsilon} \omega

As I mentioned above, this gives the phase velocity of a light wave through a medium with permittivity \epsilon and permeability \mu. The permittivity and permeability of the vacuum gives the speed of light in a vacuum c.
c = \frac{1}{\sqrt{\mu_0 \epsilon_0}}
In a general medium, we get velocities below c. This is what we saw with Snell’s law!
v = \frac{\omega}{k} = \frac{1}{\sqrt{\mu\epsilon}} = c/n
This is where the refractive index comes from! It’s dependent on the relative permeabilities and permittivities of media.
n = \sqrt{\frac{\mu\epsilon}{\mu_0\epsilon_0}}

Wave Structure
The x-component of the electric field can be written as a product of waves in space and time.
E_x = X(x)Y(y)Z(z)T(t) = X_0 e^{ik_x x} Y_0 e^{ik_y y} Z_0 e^{i k_z z} T_0 e^{-i\omega t} = E_{x,0} e^{i(\vec{k}\cdot \vec{x} - \omega t)}
This applies to the other spatial components of the \vec{E} vector.
\vec{E} = \vec{E}_0 e^{i(\vec{k} \cdot \vec{x} - \omega t)}
Linear combinations give us sinusoidal waves and wave packets. The same solution exists for \vec{B}.
\vec{B} = \vec{B}_0 e^{i(\vec{k} \cdot \vec{x} - \omega t)}
These are oscillating waves with amplitudes of E_0 and B_0 respectively. The direction of propagation of these waves is given by the wave number vector \vec{k}.

Plugging these solutions into Maxwell’s equations can tell us more about the waves. Plug into the first Maxwell equation: \nabla \cdot \vec{E} = 0
\left( E_{x,0} \frac{\partial}{\partial x} + E_{y,0} \frac{\partial}{\partial y} + E_{z,0} \frac{\partial}{\partial z} \right) e^{i(\vec{k} \cdot \vec{x} - \omega t)} = 0
(E_{x,0} k_x + E_{y,0} k_y + E_{z,0} k_z) e^{i(\vec{k} \cdot \vec{x} - \omega t)} = 0
\vec{E} \cdot \vec{k} = 0
This implies the electric field is perpendicular to the wave’s propagation: It is a transverse wave. The same can be shown for the magnetic field, using the second Maxwell equation.

The other two Maxwell equations will show us how the fields’ vectors are oriented in space.
\nabla \times \vec{E} = - \frac{\partial \vec{B}}{\partial t}
Plug in the solutions for the electric and magnetic fields:
\vec{E} = \vec{E}_0 e^{i(\vec{k} \cdot \vec{x} - \omega t)}
\vec{B} = \vec{B}_0 e^{i(\vec{k} \cdot \vec{x} - \omega t)}
We then evaluate the expression:
\nabla \times \vec{E}_0 e^{i(\vec{k} \cdot \vec{x} - \omega t)} = - \frac{\partial}{\partial t} \vec{B_0} e^{i(\vec{k} \cdot \vec{x} - \omega t)}
\left(\left(E_{z,0} \frac{\partial}{\partial y} - E_{y,0}\frac{\partial}{\partial z} \right),-\left(E_{x,0} \frac{\partial}{\partial z} - E_{z,0}\frac{\partial}{\partial x} \right),\left(E_{x,0} \frac{\partial}{\partial y} - E_{y,0}\frac{\partial}{\partial x} \right) \right) e^{i(\vec{k} \cdot \vec{x} - \omega t)} = i\omega \vec{B}
i(E_{z,0} k_y - E_{y,0} k_z, -E_{x,0} k_z + E_{z,0} k_x, E_{x,0} k_y - E_{y,0} k_x) e^{i(\vec{k} \cdot \vec{x} - \omega t)} = i\omega \vec{B}
This shows that the cross product between the wave number vector and electric field is proportional to the magnetic field. This implies the magnetic field is perpendicular to the electric field.
\vec{k} \times \vec{E} = \omega \vec{B}
\vec{E} \cdot (\vec{k} \times \vec{E}) = 0
\vec{E} \cdot \vec{B} = 0
The form of the magnetic field in terms of the electric field is then:
\vec{B} = \sqrt{\mu\epsilon} \frac{\vec{k}\times\vec{E}}{k}
The cross product of the two fields is proportional to the wave number vector.
\vec{E} \times \vec{B} = \frac{\sqrt{\mu\epsilon}}{k} \vec{E} \times (\vec{k} \times \vec{E})=\frac{\sqrt{\mu\epsilon}}{k} (\vec{E}^2 \vec{k} - (\vec{E} \cdot \vec{k}) \vec{E}) = \sqrt{\mu\epsilon}\frac{\vec{k}}{k} E^2
This is in the direction of propagation, and is known as the Poynting vector.

Boundary Conditions
Now that we have the structure of light waves. What happens when a light wave passes from one medium to another? Imagine an electromagnetic wave passing from vacuum into water, the two media separated by a plane.
When the wave starts to interact with the water, the electromagnetic field displaces charges and induces currents on the surface. This is where the electric displacement field \vec{D} from the displaced charges, and the auxiliary magnetic field \vec{H} from the induced current come into play.
\vec{D} = \epsilon \vec{E}
\vec{H} = \frac{1}{\mu} \vec{B}

Accumulated charge and current on the surface of a medium.

The electric field \vec{E} components parallel to the surface are unaffected by the plane of charges on the surface.
E_\parallel = E_\parallel^\prime
The electric field components perpendicular to the surface are affected by the plane of charges on the surface. So there is a difference for these components. Here we need to account for the different permittivities. This manifests from the electric displacement field \vec{D} components perpendicular to the surface remain unchanged between media. It is the piece of the electric field from free changes.
D_\perp = {D_\perp}^\prime
\epsilon E_\perp = \epsilon^\prime {E_\perp}^\prime
Similarly, the induced current points perpendicular to the surface, inducing a magnetic field parallel to the surface. So only the parallel components of the wave’s magnetic field are affected. This manifests from the auxiliary magnetic field \vec{H} components parallel to the surface remaining unchanged. This accounts for the difference in permeability. It is the piece of the magnetic field produced by currents.
H_\parallel = H_\parallel^\prime
\frac{1}{\mu} B_\parallel = \frac{1}{\mu^\prime} B_\parallel^\prime
Perpendicular components are unaffected.
B_\perp = B_\perp^\prime

Incoming Field Orientation
You can think of the wavefront of light as a ring propagating perpendularly along \vec{k}. The directions of \vec{B} and \vec{E} are contained within this ring, but can be oriented in any way such that \vec{B} and \vec{E} are perpendicular to each other. This orientation \eta is the polarization of the wave.
\vec{B} \cdot \vec{E} = 0

A vector bound in a ring can be defined by:
\vec{R} = r(\cos\eta,\sin\eta,0)
Where \eta is the angle of the vector within the ring.
Tilt the z-axis towards the x-axis by \beta . The ring is then defined by:
\vec{R^\prime} = r(\cos\beta \cos\eta, \sin\eta, \sin\beta\cos\eta)

The direction of propagation is defined by the wave number vector. Without loss of generality, we can have the light wave propagating in a plane perpendicular to the surface of the medium:
\vec{k} = (k\sin\theta,0,k\cos\theta)
A ring tilted to be perpendicular around \vec{k} can be defined using the dot product:
\vec{R^\prime}\cdot \vec{k} = kr (\sin\theta \cos\beta \cos\eta + \cos\theta \sin\beta \cos\eta) = 0
\beta = -\theta

The orientation of fields in space are then:
\vec{E} = E(\cos\theta \cos\eta,\sin\eta,-\sin\theta\cos\eta)
\vec{B} = B(-\cos\theta \sin\eta,\cos\eta,\sin\theta\sin\eta)
We can check this:
\vec{E}\times\vec{B} = EB (\sin\theta,0,\cos\theta)

Reflecting and Refracting
Some portion of the wave reflects off the surface and the rest refracts through the medium. The reflected wave can be found by flipping the z coordinate of \vec{k}, and reflect the field vectors accordingly. The incident wave vector is:
\vec{k}_I = k_I (\sin\theta,0,\cos\theta)
The reflected wave vector is:
\vec{k}_R = k_R (\sin\theta,0,-\cos\theta)
The transmitted or refracted wave vector is:
\vec{k}_T = k_T (\sin\theta^\prime,0,\cos\theta^\prime)
We can check the fields of all three using the Poynting vector:
\vec{k} = \frac{k}{\sqrt{\mu\epsilon} E^2} \vec{E} \times \vec{B}

The incident fields are:
\vec{E}_I = E_I(\cos\theta \cos\eta,\sin\eta,-\sin\theta\cos\eta)
\vec{B}_I = B_I(-\cos\theta \sin\eta,\cos\eta,\sin\theta\sin\eta)
The reflected fields are found by flipping over x:
\vec{E}_R = E_R(-\cos\theta \cos\eta,\sin\eta,-\sin\theta\cos\eta)
\vec{B}_R = B_R(\cos\theta \sin\eta,\cos\eta,\sin\theta\sin\eta)
The refracted fields are rotated from \eta to \eta^\prime.
\vec{E}_T = E_T(\cos\theta^\prime \cos\eta^\prime,\sin\eta^\prime,-\sin\theta^\prime\cos\eta^\prime)
\vec{B}_T = B_T(-\cos\theta^\prime \sin\eta^\prime,\cos\eta^\prime,\sin\theta^\prime\sin\eta^\prime)

The electric fields of the wave inside and outside of the medium are then:
The combination of the incident and reflected field is on the outside.
\vec{E} = ((E_I-E_R)\cos\theta \cos\eta,(E_I+E_R)\sin\eta,-(E_I+E_R)\sin\theta\cos\eta)
The refracted field is on the inside.
\vec{E^\prime} = E_T(\cos\theta^\prime \cos\eta^\prime,\sin\eta^\prime,-\sin\theta^\prime\cos\eta^\prime)
The same applies to the magnetic fields:
\vec{B} = (-(B_I-B_R)\cos\theta \sin\eta,(B_I+B_R)\cos\eta,(B_I+B_R)\sin\theta\sin\eta)
\vec{B^\prime} = B_T(-\cos\theta^\prime \sin\eta^\prime,\cos\eta^\prime,\sin\theta^\prime\sin\eta^\prime)

Connect with Boundary Conditions
We can stitch the waves together from both regions using the boundary conditions mentioned earlier. From these boundary conditions we can get a system of equations.

The first equation is from the perpendicular component of the electric field.
\epsilon E_\perp = \epsilon^\prime {E_\perp}^\prime
\epsilon (E_I+E_R) \sin\theta \cos\eta = \epsilon^\prime E_T \sin\theta^\prime \cos\eta^\prime (equation 1)

The second and third equations are from the parallel component of the electric field.
E_\parallel = E_\parallel^\prime
(E_I-E_R) \cos\theta \cos\eta = E_T \cos\theta^\prime \cos\eta^\prime (equation 2)
(E_I+E_R) \sin\eta = E_T \sin\eta^\prime (equation 3)

The fourth equation is from the perpendicular component of the magnetic field.
B_\perp = B_\perp^\prime
(B_I + B_R) \sin\theta \sin\eta = B_T \sin\theta^\prime \sin\eta^\prime
The magnetic field amplitude is related to the electric field amplitude, giving us:
\sqrt{\mu\epsilon}(E_I+E_R) \sin\theta \sin\eta = \sqrt{\mu^\prime\epsilon^\prime} E_T \sin\theta^\prime \sin\eta^\prime (equation 4)

The fifth and sixth equations are from the parallel component of the magnetic field :
\frac{1}{\mu} B_\parallel = \frac{1}{\mu^\prime} B_\parallel^\prime
\frac{1}{\mu} (B_I-B_R) \cos\theta \sin\eta = \frac{1}{\mu^\prime} B_T \cos\theta^\prime \sin\eta^\prime
\frac{1}{\mu} (B_I+B_R) \cos\eta = \frac{1}{\mu^\prime} B_T \cos\eta^\prime
Put into terms of electric field amplitude:
\sqrt{\frac{\epsilon}{\mu}} (E_I-E_R) \cos\theta \sin\eta = \sqrt{\frac{\epsilon^\prime}{\mu^\prime}} E_T \cos\theta^\prime \sin\eta^\prime (equation 5)
\sqrt{\frac{\epsilon}{\mu}}(E_I+E_R) \cos\eta = \sqrt{\frac{\epsilon^\prime}{\mu^\prime}} E_T \cos\eta^\prime (equation 6)

Snell’s law appears when you divide the first equation by the last equation. Or divide the fourth equation by the third.
\sqrt{\epsilon\mu} \sin\theta = \sqrt{\epsilon^\prime \mu^\prime} \sin\theta^\prime
n \sin\theta = n^\prime \sin\theta^\prime

Polarization
These equations can be used to figure out how much of a wave is reflected and how much is refracted. We can start by squaring equations 1 and 4, and add them together to eliminate dependence on \eta^\prime.
\left(\frac{\epsilon}{\epsilon^\prime} \right)^2(E_I + E_R)^2 \sin^2 \theta \cos^2 \eta = E_T^2 \sin^2 \theta^\prime \cos^2 \eta^\prime
\frac{\mu \epsilon}{\mu^\prime \epsilon^\prime} (E_I+E_R)^2 \sin^2 \theta \sin^2 \eta = E_T^2 \sin^2 \theta^\prime \sin^2 \eta^\prime
Add together:
(E_I+E_R)^2 \sin^2 \theta \left( \frac{\epsilon^2}{{\epsilon^\prime}^2} \cos^2 \eta + \frac{\mu \epsilon}{\mu^\prime \epsilon^\prime}\sin^2\eta \right) = E_T^2 \sin^2 \theta^\prime
(E_I+E_R)^2 \left( \frac{\mu^\prime \epsilon}{\mu \epsilon^\prime} \cos^2 \eta + \sin^2\eta \right) = E_T^2
Do the same with equation 2 and 5:
(E_I-E_R)^2 \cos^2\theta \cos\eta^2 = E_T^2 \cos^2\theta^\prime \cos^2\eta^\prime
\frac{\epsilon \mu^\prime}{\epsilon^\prime \mu}(E_I-E_R)^2 \cos^2\theta \sin^2\eta = E_T^2 \cos^2 \theta^\prime \sin^2 \eta^\prime
Add together:
(E_I-E_R)^2 \cos^2\theta \left(\cos^2 \eta + \frac{\mu^\prime \epsilon}{\mu \epsilon^\prime}\sin^2\eta \right) = E_T^2 \cos^2 \theta^\prime
It is useful to define shorthand variables:
\alpha = \frac{\cos\theta^\prime}{\cos\theta}
\beta = \sqrt{\frac{\epsilon^\prime \mu}{\epsilon \mu^\prime}}
We then have two equations:
(E_I+E_R)^2 \left(\cos^2 \eta + \beta^2 \sin^2\eta \right) = \beta^2 E_T^2
(E_I-E_R)^2 \left(\beta^2 \cos^2 \eta + \sin^2\eta \right) = \alpha^2 \beta^2 E_T^2

We can use these equations to solve for the field strengths. The field strengths can give us the fractions of power reflected and refracted.
Solve for the reflected field E_R and refracted field E_T in terms of the incident field E_I.
E_I + E_R = \frac{\beta}{\sqrt{\cos^2 \eta + \beta^2 \sin^2 \eta}} E_T
E_I - E_R = \frac{\alpha\beta}{\sqrt{\beta^2 \cos^2 \eta + \sin^2\eta}} E_T
Use these two equations to solve for reflected and refracted field strengths.
E_T = \frac{2}{\beta} \left( \frac{\sqrt{1 + (\beta^2 -1) \sin^2 \eta} \sqrt{\beta^2 - (\beta^2 -1)\sin^2 \eta}}{\alpha \sqrt{1 + (\beta^2 -1) \sin^2 \eta} + \sqrt{\beta^2 - (\beta^2 -1)\sin^2 \eta} }\right) E_I
E_R = \frac{\beta}{2} \left( \frac{\sqrt{\beta^2 - (\beta^2 -1)\sin^2 \eta} - \alpha \sqrt{1 + (\beta^2 -1) \sin^2 \eta} }{\sqrt{1 + (\beta^2 -1) \sin^2 \eta}\sqrt{\beta^2 - (\beta^2 -1)\sin^2 \eta}} \right) E_T
Plug in E_T in terms of E_I:
E_R = \frac{\sqrt{\beta^2 - (\beta^2 -1)\sin^2 \eta} - \alpha \sqrt{1 + (\beta^2 -1) \sin^2 \eta} }{\sqrt{\beta^2 - (\beta^2 -1)\sin^2 \eta} + \alpha \sqrt{1 + (\beta^2 -1) \sin^2 \eta} } E_I

Familiar Polarization
If \eta = \pi/2, then \vec{E} = E(0,1,0). This is horizontal polarization.
If \eta = 0 then \vec{B} = B(0,1,0). This is vertical polarization.
For each case, the equations for the refracted and reflected fields are then:
E_T = \frac{2}{\alpha + \beta} E_I
E_R = \frac{\beta - \alpha}{\beta + \alpha} E_I
for vertical polarization. For horizontal polarization:
E_T = \frac{2}{\alpha\beta + 1} E_I
E_R = \frac{1-\alpha\beta}{1+\alpha\beta} E_I
The factors here are Fresnel Coefficients, which is how much of each polarization is transmitted or reflected.
The power of a wave is proportional to the square of the field strength.
P = \sqrt{\frac{\epsilon}{\mu}} E^2 \cos\theta

The fraction of power transmitted through reflection is then:
\frac{P_R}{P_I} = \frac{E_R^2}{E_I^2}
For vertical polarization: = \frac{(\beta-\alpha)^2}{(\beta+\alpha)^2}
For horizontal polarization: =\frac{(1-\alpha\beta)^2}{(1+\alpha\beta)^2}

The fraction of power transmitted through refraction is then:
\frac{P_T}{P_I} = \sqrt{\frac{\epsilon^\prime \mu}{\epsilon \mu^\prime}} \frac{E_T^2}{E_I^2} \frac{\cos\theta^\prime}{\cos\theta} = \alpha\beta \frac{E_T^2}{E_I^2}
For vertical polarization: =\frac{4\alpha\beta}{(\beta+\alpha)^2}
For horizontal polarization: = \frac{4\alpha\beta}{(1+\alpha\beta)^2}

An interesting physical consequence is the Brewster’s angle. This is where vertical polarization drops to zero. The reflected light is completely polarized horizontally.
(vertical polarization) = \frac{(\beta-\alpha)^2}{(\beta+\alpha)^2} = 0
We then have:
\beta = \alpha
In the case where magnetic permeability is roughly the same between the media \mu^\prime = \mu
\beta = \sqrt{\frac{\epsilon^\prime \mu}{\epsilon \mu^\prime}} = \frac{\epsilon^\prime}{\epsilon} \sqrt{\frac{\epsilon \mu}{\epsilon^\prime \mu^\prime}} = n
\alpha = \sqrt{\frac{1- \frac{\epsilon \mu}{\epsilon^\prime \mu^\prime} \sin^2\theta }{1-\sin^2\theta}} = \sqrt{\frac{1-(\sin^2\theta)/n^2}{1-\sin^2\theta}}
1- \frac{1}{n^2} \sin^2\theta = n^2 (1-\sin^2\theta)
\tan\theta_B = n
\theta_B = \arctan(n)
The Brewster’s angle here is the arctangent of the refractive index.


Photon Momentum
Snell’s law can also be derived from quantum mechanics. The wave number vector is related to the momentum by Planck’s constant:
p = \hbar k
The wave number in one medium is:
k = \sqrt{\mu\epsilon} \omega
The wave number in the other medium is:
k^\prime = \sqrt{\mu^\prime \epsilon^\prime} \omega
The ratio of these two give us the refractive index ratio:
\frac{k^\prime}{k} = \sqrt{\frac{\mu^\prime \epsilon^\prime}{\mu\epsilon}} = \frac{n^\prime}{n} = \frac{\lambda}{\lambda^\prime} = \frac{v}{v^\prime}
The components of the electric field parallel to surface must be equal across the boundary defined by the xy-plane, because the momentum of photons parallel to the surface is conserved.
k_{xy}^\prime = k_{xy}
The z component is not conserved between media, in general. This results in a deviation in angle:
k^\prime \sin\theta^\prime = k \sin\theta
Therefore:
\frac{\sin\theta^\prime}{\sin\theta} = \frac{k}{k^\prime} = \frac{n}{n^\prime}

Snell’s Law

This entry is dedicated to my cousin Joe and his wife Michelle. Congratulations on tying the knot you two!

Every summer, my family would travel up to Sybil Lake in Minnesota to go fishing. When we weren’t out on the boat with our parents and Grandpa Ervin, we kids would be on the dock with our lines in the water. We’d be waiting for a bite from the same rock bass we had caught the previous day with reasonable success.
I would sometimes dip my fishing rod into the water and exclaim something like “oh no, my fishing rod is bent!” It wasn’t bent though; it was the water refracting the light from the part of the rod that was submerged, making the rod appear bent. A classic prank.
You can reproduce this effect with a cup of water and a straw, as seen in the picture below.

Bent straw or bent light?

Light goes from one medium (air) to another medium (water) and its behavior changes as a result. The most noticeable change here is the angle. Light comes in at an angle \theta and is redirected to travel along a different angle \theta^\prime by the nature of the substance it is passing through. How does this happen?

Light refracts through a medium.

On the atomic scale, the light vibrates the charges in the substance that it is passing through. These vibrating charges themselves, emit light which then interferes with the initial light, causing it to slow down and change its trajectory. The degree to which light is affected by a substance depends on how influenced it is by electric and magnetic fields. Keep in mind that light is composed of oscillating electric and magnetic fields. This phenomenon is why we have lenses and rainbows.

Fermat’s Principle
It can be shown that this bent path is actually the shortest path from one medium to another. In a previous entry, I talked about the principle of least action, and how the distance between two points is not always a straight line. In this case, it can be shown to be two straight lines. In general, one medium has a refractive index of n, and the other has a refractive index of n^\prime. These could be water, glass, air, a vacuum, etc. The refractive index is a metric of how influenced a medium is by electric and magnetic fields.

Apply some geometry and trig to this problem.

To show how this bent path between media minimizes the action, we break down the problem using geometry to parameterize the path taken by the light by a distance variable x. The light will go from point A to point B, but we want to minimize the amount of time it takes to do so.
The time it takes to go from A to B is the distance traveled above the water D, divided by the speed above the water v, added to the distance traveled below the water D^\prime, divided by the speed below the water v^\prime.
t = \frac{D}{v} + \frac{D^\prime}{v^\prime}
We can then parameterize this equation in terms of distance x using geometry shown in the image above. Also the fact that the speed of light in a medium is dependent on the nature of that medium n, such that v = c/n.
t = \frac{1}{c} ( n \sqrt{x^2 + h^2}  + n^\prime \sqrt{(q-x)^2 + y^2})
To minimize the time, we take the derivative with respect to x and set it equal to zero.
\frac{dt}{dx} = 0
We then get the equation:
\frac{dt}{dx} = \frac{1}{c} \left( n\frac{x}{\sqrt{x^2 + h^2}} + n^\prime \frac{-(q-x)}{\sqrt{(q-x)^2 + y^2}} \right) = 0
Note that these terms are equivalent to the sines of the angles in the problem:
\sin\theta = \frac{x}{\sqrt{x^2 + h^2}}
\sin\theta^\prime = \frac{(q-x)}{\sqrt{(q-x)^2 + y^2}}
This leaves us with the following equation:
n \sin\theta = n^\prime \sin\theta^\prime
which is Snell’s law! This law shows how the angle of a light beam changes from one medium to another. There are actually a number of ways to derive this law.

Wavelength and Trig
It can be shown that the wavelength of the light passing through a medium changes. This can also be shown using geometry. Copy the original image and shift it over by an arbitrary distance d, so there are two light beams entering the medium in the same way. You can then imagine the wavelengths along the light beams meeting at the boundary between the two media. Where one wavelength \lambda ends at the surface, another one \lambda^\prime begins. The speed of light in a medium is related to wavelength and period time by
v = \lambda/t .

Wavelength change from refraction.

Finding the relationship between the wavelength change, the refraction angle, the incident angle then becomes a trigonometry problem. Note that in the image above, the wavelengths and distance d form two triangles sharing a side, with angles \theta, \theta^\prime:
d \sin\theta = \lambda = ct
d \sin\theta^\prime = \lambda^\prime = vt
Take the ratio of these two equations (the arbitrary distance d cancels out):
\frac{\sin\theta}{\sin\theta^\prime} = \frac{\lambda}{\lambda^\prime} = \frac{c}{v}
The ratio between the speed of light from one medium to another can be thought of as an refractive index n:
n = c/v
\frac{\sin\theta}{\sin\theta^\prime} = n
Or in general, with two media, one with refractive index n and the other with n^\prime.
\frac{\sin\theta}{\sin\theta^\prime} = \frac{v}{v^\prime} = \frac{n^\prime}{n} = \frac{\lambda}{\lambda^\prime}
This also gives us Snell’s law!
n\sin\theta = n^\prime\sin\theta^\prime

Note, there is no solution if:
1 < \frac{n}{n^\prime} \sin\theta
What does this mean? There is a critical angle where refraction stops happening.
There’s no loss of generality by saying n^\prime > n
\sin^{-1} \frac{n^\prime}{n} = \theta_c < \theta
Beyond this critical angle, there is no refraction, only reflection. In general, a portion of the incoming light is refracted and a portion is reflected.

Wave variables

We know a wave’s speed is related to its wavelength and period by:
v = \lambda/t
Also note that frequency is the reciprocal of period.
f = 1/t
Another way to think about wavelength is by taking its reciprocal to get the wavenumber k.
k = 2\pi/\lambda
(This is related to momentum in quantum mechanics by Planck’s constant p = \hbar k)
Frequency can be scaled to angular frequency \omega
f = 2\pi \omega
So we can express speed as:
v = \omega/k

Phase and group velocity
What if there is a change in frequency \omega with respect to wavenumber k? In general, frequency is a function of wavenumber \omega(k). The relationship between these two quantities depends on the nature of the media the wave is propagating through.
Initially, we have a sine wave propagating at a speed of \omega/k.
\sin(kx - \omega(k) t)
Then the wave number starts to change from passing through a dispersive medium.
k \rightarrow k^\prime = k+ \Delta k
In response, the frequency changes:
\omega \rightarrow \omega^\prime = \omega + \Delta \omega
This change transforms trig function terms: e^{i kx - \omega t} \rightarrow e^{i kx - \omega t}e^{i \Delta kx - \Delta\omega t}
The sine wave becomes:
\sin(kx - \omega(k) t) \rightarrow \sin([k + \Delta k ]x - [\omega + \Delta \omega] t)
This can also be written as:
= \sin(kx -\omega t) \cos(\Delta k x - \Delta \omega t) + \cos(kx-\omega t)\sin(\Delta k x - \Delta \omega t)
which is a convolution of two waves. One with wavelength 2\pi/k, and the other with a wavelength of 2\pi/\Delta k. One with the speed of \omega/k as mentioned previously, and the other wave with a speed of \Delta \omega/\Delta k.
These quantities are known as the phase velocity and group velocity, respectively:
v_p = \frac{\omega}{k}
v_g = \frac{d\omega}{dk}
The ratio of angular frequency and wavenumber is phase velocity and the derivative of angular frequency with respect to wavenumber is group velocity. The “groups” are thought of as envelopes or packets, wrapping the smaller wavelength wave.

This image has an empty alt attribute; its file name is image-6.png

For more on how trig functions work, check out my entry about them.

Dispersion
If group velocity is different from phase velocity, we get dispersion. For refraction, this means the refractive index will be dependent on wavelength. This is why we have rainbows: different colors of light will refract at different angles in water.

This image has an empty alt attribute; its file name is image-8.png
Light of different wavelengths (colors) refracts at different angles.

Suppose we have light in a vacuum propagating towards a medium made of matter. While in the vacuum it has a wavelength of:
\lambda_0 = \frac{2\pi c}{\omega}
Once it enters the medium the wavelength becomes:
\lambda = \frac{2\pi}{k} = 2\pi \frac{v_p}{\omega}
From before, we know that the refractive index is the ratio of the phase velocities (or the ratio of the wavelengths).
n = \frac{c}{v_p} = \frac{\lambda_0}{\lambda}
We also saw that phase velocity is the ratio between angular frequency and wavenumber. This gives us an equation relating angular frequency to wavenumber, light speed in a vacuum and the refractive index.
\omega = k v_p = k c /n
Now suppose the refractive index is dependent on the frequency of the light n = n(\omega).
The group velocity can be found by taking the derivative.
\frac{d\omega}{dk} = c \frac{1}{n} - kc \frac{1}{n^2} \frac{dn}{dk}
v_g = v_p \left(1 - \frac{k}{n} \frac{dn}{dk}\right)
This equation can be expressed in terms of wavelength as well:
k = \frac{2\pi}{\lambda}
dk = - \frac{2\pi}{\lambda^2} d\lambda
v_g = v_p \left(1 + \frac{\lambda}{n} \frac{dn}{d\lambda} \right)
Note that if the refractive index is independent of frequency that group velocity will be equivalent to phase velocity.
if: v_p \neq v_g
1 \geq 1 + \frac{\lambda}{n} \frac{dn}{d\lambda}
0 \geq \frac{dn}{d\lambda}
We can use Snell’s law to then show angular dependence on wavelength:
\sin \theta_0 = n(\lambda) \sin\theta
The incoming light is at an angle \theta_0, and the refracted light is at an angle \theta, which is dependent on the wavelength of the light. This is because the refractive index is dependent on wavelength.
\theta(\lambda) = \arcsin \left( \frac{\sin\theta_0}{n(\lambda)} \right)
\frac{d\theta}{d\lambda} = -\frac{1}{\sqrt{1- \frac{\sin^2\theta_0}{n^2}}} \frac{\sin\theta_0}{n^2} \frac{dn}{d\lambda}=\frac{-\sin\theta_0}{n \sqrt{n^2 - \sin^2\theta_0}} \frac{dn}{d\lambda}
The way in which n depends on \lambda is theorized with Cauchy’s equation:
n(\lambda) = \sum\limits_{j=0}^J \frac{C_j}{\lambda^{2j}}
where the coefficients C_j depend on the material.
This allows rainbows to appear in the sky when sunlight refracts through raindrops.

This image has an empty alt attribute; its file name is rainbow.png
Photo of a rainbow I took in Minnesota years ago.

If light is a wave of electric and magnetic fields, then how does it travel at the speed it does? How does a refractive index relate to these fields? How much light is reflected and how much is refracted? To address these questions, we will solve Maxwell’s equations to derive waves of light as a consequence.
Maxwell’s equations:
\nabla \cdot \vec{E} = \frac{\rho}{\epsilon}
\nabla \cdot \vec{B} = 0
\nabla \times \vec{E} = - \frac{\partial \vec{B}}{\partial t}
\nabla \times \vec{B} = \mu \left(\vec{J} + \epsilon \frac{\partial \vec{E}}{\partial t} \right)
Stay tuned!

Math Challenge

Which is greater, e^\pi or \pi^e ? Determine the answer without using a calculator.

Try to show which direction the inequality symbol is pointing. The solution is shown below.

The solution:
Start by relating the two values with a placeholder symbol ~, which could be either inequality symbol >, < or =.
e^\pi \sim \pi^e
Exponentiate both sides by \frac{1}{\pi e}
\left(e^\pi\right)^\frac{1}{\pi e} \sim \left(\pi^e\right)^\frac{1}{\pi e}
This evaluates to:
e^\frac{1}{e} \sim \pi^\frac{1}{\pi}

Both sides can be generalized to a function of x:
f(x) = x^\frac{1}{x}
The solution to this problem is hidden in this function.
Note that x = e^{\ln x}. We can use this fact to write the function as:
f(x) = e^{\frac{\ln x}{x}}

We can use calculus to find the extrema (maximum and/or minimum) of f(x).
Take the derivative of the function:
\frac{d f}{dx} = e^{\frac{\ln x}{x}} \frac{d}{dx}\left( \frac{\ln x}{x} \right) =x^{\frac{1}{x}} \frac{1}{x^2} \left( 1 - \ln x\right)
Set it equal to zero. The maximum or minimum of a function will be flat (with a slope of zero).
\frac{df}{dx} = 0
Solve for x in the resulting equation to find the extrema:
\ln x = 1
x=e
So we know there is one extreme point at x = e, but is it a max or a min?

To determine this, take the derivative of f(x) again. The second derivative is curvature.
\frac{d^2 f}{dx^2} = e^{\frac{\ln x}{x}} \frac{1}{x^4} \left( 1 - \ln x\right)^2 - e^{\frac{\ln x}{x}} \frac{2}{x^3} \left( 1 - \ln x \right) - e^{\frac{\ln x}{x}} \frac{1}{x^3}
=x^\frac{1}{x} \frac{1}{x^4} ( 1-2\ln(x) + \ln(x)^2 -3x + 2x\ln x)
If the function curves up at the extreme point (is positive), it is a minimum. If it curves down (is negative) it is a maximum.
Plug in the extreme point: x=e
\frac{d^2 f}{dx^2}(e) = - e^\frac{1}{e} \frac{1}{e^3} < 0

The curvature is negative! So it is a maximum.
Therefore the value of f(x=e) is greater than the value of f(x) anywhere else along x, including at x=\pi:
f(e) > f(x\neq e)
f(e) > f(\pi)
e^\frac{1}{e} > \pi^\frac{1}{\pi}
This shows that the \sim is the greater than symbol:
Therefore e^\pi is larger.
e^\pi > \pi^e

You can confirm this with a calculator:
e^\pi = 23.1407
\pi^e = 22.4592

Squircles

The area of a circle is proportional to the radius squared, where the proportionality constant is \pi.
A = \pi R^2

The area of a square is also proportional to the radius (or apothem) squared, where the proportionality constant is 4.
A = 4 R^2

Circle radius and square apothem.

In general, shapes with a characteristic length have an area of the form:
A = \kappa R^2
Where the proportionality constant \kappa depends on the shape.

So is there a way to unify circles and squares beyond stating this generic formula? How can we determine the value of \kappa for a given shape?
One place to start is noting that points on a circle can be parameterized in cartesian and polar coordinates:
x^2 + y^2 = R^2
Where any point on the xy-plane is related to polar coordinates by:
x = r \cos\theta
y = r\sin\theta
Plugging these into the formula for a circle gives us:
r^2 \cos\theta^2 + r^2 \sin\theta^2 = R^2
Which reduces to:
r = R
because of the trig identity: \cos\theta^2 + \sin\theta^2 =1
This tells us the radius of a circle is not dependent on the angle around the center, it is a constant, which what we expect.

This can be generalized further, where instead of squaring the cartesian coordinate values of the points on a shape, we exponentiate them by n:
|x|^n + |y|^n = R^n
The absolute value was implied when we were squaring x and y, but it is made explicit here.
In polar coordinates this becomes:
r^n |\cos\theta|^n + r^n |\sin\theta|^n = R^n
Solving for r shows us that shapes of this form have points with radial distances that depend on angle.
r = \frac{R} { \sqrt[n]{|\cos\theta|^n + |\sin\theta|^n }}
If n =2 then we recover the circle.
If n = 1 we get a square, but rotated 45 degrees (diamond).

Rotated square n=1


Other values of n give us shapes between circles and squares. For example, n = 4 gives us a “squircle”, a shape somewhere between a circle and square.

Squircle n=4

So what are the area formulas for these shapes? This will give us the form of the proportionality constant \kappa = A/R^2. To do this we need to take the integral of radial distance with respect to the angle around the center of the shape.
A = \int\limits_0^{2\pi} \int\limits_0^r r^\prime dr^\prime d\theta
= \frac{1}{2} \int\limits_0^{2\pi} r^2 d\theta
We can then plug in the general radial distance formula from the previous paragraph, giving us the form of the proportionality constant for any value of n:
\kappa = \frac{A}{R^2} = \frac{1}{2} \int\limits_0^{2\pi}  \frac{d\theta} { \sqrt[n]{|\cos\theta|^n + |\sin\theta|^n }}

This integral can be evaluated approximately for a given value of n by making d\theta a very small but finite number, and performing a discrete sum:
\kappa \approx \frac{\pi}{M} \sum\limits_{m=0}^M \frac{1}{\left|\cos \left(\frac{2\pi}{M}m\right) \right|^n + \left|\sin \left(\frac{2\pi}{M}m\right) \right|^n}^{2/n}
where M is a sufficiently large number. (Write computer code to do this sum.)

The n=4 squircle can be shown to have a proportionality constant of approximately 3.7081. So the area formula for that shape is:
A = 3.7081 R^2

Below is an animation showing the shape and the approximate proportionality constant for different values of n.

It is interesting how increasing the value of n takes us from a star shape, to a rotated square, to a circle, to a squircle, and then gradually to a square.
The proportionality constant can also be plotted against the exponent n.
Note that as n approaches infinity, \kappa=A/R^2 approaches 4, which is the proportionality constant for a square: A = 4 R^2.

I was inspired to write this entry after seeing Stand-up Maths’ video on Squircles. Check it out:

Dirac-Like Maxwell’s Equations

During quarantine I had some time to go through old boxes. I found a homework assignment from when I attended Professor Pran Nath’s quantum physics course in 2009. There was no accompanying solution in the box because it was a “bonus” homework from the end of the semester. So I felt like working through it now, years later. Here is my solution to that homework.

The problem asks us to show that Maxwell’s equations, in the presence of a source, can be written in a “Dirac-like” form.
J = \mathbb{M} C
Where C is a 4-vector of magnetic and electric field strengths, J is a 4-vector of sources (electric charges, magnetic monopoles and their respective current densities), and \mathbb{M} is a 4-by-4 matrix yet to be determined.

Maxwell’s equations can be written in a more symmetric form if we suppose both electric charges and magnetic monopoles exist. We get two Gauss’ laws for electric charge and magnetic monopole densities:
\nabla \cdot \vec{E} = \rho_e
\nabla \cdot \vec{B} = \rho_m
Ampère’s law stays the same:
\nabla \times \vec{B} - \frac{\partial \vec{E}}{\partial t} = \vec{J}_e
Faraday’s law of induction gets a current term from magnetic current, making it look more like Ampère’s law:
\nabla \times \vec{E} + \frac{\partial \vec{B}}{\partial t} = -\vec{J}_m

Electric and magnetic field strengths \vec{E} and \vec{B} do not add together directly. However, they can be written together as a complex field:
\vec{C} = \vec{E} + i \vec{B}
Gauss’ laws then become:
\nabla \cdot (\vec{E} + i \vec{B}) = \rho_e + i\rho_m
Or more compactly:
\nabla \cdot \vec{C} = \rho_c
Faraday’s and Ampère’s laws then become:
\nabla \times \vec{B} - \frac{\partial \vec{E}}{\partial t} - i \left( \nabla \times \vec{E} + \frac{\partial \vec{B}}{\partial t} \right) = \vec{J}_e + i \vec{J}_m
\nabla \times (\vec{B} - i \vec{E}) - \frac{\partial}{\partial t} (\vec{E} + i \vec{B}) = \vec{J}_e + i \vec{J}_m
-i \nabla \times (\vec{E} + i \vec{B}) - \frac{\partial}{\partial t} (\vec{E} + i \vec{B}) = \vec{J}_e + i \vec{J}_m
-i \nabla \times \vec{C} - \frac{\partial}{\partial t} \vec{C} = \vec{J}_c

So then, Maxwell’s equations can be expressed as two equations:
\nabla \cdot \vec{C} = \rho_c
\frac{1}{i} \nabla \times \vec{C} - \frac{\partial}{\partial t} \vec{C} = \vec{J}_c
Because the divergence of the curl of a vector is zero, we can show that charge is conserved through a continuity equation.
\nabla \cdot \nabla \times \vec{C} = 0

\frac{1}{i}\nabla \cdot \nabla \times \vec{C} - \frac{\partial}{\partial t} \nabla \cdot \vec{C} = \nabla \cdot \vec{J_c}
0 - \frac{\partial}{\partial t} \nabla \cdot \vec{C} = \nabla \cdot \vec{J_c}
Therefore:
-\frac{\partial}{\partial t} \rho_c = \nabla \cdot \vec{J}_c
This is the continuity equation. It can also be expressed in Einstein notation:
\partial_\mu J^\mu = 0
Where J is a 4-vector of current and charge density elements:
J =  \begin{pmatrix} \rho_c \\ \vec{J}_c \end{pmatrix}  = \begin{pmatrix} \rho_c \\ J_c^x \\  J_c^y \\ J_c^z \end{pmatrix} =   \begin{pmatrix} \rho_e + i \rho_m \\ J_e^x + iJ_m^x \\   J_e^y + iJ_m^y  \\  J_e^z + iJ_m^z   \end{pmatrix}
Similarly, the complex field strength can be expressed as a 4-vector:
C =   \begin{pmatrix} 0 \\ \vec{C} \end{pmatrix}  = \begin{pmatrix} 0 \\ C^x \\  C^y \\ C^z \end{pmatrix} =   \begin{pmatrix} 0\\ E^x + i B^x \\    E^y + i B^y   \\   E^z + i B^z    \end{pmatrix}

Using the two Maxwell’s equations, we can then map J elements to C elements:
J =  \begin{pmatrix} \rho_c \\ \vec{J}_c \end{pmatrix}  =  \begin{pmatrix} \nabla \cdot \vec{C} \\ \frac{1}{i} \nabla \times \vec{C} - \partial_t \vec{C} \end{pmatrix}
Decomposing this into 4-vector elements we get:
\begin{pmatrix} \rho_c \\ J_c^x \\  J_c^y \\ J_c^z \end{pmatrix}  = \begin{pmatrix} 0 & \partial_x & \partial_y & \partial_z \\ 0 & 0 & -\partial_z/i & \partial_y/i \\ 0 & \partial_z/i & 0 & -\partial_x/i \\ 0 & -\partial_y/i & \partial_x/i & 0 \end{pmatrix}  \begin{pmatrix} 0 \\ C^x \\  C^y \\ C^z \end{pmatrix} -  \begin{pmatrix} 0&0&0&0 \\ 0 & \partial_t & 0 & 0 \\ 0&0&\partial_t&0\\ 0&0&0&\partial_t\end{pmatrix} \begin{pmatrix} 0 \\ C^x \\  C^y \\ C^z \end{pmatrix}
This can be expressed as a sum of 4-by-4 matrices and derivative operations with respect to space-time:
=\frac{1}{i}\left[ -\begin{pmatrix}  0 & 0 & 0 & 0\\ 0 & i & 0 &0\\ 0 & 0 & i & 0\\ 0& 0& 0 &i\end{pmatrix} \partial_t + \begin{pmatrix} 0 & i & 0 & 0 \\ 0&0&0&0 \\ 0&0&0&-1 \\ 0&0&1&0 \end{pmatrix} \partial_x +  \begin{pmatrix} 0 & 0 & i & 0 \\ 0&0&0&1\\ 0&0&0&0 \\ 0&-1&0&0 \end{pmatrix} \partial_y  +  \begin{pmatrix} 0 & 0 & 0& i \\ 0&0&-1&0 \\ 0&1&0&0 \\ 0&0&0&0 \end{pmatrix} \partial_z   \right]   \begin{pmatrix} 0 \\ C^x \\  C^y \\ C^z \end{pmatrix}

The four 4-by-4 matrices can be written as a 4-vector of matrices \alpha. This allows us to write the form of matrix \mathbb{M} mentioned above.
\mathbb{M} =  \frac{1}{i} \alpha^\mu \partial_\mu
The matrices associated with the space derivatives contain the basis elements of the SO(3) Lie algebra. These are the 3-by-3 generators of rotation in 3-D space: L_x, L_y, L_z. They also contain the 3-D unit basis vectors \hat{e}_k. The matrix associated with the time derivative contains the identity matrix in 3-D space.
\alpha_\mu \in  \begin{pmatrix} 0 & 0\\ 0& i \mathbb{I} \end{pmatrix},  \begin{pmatrix} 0 & i \hat{e}_k \\ 0 & L_k \end{pmatrix}
Note a 4th complex field C^t = E^t + i B^t could be non-zero without any effects, since the time column of \mathbb{M} is composed of all zeros, and so it would never enter Maxwell’s equations.

The Dirac-like form of Maxwell’s equations in Einstein notation is then:
J = \frac{1}{i} \alpha^{\mu} \partial_\mu C
and since J and C elements occupy spacetime:
J_\rho = \frac{1}{i} \alpha_\rho^{\mu\sigma} \partial_\mu C_\sigma
This is “Dirac-like” because it looks like the Dirac equation for fermion fields. Where \gamma elements are 4-by-4 matrices multiplied by spacetime derivatives as well:
- m\psi  = \frac{1}{i} \gamma^\mu \partial_\mu  \psi

I have fond memories from grad school with Darin Baumgartel and Gregg Peim.

Gregg, myself, and Darin at Ginger Exchange years ago.

The Heisenberg Uncertainty Principle

In discussions about quantum physics, you may have heard that one cannot measure both position and momentum to perfect precision. The measurement of one interferes with the measurement of another due to the wave-like nature of particles. This is known as the Heisenberg Uncertainty Principle. In this entry, I will discuss how the mathematical reasoning behind this.

In math, operations A and B are said to “commute” if AB = BA. If they do not commute, then the order of multiplication matters: AB \neq BA. This may seem strange if you have not worked with matrix multiplication, but you have almost certainly encountered this in real life.

An example of this kind of non-commuting algebra are rotations in 3-D space. In general, starting from the same orientation then rotating an object clockwise then top-front will give a different final configuration than rotating it top-front then clockwise. A collection of operations like this are known as a non-abelian group. In the case of rotation in 3-D space, these rotation operations are matrix multiplication on the orientation of the object. See the figures below.

Rotating clockwise, then top-front.
Rotating top-front, then clockwise.

\text{Clockwise} \times \text{TopFront} \times \text{(Initial Orientation)} = \text{(Final Orientation 1)}
\text{TopFront} \times \text{Clockwise} \times \text{(Initial Orientation)} = \text{(Final Orientation 2)}
\text{Final Orientation 1} \neq \text{Final Orientation 2}
\therefore \text{TopFront} \times \text{Clockwise}  \neq \text{Clockwise} \times \text{TopFront} 
This could be written as a commutator:
[\text{TopFront} ,\text{Clockwise}]  \neq 0
the general definition  of which is:
[A,B] = AB-BA

Position and Momentum

To see that this kind of non-commuting algebra exists between position and momentum when dealing with quantum waves, we have to know the form of the operators involved. Imagine we want to measure momentum. We can apply the momentum operator on the wave function:
-i \hbar \frac{d}{dx} \Psi = p \Psi
If we want to measure both position x and momentum p, we arrive at:
-i \hbar \hat{x} \frac{d}{dx} \Psi =  x p \Psi     (equation 1)

If we change the order of the operators however, the result changes. This is because the momentum operator \frac{d}{dx} acts on the position operator x as well as the wave function \Psi:
-i \hbar \frac{d}{dx} \hat{x} \Psi = -i \hbar \frac{d}{dx} x \Psi =  (-i \hbar + x p) \Psi     (equation 2)

Taking the difference between equation 1 and 2 gives you the commutation relation between the momentum and position operators:
[\hat{x},\hat{p}] = \hat{x}\hat{p} - \hat{p}\hat{x} = i\hbar   (equation 3)

The fact that the difference is not zero is why there is an Uncertainty Principle. This is a key feature in quantum mechanics.

Statistics

The way the uncertainty principle arises from equation 3 involves some statistics.
When you are measuring a quantity, you expect to get the average of all possible values. This is the expectation value \mu_X.
\mu_X = \text{E}[X] = \langle X \rangle

If you are taking multiple measurements, the amount of change between different measurements is characterized by the standard deviation \sigma_X, or variance (standard deviation squared).
\sigma^2_X = \text{E}[(X-\mu_X)^2] = \langle (X - \langle X \rangle)^2 \rangle
= \langle X^2 - 2 X\langle X \rangle + \langle X \rangle^2 \rangle = \langle X^2 \rangle - \langle X \rangle^2
In quantum mechanics, typically the angle bracket notation is used to denote an expectation value, instead of \text{E}.

The expectation value of a measurement is the integral (or sum) over the probability density times the possible values of the measured quantity.
\mu_X = \sum\limits_{i=1}^n P_i X_i
In quantum mechanics, the probability density is determined by the wave function \Psi. We then have the operator X acting on the wave function in different ways to give rise to statistical quantities.
\mu_X = \langle X \rangle = \langle \Psi | X | \Psi \rangle = \int_V \Psi^* X \Psi dV
\sigma_X^2 = \langle (X - \langle X \rangle)^2 \rangle = \langle \Psi | (X - \langle X \rangle)^2 | \Psi \rangle
= \langle (X - \langle X \rangle) \Psi | (X - \langle X \rangle) \Psi \rangle

Note, that an operator acting on a state gives another state: \Psi_X = (X - \langle X \rangle) \Psi. This is analogous to projection of vectors. Through this particular operation, we can get the variance, which is a metric of precision.
\sigma_X^2 = \langle \Psi_X | \Psi_X \rangle
The Cauchy-Schwarz inequality on inner products of vectors applies here. This inequality relates the magnitudes of two vectors to the inner product.
|\vec{a} | |\vec{b}| \geq \vec{a} \cdot \vec{b}
This same idea is carried over into what is called “Hilbert space”, where functions are vectors in infinite-dimensional space, and the inner product is the convolutional integral between two functions. (Consider another similar operator (Y - \langle Y \rangle) acting on state \Psi to give \Psi_Y).
\langle \Psi_X |\Psi_Y \rangle = \int_V \Psi_X^*  \Psi_Y dV
\sqrt{\langle \Psi_X |\Psi_X \rangle} \sqrt{\langle \Psi_Y |\Psi_Y \rangle} \geq \langle \Psi_X |\Psi_Y \rangle
\langle \Psi_X | \Psi_X \rangle \langle \Psi_Y | \Psi_Y \rangle \geq |\langle \Psi_X | \Psi_Y \rangle |^2
This then gives us an inequality on the variance of two variables X and Y.
\sigma_X^2 \sigma_Y^2 \geq |\langle \Psi_X | \Psi_Y \rangle |^2

Observables

Both position and momentum are observable quantities, this means their expectation values are real numbers, and their operators are Hermitian. Hermitian operators are defined as being their own conjugate:  X^*= X.

In general, the inner products of wave functions are complex numbers.
z = \langle \Psi_X | \Psi_Y \rangle = R + iI
z^* = \langle \Psi_Y | \Psi_X \rangle = R - iI
|z|^2 = z^* z = (R - iI)(x + iI) = R^2 + I^2
The inequality we arrived at previously, then becomes:
\sigma_X^2 \sigma_Y^2 \geq |\langle \Psi_X | \Psi_Y \rangle |^2 = R^2 + I^2

The quantities R and I can be expressed in terms of z and its complex conjugate.
R = \frac{(z + z^*)}{2}
I = \frac{(z - z^*)}{2i}
Therefore they can be expressed in terms of the inner products of the wave functions.
R = \frac{1}{2} (\langle \Psi_X |\Psi_Y \rangle + \langle \Psi_Y |\Psi_X \rangle) = \frac{1}{2} (\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle + \langle (Y - \langle Y \rangle)(X - \langle X \rangle) \rangle)
=\frac{1}{2} (\langle XY - X\langle Y \rangle - Y\langle X \rangle + \langle X \rangle \langle Y \rangle \rangle + \langle YX - X\langle Y \rangle - Y\langle X \rangle + \langle X \rangle \langle Y \rangle \rangle)
=\frac{1}{2} ( \langle XY \rangle + \langle YX \rangle - 2 \langle X \rangle \langle Y \rangle)
=\frac{1}{2} \langle \{ X,Y \} \rangle - \langle X \rangle \langle Y \rangle

I = \frac{1}{2i} (\langle \Psi_X |\Psi_Y \rangle - \langle \Psi_Y |\Psi_X \rangle) = \frac{1}{2i} (\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle - \langle (Y - \langle Y \rangle)(X - \langle X \rangle) \rangle)
=\frac{1}{2i} (\langle XY - X\langle Y \rangle - Y\langle X \rangle + \langle X \rangle \langle Y \rangle \rangle - \langle YX - X\langle Y \rangle - Y\langle X \rangle + \langle X \rangle \langle Y \rangle \rangle)
=\frac{1}{2i} ( \langle XY \rangle - \langle YX \rangle)
=\frac{1}{2i} \langle [X,Y] \rangle

The inequality then becomes the Schrödinger uncertainty relation:
\sigma_X^2 \sigma_Y^2 \geq \left( \frac{1}{2} \langle \{ X,Y \} \rangle - \langle X \rangle \langle Y \rangle \right)^2 + \left( \frac{1}{2i} \langle [X,Y] \rangle \right)^2
If both operators X and Y correspond to observables (both are Hermitian), then the anti-commutator \{ X,Y \} is also Hermitian. This means it has a real expectation value. Therefore the first term is positive. This allows the inequality to be simplified:
\sigma_X^2 \sigma_Y^2 \geq \left( \frac{1}{2i} \langle [X,Y] \rangle \right)^2  
This allows us to get an uncertainty principle between two observable quantities X and Y by knowing how they commute.

Proof that the anti-commutator of two Hermitian operators is also Hermitian:
\{X,Y\} = XY + YX
\{X,Y\}^* = (XY + YX)^* = YX + XY = \{X,Y\}
Therefore this is Hermitian with a real expectation value.
This does not hold for the commutator however:
[X,Y] = XY - YX
[X,Y]^* = (XY-YX)^* = YX - XY = [Y,X]
Therefore this is not Hermitian. It is anti-Hermitian, with an imaginary expectation value.

Heisenberg Uncertainty Principle

To get the famous result, we plug in position:
X = x
and momentum:
Y = p =- i\hbar \frac{d}{dx}
Plug these into the simplified uncertainty inequality we arrived at previously:
\sigma_x^2 \sigma_p^2 \geq \left( \frac{1}{2i} \langle [x,p] \rangle \right)^2
We know what the commutator is from equation 3, and we have:
\sigma_x^2 \sigma_p^2 \geq \left( \frac{1}{2i} i\hbar \right)^2
\sigma_x^2 \sigma_p^2 \geq \frac{\hbar^2}{4}
\sigma_x \sigma_p \geq \frac{\hbar}{2}

It is sometimes written this way:
\Delta x \Delta p \geq \frac{h}{4\pi}
where \hbar = \frac{h}{2\pi} and \Delta x = \sigma_x
The product of uncertainty in position and momentum must be greater than a positive number. Meaning you can never know both to perfect precision. Hence the name “Uncertainty Principle”.
Similarly, it can be shown that time and energy have the same relationship:
\Delta t \Delta E \geq \frac{\hbar}{2}
You can find similar uncertainty relationships when dealing with angular momentum and other observables. Not all pairs of observable quantities exhibit this limitation. The ones that do are called “complementary variables.” If you can evaluate the commutator of these, then you can get the corresponding uncertainty principle.