Category Archives: physics

Strangeness from Adding a Dimension

The shortest distance between two points is a straight line (if there’s nothing in the way). But how can we prove this?

Imagine some arbitrarily curved path defined by the function y between two fixed points: this is clearly not the shortest path. Figure below shows curved path between two fixed points (spanning the interval of x=a to x=b).


What is the distance along this path between these two fixed points? Solving this problem is clearly not as simple as using the Pythagorean theorem, or is it?

If we break up the length of the path s into infinitesimal segments of size ds, we can then use the Pythagorean theorem to characterize the length of the segments:
d\vec{s} = (dy,dx)
Where dx and dy are the infinitesimal legs, and ds is the hypotenuse.
ds = \sqrt{dx^2 + dy^2}

Figure below shows an infinitesimal line segment of a curved path.


Computing the entire length of the path is then a matter of adding up all these infinitesimal ds segments. To do this, take note that the first derivative of the path’s function y is the ratio of the infinitesimal legs (this is the same thing as slope).
y_x = \frac{dy}{dx}
and so the length of the hypotenuse can be expressed in terms of the derivative of y:
ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dy}{dx} \right)^2} \, dx = \sqrt{1 + y_x^2} \, dx

The total length s along path from x = a to x = b is then the sum of all the ds segments from a to b. You find the sum of infinitesimals by taking the integral:
s =  {\displaystyle \int} ds = {\displaystyle \int_a^b} \sqrt{1 + y_x^2}\, dx

Alright, so we are able to measure the length of a path by evaluating this integral. But how does this prove the shortest path is a straight line? For this we need to delve into what is known as “calculus of variations”. What we want to do is minimize the distance along the path between two points, and see if the resulting function y defines a straight line.

For the sake of simplicity, we can express the integrand above as L.
s = {\displaystyle \int_a^b} L\, dx
L = \sqrt{1 + y_x^2}

A more generalized path from a to b is a function of position y and slope y_x, which we can write as:
L = L(y,y_x)
which is also know as a Lagrangian.

We want to minimize the length of path, so that means we want to minimize any deviation, or variation from the shortest path.
To characterize how much variation is in the length of a path, we can vary the integral using \delta to signify what is being varied. This works like a derivative operator:
\delta s = {\displaystyle \int_a^b} \delta L\, dx

Then inside the integral we have a varied Lagrangian:
\delta L = \frac{\partial L}{\partial y} \delta y + \frac{\partial L}{\partial y_x} \delta y_x

Here we’ll use some substitution from the product rule, to substitute the second term, so that we are varying in terms of \delta y and not both \delta y and \delta y_x:
\frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right) = \frac{d}{dx} \frac{\partial L}{\partial y_x } \delta y + \frac{\partial L}{\partial y_x } \delta y_x

\delta L = \frac{\partial L}{\partial y} \delta y - \frac{d}{dx} \frac{\partial L}{\partial y_x } \delta y  + \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right)

= \left( \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x } \right) \delta y + \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right)
So now all the variation is only in terms of \delta y.

We can now look at the whole integral and evaluate it.
\delta s = {\displaystyle \int_a^b} \delta L dx

= {\displaystyle \int_a^b} \left( \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x } \right) \delta y dx +  {\displaystyle \int_a^b} \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right) dx

The second integral term is easily evaluated
{\displaystyle \int_a^b} \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right) dx = \frac{\partial L}{\partial y_x } \delta y \bigg|_a^b

At the endpoints of our path x = a and b there is no variation, since they are fixed by definition. So \delta y = 0. And the second term is equal to 0, so we are left with:
\delta s = {\displaystyle \int_a^b} \left( \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x } \right) \delta y dx

Figure below shows variation between the endpoints, but no variation at the endpoints.


Now if we want to minimize the variation in path length, we set the variation to zero: \delta s = 0, and since \delta y \neq 0 between the endpoints, that means:
0 = \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x }
This result is known as the Euler-Lagrange equation, and it is used to compute things like equations of motion.

We can then plug in our Lagrangian from earlier L = \sqrt{1 + y_x^2} into the Euler-Lagrange equation. The first term is easy:
\frac{\partial L}{\partial y} = 0, since there’s no y dependence in our L.
So this only leaves the other term
\frac{d}{dx} \frac{\partial L}{\partial y_x } = 0
which implies
\frac{\partial L}{\partial y_x } = C
Where C is an arbitrary constant from integration.

We then compute the other term:
\frac{\partial L}{\partial y_x } = \frac{y_x}{\sqrt{1 + y_x^2}}
\frac{y_x}{\sqrt{1 + y_x^2}} = C
Now we can solve for y_x so we can then get the form of y:
y_x = \frac{C}{\sqrt{1-C^2}}
Since C is an arbitrary constant, we can just say:
y_x = m
dy = m\,dx

Integrate both sides to get the form of y
We get a factor of x and an arbitrary constant b
y = mx + b
This is a straight line!
So the path with the minimal variation is a straight line.

Figure below shows blue line with arbitrary variation, and black line with minimal variation.

What about higher dimensions of space? We just saw a 1-D path in a 2-D space. (The path with the minimal variation is a straight line). What about a 2-D surface in 3-D space? What is the surface with minimal variation? Instead minimizing length, we’ll now be minimizing surface area.

To characterize the nature of a surface we resort to infinitesimal segments again: except here they are planes instead of hypotenuses.
Figure below shows that a curved surface can be thought of as many infinitesimal planes stitched together:

Here we will use the fact that a plane can be defined by two vectors, or rather the vector resulting from their cross product:
d\vec{u} = (dx,0,dz)
d\vec{v} = (0,dy,dz)
d\vec{A} = d\vec{u} \times d\vec{v} = (-dz\,dy, -dz \, dx, dx \, dy)

Figure below shows how infinitesimal vectors form an infinitesimal plane (segment of a surface):


The magnitude of the cross product vector defining this plane is equivalent to the area between the two original vectors (the area of the plane):
dA = \sqrt{dz^2 dy^2 + dz^2 dx^2 + dx^2 dy^2}
Using the same factoring method we used before, we can express area in terms of derivatives:
dA = \sqrt{ \big( \frac{dz}{dx} \big)^2+ \big( \frac{dz}{dy} \big)^2 + 1} \, dxdy = \sqrt{z_x^2 + z_y^2 + 1} \, dxdy

To add up the infinitesimal area segments we take the integral along both dimensions x and y:
A = \iint\limits_{S} \sqrt{z_x^2 + z_y^2 + 1}\,dxdy

The integrand here is also a Lagrangian:
A = \iint\limits_{S} L \,dxdy
L = \sqrt{z_x^2 + z_y^2 + 1}

Just like with the line, this surface Lagrangian generalizes to:
L = L(z,z_x,z_y)
Notice now there are two derivatives in the Lagrangian.

We can vary this Lagrangian too to minimize the area:
\delta A = \iint\limits \delta L \,dxdy
\delta L = \frac{\partial L}{\partial z} \delta z + \frac{\partial L}{\partial z_x} \delta z_x + \frac{\partial L}{\partial z_y} \delta z_y

We can also use the same substitution trick we used before to vary only in terms of \delta z:
\frac{d}{dx} \left( \frac{\partial L}{\partial z_x } \delta z \right) = \frac{d}{dx} \frac{\partial L}{\partial z_x } \delta z + \frac{\partial L}{\partial z_x } \delta z_x
\frac{d}{dy} \left( \frac{\partial L}{\partial z_y } \delta z \right) = \frac{d}{dy} \frac{\partial L}{\partial z_y } \delta z + \frac{\partial L}{\partial z_y} \delta z_y

\delta L = \frac{\partial L}{\partial z} \delta z + \frac{d}{dx} \left( \frac{\partial L}{\partial z_x } \delta z \right) - \frac{d}{dx} \frac{\partial L}{\partial z_x } \delta z + \frac{d}{dy} \left( \frac{\partial L}{\partial z_y } \delta z \right) - \frac{d}{dy} \frac{\partial L}{\partial z_y } \delta z

\delta A = \iint \frac{\partial L}{\partial z} \delta z \,dxdy + \int \frac{\partial L}{\partial z_x } \delta z \,dy  \big|- \iint \frac{d}{dx} \frac{\partial L}{\partial z_x } \delta z \,dxdy + \int\frac{\partial L}{\partial z_y } \delta z \,dx \big|- \iint \frac{d}{dy} \frac{\partial L}{\partial z_y } \delta z \,dxdy
At the endpoints \delta z = 0, so the terms with pipes vanish.

If we then plug \delta L back into the surface integral and minimize variation:
\delta A = 0
0 = \iint \left(  \frac{\partial L}{\partial z}  - \frac{d}{dx} \frac{\partial L}{\partial z_x }  - \frac{d}{dy} \frac{\partial L}{\partial z_y } \right)\delta z \, dxdy

Between the endpoints there is variation \delta z \neq 0:
So the Euler-Lagrange equation in the 2-D case is:
0 = \frac{\partial L}{\partial z}  - \frac{d}{dx} \frac{\partial L}{\partial z_x }  - \frac{d}{dy} \frac{\partial L}{\partial z_y }

We can then plug in our Lagrangian into this Euler-Lagrange equation:
L = \sqrt{z_x^2 + z_y^2 + 1}

What we end up with is something not as simple as a straight line. We get an equation that describes what are called minimal surfaces:
0 = z_{xx} (z_y^2 + 1) + z_{yy} (z_x^2 + 1) - 2 z_x z_y z_{xy}
This equation is also known as Lagrange’s equation.

The higher dimensional analog to a straight line, a plane, satisfies Lagrange’s equation
z = \alpha x + \beta y + \gamma
Plug it in and see. But it’s not the only solution!

In 2-D, the only one solution is a straight line. The strangeness comes when we add a dimension. In 3-D, there are more solutions than just the plane!
Other solutions include catenoids, helicoids, and weird things like the Saddle Tower.

Figure below is a catenoid:


Minimal surfaces can be created by dipping wire frames into soapy water. Surprising to me though is that a sphere or spherical bubble: r^2 = x^2 + y^2 + z^2 is not a minimal surface! (If you plug it into the Lagrange equation, you get back a contradiction.)

With no outside forces, a line gives the shortest distance between two points, a plane, a catenoid, etc give the 2-D version of this. But what happens when outside forces, like gravity are present? The shortest distance may no longer be a straight line. The minimal surface may no longer be a flat plane. This is where you can derive things like the shape of catenary cables, and brachistochrones. I’ll leave this topic of applying force for another day.


Ehrenfest theorem allows us to see how physical quantities evolve through time in terms of other physical quantities. In quantum mechanics, physical quantities, like momentum or position, are represented by operators. To get the average value of a physical quantity of a system, we act the operator on a wavefunction. The wavefunction represents the physical system’s probabilistic behavior. The average of a physical quantity is called an “expectation value”.

The expectation value of an operator \mathscr{O} is:
\left< \mathscr{O} \right> = \left< \Psi \left| \mathscr{O} \right| \Psi \right> = \int dx \Psi^{\dagger} \mathscr{O} \Psi
where \Psi is a wavefunction \Psi(x).

This definition allows us to take a time derivative of the expectation value.
\frac{d}{dt}\left< \mathscr{O} \right> = \frac{d}{dt} \left< \Psi \left| \mathscr{O} \right| \Psi \right> = \frac{d}{dt} \int dx \Psi^{\dagger} \mathscr{O} \Psi
The total derivative moves inside the integral as a partial derivative, where we use the product rule to differentiate:
= \int dx (\frac{\partial}{\partial t} \Psi^{\dagger} \mathscr{O} \Psi + \Psi^{\dagger} \frac{\partial}{\partial t} \mathscr{O} \Psi + \Psi^{\dagger} \mathscr{O} \frac{\partial}{\partial t} \Psi)

A time derivative acting on a wave function is equivalent to a Hamiltonian operator acting on a wave function (with a factor of \frac{1}{i \hbar}):
\frac{\partial}{\partial t} \Psi = \frac{1}{i \hbar} H \Psi
Take the complex conjugate:
\frac{\partial}{\partial t} \Psi^{\dagger} = \frac{-1}{i \hbar} \Psi^{\dagger} H

We can then swap the time derivatives for Hamiltonians:
\frac{d}{dt}\left< \mathscr{O} \right> = \int dx ( \frac{-1}{i \hbar} \Psi^{\dagger} H \mathscr{O} \Psi + \Psi^{\dagger} \frac{\partial}{\partial t} \mathscr{O} \Psi + \frac{1}{i \hbar} \Psi^{\dagger} \mathscr{O} H \Psi)
= \frac{-1}{i \hbar} \left< \Psi \left| H \mathscr{O} \right| \Psi \right> + \left< \Psi \left| \frac{\partial}{\partial t} \mathscr{O} \right| \Psi \right> + \frac{1}{i \hbar} \left< \Psi \left| \mathscr{O} H \right| \Psi \right>
The first and last terms can be combined using a commutator:
= \left< \Psi \left| \frac{\partial}{\partial t} \mathscr{O} \right| \Psi \right> + \frac{1}{i \hbar} \left< \Psi \left| [\mathscr{O},H] \right| \Psi \right>

So then we have Ehrenfest Theorem, relating the time derivative of an expectation value to the expectation value of a time derivative:
\frac{d}{dt}\left< \mathscr{O} \right> = \left< \frac{\partial}{\partial t} \mathscr{O} \right> + \frac{1}{i\hbar}\left< [\mathscr{O},H] \right>

The general form of the Hamiltonian H, has a momentum-dependent kinetic energy term, and a position-dependent potential energy term V(x).
H = \frac{p^2}{2m} + V(x)

As an example, let’s see how changes in momentum over time can be expressed.
\frac{d}{dt}\left< p \right> = \left< \frac{\partial}{\partial t} p \right> + \frac{1}{i\hbar}\left< [p,H] \right>
Momentum doesn’t have an explicit time-dependence, so the first term is zero. Further, operators commute with themselves: [p,p^2] = 0, so the Hamiltonian reduces to just the potential energy term. So we’re left with:
\frac{d}{dt}\left< p \right> = \frac{1}{i\hbar}\left< [p,V(x)] \right>

To see what this remaining commutator of operators reduces to, we will have to use a little calculus. Momentum expressed in terms of position is essentially the derivative operator:
p = -i\hbar \frac{d}{dx}. Keep in mind that potential energy is a function of position which is why momentum does not commute with it.
Since we are dealing with operators, we need them to act on something: we’ll use a dummy wavefunction \Phi(x). Then just remember the product rule for derivatives:
[p,V]\Phi = -i\hbar[\frac{d}{dx},V]\Phi = -i\hbar(\frac{d}{dx}V\Phi - V\frac{d}{dx}\Phi) = -i\hbar(\frac{d}{dx}V)\Phi
So then we can see that the change in the expectation value of momentum with respect to time is:
\frac{d}{dt}\left< p \right> =-\left< \frac{d}{dx}V(x) \right>
On the left you have impulse over change in time, and on the right you have change in potential energy over change is position. Both are ways of measuring force in classical physics. In fact, this Newton’s second law!
\frac{d}{dt}\left< p \right> = \left< F \right>

If we do the same for change in expectation value of position with respect to time, we get an equation for velocity:
\frac{d}{dt}\left< x \right> = \left< \frac{\partial}{\partial t} x \right> + \frac{1}{i\hbar}\left< [x,H] \right>
Again, position has no explicit time-dependence, so the first term is zero. However, now position commutes with potential energy (since it’s just a function of position), and position does not commute with kinetic energy (momentum). So we’re left with:
\frac{d}{dt}\left< x \right> = \frac{1}{i\hbar} \frac{1}{2m}\left< [x,p^2] \right>
If we go through the dummy wavefunction process we’ll arrive at:
\frac{d}{dt}\left< x \right> = \frac{\left< p \right>}{m}

What if we try a much more complication operator, like the translation operator? I covered how this operator works in a previous blog entry. A translation operator shifts a wavefunction’s position by some distance a:
T(a)\Psi(x) = \Psi(x+a)
What does the change in this expectation value of this operator look like?
\frac{d}{dt}\left< T(a) \right> = \left< \frac{\partial}{\partial t} T(a) \right> + \frac{1}{i\hbar}\left< [T(a),H] \right>
There’s no explicit time dependence here, so the first term is zero. The commutator term is all that is left. We must now consider the explicit form of T(a).
T(a) = e^{a \frac{d}{dx}}
It only contains x derivative operators (momentum operators) and so it commutes with the kinetic energy term in the Hamiltonian, but not the potential energy term.
\frac{d}{dt}\left< T(a) \right> = \frac{1}{i\hbar}\left< [e^{a \frac{d}{dx}},V(x)] \right>
How do we evaluate something like this with a derivative operator in the exponent? This is what Taylor expansions are for!
e^{a \frac{d}{dx}} = 1+ \sum\limits_{n=1}^{\infty} \frac{a^n}{n!} (\frac{d}{dx})^n
After using a dummy wavefunction and Pascal’s triangle a bit, you get:
\frac{d}{dt}\left< T(a) \right> = \frac{1}{i\hbar}\sum\limits_{n=1}^{\infty} \frac{a^n}{n!} \sum\limits_{m=0}^{n-1}  \frac{n!}{m!(n-m)!} \left< ((\frac{d}{dx})^{n-m}V) (\frac{d}{dx})^m \right>
which can be expressed as a sum of products of force derivatives and momentum.
-\sum\limits_{n=1}^{\infty} \sum\limits_{m=0}^{n-1}  \frac{1}{(i\hbar)^{m+1}}\frac{a^n}{m!(n-m)!} \left< ((\frac{d}{dx})^{n-m-1}F) p^m  \right>
This tells us the change in a translation operator with respect to time can be quantified with this particular series of force and momentum products. This result looks messy, but it seems intuitive that force would be involved here.