Category Archives: math

How to Measure Euler’s Number by Shuffling Cards.

Previously on this blog, I have talked about Buffon’s Needle, which is a way to measure \pi using random numbers. In this entry, I will be talking about a way to measure another fundamental constant e, also by using random numbers.

To measure e, we will shuffle a deck of cards many times and count the number of shuffles that result in the deck being deranged. This means that no card remains in its original position in the deck. This type of arrangement is called a “derangement”.

We will start by computing the theoretical probability (p) getting a derangement, which turns out to be 1/e or about 36.7%, for a sufficiently large number of cards. We will then experimentally measure e by shuffling cards.

To compute the theoretical probability p, we need to understand two numbers:
The first is the total number of arrangements that the cards can be in (R).
The second is how many of those arrangements are derangements (D).
The theoretical probability is the ratio of those two numbers:
p = D/R
and it approaches 1/e.
(Or you could say R/D approaches e.)

The first number is easy, it’s just the factorial of the number of cards in the deck (n).
R_n = n!
The second number D_n is not as straightforward. We have to count the number of derangements a deck can have.

To show a simple example of counting derangements, let’s look at a deck of only 3 cards:
cards
There are 6 ways to arrange these 3 cards (3! = 6)
A,2,3
A,3,2
2,A,3
3,A,2  (this one is a derangement)
2,3,A  (this one is a derangement)
3,2,A

Only 2 of these arrangements have every card moved from its original position. So the number of derangements for 3 cards is 2. So in this case, the probability of getting a derangement is about 33%.
D_3 = 2
p_3 = D/R = 2/3! = 1/3
That’s close to 1/e, but not quite, because it turns out 3 cards are just not enough to produce that probability. We need more.

In general, with a deck of n cards, we can arrange any number k of those cards in P_{nk} number of ways, where:
P_{nk} = \frac{n!}{(n-k)!}
We can use this formula to count the number of ways there are to rearrange a subset k of the cards, while n-k cards remain in a fixed position (we are counting the number of permutations).

Note that n-k \neq 0, then by definition, the corresponding arrangements will not be derangements. If we can count all of these, then we can subtract this number of arrangements from the total number n! to get the number of derangements.

Let’s see this in action by going back to the deck of 3 cards. If we select all the cards to rearrange, we can get all 6 possible permutations:
P_{3,3} = \frac{3!}{(3-3)!} = \frac{3!}{0!} = 3! = 6
A,2,3
A,3,2
2,A,3
3,A,2
2,3,A
3,2,A

If we select only 2 cards out of the 6 to rearrange, we only get a subset of the arrangements. However if we count how many there are with the formula we get 6 permutations:
P_{3,2} = \frac{3!}{(3-2)!} = \frac{3!}{1!} = 3! = 6
So there are 6 ways to arrange 2 cards (6 ways to hold 1 card fixed). But why do we get 6 permutations here? Well, let’s look at all the ways to do this.
If we select 2 and 3 to rearrange, we get 2 permutations:
A,2,3
A,3,2
If we select A and 3 to rearrange, we get 2 permutations:
A,2,3
3,2,A
And if select A and 2 to rearrange, we get 2 permutations:
A,2,3
2,A,3
Notice however that A,2,3 occurs 3 times, and so there are actually 4 unique arrangements here, and they are not derangements by definition. So we have 6 - 4 = 2 derangements for 6 cards.
But is there a way to count unique arrangements without doing it by hand like we did here? Let’s keep going with this pattern.

If we select only 1 card to rearrange, then that’s trivial, because you can’t rearrange one card.
P_{3,1} = \frac{3!}{(3-1)!} = \frac{3!}{2!} = 3
If selecting A:
A,2,3
If selecting 2:
A,2,3
If selecting 3:
A,2,3
All 3 permutations are the same arrangement!

Then if we select no cards to rearrange, this is also trivial.
P_{3,0} = \frac{3!}{(3-0)!} = 1
If selecting nothing:
A,2,3
That’s just 1 permutation.

All duplicate permutations, are accounted for here. The number of permutations for P_{3,0} accounts for one of the duplicates in P_{3,1}:
P_{3,1} - P_{3,0} = 3 - 1 = 2
The permutations left in this difference account for 2 of the duplicates in P_{3,2}:
P_{3,2} - (P_{3,1} - P_{3,0}) = 6 - 2 = 4
And you are left with the unique 4 arrangements that are not derangements.
So to get the number of derangements, we just have to subtract this from the total number of arrangements P_{3,3}:
P_{3,3} - (P_{3,2} - (P_{3,1} - P_{3,0})) = 6 - 4 = 2
And so we see that D_3 = 2

We have a pattern here. We can see that by distributing the negative signs in the above equation we get:
D_3 = P_{3,3} - P_{3,2} + P_{3,1} - P_{3,0}

This can be expressed as a sum:
D_3 = \sum\limits_{k = 0}^3 P_{3,k} (-1)^{3-k}
Or more explicitly:
D_3 = \sum\limits_{k = 0}^3 \frac{3!}{(3-k)!} (-1)^{3-k}

It turns out that this pattern holds for decks of any number of cards n (just replace the 3 here):
D_n = \sum\limits_{k = 0}^n \frac{n!}{(n-k)!} (-1)^{n-k}

For a deck of 3 cards, the number of derangements is: D_3 = 2
For a deck of 4 cards: D_4 = 9
For a deck of 5 cards: D_5 = 44
which then give us the probabilities of derangement:
p_3 = 2/3! = 33.3\%
p_4 = 9/4! = 37.5\%
p_5 = 44/5! = 36.7\%
So by 4 cards, we start seeing a probability closer to 1/e.

Now, to get the probability p of getting a derangement for n cards:
p_n = \frac{D_n}{R_n} = \frac{1}{n!}\,\sum\limits_{k = 0}^n \frac{n!}{(n-k)!} (-1)^{n-k}
The n! factors cancel out, and we are left with:
p_n = \sum\limits_{k = 0}^n \frac{1}{(n-k)!} (-1)^{n-k}

Since the order of addition doesn’t matter, we can change the direction of this sum to start at n and end at 0:
p_n = \sum\limits_{k = n}^0 \frac{1}{(n-k)!} (-1)^{n-k}
If we do some relabeling, the relationship to e will become more obvious. Let’s set m = n-k, so we can write:
p_n = \sum\limits_{m=0}^n \frac{1}{m!} (-1)^{m}
This looks like a truncated Taylor expansion of the exponential function e^x.
A Taylor expansion is a way of expressing a function as a polynomial.

The Taylor expansion of the exponential function is:
e^x = \sum\limits_{m=0}^\infty \frac{1}{m!} x^m
If we plug in x = -1 we get:
e^{-1} = \sum\limits_{m=0}^\infty \frac{1}{m!} (-1)^m
which is the limiting case of a deck with infinite cards:
p_\infty = \frac{1}{e}

This equivalence to the Taylor expansion is the reason why as you increase the number of cards, the probability of getting a derangement approaches 1/e.
To put it another way:
\frac{1}{p} = \frac{R}{D} \approx e.

Now that we have seen the theoretical probability, let’s do the experiment.
Take a standard 52 card deck, note the original positions of the cards, then shuffle and see if you have a derangement. Keep shuffling and noting if the result is a derangement. Keep doing this about 40,000 times. Count the total number of shuffles R and the number of derangements D you get.
If you then take the ratio \frac{R}{D} you will get something approaching e.

Just kidding. That would be extremely tedious. Instead, we can have a computer it for us.
eulerfig3

For a deck of 52 cards, it seems to take about 40k shuffles for the ratio of shuffles to derangements to approach e. See the figure above.

The code used to compute e from shuffles is shown below:
Here I used MATLAB code, to take advantage of vectorization and logical array operations. The inputs are cards (the number of cards in a deck) and games (the number of shuffles). Wins (a win is when you get a derangement) are then counted.

 function eulersnum = eulerdeck(cards,games)
 wins = 0;
 series = 1:cards;
 for ii=1:games
  shuffled = randperm(cards);
  wins = wins + ~any(series == shuffled);
 end
 eulersnum = games/wins;
end 

The known value of e is 2.7183…
To get an accurate measurement we should use a deck with a lot of cards, since that will include more terms of the Taylor expansion. We should also shuffle many times to be sure that we capture the probability.
So with a deck of 1000 cards, and 10 trials, each of 1000 shuffles, we measure e to be:
2.7265 \pm 0.1060

Advertisements

Strangeness from Adding a Dimension

The shortest distance between two points is a straight line (if there’s nothing in the way). But how can we prove this?

Imagine some arbitrarily curved path defined by the function y between two fixed points: this is clearly not the shortest path. Figure below shows curved path between two fixed points (spanning the interval of x=a to x=b).

path2

What is the distance along this path between these two fixed points? Solving this problem is clearly not as simple as using the Pythagorean theorem, or is it?

If we break up the length of the path s into infinitesimal segments of size ds, we can then use the Pythagorean theorem to characterize the length of the segments:
d\vec{s} = (dy,dx)
Where dx and dy are the infinitesimal legs, and ds is the hypotenuse.
ds = \sqrt{dx^2 + dy^2}

Figure below shows an infinitesimal line segment of a curved path.

path5

Computing the entire length of the path is then a matter of adding up all these infinitesimal ds segments. To do this, take note that the first derivative of the path’s function y is the ratio of the infinitesimal legs (this is the same thing as slope).
y_x = \frac{dy}{dx}
and so the length of the hypotenuse can be expressed in terms of the derivative of y:
ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dy}{dx} \right)^2} \, dx = \sqrt{1 + y_x^2} \, dx

The total length s along path from x = a to x = b is then the sum of all the ds segments from a to b. You find the sum of infinitesimals by taking the integral:
s =  {\displaystyle \int} ds = {\displaystyle \int_a^b} \sqrt{1 + y_x^2}\, dx

Alright, so we are able to measure the length of a path by evaluating this integral. But how does this prove the shortest path is a straight line? For this we need to delve into what is known as “calculus of variations”. What we want to do is minimize the distance along the path between two points, and see if the resulting function y defines a straight line.

For the sake of simplicity, we can express the integrand above as L.
s = {\displaystyle \int_a^b} L\, dx
where:
L = \sqrt{1 + y_x^2}

A more generalized path from a to b is a function of position y and slope y_x, which we can write as:
L = L(y,y_x)
which is also know as a Lagrangian.

We want to minimize the length of path, so that means we want to minimize any deviation, or variation from the shortest path.
To characterize how much variation is in the length of a path, we can vary the integral using \delta to signify what is being varied. This works like a derivative operator:
\delta s = {\displaystyle \int_a^b} \delta L\, dx

Then inside the integral we have a varied Lagrangian:
\delta L = \frac{\partial L}{\partial y} \delta y + \frac{\partial L}{\partial y_x} \delta y_x

Here we’ll use some substitution from the product rule, to substitute the second term, so that we are varying in terms of \delta y and not both \delta y and \delta y_x:
\frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right) = \frac{d}{dx} \frac{\partial L}{\partial y_x } \delta y + \frac{\partial L}{\partial y_x } \delta y_x

\delta L = \frac{\partial L}{\partial y} \delta y - \frac{d}{dx} \frac{\partial L}{\partial y_x } \delta y  + \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right)

= \left( \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x } \right) \delta y + \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right)
So now all the variation is only in terms of \delta y.

We can now look at the whole integral and evaluate it.
\delta s = {\displaystyle \int_a^b} \delta L dx

= {\displaystyle \int_a^b} \left( \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x } \right) \delta y dx +  {\displaystyle \int_a^b} \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right) dx

The second integral term is easily evaluated
{\displaystyle \int_a^b} \frac{d}{dx} \left( \frac{\partial L}{\partial y_x } \delta y \right) dx = \frac{\partial L}{\partial y_x } \delta y \bigg|_a^b

At the endpoints of our path x = a and b there is no variation, since they are fixed by definition. So \delta y = 0. And the second term is equal to 0, so we are left with:
\delta s = {\displaystyle \int_a^b} \left( \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x } \right) \delta y dx

Figure below shows variation between the endpoints, but no variation at the endpoints.

path3

Now if we want to minimize the variation in path length, we set the variation to zero: \delta s = 0, and since \delta y \neq 0 between the endpoints, that means:
0 = \frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y_x }
This result is known as the Euler-Lagrange equation, and it is used to compute things like equations of motion.

We can then plug in our Lagrangian from earlier L = \sqrt{1 + y_x^2} into the Euler-Lagrange equation. The first term is easy:
\frac{\partial L}{\partial y} = 0, since there’s no y dependence in our L.
So this only leaves the other term
\frac{d}{dx} \frac{\partial L}{\partial y_x } = 0
which implies
\frac{\partial L}{\partial y_x } = C
Where C is an arbitrary constant from integration.

We then compute the other term:
\frac{\partial L}{\partial y_x } = \frac{y_x}{\sqrt{1 + y_x^2}}
\frac{y_x}{\sqrt{1 + y_x^2}} = C
Now we can solve for y_x so we can then get the form of y:
y_x = \frac{C}{\sqrt{1-C^2}}
Since C is an arbitrary constant, we can just say:
y_x = m
dy = m\,dx

Integrate both sides to get the form of y
We get a factor of x and an arbitrary constant b
y = mx + b
This is a straight line!
So the path with the minimal variation is a straight line.

Figure below shows blue line with arbitrary variation, and black line with minimal variation.
path4

What about higher dimensions of space? We just saw a 1-D path in a 2-D space. (The path with the minimal variation is a straight line). What about a 2-D surface in 3-D space? What is the surface with minimal variation? Instead minimizing length, we’ll now be minimizing surface area.

To characterize the nature of a surface we resort to infinitesimal segments again: except here they are planes instead of hypotenuses.
Figure below shows that a curved surface can be thought of as many infinitesimal planes stitched together:
surface

Here we will use the fact that a plane can be defined by two vectors, or rather the vector resulting from their cross product:
d\vec{u} = (dx,0,dz)
d\vec{v} = (0,dy,dz)
d\vec{A} = d\vec{u} \times d\vec{v} = (-dz\,dy, -dz \, dx, dx \, dy)

Figure below shows how infinitesimal vectors form an infinitesimal plane (segment of a surface):

SURFACEVEC

The magnitude of the cross product vector defining this plane is equivalent to the area between the two original vectors (the area of the plane):
dA = \sqrt{dz^2 dy^2 + dz^2 dx^2 + dx^2 dy^2}
Using the same factoring method we used before, we can express area in terms of derivatives:
dA = \sqrt{ \big( \frac{dz}{dx} \big)^2+ \big( \frac{dz}{dy} \big)^2 + 1} \, dxdy = \sqrt{z_x^2 + z_y^2 + 1} \, dxdy

To add up the infinitesimal area segments we take the integral along both dimensions x and y:
A = \iint\limits_{S} \sqrt{z_x^2 + z_y^2 + 1}\,dxdy

The integrand here is also a Lagrangian:
A = \iint\limits_{S} L \,dxdy
L = \sqrt{z_x^2 + z_y^2 + 1}

Just like with the line, this surface Lagrangian generalizes to:
L = L(z,z_x,z_y)
Notice now there are two derivatives in the Lagrangian.

We can vary this Lagrangian too to minimize the area:
\delta A = \iint\limits \delta L \,dxdy
\delta L = \frac{\partial L}{\partial z} \delta z + \frac{\partial L}{\partial z_x} \delta z_x + \frac{\partial L}{\partial z_y} \delta z_y

We can also use the same substitution trick we used before to vary only in terms of \delta z:
\frac{d}{dx} \left( \frac{\partial L}{\partial z_x } \delta z \right) = \frac{d}{dx} \frac{\partial L}{\partial z_x } \delta z + \frac{\partial L}{\partial z_x } \delta z_x
\frac{d}{dy} \left( \frac{\partial L}{\partial z_y } \delta z \right) = \frac{d}{dy} \frac{\partial L}{\partial z_y } \delta z + \frac{\partial L}{\partial z_y} \delta z_y

\delta L = \frac{\partial L}{\partial z} \delta z + \frac{d}{dx} \left( \frac{\partial L}{\partial z_x } \delta z \right) - \frac{d}{dx} \frac{\partial L}{\partial z_x } \delta z + \frac{d}{dy} \left( \frac{\partial L}{\partial z_y } \delta z \right) - \frac{d}{dy} \frac{\partial L}{\partial z_y } \delta z

\delta A = \iint \frac{\partial L}{\partial z} \delta z \,dxdy + \int \frac{\partial L}{\partial z_x } \delta z \,dy  \big|- \iint \frac{d}{dx} \frac{\partial L}{\partial z_x } \delta z \,dxdy + \int\frac{\partial L}{\partial z_y } \delta z \,dx \big|- \iint \frac{d}{dy} \frac{\partial L}{\partial z_y } \delta z \,dxdy
At the endpoints \delta z = 0, so the terms with pipes vanish.

If we then plug \delta L back into the surface integral and minimize variation:
\delta A = 0
0 = \iint \left(  \frac{\partial L}{\partial z}  - \frac{d}{dx} \frac{\partial L}{\partial z_x }  - \frac{d}{dy} \frac{\partial L}{\partial z_y } \right)\delta z \, dxdy

Between the endpoints there is variation \delta z \neq 0:
So the Euler-Lagrange equation in the 2-D case is:
0 = \frac{\partial L}{\partial z}  - \frac{d}{dx} \frac{\partial L}{\partial z_x }  - \frac{d}{dy} \frac{\partial L}{\partial z_y }

We can then plug in our Lagrangian into this Euler-Lagrange equation:
L = \sqrt{z_x^2 + z_y^2 + 1}

What we end up with is something not as simple as a straight line. We get an equation that describes what are called minimal surfaces:
0 = z_{xx} (z_y^2 + 1) + z_{yy} (z_x^2 + 1) - 2 z_x z_y z_{xy}
This equation is also known as Lagrange’s equation.

The higher dimensional analog to a straight line, a plane, satisfies Lagrange’s equation
plane.png
z = \alpha x + \beta y + \gamma
Plug it in and see. But it’s not the only solution!

In 2-D, the only one solution is a straight line. The strangeness comes when we add a dimension. In 3-D, there are more solutions than just the plane!
Other solutions include catenoids, helicoids, and weird things like the Saddle Tower.

Figure below is a catenoid:

catenoid2

Minimal surfaces can be created by dipping wire frames into soapy water. Surprising to me though is that a sphere or spherical bubble: r^2 = x^2 + y^2 + z^2 is not a minimal surface! (If you plug it into the Lagrange equation, you get back a contradiction.)

With no outside forces, a line gives the shortest distance between two points, a plane, a catenoid, etc give the 2-D version of this. But what happens when outside forces, like gravity are present? The shortest distance may no longer be a straight line. The minimal surface may no longer be a flat plane. This is where you can derive things like the shape of catenary cables, and brachistochrones. I’ll leave this topic of applying force for another day.

Is Math Broken?

On the sixth anniversary of our relationship, I dedicate this entry to my fiancée Amy.

What is 1 + 2 + 3+ 4+ 5 + ...? Infinite, right? Well, this is apparently not the only answer. You can find a lot of stuff on the web explaining how this sum is equivalent to the huge number -\frac{1}{12}. I’m not kidding. How can something seemingly infinite and positive be equal to a negative fraction?

Folks on the web usually show some manipulation of the sum that results in this weird answer. I have not found these methods satisfactory, so I worked through it myself in an effort to convince myself of this absurdity. In this entry I show my work.

First, I start with the the Riemann zeta Function:
\zeta(s) = \sum\limits_{n=1}^\infty \frac{1}{n^s}
and the Dirichlet eta Function:
\eta(s) = \sum\limits_{n=1}^\infty \frac{(-1)^{n+1}}{n^s}

We can show there is a relationship between these two sums.
\eta(s) - \zeta(s) = \sum\limits_{n=1}^\infty \frac{(-1)^{n+1} - 1}{n^s}
Here it can be seen that odd n terms are 0, and even n terms are -\frac{2}{n^s}. This can be reduced to:
= \sum\limits_{n=1}^\infty - \frac{2}{2^s n^s}
which is just the Riemann zeta Function with a coefficient:
= -\frac{2}{2^s}\sum\limits_{n=1}^\infty \frac{1}{n^s} = -2^{1-s} \zeta(s)
So now we can express the difference between eta and zeta as:
\eta(s) - \zeta(s) = -2^{1-s}\zeta(s)
therefore:
\eta(s) = (1-2^{1-s})\zeta(s)

If we plug in some numbers for s into the Riemann and Dirichlet Functions we get a few series that we will evaluate:
\zeta(-1) = \sum\limits_{n=1}^\infty n = 1 + 2 + 3 + 4 + 5 + ...

\eta(-1) = \sum\limits_{n=1}^\infty (-1)^{n+1} n = 1 - 2 + 3 - 4 + 5 - ...

\eta(0) =  \sum\limits_{n=1}^\infty (-1)^{n+1}  = 1 - 1 + 1 - 1 + 1 - ...

Let’s look at that third series. We can use the recursion of \eta(0) to show:
\eta(0) = 1 - \eta(0)
and by solving for \eta(0) we get:
\eta(0) = \frac{1}{2}

You can also get this result with the geometric series:
\sum\limits_{n=0}^\infty x^n = \frac{1}{1-x}
Plug in x = -1 to get:
\sum\limits_{n=0}^\infty (-1)^n = \frac{1}{2}

Now, this is weird but sort of makes sense. The partial sum of \eta(0) bounces around between 1 and 0. So this result of \frac{1}{2} is like the average of this bouncing.

The figure below shows the partial sum of \eta(0) in blue, bouncing. The “total” of \frac{1}{2} is shown in black.

onehalf

What about \eta(-1) = 1 - 2 + 3 -4 + ...? Does this sum result in some finite value like \eta(0)? To find out, let’s use a similar method to what we did previously: we will take the difference between two series to see if the result is one of the series used in the difference.
\eta(-1) - \eta(0) = \sum\limits_{n=1}^\infty  (-1)^{n+1} n - (-1)^{n+1} = \sum\limits_{n=1}^\infty  (-1)^{n+1} (n-1)
The first term of this series is 0, so it is equivalent to starting the index at n=2:
= \sum\limits_{n=2}^\infty  (-1)^{n+1} (n-1)
We can then arbitrarily adjust the index by saying n = m + 1:
= \sum\limits_{m=1}^\infty  (-1)^{m+2} m
This is equivalent to -\eta(-1). So now we have:
\eta(-1) - \eta(0) = -\eta(-1)
We know that \eta(0) = \frac{1}{2}, so it is a matter of solving for \eta(-1):
\eta(-1) = \frac{1}{4}

Weird! This implies that 1-2+3-4+5-... = \frac{1}{4}.
This is sort of like an average too though. The partial sum is bouncing around a center here too. Even though the magnitude of the partial sum is increasing, it is bouncing around a center of \frac{1}{4}.

The figure below shows the partial sum of \eta(-1) in blue. The center of \frac{1}{4} in black, and the lines bounding the partial sum in red (note where they intersect).

oneforth3

We can now take advantage of that equation we derived at the beginning to find the value for \zeta(-1):
\eta(s) = (1-2^{1-s})\zeta(s)
Plug in s = -1:
\eta(-1) = -3 \zeta(-1)
We know that \eta(-1) = \frac{1}{4}, so we can solve for \zeta(-1):
\zeta(-1) = -\frac{1}{12}
Now this is especially weird because, remember what the definition of \zeta(-1) is above: it implies that
1 + 2 + 3+ 4+ 5 + ... = -\frac{1}{12}

The partial sum of this series is not bouncing around some value, so what is this?
The figure below shows the partial sum of \zeta(-1) in blue. The result of the infinite sum of -\frac{1}{12} in black.

onetwelvth

Is math broken? Is this an inconsistency, a la Gödel? This sort of thing actually shows up in nature: namely the Casimir Effect, so there is something real happening here.

This result is also known as a Ramanujan sum: where the partial sums do not converge, yet you can arrive at a finite value that characterizes an infinite sum. This sum is not the only Ramanujan sum that causes head scratching. Learn some more of them and baffle your friends at parties.

Ehrenfestival

Ehrenfest theorem allows us to see how physical quantities evolve through time in terms of other physical quantities. In quantum mechanics, physical quantities, like momentum or position, are represented by operators. To get the average value of a physical quantity of a system, we act the operator on a wavefunction. The wavefunction represents the physical system’s probabilistic behavior. The average of a physical quantity is called an “expectation value”.

The expectation value of an operator \mathscr{O} is:
\left< \mathscr{O} \right> = \left< \Psi \left| \mathscr{O} \right| \Psi \right> = \int dx \Psi^{\dagger} \mathscr{O} \Psi
where \Psi is a wavefunction \Psi(x).

This definition allows us to take a time derivative of the expectation value.
\frac{d}{dt}\left< \mathscr{O} \right> = \frac{d}{dt} \left< \Psi \left| \mathscr{O} \right| \Psi \right> = \frac{d}{dt} \int dx \Psi^{\dagger} \mathscr{O} \Psi
The total derivative moves inside the integral as a partial derivative, where we use the product rule to differentiate:
= \int dx (\frac{\partial}{\partial t} \Psi^{\dagger} \mathscr{O} \Psi + \Psi^{\dagger} \frac{\partial}{\partial t} \mathscr{O} \Psi + \Psi^{\dagger} \mathscr{O} \frac{\partial}{\partial t} \Psi)

A time derivative acting on a wave function is equivalent to a Hamiltonian operator acting on a wave function (with a factor of \frac{1}{i \hbar}):
\frac{\partial}{\partial t} \Psi = \frac{1}{i \hbar} H \Psi
Take the complex conjugate:
\frac{\partial}{\partial t} \Psi^{\dagger} = \frac{-1}{i \hbar} \Psi^{\dagger} H

We can then swap the time derivatives for Hamiltonians:
\frac{d}{dt}\left< \mathscr{O} \right> = \int dx ( \frac{-1}{i \hbar} \Psi^{\dagger} H \mathscr{O} \Psi + \Psi^{\dagger} \frac{\partial}{\partial t} \mathscr{O} \Psi + \frac{1}{i \hbar} \Psi^{\dagger} \mathscr{O} H \Psi)
= \frac{-1}{i \hbar} \left< \Psi \left| H \mathscr{O} \right| \Psi \right> + \left< \Psi \left| \frac{\partial}{\partial t} \mathscr{O} \right| \Psi \right> + \frac{1}{i \hbar} \left< \Psi \left| \mathscr{O} H \right| \Psi \right>
The first and last terms can be combined using a commutator:
= \left< \Psi \left| \frac{\partial}{\partial t} \mathscr{O} \right| \Psi \right> + \frac{1}{i \hbar} \left< \Psi \left| [\mathscr{O},H] \right| \Psi \right>

So then we have Ehrenfest Theorem, relating the time derivative of an expectation value to the expectation value of a time derivative:
\frac{d}{dt}\left< \mathscr{O} \right> = \left< \frac{\partial}{\partial t} \mathscr{O} \right> + \frac{1}{i\hbar}\left< [\mathscr{O},H] \right>

The general form of the Hamiltonian H, has a momentum-dependent kinetic energy term, and a position-dependent potential energy term V(x).
H = \frac{p^2}{2m} + V(x)

As an example, let’s see how changes in momentum over time can be expressed.
\frac{d}{dt}\left< p \right> = \left< \frac{\partial}{\partial t} p \right> + \frac{1}{i\hbar}\left< [p,H] \right>
Momentum doesn’t have an explicit time-dependence, so the first term is zero. Further, operators commute with themselves: [p,p^2] = 0, so the Hamiltonian reduces to just the potential energy term. So we’re left with:
\frac{d}{dt}\left< p \right> = \frac{1}{i\hbar}\left< [p,V(x)] \right>

To see what this remaining commutator of operators reduces to, we will have to use a little calculus. Momentum expressed in terms of position is essentially the derivative operator:
p = -i\hbar \frac{d}{dx}. Keep in mind that potential energy is a function of position which is why momentum does not commute with it.
Since we are dealing with operators, we need them to act on something: we’ll use a dummy wavefunction \Phi(x). Then just remember the product rule for derivatives:
[p,V]\Phi = -i\hbar[\frac{d}{dx},V]\Phi = -i\hbar(\frac{d}{dx}V\Phi - V\frac{d}{dx}\Phi) = -i\hbar(\frac{d}{dx}V)\Phi
So then we can see that the change in the expectation value of momentum with respect to time is:
\frac{d}{dt}\left< p \right> =-\left< \frac{d}{dx}V(x) \right>
On the left you have impulse over change in time, and on the right you have change in potential energy over change is position. Both are ways of measuring force in classical physics. In fact, this Newton’s second law!
\frac{d}{dt}\left< p \right> = \left< F \right>

If we do the same for change in expectation value of position with respect to time, we get an equation for velocity:
\frac{d}{dt}\left< x \right> = \left< \frac{\partial}{\partial t} x \right> + \frac{1}{i\hbar}\left< [x,H] \right>
Again, position has no explicit time-dependence, so the first term is zero. However, now position commutes with potential energy (since it’s just a function of position), and position does not commute with kinetic energy (momentum). So we’re left with:
\frac{d}{dt}\left< x \right> = \frac{1}{i\hbar} \frac{1}{2m}\left< [x,p^2] \right>
If we go through the dummy wavefunction process we’ll arrive at:
\frac{d}{dt}\left< x \right> = \frac{\left< p \right>}{m}

What if we try a much more complication operator, like the translation operator? I covered how this operator works in a previous blog entry. A translation operator shifts a wavefunction’s position by some distance a:
T(a)\Psi(x) = \Psi(x+a)
What does the change in this expectation value of this operator look like?
\frac{d}{dt}\left< T(a) \right> = \left< \frac{\partial}{\partial t} T(a) \right> + \frac{1}{i\hbar}\left< [T(a),H] \right>
There’s no explicit time dependence here, so the first term is zero. The commutator term is all that is left. We must now consider the explicit form of T(a).
T(a) = e^{a \frac{d}{dx}}
It only contains x derivative operators (momentum operators) and so it commutes with the kinetic energy term in the Hamiltonian, but not the potential energy term.
\frac{d}{dt}\left< T(a) \right> = \frac{1}{i\hbar}\left< [e^{a \frac{d}{dx}},V(x)] \right>
How do we evaluate something like this with a derivative operator in the exponent? This is what Taylor expansions are for!
e^{a \frac{d}{dx}} = 1+ \sum\limits_{n=1}^{\infty} \frac{a^n}{n!} (\frac{d}{dx})^n
After using a dummy wavefunction and Pascal’s triangle a bit, you get:
\frac{d}{dt}\left< T(a) \right> = \frac{1}{i\hbar}\sum\limits_{n=1}^{\infty} \frac{a^n}{n!} \sum\limits_{m=0}^{n-1}  \frac{n!}{m!(n-m)!} \left< ((\frac{d}{dx})^{n-m}V) (\frac{d}{dx})^m \right>
which can be expressed as a sum of products of force derivatives and momentum.
-\sum\limits_{n=1}^{\infty} \sum\limits_{m=0}^{n-1}  \frac{1}{(i\hbar)^{m+1}}\frac{a^n}{m!(n-m)!} \left< ((\frac{d}{dx})^{n-m-1}F) p^m  \right>
This tells us the change in a translation operator with respect to time can be quantified with this particular series of force and momentum products. This result looks messy, but it seems intuitive that force would be involved here.

Factorials!

If you’ve ever dealt with factorials, you may have felt like yelling, and not because of the exclamation point. Evaluating something like 20! by hand will feel like forever, and you might make a mistake along the way.

20! =1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9 \cdot 10 \cdot 11 \cdot 12 \cdot 13 \cdot 14 \cdot 15 \cdot 16 \cdot 17 \cdot 18 \cdot 19 \cdot 20

You might ask if there a nice formula to evaluate all this multiplication quickly, since there is a nice formula for the sum of all these numbers.

1 + 2 + 3 + 4 + 5 + 6 + 7 +8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 = \frac{1}{2} \cdot 20 \cdot 21 = 210
Or more generally:
\sum\limits_{k=1}^N k = \frac{1}{2} N (N+1)
The reason why this nice addition formula works can be demonstrated with a little geometry. A sum from 1 to N can be arranged like stacked blocks in a triangle (here we’ll go up to 5).

tri1

Then the area of the triangle can be found by taking a twin triangle and fitting it together into a rectangle, then divide by 2 to get the original triangle area (to arrive at 15).

rect1

1+2+3+4+5 = 5\cdot6/2 = 15.
(These sums are called triangular numbers.)

So what about a nice formula for factorials? Well, one was discovered by a happy fellow named Adrien-Marie Legendre (pictured below). It’s called Legendre’s Formula.

Legendre

This formula allows you to express factorials as a product of prime numbers. For example:
10! = 2^8 3^4 5^2 7
This may seem ugly at first, but it is better than chugging through the multiplication to arrive at the long, boring answer of 3628800. Legendre’s formula is faster, and it gives you more information about the number. In any case, you can always evaluate 2^8 3^4 5^2 7 to get 3628800 later. The only caveat to finding N! is knowing all the prime numbers less than or equal to N.

The reason we need the primes is because we want to find the prime factors of N!
Integers are either prime or composite. Composite integers can be broken up into prime factors. For example:
12 = 2\cdot 2\cdot 3
10 = 2\cdot 5
51 = 3\cdot 17
Since a factorial number is definitely composite, it can be broken down into prime factors as well.

Let’s use 10! as our example:
10! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9 \cdot 10
We want to find all the prime factors of 10! First, we need to know which prime numbers are factors of 10! To do that is easy, find all the primes less than 10, which are:
{2, 3, 5, 7}
No other primes are needed, since we see all the numbers being multiplied in 10! are 10 and lower. So that is pretty convenient. We know from this that 10! = 2^a 3^b 5^c 7^d. So now we want to find a,b,c and d.

We will start with 2. How many factors of 2 are there in 10! We can start by underlining the multiples of 2 in the factorial:
1 \cdot \underline{2} \cdot 3 \cdot \underline{4} \cdot 5 \cdot \underline{6} \cdot 7 \cdot \underline{8} \cdot 9 \cdot \underline{10}
There are 5 multiples of 2. So make a list of 1 to 5 and find the multiples of 2 in that list:
1, \underline{2}, 3, \underline{4}, 5
There are 2 multiples of 2 in this list. So make a list of 1 to 2 and find multiples of 2 in that list:
1, \underline{2}
There is 1 multiple of 2 in this list, and 1 is less than 2, so we can stop.
We can then sum these numbers from each list:
5 + 2 + 1 = 8
There are 8 factors of 2 in 10!

You might have noticed that this sort of recurring list pattern works because some of the numbers between 1 and 10 have more than 1 factor of 2. For example: 4 has 2, and 8 has 3.
So you could have counted factors of 2 more directly:
1 \cdot \underline{2} \cdot 3 \cdot \underline{\underline{4}} \cdot 5 \cdot \underline{6} \cdot 7 \cdot \underline{\underline{\underline{8}}} \cdot 9 \cdot \underline{10}.
Both methods are equivalent, but the previous one requires less thinking in my opinion.

The same is done with 3, 5 and 7. For 3:
1 \cdot 2 \cdot \underline{3} \cdot 4 \cdot 5 \cdot \underline{6} \cdot 7 \cdot 8 \cdot \underline{9} \cdot 10
1,2,\underline{3}
Count the underlines to see that there are 4 factors of 3 in 10!

For 5:
1 \cdot 2 \cdot 3 \cdot 4 \cdot \underline{5} \cdot 6 \cdot 7 \cdot 8 \cdot 9 \cdot \underline{10}
There are 2 factors of 5 in 10!

For 7:
1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot \underline{7} \cdot 8 \cdot 9 \cdot 10

Finally, 1 factor of 7 in 10!

So we can then write:
10! = 2^8 3^4 5^2 7

So if this is Legendre’s Formula, there should be a formula, right? What I showed above seems more like an algorithm, but it is equivalent to:

\nu_p (N!) = \sum\limits_{k=1}^\infty  \lfloor \frac{N}{p^k} \rfloor

Where the \lfloor \rfloor brackets are the floor function, which rounds a number downward to the next integer. For example: \lfloor 2.7 \rfloor = 2.
The resulting \nu_p is the exponent for the prime of interest p.
So going back to the 10! example, for p=2:
\nu_2(10!) =  \lfloor \frac{10}{2} \rfloor + \lfloor \frac{10}{4} \rfloor + \lfloor \frac{10}{8} \rfloor + \lfloor \frac{10}{16} \rfloor  + ... = 5+2+1 = 8
Then do the same for p= 3,5, and 7, to arrive at 2^8 3^4 5^2 7 once again.

As a side note: even though this sum technically goes to infinity, any term where p^k > N will be pushed to zero by the floor function.

Now that you know how to quickly evaluate a factorial, try evaluating 20!

The Structure Behind Trigonometric Functions

This entry is dedicated to my friend Bud in Milwaukee.

Sine, cosine, tangent, etc. are actually abbreviations for combinations of complex exponential functions:

\text{sin}(\theta) = \frac{1}{2i} (e^{i\theta} - e^{-i\theta})

\text{cos}(\theta) = \frac{1}{2} (e^{i\theta} + e^{-i\theta})

\text{tan}(\theta) = \frac{ e^{i\theta} - e^{-i\theta} }{ i (e^{i\theta} + e^{-i\theta}) }

It’s just easier to write out something like “cos”, instead of the complex exponential form. However, it is handy to know the complex exponential forms to derive trig identities on the fly. So how exactly are trig functions made of complex exponential functions?

In a previous entry, I briefly discuss how the exponential function relates angle to Cartesian coordinates. When you put an imaginary number in the exponential function it results in a complex number (a number with a real and an imaginary component). If you treat the real component like an x coordinate, and the imaginary component like a y coordinate, then any complex number can represent a point  on a 2-D plane.

To visualize what’s actually going on in terms of geometry, see the figure below. Imagine a point (pictured in red) that can travel along a unit circle (a circle with radius 1). You can track the point’s angular position \varphi along the circle, and you can also track its position in terms of x and y. There is a relationship between the two ways to track position, and this relationship can be expressed as:
e^{i\varphi} = x + iy
euler3
At this point, you may see that x = \text{cos}(\varphi) and y = \text{sin}(\varphi).
By definition, cosine is adjacent over hypotenuse h
\text{cos}(\varphi) = x/h
and sine is opposite over hypotenuse
\text{sin}(\varphi) = y/h
and since radius = h = 1, you get:
x = \text{cos}(\varphi) and y = \text{sin}(\varphi)

So e^{i\varphi} = x + iy is really:
e^{i\varphi} = \text{cos}(\varphi) + i \text{sin}(\varphi)

From this equation, you can get the equations listed at the beginning by adding or subtracting complex conjugates. For example:
Take the complex conjugate (which just means to change all the i to -i).
e^{-i\varphi} = \text{cos}(\varphi) - i \text{sin}(\varphi)
Now add it to the original to get:
e^{i\varphi} + e^{-i\varphi} = \text{cos}(\varphi) + i \text{sin}(\varphi) + \text{cos}(\varphi) - i \text{sin}(\varphi)
e^{i\varphi} + e^{-i\varphi} = 2 \text{cos}(\varphi)
Now divide by 2 and you have the identity of cosine shown at the beginning!
\text{cos}(\varphi) = \frac{1}{2} (e^{i\varphi} + e^{-i\varphi})

So how do we know that something like e^{i\varphi} breaks up into a complex number like x + iy? One way is through Taylor expansions. I won’t get into how Taylor expansions are derived, since it requires calculus. The only thing you need to know here is that a function can be expressed as an infinitely long polynomial. In the case of the exponential function:
e^{z} = 1 + z + \frac{1}{2}z^2 + \frac{1}{6}z^3 + \frac{1}{24}z^4 + ...
or to be complete, we can write it as a summation:
e^{z} = \sum\limits_{n=0}^{\infty} \frac{1}{n!} z^n

So if we replace the variable z with i \varphi we get:
e^{i\varphi} = 1 + i\varphi + \frac{1}{2}(i\varphi)^2 + \frac{1}{6}(i\varphi)^3 + \frac{1}{24}(i\varphi)^4 + ...
The terms in the polynomial with odd exponents will be imaginary, and the terms with even exponents will be real. Recall that:
i^0 = 1
i^1 = i
i^2 = -1
i^3 = -i
i^4 = 1
So then we get:
e^{i\varphi} = 1 + i\varphi - \frac{1}{2}\varphi^2 - \frac{1}{6}i\varphi^3 + \frac{1}{24}\varphi^4 + ...
We can then group all the real terms together and all the imaginary terms together:
e^{i\varphi} = \left(1 - \frac{1}{2}\varphi^2 + \frac{1}{24}\varphi^4 -... \right) + i \left( \varphi - \frac{1}{6} \varphi^3 + ... \right)
or to be complete:
e^{i\varphi} = \sum\limits_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} \varphi^{2n} + i \sum\limits_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \varphi^{2n+1}
So you can see that e^{i\varphi} indeed breaks up into a complex number. By definition, this is equivalent to:
e^{i\varphi} = \text{cos}(\varphi) + i \text{sin}(\varphi)

It is also worth noting that this gives us the Taylor series for sine and cosine:
\text{sin}(\varphi) = \sum\limits_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \varphi^{2n+1}

\text{cos}(\varphi) = \sum\limits_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} \varphi^{2n}

For an example, let’s derive a trig identity using the complex exponential forms.
Let’s show that:
\text{sin}(\theta) \text{cos}(\theta) = \frac{1}{2} \text{sin}(2 \theta)

First write the complex exponential forms:
\text{sin}(\theta) = \frac{1}{2i} (e^{i\theta} - e^{-i\theta})

\text{cos}(\theta) = \frac{1}{2} (e^{i\theta} + e^{-i\theta})
Then multiply them:
\frac{1}{2i} (e^{i\theta} - e^{-i\theta}) \cdot \frac{1}{2} (e^{i\theta} + e^{-i\theta})
= \frac{1}{2} \cdot \frac{1}{2i} (e^{2 i \theta} + 1 - 1 - e^{-2 i \theta})
= \frac{1}{2} \cdot \frac{1}{2i} (e^{2 i \theta} - e^{-2 i \theta})
This is the same exponential form as the sine function with a couple adjustments (a factor of 1/2 out front and a factor of 2 inside the function):
= \frac{1}{2} \text{sin} (2 \theta)

Look up some other trig identities and try to derive them using complex exponentials.

More Buffon’s Needle

In my previous blog entry, I discussed Buffon’s needle (a method of measuring \pi using random numbers). I tried to recreate what would occur in physical reality without using trig functions in the measurement of \pi. The tricky part was trying to make a uniform probability distribution along the angle \theta, which is what you expect with physical needles being dropped. In this entry, I will do something more exact, but less physical.

In reality, the probability density in \theta is uniform due to there being no preferential direction for a needle to point. To simulate this with uniform random numbers without using trigonometric functions (to avoid just measuring python’s \pi) is tricky. The way I did this was assign uniform random numbers (uniform probability density) to a projection of a needle, either perpendicular or parallel to the lines (with a probability of 0.5 assigned to each case). This results in a probability distribution in \theta that isn’t quite uniform, but it’s close enough to approximate \pi after a large number of needles are dropped:
2 \frac{N_T}{N_C} = \pi
where N_T and N_C are the total number of needles dropped and number of needles crossing a line, respectively.

If instead of assigning a uniform probability to either projection fairly, what if we assign it to just one? This was shown in the previous post. As the number of needles dropped increases, the ratio 2 \frac{N_T}{N_C} approaches some number around 2.52.6, shown in the figure below (\pi shown in red). Why is this the case?
DPI1

The reason why has to do with a uniform probability density along a projection (q) does not translate into a uniform probability density along \theta, see figure below. This means needles have a preferred direction. It turns out this can still be used to extract a measurement of \pi!
BuffonS2
The uniform probability density is assigned to the projection q:
\rho_q = 1/L, for 0 \leq q \leq L, and \rho_q=0 elsewhere.
To get this in terms of \theta, some substitutions are made in the probability density integral:
\int_0^L \rho_q dq = 1
The variables q and \theta are related by:
\text{cos}(\theta) = q/L
so by taking the derivative we get the substitution:
dq = -L\text{sin}(\theta) d \theta

So now the probability density integral can be written as:
\int_0^{\pi/2} \rho_q L \text{sin}(\theta) d \theta
which is also:
= \int_0^{\pi/2} \rho_{\theta} d \theta
So it gives the form of the probability density for \theta in terms of the one for q:
\rho_{\theta} = \rho_q L\text{sin}(\theta)

Plug in \rho_q = 1/L to get:
\rho_{\theta} = \text{sin}(\theta)
and it is obviously NOT uniform, it’s just the sine function!
Plotting a histogram of the number of needles per angle \theta will give the shape of the sine function:
DPI2

Following the same steps as in the previous entry, we can find out why when a uniform probability is assigned to one projection, 2 N_T/N_C approaches 2.5 instead of \pi.
For simplicity W=L, and again for a needle to be crossing a line it has to be below the upper bound:
x \leq \frac{L}{2}\text{sin}(\theta)
and the probability of a needle crossing a line is:
P = \int_0^{\frac{L}{2}\text{sin}(\theta)} \int_0^{\pi/2} \rho_x \rho_{\theta} d \theta d x
Plugging in the probability densities: \rho_x = 2/L and \rho_\theta = \text{sin}(\theta):
P = \int_0^{\pi/2} \text{sin}^2(\theta) d \theta = \pi/4

P = \frac{N_C}{N_T} when the number of needles dropped is large, so by combining these equations:
2 \frac{N_T}{N_C} = \frac{8}{\pi} = 2.52
This is the number that is being approached!
But wait, \pi is there, so we can extract it by:
4 \frac{N_C}{N_T} = \pi

By plotting this ratio as a function of number of needles dropped, we see it approaching \pi. See the figure below.
DPI3

Even though this isn’t how needles behave in reality, it’s a simpler way to measure \pi by using random numbers in a simulation. The code used to make these plots is shown below.

By the last drop:
2 \frac{N_T}{N_C} = 2.54
4 \frac{N_C}{N_T} = 3.14

import os
import sys
import math
import numpy
import pylab
import random

CL = 0
NN = 0

PIa = []
PIb = []
NNa = []
PIc = []
ANGLES = []

for ii in range(100000):

    PPy = random.uniform(0.0,1.0) #A-end y position (lines separated by unit distance)

    RRx = random.uniform(0.0,1.0) #x projection
    RRy = numpy.sqrt(1.0 - RRx**2) #y projection

    QQy = PPy + RRy #B-end y position
    QQyf = math.floor(QQy)
    PPyf = math.floor(PPy)

    NN = NN + 1
    if (QQyf-PPyf > 0.0): #if crossing line
        CL = CL + 1

    NNa.append(NN)
    PIc.append(math.pi)
    RAT = (1.0*CL)/(1.0*NN)
    if (CL==0):
        PIa.append(0)
    else:
        PIa.append(2.0/RAT)

    PIb.append(4*RAT)

    ANGLES.append(abs(math.atan(RRy/(RRx+0.0001)))/math.pi)

print PIa[-1]
print PIb[-1]

pylab.figure(1)
pylab.xlabel("NT")
pylab.ylabel("2 NT/NC")
pylab.plot(NNa,PIc,'r')
pylab.plot(NNa,PIa,'k')
pylab.savefig('DPI1.png')

pylab.figure(2)
pylab.hist(ANGLES, bins=10)
pylab.xlabel("Acute angle in factors of pi")
pylab.ylabel("Needles dropped")
pylab.savefig('DPI2.png')

pylab.figure(3)
pylab.xlabel("NT")
pylab.ylabel("4 NC/NT")
pylab.plot(NNa,PIc,'r')
pylab.plot(NNa,PIb,'k')
pylab.savefig('DPI3.png')

pylab.show()