Thursday, June 28, 2012

Maximum Entropy Distributions

Entropy is an important topic in many fields; it has very well known uses in statistical mechanics, thermodynamics, and information theory. The classical formula for entropy is Σi(pi log pi), where p=p(x) is a probability density function describing the likelihood of a possible microstate of the system, i, being assumed. But what is this probability density function? How must the likelihood of states be configured so that we observe the appropriate macrostates?

In accordance with the second law of thermodynamics, we wish for the entropy to be maximized. If we take the entropy in the limit of large N, we can treat it with calculus as S[φ]=∫dx φ ln φ. Here, S is called a functional (which is, essentially, a function that takes another function as its argument). How can we maximize S? We will proceed using the methods of calculus of variations and Lagrange multipliers.

First we introduce three constraints. We require normalization, so that ∫dx φ = 1. This is a condition that any probability distribution must satisfy, so that the total probability over the domain of possible values is unity (since we’re asking for the probability of any possible event occurring). We require symmetry, so that the expected value of x is zero (it is equally likely to be in microstates to the left of the mean as it is to be in microstates to the right — note that this derivation is treating the one-dimensional case for simplicity). Then our constraint is ∫dx x·φ = 0. Finally, we will explicitly declare our variance to be σ², so that ∫dx x²·φ = σ².

Using Lagrange multipliers, we will instead maximize the augmented functional S[φ]=∫(φ ln φ + λ0φ + λ1xφ + λ2x²φ dx). Here, the integrand is just the sum of the integrands above, adjusted by Lagrange multipliers λk for which we’ll be solving.

Applying the Euler-Lagrange equations and solving for φ gives φ = 1/exp(1+λ0+xλ1+x²λ2). From here, our symmetry condition forces λ1=0, and evaluating the other integral conditions gives our other λ’s such that q = (1/2πσ²)½·exp(-x² / 2σ²), which is just the Normal (or Gaussian) distribution with mean 0 and variance σ². This remarkable distribution appears in many descriptions of nature, in no small part due to the Central Limit Theorem.

Wednesday, March 14, 2012

## Happy π Day math lovers!

We here at Say It With Science would like to celebrate it with you by sharing some interesting trivia about one of our favorite physicists, Richard Feynman, and one of our favorite constants, π (pi). The Feynman Point is a sequence of six 9’s beginning at the 762nd decimal place of π, named after Nobel Prize winning physicist Richard Feynman. Feynman had memorized π to this point so that he could end his recitation of the mathematical constant by saying “nine nine nine nine nine nine and so on…”. At this point someone less knowledgeable about mathematics might assume the number continues this way forever, however we know better. It is believed that π is a normal number, meaning that its digits are as uniformly distributed among the digits 1 through 9 (or the digits of any other base you choose to use). If π is a normal number then the chances of coming across six 9’s in a row is 0.08%. Strange occurrences like this are what makes math beautiful. π Day is a perfect reason to start memorizing as many digits of π as you can! Happy π Day!

Friday, October 28, 2011

The Virial Theorem

In the transition from classical to statistical mechanics, are there familiar quantities that remain constant? The Virial theorem defines a law for how the total kinetic energy of a system behaves under the right conditions, and is equally valid for a one particle system or a mole of particles.

Rudolf Clausius, the man responsible for the first mathematical treatment of entropy and for one of the classic statements of the second law of thermodynamics, defined a quantity G (now called the Virial of Clausius):

G ≡ Σi(pi · ri)

Where the sum is taken over all the particles in a system. You may want to satisfy yourself (it’s a short derivation) that taking the time derivative gives:

dG/dt = 2T + Σi(Fi · ri)

Where T is the total kinetic energy of the system (Σ  ½mv2) and dp/dt = F. Now for the theorem: the Virial Theorem states that if the time average of dG/dt is zero, then the following holds (we use angle brackets ⟨·⟩ to denote time averages):

2⟨T⟩ = - Σi(Fi · ri)

Which may not be surprising. If, however, all the forces can be written as power laws so that the potential is V=arn (with r the inter-particle separation), then

2⟨T⟩ = n⟨V⟩

Which is pretty good to know! (Here, V is the total kinetic energy of the particles in the system, not the potential function V=arn.) For an inverse square law (like the gravitational or Coulomb forces), F∝1/r2 ⇒ V∝1/r, so 2⟨T⟩ = -⟨V⟩.

Try it out on a simple harmonic oscillator (like a mass on a spring with no gravity) to see for yourself. The potential Vkx², so it should be the case that the time average of the potential energy is equal to the time average of the kinetic energy (n=2 matches the coefficient in 2⟨T⟩). Indeed, if x = A sin( √[k/m] · t ), then v = A√[k/m] cos( √[k/m] · t ); then x2 ∝ sin² and v² ∝ cos², and the time averages (over an integral number of periods) of sine squared and cosine squared are both ½. Thus the Virial theorem reduces to

2 · ½m·(A²k/2m) = 2 · ½k(A²/2)

Which is easily verified. This doesn’t tell us much about the simple harmonic oscillator; in fact, we had to find the equations of motion before we could even use the theorem! (Try plugging in the force term F=-kx in the first form of the Virial theorem, without assuming that the potential is polynomial, and verify that the result is the same). But the theorem scales to much larger systems where finding the equations of motion is impossible (unless you want to solve an Avogadro’s number of differential equations!), and just knowing the potential energy of particle interactions in such systems can tell us a lot about the total energy or temperature of the ensemble.

Wednesday, August 24, 2011

## Gabriel’s Horn

Gabriel’s Horn is a three dimensional surface that contains a finite volume but has an infinite surface area. It is made by taking the two dimensional graph of y=1/x and revolving it around the x-axis (with the domain of x ≥ 1). If we look at x coordinates from 1 to a, the volume can be found by the equation: Which is really just sum of the area of each circular cross section, hence it is the integral of πr2 (with r being the distance from the x-axis to the function). The surface area of the horn can be found by the equation: (Which is slightly more complicated to derive but a full explanation can be found here.) As you can see if we let a approach infinity, the surface area diverges, whereas the volume converges to π.

Wednesday, August 17, 2011

## Vedic Multiplication

(Technically called Nikhilam Navatashcaramam Dashatah) This is a quick and simple way to multiply any two numbers. It’s easiest when the numbers are both close to a power of ten, but it will always work. The first step is to chose a power of ten that the numbers are closest to. In my example I will find the product of 14 and 12. Since 12 and 14 are close to 10 I will chose 10. 14 is 4 more than 10, and 12 is 2 more than 10, so I will write these numbers off to the side, as shown.

+4 times +2 is 8 so I write this number on the right. Then I cross add the 14 and the 2 or I add 12 and 4 to get 16. I write this number to the left, and put these two numbers together to get the right answer 168. (Although I say “put these numbers together” what is actually going on is that 16 is being multiplied by 10 then 8 is added. Knowing this will be helpful when the number on the right is larger than the chosen power of ten.)

Here’s an example with larger numbers. Since they are closer to 100, 100 is used instead of 10. This time the numbers are less than the chosen power of ten, but the same method can be used. Multiply -8 by -11 to get 88 (write that on the right), and add 89 to -8, or 92 to -11 to get 81 (write that on the left). 81 is then multiplied by 100 (since that is the power of ten we chose) and 88 is added. Hence the correct answer to 92x89 is 8188. This is a neat trick, but why does it work? consider the following algebra:

(x+a)(x+b)=c
x^2+ x*a+x*b+a*b=c
x(x+a+b)+a*b=c

Say x is the power of ten we chose. Then a and b are the the two numbers that represent how far our factors are from the chosen power of ten.

Thursday, August 11, 2011

Fractal Geometry: An Artistic Side of Infinity

Fractal Geometry is beautiful. Clothes are designed from it and you can find fractal calendars for your house. There’s just something about that infinitely endless pattern that intrigues the eye— and brain.

Fractals are “geometric shapes which can be split into parts which are a reduced-size copy of the whole” (source: wikipedia). They demonstrate a property called self-similarity, in which parts of the figure are similar to the greater picture. Theoretically, each fractal can be magnified and should be infinitely self-similar.

One simple fractal which can easily display self-similarity is the Sierpinski Triangle. You can look at the creation of such a fractal:

What do you notice? Each triangle is self similar— they are all equilateral triangles. The side length is half of the original triangle. And what about the area? The area is a quarter of the original triangle. This pattern repeats again, and again.

Two other famous fractals are the Koch Snowflake and the Mandelbrot Set

The Koch Snowflake looks like:

(source: wikipedia)

It is constructed by going in 1/3 of the of the side of an equilateral triangle and creating another equilateral triangle. You can determine the area of a Koch Snowflake by following this link.

The Mandelbrot set…

… is:

the set of values of c in the complex plane for which the orbit of 0 under iteration of the complex quadratic polynomial zn+1 = zn2 + c remains bounded. (source: wikipedia)

It is a popular fractal named after Benoît Mandelbrot. More on creating a Mandelbrot set is found here, as well as additional information.

You can create your own fractals with this fractal generator.

But what makes fractals extraordinary?

Fractals are not simply theoretical creations. They exist as patterns in nature! Forests can model them, so can clouds and interstellar gas!

Artists are fascinated by them, as well. Consider The Great Wave off Kanagawa by Katsushika Hokusai:

Even graphic artists use fractals to create mountains or ocean waves. You can watch Nova’s episode of Hunting the Hidden Dimension for more information.

Monday, August 1, 2011

SOH-CAH-TOA

This simple mnemonic (which bears a resemblance to a Native American name) is used to remember how to compute the sine, cosine and tangent of a right-angled triangle and the relationships between them.

When labeling a triangle, it is imperative to take note of the following parts:

• Hypotenuse (H) - the longest side of the triangle
• Angle (θ) - theta is often used to name the acute angle of a triangle one’s trying to find the sine/cosine/tangent of, however this is not always the case; any consistent labeling is fine
• Adjacent (A) - the side of the triangle next to the selected angle
• Opposite (O) - the side of the triangle opposite the selected angle

Note: Do not think that the way the pictured triangle above is the only way you can label a triangle. The opposite and adjacent can easily switch places if the other angle is selected!

```              Opposite

Sin θ = -----------------

```
```              Adjacent

Cos θ = -----------------

Hypotenuse

```
```              Opposite

Tan θ = -----------------

Hypotenuse```
Saturday, July 30, 2011

Monte Carlo Integration

In the course of theoretical and computational science, we sometimes encounter integrals that cannot be solved exactly, e.g. integrals without analytic solutions. One tool for numerically estimating these integrals uses the Monte Carlo method, and is called Monte Carlo (MC) integration.

Roughly, an MC integration finds the average value of a function over a region of integration and multiplies it by the hypervolume of that region. This is just an extension of the mean value theorem any first year calculus student learns about integrals over R. The first part is easy: to find the average value of the integrand f : Rn→R, we simply select M points xi uniformly distributed throughout the region of integration Ω. Then the average value ⟨f⟩ = (1/M) Σif (xi). The more difficult problem: how do we find the volume of Ω?

Luckily, the MC method makes this easy as well, and thereby becomes a useful tool for computing integrals on domains of any finite dimension. For a given region of integration Ω, it is usually easy to define some hypercube H such that Ω ⊂ H. Now instead of only trying to generate M points inside of Ω, we generate N points inside H (a much easier task, since H is naturally uniform in each dimension). Some of them will fall within Ω anyway; we again define their count to be M. But some of them will not, such that M < N. Since we can easily compute the volume of H (call it V), the volume of Ω is roughly (M/N)⋅V.

In the image above, we’ve chosen the unit circle as our region Ω. Of the 25,000 points plotted, 19,610 fell inside the unit circle and 5,391 fell outside, so M/N = 19 610/(5391+19 610) ≈ 0.7844. Since our “hypercube” is a square of “hypervolume” (or, area) V = 4, (M/N)⋅V = 3.13747 ≈ π, as we’d expect.

Suppose we want to integrate a function f over the unit circle we’ve pictured above; at each point xi inside the circle, we’d evaluate f(xi), and then we’d let ⟨f⟩ be the average of all these various calculations. (Notice that we don’t even evaluate points outside the circle.) Now we have everything we need: the volume of the region and the average value of the function over that region. Their product approximates the integral.

The pseudocode for a simple MC integration might look something like this:

```Program MonteCarlo(f, Ω, H, N):
M ← 0
F ← 0
For i, 0 → N:
p ← RandomPoint(H)
If p ∈ Ω:
F ← F + f(p)
M ← M + 1
avg ← F/M
vol ← (M/N) ∗ Volume(H)
Return avg ∗ vol
```

With a higher N, the precision of the integration will be higher. And the easiness of finding an n-dimensional hypercube volume makes Monte Carlo integration ideal for multidimensional integrals. But it has its drawbacks. Because the points are randomly distributed over H, integrals that have erratic behavior near specific points may not have those values proportionally represented, and highly oscillatory integrals may be difficult to properly average. Adaptive Monte Carlo methods can sometimes identify these points and recursively focus on the more extreme values of the function to get a better estimate. Finding a tight fit for H is also important; any large H will work in theory, but in practice the best results will come for a hypercube that fits tightly around the region of integration.

Monday, July 25, 2011

Originally, in the Newtonian formulation of classical mechanics, equations of motion were determined by summing up vector forces (à la free body diagrams). Is there a different way to find the equations of motion?

In place of drawing a free body diagram, we can represent a system more rigorously by describing its configuration space. The configuration space (often denoted Q) of a system is a mathematical space (a differential manifold) where every point in the space represents a particular state or configuration of the system. A curve drawn through a configuration space, then, represents the evolution of a system through a sequence of configurations.

Consider a rod along which a pendulum can slide. We need two numbers to describe the state of this system: the angle of the swinging pendulum and the position of the pendulum’s base along the rod. These two numbers are generalized coordinates for our system. Just like a traditional, linear vector space has a coordinate basis (like x, y, and z), our configuration space can use our generalized coordinates as a basis; let’s choose to name the position on the rod x and the angle of the pendulum φ. Since x can take any real value and φ can take any value from 0 to 2π (or 0 to 360o, if you like), the x dimension can be represented by a line (R1) and the φ dimension by a circle (or a one-sphere, S1). When we combine these dimensions, our new space — the configuration space of this system — is shaped like an infinite cylinder, R1 x S1. Just imagine connecting a circle to every point on a line… or, conversely, a line to every point on a circle.

The general process of examining a system and the constraints on its movement is a standard first step for solving mechanics problems analytically. After accounting for the constraints on a system, the ways a system can vary are called the degrees of freedom. Their count is often represented by the variable s. Notice: s = dim(Q).

Now that we’ve represented the configuration of our system, we need to talk about the forces present. There are several different ways that we can set up scalar fields on our configuration manifold that represent quantities related to the energy of the system. The simplest to deal with is often the Lagrangian, L = T - V = (Total kinetic energy) - (Total potential energy). Some fancy mathematics (a.k.a. calculus of variations) shows that when we define the Lagrangian in terms of our coordinates and their time derivatives, we can easily derive the equations of motion using the Euler-Lagrange equation.

For more complicated systems, configurations spaces may look different. A double pendulum (a pendulum on a pendulum) would have the topology S1 x S1 = T2, the torus (as pictured). Many systems will have higher dimensions that prevent them from being easily visualized.

Exercise left to the reader: the Lagrangian explicitly takes the time derivatives of the coordinates as arguments; information about the velocities of the system is needed to derive the equations of motion. But this information isn’t included in Q, so Lagrangian dynamics actually happens on TQ, the tangent bundle to Q. This new manifold includes information about how the system changes from every given configuration; since it needs to include a velocity coordinate for each configuration coordinate, dim(TQ) = 2s. TQ is also called Γv, the velocity phase space. T*Q, the cotangent bundle to Q, is the dual of TQ, and is traditionally just called the phase space, Γ; this is where Hamiltonian mechanics takes place.

Thursday, July 21, 2011

(Continued from previous post on the del operator, here)

When ∇ is applied to a vector-valued function, it takes the form of what is called the divergence of the vector function, V(x,y,z), when V is defined by V(x,y,z) = P(x,y,z) i + Q(x,y,z) j + R(x,y,z) k, where P, Q, and R, are scalar functions. The divergence is the dot-product of the del operator and V:

The divergence of V may consequently be thought of as a scalar field. The physical significance of the divergence of a vector field is that it is the amount of density “flowing” into, or out of, a given region of space (visualizing the vector field as a quasi fluid-like entity — consequences in electromagnetism and fluid dynamics follow immediately). Alternatively, it may be visualized as how a particular point behaves as a “source” from a point (in which density is decreasing) or a “sink” toward a point (in which density is increasing).

Another way in which ∇ is applied to vector-valued functions is the scenario in which, instead of the dot-product, the cross-product (or vector-product) of ∇ and the vector function V is taken — creating what is known as the curl of V:

(Note: the matrix determinant does not directly correlate to the definition of the operation, but it is merely a method of better remembering how to carry out the curl calculation.) The curl represents the infinitesimal rotation of a 3-dimensional vector field. This rotation is represented by a vector, whose length is proportional to the magnitude of rotation, and whose direction is that of said rotation about V. The intuitive interpretation of this operation may be represented as this: if, in a vector field describing the motion of a viscous fluid; there is a ball with an imperfect surface, then the ball will rotate due to the motion of the fluid about it.

The Laplace operator (also known as the Laplacian) is a differential operator which is defined to be the divergence of the gradient of a scalar field (in Euclidean space):

The Laplace operator has analogs in many coordinate systems, and for that reason its intuitive interpretations have many forms. In physical systems, the Laplace operator may be interpreted as the flux density (flux per unit area) of the gradient flow of a scalar field. Common applications lie in the analysis of gravitational potential fields, electric potential fields, and heat flow. This operator gives rise to the d’Alembert operator, which is its analog in 4-dimensional space-time. albanhouse delves into this particular operator’s connection to electromagnetism and tensor analysis in this recent post on Maxwell’s equations.