Thursday, June 28, 2012
Maximum Entropy Distributions

Entropy is an important topic in many fields; it has very well known uses in statistical mechanics, thermodynamics, and information theory. The classical formula for entropy is S = -Σᵢ pᵢ log pᵢ, where pᵢ describes the likelihood that the system assumes the possible microstate i (or, in the continuous case, a probability density function p(x)). But what is this probability density function? How must the likelihood of states be configured so that we observe the appropriate macrostates?

In accordance with the second law of thermodynamics, we wish for the entropy to be maximized. If we take the entropy in the limit of large N (a large number of microstates), we can treat it with calculus as S[φ] = -∫dx φ ln φ. Here, S is called a functional (which is, essentially, a function that takes another function as its argument). How can we maximize S? We will proceed using the methods of calculus of variations and Lagrange multipliers.

First we introduce three constraints. We require normalization, so that ∫dx φ = 1. This is a condition that any probability distribution must satisfy, so that the total probability over the domain of possible values is unity (since we’re asking for the probability of any possible event occurring). We require symmetry, so that the expected value of x is zero (it is equally likely to be in microstates to the left of the mean as it is to be in microstates to the right — note that this derivation is treating the one-dimensional case for simplicity). Then our constraint is ∫dx x·φ = 0. Finally, we will explicitly declare our variance to be σ², so that ∫dx x²·φ = σ².

Using Lagrange multipliers, we will instead extremize the augmented functional S̃[φ] = ∫dx (φ ln φ + λ₀φ + λ₁xφ + λ₂x²φ); since maximizing S is the same as minimizing -S = ∫dx φ ln φ, we are free to work with the sign flipped this way. Here, the integrand is just the sum of the integrands above, adjusted by Lagrange multipliers λₖ for which we’ll be solving.

Applying the Euler-Lagrange equations and solving for φ gives φ = exp(-(1 + λ₀ + λ₁x + λ₂x²)). From here, our symmetry condition forces λ₁ = 0, and evaluating the other integral conditions gives our other λ’s such that φ = (1/2πσ²)^½·exp(-x² / 2σ²), which is just the Normal (or Gaussian) distribution with mean 0 and variance σ². This remarkable distribution appears in many descriptions of nature, in no small part due to the Central Limit Theorem.
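
As a quick sanity check of this result, here is a minimal numerical sketch (in Python with numpy; the Laplace and uniform distributions are arbitrary points of comparison), confirming that once the variance is matched, neither competitor attains the Gaussian’s differential entropy -∫dx φ ln φ:

```python
import numpy as np

# Differential entropy H[φ] = -∫ φ ln φ dx, approximated on a grid.
sigma = 1.0
x = np.linspace(-20, 20, 200_001)
dx = x[1] - x[0]

def entropy(p):
    p = p[p > 0]                      # avoid log(0) where the pdf vanishes
    return -np.sum(p * np.log(p)) * dx

gaussian = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

b = sigma / np.sqrt(2)                # Laplace: variance 2b² = σ²
laplace = np.exp(-np.abs(x) / b) / (2 * b)

a = sigma * np.sqrt(3)                # uniform on [-a, a]: variance a²/3 = σ²
uniform = np.where(np.abs(x) <= a, 1 / (2 * a), 0.0)

print(entropy(gaussian))  # ≈ 1.4189 = ½ ln(2πeσ²), the largest
print(entropy(laplace))   # ≈ 1.3466
print(entropy(uniform))   # ≈ 1.2425
```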

Thursday, August 11, 2011
Fractal Geometry: An Artistic Side of Infinity
Fractal Geometry is beautiful. Clothes are designed from it, and you can find fractal calendars for your house. There’s just something about that endlessly repeating pattern that intrigues the eye, and the brain.
Fractals are “geometric shapes which can be split into parts which are a reduced-size copy of the whole” (source: wikipedia). They demonstrate a property called self-similarity, in which parts of the figure are similar to the greater picture. Theoretically, a fractal can be magnified indefinitely and remain self-similar.
One simple fractal which easily displays self-similarity is the Sierpinski Triangle. You can look at the creation of such a fractal:

What do you notice? Each triangle is self-similar: they are all equilateral triangles, each with a side length half that of the original triangle. And what about the area? The area of each is a quarter of the original triangle’s. This pattern repeats again and again.
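
If you’d like to generate one yourself, here is a minimal sketch (in Python; the vertex coordinates, starting point, and iteration count are arbitrary choices) of the classic “chaos game” construction, which fills in the Sierpinski Triangle one point at a time:

```python
import random

# Chaos game: repeatedly jump halfway toward a randomly chosen vertex
# of an equilateral triangle; the visited points trace out the fractal.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, 3**0.5 / 2)]
x, y = 0.25, 0.25  # any starting point inside the triangle works

points = []
for _ in range(100_000):
    vx, vy = random.choice(vertices)
    x, y = (x + vx) / 2, (y + vy) / 2
    points.append((x, y))

# To see it (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.scatter(*zip(*points), s=0.1); plt.gca().set_aspect("equal"); plt.show()
```
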
Two other famous fractals are the Koch Snowflake and the Mandelbrot Set. 
The Koch Snowflake looks like: 
 (source: wikipedia)
It is constructed by dividing each side of an equilateral triangle into thirds and erecting another, smaller equilateral triangle on the middle third, then repeating this on every new side. You can determine the area of a Koch Snowflake by following this link.
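
As a sketch of that calculation (normalizing the starting triangle to unit area): each step adds one new triangle per side, each with 1/9 the area of the previous step’s additions, while the side count quadruples, so the total area converges to 8/5 of the original:

```python
# Partial sums of the Koch Snowflake's area, starting from a unit triangle.
area = 1.0
sides = 3
new_triangle_area = 1 / 9  # each new bump is 1/9 the area of its parent
for _ in range(20):
    area += sides * new_triangle_area
    sides *= 4
    new_triangle_area /= 9
print(area)  # -> 1.5999... ≈ 8/5
```
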
The Mandelbrot set…

… is:

the set of values of c in the complex plane for which the orbit of 0 under iteration of the complex quadratic polynomial zₙ₊₁ = zₙ² + c remains bounded. (source: wikipedia)

It is a popular fractal named after Benoît Mandelbrot. More on creating a Mandelbrot set is found here, as well as additional information. 
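
In code, that membership test is short. Here is a minimal sketch (the iteration cap of 100 is an arbitrary choice; |z| > 2 is the standard escape criterion, since such orbits are guaranteed to diverge):

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Does the orbit of 0 under z -> z**2 + c stay bounded?"""
    z = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:  # once |z| exceeds 2, the orbit must escape
            return False
    return True

print(in_mandelbrot(0))   # True: the orbit stays at 0
print(in_mandelbrot(-1))  # True: the orbit cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1))   # False: 0, 1, 2, 5, 26, ... escapes
```
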
You can create your own fractals with this fractal generator. 
But what makes fractals extraordinary?
Fractals are not simply theoretical creations. They exist as patterns in nature! Forests can model them; so can clouds and interstellar gas!
Artists are fascinated by them, as well. Consider The Great Wave off Kanagawa by Katsushika Hokusai: 

Even graphic artists use fractals to create mountains or ocean waves. You can watch Nova’s episode Hunting the Hidden Dimension for more information.

Monday, July 25, 2011
Originally, in the Newtonian formulation of classical mechanics, equations of motion were determined by summing up vector forces (à la free body diagrams). Is there a different way to find the equations of motion?

In place of drawing a free body diagram, we can represent a system more rigorously by describing its configuration space. The configuration space (often denoted Q) of a system is a mathematical space (a differential manifold) where every point in the space represents a particular state or configuration of the system. A curve drawn through a configuration space, then, represents the evolution of a system through a sequence of configurations.

Consider a rod along which a pendulum can slide. We need two numbers to describe the state of this system: the angle of the swinging pendulum and the position of the pendulum’s base along the rod. These two numbers are generalized coordinates for our system. Just like a traditional, linear vector space has a coordinate basis (like x, y, and z), our configuration space can use our generalized coordinates as a basis; let’s choose to name the position on the rod x and the angle of the pendulum φ. Since x can take any real value and φ can take any value from 0 to 2π (or 0 to 360°, if you like), the x dimension can be represented by a line (R¹) and the φ dimension by a circle (or a one-sphere, S¹). When we combine these dimensions, our new space — the configuration space of this system — is shaped like an infinite cylinder, R¹ × S¹. Just imagine connecting a circle to every point on a line… or, conversely, a line to every point on a circle.

The general process of examining a system and the constraints on its movement is a standard first step for solving mechanics problems analytically. After accounting for the constraints on a system, the ways a system can vary are called the degrees of freedom. Their count is often represented by the variable s. Notice: s = dim(Q).

Now that we’ve represented the configuration of our system, we need to talk about the forces present. There are several different ways that we can set up scalar fields on our configuration manifold that represent quantities related to the energy of the system. The simplest to deal with is often the Lagrangian, L = T - V = (Total kinetic energy) - (Total potential energy). Some fancy mathematics (a.k.a. calculus of variations) shows that when we define the Lagrangian in terms of our coordinates and their time derivatives, we can easily derive the equations of motion using the Euler-Lagrange equation.
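
To make this concrete, here is a minimal sketch (using sympy, and a plain fixed-pivot pendulum rather than the sliding-rod system, to keep it short) of deriving an equation of motion from L = T - V with the Euler-Lagrange equation d/dt(∂L/∂φ') - ∂L/∂φ = 0:

```python
import sympy as sp

t = sp.symbols("t")
m, l, g = sp.symbols("m l g", positive=True)
phi = sp.Function("phi")(t)  # generalized coordinate: pendulum angle

T = sp.Rational(1, 2) * m * (l * phi.diff(t)) ** 2  # kinetic energy
V = -m * g * l * sp.cos(phi)                        # potential energy (pivot at origin)
L = T - V

# Euler-Lagrange: d/dt(dL/dphi') - dL/dphi = 0
eom = sp.diff(L.diff(phi.diff(t)), t) - L.diff(phi)
print(sp.Eq(sp.simplify(eom), 0))
# prints (up to rearrangement) m*l**2*phi'' + m*g*l*sin(phi) = 0,
# i.e. the familiar phi'' = -(g/l)*sin(phi)
```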

For more complicated systems, configuration spaces may look different. A double pendulum (a pendulum on a pendulum) would have the topology S¹ × S¹ = T², the torus (as pictured). Many systems will have higher dimensions that prevent them from being easily visualized.

Exercise left to the reader: the Lagrangian explicitly takes the time derivatives of the coordinates as arguments; information about the velocities of the system is needed to derive the equations of motion. But this information isn’t included in Q, so Lagrangian dynamics actually happens on TQ, the tangent bundle to Q. This new manifold includes information about how the system changes from every given configuration; since it needs to include a velocity coordinate for each configuration coordinate, dim(TQ) = 2s. TQ is also called Γv, the velocity phase space. T*Q, the cotangent bundle to Q, is the dual of TQ, and is traditionally just called the phase space, Γ; this is where Hamiltonian mechanics takes place.

Tuesday, July 19, 2011
The del operator, denoted with what is called the nabla symbol (an inverted delta), is a differential operator connecting differential calculus of functions to the study of vectors and vector-valued functions. The del operator has several forms, and is defined by

∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z
where ∂/∂x indicates the partial derivative with respect to x (similarly for y and z), and i, j, and k indicate the three standard unit vectors (all in the Cartesian coordinate system).
To a scalar function, F(x,y,z), in the Cartesian coordinate system, the del operator may be applied to create what is known as the gradient of F — defined as

∇F = (∂F/∂x) i + (∂F/∂y) j + (∂F/∂z) k
The inclusion of the unit vectors in ∇ leads to the gradient being a vector-valued function in three dimensions; the vector at any point (x,y,z) is directed toward the greatest increase in F. Its magnitude is equal to the maximum rate of increase, and hence the gradient may be used as a 3-dimensional analog of the 1-dimensional derivative.
When ∇ is applied to a scalar field (function) of the form F(x,y,z), and a vector field a (= <a_1, a_2, a_3>) is chosen, the directional derivative takes the form

∇F • a = a_1 ∂F/∂x + a_2 ∂F/∂y + a_3 ∂F/∂z
in which the dot-product of the gradient of F and the vector a is taken. The most common analogy is this: if ∂F/∂x gives the rate of change of F in the x direction, then ∇F • a gives the rate of change of F in the direction of the vector a; a is taken to be a unit vector for this calculation. This operation allows the rate of change of a scalar field to be calculated with respect to an arbitrary — and sometimes changing — direction (a need not be a vector composed only of constant components). The most common application of this operation lies in the field of fluid dynamics.
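
Here is a minimal numerical sketch of that claim (F = x² + yz and the direction a are arbitrary example choices), comparing the analytic ∇F • a against a finite-difference estimate:

```python
import numpy as np

def F(p):
    x, y, z = p
    return x**2 + y * z

def grad_F(p):
    x, y, z = p
    return np.array([2 * x, z, y])  # (∂F/∂x, ∂F/∂y, ∂F/∂z)

p = np.array([1.0, 2.0, 3.0])
a = np.array([1.0, 1.0, 0.0])
a = a / np.linalg.norm(a)  # the direction must be a unit vector

h = 1e-6  # central-difference step along a
numeric = (F(p + h * a) - F(p - h * a)) / (2 * h)
analytic = grad_F(p) @ a

print(analytic, numeric)  # both ≈ (2 + 3)/√2 ≈ 3.5355
```
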
Coming soon: ∇ applied to vector functions!

Tuesday, July 12, 2011
Many people have heard of Maxwell’s famous four equations:
∇·E = ρ/ε₀
∇·B = 0
∇ × E = -∂B/∂t
∇ × B = μ₀J + μ₀ε₀(∂E/∂t)
But did you know that they can actually all be captured in one simple expression? When we extend classical electrodynamics into Minkowski 4-space (the geometry of special relativity), the time derivatives that appear in the derivation of Maxwell’s equations can be mixed into the same mathematics that takes care of the spatial expressions. By generalizing the Laplacian (∇²) to 4-space, we obtain the d’Alembertian (□², sometimes written without the exponent). This operator allows us to rewrite Maxwell’s equations (in the Lorenz gauge) in tensor notation as one simple formula, sometimes called the Riemann–Sommerfeld equation:
□²A^μ = -μ₀J^μ
where A^μ is the four-potential (φ, cA) (with φ and A the familiar scalar and vector potentials) and J^μ is the four-current (ρc, J) (with ρ and J the familiar charge and current densities). The power of differential geometry and tensor notation thereby captures all of classical electrodynamics in one swift stroke.
Background
Maxwell’s Equations describe how electric and magnetic fields relate to each other and how they relate to the presence of charged particles and currents. The way we’ve presented them above is in differential form, which describes how electric fields (E) and magnetic fields (B) change in space and time; they can also be presented in integral form, which describes how the total measure of these fields scales with quantities like current density (J).
Although these equations were originally formulated in terms of fields, physicists would later find that the equations could be much cleaner when expressed in terms of potentials. A similar revolution happened in classical mechanics; although Newton’s forces were a good start, Lagrangian dynamics showed us that understanding the potential energy of the system made solving for equations of motion much simpler than trying to add up dozens of force vectors. In electrodynamics, we have two kinds of potentials: the scalar potential φ (or, sometimes, V), which most people know by the term “voltage”; and the vector potential, A, which is related to magnetic fields and may be less familiar.
Tensor notation (for our purposes) is little more than a convenient way to represent vectors and matrices. When you read the Riemann-Sommerfeld equation, then, you’re really reading 4 equations at once. What are those equations? Repeat the equation four times, but each time replace μ with a different value; so, one equation might be □²A^1 = -μ₀J^1. The 1’s in this equation aren’t exponents, but rather tensor indices; just like you might see vector components written as v_x, v_y, and v_z representing different components of v, the components A^0, A^1, A^2, and A^3 represent the four components of the (vector) tensor A.
What are the funny upside-down triangles? The symbol is called a nabla, and is often read as “del”. In some ways, it’s an abuse of notation, but it gives us a convenient way to write the gradient, divergence, curl, and Laplacian operators. The exercise of unpacking how this notation works is left to the reader, but note the definitions: ∇U = grad(U), the gradient; ∇·U = div(U), the divergence; ∇ × U = curl(U), the curl; and ∇²U = ∇·(∇U) = div(grad(U)), the Laplacian. Each of these functions has a very important place in vector analysis. Since we represent electric and magnetic fields mathematically as vector fields, these operations give us information about the geometry of those fields. The gradient tells us the direction in which a field changes most at each point; the divergence tells us how much the field is dispersing (a field where everything pointed away from one center point would have a high divergence); the curl tells us how much a field is swirling, so to speak; and the Laplacian tells us how much a field changes in strength as it disperses.
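
For the curious, here is a minimal sketch of these operations in action (using sympy.vector; the fields U and V are arbitrary examples):

```python
from sympy.vector import CoordSys3D, gradient, divergence, curl

N = CoordSys3D("N")
x, y, z = N.x, N.y, N.z

U = x * y * z                   # a scalar field
print(gradient(U))              # ∇U: points toward U's fastest increase
print(divergence(gradient(U)))  # ∇²U = div(grad(U)); happens to be 0 here

V = -y * N.i + x * N.j          # a swirling vector field
print(divergence(V))            # 0: the field doesn't disperse
print(curl(V))                  # 2*N.k: it swirls about the z-axis
```
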
And finally, we have our partial derivatives (∂B/∂t, for example) that give us rates of change (like how a magnetic field B changes with respect to the time t), and our constants, ε₀ and μ₀. These constants have the interesting relation that, for the speed of light c, ε₀μ₀ = 1/c².
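
That relation is easy to check numerically; a quick sketch using the CODATA values bundled with scipy:

```python
import scipy.constants as const

print(const.epsilon_0 * const.mu_0)  # ≈ 1.11265e-17
print(1 / const.c**2)                # ≈ 1.11265e-17, the same
```
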
Picture credit: Geek3, from Wikipedia. Licensed under CC-3.0 Attribution Share-Alike.

Monday, July 11, 2011

Sizes of Infinity

Everyone knows about infinity, but most people don’t know that there are actually different sizes of infinity: the transfinite numbers. This may sound odd at first; after all, how do you define the size of something that is infinite?

For example: there are an infinite number of even numbers. But, to be more precise, there is a “countable infinity” of even numbers; in other words, you can enumerate them.

Say we start at 0 and call it the first even number; then we say that 2 is the second even number, -2 is the third, 4 is the fourth, -4 is the fifth, et cetera. In this way we can assign an integer to each even number. (Note that we can choose any numbering scheme we like; what’s important isn’t how we enumerate them, but rather that we can.) Therefore, the size of the even numbers is as big as the size of the integers, and that size is “countable infinity”, often represented by the symbol aleph-zero, ℵ₀. A similar argument can be given for odd numbers, positive numbers, negative numbers, primes, and so forth.
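
As a tiny sketch of that numbering scheme in code (the scheme itself is an arbitrary choice, as noted above):

```python
def nth_even(n: int) -> int:
    """The n-th even number under the scheme 0, 2, -2, 4, -4, ..."""
    if n == 1:
        return 0
    k = n // 2
    return 2 * k if n % 2 == 0 else -2 * k

print([nth_even(n) for n in range(1, 8)])  # [0, 2, -2, 4, -4, 6, -6]
```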

On the other hand: when one tries to count the real numbers, a problem arises. For any two real numbers there will always exist another between them; more importantly, it is not possible to create a one-to-one correspondence between the real numbers and the integers. In other words, they are impossible to count. For a rigorous proof of this fact, see Cantor’s diagonal argument.

Notice that between any two rational numbers (numbers of the form a/b) there is also always another number; nevertheless, we can construct a bijection (a one-to-one correspondence) between the rationals and the integers. So, in fact, there are just as many rationals as there are integers! One such correspondence comes from the Calkin-Wilf tree.
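
Here is a minimal sketch of such an enumeration, stepping through the Calkin-Wilf sequence with its closed-form successor rule; every positive rational appears exactly once (extending to all rationals, by interleaving negatives and zero, works just as with the even numbers above):

```python
from fractions import Fraction
from math import floor

q = Fraction(1)
for n in range(1, 11):
    print(n, q)                     # pair the integer n with the rational q
    q = 1 / (2 * floor(q) + 1 - q)  # successor rule for the Calkin-Wilf sequence
# -> 1, 1/2, 2, 1/3, 3/2, 2/3, 3, 1/4, 4/3, 3/5, ...
```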