Thursday, June 28, 2012
Maximum Entropy Distributions



Entropy is an important topic in many fields; it has well-known uses in statistical mechanics, thermodynamics, and information theory. The classical formula for entropy is S = -Σᵢ pᵢ log pᵢ, where pᵢ is the probability that the system occupies microstate i. But what is this probability distribution? How must the likelihood of states be configured so that we observe the appropriate macrostates?



In accordance with the second law of thermodynamics, we wish for the entropy to be maximized. If we take the entropy in the limit of large N, the sum over microstates becomes an integral over a continuous variable x, and we can treat it with calculus as S[φ] = -∫dx φ ln φ, where φ(x) is a probability density. Here, S is called a functional (which is, essentially, a function that takes another function as its argument). How can we maximize S? We will proceed using the methods of calculus of variations and Lagrange multipliers.



First we introduce three constraints. We require normalization, so that ∫dx φ = 1. This is a condition that any probability distribution must satisfy, so that the total probability over the domain of possible values is unity (since we’re asking for the probability of any possible event occurring). We require symmetry, so that the expected value of x is zero (it is equally likely to be in microstates to the left of the mean as it is to be in microstates to the right — note that this derivation is treating the one-dimensional case for simplicity). Then our constraint is ∫dx x·φ = 0. Finally, we will explicitly declare our variance to be σ², so that ∫dx x²·φ = σ².



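Using Lagrange multipliers, we instead look for a stationary point of the augmented functional J[φ] = ∫dx (φ ln φ + λ₀φ + λ₁xφ + λ₂x²φ). The first term enters with a plus sign because maximizing the entropy -∫dx φ ln φ is the same as minimizing ∫dx φ ln φ. The integrand is just the sum of the integrands above, with each constraint weighted by a Lagrange multiplier λₖ for which we’ll be solving.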
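The Euler-Lagrange step is short here because the integrand contains φ but not its derivative φ′. As a sketch (the symbol 𝓛 for the integrand is notation introduced for this note, not part of the derivation above):

```latex
\begin{aligned}
\mathcal{L}(x,\varphi) &= \varphi\ln\varphi + \lambda_0\varphi + \lambda_1 x\,\varphi + \lambda_2 x^2\varphi,\\
\frac{\partial\mathcal{L}}{\partial\varphi} - \frac{d}{dx}\,\frac{\partial\mathcal{L}}{\partial\varphi'}
  &= \ln\varphi + 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2 = 0 .
\end{aligned}
```

The second term on the left vanishes identically, so the stationarity condition is an algebraic equation for φ rather than a differential equation.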



Applying the Euler-Lagrange equations and solving for φ gives φ = 1/exp(1 + λ₀ + λ₁x + λ₂x²). From here, our symmetry condition forces λ₁ = 0, and evaluating the other two integral conditions gives λ₂ = 1/(2σ²) and exp(-1-λ₀) = 1/√(2πσ²), so that φ(x) = (2πσ²)^(-½)·exp(-x²/(2σ²)), which is just the Normal (or Gaussian) distribution with mean 0 and variance σ². This remarkable distribution appears in many descriptions of nature, in no small part due to the Central Limit Theorem.
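One way to get a feel for the result is to compare the Gaussian against other densities that satisfy the same three constraints. The following is a minimal numerical sketch, not a proof: the value of `SIGMA`, the choice of Laplace and uniform densities as competitors, and the crude midpoint integrator are all assumptions made for illustration.

```python
import math

SIGMA = 1.5  # illustrative standard deviation, chosen arbitrarily


def gaussian(x):
    return math.exp(-x * x / (2 * SIGMA**2)) / math.sqrt(2 * math.pi * SIGMA**2)


def laplace(x):
    b = SIGMA / math.sqrt(2)  # scale chosen so the variance 2*b**2 equals SIGMA**2
    return math.exp(-abs(x) / b) / (2 * b)


def uniform(x):
    a = SIGMA * math.sqrt(3)  # half-width chosen so the variance a**2/3 equals SIGMA**2
    return 1 / (2 * a) if abs(x) <= a else 0.0


def integrate(f, lo=-20 * SIGMA, hi=20 * SIGMA, n=40_000):
    """Composite midpoint rule; the tails beyond 20 sigma are negligible here."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def moments_and_entropy(phi):
    """Return (normalization, mean, second moment, entropy) for the density phi."""
    def neg_p_log_p(x):
        p = phi(x)
        return -p * math.log(p) if p > 0.0 else 0.0  # convention: 0 ln 0 = 0

    return (
        integrate(phi),
        integrate(lambda x: x * phi(x)),
        integrate(lambda x: x * x * phi(x)),
        integrate(neg_p_log_p),
    )


results = {f.__name__: moments_and_entropy(f) for f in (gaussian, laplace, uniform)}
for name, (norm, mean, second, entropy) in results.items():
    print(f"{name:8s} norm={norm:.4f} mean={mean:+.4f} var={second:.4f} S={entropy:.4f}")
```

All three densities should report (approximately) unit normalization, zero mean, and variance σ², while the Gaussian reports the largest entropy of the three, consistent with the closed form ½ ln(2πeσ²).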


Friday, October 28, 2011
The Virial Theorem



In the transition from classical to statistical mechanics, are there familiar quantities that remain constant? The Virial theorem gives a law for how the time-averaged total kinetic energy of a system behaves under the right conditions, and it is equally valid for a one-particle system or a mole of particles.



Rudolf Clausius, the man responsible for the first mathematical treatment of entropy and for one of the classic statements of the second law of thermodynamics, defined a quantity G (now called the Virial of Clausius):



G ≡ Σᵢ (pᵢ · rᵢ)



Where the sum is taken over all the particles in a system. You may want to satisfy yourself (it’s a short derivation) that taking the time derivative gives:



dG/dt = 2T + Σᵢ (Fᵢ · rᵢ)



Where T is the total kinetic energy of the system (Σᵢ ½mᵢvᵢ²) and dpᵢ/dt = Fᵢ. Now for the theorem: the Virial Theorem states that if the time average of dG/dt is zero, then the following holds (we use angle brackets ⟨·⟩ to denote time averages):



2⟨T⟩ = -⟨Σᵢ (Fᵢ · rᵢ)⟩
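Both displayed results follow in a few lines. A sketch of the derivation, assuming non-relativistic momenta pᵢ = mᵢvᵢ (the averaging time τ is notation introduced here):

```latex
\begin{aligned}
\frac{dG}{dt}
  &= \sum_i \left(\dot{\mathbf p}_i\cdot\mathbf r_i + \mathbf p_i\cdot\dot{\mathbf r}_i\right)
   = \sum_i \mathbf F_i\cdot\mathbf r_i + \sum_i m_i v_i^2
   = \sum_i \mathbf F_i\cdot\mathbf r_i + 2T,\\
\left\langle \frac{dG}{dt} \right\rangle
  &= \frac{1}{\tau}\int_0^{\tau}\frac{dG}{dt}\,dt
   = \frac{G(\tau)-G(0)}{\tau}.
\end{aligned}
```

The average vanishes exactly for periodic motion when τ is a whole number of periods, and it tends to zero as τ grows whenever G stays bounded. Setting it to zero in the first line gives 2⟨T⟩ = -⟨Σᵢ Fᵢ · rᵢ⟩.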



Which may not be surprising. If, however, all the forces can be written as power laws so that the potential is V = a·rⁿ (with r the inter-particle separation), then



2⟨T⟩ = n⟨V⟩
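The step from the force form to the potential form is one line for a single particle in a central power-law potential. A sketch under that simplifying assumption (for pairwise forces the same algebra goes through with r as the separation of each pair, summed over pairs):

```latex
\begin{aligned}
V(r) &= a\,r^{n}, \qquad \mathbf F = -\nabla V = -n\,a\,r^{n-1}\,\hat{\mathbf r},\\
\mathbf F\cdot\mathbf r &= -n\,a\,r^{n} = -nV,\\
2\langle T\rangle &= -\langle \mathbf F\cdot\mathbf r\rangle = n\langle V\rangle .
\end{aligned}
```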



Which is pretty good to know! (Here, V is the total potential energy of the particles in the system, not the single-pair potential function V = a·rⁿ.) For an inverse-square law (like the gravitational or Coulomb forces), F ∝ 1/r² ⇒ V ∝ 1/r, so n = -1 and 2⟨T⟩ = -⟨V⟩.
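The inverse-square case can be checked numerically on a bound Kepler orbit. This is a rough sketch under stated assumptions: the units (`GM = 1`), the orbit parameters, the step count, and the use of a velocity Verlet integrator are all illustrative choices, not anything prescribed by the theorem.

```python
import math

GM = 1.0             # gravitational parameter, illustrative units
M = 1.0              # mass of the orbiting body
A, ECC = 1.0, 0.5    # semi-major axis and eccentricity (hypothetical orbit)


def accel(x, y):
    """Acceleration from an attractive inverse-square force centred on the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -GM * x / r3, -GM * y / r3


def orbit_averages(steps=20_000):
    """Time-average kinetic and potential energy over one orbital period."""
    period = 2 * math.pi * math.sqrt(A**3 / GM)
    dt = period / steps
    # Start at perihelion, moving perpendicular to the radius.
    x, y = A * (1 - ECC), 0.0
    vx, vy = 0.0, math.sqrt(GM * (1 + ECC) / (A * (1 - ECC)))
    ax, ay = accel(x, y)
    kinetic_sum = potential_sum = 0.0
    for _ in range(steps):
        kinetic_sum += 0.5 * M * (vx * vx + vy * vy)
        potential_sum += -GM * M / math.hypot(x, y)
        # One velocity Verlet step: half kick, drift, half kick.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return kinetic_sum / steps, potential_sum / steps


avg_t, avg_v = orbit_averages()
print(f"2<T> = {2 * avg_t:.5f}, -<V> = {-avg_v:.5f}")
```

The two printed numbers should agree to within the integration error, even though T and V individually vary a great deal around an eccentric orbit.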



Try it out on a simple harmonic oscillator (like a mass on a spring with no gravity) to see for yourself. The potential is V = ½kx², so n = 2 and the theorem says 2⟨T⟩ = 2⟨V⟩: the time average of the potential energy should equal the time average of the kinetic energy. Indeed, if x = A sin( √[k/m] · t ), then v = A√[k/m] cos( √[k/m] · t ); then x² ∝ sin² and v² ∝ cos², and the time averages (over an integral number of periods) of sine squared and cosine squared are both ½. Thus the Virial theorem reduces to



2 · ½m·(A²k/2m) = 2 · ½k(A²/2)



Which is easily verified. This doesn’t tell us much about the simple harmonic oscillator; in fact, we had to find the equations of motion before we could even use the theorem! (Try plugging in the force term F = -kx in the first form of the Virial theorem, without assuming that the potential is a power law, and verify that the result is the same.) But the theorem scales to much larger systems where finding the equations of motion is impossible (unless you want to solve an Avogadro’s number of differential equations!), and just knowing the potential energy of particle interactions in such systems can tell us a lot about the total energy or temperature of the ensemble.
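The oscillator check, including the force form of the theorem, can also be done numerically. A minimal sketch, sampling the known analytic solution rather than integrating the equation of motion; the values of `M`, `K`, and `AMP` are arbitrary illustrative choices.

```python
import math

M = 2.0      # mass (illustrative)
K = 3.0      # spring constant (illustrative)
AMP = 0.7    # amplitude A (illustrative)
OMEGA = math.sqrt(K / M)


def time_averages(periods=3, samples=30_000):
    """Average T, V and F*x over a whole number of periods of x = A sin(wt)."""
    total_time = periods * 2 * math.pi / OMEGA
    dt = total_time / samples
    kinetic = potential = virial = 0.0
    for i in range(samples):
        t = i * dt
        x = AMP * math.sin(OMEGA * t)
        v = AMP * OMEGA * math.cos(OMEGA * t)
        kinetic += 0.5 * M * v * v
        potential += 0.5 * K * x * x
        virial += (-K * x) * x          # F * x with F = -k x
    return kinetic / samples, potential / samples, virial / samples


avg_T, avg_V, avg_Fx = time_averages()
print(f"2<T> = {2 * avg_T:.6f}, -<F x> = {-avg_Fx:.6f}, 2<V> = {2 * avg_V:.6f}")
```

All three printed quantities should coincide with kA²/2, so the force form 2⟨T⟩ = -⟨F·x⟩ and the power-law form 2⟨T⟩ = 2⟨V⟩ agree for this system.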
