Maximum Entropy Distributions
Entropy is an important topic in many fields; it has well-known uses in statistical mechanics, thermodynamics, and information theory. The classical formula for entropy is S = −Σᵢ pᵢ log pᵢ, where pᵢ is the probability that the system occupies microstate i. But what is this probability distribution? How must the likelihood of states be configured so that we observe the appropriate macrostates?
In accordance with the second law of thermodynamics, we wish for the entropy to be maximized. If we take the entropy in the limit of large N, the sum over microstates becomes an integral over a probability density φ(x), and we can treat it with calculus as S[φ] = −∫dx φ ln φ. Here, S is called a functional (which is, essentially, a function that takes another function as its argument). How can we maximize S? We will proceed using the methods of calculus of variations and Lagrange multipliers.
First we introduce three constraints. We require normalization, so that ∫dx φ = 1. This is a condition that any probability distribution must satisfy, so that the total probability over the domain of possible values is unity (since we’re asking for the probability of any possible event occurring). We require symmetry, so that the expected value of x is zero (it is equally likely to be in microstates to the left of the mean as it is to be in microstates to the right — note that this derivation is treating the one-dimensional case for simplicity). Then our constraint is ∫dx x·φ = 0. Finally, we will explicitly declare our variance to be σ², so that ∫dx x²·φ = σ².
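Using Lagrange multipliers, we will instead extremize the augmented functional J[φ] = ∫(φ ln φ + λ₀φ + λ₁xφ + λ₂x²φ) dx. Here, the integrand is just the sum of the integrands above, with the constraint terms weighted by Lagrange multipliers λₖ for which we’ll be solving. The entropy term appears as +φ ln φ because maximizing −∫φ ln φ is the same as minimizing ∫φ ln φ; the stationary point is identical.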
Applying the Euler-Lagrange equation (which here just sets the derivative of the integrand with respect to φ to zero, since no derivatives of φ appear) gives ln φ + 1 + λ₀ + λ₁x + λ₂x² = 0, or φ = exp(−(1 + λ₀ + λ₁x + λ₂x²)). From here, our symmetry condition forces λ₁ = 0, and evaluating the other two integral conditions fixes the remaining multipliers (in particular λ₂ = 1/(2σ²)), so that φ = exp(−x²/(2σ²)) / √(2πσ²), which is just the Normal (or Gaussian) distribution with mean 0 and variance σ². This remarkable distribution appears in many descriptions of nature, in no small part due to the Central Limit Theorem.
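As a rough numerical sanity check of the result above, the sketch below compares the differential entropy −∫φ ln φ dx of three zero-mean densities that share the same variance. The Laplace and uniform densities, the value of sigma, and the helper name differential_entropy are illustrative choices that do not appear in the derivation; the integration is a plain midpoint rule, so the numbers are approximate.

```python
import math

def differential_entropy(pdf, lo, hi, n=100_000):
    """Midpoint-rule estimate of -integral of pdf(x) * ln(pdf(x)) over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        p = pdf(lo + (k + 0.5) * dx)
        if p > 0.0:  # skip points where the density vanishes
            total -= p * math.log(p) * dx
    return total

sigma = 1.0  # common standard deviation (illustrative value)

def gaussian(x):
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

b = sigma / math.sqrt(2)  # Laplace scale that gives variance sigma^2

def laplace(x):
    return math.exp(-abs(x) / b) / (2 * b)

w = sigma * math.sqrt(12)  # uniform width that gives variance sigma^2

def uniform(x):
    return 1.0 / w if abs(x) <= w / 2 else 0.0

entropies = {
    "gaussian": differential_entropy(gaussian, -12 * sigma, 12 * sigma),
    "laplace": differential_entropy(laplace, -40 * sigma, 40 * sigma),
    "uniform": differential_entropy(uniform, -w / 2, w / 2),
}

for name, value in entropies.items():
    print(f"{name:8s} entropy = {value:.4f}")
```

With all three densities held to the same variance, the Gaussian should come out with the largest entropy, close to the closed form ½ ln(2πeσ²).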
The Hamilton-Jacobi Equation
This blog has posted more than a few times in the past about classical mechanics. Luckily, classical mechanics can be approached in several ways. This approach, which uses the Hamilton-Jacobi equation (HJE), is one of the most elegant and powerful methods.
Why is the HJE so powerful? Consider a dynamical system with a Hamiltonian H=H(q,p,t). Suppose we knew of a canonical transformation (CT) that generated a new Hamiltonian K=K(Q,P,t) which (for a local chart on phase space) vanishes identically. Then the canonical equations would give that the transformed coordinates (Q,P) are constant in this region. How easy it would be to solve a system where you know that most of the important quantities are constant!
The rub is in finding such a canonical transformation. Sometimes it can’t even be done analytically, but nevertheless this is the goal of the Hamilton-Jacobi method of solving mechanical systems. In the Hamilton-Jacobi equation, H(q, ∂S/∂q, t) + ∂S/∂t = 0, S is the generating function of the CT. Coincidentally, it often comes out to just equal the classical action up to an additive constant! This is due to the connection between canonical transformations and mechanical gauge transformations; it turns out that the additive function used to define the latter is the generating function of the former. In general the HJE is a partial differential equation that might be solvable by additive separation of variables… but don’t get too hopeful! Oftentimes the value of the HJE comes not in finding the actual equations of motion but in revealing symmetry and conservation properties of the system.
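To make this concrete, here is a minimal sketch for the simplest case, a free particle with H = p²/2m, assuming the standard form of the HJE, H(q, ∂S/∂q, t) + ∂S/∂t = 0. The trial function S(q, t) = αq − α²t/(2m) is the usual separated solution; the values of alpha and m and the helper name hje_residual are hypothetical, chosen only for illustration.

```python
def S(q, t, alpha=1.5, m=2.0):
    """Trial generating function for a free particle; alpha is the constant new momentum."""
    return alpha * q - alpha**2 * t / (2 * m)

def hje_residual(S, q, t, m=2.0, h=1e-5):
    """Central-difference estimate of dS/dt + H(q, dS/dq), with H = p^2 / (2m)."""
    dS_dq = (S(q + h, t) - S(q - h, t)) / (2 * h)
    dS_dt = (S(q, t + h) - S(q, t - h)) / (2 * h)
    return dS_dt + dS_dq**2 / (2 * m)

# Evaluate the residual at a few arbitrary points of (q, t)
residuals = [hje_residual(S, q, t) for q in (-1.0, 0.3, 2.0) for t in (0.0, 0.7, 5.0)]
print(max(abs(r) for r in residuals))
```

The residual vanishes up to rounding error, so S solves the HJE. The new momentum α is constant, and so is the new coordinate Q = ∂S/∂α = q − αt/m, which rearranges to the familiar straight-line motion q = Q + αt/m.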
The Virial Theorem
In the transition from classical to statistical mechanics, are there familiar quantities that remain constant? The Virial theorem defines a law for how the total kinetic energy of a system behaves under the right conditions, and is equally valid for a one particle system or a mole of particles.
Rudolf Clausius, the man responsible for the first mathematical treatment of entropy and for one of the classic statements of the second law of thermodynamics, defined a quantity G (now called the Virial of Clausius):
G ≡ Σᵢ pᵢ · rᵢ
Where the sum is taken over all the particles in a system. You may want to satisfy yourself (it’s a short derivation) that taking the time derivative gives:
dG/dt = 2T + Σᵢ Fᵢ · rᵢ
Where T is the total kinetic energy of the system (Σᵢ ½mᵢvᵢ²) and we have used dpᵢ/dt = Fᵢ. Now for the theorem: the Virial Theorem states that if the time average of dG/dt is zero, then the following holds (we use angle brackets ⟨·⟩ to denote time averages):
2⟨T⟩ = −⟨Σᵢ Fᵢ · rᵢ⟩
Which may not be surprising. If, however, all the forces can be written as power laws so that the potential is V = arⁿ (with r the inter-particle separation), then
2⟨T⟩ = n⟨V⟩
Which is pretty good to know! (Here, V is the total potential energy of the particles in the system, not just the pair potential V = arⁿ.) For an inverse-square law (like the gravitational or Coulomb forces), F ∝ 1/r² ⇒ V ∝ 1/r, so n = −1 and 2⟨T⟩ = −⟨V⟩.
Try it out on a simple harmonic oscillator (like a mass on a spring with no gravity) to see for yourself. The potential V = ½kx², so it should be the case that the time average of the potential energy is equal to the time average of the kinetic energy (n=2 matches the coefficient in 2⟨T⟩). Indeed, if x = A sin( √[k/m] · t ), then v = A√[k/m] cos( √[k/m] · t ); then x² ∝ sin² and v² ∝ cos², and the time averages (over an integral number of periods) of sine squared and cosine squared are both ½. Thus the Virial theorem reduces to
2 · ½m·(A²k/2m) = 2 · ½k(A²/2)
Which is easily verified. This doesn’t tell us much about the simple harmonic oscillator; in fact, we had to find the equations of motion before we could even use the theorem! (Try plugging in the force term F=-kx in the first form of the Virial theorem, without assuming that the potential is a power law, and verify that the result is the same). But the theorem scales to much larger systems where finding the equations of motion is impractical (unless you want to solve an Avogadro’s number of differential equations!), and just knowing the potential energy of particle interactions in such systems can tell us a lot about the total energy or temperature of the ensemble.
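The oscillator check can also be done numerically. The sketch below averages the kinetic energy, the potential energy, and the force term F·x over a whole number of periods, using the analytic trajectory x = A sin(ωt). The parameter values and the function name time_averages are illustrative assumptions, not part of the derivation.

```python
import math

def time_averages(m=1.0, k=4.0, A=0.5, periods=3, n=60_000):
    """Return (<T>, <V>, <F*x>) for x(t) = A sin(omega t), averaged over whole periods."""
    omega = math.sqrt(k / m)
    total_time = periods * 2 * math.pi / omega
    dt = total_time / n
    T_sum = V_sum = Fx_sum = 0.0
    for j in range(n):
        t = (j + 0.5) * dt  # midpoint of each time step
        x = A * math.sin(omega * t)
        v = A * omega * math.cos(omega * t)
        T_sum += 0.5 * m * v * v
        V_sum += 0.5 * k * x * x
        Fx_sum += (-k * x) * x  # F = -kx for the spring
    return T_sum / n, V_sum / n, Fx_sum / n

avg_T, avg_V, avg_Fx = time_averages()
print(2 * avg_T, 2 * avg_V, -avg_Fx)
```

All three printed numbers should agree: 2⟨T⟩ equals n⟨V⟩ with n = 2, and also equals −⟨F·x⟩, which is the first form of the theorem.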
When describing the trajectory of a point particle in space, we can use simple kinematic physics to describe properties of the particle: force, energy, momentum, and so forth. But are there useful measures we can use to describe the qualities of the trajectory itself?
Enter the Frenet-Serret (or TNB) frame. In this post, we’ll show how to construct three (intuitively meaningful) orthonormal vectors that follow a particle in its trajectory. These vectors will be subject to the Frenet-Serret equations, and will also end up giving us a useful way to interpret curvature and torsion.
First, we define arc length: let s(t) = ∫₀ᵗ ||x′(τ)|| dτ. (We give a quick overview of integration in this post.) If you haven’t encountered this definition before, don’t fret: we’re simply multiplying the speed of the particle, ||x′(τ)||, by the small time step dτ, and summing over every infinitesimal time step from τ = 0 to τ = t, the current time. The post linked to above also explains a short theorem that may illustrate this point more lucidly.
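As a small illustration of this definition, the sketch below approximates the arc-length integral with a midpoint sum for a helix x(τ) = (a cos τ, a sin τ, bτ). The helix, the values of a and b, and the function names are assumptions made for the example; for this curve the speed is constant, so the exact answer is t·√(a² + b²).

```python
import math

def arc_length(velocity, t_end, n=20_000):
    """Midpoint-rule estimate of s(t_end), the integral of ||x'(tau)|| from 0 to t_end."""
    dtau = t_end / n
    total = 0.0
    for j in range(n):
        vx, vy, vz = velocity((j + 0.5) * dtau)
        total += math.sqrt(vx * vx + vy * vy + vz * vz) * dtau
    return total

a, b = 2.0, 0.5  # helix radius and rise per radian (illustrative values)

def helix_velocity(tau):
    # derivative of x(tau) = (a cos tau, a sin tau, b tau)
    return (-a * math.sin(tau), a * math.cos(tau), b)

s_numeric = arc_length(helix_velocity, t_end=3.0)
s_exact = 3.0 * math.sqrt(a * a + b * b)
print(s_numeric, s_exact)
```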
Now, consider a particle’s trajectory x(t). What’s the velocity of this particle? Its speed, surely, is ds/dt: the change in arc length (distance traveled) over time. But velocity is a vector, and needs a direction. Thus we define the velocity v=(dx/ds)⋅(ds/dt). This simplifies to the more obvious definition dx/dt, but allows us to separate out the latter term as speed and the former term as direction. This first term, dx/ds, describes the change in the position given a change in distance traveled. As long as the trajectory of the particle has certain nice mathematical properties (like smoothness), this vector will always be tangent to the trajectory of the particle. Think of this vector like the hood of your car: even though the car can turn, the hood will always point in whatever direction you’re going towards. This vector T ≡ dx/ds is called the unit tangent vector.
We now define two other useful vectors. The normal vector N ≡ (dT/ds) / |dT/ds| is a vector of unit length that always points in the direction toward which T is turning. It can be shown (but not here) that T ⊥ N. The binormal vector B is normal to both T and N; it’s defined as B ≡ T × N. So T, N, and B all have unit length and are all orthogonal to each other. Since T depends directly on the movement of the particle, N and B do as well; therefore, as the particle moves around, the coordinate system defined by T, N, and B moves around as well, connected to the particle. The frame is always orthonormal and always maintains certain relationships to the particle’s motion, so it can be useful to make some statements in the context of the TNB frame.
The Frenet-Serret equations, as promised:
- dT/ds = κN
- dN/ds = -κT + τB
- dB/ds = -τN
Here, κ is the curvature and τ is the torsion. Further reading (look up the Darboux vector) illustrates that κ represents the rate of rotation of the entire TNB frame about the binormal vector B, and τ represents the rate of rotation of the frame about T. The idea of the particle trajectory twisting and rolling nicely matches what it might be like to sit in the cockpit of one of these point particles, but it takes this depth of vector analysis to get there.
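For a feel of the numbers, here is a sketch that builds the TNB frame, curvature, and torsion at one point of a helix x(t) = (a cos t, a sin t, bt). It relies on two standard formulas that are not derived in this post and are assumed here: κ = |x′ × x″| / |x′|³ and τ = (x′ × x″) · x‴ / |x′ × x″|². The helper names and parameter values are hypothetical.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(ui / norm for ui in u)

def frenet(d1, d2, d3):
    """T, N, B, curvature, torsion from the first three time derivatives of x(t)."""
    c = cross(d1, d2)
    T = unit(d1)
    B = unit(c)
    N = cross(B, T)  # completes the right-handed frame, so that T x N = B
    kappa = math.sqrt(dot(c, c)) / math.sqrt(dot(d1, d1)) ** 3
    tau = dot(c, d3) / dot(c, c)
    return T, N, B, kappa, tau

a, b, t = 2.0, 0.5, 1.1  # helix radius, rise per radian, sample time (illustrative)
d1 = (-a * math.sin(t), a * math.cos(t), b)     # x'(t)
d2 = (-a * math.cos(t), -a * math.sin(t), 0.0)  # x''(t)
d3 = (a * math.sin(t), -a * math.cos(t), 0.0)   # x'''(t)

T, N, B, kappa, tau = frenet(d1, d2, d3)
print(kappa, tau)
```

For the helix both quantities are constant, κ = a/(a² + b²) and τ = b/(a² + b²), and N points straight back toward the axis of the helix.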
Bonus points: remember how v = Tv, with v the speed? Differentiate this with respect to time, play around with some algebra, and see if you can arrive at the following result: the acceleration a = κv²N + (d²s/dt²)T. Thoughtful consideration will reveal the latter term as the tangential acceleration, and knowing that 1/κ ≡ ρ = “the radius of curvature” reveals that the first term is centripetal acceleration.
Originally, in the Newtonian formulation of classical mechanics, equations of motion were determined by summing up vector forces (à la free body diagrams). Is there a different way to find the equations of motion?
In place of drawing a free body diagram, we can represent a system more rigorously by describing its configuration space. The configuration space (often denoted Q) of a system is a mathematical space (a differentiable manifold) where every point in the space represents a particular state or configuration of the system. A curve drawn through a configuration space, then, represents the evolution of a system through a sequence of configurations.
Consider a rod along which a pendulum can slide. We need two numbers to describe the state of this system: the angle of the swinging pendulum and the position of the pendulum’s base along the rod. These two numbers are generalized coordinates for our system. Just like a traditional, linear vector space has a coordinate basis (like x, y, and z), our configuration space can use our generalized coordinates as a basis; let’s choose to name the position on the rod x and the angle of the pendulum φ. Since x can take any real value and φ can take any value from 0 to 2π (or 0 to 360°, if you like), the x dimension can be represented by a line (R¹) and the φ dimension by a circle (or a one-sphere, S¹). When we combine these dimensions, our new space (the configuration space of this system) is shaped like an infinite cylinder, R¹ × S¹. Just imagine connecting a circle to every point on a line… or, conversely, a line to every point on a circle.
The general process of examining a system and the constraints on its movement is a standard first step for solving mechanics problems analytically. After accounting for the constraints on a system, the ways a system can vary are called the degrees of freedom. Their count is often represented by the variable s. Notice: s = dim(Q).
Now that we’ve represented the configuration of our system, we need to talk about the forces present. There are several different ways that we can set up scalar fields on our configuration manifold that represent quantities related to the energy of the system. The simplest to deal with is often the Lagrangian, L = T - V = (Total kinetic energy) - (Total potential energy). Some fancy mathematics (a.k.a. calculus of variations) shows that when we define the Lagrangian in terms of our coordinates and their time derivatives, we can easily derive the equations of motion using the Euler-Lagrange equation.
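As a worked example of that last step, take a plain pendulum of mass m and length ℓ with a fixed pivot (a simpler system than the sliding pendulum above, assumed here purely for illustration), with φ measured from the downward vertical. The Euler-Lagrange equation in its standard form, d/dt(∂L/∂φ̇) − ∂L/∂φ = 0, then gives the equation of motion in three lines:

```latex
\begin{aligned}
L &= T - V = \tfrac{1}{2} m \ell^2 \dot{\varphi}^2 + m g \ell \cos\varphi, \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{\varphi}} - \frac{\partial L}{\partial \varphi}
  &= m \ell^2 \ddot{\varphi} + m g \ell \sin\varphi = 0, \\
\ddot{\varphi} &= -\frac{g}{\ell} \sin\varphi .
\end{aligned}
```

No force vectors or tension in the rod ever had to be written down; the constraint is built into the choice of φ as the only coordinate.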
For more complicated systems, configuration spaces may look different. A double pendulum (a pendulum on a pendulum) would have the topology S¹ × S¹ = T², the torus (as pictured). Many systems will have higher dimensions that prevent them from being easily visualized.
Exercise left to the reader: the Lagrangian explicitly takes the time derivatives of the coordinates as arguments; information about the velocities of the system is needed to derive the equations of motion. But this information isn’t included in Q, so Lagrangian dynamics actually happens on TQ, the tangent bundle to Q. This new manifold includes information about how the system changes from every given configuration; since it needs to include a velocity coordinate for each configuration coordinate, dim(TQ) = 2s. TQ is also called Γᵥ, the velocity phase space. T*Q, the cotangent bundle to Q, is the dual of TQ, and is traditionally just called the phase space, Γ; this is where Hamiltonian mechanics takes place.
FBDs: An Intro to Newton’s Laws
To describe some truths of how and why objects move, Newton formulated three laws of motion. Practitioners of the science which resulted from these laws (sometimes called mechanics or dynamics) began and spread the use of free body diagrams (FBDs) to aid in developing a sense of how motion works.
The first law defines inertia, the property of any object with mass to retain a constant velocity unless prompted by an external force. Mass is actually the measure of an object’s inertia, which has an SI unit kilogram (kg). When we draw a free body diagram, we draw only the object (the body) and vectors to indicate forces. Thus the start of an FBD looks like
and, if no force vectors are added, means that the object is moving at a constant (perhaps zero) velocity. This might describe a meteorite in space far away from any planets or stars - since it doesn’t “feel” any forces, it will continue traveling at the same speed in the same direction.
Of course, the very existence of other objects with mass in the universe means at least that any object we choose to study will feel a gravitational attraction to the other objects, and often other forces as well - the normal force from the ground, drag from fluid viscosity, friction between surfaces, pushing from a hand, tension through a rope. The second law then provides for how our body of interest reacts to the application of one or more external forces. An FBD simplifies the bookkeeping of these forces; again, it isolates the body from its surroundings and visualizes only the forces as vectors acting on it:
Newton’s second law states that this object will experience a net acceleration (rate of change of velocity) proportional to the sum of the forces,
F₁ + F₂ + ··· + Fₙ = Σᵢ Fᵢ = m aₙₑₜ
and one notices that the proportionality constant is the object’s mass, or inertia. A more massive body accelerates less when acted on by the same set of forces than a less massive body; in other words, it better resists a change to its initial velocity.
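The bookkeeping an FBD encourages translates directly into a few lines of code. The sketch below sums a list of force vectors and divides by the mass; the block, its mass, and the force values are hypothetical numbers chosen only to illustrate the second law.

```python
def net_acceleration(forces, mass):
    """Sum force vectors component-wise, then divide by mass (Newton's second law)."""
    net = [sum(components) for components in zip(*forces)]
    return [f / mass for f in net]

# Hypothetical 2 kg block on a level floor; forces in newtons as (x, y) components
mass = 2.0
forces = [
    (0.0, -19.6),  # gravity, m*g with g = 9.8 m/s^2
    (0.0, 19.6),   # normal force from the floor
    (10.0, 0.0),   # push from a hand
    (-4.0, 0.0),   # kinetic friction
]

a_net = net_acceleration(forces, mass)
print(a_net)
```

The vertical forces cancel, leaving a net 6 N horizontally and an acceleration of about 3 m/s² in the direction of the push. Doubling the mass while keeping the same horizontal forces would halve that acceleration.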
The third law, sometimes called the law of reciprocal forces, states that if one object exerts a force on another object, the second object will exert a force of the same magnitude and the opposite direction on the first.
This can be helpful in checking that your FBD has all the right forces, and only the right forces. But exercise caution. The third law deals with two forces on two different objects, whereas an FBD is made to deal with all the forces on one object. For instance, it is impossible to see a third law force pair in a free body diagram of a single body.
The ultimate goal of mechanics, which got its start in Newton’s laws and free body diagrams, is to solve the equations of motion for a given system. Eventually the more advanced tools of Lagrangian and Hamiltonian mechanics would develop; both are reformulations of classical mechanics, and both later became foundations for quantum mechanics.