Saturday, December 22, 2012

aangot asked: Hi! Love your blog. I enjoy math, but the unfortunate thing is that I’m really bad at it. Is this strange to be bad at a subject you enjoy? And what do you recommend someone does if they want to get better at math?

No, that’s not strange at all! Part of the fun of learning something new is its difficulty. If you want to learn more about math, all you have to do is stay curious. Wikipedia is always a great resource. Talk to your math teachers or professors, ask which fields would be most useful at your level, and see if you can find some textbooks on the subject. YouTube has useful channels too, like Khan Academy.

Thursday, June 28, 2012
Maximum Entropy Distributions

Entropy is an important topic in many fields; it has well-known uses in statistical mechanics, thermodynamics, and information theory. The classical formula for entropy is S = −Σi pi ln pi, where pi is the probability of the system being found in microstate i (in the continuous setting, a probability density p(x) plays this role). But what is this probability distribution? How must the likelihood of states be configured so that we observe the appropriate macrostates?
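
As a quick illustration of the formula (a minimal sketch with made-up probabilities), the snippet below computes −Σi pi ln pi for a sharply peaked distribution and for the uniform one over the same four microstates; the uniform case has the larger entropy:

import math

def entropy(p):
    """Shannon/Gibbs entropy S = -Σ pᵢ ln pᵢ (natural log, so the answer is in nats)."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# A sharply peaked distribution over four microstates versus the uniform one:
print(entropy([0.85, 0.05, 0.05, 0.05]))   # ≈ 0.59 nats
print(entropy([0.25, 0.25, 0.25, 0.25]))   # ln 4 ≈ 1.39 nats, the maximum for four states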

In accordance with the second law of thermodynamics, we wish for the entropy to be maximized. In the limit of a large number of microstates we can pass to a continuous density φ(x) and treat the entropy with calculus as S[φ] = −∫dx φ ln φ. Here, S is called a functional (which is, essentially, a function that takes another function as its argument). How can we maximize S? We will proceed using the methods of the calculus of variations and Lagrange multipliers.

First we introduce three constraints. We require normalization, so that ∫dx φ = 1. This is a condition that any probability distribution must satisfy, so that the total probability over the domain of possible values is unity (since we’re asking for the probability of any possible event occurring). We require symmetry, so that the expected value of x is zero (it is equally likely to be in microstates to the left of the mean as it is to be in microstates to the right — note that this derivation is treating the one-dimensional case for simplicity). Then our constraint is ∫dx x·φ = 0. Finally, we will explicitly declare our variance to be σ², so that ∫dx x²·φ = σ².

Using Lagrange multipliers, we will instead look for stationary points of the augmented functional ∫(φ ln φ + λ0φ + λ1xφ + λ2x²φ) dx. Here, the integrand is just the sum of the integrands above, adjusted by Lagrange multipliers λk for which we’ll be solving.

Applying the Euler-Lagrange equation and solving for φ gives φ = exp(−(1 + λ0 + λ1x + λ2x²)). From here, our symmetry condition forces λ1 = 0, and evaluating the other integral conditions fixes the remaining λ’s so that φ = (1/(2πσ²))½·exp(−x² / 2σ²), which is just the Normal (or Gaussian) distribution with mean 0 and variance σ². This remarkable distribution appears in many descriptions of nature, in no small part due to the Central Limit Theorem.
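
As a numerical sanity check of this result (my own sketch, separate from the derivation above), the snippet below estimates the differential entropy of samples drawn from three zero-mean distributions with the same variance σ² and confirms that the Gaussian comes out highest:

import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.0, 200_000

# Three zero-mean distributions scaled to the same variance σ².
samples = {
    "gaussian": rng.normal(0.0, sigma, n),
    "uniform": rng.uniform(-1.0, 1.0, n) * sigma * np.sqrt(3.0),   # Var[U(-1,1)] = 1/3
    "laplace": rng.laplace(0.0, sigma / np.sqrt(2.0), n),          # Var[Laplace(b)] = 2b²
}

def differential_entropy(x, bins=400):
    """Crude histogram estimate of -∫ φ ln φ dx."""
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()      # probability mass in each bin
    phi = p / widths               # density estimate in each bin
    keep = p > 0
    return -np.sum(p[keep] * np.log(phi[keep]))

for name, x in samples.items():
    print(f"{name:9s} var = {x.var():.3f}   entropy ≈ {differential_entropy(x):.3f}")

# The Gaussian should come out on top, near the exact value ½·ln(2πeσ²) ≈ 1.419.
print("exact Gaussian entropy:", 0.5 * np.log(2 * np.pi * np.e * sigma**2))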

Friday, October 28, 2011
The Virial Theorem

In the transition from classical to statistical mechanics, are there familiar quantities that remain constant? The Virial theorem defines a law for how the total kinetic energy of a system behaves under the right conditions, and is equally valid for a one particle system or a mole of particles.

Rudolf Clausius, the man responsible for the first mathematical treatment of entropy and for one of the classic statements of the second law of thermodynamics, defined a quantity G (now called the Virial of Clausius):

G ≡ Σi(pi · ri)

Where the sum is taken over all the particles in a system. You may want to satisfy yourself (it’s a short derivation) that taking the time derivative gives:

dG/dt = 2T + Σi(Fi · ri)

Where T is the total kinetic energy of the system (Σ ½mv²) and dp/dt = F. Now for the theorem: the Virial Theorem states that if the time average of dG/dt is zero, then the following holds (we use angle brackets ⟨·⟩ to denote time averages):

2⟨T⟩ = -⟨Σi(Fi · ri)⟩

Which may not be surprising. If, however, all the forces can be written as power laws so that the potential is V = a·rⁿ (with r the inter-particle separation), then

2⟨T⟩ = n⟨V⟩

Which is pretty good to know! (Here, V is the total potential energy of the particles in the system, obtained from the pairwise potential V = a·rⁿ, not the kinetic energy.) For an inverse-square law (like the gravitational or Coulomb forces), F ∝ 1/r² ⇒ V ∝ 1/r, so n = −1 and 2⟨T⟩ = -⟨V⟩.

Try it out on a simple harmonic oscillator (like a mass on a spring with no gravity) to see for yourself. The potential V ∝ kx², so it should be the case that the time average of the potential energy is equal to the time average of the kinetic energy (n=2 matches the coefficient in 2⟨T⟩). Indeed, if x = A sin( √[k/m] · t ), then v = A√[k/m] cos( √[k/m] · t ); then x² ∝ sin² and v² ∝ cos², and the time averages (over an integral number of periods) of sine squared and cosine squared are both ½. Thus the Virial theorem reduces to

2 · ½m·(A²k/2m) = 2 · ½k(A²/2)

Which is easily verified. This doesn’t tell us much about the simple harmonic oscillator; in fact, we had to find the equations of motion before we could even use the theorem! (Try plugging in the force term F=-kx in the first form of the Virial theorem, without assuming that the potential is polynomial, and verify that the result is the same). But the theorem scales to much larger systems where finding the equations of motion is impossible (unless you want to solve an Avogadro’s number of differential equations!), and just knowing the potential energy of particle interactions in such systems can tell us a lot about the total energy or temperature of the ensemble.
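
To see the harmonic-oscillator check numerically (a short sketch with arbitrary values of m, k, and A), sample x(t) and v(t) over whole periods, average the kinetic and potential energies, and compare:

import numpy as np

# Check the virial theorem for a simple harmonic oscillator, x(t) = A sin(√(k/m)·t).
# The parameter values are arbitrary; any positive m, k, A would do.
m, k, A = 1.0, 4.0, 0.5
omega = np.sqrt(k / m)

period = 2 * np.pi / omega
t = np.linspace(0.0, 5 * period, 200_001)   # average over whole periods

x = A * np.sin(omega * t)
v = A * omega * np.cos(omega * t)

T = 0.5 * m * v**2        # kinetic energy
V = 0.5 * k * x**2        # potential energy (V ∝ x², so n = 2)

print("2<T> =", 2 * T.mean())   # ≈ k·A²/2 = 0.5
print("n<V> =", 2 * V.mean())   # n = 2 for a quadratic potential; also ≈ 0.5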

Wednesday, October 12, 2011
Hypercubes

What is a hypercube (also referred to as a tesseract), you ask? Well, let’s start with what you know already. We know what a cube is: it’s a box! But how else could you describe a cube? A cube is 3-dimensional. Its 2-dimensional cousin is a square. 

A hypercube is to a cube what a cube is to a square. A hypercube is 4-dimensional! (Actually, to clarify: “hypercube” can refer to cubes of all dimensions. “Normal” cubes are 3-dimensional, squares are 2-dimensional “cubes,” etc. This is because a hypercube is an n-dimensional figure whose edges are aligned with each of the space’s dimensions, perpendicular to each other and of the same length. A tesseract is specifically a 4-d cube.) 

Another way to think about this can be found here:

Start with a point. Make a copy of the point, and move it some distance away. Connect these points. We now have a segment. Make a copy of the segment, and move it away from the first segment in a new (orthogonal) direction. Connect corresponding points. We now have an ordinary square. Make a copy of the square, and move it in a new (orthogonal) direction. Connect corresponding points. We now have a cube. Make a copy and move it in a new (orthogonal, fourth) direction. Connect corresponding points. This is the tesseract.
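
That “copy and connect” recipe is easy to turn into code. Here is a small sketch (my own illustration): the 2ⁿ corners of an n-cube are the strings of n zeros and ones, and two corners are joined exactly when they differ in a single coordinate.

from itertools import product

def hypercube(n):
    """Vertices and edges of the n-dimensional unit hypercube.

    Vertices are all n-tuples of 0s and 1s; two vertices share an edge
    exactly when they differ in one coordinate, which is the
    'copy and connect corresponding points' construction above.
    """
    vertices = list(product((0, 1), repeat=n))
    edges = [
        (u, w)
        for i, u in enumerate(vertices)
        for w in vertices[i + 1:]
        if sum(a != b for a, b in zip(u, w)) == 1
    ]
    return vertices, edges

for n in range(5):
    v, e = hypercube(n)
    print(f"{n}-cube: {len(v)} vertices, {len(e)} edges")
# ... 3-cube: 8 vertices, 12 edges; 4-cube (tesseract): 16 vertices, 32 edges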

If a tesseract were to enter our world, we would only see it in our three dimensions, meaning we would see forms of a cube doing funny things and spinning on its axes. This would be a cross-section of the tesseract. Similarly, if we as 3-dimensional bodies were to enter a 2-dimensional world, its 2-dimensional citizens would “observe” us as 2-dimensional cross-sections as well; it would only be possible for them to see slices of us.

Why would this be significant? In math, we work with multiple dimensions very often. While it may seem as though a mathematician must then work with 3 dimensions in particular, that is not necessarily true. The mathematician deals with these dimensions abstractly; they need not correspond to anything in physical reality, and 3 dimensions are neither ordinary nor special. 

Yet, through modern mathematics and physics, researchers consider the existence of other (spatial) dimensions. What might be an example of such a theory? String theory is a model of the universe which supposes there may be many more than the usual 4 spacetime dimensions (3 for space, 1 for time). Perhaps understanding these dimensions, though seemingly impossible to visualize, will come in handy. 

Carl Sagan also explains what a tesseract is.

Image: Peter Forakis, Hyper-Cube, 1967, Walker Art Center, Minneapolis

Monday, September 5, 2011
When describing the trajectory of a point particle in space, we can use simple kinematic physics to describe properties of the particle: force, energy, momentum, and so forth. But are there useful measures we can use to describe the qualities of the trajectory itself?

Enter the Frenet-Serret (or TNB) frame. In this post, we’ll show how to construct three (intuitively meaningful) orthonormal vectors that follow a particle in its trajectory. These vectors will be subject to the Frenet-Serret equations, and will also end up giving us a useful way to interpret curvature and torsion.

First, we define arc length: let s(t) = ∫₀ᵗ ||x′(τ)|| dτ. (We give a quick overview of integration in this post.) If you haven’t encountered this definition before, don’t fret: we’re simply multiplying the particle’s speed ||x′(τ)|| by the small time step dτ and summing over every infinitesimal time step from τ=0 to τ=t (the current time). The post linked to above also explains a short theorem that may illustrate this point more lucidly.

Now, consider a particle’s trajectory x(t). What’s the velocity of this particle? Its speed, surely, is ds/dt: the change in arc length (distance traveled) over time. But velocity is a vector, and needs a direction. Thus we define the velocity v=(dx/ds)⋅(ds/dt). This simplifies to the more obvious definition dx/dt, but allows us to separate out the latter term as speed and the former term as direction. This first term, dx/ds, describes the change in the position given a change in distance traveled. As long as the trajectory of the particle has certain nice mathematical properties (like smoothness), this vector will always be tangent to the trajectory of the particle. Think of this vector like the hood of your car: even though the car can turn, the hood will always point in whatever direction you’re going towards. This vector T ≡ dx/ds is called the unit tangent vector.

We now define two other useful vectors. The normal vector: N ≡ (dT/ds) / ( |dT/ds| ) is a vector of unit length that always points in whichever way T is turning toward. It can be shown — but not here — that T ⊥ N. The binormal vector B is normal to both T and N; it’s defined as B ≡ T × N. So T, N, and B all have unit length and are all orthogonal to each other. Since T depends directly on the movement of the particle, N and B do as well; therefore, as the particle moves around, the coordinate system defined by T, N, and B moves around as well, connected to the particle. The frame is always orthonormal and always maintains certain relationships to the particle’s motion, so it can be useful to make some statements in the context of the TNB frame.

The Frenet-Serret equations, as promised:

  • dT/ds = κN
  • dN/ds = -κT + τB
  • dB/ds = -τN

Here, κ is the curvature and τ is the torsion. Further reading (look up the Darboux vector) shows that κ represents the rotation of the entire TNB frame about the binormal vector B, and τ represents the rotation of the frame about T. The idea of the particle’s trajectory twisting and rolling nicely matches what it might be like to sit in the cockpit of one of these point particles, but it takes this depth of vector analysis to get there.
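
Here is a short numerical sketch of the frame (my own example; the helix is just a convenient test curve). It builds T, N, and B from sampled positions and reads off κ and τ using the relations above:

import numpy as np

# Build the TNB frame numerically for a helix x(t) = (cos t, sin t, 0.5 t)
# and recover its (constant) curvature and torsion.
t = np.linspace(0, 4 * np.pi, 4001)
x = np.column_stack((np.cos(t), np.sin(t), 0.5 * t))

def unit(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

dx_dt = np.gradient(x, t, axis=0)
speed = np.linalg.norm(dx_dt, axis=1)              # ds/dt

T = unit(dx_dt)                                    # unit tangent, dx/ds
dT_ds = np.gradient(T, t, axis=0) / speed[:, None]
kappa = np.linalg.norm(dT_ds, axis=1)              # curvature, |dT/ds|
N = unit(dT_ds)                                    # unit normal
B = np.cross(T, N)                                 # binormal

dB_ds = np.gradient(B, t, axis=0) / speed[:, None]
tau = -np.sum(dB_ds * N, axis=1)                   # torsion, from dB/ds = -τN

# For the helix (cos t, sin t, c·t): κ = 1/(1+c²) and τ = c/(1+c²), i.e. 0.8 and 0.4 here.
interior = slice(100, -100)   # ignore endpoint noise from the finite differences
print(kappa[interior].mean(), tau[interior].mean())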

Bonus points: remember how v = Tv, with v the speed? Differentiate this with respect to time, play around with some algebra, and see if you can arrive at the following result: the acceleration a = κv²N + (d²s/dt²)T. Thoughtful consideration will reveal the latter term as the tangential acceleration, and knowing that 1/κ ≡ ρ = “the radius of curvature” reveals that the first term is centripetal acceleration.

Photo credit: Salix alba at en.wikipedia
Tuesday, August 30, 2011
Imagine you had a function P that, upon swallowing a subset E of a universal set Ω, returns a number x from the real number line. Keep imagining that P must also obey the following rules:

  1. If P can eat the subset, it will always return a nonnegative number.
  2. If you give P the universe Ω, it will give you back 1.
  3. If you collected together disjoint subsets and gave them to P to process, the result would be the same as feeding P each subset individually and adding the answers.

Simple, if odd out of context.

Mathematicians have a curious way of pulling magic out of simplicity.

~

Probability today is studied as a mathematical science based on the three axioms (flavored by set theory) stated above. These are the “first principles” from which many other, derivative propositions have been conjectured and proved. The results of the modern study of probability fuel many enterprises: signal processing in electrical and computer engineering, the insurance and finance industries (which translate probabilities into economic movement), and many others. Along the way probability has borrowed from the other giants of mathematics, analysis and algebra, and it goes on generating new research ideas for itself and for other fields. This is the way of math: set down a bunch of rules (preferably simple to start) and see how their consequences play out.

But what is probability? If it is a quantitative measure, what is it measuring? How valid is that measure, and how could it be checked? Even these are rich questions to probe. A working qualitative description for practitioners might be that probability quantifies uncertainty. It answers with some degree of success such questions as “What is the chance?” or “How likely is this?” If a system contains uncertainty, probability provides the model for handling it, and data gathered from the system can validate or improve the probability model.

According to Wikipedia, there are three main interpretations for probability:

  1. Frequentists talk about probabilities only when dealing with experiments that are random and well-defined. The probability of a random event denotes the relative frequency of occurrence of an experiment’s outcome, when repeating the experiment. Frequentists consider probability to be the relative frequency “in the long run” of outcomes.
  2. Subjectivists assign numbers per subjective probability, i.e., as a degree of belief.
  3. Bayesians include expert knowledge as well as experimental data to produce probabilities. The expert knowledge is represented by a prior probability distribution. The data is incorporated in a likelihood function. The product of the prior and the likelihood, normalized, results in a posterior probability distribution that incorporates all the information known to date.

~

So let’s reinterpret the math.

Let Ω be the sample space (the set of all possible outcomes), let the Ei be subsets of Ω denoting different events for different i, and let 𝔹 be the set of all events. Then a probability map P is defined as any function from 𝔹 → ℝ satisfying

  1. P(Ei) ≥ 0
    All probabilities are non-negative.
  2. P(Ω) = 1
    It is certain that one of the outcomes of Ω will happen.
  3. Ei ∩ Ej = ∅ for i≠j ⇒ P(∪i Ei) = ∑i P(Ei)
    Probabilities of disjoint events can be added to get the probability of any of them happening.
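
To make these rules concrete, here is a tiny model in Python (my own illustrative example with a fair six-sided die), with one check per axiom:

from fractions import Fraction

# A tiny concrete model: Ω is a fair six-sided die, events are subsets of Ω,
# and P just adds up the weights of the outcomes inside an event.
Omega = frozenset(range(1, 7))
weight = {outcome: Fraction(1, 6) for outcome in Omega}

def P(event):
    assert event <= Omega, "P only eats subsets of Omega"
    return sum(weight[o] for o in event)

evens, odds = frozenset({2, 4, 6}), frozenset({1, 3, 5})
print(P(evens) >= 0)                           # axiom 1: non-negativity
print(P(Omega) == 1)                           # axiom 2: the whole sample space has probability 1
print(P(evens | odds) == P(evens) + P(odds))   # axiom 3: additivity for disjoint events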

Image generated by Rene Schwietzke using POV-Ray, a free raytracing program that creates 3D computer graphics.

Further reading:
A First Course in Probability (8th ed., 2010), Sheldon Ross.
Probability and Statistics (4th ed., 2010), Morris H. DeGroot and Mark J. Schervish.

Wednesday, August 17, 2011

Vedic Multiplication

(Technically called Nikhilam Navatashcaramam Dashatah.) This is a quick and simple way to multiply any two numbers. It’s easiest when the numbers are both close to a power of ten, but it will always work. The first step is to choose a power of ten that the numbers are closest to. In my example I will find the product of 14 and 12. Since 12 and 14 are close to 10, I will choose 10. 14 is 4 more than 10, and 12 is 2 more than 10, so I will write these numbers off to the side, as shown.

+4 times +2 is 8, so I write this number on the right. Then I cross-add the 14 and the 2, or the 12 and the 4, to get 16. I write this number to the left, and put these two numbers together to get the right answer, 168. (Although I say “put these numbers together,” what is actually going on is that 16 is being multiplied by 10 and then 8 is added. Knowing this will be helpful when the number on the right is larger than the chosen power of ten.)

Here’s an example with larger numbers. Since they are closer to 100, 100 is used instead of 10. This time the numbers are less than the chosen power of ten, but the same method can be used. Multiply −8 by −11 to get 88 (write that on the right), and add 89 to −8, or 92 to −11, to get 81 (write that on the left). 81 is then multiplied by 100 (since that is the power of ten we chose) and 88 is added. Hence the correct answer to 92 × 89 is 8188. This is a neat trick, but why does it work? Consider the following algebra:

(x + a)(x + b) = c
x² + xa + xb + ab = c
x(x + a + b) + ab = c

Say x is the power of ten we chose. Then a and b are the two numbers that represent how far our factors are from the chosen power of ten: the cross-sum x + a + b is the part we write on the left (and multiply by x), and a·b is the part we write on the right.
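
The whole trick fits in a few lines of code. This is a quick sketch of the method as described above (the function name is mine), using the two worked examples:

def nikhilam(p, q, base):
    """Multiply p and q using the deviations-from-a-base trick described above.

    a and b measure how far p and q sit from the chosen power of ten; the answer
    is (base + a + b)·base + a·b, which is exactly the algebra (x+a)(x+b).
    """
    a, b = p - base, q - base
    left = base + a + b    # the cross-sum, e.g. 14 + 2 = 12 + 4 = 16
    right = a * b          # the product of the deviations
    return left * base + right

print(nikhilam(14, 12, 10))    # 168
print(nikhilam(92, 89, 100))   # 8188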

(Source: docs.google.com)

Thursday, August 11, 2011
Fractal Geometry: An Artistic Side of Infinity

Fractal Geometry is beautiful. Clothes are designed from it and you can find fractal calendars for your house. There’s just something about that infinitely detailed pattern that intrigues the eye— and brain. 

Fractals are “geometric shapes which can be split into parts which are a reduced-size copy of the whole” (source: wikipedia). They demonstrate a property called self-similarity, in which parts of the figure are similar to the greater picture. Theoretically, each fractal can be magnified and should be infinitely self-similar. 

One simple fractal which can easily display self-similarity is the Sierpinski Triangle. You can look at the creation of such a fractal:

What do you notice? Each triangle is self-similar— they are all equilateral triangles. The side length of each smaller triangle is half that of the original triangle. And what about the area? The area of each is a quarter of the original triangle’s. This pattern repeats again and again. 

Two other famous fractals are the Koch Snowflake and the Mandelbrot Set.

The Koch Snowflake looks like: 

 (source: wikipedia)

It is constructed by dividing each side of an equilateral triangle into thirds and erecting a smaller equilateral triangle on the middle third, over and over. You can determine the area of a Koch Snowflake by following this link.

The Mandelbrot set…

… is:

the set of values of c in the complex plane for which the orbit of 0 under iteration of the complex quadratic polynomial zₙ₊₁ = zₙ² + c remains bounded. (source: wikipedia)

It is a popular fractal named after Benoît Mandelbrot. More on creating a Mandelbrot set, as well as additional information, can be found here. 
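
That definition translates almost directly into code. Below is a minimal membership test (my own sketch; real renderers also track how quickly the orbit escapes in order to color the picture):

def in_mandelbrot(c, max_iter=1000):
    """Test whether c appears to stay in the Mandelbrot set.

    Iterate z_{n+1} = z_n**2 + c starting from z_0 = 0; once |z| exceeds 2
    the orbit is guaranteed to escape to infinity, so c is outside the set.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

print(in_mandelbrot(0))     # True: the orbit just sits at 0
print(in_mandelbrot(-1))    # True: the orbit cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1))     # False: 0, 1, 2, 5, 26, ... escapes
print(in_mandelbrot(0.25))  # True: the orbit creeps up toward 0.5 but stays bounded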

You can create your own fractals with this fractal generator. 

But what makes fractals extraordinary?

Fractals are not simply theoretical creations. They exist as patterns in nature! Forests can model them; so can clouds and interstellar gas! 

Artists are fascinated by them, as well. Consider The Great Wave off Kanagawa by Katsushika Hokusai: 

Even graphic artists use fractals to create mountains or ocean waves. You can watch Nova’s episode of Hunting the Hidden Dimension for more information. 

Thursday, August 4, 2011
Have you ever looked out on a starry night and wondered what else is out there? Perhaps, who else? And if there were to be someone, something there— would they be looking out for you, too?

Don’t worry, you’re not alone. Others have theorized about it: Frank Drake (an American radio astronomer who wrote the famous Arecibo message) made an entire equation. Behold, the Drake Equation.

N = R* × fp × ne × fl × fi × fc × L

The Drake Equation is an equation for predicting the number of civilizations in the Milky Way Galaxy capable of interstellar communication.

Short descriptions of what the variables of the equation represent can be found here.

The variables represent the average rate of star formation per year in our galaxy, the fraction of those stars which have planets, the average number of planets that can potentially support life per star which has planets, the fraction of those which actually go on to develop life at some point, the fraction of those which go on to develop intelligent life, the fraction of those which can release detectable signals of their existence, and (finally) the length of time for which these civilizations release signals.

That all seems like a mess, but you get the idea.

According to Drake’s parameters:

  • 50% of new stars develop planets
  • 0.4 planets will be habitable
  • 90% of habitable planets develop life
  • 10% of new instances of life develop intelligence
  • 10% of such life develops interstellar communications
  • These civilizations might, on average, last 10,000 years.

To be fair, we are not sure of the actual figures. Drake’s values give an answer of 10, meaning that 10 of these theoretical civilizations would be able to communicate.
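
For completeness, here is the product written out in code (a minimal sketch). The list above doesn’t include the star-formation rate R*, so the value below is a placeholder assumption, and the result won’t necessarily match the quoted answer of 10:

def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    """N = R* · fp · ne · fl · fi · fc · L, straight from the equation above."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L

N = drake(
    R_star=1.0,   # ASSUMED placeholder: new stars formed per year in the galaxy
    f_p=0.5,      # 50% of new stars develop planets
    n_e=0.4,      # 0.4 potentially habitable planets per star with planets
    f_l=0.9,      # 90% of habitable planets develop life
    f_i=0.1,      # 10% of those develop intelligence
    f_c=0.1,      # 10% of those develop interstellar communication
    L=10_000,     # such civilizations last about 10,000 years
)
print(N)   # 18.0 with these inputs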

But the importance of Drake’s equation is not necessarily the numerical value. It lies in all the questions that the equation led him to. Who knows exactly how many stars there are, and so on? These figures are yet to be discovered.

So next time you look above, remember to always question. You’re not alone in questioning and you don’t know where these questions can lead you. Like Drake, you might be led to discover companions from different worlds.

Tuesday, August 2, 2011
Number Harmony

It is easy to recognize octaves because the frequency of an octave above a certain pitch is exactly twice the frequency of that pitch. Octaves harmonize so well that they almost sound identical, so we call these notes by the same name: an octave above or below middle C is another C; an octave above or below concert A, 440 Hz, is another A (880 or 220 Hz). Mathematically, if a certain note H has frequency f, then a note with frequency 2ⁿf, where n is an integer, is n octaves above H (if n is negative, 2ⁿ is a positive power of 1/2 and the note is |n| octaves below H).

Not alone in their ability to harmonize well, octaves are joined by all the intervals that make up a major or minor scale (in the Western music system), notably including perfect fifths (fifth note of a scale, 3/2 times the frequency of the starting note) and major or minor thirds (third note of a scale, respectively 5/4 or 6/5 times the frequency of the starting note). All these frequencies are ratios of relatively small whole numbers - this contributes to the harmony of the notes, just like the ratio 2/1 does for octaves. The simpler the frequency ratio, the higher the quality of harmony achieved by an interval when played out loud. The only requirement is for  the ratio to be a (positive) rational number, able to be written with whole numbers for the numerator and denominator.

However, suppose you tuned a piano perfectly according to one of the scales. Then you could play that scale and it would be perfectly in tune - but the harmony of all the other scales gets thrown off! For example, E is both the third note of a C major scale and the second note of a D major scale. By tuning the piano to the C major scale, you guarantee that an E has frequency 5/4 times the frequency of a C (C to E is a major third). In a perfect C scale, D has frequency 9/8 times that of a C. Call these frequencies fC, fD, and fE.

fE = (5/4)fC
fD = (9/8)fC
⇒ fE = (5/4)(8/9)fD = (10/9)fD

This is still a relatively simple rational-number ratio, but it’s the wrong ratio. In a perfect D major scale, E has frequency 9/8 that of D. The error when tuning to C is

|10/9 - 9/8| = |80-81|/72 = 1/72.

In the first days of the harpsichord and piano (keyboard instruments), tuners chose one scale to tune to, sacrificing the harmony of the other scales. Interestingly, some of the music from that era took that into account; on one hand some scales were considered “sweeter” than others based on common tuning practices, and on the other some songs were purposely written in one of the sour-sounding scales for their dissonant harmonies.

Today’s most common tuning, or temperament, is called equal temperament. Each scale sounds equally good (or equally bad, depending on your tolerance of imperfection), and the only interval which is perfectly preserved is the octave. Since, in the Western music system, there are 12 semitones from octave to octave (12 white and black keys from a note to an octave above it), each of those keys is assigned a frequency of exactly the twelfth root of 2 times that of the key preceding it. What’s great about that, of course, is that this is a completely egalitarian system: no scale is sweeter or sourer sounding than any other. Yet the cost is the complete destruction of the rational-number harmonies: the twelfth root of 2 is as irrational as they come, and could never in any number theorist’s wildest dreams be written as a ratio of whole numbers.
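
Here is a small comparison of the two systems in code (a sketch; the semitone counts assigned to each interval are the standard ones, though they aren’t spelled out above):

just_ratios = {"octave": 2/1, "perfect fifth": 3/2, "major third": 5/4, "minor third": 6/5}
semitones   = {"octave": 12,  "perfect fifth": 7,   "major third": 4,   "minor third": 3}

# Equal temperament gives every semitone the ratio 2**(1/12), so an interval of
# k semitones has ratio 2**(k/12); compare that with the whole-number ratios above.
for name, just in just_ratios.items():
    tempered = 2 ** (semitones[name] / 12)
    print(f"{name:14s} just = {just:.4f}   equal-tempered = {tempered:.4f}")

# Concert A at 440 Hz: the octave above is exactly 880 Hz in both systems, but an
# equal-tempered fifth above is 440 * 2**(7/12) ≈ 659.26 Hz rather than a just 660 Hz.
print(440 * 2 ** (7 / 12))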

Further reading:
“Why you’ve never really heard the Moonlight Sonata,” Jan Swafford, Slate Magazine