## Phase Transitions: When a Model is Screaming Out that it Needs Revision

So, I’ve been thinking a lot lately about one particular sub-field of physics and how it might apply to modeling data.

In general, the thing that has been on my mind is statistical mechanics: the study of how the behavior of a huge number of atomic-scale particles aggregates into our macroscopic world. But within the broad area of stat mech, I’ve been thinking in particular about phase transitions.

What happens when a large-scale model ceases to be consistent with the small-scale behavior of the system?

This is interesting to me because, in a lot of ways, big data involves the same sort of problems.

We amalgamate the behavior of millions of website visitors, or IoT devices, or whatever, into a meaningful model that can be described by a few parameters.

When we do this, we should be interested in looking at when our model will suddenly need to be revised.

So, I’m going to take this space to talk a bit about how phase transitions come about in our physical descriptions of nature, and then speculate a bit on how they might be leveraged in discussing data.

## Unstable Equations

Let’s think about a gas — for simplicity, let’s say that it’s kept in a well-insulated container. How might we describe it? Naïvely, we’d probably go and use a few variables that described the observed state of the gas. We’d use stuff like the pressure P of the gas, its volume V, and its temperature T.

If we already know about molecules, we might use the number N of molecules in the gas. If the gas is diffuse enough, we get a relationship between these things that is something like our favorite formula from intro chemistry:

$$PV = NkT$$

where k is a constant called Boltzmann’s constant.

However, this isn’t the end-all of gas laws. If we start to factor in the simplest sorts of interactions between particles, we find that the more dense the gas is, the stronger particles attract each other, and the more space the particles take up. With that factored in, we get something like:

$$\left(P + \frac{aN^2}{V^2}\right)(V - Nb) = NkT$$

where a and b are empirical constants that depend on the gas under consideration.

Now, for sufficiently high values of T, this equation is nice and well-behaved. But it should be easy enough to see that if T is low enough, then you can get behavior that is somewhat crazy.

In particular, for constant N and a sufficiently low constant T, you can find solutions where $\partial P / \partial V > 0$, which is crazy when you think about it: increasing the volume of the gas increases the pressure!
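One way to see this, as a sketch that assumes the standard form of the van der Waals equation, $(P + aN^2/V^2)(V - Nb) = NkT$, is to solve for the pressure and differentiate at fixed N and T:

$$P = \frac{NkT}{V - Nb} - \frac{aN^2}{V^2}, \qquad \left(\frac{\partial P}{\partial V}\right)_{N,T} = -\frac{NkT}{(V - Nb)^2} + \frac{2aN^2}{V^3}$$

The first term is always negative and the second is always positive. The slope can only turn positive where $kT < 2aN(V - Nb)^2 / V^3$, and the right-hand side peaks at $V = 3Nb$ with the value $8a/(27b)$. So the trouble starts below the critical temperature given by $kT_c = 8a/(27b)$.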

No stable gas can satisfy these conditions. You might think this means that the van der Waals equation is junk (and it’s true that it’s just an approximation). But what it really means is something more interesting — the van der Waals equation of state is telling you that you don’t have a gas at all anymore. The gas is undergoing a phase transition, and is either liquefying or solidifying.

And, on the other side of that phase transition, we will need a new equation of state to describe a liquid or a solid.
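To make the instability a bit more concrete, here is a minimal numerical sketch. It assumes the standard van der Waals form solved for pressure, and it works in made-up units where N, a, b and k all default to 1. The function names (`vdw_slope`, `critical_temperature`, `unstable_volumes`) are mine, purely for illustration. It scans a grid of volumes and reports where the slope of the isotherm is positive, which is the region where the equation of state is telling us the gas description has broken down.

```python
def vdw_slope(V, T, N=1.0, a=1.0, b=1.0, k=1.0):
    """Slope dP/dV at fixed N and T, assuming
    P = N k T / (V - N b) - a N^2 / V^2 (valid only for V > N b)."""
    return -N * k * T / (V - N * b) ** 2 + 2.0 * a * N ** 2 / V ** 3


def critical_temperature(a=1.0, b=1.0, k=1.0):
    """Temperature below which dP/dV can become positive: k T_c = 8 a / (27 b)."""
    return 8.0 * a / (27.0 * b * k)


def unstable_volumes(T, volumes, **params):
    """Return the volumes on the grid where the isotherm slopes upward."""
    return [V for V in volumes if vdw_slope(V, T, **params) > 0.0]


# Illustrative grid in arbitrary units, starting just above the excluded volume N b = 1.
volumes = [1.1 + 0.01 * i for i in range(2000)]
T_c = critical_temperature()

cold = unstable_volumes(0.9 * T_c, volumes)  # below T_c: an unstable band appears
warm = unstable_volumes(1.1 * T_c, volumes)  # above T_c: none on this grid

print(len(cold) > 0, len(warm) == 0)  # prints: True True
```

The point of the exercise is that the warning sign is computable from the model alone: we never had to observe a liquid to learn that the gas model had stopped making sense.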

Now, what does this mean for data?

## Changing the Focus

In a lot of ways, we are doing the same thing with big data. We are measuring the net effects of millions of individuals doing billions of things. We are tracking their aggregate behavior using a model that has far fewer degrees of freedom than the underlying data.

Every so often, a model like that will start producing results that make no sense. When it does, the model isn’t simply wrong; it is screaming out that it no longer applies, and that it needs to be replaced with a model that fits the underlying situation more directly.

For instance, imagine that we’re running a regression on a series of sensors for a predictive analytics platform, such as SparkCognition’s SparkPredict, and we are attempting to predict future sensor output on a wind turbine from previous sensor output.

Generically, we’re going to be measuring a bunch of things like airflow velocity, temperature, pressure, etc. We build a model, train it, deploy it and watch it evolve — and at first, it works beautifully.

But then winter comes, the behavior of the turbine changes, and we’re predicting negative absolute pressure. We know that’s impossible, but that’s just the model telling us that we need to come up with a new model, applicable to winter. We’ve undergone a phase transition, and we’ve learned about it from the model itself.
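As a rough illustration of what listening to the model might look like in practice, here is a small sketch. It is not how SparkPredict or any particular platform works; the function name `regime_alarm`, the window size, the threshold, and the sample numbers are all hypothetical. The idea is simply to watch a stream of predictions for values that are physically impossible, and to raise a flag once they stop being isolated glitches and become the norm.

```python
from collections import deque


def regime_alarm(predictions, lower=0.0, upper=float("inf"),
                 window=5, max_violation_rate=0.4):
    """Return the index at which the share of out-of-bounds predictions in the
    last `window` values first exceeds `max_violation_rate`, or None if it never does."""
    recent = deque(maxlen=window)  # rolling record of bound violations
    for i, value in enumerate(predictions):
        recent.append(not (lower <= value <= upper))
        if len(recent) == window and sum(recent) / window > max_violation_rate:
            return i
    return None


# Made-up predicted pressures: sensible at first, then increasingly impossible.
predicted_pressure = [101.2, 100.8, 99.5, 100.1, 98.7, -0.5, 97.0, -1.2, -3.4, -2.2]

print(regime_alarm(predicted_pressure))  # prints: 8
```

A single negative value at index 5 doesn’t trip the alarm; it takes a sustained run of nonsense. That is a design choice, and a real system would need to tune the window and threshold, but the trigger is a signal to go build a model for the new regime rather than to patch the old one.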

There are obviously much deeper dives to take here. But one of the first lessons is always that fundamental question that all mathematicians and scientists should ask themselves:

Is my model making sense?