Big can be beautiful

Scale does matter with data. The more of it you have, the more value you can get from it. But, as any data scientist will point out, you must start small, understand the data fully and keep it as simple as possible.

Imagine that you are selling films online. The data you capture is limited to what a person bought and when. It is simple data and easily understood. Even with a small amount of it we might see some patterns; for example, we might spot someone who buys a film every couple of weeks.

This tells us two things. Firstly, we might be able to sell more to that person if we time our promotions to fit their buying habits. Secondly, if we increase the amount of data to hand, we might spot similar patterns that apply across a larger number of people, e.g. people who buy a particular ‘rom com’ box set ‘typically’ buy two other box sets as well.

The complexity of the data has remained unchanged: we still know only what someone has purchased and when. But the volume of data involved gives the patterns that we see enough credibility to become the basis for predictions across the whole population of people.

But don’t rush to be ‘big’

However, when data is complex, increasing its scale can become counter-productive.

As an example, within one of the country’s biggest Housing Associations we are tackling an age-old problem in the world of social housing. If you put more money into maintaining a property, do you spend less on repairs? If we can show this to be so, can we then find a way to minimise the spend on both?

This sounds like a big data problem – thousands of properties, multiple things to maintain within each, and hundreds of thousands of repairs.

But care is needed. I can take one property and see that more repairs are needed to, say, a kitchen as it gets older. But what I may not have spotted is that, halfway through my period of analysis, the tenants changed. An older couple, who put a relatively light load on the kitchen, were replaced by a family with young children.

Jumping in with big data might lead me to assume a relationship between the age of the kitchen and the number of repairs, when I may simply have shown a relationship between the type of resident and the number of repairs.

Only by paring the data down to isolate each of the variables that might influence the end result can the correlations of interest be explored.
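To make the trap concrete, here is a minimal Python sketch of the kitchen example. The records are invented purely for illustration (they are not our housing data), and the `slope` helper and the household labels are hypothetical names. It compares the apparent link between kitchen age and repairs when all records are lumped together with the link seen once each type of household is looked at separately.

```python
from collections import defaultdict


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Invented records: (kitchen age in years, household type, repairs that year).
# By construction, repairs do not change with age within either household
# type, but families happen to live with the older kitchens.
records = [
    (2, "couple", 1), (4, "couple", 1), (6, "couple", 1), (8, "couple", 1),
    (8, "family", 3), (10, "family", 3), (12, "family", 3), (14, "family", 3),
]

# All records lumped together: age appears to drive repairs.
pooled_slope = slope([(age, repairs) for age, _, repairs in records])

# The same calculation within each household type.
by_type = defaultdict(list)
for age, household, repairs in records:
    by_type[household].append((age, repairs))
within_slopes = {household: slope(points) for household, points in by_type.items()}

print("Pooled slope (repairs per extra year of age):", round(pooled_slope, 3))
print("Slopes within each household type:", within_slopes)
```

In this made-up data the pooled slope is positive while the slope within each household type is zero, which is exactly the false conclusion that isolating the variables is meant to prevent.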

Be prepared for a little math

But if the data is small, is it safe to assume that a relationship found at that scale holds for the whole population of properties?

Yes, if you are prepared to use a little math.

We have demonstrated a clear relationship between the age of two components, kitchens and bathrooms, and the volume of repairs to them.

But our data sets were small. So we used basic statistical rules to show that the findings at a small scale can be assumed for the whole population of properties, but with levels of probability attached. For example: with every five-year increase in age, kitchens need ‘x’ more repairs per year, at a 90% level of confidence (in other words, there is a 10% probability that this statement is wrong).
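As an illustration of the kind of calculation involved, the sketch below fits a straight line to a small sample and attaches a 90% confidence interval to the slope. It is one textbook way of doing this (simple least-squares regression with a Student's t interval), not necessarily the exact method used in our work. The ten data points are invented, and the function name `slope_with_interval` is hypothetical. The critical value 1.860 is the standard t-table figure for a two-sided 90% interval with 8 degrees of freedom.

```python
import math


def slope_with_interval(ages, repairs, t_critical):
    """Least-squares slope of repairs on age, with a confidence interval.

    t_critical is the Student's t value for the chosen confidence level
    and n - 2 degrees of freedom, looked up from a t table.
    """
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(repairs) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, repairs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, repairs))
    std_error = math.sqrt(sse / (n - 2) / sxx)
    margin = t_critical * std_error
    return slope, slope - margin, slope + margin


# Invented sample: kitchen age in years and repairs in the last year.
ages = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
repairs = [0, 1, 1, 1, 2, 2, 3, 2, 3, 4]

# Two-sided 90% interval, 10 - 2 = 8 degrees of freedom (t-table value).
T_CRITICAL_90_DF8 = 1.860

slope, low, high = slope_with_interval(ages, repairs, T_CRITICAL_90_DF8)
print("Extra repairs per year for every 5 years of age:", round(5 * slope, 2))
print("90% interval:", round(5 * low, 2), "to", round(5 * high, 2))
```

If the whole interval sits above zero, as it does for this invented sample, the small data set supports the claim that repairs rise with age at the stated level of confidence.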

Then big data can be made to work

With these factors known, more data can now be collected. The more of it we have, the more confident we can be in the findings. But it’s vital that the data is no more complex than that used to determine the initial findings. If it’s kept just as simple, at a large scale it can be used to predict repairs in the same way that simple DVD sales data can be used to make purchase predictions across a whole population of people.

The task is to find a way to minimise the total spent on both maintenance and repairs. As confidence grows in the hypothesis that a kitchen needs ‘x’ more repairs per year for every five years it ages, we can start to put some sums of money into the equation.

We already know what it costs to replace a kitchen, and we know what it costs to repair a kitchen. What we don’t know, but can now start to model, is how the total of both sets of costs will change if we assume different frequencies of kitchen replacement.

For example, if we replace kitchens, say, every 15 years in all our properties, we can now calculate both the cost of replacing them and the cost of repairing them (because we now know how many repairs to expect as a kitchen ages over those 15 years).

We can now change the frequency of replacement and see how both sets of costs change.
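A minimal sketch of such a model is shown below. Every figure in it is a placeholder chosen for illustration (the replacement cost, the cost per repair, the repair rate for a new kitchen and the yearly increase in that rate are all assumptions, not our findings), and the function name `annual_cost` is hypothetical. It assumes repairs per year rise in a straight line with the age of the kitchen, and spreads the cost of one replacement plus all the repairs in a cycle over the years in that cycle.

```python
def annual_cost(interval_years, replace_cost, repair_cost,
                base_repairs, extra_repairs_per_year_of_age):
    """Average yearly cost per kitchen if it is replaced every interval_years."""
    # Expected repairs over one cycle, with the repair rate rising with age.
    repairs_in_cycle = sum(
        base_repairs + extra_repairs_per_year_of_age * age
        for age in range(interval_years)
    )
    return (replace_cost + repair_cost * repairs_in_cycle) / interval_years


# Placeholder figures, for illustration only.
REPLACE_COST = 5000     # cost of a new kitchen
REPAIR_COST = 100       # average cost of one repair
BASE_REPAIRS = 0.5      # repairs per year for a brand-new kitchen
EXTRA_PER_YEAR = 0.2    # i.e. 1 more repair per year for every 5 years of age

# Try a range of replacement intervals and keep the cheapest.
costs = {
    years: annual_cost(years, REPLACE_COST, REPAIR_COST,
                       BASE_REPAIRS, EXTRA_PER_YEAR)
    for years in range(5, 41)
}
best_interval = min(costs, key=costs.get)

for years in (10, 15, best_interval, 30):
    print(years, "years:", round(costs[years], 2), "per kitchen per year")
print("Cheapest replacement interval:", best_interval, "years")
```

Replacing too often lets the replacement cost dominate, while replacing too rarely lets the repair bill dominate. The cheapest interval lies where the two balance, and the gap between that cost and the cost at the current replacement age is the potential saving.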

Big savings from big data

Using some simple big data techniques, we can now say what we believe the optimum age to be for replacing the major components put into houses. This allows us to estimate how much money could be saved if components were replaced at their optimum age rather than at the age at which they are replaced at present.

Repairs and planned maintenance are the two biggest areas of expense for most Social Housing organisations. Big data techniques can show how big savings can be achieved.


If you would like to understand better how the simple use of big data techniques can help your organisation, please don’t hesitate to get in touch.

Paul Clarke
Develin Consulting