In message , Paul Giverin writes
Sorry, that was a bit opaque. OK, for the Civic Type-R I used to own, I've got a spreadsheet covering 20,000 miles or so of fills, with the mileage and whether each fill was Optimax or 95RON. So, knowing the volume of the tank, the composition of the petrol in the tank before filling and the type of petrol added, I can calculate what % Optimax the car was running on for that fill.
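For anyone who wants to try the blend bookkeeping, here's a rough Python sketch. It assumes every fill is brim-to-brim, that old and new fuel mix completely, and a 50 litre tank; the function name, the tank size and the fill log are all made up for illustration and are not taken from the spreadsheet.

```python
def optimax_fraction_after_fill(prev_fraction, litres_added, added_is_optimax,
                                tank_litres=50.0):
    """Fraction of Optimax in the tank after a brim-to-brim fill.

    Assumption: the tank is filled completely, so the fuel left before
    filling is tank_litres - litres_added, and everything mixes fully.
    """
    if not 0.0 <= litres_added <= tank_litres:
        raise ValueError("litres_added must be between 0 and tank_litres")
    litres_left = tank_litres - litres_added
    optimax_left = litres_left * prev_fraction
    optimax_added = litres_added if added_is_optimax else 0.0
    return (optimax_left + optimax_added) / tank_litres


# Hypothetical fill log: (litres added, was it Optimax?)
fills = [(40.0, False), (40.0, True), (40.0, True), (40.0, False)]

fraction = 0.0  # assume the tank starts on pure 95RON
history = []
for litres, is_optimax in fills:
    fraction = optimax_fraction_after_fill(fraction, litres, is_optimax)
    history.append(fraction)
    fuel = "Optimax" if is_optimax else "95RON"
    print(f"{litres:.0f} l of {fuel}: tank is now {fraction:.0%} Optimax")
```

The blend worked out after one fill is what the car burns until the next fill, so that's the figure to pair with the mpg calculated at the next fill.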
It's then possible mathematically to work out whether high values of % Optimax are associated with high values of miles per gallon more often than can be explained just by random coincidence. One such test uses Kendall's rank correlation coefficient (Kendall's tau). It's part of a branch of statistics called "nonparametric statistics", which deals with data by ranking it rather than by looking at the actual values. The main advantage of nonparametric methods is that they make fewer assumptions about the underlying data and are less likely to "see things that aren't there". IOW, they're generally more conservative.
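To give a feel for what Kendall's coefficient measures, here's a minimal Python sketch. It looks at every pair of fills and counts how often the one with more Optimax also had the higher mpg (concordant) versus the lower mpg (discordant). This is the tau-b variant, which allows for ties; which variant was actually used isn't stated above, so treat that as an assumption, and the sample numbers are invented.

```python
import math


def kendall_tau_b(xs, ys):
    """Kendall's tau-b from a brute-force count over all pairs of points."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Invented example: % Optimax in the tank and mpg for six fills
pct_optimax = [0, 0, 20, 50, 80, 100]
mpg = [29.5, 30.2, 29.9, 30.8, 30.4, 31.6]
print(f"tau-b = {kendall_tau_b(pct_optimax, mpg):.2f}")
```

A value near +1 means more Optimax nearly always went with higher mpg, near 0 means no association. This only gives the coefficient; judging whether it is bigger than chance would produce needs a p-value, which something like `scipy.stats.kendalltau` reports alongside tau.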
Anyway, applying that tells me that the more Optimax was in the tank, the more likely I was to get higher mpg. Taking a bit of a liberty and using another statistical method to fit a straight line to the data tells me that it's only worth an extra 1.6 miles per gallon at 100% Optimax, and that there's a shit load of scatter. So I'd only see that extra 1.6 miles per gallon averaged out over thousands of miles, and it wasn't worth the difference in the price of Optimax.
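The straight-line fit can be sketched the same way. The method isn't named above, so ordinary least squares is an assumption here, and the data points are invented purely to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example: % Optimax in the tank and mpg for five fills
pct_optimax = [0, 20, 50, 80, 100]
mpg = [30.1, 29.5, 31.2, 30.8, 31.9]

slope, intercept = fit_line(pct_optimax, mpg)
# With x in percent, slope * 100 is the predicted gain at 100% Optimax
print(f"mpg on pure 95RON:   {intercept:.1f}")
print(f"gain at 100% Optimax: {slope * 100:.1f} mpg")
```

The "bit of a liberty" is that a least-squares line leans on assumptions about the scatter that the rank-based test doesn't need, so the slope is better read as a rough size of the effect than as proof of it.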
You can't do the same analysis for Matthew's data, because while I randomly stuck the odd tank of Optimax in, he more or less used nothing else for 12 months. So an alternative approach is to divide his data into two groups, tanks of 95RON and tanks of Optimax/V-Power. You can then use a different statistical method to determine whether any difference in miles per gallon between them is likely to be down to chance. A suitable nonparametric test for this is called the Mann-Whitney U-test. You basically put both lists of mpg numbers together, sort them and number them with their rank. Then you separate them, throw away the raw number and keep the rank, like this:
95RON:   30, 29, 33, 28, 31
Optimax: 30.5, 32, 34, 36

mpg   rank  fuel
28    1     95
29    2     95
30    3     95
30.5  4     Opt
31    5     95
32    6     Opt
33    7     95
34    8     Opt
36    9     Opt

95RON   ranks 1, 2, 3, 5, 7   sum = 1+2+3+5+7 = 18
Optimax ranks 4, 6, 8, 9      sum = 4+6+8+9 = 27
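The pool, sort, rank and split steps can be sketched in a few lines of Python using the example numbers above. The function name is made up, and giving tied values the average of their ranks is the usual convention rather than something stated here (the example has no ties anyway).

```python
def rank_sums(groups):
    """Pool all values, rank them (1 = smallest), and sum the ranks per group.

    groups maps a group name to its list of values. Tied values share the
    average of the ranks they would otherwise occupy.
    """
    pooled = sorted((value, name)
                    for name, values in groups.items()
                    for value in values)
    sums = {name: 0.0 for name in groups}
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at position i
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            sums[pooled[k][1]] += average_rank
        i = j
    return sums


example = {
    "95RON": [30, 29, 33, 28, 31],
    "Optimax": [30.5, 32, 34, 36],
}
sums = rank_sums(example)
print(sums)  # {'95RON': 18.0, 'Optimax': 27.0}
```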
You then add the ranks up for each group and plug the sum of the ranks and the number of items in each group into a formula and end up with a number you can look up in a statistical table. By comparing the number you found with the number in the table, you can determine how likely it is that you'd see a difference at least this big if using Optimax or 95RON actually made no difference to fuel consumption and any apparent difference was just down to chance. For Matthew's data, it's less than 1 in 20. So, at the usual 5% significance level, Matthew's car gave better mpg when he was running it on Optimax than when he was running it on 95RON.
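As a rough illustration of the formula and the table lookup, here's a Python sketch applied to the small worked example (rank sums 18 and 27, groups of 5 and 4). The formula is the standard U = R - n(n+1)/2. Instead of a printed table, it counts every possible way the ranks could have been dealt out, which is only practical for small samples and assumes no ties. The function names are made up.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(rank_sum, n):
    """Mann-Whitney U for a group of n items with the given rank sum."""
    return rank_sum - n * (n + 1) / 2


def exact_one_sided_p(rank_sum, n_group, n_total):
    """Chance of a rank sum this high or higher if fuel made no difference.

    Under that assumption every choice of n_group ranks out of
    1..n_total is equally likely, so we can simply count them.
    """
    hits = total = 0
    for ranks in combinations(range(1, n_total + 1), n_group):
        total += 1
        if sum(ranks) >= rank_sum:
            hits += 1
    return Fraction(hits, total)


u_optimax = u_statistic(27, 4)
u_95ron = u_statistic(18, 5)
p = exact_one_sided_p(27, 4, 9)
print(u_optimax, u_95ron)  # 17.0 3.0 (they always add up to 4 * 5 = 20)
print(p, float(p))         # 1/18, about 0.056
```

So the nine-tank illustration on its own comes out at 1 in 18, just short of 1 in 20; the figure quoted for Matthew's data is presumably from his full fill log, not from these example numbers. For bigger data sets a table, a normal approximation or something like `scipy.stats.mannwhitneyu` does the same job.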
But there's a problem: there is no control for all of the other factors which could affect fuel consumption. Almost all of the Optimax data is from May 2006-April 2007, 47,000-58,000 miles. Maybe it's just because by then the car was fully run in. Maybe his pattern of journeys or his driving style changed. Maybe the weather was milder. Who knows?
Clear as mud?