It’s nice to be back after a pretty crazy two weeks or so.

Let me start off by stating that this blog post is simply me pondering and may not be correct. Feel free to comment on inaccuracies or improvements!

In preparation for an exam and my natural tendencies to be masochistic, I am forcing myself to find the exact complexities of some sorting algorithms and I decided to start with a favorite – mergesort. Mergesort divides an array or linked list first into two halves (or close to it) and then recursively divides the successive lists into halves until it ends up with two lists containing 1 element each – the base case. The elements are then compared and switched so that they are in order, and form their own list.

At successive levels we compare the last element of the first sublist to the first element of the second sublist and merge them together to form another list. This process continues up the recursion tree until the entire original list is sorted. For a more comprehensive and precise description, see this article on mergesort.

**Easy: The Worst Case**

The worst case is easy as any CS student will tell you. Looking at the recursion trees below, we see that in the worst case, we must perform “layers” of “parallel” merges corresponding to the height of the recursion tree, each merge performing comparisons. Then, the worst case complexity of merge sort is .

**Three Cases**

Trivially, if the number of elements is a power of 2 (figure a), all of the sublists at each level of the recursion tree will have the same size. This forms a recursion tree that is full and balanced. In this case, we have the worst case complexity because each element is involved in exactly merge operations each of which take time. In situation b the number of elements is 1 smaller than a power of 2 (i.e. 3, 7, 15). In situation c, the number of elements is 1 greater than a power of 2 (i.e. 3, 9, 17).

In situation a, the number of merge operations required for each of the elements is which yields operations.

In situations b and c, some elements require more merge operations than others; however, the number of merges differs by at most 1. The number of merges for each element is *approximately * and is exactly or , then, the total number of work performed is equal to

In situation a, we have

Then what? We need to find expressions for the constants and . I will call them and . I drew out several recursion trees and got the following table:

From the table, I extracted the following relationships. I will leave the proof by induction to the reader.

This is not quite **exactly** as I have heard some people say, but it is pretty darn close (). Using the code below, I simulated several values for and and the corresponding plot for .

n <- seq(1,2000) an <- bn <- ifelse(ceiling(log(n)/log(2))== floor(log(n)/log(2)),n/2,0) i <- seq(1,floor(log(max(n))/log(2))) bn[2**i + 1] <- n[2**i + 1] - 2 an[2**i + 1] <- 2 an[1] <- 0 bn[1] <- 0 for (i in 2:length(an)){ an[i] <- ifelse(an[i] == 0, 2 + an[i-1], an[i]) bn[i] <- ifelse(bn[i] == 0, bn[i-1] - 1, bn[i]) } T <- an*ceiling(log(n)/log(2)) + bn*floor(log(n)/log(2)) plot(T~n, type= l ,main="Exact Runtime of Mergesort", xlab=expression(italic(n)), ylab=expression(italic(T(n)))) curve(x*log(x)/log(2), add=TRUE, col="red") legend(topleft, c("Exact Solution" ,expression(n*log(n))),lwd=c(1,1), col=c( "black" , "red" )) Nlog2N <- n*log(n)/log(2) my.model <- lm(T~Nlog2N) TSS <- sum((T - mean(T))^2) RSS <- sum(my.model$residuals^2) MSS <- TSS - RSS R2 <- MSS/RSS

Note that there is not a perfect overlap here.

**The Precise Regression Issue**

Naturally, if the exact running time of merge sort is then I should get a regression model that has a perfect fit to the data…

Call: lm(formula = T ~ Nlog2N) Residuals: Min 1Q Median 3Q Max -90.15504 -13.84243 0.02602 19.70344 44.44878 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.016e+01 1.191e+00 8.526 <2e-16 *** Nlog2N 1.005e+00 9.825e-05 10224.908 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 28.46 on 1998 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: 1 F-statistic: 1.045e+08 on 1 and 1998 DF, p-value: < 2.2e-16

Hmm. The value is exactly 1. This confused me for a while, until I realized that this was a *precision* issue in R, and not a regression issue. Note that if this were a perfect fit, the Residuals section should be completely 0. Also, the base of the logarithm ( or 2) does not matter; the results are exactly the same! So why is ?

So why is this an issue? Most people do simply round up 0.99999809 to 1, but 1 is profound to statisticians: it means that the linear model has a perfect fit. Ok, so maybe it is a precision issue or option within R. But maybe not… look at the p-value! The question to take home is: why is it that the p-value has such precision, while the value does not? Personally, if we had to round, I would round the p-value to 0, and keep the precision of . Thoughts?

**Conclusion to the Original Purpose of this Post**

So, to make a really long story just long, the total running time is not *exactly* , rather it is .

That’s a really neat exercise you went through. I like how you took an idea and worked through an example in practice.

w/r/t your question about R rounding R^2, I suspect it is a relic of the domain of application of most linear models. Most models are no where near that close to 1. And with limited data feeding into the model, there’s a heck of a lot more error elsewhere in the model than the rounding error you are seeing in the rounding of R^2. So in practical application it makes sense to round R^2 to two decimal places. In other words, you’ve backed into an unusual “corner solution” or extreme example for lm().

From an applied point of view it makes me wonder why the P and F stats are not rounded 😉 It may be as simple as a desire to display the output in a given space.

-JD Long

I think it is not a precision issue, just a printing issue. In R something like,

print(summary(my.model),digits=12)

will show more digits.

Moreover, R^2=1 is not very special statistically. Just because a particular sample has a perfect linear fit, does not mean that the underlying random variables have an exact linear relationship. Maybe this is just a fluke sample. Moreover, do we really need to differentiate between no noise and a tiny bit of noise?

Conversely, with floating-point arithmetic, a sample from two random variables which have an exact linear relationship may give an R^2 unequal to 1.

Examples cannot prove theorems. Floating point examples even less so.

Thank you for your comment. You bring up two excellent points. I was referring to the sample at hand. If we are interested in the population parameters and fit, it would not necessarily have a perfect fit. Also, I was ignoring the effect floating point arithmetic would have.

Oh, but also… in this sample, there is not a perfect fit, so the R^2 for this sample should not be 1. That’s where I was originally coming from.

@Ryan — An educational example almost worthy of John Cook.

@JD,

Calculation of the p-value for the F-stat is bumping up against the limits of floating-point arithmetic:

> .Machine$double.eps

[1] 2.220446e-16

I suspect that R reports the p-value as < 2.2e-16 rather than round to 0 to avoid claiming that the probability of observing a more extreme F-statistic is 0. More than one of my professors in grad school harped on us for that.

Mike

Another good point. One would hope though that the same would apply to the R^2 value.