Efficiently Equating Triangular and Square Numbers

Number Theory is full of fun questions that may not have the most immediate practical applications, until they do. For example, many of the concepts which are foundational to cryptography were initially viewed as having no real world applications and being little more than mathematical and academic exercises, but as we know cryptography is a crucial component upon which the security of our technological systems are built upon. For fun, let’s look at an efficient way to find values where a triangular number is equal to a square number. This post also provides a great showcase of how computer science and mathematics have such deep overlaps.

First of all, we need to know what triangular and square numbers are. Let’s think about each number being a dot. Then a triangular number is one where the number contains enough dots to form a proper triangle. For example, \(1\), \(3\), \(6\), etc. because, well, one is just one, three would have one dot and then two below it, and six would have one dot, followed by two, followed by three. You may have already noticed the pattern. \(6\) is a triangular number because \(1 + 2 + 3 = 6\). \(6\) is the third triangular number. Therefore, if we want the fourth, we can simply add the the next number as the base of the previous triangular number. We can now determine a relation between the previous and successive triangular numbers.

\[T(n) = T(n-1) + n\]

Since the \(n\)th triangular number is found by adding up the numbers from \(1\) to \(n\), we can also find them using the following formula:

\[T(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2}\]

More concretely, \(T(4) = \sum_{k=1}^{4} k\).

Once again thinking about each value from \(1\) to \(n\) being dots, a square is one where we can form a square. This is always going to simply be \(m^2\) for some \(m\).

This also means that every \(S(k) = T(n) + T(n-1)\)

If we want to determine what numbers are both triangular and sqaure numbers simulatneously, then we should just equate the two formulae.

\[\frac{n(n+1)}{2} = m^2 \rightarrow n^2 + n - 2m^2 = 0\]

Solving for \(n\) using the quadratic formula will give us a nice means of finding numbers which satisfy both triangular and square numbers simultaneously:

\[n = \frac{1}{2} \Big( -1 \pm \sqrt{1 + 8m^2 } \Big)\, .\]

However, this doesn’t mean that we can just plug in any value of \(m\) and expect to get a value of \(n\). In fact, the radicant, the stuff under the radical sign, needs to form a perfect square itself. It also needs to be an odd perfect square since we’ll always be subtracting one and then halving the result, which itself must be a positive integer, \(n \in \mathbb{Z}^+\).

I really don’t want to have to compute these by hand, so let’s write a little bit of code to assist with this process.

We’ll write a function called n which will take an Int64, \(m\), as input and return a tuple of three Real numbers or Nothing if the radicant is not a perfect square. We’ll compute the radicant, determine if it is a perfect square, and then use vectorization, that’s the .* to compute the positive and negative solutions, and finally package everything into a \(3\)-tuple.

function n(m::Int64)::Union{Nothing, Tuple{Real, Real, Real}}
    radicant = sqrt(1 + 8 * m * m)
    radicant != floor(radicant) && return nothing

    n_output = 0.5 .* (-1 + radicant, -1 - radicant)
    (m, n_output...)
end

function print_trisqr_numbers(upper_bound)::Nothing
    for k in 1:upper_bound
        x = n(k)
        isnothing(x) && continue
        println(k, ": ", x)s
    end
end

Running print_trisqr_numbers(2000) will yield:

(1, 1.0, -2.0)
(6, 8.0, -9.0)
(35, 49.0, -50.0)
(204, 288.0, -289.0)
(1189, 1681.0, -1682.0)

In every case here, given \((a, b, c)\), \(a^2 = \sum_{k=1}^b k\). It also turns out that we will never use the negative solution, so we can actually modify our function to just compute the positive solution. We’ll modify our code and types and output a tuple of two Int64 values which given our quadratic equation above will turn out to be \((m, n)\) when \(m\) forms a square which matches the triangular number of \(n\).

function n(m::Int64)::Union{Nothing, Tuple{Int64, Int64}}
    radicant = sqrt(1 + 8 * m * m)
    radicant != floor(radicant) && return nothing

    n_output = 0.5 * (-1 + radicant)
    (m, convert(Int64, n_output))
end

Running print_trisqr_numbers(2000) again,

(1, 1)
(6, 8)
(35, 49)
(204, 288)
(1189, 1681)

We are looking good now for getting more points.

Let’s see where it starts to get hairy. I’ll run it for up to 200e6.

I actually ran into an InexactError exception and the numbers may begin to get large, so I’m going to modify both functions to handle this using BigInt types. We’re going to hold off including any exception handling for the moment until we have more clear exceptions to actually handle. I am going to give a default value of \(100,000\) to the upper_bound argument for our print_trisqr_numbers procedure and add some help content. I’ll also change both to keyword arguments and add an extra argument for lower_bound that defaults to \(1\).

Note that I’m using the terms function and procedure semi-interchangeably for print_trisqr_numbers, however for the record a function always has a non-null return value while a procedure always returns void or null or, in the case of Julia, nothing. Essentially a procedure operates on something by reference or through a pointer, often (but not always) yielding some sort of side-effect without returning a result directly.

Let’s also verify that BigInt is a subtype of Integer, so if I set the parameter types as Integer, then the function should be able to accept any sort of integers for the lower_bound and upper_bound.

BigInt <: Integer
> true

In Julia, the <: operator reads as “is a subtype of.” Good deal.

"""
    print_trisqr_numbers(lower_bound, upper_bound) -> Nothing    
    print_trisqr_numbers(lower_bound) -> Nothing
    print_trisqr_numbers(upper_bound) -> Nothing

Procedure to iteratively test values ranging from `lower_bound` 
    through `upper_bound` and print out $$(m, n)$$ such that: 
    $$m^2 == \sum_{k=1}^n k$$.
"""
function print_trisqr_numbers(; lower_bound::Integer = 1, upper_bound::Integer = 100_000)::Nothing
    @time begin
        for m in BigInt(lower_bound):BigInt(upper_bound)
            x = n(m)
            isnothing(x) && continue
            println(x)
        end
    end
end

We’ll also update n(m):

"""
    n(m) -> Union{Nothing, Tuple{BigInt, BigInt}}

Compute the mapping from $$\text{Dom}(n)$$ to $$\text{Cod}(n)$$ 
    given an $$m$$ as input and where $$\text{Cod}(n) \subset 
    \mathbb{R}^2$$. If the root is not a perfect square, 
   we know it maps to the nullspace and return `nothing`.
"""
function n(m::BigInt)::Union{Nothing, Tuple{BigInt, BigInt}}
    radicant = sqrt(one(BigInt) + BigInt(8) * m * m)
    radicant != floor(radicant) && return nothing
    
    n_output = 0.5 * (-one(BigInt) + radicant)
    
    (m, convert(BigInt, n_output))
end

Running this updated function to verify we are getting the output we expect.

(1, 1)
(6, 8)
(35, 49)
(204, 288)
(1189, 1681)
(6930, 9800)
(40391, 57121)
(235416, 332928)
(1372105, 1940449)
(7997214, 11309768)
(46611179, 65918161)
 65.550897 seconds (1.70 G allocations: 46.198 GiB, 13.15% gc time, 0.05% compilation time)

Looks like we are starting to get some friction seeing that this is taking over a minute to run. We can run these calls in parallel since they are not relying on previous computations.

Our function is returning \((n, m)\) for some given \(m\), so we essentially have a mathematical mapping of \(n(m): m \mapsto (n, m)\) let’s define a vector space which contains all of these points so we can ask some additional questions about the mapping and the vector space in a rigorous manner.

Let \(S\) be a vector space such that \(S \subset \mathbb{R}^2\) and contains all of the vectors described by \((n, m)\). We want to find every point of \(S\) up until computing the next result is just unreasonable.

  1. We’ll use Big-O concepts to determine where our upper bounds is – really called Big-\(\Theta\). This will give us a general idea of how many flops we can execute comfortably before things get particularly slow. We can use some multi-threading and/or parallelization tools from Julia to run the individual computations asynchronously. Our function \(n(m)\) is clearly constant, \(O(1)\), since it directly maps points from \(\text{Dom}(n)\) to \(\text{Cod}(n)\), however our print procedure obviously grows with the number of points that we test in \(\text{Dom}(n)\), therefore we would argue that together they require linear time, \(O(N)\).
  2. What if we can cut down on the number of \(m\) to compute, shrinking the number of \(m\) to check in the domain of \(n(m)\), \(\text{Dom}(n)\)? To be super clear, we are not reducing the number of points in the domain, but rather reducing the count of those we need to check since we will already know they map to the nullspace. Each time we check an \(m\), we are executing the sqrt and floor functions. If we can mathematically identify when something is or is not a perfect square root, that would significantly cut down the number of items to check, thereby reducing the computational time significantly. And in fact, there is something we can do. We may be able to use a concept from number theory called Quadratic Residues which may help us pre-determine if a square root is a perfect square or not, thereby being able to skip over a massive, massive number of input values. I’ll need to dig into this a bit and determine if the cost to compute the quadratic residue is less than computing the root.

Mathematically, our linear mapping, \(n(m)\), appears to be what we call “surjective” or “onto” meaning that for every point that exists in the domain, \(\text{Dom}(n)\), there is a point in the co-domain, \(\text{Cod}(n)\). In many cases, the points map to the nullspace, represented by the \(0\) vector, since that would be our representation of nothing. Because many points from the domain map to the nullspace, \(n(m)\) is not “injective” or “one-to-one” and therefore since it is not both injective and surjective, it is not bijective, which also means it is not linearly independent. Often times, mathematics is all about collecting these various facts about the objects which then provide us with additional facts and properties that we can use to further explore the objects, draw conclusions, and rigorously express ourselves.

Still working on this one. :-) More to read soon!