Randomized Numerical Linear Algebra

By: DJ Rich


This is a follow-up post on my video ‘Is the Future of Linear Algebra.. Randomized?’, which explains Randomized Numerical Linear Algebra (‘Rand-NLA’), the use of randomization to speed up linear algebra computations. The video was spurred by a paper that convinced me it’s a promising path through the decades-old operations-complexity wall of NLA algorithms. This post answers outstanding questions and adds updates now that the video has been released.

Is Rand-NLA just Monte Carlo?

This is a definitional issue that shouldn’t matter (who cares how we label things?), but it’s worth addressing to spare Rand-NLA some of the Monte Carlo stigma. That stigma is slowness; Monte Carlo methods are remarkably general (you can answer many more computational questions when an approximate answer is acceptable and the tactic of simulate-and-count is available), but the cost of that generality is reduced speed. Repeated simulations are often overkill; they can answer our specific question but can also answer many others, and so intuitively, we’re doing more work than necessary.

But Rand-NLA isn’t like this. It isn’t a problem-agnostic strategy for answering a huge set of questions. It addresses specific but ubiquitous problems, like matrix decompositions or linear least squares problems, with randomization as a means of focusing compute on the small subset of the given information that determines the answer. It lets us separate essential from irrelevant information with precision, a quality unlike anything we associate with Monte Carlo methods. Another departure from Monte Carlo is that the probabilistic guarantees of Rand-NLA approximations can sometimes be so favorable that they’re comparable to those of deterministic alternatives.
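To make ‘focusing compute’ concrete, here is a minimal NumPy sketch of a randomized range finder, the workhorse behind randomized SVD. The matrix sizes, target rank, and oversampling amount are illustrative choices of mine, not values from any paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a tall matrix whose information is concentrated in ~10 directions.
m, n, k = 2000, 400, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 1e-6 * rng.standard_normal((m, n))  # a little noise

# Randomized range finder: the sketch Y = A @ Omega captures A's dominant
# column space with high probability, at the cost of one matrix multiply.
oversample = 5
Omega = rng.standard_normal((n, k + oversample))
Y = A @ Omega
Q, _ = np.linalg.qr(Y)  # orthonormal basis for the sketched range

# Project A onto that small basis and decompose the small matrix instead.
B = Q.T @ A                     # (k + 5) x n, far smaller than A
U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_small                 # approximate truncated SVD of A

err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
print(f"relative error of the randomized SVD: {err:.2e}")
```

The expensive full decomposition is replaced by one matrix multiply plus a decomposition of a much smaller matrix; the random test matrix is what steers the compute toward the dominant subspace.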

Who cares about least squares?

In the paper¹, solving linear least squares problems (albeit a generalized flavor) faster is a primary focus of Rand-NLA. One might hope, as I did, to find randomized algorithms for matrix multiplication, only to be disappointed. Further, in machine learning, there’s a mild stigma against modeling with least squares due to its sensitivity to outliers. A recommended approach to making a model ‘robust’ is to swap the least squares objective for something like mean absolute error or the Huber loss. Also, least squares is favored historically because it makes for an easy-to-optimize objective, but as compute grows, shouldn’t we graduate to more intuitively desirable objectives, like mean absolute error?

This thinking, which I’m guilty of, is myopic. First, if least squares offers a compute advantage and suffers from outliers, our desperation for speed means we should accept the tradeoff and just handle outliers differently (clip them!). Second, least squares is ubiquitous despite the above thinking, evidenced by its place as the default loss for regression problems in so much software. Surely quickening those fitting routines, where possible, is useful. Lastly, linear least squares is about finding the point within a linear subspace that is closest to an arbitrary point. It’s about aiming an input at an output through a linear function. As such, it can serve as a widely useful subroutine, pushing and corralling a point through a large and complex optimization space. One example is Newton’s Method, where each step involves solving a system of equations, a special case of the linear least squares problem.
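As a toy illustration of how randomization attacks least squares, here is a minimal ‘sketch-and-solve’ example in NumPy. The dense Gaussian sketch and problem sizes below are my own simplest-possible choices; practical Rand-NLA solvers use faster sketches and higher-accuracy schemes, so treat this as a sketch of the idea rather than the paper’s method:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall, overdetermined least squares problem: minimize ||A x - b||_2.
m, n = 20_000, 50
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

# Sketch-and-solve: compress the m rows down to d = O(n) rows with a
# random matrix S, then solve the small d x n problem exactly. With high
# probability, S preserves the geometry of A's column space well enough
# that the small solution nearly minimizes the original residual.
d = 10 * n
S = rng.standard_normal((d, m)) / np.sqrt(d)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Compare against the full deterministic solve.
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
rel_diff = np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full)
print(f"relative difference from the full solve: {rel_diff:.2e}")
```

Schemes like sketch-and-precondition go further: they use the sketch only to build a preconditioner, then solve the full problem iteratively to near-deterministic accuracy.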

Randomization for NLA has been around for over a decade. Why is it promising now?

It takes a long time for good research to change software, especially when the software is as foundational as LAPACK and the idea is as much of a departure as randomized NLA. I anticipate it’ll be at least another decade before randomized routines are used as frequently as LAPACK. That said, I see a few reasons to be confident in the trend:

  • The machine learning craze has created an unprecedented demand for compute, and machine learning in particular is friendly to the probabilistic and approximate nature of randomized NLA. Most avenues for expanding compute will be pursued, and randomization is so cheap and effective that it won’t be neglected. It already has considerable attention.
  • Dominant and motivating randomized algorithms have been developed recently. Before any foundational software is displaced, experimental software needs to demonstrate the advantage, and that has now happened decisively. We saw this with the list of papers discussed in the video.
  • The ‘Randomized Numerical Linear Algebra’ paper itself demonstrates the NLA community’s organized motivation to make this change. Excuse the argument-by-authority, but it’s indicative that the original designers of LAPACK are in favor of this research direction, and they appear in good company.

The friction, from what I gather, is that developing foundational software today is extremely hard. In the 1990s, when LAPACK was released, it was easier to get widespread adoption because there was little competition. By the 2020s, the NLA software and hardware ecosystem had expanded many times over, so there are many more things to touch when pushing alternative algorithms. That frustrates change and limits the influence of Rand-NLA. If these ideas had been conceived and appreciated in the 1990s, randomized algorithms might very well be the standard they aspire to today.


Footnotes

  1. See section 3.1, where the optimization problems addressable with Rand-NLA are presented. They are generalized flavors of least squares and quadratic minimization.