The Secrets of the “Secrets of Optical Flow”
02 June 2020 · Michael J. Black · 6 minute read
The following paper was awarded the Longuet-Higgins Prize at the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). The prize is given annually by the IEEE Pattern Analysis and Machine Intelligence (PAMI) Technical Committee for "Contributions in Computer Vision that Have Withstood the Test of Time."
Sun, D., Roth, S., Black, M. J., "Secrets of optical flow estimation and their principles," In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 2432–2439, IEEE, June 2010
What makes a paper withstand the test of time? Why this paper? What have I learned from it?
When I started working in computer vision research in the late 1980s, the field was not very rigorous. Data (images or video) were hard to come by, quantitative evaluation was rare, and we were just happy to get anything to work on one or two images. When every paper had a different algorithm applied to different images, it was impossible to understand how the field was progressing.
The next phase of the field really began with the introduction of high-quality datasets for evaluation and comparison. In the field of optical flow, the Middlebury dataset [1] was a major step forward because it provided a rigorous foundation for comparison of methods. Still, there was a problem.
I was frustrated that every new paper on optical flow would demonstrate better performance on Middlebury than the last, but I didn’t really know why. The problem was that every paper changed multiple things at once; a typical paper changed both the objective function and the optimization method that minimized that function. This made it impossible to assign credit to the causes of any improvement. This was holding the field back.
The “secrets of flow” paper began as an attempt to understand, quantitatively, the state of the art in optical flow. We wanted to know which, among the many engineering decisions, actually mattered. To do so, we adopted a common optimization framework and held that fixed. We took a “classical” formulation of the problem as a baseline and then we systematically varied one thing at a time to measure its impact.
This may not seem revolutionary today now that “ablation studies” are the norm, but at the time, this was not the standard. This systematic approach led to some interesting observations.
First, nothing in the field was so universally agreed upon as the fact that the classical “Horn and Schunck” (HS) optical flow method [2] was terrible. Everyone just knew this. The well-known limitations of its least-squares formulation are real, but when we implemented the method with modern optimization techniques, it was surprisingly good; better, in fact, than several much more recent methods.
So what was going on? If you read the original HS paper, you find that they knew exactly how to formulate the problem well and understood how to optimize it properly. But on the computers of the day (1981), actually doing the optimization properly was impossible, so they resorted to a terrible, hacky optimization strategy. It was not the original formulation, but this hardware-limited approach, that resulted in the terrible performance that people had been citing for 29 years. A systematic approach to separating what is optimized from how it is optimized revealed a deeper truth. Of course, Marr had taught us years before to make this separation, but nobody had carefully done this for HS because it was such a great punching bag: everyone loves a baseline that is easy to beat!
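For readers who do not have the original paper at hand, the HS formulation is, up to notation and the exact weighting constant, a least-squares energy over the flow field (u, v): a quadratic penalty on the linearized brightness-constancy error plus a quadratic smoothness penalty on the flow. The sketch below is mine, not copied from either paper.

```latex
% Sketch of the Horn-Schunck energy (up to notation and constants):
% a least-squares penalty on the linearized brightness-constancy error
% plus a quadratic smoothness penalty on the flow field (u, v).
E_{\mathrm{HS}}(u, v) = \iint \left( I_x u + I_y v + I_t \right)^2
  + \lambda \left( \| \nabla u \|^2 + \| \nabla v \|^2 \right) \, dx \, dy
```

The energy itself is perfectly sensible; what changed between 1981 and 2010 is how it is minimized, for example with coarse-to-fine estimation and image warping rather than the simple iterative updates that 1981 hardware allowed.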
We then systematically evaluated many of the things that people vary in classical flow methods — pre-filtering, pyramid levels, interpolation methods, derivative filters, etc. There were entire subfields on the design of derivative filters alone. What we found was that most of these things didn’t matter. Here we were careful to do statistical significance testing and to report p-values. This was definitely not the norm in the field at the time. Then and now, everyone likes a bold number in a table that is just a little bit better. When we analyzed all these different formulations using a fixed optimization method, we found few differences that were significant. That meant that there were a lot of papers published that reported “better” results that were not really better. The field just had no way of knowing.
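To make that methodology concrete, here is a minimal sketch, in Python, of the kind of paired significance test one can run over per-sequence errors; it is illustrative only (the numbers are invented and this is not the paper’s exact protocol).

```python
# Illustrative only: a paired significance test over per-sequence flow errors,
# in the spirit of the paper's methodology (not its exact protocol).
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical average endpoint errors (EPE) of two methods on the same
# set of test sequences; in practice these come from a benchmark.
epe_method_a = np.array([0.21, 0.35, 0.18, 0.44, 0.29, 0.52, 0.31, 0.27])
epe_method_b = np.array([0.20, 0.33, 0.19, 0.41, 0.28, 0.50, 0.30, 0.26])

# Wilcoxon signed-rank test on the paired differences: a small p-value
# suggests the improvement is systematic rather than noise.
statistic, p_value = wilcoxon(epe_method_a, epe_method_b)
print(f"Wilcoxon statistic = {statistic:.3f}, p-value = {p_value:.4f}")
```

A test like this makes it obvious when a bold number in a table is just noise.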
A couple of things did matter, however. Using a robust penalty function was better than a quadratic (Gaussian) one; in particular, the Charbonnier function proved better than more aggressively robust functions like those I used in my thesis work [3]. The other key thing that mattered was a heuristic. People had noticed that if you took the flow produced at intermediate stages of the optimization and simply applied a median filter to it, the results got better [4].
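Before getting to the heuristic, here, for reference, is a sketch of the penalty functions in question (up to scale factors): the quadratic penalty implied by a Gaussian assumption, the Charbonnier penalty (a differentiable approximation of the L1 norm), and a non-convex, more aggressively robust penalty such as the Lorentzian used in [3].

```latex
% Penalty functions applied to a data or smoothness error x (sketch, up to scale):
\rho_{\mathrm{quadratic}}(x)   = x^{2}                        % Gaussian assumption
\rho_{\mathrm{Charbonnier}}(x) = \sqrt{x^{2} + \epsilon^{2}}  % differentiable approximation of L1
\rho_{\mathrm{Lorentzian}}(x)  = \log\!\left(1 + \frac{x^{2}}{2\sigma^{2}}\right) % non-convex, more robust
```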
So the most important factor to success was basically a “hack.” Intermediate stages of the optimization are noisy, so apply a median filter. The problem is that it is then unclear what exactly is being optimized: certainly not the original objective function. Our key technical contribution was to show that this heuristic had a formal interpretation as a form of regularization with a large spatial neighborhood. That is, the algorithm with the heuristic actually corresponded to the optimization of a different objective function. We then reformulated the objective function to incorporate this new term. By recasting the median filter heuristic formally in the objective function, we could then see how to improve it by making it a weighted image smoothing term. This resulted in the “Classic+NL” flow method (i.e., a Classical flow method with a Non-Local smoothness term). This method was number 1 on the Middlebury benchmark at the time.
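Schematically, and in my own notation rather than the paper’s exact formulation, the idea is to couple the flow field u to an auxiliary flow field û and to penalize differences of û over a large spatial neighborhood; alternately minimizing this coupled objective over û corresponds (approximately) to median filtering of the current flow estimate, which is exactly the heuristic.

```latex
% Schematic coupled objective (my notation; see the paper for the exact form).
% E(u) is the classical data-plus-smoothness energy, \hat{u} is an auxiliary
% flow field coupled to u, and N_{i,j} is a large spatial neighborhood of (i,j).
E_{A}(u, \hat{u}) = E(u)
  + \lambda_{2} \, \| u - \hat{u} \|^{2}
  + \lambda_{3} \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \left| \hat{u}_{i,j} - \hat{u}_{i',j'} \right|
```

Replacing the uniform non-local term with image-dependent weights is the improvement mentioned above that leads to Classic+NL.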
The above is an example of a more general truth: if a hack consistently produces good results, there is probably a principle behind it. Find the principle, understand it, and you will get even better results.
There was one more thing that helped this paper stand the test of time: we made all the code available in Matlab (while outdated, it is still available). This provided researchers with solid baseline implementations of established methods (Horn & Schunck and Black & Anandan) as well as the then-state-of-the-art method (Classic+NL and its variants).
What did the reviewers think? Definitely Accept, Weak Accept (“For me this paper is between weak accept and borderline”), and Weak Reject. The “Weak Reject” summarized their opinion as
“All ingredients are known. The paper just engineers the bits and pieces together.”
The paper was accepted as a Poster-Spotlight, not an Oral. The lesson is to do the work that you think is important and hopefully the community will agree in time.
In summary, this paper has withstood the test of time because it (1) helped the field establish what really mattered and what didn’t; (2) established an experimental methodology to identify what changes really matter; (3) provided a pretty decent baseline flow method (SOTA at the time) that incorporated all that we knew at the time; (4) provided code so that anyone could compare with us and build on our method. We also followed up this conference paper with a longer, more detailed, journal version of the CVPR paper [5].
It is nice to be able to look back on an old paper and still be happy with it. I’m still happy with this one and I feel so very fortunate to have had great students like Deqing and Stefan as collaborators and friends. That is the final, true, “secret” of this paper: work with great people who you like, and enjoy the process.
References:
- [1] Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., Szeliski, R., "A Database and Evaluation Methodology for Optical Flow," In Int. Conf. on Computer Vision (ICCV), pages: 1–8, Rio de Janeiro, Brazil, October 2007
- [2] Horn, B., Schunck, B., "Determining optical flow," Artificial Intelligence, 16:185–203, August 1981
- [3] Black, M. J., Anandan, P., "The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields," Computer Vision and Image Understanding (CVIU), 63:75–104, 1996
- [4] Wedel, A., Pock, T., Zach, C., Cremers, D., Bischof, H., "An improved algorithm for TV-L1 optical flow," In Dagstuhl Motion Workshop, 2008
- [5] Sun, D., Roth, S., Black, M. J., "Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them," International Journal of Computer Vision (IJCV), 106(2):115–137, 2014