Skip to main content

Fast Fourier transform is slow

In the early 90's, when I was still getting my undergraduate computer engineering degree, I took a class in signal processing that I really enjoyed.  One of the major projects we were tasked to do was to implement a 2 dimensional fast fourier transform.

The class spent several lectures proving that the number of mathematical for a simple implementation for convolution in the spacial domain required O(n^2) math operations.  On the other hand, if you switched to the frequency domain, you required O(n log(n)) operations to perform the FFT, O(n) to perform a single multiplication per pixel and then another O(n log(n)) to convert it back to the spacial domain.

We proved it was better!


And then we implemented it, and it was orders of magnitude slower.  Theory and practice didn't match.

We showed it was slow!


The issue was a boundary we had abstracted away in our performance proofs - memory access was not a uniform cost.  The math operations we were optimizing away were now completely irrelevant due to the limitations of the computer system we were running on: a disk-less Motorola 6800 based Sun workstation.

Because we were supposed to be showing how much better the FFT code was as the image sizes increased, we had test images at 16x16, 64x64, 256x256, 512x512 and 1024x1024 pixels.  Each of these images were then converted into C double variables.  So the largest image was 8 MB, or twice the RAM of the computer (without considering the operating system).  Additionally, like students everywhere, we were all trying to do our projects at the same time.  So the machine's memory was oversubscribed.

The final nail in the performance coffin was the memory pattern that the FFT algorithm uses is not localized (for simple implementations, a lot of work has been done to improve this, but for the class we had to do this from scratch):
causing multiple page faults.  On a computer with local storage, this would still have been slower because now memory access would have to wait for an IO operation.  But it was made worse because the server was diskless - each IO operation went over a network:
Each IO operation now required a network request/reply trip to a file server over a non-switched 1 Gb  Ethernet network.  If any two servers on the network were transmitting at the same time, they would go into the randomized exponential back off algorithm.

On the other hand, the standard convolution algorithm slowly moved predictably through the image, generating a small number of page faults, making it run much faster.

When memory size is a limitation, there is an entire field of research dedicated to cache oblivious algorithms. This was one of the first times I had run into a case that so clearly illustrated that the assumptions you bring with you are boundaries that you can cross at your peril, sometimes without even knowing that you have done so.

Comments

Popular posts from this blog

Spring Boot native builds when internet downloads are blocked made simple

 No direct access to the internet If you work at a company that controls their software bill of materials, it's quite common to be blocked from directly downloading from: Maven Central Docker hub GitHub (the public parts) Getting the bits Maven Maven is first, because without it, you won't be able to compile your Spring Boot application, let alone move on to turning it into a native docker image. I will be showing changes need to work with artifactory, but you should be able to adapt it to other mirror solutions.  repositories {   maven {     name = "central"     url = "https://artifactory.example.com/central"     credentials {       username = "${project.ext.properties.artifactory_username}"       password = "${project.ext.properties.artifactory_apikey}"     }   } } With this configuration change, you should be able to download your plugins and dependencies, allowing you to compile and ...

Kotlin Notebook when you're blocked from Maven Central

 TLDR; If you are blocked getting to maven central when first using Kotlin Notebooks because of company firewalls, you can use a tool like Fiddler Tool to redirect to a different network location. Kotlin Notebooks Kotlin Notebooks are a JDK based environment that brings the Python based Jupyter Notebooks  expressiveness to IntelliJ. From the blog post announcing the plugin, it looks like this: At home, the installation of jar files looked like this: I played around with it at home, but I couldn't use it at work.  Many companies, mine included, do not allow software components to be used when downloaded directly from the internet. In my companies case, we use a product called Artifactory, which allows you to mirror the content from Maven Central while still applying policies like CVE scanning, tracking, etc. The way it should work IntelliJ, as one of the leading IDE's, generally supports this quite well.  In fact, there is a whole setting page dedicated to dealing wi...

Active vs. Passive Log4jShell remediation

 Log4jShell  All computer professionals should be aware of the Log4jShell ( CVE-2021-44228 ) and it follow on defects.  There is no shortage of opinions and lessons to be be learned: The difficulty of performing safe interpretation The problems when assumptions are not clearly documented.  I, for one, was completely shocked to find out that a logging system would actually attempt to do variable substitution in an actual message. The difficulty of finding and resolving issues with such a common library that is not provided by an OS package manager. IT'S A LOG4J CHRISTMAS One of my favorite podcasts, Security Now - episode 850 , discussed an analysis by Google regarding the depth of log4j dependencies.  From the show notes : One contributing reason is because Log4j is, more often than not, an indirect dependency. Java libraries are built by writing some code which uses functions from other Java libraries, which are built by writing some code which uses functions f...