My views on OpenMP
In private email a correspondent observed that OpenMP makes threading very easy, but "it really seems under utilized in the community." (Here, 'community' is 'scientific programming.') I was surprised to find out that I had strong views on the topic.
OpenMP sits between several other technologies:
- GPU computing
- cloud computing
- POSIX and other common threading libraries
The new hotness is GPUs. Wes Faler gave a presentation at the recent 28th Chaos Communication Congress on Evolving Custom Communication Protocols. He mentioned they ported C++ code over to the GPU. The unoptimized version was 7 times slower on the GPU than the CPU. However, they do many evaluations using the same function, and because there are so many compute threads in the GPU, the overall time was a factor of 7 faster. Similarly, Haque et al. showed that a 4 core desktop machine, properly tuned, was "only" about 5x slower than a GPU card.
It looks like GPU computing is currently the approach to take if you do a lot of evaluation of similar tasks, assuming you have the GPUs and programming time available. That performance (and the novel way of computing) interests people who might otherwise use OpenMP.
Cloud computing is another hotness. Alex Martelli was recently interviewed by Larry Hastings in Radio Free Python episode #2. At 33:47 Larry asked about Python's global interpreter lock and Alex's reply was:
"I hate threading anyway. Multiprocessing is the way to go, and message-passing, not shared memory. That just doesn't scale. I use multithreading so I can use all of my 16 cores, or whatever is the average number of cores in a machine these days. Big furry deal. I've got a few thousand servers waiting for me in the data center and how do I use those with threading?"
The topic comes up several times in the ensuing discussion.
What good indeed is OpenMP, which might be used on a 16-core machine, if you're working on problems which involve 10,000 distributed servers?
Even single nodes have multiple cores these days, and a good OpenMP implementation might help make good use of the nodes in that cloud. However, you have to compare OpenMP to traditional POSIX multithreading. OpenMP works for C/C++ and Fortran, but not for Python, nor (it seems) Java, nor other languages which support pthreads. You're out of luck if you want to use OpenMP with one of those other languages.
Some things scale up wonderfully well by adding one or two OpenMP directives, but parallelism is rarely as trivial as giving a few hints to the compiler. I think that the non-trivial cases of parallelizing with OpenMP are about as much work as using pthreads, or a system like Grand Central Dispatch. I'll work through an example of doing that in my next essay.
I do believe that OpenMP scales better than these alternatives for some cases, in part because the compiler is doing the work rather than using a library API. My tests so far show that pthreads and OpenMP have about the same scaling with two processors, and I need four or more cores to show a strong OpenMP advantage.
Most desktop/laptop computers just don't yet have 8+ cores. (Alex Martelli said otherwise, but perhaps he's talking about Google's data centers.) Most people develop for their own computers, which lessens the incentive to work on good multicore scaling.
I have a four-core machine, and I'm willing to write a Python extension in C which uses OpenMP. Even then I've run into some difficulties. It took a while, but I figured out how to configure the extension's setup.py so it passes the right "use OpenMP" flag for each compiler. That script includes a hard-coded list of compilers which do and do not support OpenMP. Also, did you know that on a Mac you must run OpenMP tasks in the main thread, and not in a pthread? Otherwise your program crashes, even when you have a single OpenMP thread! I had to figure out a workaround so I could use my library unchanged inside Django.
People are interested in OpenMP development, but some who might use OpenMP are drawn to other technologies. Some tasks are very appropriate for OpenMP, but they are almost as appropriate for other, more common technologies. OpenMP scales well, but most people don't have the hardware where OpenMP shines. Even when they do, they have to work in one of a handful of languages, and in somewhat restricted circumstances.
All these contribute to diminishing OpenMP utilization in the community.
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me
Copyright © 2001-2010 Dalke Scientific Software, LLC.