- 03-18-2007 #1 Just Joined!
- Join Date
- Aug 2005
- Posts
- 66
Blas, Atlas, Acml
Hello. My colleague needs BLAS for his scientific program. It performs numerous linear algebra operations and takes hours to run, so every bit of optimization would be appreciated. He runs it on an AMD X2 3800+. Would ATLAS perform much better than standard BLAS? What performance gains can we expect from AMD's native ACML? How do we install ACML? Is it possible to use the header files that come with standard BLAS together with the optimized libraries that come with ACML?
- 03-18-2007 #2 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,117
Hi.
I finished a project last year that needed BLAS. I agree with your approach of looking for the best software. As you know, there are several issues at work in situations like this.
First of all, there is your algorithm. The best use of your time is usually to find the best algorithm for the basic problem. Then you need to identify the best library for the mechanisms that you don't wish to write yourself. I thought the ATLAS library project didn't look current, but that may be a bad recollection on my part.
At the latter stage, and for some of your specific questions, a Google search for "compare Blas, Atlas, Acml" produces a small set of around 1K hits. Some of them seem directly on point. One, which includes a number of anecdotal one-liner accounts, is http://icl.cs.utk.edu/lapack-forum/s...lay-q.cgi?q=5l
There is also the issue of compilers. I used a proprietary compiler, the Portland Group system, because that's what the client wanted. There are a lot of optimization options, and one can spend a lot of time trying them out.
In my project, the single area of optimization that made the most difference was the IO scheme. Because the datasets were so large (presumably like yours), I looked at striped disks ("RAID0"), memory (lots of it, ramdisks, etc.), and really fast IO hardware. (The Intel guys said of problems like this, "we just throw hardware at it".) I used an AMD-based computer to start, but ended up on a machine that was built as a fast file server -- the CPU was a bit slower, but it could handle 12 GB of memory, and had really good IO characteristics.
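Before spending money on hardware, it may be worth checking whether a run is actually waiting on IO. The following is only a rough sketch in Python (not what I used on the project): it compares wall-clock time with CPU time for one step. The names `measure`, `cpu_fraction`, and `fake_io_step` are made up for illustration, and the sleep merely stands in for a slow disk read.

```python
import time


def measure(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def cpu_fraction(wall, cpu):
    """Share of elapsed time spent computing (may exceed 1.0 with threads)."""
    return cpu / wall if wall > 0 else 0.0


def fake_io_step():
    # Hypothetical stand-in for a disk read: sleeping costs wall time
    # but almost no CPU time, much like waiting on a slow disk.
    time.sleep(0.2)
    return "done"


if __name__ == "__main__":
    _, wall, cpu = measure(fake_io_step)
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s "
          f"cpu_fraction={cpu_fraction(wall, cpu):.2f}")
```

A CPU fraction well below 1 suggests the program spends most of its time waiting, in which case a faster BLAS is unlikely to help much. Note that `time.process_time` does not count time spent in child processes.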
If you were paying for CPU time at a commercial center, then perhaps decreasing the CPU time would be best. However, if this is running on a box that will just sit in the corner of your office / lab and crank away, then real time might be the area to look at (which probably means IO). In almost any case, for long-running problems, you need to be able to save and restart from checkpoints.
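For what it's worth, here is a minimal sketch of the checkpoint idea in Python. It is an assumption-laden illustration, not the scheme from my project: the state is a small dictionary stored as JSON, the "computation" is just a running sum, and `save_checkpoint`, `load_checkpoint`, `run`, and the `crash_after` parameter are hypothetical names. A real program would save its matrices and iteration counters instead.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over path.

    Renaming means a crash mid-write leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)     # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


def run(path, total_steps, checkpoint_every=100, crash_after=None):
    """Toy long-running job: sum 0..total_steps-1, resuming from path."""
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    done_this_run = 0
    while state["step"] < total_steps:
        if crash_after is not None and done_this_run >= crash_after:
            return None  # simulated crash: work since last checkpoint is lost
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

After an interruption, calling `run` again with the same path picks up from the last saved step rather than from zero. How often to checkpoint is a trade-off between lost work and the IO cost of each save.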
Best wishes ... cheers, drl
( edit 1: typo )
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )

