  1. #1
    Just Joined!
    Join Date
    Jul 2009
    Posts
    7

    what's the big deal with parallelism?


    I was in a computer architecture class last semester, and the last chapter was about all this multiprocessor stuff. It's quite confusing and I didn't really learn it that well.

    What's the big deal about it? Do multiple cores really provide the speedup they promise? Is task-level parallelism a good idea, or do the separate processes need to communicate with each other a lot?

    In that textbook I also read the section about GPUs. It was mostly nVidia buzzwords and didn't explain anything very well. Sounds like a lot of proprietary voodoo that only maybe the top five computer architecture people in the world really understand. Sounds like nVidia is also way ahead of any other CPU/GPU maker, including Intel and AMD, or even the government, lol, if they do that sort of thing.

    And how does vector processing play into things? How many applications really rely on vector processing? I know a lot of this is geared at graphics and sometimes sound, but does it have much benefit for normal processing? Will it change how programming is done in a serious way?

    I guess this is kind of like the change from learning calculus in a scalar way (Calc 1 and early Calc 2) to a vector way of thinking (vector calculus). The vector side is still more complicated to me: I understand calculus from a scalar perspective, but I've forgotten a lot of the vector calc formulas and how they relate back to the scalar world. I wish math were taught in a more generalized way that accounts for vectors and matrices (arbitrary R^n).

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,596
    Great post Jason! Excellent questions. For the first item, computer architecture is like standards - everyone has them, and they are all different!

    Anyway, multiprocessor and multicore systems (basically the same idea) allow you to do more than one thing at the same time, at least to the extent that the rest of the hardware doesn't gate performance on access to common resources such as memory, I/O, etc. Modern systems are getting quite good about that. Speedup from a single-CPU/core system to a multi-CPU/core system is typically not linear, but close to it. I have a dual-CPU, 8-core workstation, and it allows me to do a lot more background stuff than I would be able to otherwise. However, memory and I/O are major gating factors. My system is about 3 years old and was state-of-the-art when I built it. Newer systems, out in the past year, are far superior to mine in concurrent memory access and I/O, and would probably give me a 2-4x speedup in overall performance on my typical work mix for equivalent hardware (8 cores, 8GB RAM, 8TB SATA disc space).
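
    A standard way to quantify that "not linear, but close to it" behavior is Amdahl's law: if a fraction p of a job can be spread across N cores, the overall speedup is

    S(N) = 1 / ((1 - p) + p/N)

    So a job that is 90% parallel tops out at about 4.7x on 8 cores (1 / (0.1 + 0.9/8)), and at 10x no matter how many cores you add; the serial 10%, typically memory and I/O waits like the ones above, ends up dominating.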

    As for parallelism, task-level parallelism is the most common situation, but I have found that appropriate use of thread-level parallelism can really speed up some applications, such as data searching/retrieval and audio/video processing. You have to evaluate your workloads to determine whether they are conducive to a parallel approach.
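
    As a rough sketch of what thread-level parallelism looks like in C (POSIX threads; the array size and thread count are made up just to show the pattern of splitting work and combining partial results):

    Code:
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000

    static double data[N];
    static double partial[NTHREADS];

    /* Each thread sums its own slice of the array. */
    static void *sum_slice(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS);
        long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += data[i];
        partial[id] = s;   /* each thread writes its own slot: no locking needed */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++)
            data[i] = 1.0;                 /* dummy data so the total is easy to check */
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, sum_slice, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += partial[t];           /* the small sequential combine step */
        }
        printf("total = %f\n", total);     /* expect 1000000.0 */
        return 0;
    }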

    GPU for number crunching? Everything I have read indicates that this can provide AWESOME speedups on major number-crunching tasks that can be broken into small bits of work. Consider that a medium-level nVidia GPU has 128-256 or more math cores. Some of the top supercomputers in the world today use these for their "heavy lifting" when doing things like environmental modeling, fluid dynamics, etc. And yes, nVidia seems to be at the forefront of this work.

    Vector processing? Man, I haven't played with a VPU in many years (about 20, I think). They serve(d) a purpose similar to the GPUs we were discussing: crunching numbers for algorithms suited to parallel processing, such as matrix math, Fourier transforms, etc. I've kind of lost touch with current developments in that field, so this is as much as I can tell you. As for changing how programming is done in a serious way, I think they did, and a lot of the lessons learned were taken in by groups like nVidia.
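
    Vector units do live on inside ordinary CPUs, though, as SIMD extensions (SSE/AVX on x86, NEON on ARM). A minimal sketch with SSE intrinsics, assuming n is a multiple of 4 just to keep it short:

    Code:
    #include <xmmintrin.h>   /* SSE intrinsics */

    /* a[i] += b[i], four floats per instruction. */
    void vec_add(float *a, const float *b, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);           /* load 4 floats */
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&a[i], _mm_add_ps(va, vb));  /* 4 adds at once */
        }
    }
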
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Jan 2007
    Posts
    15
    That was a very good reply.

    Multiple procs and cores can and do speed things up, but it really depends on the situation.

    Think about a computer that only does one thing. It sends a command to the processor and waits for a reply. In a single processor/core world, this is serial. The processor does one thing at a time. Data gets loaded into a cache and is processed one command at a time. Each command has to wait for the previous to finish.

    If you want to run more than one program, the commands have to take turns on the processor, which obviously slows things down.

    Now I can speed this up by adding a second processor. If one processor is busy, I can send my new instructions to the one that isn't. In this fashion, my two running programs are sped up relative to the time it would take them to run simultaneously on a single-core machine, but not compared to their speed running individually on a single-core machine. So this situation works well for multiple processes, like you would encounter in a typical desktop environment.
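
    A minimal sketch of that situation in C: fork() splits the program into two independent processes, and the kernel is free to schedule each on its own processor (the busy loop is just a stand-in for real work):

    Code:
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Stand-in for a CPU-bound job. */
    static void crunch(const char *name) {
        volatile long x = 0;
        for (long i = 0; i < 200000000L; i++)
            x += i;
        printf("%s done\n", name);
    }

    int main(void) {
        pid_t pid = fork();    /* split into two independent processes */
        if (pid == 0) {
            crunch("child");   /* can run on the second processor */
            return 0;
        }
        crunch("parent");      /* runs at the same time on the first */
        wait(NULL);            /* collect the child when it finishes */
        return 0;
    }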

    What happens if we have one process that needs a lot of processor time? Well, if the process is written in such a way that it can be parallelized, we can farm out pieces to the two procs. What's the problem with this?

    Consider I'm running a model with 2 grid cells. To run the model, I start with the initial state, run calculations on each grid cell, then calculate the interactions between them. I take that final state as my new initial state and start all over. I could easily farm out the calculation of each grid cell to its own processor, but I can't deal with the interaction, because each processor is unaware of the results of its partner. So I need to recombine the results from the procs to go forward. This has a cost in terms of overhead. In massively parallel systems, like the large supercomputers, this communication can go over a network or some other interconnect; regardless, it is the bottleneck in the system. This is why you see parallel computing performance level off as the number of nodes increases: eventually the communication penalty between processors dominates the gains from adding another node.
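
    To make the recombine step concrete, here's a minimal sketch of that compute-then-synchronize cycle using POSIX threads and a barrier (the per-cell "calculation" is just a placeholder increment; the barrier waits are where the communication overhead lives):

    Code:
    #include <pthread.h>
    #include <stdio.h>

    #define NCELLS 2
    #define NSTEPS 10

    static double cell[NCELLS];
    static pthread_barrier_t sync_point;

    static void *run_cell(void *arg) {
        long id = (long)arg;
        for (int step = 0; step < NSTEPS; step++) {
            cell[id] += 1.0;                    /* per-cell calculation */
            pthread_barrier_wait(&sync_point);  /* wait for the partner cell */
            /* ...interaction step would read the other cell here... */
            pthread_barrier_wait(&sync_point);  /* don't start the next step early */
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NCELLS];
        pthread_barrier_init(&sync_point, NULL, NCELLS);
        for (long i = 0; i < NCELLS; i++)
            pthread_create(&tid[i], NULL, run_cell, (void *)i);
        for (long i = 0; i < NCELLS; i++)
            pthread_join(tid[i], NULL);
        printf("cells: %f %f\n", cell[0], cell[1]);
        pthread_barrier_destroy(&sync_point);
        return 0;
    }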

    So how can I speed this up? In the parallel example above, the processors talk through a shared memory bus (if the procs are on the same board) or over some interconnect, if it's a distributed cluster. If we can move this shared memory bus onto the actual chip die, we reduce the time it takes for the procs to talk to each other. This is how multicore procs work. We now have multiple cores, each with its own L1 cache (to locally store info to be processed), sharing an L2 cache.

    This greatly speeds up applications that are programmed to use multiple cores. However, keep in mind that not all applications are built to run in parallel, and not all data-processing jobs do well in a parallel environment. Even on a multicore processor, a program that does a ton of little calculations will be limited by how fast the data can be transferred to the cores.

    Now enter GPUs. GPUs are like multicore processors on crack. Instead of using a few large cores, use a few hundred smaller ones, and connect them to a shared memory with a ridiculously fast bus. Now I can hand out very small pieces of data over a very fast connection to a ton of processors. This is ridiculously fast for things like matrix math and Fourier transforms, where I can literally send each core on the GPU a single element of the matrix and have it run an operation. Combine this with PCIe, which gives super-high bandwidth between the main system and the graphics cards, and you have a way to process very large amounts of data very quickly. The downside is that the processors aren't good for very complex or involved tasks. So if you can't break the process down into a billion very simple, simultaneous instructions, GPUs aren't very useful, and a more traditional multicore system would be faster.
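
    A minimal sketch of that "one element per thread" idea in CUDA C (sizes and names here are illustrative, not from any real code): each GPU thread scales a single element of a flattened matrix, and the two cudaMemcpy calls are the PCIe transfers mentioned above.

    Code:
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Each GPU thread handles exactly one matrix element. */
    __global__ void scale(float *m, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                      /* guard: the grid may overshoot n */
            m[i] *= factor;
    }

    int main(void) {
        const int n = 1024 * 1024;      /* a 1024x1024 matrix, flattened */
        float *h = (float *)malloc(n * sizeof(float));
        for (int i = 0; i < n; i++)
            h[i] = 1.0f;

        float *d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  /* over PCIe */

        scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);  /* a million tiny work items */

        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("h[0] = %f\n", h[0]);    /* expect 2.000000 */
        cudaFree(d);
        free(h);
        return 0;
    }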

    The reason nVidia is at the forefront right now is that they were the first to release an SDK for accessing their GPUs. Anyone with a CUDA-based card can download it and start writing C or Fortran code to run jobs on the GPUs. MATLAB (as of R2010b) has even picked up native support for the CUDA libraries. They have also started releasing multi-GPU cards targeted specifically at scientific computing (their Tesla line).

    My field of research deals with digital hologram reconstruction, which involves a ton of Fourier transforms. When this technology first came out, we bought a Tesla cluster (2 cards SLI'd with 4 cores each), and it was several times faster than our traditional cluster running 3 8-core Xeon nodes.

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,596
    Re. Vortmax:

    Also a good reply. The main point here is, I think, that a combination of multi-core/multi-threaded CPUs like Intel/AMD/ARM chips, in conjunction with GPUs like nVidia's, provides us with serious performance multipliers. Send the work to the unit best suited to process it.

    As the Buddha once said, may you live in interesting times! That we do!
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!
