  1. #1
    Just Joined!
    Join Date
    Sep 2006
    Posts
    7

    Clustering / Grid Computing


    Hi,

    Wondering if anyone is able to give some assistance with regard to clustering / grid computing. I have spent countless hours researching the topic; it's vast, and it keeps leading me around in circles as to what the best solution should be and what's currently active and worth investing time into.

    The cluster I want to create involves individual systems joining and contributing their resources to the cause. I have played around successfully with openMosix / ClusterKnoppix following this guide: firewall.cx/general-topics-reviews/linuxunix-related/openmoisx-linux-supercomputer.html. The problem is that openMosix is dead and has apparently been taken over by LinuxPMI, but from what I can see development seems pretty stagnant and the documentation looks fairly non-existent. So as much as I like it, I'm worried about investing time in something that is no longer supported.

    I believe what I need is a Single System Image operating system, as opposed to just creating a Beowulf cluster?

    There seem to be so many options, like:

    • OpenMosix / LinuxPMI
    • Kerrighed
    • XtreemOS
    • OpenSSI
    • DragonFly BSD
    • etc.

    But the solution needs to be as user friendly as possible, at least to the extent that ClusterKnoppix was.

    Is there even such a thing? Any ideas?

    Thanks in advance.

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,448
    One popular technology used today for this sort of problem is Hadoop/MapReduce. That is the technology that organizations such as Google (though using its homegrown BigTable/MapReduce instead of Hadoop/MapReduce), Amazon.com, Facebook, et al. use for their big-data processing needs.

    We are starting to use that technology to process the 20-50GB of data a day that each of our data centers generates. We used to utilize more traditional distributed processing methods and relational databases to do that, but they just don't scale as well as the new "big data" techniques do.
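
    To give a feel for the programming model, here is a minimal sketch of a Hadoop Streaming job in Python that totals bytes per host from access logs. The space-separated log layout ("timestamp host bytes ...") is hypothetical, purely for illustration:

        #!/usr/bin/env python
        # mapper.py - emit one (host, bytes) pair per input log line
        import sys

        for line in sys.stdin:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip malformed lines
            print("%s\t%s" % (fields[1], fields[2]))  # key <tab> value

        #!/usr/bin/env python
        # reducer.py - sum bytes per host; Hadoop sorts mapper output by key,
        # so all lines for a given host arrive together
        import sys

        current, total = None, 0
        for line in sys.stdin:
            host, nbytes = line.rstrip("\n").split("\t")
            if host != current:
                if current is not None:
                    print("%s\t%d" % (current, total))
                current, total = host, 0
            total += int(nbytes)
        if current is not None:
            print("%s\t%d" % (current, total))

    You would launch it with the streaming jar that ships with Hadoop, along the lines of "hadoop jar hadoop-streaming.jar -input /logs -output /totals -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py" (the exact jar path varies by Hadoop version).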
    Last edited by Rubberman; 05-02-2012 at 04:01 PM.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Jun 2010
    Posts
    3
    It would be hard to suggest a specific technology without knowing what purpose you'd like to use the grid for.

    Since you want resources to come and go, which is a characteristic of grid computing, you may want to take a look at Globus.

    Globus is middleware for the grid computing environment; it allows uniform access to the computing resources.

    You may need other tools with it, such as Nimrod/G, GridWay or Condor/G, for the actual job submission, depending on the size of your grid.

    Hadoop/MapReduce is a specific technology used for working with big data, unlike grid computing technologies, which deal more with sharing computing resources.
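
    To make the job-submission side concrete, here is a rough sketch in Python of driving plain Condor (rather than Condor/G): it writes a minimal submit description per work unit and hands it to condor_submit. The render script and directory layout are made up for illustration:

        #!/usr/bin/env python
        # submit_jobs.py - queue one Condor job per file found in jobs/
        import os
        import subprocess

        def submit_description(infile, name):
            # A minimal Condor submit description for one work unit
            return "\n".join([
                "universe   = vanilla",
                "executable = /usr/local/bin/render_one.sh",  # hypothetical wrapper
                "arguments  = " + infile,
                "output     = logs/" + name + ".out",
                "error      = logs/" + name + ".err",
                "log        = logs/condor.log",
                "queue",
            ]) + "\n"

        for infile in sorted(os.listdir("jobs")):
            name = os.path.splitext(infile)[0]
            with open(name + ".sub", "w") as sub:
                sub.write(submit_description(os.path.join("jobs", infile), name))
            subprocess.check_call(["condor_submit", name + ".sub"])

    Condor then matches each job to an idle machine in the pool and requeues it if that machine goes away, which fits a pool where resources come and go.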

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,448
    Quote Originally Posted by abienkowski View Post
    Hadoop/MapReduce is a specific technology used for working with big data, unlike grid computing technologies, which deal more with sharing computing resources.
    Actually, the MapReduce part of the Hadoop stack is a distributed/shared computing environment.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  5. #5
    Just Joined!
    Join Date
    Sep 2006
    Posts
    7
    Thank you for the replies.

    The actual purpose is a solution for some graduates to render large video files, which can tie up their systems for days, weeks, even months on end, but are often run at different times. A willing alliance of users would pool their PCs together for the greater good and distribute the load and the schedule.

    In an ideal world these would all be on the same network, running gigabit Ethernet, with high-performance PCs and the same OS, like Linux... the truth, though, as you can imagine, is that there would be a variety of all those variables, if not more.

    In another ideal world, regardless of what OS they are running, they could simply submit a job and off it goes!

    To keep things simple I was just trying to decide what options could be used at a basic level (i.e., same OS, same network, etc.), something that could be put in place very simply and then grown / developed from there onwards.

    Lots of variables and wishful thinking, I am sure, but without asking you never know.

    With regard to Hadoop (I haven't managed to research Globus yet): this uses HDFS. Now, I've only scratched the surface, so my understanding is minute, but I can appreciate how the model works for a general cluster using a block-structured file system, and I could use this if I build a backend cluster for students to attach to. However, you wouldn't necessarily want users' PCs to become part of HDFS, just to share compute power - is that possible?

    Many thanks.

  6. #6
    Linux Newbie
    Join Date
    Apr 2012
    Posts
    112
    Have a look at DrQueue and the links therein.

    Pooling heterogeneous resources seems like a good idea in principle, but when we tried it over the university's network (back in my days as a grad student) we saw pretty poor results. Rendering, however, is embarrassingly parallel, i.e., there is no real need for inter-node communication, so your idea might just work.
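
    Since each frame is a standalone job, the scheduler can be as simple as a queue of frame numbers with one worker thread per host. A rough Python sketch, assuming passwordless ssh, a shared filesystem, and Blender's command-line renderer (the hostnames and scene path are invented):

        #!/usr/bin/env python
        # farm_frames.py - fan independent frames out to worker hosts via ssh
        import subprocess
        import threading
        try:
            import queue                 # Python 3
        except ImportError:
            import Queue as queue        # Python 2

        HOSTS = ["pc01", "pc02", "pc03", "pc04"]  # hypothetical lab machines
        SCENE = "/shared/scene.blend"             # assumes a shared filesystem

        frames = queue.Queue()
        for f in range(1, 241):                   # frames 1..240
            frames.put(f)

        def worker(host):
            # No inter-node communication: each worker just pulls the next
            # frame and renders it remotely until the queue runs dry
            while True:
                try:
                    frame = frames.get_nowait()
                except queue.Empty:
                    return
                subprocess.call(["ssh", host,
                                 "blender", "-b", SCENE, "-f", str(frame)])

        threads = [threading.Thread(target=worker, args=(h,)) for h in HOSTS]
        for t in threads:
            t.start()
        for t in threads:
            t.join()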

    Let us know how you get on.

  7. #7
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,448
    One of the things about Hadoop and similar cloud/big-data approaches is that you need plenty of network bandwidth between the various nodes - name nodes, data nodes, and processing nodes.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  8. #8
    Just Joined!
    Join Date
    Sep 2006
    Posts
    7
    Thanks for the replies, some really useful stuff here - DrQueue looks perfect, and that gets one of the biggest hurdles out of the way.

    Anyone familiar with BOINC, which SETI uses (//boinc.berkeley.edu/)? Thinking I can possibly use this - just playing with it at the moment.

  9. #9
    Just Joined!
    Join Date
    Sep 2006
    Posts
    7
    Just came across burp.renderfarming.net/, which looks like just what I was talking about; it's a mix of BOINC and render farming - might just be the thing!
