Find the answer to your Linux question:
Page 1 of 2 1 2 LastLast
Results 1 to 10 of 15
Hello I am trying to allocate a bucket of hashes in advance and then use pointers from these hashes as and when needed. let me explain I have a mainHash ...
  1. #1
    Just Joined!
    Join Date
    Jan 2008
    Posts
    22

    Memory allocation for GHash table

    Hello

    I am trying to allocate a bucket of hashes in advance and then use pointers from these hashes as and when needed.

    let me explain

    I have a mainHash which i make like so
    mainHash = g_hash_table_new(g_str_hash, g_str_equal);
    Where the key is a string and the value is a pointer to another Hash.

    Now everytime I get a record, i do a lookup in the mainHash and if it is found then I go to the Hash pointed to by the value and then insert the record into that Hash. If not found I create a new hash and then insert this into the mainHash and then the record into this newly created hash.

    What i wish to do is however, create the mainHash such that it contains lets say 5000 pointers to unused Hashes. Now whenever i get a record, i wish to do a lookup on the mainHash and if not found then use one of those pointers from the mainHash and decrement the number of available pointers by 1. Ie allocate memory to a bucket of hashes and then use that bucket whenever i need.

    Right now this is what i am doing :

    mainHash = g_hash_table_new(g_str_hash, g_str_equal);
    .....
    ....
    ..
    <GOT NEW RECORD>
    GHashTable *recHash;
    if ((recHash =(GHashTable*) g_hash_table_lookup(mainHash, key)) == NULL)
    {
    // Look for an entry in the mainHash for this symbolUserId ,
    // if not found then create a new hash with key=symbolUserId, value = pointer to new hash
    // Insert this symbolUserId into the mainHash

    //Memory mgmt - take a memory from previously allocated mem

    //Create new hash
    recHash = g_hash_table_new(g_str_hash, g_str_equal);

    //Insert into mainHash
    g_hash_table_insert(mainHash, g_strdup(key), (gpointer)recHash);

    //Insert record into new hash
    g_hash_table_insert(recHash,strdup(refKey),g_strdu p(refKeys));
    }
    else
    {
    // if an entry is found in the mainHash, then use the value of the found
    // key(symbolUserId) and insert this object into that hash

    //Insert record into found hash
    g_hash_table_insert(recHash, g_strdup(refKey), g_strdup(refKeys));

    }


    Can someone tell me how to allocate the bucket and then use hashes from that bucket.

    thanks.

  2. #2
    Just Joined!
    Join Date
    Jan 2008
    Posts
    22
    Can someone help me with this please. Even simple ideas or pieces of code will do.

    Thanks

  3. #3
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    It looks like the gnome people have made it very easy to do hashing, but (from the little that I've read) they don't seem to have provided easy hooks to do exactly what you want.

    I'm thinking that the reason that they didn't is that they saw now reason for it.

    They might have a point.

    What purpose do you have for allocating an array of these guys ahead of time?
    --
    Bill

    Old age and treachery will overcome youth and skill.

  4. #4
    Just Joined!
    Join Date
    Jan 2008
    Posts
    22
    I dont want to have to keep allocating memory when the process is running as this will slow down my process, instead if I can allocate a chunk when the process comes up it will save me time and also guarantee a specific amount of memory.

    Also the machine that I intend to run the process on gets quite crowded later in the day, and so if I just finish my allocations during the beginning when the process is scheduled to come up I will save myself a whole lot of trouble.

    Also this serves as a benchmark for memory usage for me. I can send myself usage alerts.

  5. #5
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    if I can allocate a chunk when the process comes up it will save me time
    Two rules for optimization:
    1. Do not optimize.
    2. For experts: Do not optimize yet.

    Seriously, compared to whatever else goes on, it's unlikely that allocating memory for each leaf of the hash is not likely to take a large percentage of the time.

    And:
    guarantee a specific amount of memory
    The only time you'll run out of memory is when the system runs out of swap space. If you really anticipate running out of swap space because you need that much memory, then you (and the others using the system at the same time) will encounter severe, severe speed problems before then, just because the swap partition is being thrashed to death.

    But:
    Also this serves as a benchmark for memory usage for me. I can send myself usage alerts.
    Now there's a good reason! And it shows a real curiosity for what goes on under the hood. You'll need that curiosity, because to get what you'll want, you'll need to hack GLib. Since you don't seem to own the system on which you'll be running, you need to perform the following steps.
    1. Download the source for GLib from here. Pick whichever version you want. You probably don't want a bleeding edge one.
    2. Take a gander at the GLib documentation. At the top are instructions for compiling the source, and for linking your programs to it.
    3. For your reference, here's a handy GHash cheat sheet.
    4. Once you've gotten your program to work with your own copy of GHash, dive into the source, figure out how it's constructed, and add a feature by which you can specify how and when you allocate your own memory, rather than letting GHash do it.
    5. Profit!

    I told you you'd need a great amount of curiosity for this! :)

    Hope this helps.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  6. #6
    Just Joined!
    Join Date
    Jan 2008
    Posts
    22
    This method is something that I would like to eventually explore. But as of right now I have to come up with some way to do the same without increasing the binary size and also a quicker solution.

    Is this a possibility - Allocate an array of size = x*sizeof(GHashTable). Now every time i want to create a new hash first I assign it an array value and then initialize it. The problem is that I do not know how to intialize the GHashTable, ie do what the g_hash_table_new(g_str_hash, g_str_equal) does without the memory alloc (guess this is why you recommended the hack into GLib, to find this out).

  7. #7
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    Wait. I just took a look at your original code.

    Your main hash is going to have, as its data, pointers to all these subsidiary hashes, right?

    What are the keys in the main hash to all these subsidiary hashes?

    And what are the keys in each of the subsidiary hashes?

    If the key to each entry in the main hash is also used as the key in the corresponding subsidiary hash, then each subsidiary hash will have exactly one member. What's the percentage in that?

    I'm confused here.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  8. #8
    Just Joined!
    Join Date
    Jan 2008
    Posts
    22
    our main hash is going to have, as its data, pointers to all these subsidiary hashes, right?
    That is right

    But the keys to the mainHash differ from those of the subsidiary hashes. I am basically doing that as a way of optimization to minimize my lookup (times).

    Both the mainHash and subsidiary hashes use string keys but different ones.

    For ex :
    Every time i get a new record , i construct a key and lookup the mainHash for this, if found then I go the hash pointed to by the value and then insert the record there.
    If not found then I create a new hash and insert the record into the new hash and insert the pointer to this hash with the constructed key into the mainHash.

    I do some lookups in the subsidiary hashes every time i do an insert and then do some subsequent processing.

  9. #9
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    I'd just love to hijack this thread and ask you all sorts of questions about the nature of the keys, but before I do that, let me answer this question:
    This method is something that I would like to eventually explore. But as of right now I have to come up with some way to do the same without increasing the binary size and also a quicker solution.
    If by quicker you mean faster to code, then there ain't no magic bullet. Sorry. As far as I know, you'll have to enter hack mode.

    This is an invitation for any lurker to dive right in and tell me I'm <BLINK!!!!!!1111!>WRONG</BLINK>.

    Now. I have a some thoughts related to scale, because they are related to performance.

    I guess you know that if you could estimate ahead of time how many subsidiary hashes you'd have, you wouldn't be in this situation.

    And let's say you could estimate that. Let's make the estimate, say, 100. That would be 100 subsidiary hash tables. It would be cheaper in the long run to have a single hash table with 100 times the number of buckets. (I'm not using the word "buckets" in the same sense you did; I'm using it in the technical hash table sense.)

    And I look at GHash, and find that they don't even allow you to specify how many buckets in the hash table. That's not so good. If you end up hacking the source, that's the first place I'd tweak.

    Another aspect to this situation, though, is that if you have a gazillion entries, you're going to run out of RAM and start swapping. And then your whole performance situation gets all shot to heck. There's no straightforward way to optimize the swapping so that it "knows" what you're most likely to want in main memory. In that case, toss GHash out the window. Don't use a hash. Use a real live disk file, and B-trees. That will tend to keep the most often used parts of data in main memory.

    I don't know whether this helps, but there ya go.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  10. #10
    Just Joined!
    Join Date
    Jan 2008
    Posts
    22
    My app grows quite fast.

    In about an hour top shows me a value of 1GB under RSS and Size. Besides I have an mmapped file of about 200MB which I use for recoveries (when I need to restart after a crash).
    After about 2.2GB in the RSS in top my process crashes. Some instances go upto 2.7GB.
    I used gdb and found "Cannot allocate 1024 bytes for ...."

    What is happenning here? The 2GB -32 bit,limit ?

Page 1 of 2 1 2 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
...