Results 1 to 10 of 15
Hello
I am trying to allocate a bucket of hashes in advance and then use pointers from these hashes as and when needed.
let me explain
I have a mainHash ...
- 01-14-2008 #1Just Joined!
- Join Date
- Jan 2008
- Posts
- 22
Memory allocation for GHash table
Hello
I am trying to allocate a bucket of hashes in advance and then use pointers from these hashes as and when needed.
let me explain
I have a mainHash which i make like so
mainHash = g_hash_table_new(g_str_hash, g_str_equal);
Where the key is a string and the value is a pointer to another Hash.
Now everytime I get a record, i do a lookup in the mainHash and if it is found then I go to the Hash pointed to by the value and then insert the record into that Hash. If not found I create a new hash and then insert this into the mainHash and then the record into this newly created hash.
What i wish to do is however, create the mainHash such that it contains lets say 5000 pointers to unused Hashes. Now whenever i get a record, i wish to do a lookup on the mainHash and if not found then use one of those pointers from the mainHash and decrement the number of available pointers by 1. Ie allocate memory to a bucket of hashes and then use that bucket whenever i need.
Right now this is what i am doing :
mainHash = g_hash_table_new(g_str_hash, g_str_equal);
.....
....
..
<GOT NEW RECORD>
GHashTable *recHash;
if ((recHash =(GHashTable*) g_hash_table_lookup(mainHash, key)) == NULL)
{
// Look for an entry in the mainHash for this symbolUserId ,
// if not found then create a new hash with key=symbolUserId, value = pointer to new hash
// Insert this symbolUserId into the mainHash
//Memory mgmt - take a memory from previously allocated mem
//Create new hash
recHash = g_hash_table_new(g_str_hash, g_str_equal);
//Insert into mainHash
g_hash_table_insert(mainHash, g_strdup(key), (gpointer)recHash);
//Insert record into new hash
g_hash_table_insert(recHash,strdup(refKey),g_strdu p(refKeys));
}
else
{
// if an entry is found in the mainHash, then use the value of the found
// key(symbolUserId) and insert this object into that hash
//Insert record into found hash
g_hash_table_insert(recHash, g_strdup(refKey), g_strdup(refKeys));
}
Can someone tell me how to allocate the bucket and then use hashes from that bucket.
thanks.
- 01-15-2008 #2Just Joined!
- Join Date
- Jan 2008
- Posts
- 22
Can someone help me with this please. Even simple ideas or pieces of code will do.
Thanks
- 01-15-2008 #3
It looks like the gnome people have made it very easy to do hashing, but (from the little that I've read) they don't seem to have provided easy hooks to do exactly what you want.
I'm thinking that the reason that they didn't is that they saw now reason for it.
They might have a point.
What purpose do you have for allocating an array of these guys ahead of time?--
Bill
Old age and treachery will overcome youth and skill.
- 01-16-2008 #4Just Joined!
- Join Date
- Jan 2008
- Posts
- 22
I dont want to have to keep allocating memory when the process is running as this will slow down my process, instead if I can allocate a chunk when the process comes up it will save me time and also guarantee a specific amount of memory.
Also the machine that I intend to run the process on gets quite crowded later in the day, and so if I just finish my allocations during the beginning when the process is scheduled to come up I will save myself a whole lot of trouble.
Also this serves as a benchmark for memory usage for me. I can send myself usage alerts.
- 01-16-2008 #5Two rules for optimization:if I can allocate a chunk when the process comes up it will save me time
- Do not optimize.
- For experts: Do not optimize yet.
Seriously, compared to whatever else goes on, it's unlikely that allocating memory for each leaf of the hash is not likely to take a large percentage of the time.
And:
The only time you'll run out of memory is when the system runs out of swap space. If you really anticipate running out of swap space because you need that much memory, then you (and the others using the system at the same time) will encounter severe, severe speed problems before then, just because the swap partition is being thrashed to death.guarantee a specific amount of memory
But:
Now there's a good reason! And it shows a real curiosity for what goes on under the hood. You'll need that curiosity, because to get what you'll want, you'll need to hack GLib. Since you don't seem to own the system on which you'll be running, you need to perform the following steps.Also this serves as a benchmark for memory usage for me. I can send myself usage alerts.
- Download the source for GLib from here. Pick whichever version you want. You probably don't want a bleeding edge one.
- Take a gander at the GLib documentation. At the top are instructions for compiling the source, and for linking your programs to it.
- For your reference, here's a handy GHash cheat sheet.
- Once you've gotten your program to work with your own copy of GHash, dive into the source, figure out how it's constructed, and add a feature by which you can specify how and when you allocate your own memory, rather than letting GHash do it.
- Profit!
I told you you'd need a great amount of curiosity for this! :)
Hope this helps.--
Bill
Old age and treachery will overcome youth and skill.
- 01-16-2008 #6Just Joined!
- Join Date
- Jan 2008
- Posts
- 22
This method is something that I would like to eventually explore. But as of right now I have to come up with some way to do the same without increasing the binary size and also a quicker solution.
Is this a possibility - Allocate an array of size = x*sizeof(GHashTable). Now every time i want to create a new hash first I assign it an array value and then initialize it. The problem is that I do not know how to intialize the GHashTable, ie do what the g_hash_table_new(g_str_hash, g_str_equal) does without the memory alloc (guess this is why you recommended the hack into GLib, to find this out).
- 01-16-2008 #7
Wait. I just took a look at your original code.
Your main hash is going to have, as its data, pointers to all these subsidiary hashes, right?
What are the keys in the main hash to all these subsidiary hashes?
And what are the keys in each of the subsidiary hashes?
If the key to each entry in the main hash is also used as the key in the corresponding subsidiary hash, then each subsidiary hash will have exactly one member. What's the percentage in that?
I'm confused here.--
Bill
Old age and treachery will overcome youth and skill.
- 01-16-2008 #8Just Joined!
- Join Date
- Jan 2008
- Posts
- 22
That is rightour main hash is going to have, as its data, pointers to all these subsidiary hashes, right?
But the keys to the mainHash differ from those of the subsidiary hashes. I am basically doing that as a way of optimization to minimize my lookup (times).
Both the mainHash and subsidiary hashes use string keys but different ones.
For ex :
Every time i get a new record , i construct a key and lookup the mainHash for this, if found then I go the hash pointed to by the value and then insert the record there.
If not found then I create a new hash and insert the record into the new hash and insert the pointer to this hash with the constructed key into the mainHash.
I do some lookups in the subsidiary hashes every time i do an insert and then do some subsequent processing.
- 01-17-2008 #9
I'd just love to hijack this thread and ask you all sorts of questions about the nature of the keys, but before I do that, let me answer this question:
If by quicker you mean faster to code, then there ain't no magic bullet. Sorry. As far as I know, you'll have to enter hack mode.This method is something that I would like to eventually explore. But as of right now I have to come up with some way to do the same without increasing the binary size and also a quicker solution.
This is an invitation for any lurker to dive right in and tell me I'm <BLINK!!!!!!1111!>WRONG</BLINK>.
Now. I have a some thoughts related to scale, because they are related to performance.
I guess you know that if you could estimate ahead of time how many subsidiary hashes you'd have, you wouldn't be in this situation.
And let's say you could estimate that. Let's make the estimate, say, 100. That would be 100 subsidiary hash tables. It would be cheaper in the long run to have a single hash table with 100 times the number of buckets. (I'm not using the word "buckets" in the same sense you did; I'm using it in the technical hash table sense.)
And I look at GHash, and find that they don't even allow you to specify how many buckets in the hash table. That's not so good. If you end up hacking the source, that's the first place I'd tweak.
Another aspect to this situation, though, is that if you have a gazillion entries, you're going to run out of RAM and start swapping. And then your whole performance situation gets all shot to heck. There's no straightforward way to optimize the swapping so that it "knows" what you're most likely to want in main memory. In that case, toss GHash out the window. Don't use a hash. Use a real live disk file, and B-trees. That will tend to keep the most often used parts of data in main memory.
I don't know whether this helps, but there ya go.--
Bill
Old age and treachery will overcome youth and skill.
- 01-17-2008 #10Just Joined!
- Join Date
- Jan 2008
- Posts
- 22
My app grows quite fast.
In about an hour top shows me a value of 1GB under RSS and Size. Besides I have an mmapped file of about 200MB which I use for recoveries (when I need to restart after a crash).
After about 2.2GB in the RSS in top my process crashes. Some instances go upto 2.7GB.
I used gdb and found "Cannot allocate 1024 bytes for ...."
What is happenning here? The 2GB -32 bit,limit ?


Reply With Quote