Results 1 to 7 of 7
hey everyone. i need your opinion regarding optimization of a code.
FYI, i use a standard desktop PC. single-processor. 512mb ram.
i am experimenting on rotating a single 2D image ...
- 01-15-2008 #1Just Joined!
- Join Date
- Jan 2008
- Location
- Philippines
- Posts
- 10
need your opinion: how to make a program run faster
hey everyone. i need your opinion regarding optimization of a code.
FYI, i use a standard desktop PC. single-processor. 512mb ram.
i am experimenting on rotating a single 2D image from 0 to 360 degrees.
without any parallelism/concurrency whatsoever, this kind of program which has a loop structure of:
...runs fine (and fast). Now, i extended this code to rotate a volume-a 3D object which has a loop structure of:HTML Code:for(degree=0,degree<=360;degree++){ for(x=0,x<=width;x++){ for(y=0,y<=height;y++){ .....rotate here and print frame here }//for y }//for x }//for degree
...and because of this structure, this code has a running time of O(n^4). printing each frame takes as much as 5 to 6 minutes! imagine the total of 361 -> [0,360] frames*6 minutes!HTML Code:for(degree=0,degree<=360;degree++){ for(x=0,x<=width;x++){ for(y=0,y<=height;y++){ for(z=0,z<=depth;z++){ .....rotate here and print frame here }//for z }//for y }//for x }//for degree
so, i seeked some suggestions and tried them (people from this forum have been helping me with this):
1.) pthreads
pthread_create,pthread_join/pthread_detach
for this suggestion, i experimented with 2D rotation first.
for each degree, i create a thread (total of 361 threads, a massive amount!)
program ran just fine but at the end, i hit segmentation fault. (with just 3 loops)
2.) fork()
i experimented with 2D rotation first and it worked fine. i was able to print all 361 images.
i extended it to 3D rotation. it worked fine and was able to print the first frame but it took 5-6 minutes also.so, i terminated it later on.
i experimented to simplify my 3d code by just printing a number so that i could see if it will be able to create all the 361 threads. and then, i hit segmentation fault.
i've expected this result from fork() cause it's more expensive than threads. but with this massive amount of threads with these available resources, i still can't meet my goal of making my code run faster. ive written my code as simple as it can be, but it is really slow.
someone said that it is not really recommended for me to use pthreads in this case, and that fork() is more expensive. your opinions and suggestions will be of a very BIG help.
- 01-15-2008 #2I am very relieved to hear that! :)i am experimenting on rotating a single 2D image from 0 to 360 degrees.
without any parallelism/concurrency whatsoever
Well, fork() is more expensive, all right, but that might not really matter. Let's look more closely at this:someone said that it is not really recommended for me to use pthreads in this case, and that fork() is more expensive.
The question here is: what was happening during those 5-6 minutes? Was it the computing of the first frame, or was it all those fork()s going on at the same time?2.) fork()
i experimented with 2D rotation first and it worked fine. i was able to print all 361 images.
i extended it to 3D rotation. it worked fine and was able to print the first frame but it took 5-6 minutes also.so, i terminated it later on.
Try taking that version of the program and crippling it so it still fork()s as it did, but only processes the case of 0 degrees, and then stops. See how long that processing of the first image takes. If it takes only a second or two, fine. But if it takes 5-6 minutes, then fork() takes an extremely low percentage of your computation, so it isn't expensive at all.
There are two rules for optimization:
- Do not optimize.
- For experts: Do not optimize yet.
See, here's what concerns me:
A segmentation fault is not an indication that processing is expensive. It is an indication that there is a bug. It's important to fix all known bugs (and this is one) before trying to optimize. Why? Because it could be that the reason for the slowdown is the bug itself.i experimented to simplify my 3d code by just printing a number so that i could see if it will be able to create all the 361 threads. and then, i hit segmentation fault.
i've expected this result from fork() cause it's more expensive than threads
You say that when you tried fork(), you just printed a number so you see whether you could create all 361 ... processes, I guess, because you were using fork(). But I tried the same thing, and got no segmentation fault. It worked ... um ... faultlessly, and the compute time was way, way under one second.
Here's my code:
Here's the output:Code:#include <sys/types.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> int main(void) { int degrees; int the_pid; for(degrees=0; degrees<=360; degrees++ ) { the_pid=fork(); if(the_pid<0) { printf("fork() error\n"); break; /* <--------- */ } if(the_pid==0) { printf("processing for %d degree%s\n", degrees, "s"+(degrees==1) ); exit(0); } } return 0; } /* main() */
Unless you're doing the fork() wrong, there's some other bug in your code. It is possible that this bug causes the fork() situation to blow up with a segment violation, and that it causes the non-pthread, non-fork() situation to take a long time.Code:wally:~/tuesday/1$ 1 processing for 0 degrees processing for 1 degree processing for 2 degrees processing for 3 degrees processing for 4 degrees processing for 5 degrees processing for 6 degrees processing for 7 degrees processing for 8 degrees processing for 9 degrees processing for 10 degrees processing for 11 degrees processing for 12 degrees processing for 13 degrees processing for 14 degrees processing for 15 degrees processing for 16 degrees processing for 17 degrees processing for 18 degrees processing for 19 degrees processing for 20 degrees processing for 21 degrees processing for 22 degrees processing for 23 degrees processing for 24 degrees processing for 25 degrees processing for 26 degrees processing for 27 degrees processing for 28 degrees processing for 29 degrees processing for 30 degrees processing for 31 degrees processing for 32 degrees processing for 33 degrees processing for 34 degrees processing for 35 degrees processing for 36 degrees processing for 37 degrees processing for 38 degrees processing for 39 degrees processing for 40 degrees processing for 41 degrees processing for 42 degrees processing for 43 degrees processing for 44 degrees processing for 45 degrees processing for 46 degrees processing for 47 degrees processing for 48 degrees processing for 49 degrees processing for 50 degrees processing for 51 degrees processing for 52 degrees processing for 53 degrees processing for 54 degrees processing for 55 degrees processing for 56 degrees processing for 57 degrees processing for 58 degrees processing for 59 degrees processing for 60 degrees processing for 61 degrees processing for 62 degrees processing for 63 degrees processing for 64 degrees processing for 65 degrees processing for 66 degrees processing for 67 degrees processing for 68 degrees processing for 69 degrees processing for 70 degrees processing for 71 degrees processing for 72 degrees processing for 73 degrees processing for 74 degrees processing for 75 degrees processing for 76 degrees processing for 77 degrees processing for 78 degrees processing for 79 degrees processing for 80 degrees processing for 81 degrees processing for 82 degrees processing for 83 degrees processing for 84 degrees processing for 85 degrees processing for 86 degrees processing for 87 degrees processing for 88 degrees processing for 89 degrees processing for 90 degrees processing for 91 degrees processing for 92 degrees processing for 93 degrees processing for 94 degrees processing for 95 degrees processing for 96 degrees processing for 97 degrees processing for 98 degrees processing for 99 degrees processing for 100 degrees processing for 101 degrees processing for 102 degrees processing for 103 degrees processing for 104 degrees processing for 105 degrees processing for 106 degrees processing for 107 degrees processing for 108 degrees processing for 109 degrees processing for 110 degrees processing for 111 degrees processing for 112 degrees processing for 113 degrees processing for 114 degrees processing for 115 degrees processing for 116 degrees processing for 117 degrees processing for 118 degrees processing for 119 degrees processing for 120 degrees processing for 121 degrees processing for 122 degrees processing for 123 degrees processing for 124 degrees processing for 125 degrees processing for 126 degrees processing for 127 degrees processing for 128 degrees processing for 129 degrees processing for 130 degrees processing for 131 degrees processing for 132 degrees processing for 133 degrees processing for 134 degrees processing for 135 degrees processing for 136 degrees processing for 137 degrees processing for 138 degrees processing for 139 degrees processing for 140 degrees processing for 141 degrees processing for 142 degrees processing for 143 degrees processing for 144 degrees processing for 145 degrees processing for 146 degrees processing for 147 degrees processing for 148 degrees processing for 149 degrees processing for 150 degrees processing for 151 degrees processing for 152 degrees processing for 153 degrees processing for 154 degrees processing for 155 degrees processing for 156 degrees processing for 157 degrees processing for 158 degrees processing for 159 degrees processing for 160 degrees processing for 161 degrees processing for 162 degrees processing for 163 degrees processing for 164 degrees processing for 165 degrees processing for 166 degrees processing for 167 degrees processing for 168 degrees processing for 169 degrees processing for 170 degrees processing for 171 degrees processing for 172 degrees processing for 173 degrees processing for 174 degrees processing for 175 degrees processing for 176 degrees processing for 177 degrees processing for 178 degrees processing for 179 degrees processing for 180 degrees processing for 181 degrees processing for 182 degrees processing for 183 degrees processing for 184 degrees processing for 185 degrees processing for 186 degrees processing for 187 degrees processing for 188 degrees processing for 189 degrees processing for 190 degrees processing for 191 degrees processing for 192 degrees processing for 193 degrees processing for 194 degrees processing for 195 degrees processing for 196 degrees processing for 197 degrees processing for 198 degrees processing for 199 degrees processing for 200 degrees processing for 201 degrees processing for 202 degrees processing for 203 degrees processing for 204 degrees processing for 205 degrees processing for 206 degrees processing for 207 degrees processing for 208 degrees processing for 209 degrees processing for 210 degrees processing for 211 degrees processing for 212 degrees processing for 213 degrees processing for 214 degrees processing for 215 degrees processing for 216 degrees processing for 217 degrees processing for 218 degrees processing for 219 degrees processing for 220 degrees processing for 221 degrees processing for 222 degrees processing for 223 degrees processing for 224 degrees processing for 225 degrees processing for 226 degrees processing for 227 degrees processing for 228 degrees processing for 229 degrees processing for 230 degrees processing for 231 degrees processing for 232 degrees processing for 233 degrees processing for 234 degrees processing for 235 degrees processing for 236 degrees processing for 237 degrees processing for 238 degrees processing for 239 degrees processing for 240 degrees processing for 241 degrees processing for 242 degrees processing for 243 degrees processing for 244 degrees processing for 245 degrees processing for 246 degrees processing for 247 degrees processing for 248 degrees processing for 249 degrees processing for 250 degrees processing for 251 degrees processing for 252 degrees processing for 253 degrees processing for 254 degrees processing for 255 degrees processing for 256 degrees processing for 257 degrees processing for 258 degrees processing for 259 degrees processing for 260 degrees processing for 261 degrees processing for 262 degrees processing for 263 degrees processing for 264 degrees processing for 265 degrees processing for 266 degrees processing for 267 degrees processing for 268 degrees processing for 269 degrees processing for 270 degrees processing for 271 degrees processing for 272 degrees processing for 273 degrees processing for 274 degrees processing for 275 degrees processing for 276 degrees processing for 277 degrees processing for 278 degrees processing for 279 degrees processing for 280 degrees processing for 281 degrees processing for 282 degrees processing for 283 degrees processing for 284 degrees processing for 285 degrees processing for 286 degrees processing for 287 degrees processing for 288 degrees processing for 289 degrees processing for 290 degrees processing for 291 degrees processing for 292 degrees processing for 293 degrees processing for 294 degrees processing for 295 degrees processing for 296 degrees processing for 297 degrees processing for 298 degrees processing for 299 degrees processing for 300 degrees processing for 301 degrees processing for 302 degrees processing for 303 degrees processing for 304 degrees processing for 305 degrees processing for 306 degrees processing for 307 degrees processing for 308 degrees processing for 309 degrees processing for 310 degrees processing for 311 degrees processing for 312 degrees processing for 313 degrees processing for 314 degrees processing for 315 degrees processing for 316 degrees processing for 317 degrees processing for 318 degrees processing for 319 degrees processing for 320 degrees processing for 321 degrees processing for 322 degrees processing for 323 degrees processing for 324 degrees processing for 325 degrees processing for 326 degrees processing for 327 degrees processing for 328 degrees processing for 329 degrees processing for 330 degrees processing for 331 degrees processing for 332 degrees processing for 333 degrees processing for 334 degrees processing for 335 degrees processing for 336 degrees processing for 337 degrees processing for 338 degrees processing for 339 degrees processing for 340 degrees processing for 341 degrees processing for 342 degrees processing for 343 degrees processing for 344 degrees processing for 345 degrees processing for 346 degrees processing for 347 degrees processing for 348 degrees processing for 349 degrees processing for 350 degrees processing for 351 degrees processing for 352 degrees processing for 353 degrees processing for 354 degrees processing for 355 degrees processing for 356 degrees processing for 357 degrees processing for 358 degrees processing for 359 degrees processing for 360 degrees wally:~/tuesday/1$
We don't know that. But trying to optimize without fixing all known bugs is like pushing a rope.
In short, I think that fork()ing here will not give you any speedup. But fork()ing will help you find that bug, because that's the only time the segment violation occurs.
Hope this helps.--
Bill
Old age and treachery will overcome youth and skill.
- 01-15-2008 #3Just Joined!
- Join Date
- Jan 2008
- Location
- Philippines
- Posts
- 10
hi. thank you for the pointers. yeah, i agree with you that the segmentation fault im getting from fork()ing maybe from bugs. some of the bugs ive experienced are due to wrong sizes of arrays. iv corrected them anyway. if there any more bugs that i cant see... well, il jz analyse the code again.
btw, thank you for the code. ive written a simple code like that before and sure it is, i didnt hit segfault. got the same output.
and, i also experimented further by, adding another 3nested loops inside:
and so it becomes:HTML Code:if(the_pid==0) { printf("processing for %d degree%s\n", degrees, "s"+(degrees==1) ); exit(0); }
at first i set width=height=depth=5 and it ran fine.HTML Code:if(the_pid==0) { for(x from 0 to (width-1)){ for(y from 0 to (height-1)){ for(z from 0 to (depth-1)){ printf("processing for %d degree%s\n", degrees, "s"+(degrees==1) ); }//for z }//for y }//for x exit(0); }
but when I set them to width=640, height=480, depth=480(dimensions of my volume) .... blag.i hit segfault.
as for the 3d rotation where i printed the first frame for 5-6 minutes... ahm...
il go try what you said and see what happens. =>
- 01-15-2008 #4
Ok, here's the deal.
Your images are 640 by 480 by 480, right? My calculator says that's about 480 megabytes, even if you have only one byte per pixel in your image.
Going back to your previous thread, each creature of type struct spaceAttr has two of these images, an input one and an output one. So we're talking almost a gig right there. Per degree.
So no matter whether you're running flat, fork()ing, or using threads, you're asking to use more than 300 gigs of memory. It's a strong possibility that you don't have 300 gigs of RAM, and possibly even don't have 300 gigs in your swap file.
That being the case, when you do slow down, it's because you haven't yet run out of virtual memory, but you've run out of RAM already, and you're swapping portions of your heap to disk like crazy. A sure performance killer.
And when you get a segmentation fault, chances are you've run out of even swap space, so the memory allocation fails. When you ask for memory and there ain't none, the system responds by giving you a NULL pointer. That's a feature, not a bug. It means that if you don't check for failure to allocate memory but then try to use that nonexistent memory, you'll get a segment violation. That's nature's way of telling you you're doing something wrong.
So. As I said here in the previous thread:
What you'd check for, of course is, whether malloc() returned NULL, and die with an error message if it did.do you check the result of each and every malloc() in your program? I don't see any malloc() in the code you presented, but how about in functions readJpeg() and writeJpeg()?
This is very important. If you aren't already checking the result of each memory allocation, I'd drop everything else and fix that bug first.--
Bill
Old age and treachery will overcome youth and skill.
- 01-17-2008 #5Just Joined!
- Join Date
- Jan 2008
- Location
- Philippines
- Posts
- 10
hey, thank you so much for all the suggestions.
anyway, i happen to think of a new solution (i hope): i just scaled down my input images in proportion to the original dimensions.
the ratio of 640x480 is 1.333. i desired to hav a height of just 200 so i multiplied it w/ 1.333 and my new width will be 267. with these new dimensions, i was able to run my program less than 20 seconds/frame. its much faster. ;D (though,il be able to print all the frames within 1.9 hrs)
one of the things im trying to do here is compute the cubic volume (cm^3) of a real object through those 2D input images. so... with these adjustments and computations in the orig dimensions, maybe il let math and calculus do the work.
- 01-17-2008 #6Excellent!i just scaled down my input images
Woody Allen said, "80% of success is showing up."
In our line of work, 80% of success is thinking outside the box.
You've just done that.
Feel free to come back with further lines of thought on this or other situations.--
Bill
Old age and treachery will overcome youth and skill.
- 01-17-2008 #7Just Joined!
- Join Date
- Jan 2008
- Location
- Philippines
- Posts
- 10
hehe. thank you so much!


Reply With Quote