Find the answer to your Linux question:
Results 1 to 7 of 7
hey everyone. i need your opinion regarding optimization of a code. FYI, i use a standard desktop PC. single-processor. 512mb ram. i am experimenting on rotating a single 2D image ...
  1. #1
    Just Joined!
    Join Date
    Jan 2008
    Location
    Philippines
    Posts
    10

    need your opinion: how to make a program run faster

    hey everyone. i need your opinion regarding optimization of a code.
    FYI, i use a standard desktop PC. single-processor. 512mb ram.

    i am experimenting on rotating a single 2D image from 0 to 360 degrees.
    without any parallelism/concurrency whatsoever, this kind of program which has a loop structure of:

    HTML Code:
        for(degree=0,degree<=360;degree++){
               for(x=0,x<=width;x++){
                    for(y=0,y<=height;y++){
    
                         .....rotate here and print frame here
    
                    }//for y
               }//for x
        }//for degree
    ...runs fine (and fast). Now, i extended this code to rotate a volume-a 3D object which has a loop structure of:

    HTML Code:
        for(degree=0,degree<=360;degree++){
               for(x=0,x<=width;x++){
                    for(y=0,y<=height;y++){
                         for(z=0,z<=depth;z++){
    
                                .....rotate here and print frame here
    
                         }//for z
                    }//for y
               }//for x
        }//for degree
    ...and because of this structure, this code has a running time of O(n^4). printing each frame takes as much as 5 to 6 minutes! imagine the total of 361 -> [0,360] frames*6 minutes!

    so, i seeked some suggestions and tried them (people from this forum have been helping me with this):

    1.) pthreads
    pthread_create,pthread_join/pthread_detach
    for this suggestion, i experimented with 2D rotation first.
    for each degree, i create a thread (total of 361 threads, a massive amount!)
    program ran just fine but at the end, i hit segmentation fault. (with just 3 loops)

    2.) fork()
    i experimented with 2D rotation first and it worked fine. i was able to print all 361 images.
    i extended it to 3D rotation. it worked fine and was able to print the first frame but it took 5-6 minutes also.so, i terminated it later on.
    i experimented to simplify my 3d code by just printing a number so that i could see if it will be able to create all the 361 threads. and then, i hit segmentation fault.

    i've expected this result from fork() cause it's more expensive than threads. but with this massive amount of threads with these available resources, i still can't meet my goal of making my code run faster. ive written my code as simple as it can be, but it is really slow.

    someone said that it is not really recommended for me to use pthreads in this case, and that fork() is more expensive. your opinions and suggestions will be of a very BIG help.

  2. #2
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    i am experimenting on rotating a single 2D image from 0 to 360 degrees.
    without any parallelism/concurrency whatsoever
    I am very relieved to hear that! :)
    someone said that it is not really recommended for me to use pthreads in this case, and that fork() is more expensive.
    Well, fork() is more expensive, all right, but that might not really matter. Let's look more closely at this:
    2.) fork()
    i experimented with 2D rotation first and it worked fine. i was able to print all 361 images.
    i extended it to 3D rotation. it worked fine and was able to print the first frame but it took 5-6 minutes also.so, i terminated it later on.
    The question here is: what was happening during those 5-6 minutes? Was it the computing of the first frame, or was it all those fork()s going on at the same time?

    Try taking that version of the program and crippling it so it still fork()s as it did, but only processes the case of 0 degrees, and then stops. See how long that processing of the first image takes. If it takes only a second or two, fine. But if it takes 5-6 minutes, then fork() takes an extremely low percentage of your computation, so it isn't expensive at all.

    There are two rules for optimization:
    1. Do not optimize.
    2. For experts: Do not optimize yet.

    See, here's what concerns me:
    i experimented to simplify my 3d code by just printing a number so that i could see if it will be able to create all the 361 threads. and then, i hit segmentation fault.

    i've expected this result from fork() cause it's more expensive than threads
    A segmentation fault is not an indication that processing is expensive. It is an indication that there is a bug. It's important to fix all known bugs (and this is one) before trying to optimize. Why? Because it could be that the reason for the slowdown is the bug itself.

    You say that when you tried fork(), you just printed a number so you see whether you could create all 361 ... processes, I guess, because you were using fork(). But I tried the same thing, and got no segmentation fault. It worked ... um ... faultlessly, and the compute time was way, way under one second.

    Here's my code:
    Code:
    #include <sys/types.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    
    int main(void)
    {
      int degrees;
      int the_pid;
    
      for(degrees=0;
          degrees<=360;
          degrees++
         )
      {
        the_pid=fork();
    
        if(the_pid<0)
        {
          printf("fork() error\n");
    
          break;   /* <--------- */
        }
    
        if(the_pid==0)
        {
          printf("processing for &#37;d degree%s\n",
                 degrees,
                 "s"+(degrees==1)
                );
    
          exit(0);
        }
      }
    
      return 0;
    
    } /* main() */
    Here's the output:
    Code:
    wally:~/tuesday/1$ 1
    processing for 0 degrees
    processing for 1 degree
    processing for 2 degrees
    processing for 3 degrees
    processing for 4 degrees
    processing for 5 degrees
    processing for 6 degrees
    processing for 7 degrees
    processing for 8 degrees
    processing for 9 degrees
    processing for 10 degrees
    processing for 11 degrees
    processing for 12 degrees
    processing for 13 degrees
    processing for 14 degrees
    processing for 15 degrees
    processing for 16 degrees
    processing for 17 degrees
    processing for 18 degrees
    processing for 19 degrees
    processing for 20 degrees
    processing for 21 degrees
    processing for 22 degrees
    processing for 23 degrees
    processing for 24 degrees
    processing for 25 degrees
    processing for 26 degrees
    processing for 27 degrees
    processing for 28 degrees
    processing for 29 degrees
    processing for 30 degrees
    processing for 31 degrees
    processing for 32 degrees
    processing for 33 degrees
    processing for 34 degrees
    processing for 35 degrees
    processing for 36 degrees
    processing for 37 degrees
    processing for 38 degrees
    processing for 39 degrees
    processing for 40 degrees
    processing for 41 degrees
    processing for 42 degrees
    processing for 43 degrees
    processing for 44 degrees
    processing for 45 degrees
    processing for 46 degrees
    processing for 47 degrees
    processing for 48 degrees
    processing for 49 degrees
    processing for 50 degrees
    processing for 51 degrees
    processing for 52 degrees
    processing for 53 degrees
    processing for 54 degrees
    processing for 55 degrees
    processing for 56 degrees
    processing for 57 degrees
    processing for 58 degrees
    processing for 59 degrees
    processing for 60 degrees
    processing for 61 degrees
    processing for 62 degrees
    processing for 63 degrees
    processing for 64 degrees
    processing for 65 degrees
    processing for 66 degrees
    processing for 67 degrees
    processing for 68 degrees
    processing for 69 degrees
    processing for 70 degrees
    processing for 71 degrees
    processing for 72 degrees
    processing for 73 degrees
    processing for 74 degrees
    processing for 75 degrees
    processing for 76 degrees
    processing for 77 degrees
    processing for 78 degrees
    processing for 79 degrees
    processing for 80 degrees
    processing for 81 degrees
    processing for 82 degrees
    processing for 83 degrees
    processing for 84 degrees
    processing for 85 degrees
    processing for 86 degrees
    processing for 87 degrees
    processing for 88 degrees
    processing for 89 degrees
    processing for 90 degrees
    processing for 91 degrees
    processing for 92 degrees
    processing for 93 degrees
    processing for 94 degrees
    processing for 95 degrees
    processing for 96 degrees
    processing for 97 degrees
    processing for 98 degrees
    processing for 99 degrees
    processing for 100 degrees
    processing for 101 degrees
    processing for 102 degrees
    processing for 103 degrees
    processing for 104 degrees
    processing for 105 degrees
    processing for 106 degrees
    processing for 107 degrees
    processing for 108 degrees
    processing for 109 degrees
    processing for 110 degrees
    processing for 111 degrees
    processing for 112 degrees
    processing for 113 degrees
    processing for 114 degrees
    processing for 115 degrees
    processing for 116 degrees
    processing for 117 degrees
    processing for 118 degrees
    processing for 119 degrees
    processing for 120 degrees
    processing for 121 degrees
    processing for 122 degrees
    processing for 123 degrees
    processing for 124 degrees
    processing for 125 degrees
    processing for 126 degrees
    processing for 127 degrees
    processing for 128 degrees
    processing for 129 degrees
    processing for 130 degrees
    processing for 131 degrees
    processing for 132 degrees
    processing for 133 degrees
    processing for 134 degrees
    processing for 135 degrees
    processing for 136 degrees
    processing for 137 degrees
    processing for 138 degrees
    processing for 139 degrees
    processing for 140 degrees
    processing for 141 degrees
    processing for 142 degrees
    processing for 143 degrees
    processing for 144 degrees
    processing for 145 degrees
    processing for 146 degrees
    processing for 147 degrees
    processing for 148 degrees
    processing for 149 degrees
    processing for 150 degrees
    processing for 151 degrees
    processing for 152 degrees
    processing for 153 degrees
    processing for 154 degrees
    processing for 155 degrees
    processing for 156 degrees
    processing for 157 degrees
    processing for 158 degrees
    processing for 159 degrees
    processing for 160 degrees
    processing for 161 degrees
    processing for 162 degrees
    processing for 163 degrees
    processing for 164 degrees
    processing for 165 degrees
    processing for 166 degrees
    processing for 167 degrees
    processing for 168 degrees
    processing for 169 degrees
    processing for 170 degrees
    processing for 171 degrees
    processing for 172 degrees
    processing for 173 degrees
    processing for 174 degrees
    processing for 175 degrees
    processing for 176 degrees
    processing for 177 degrees
    processing for 178 degrees
    processing for 179 degrees
    processing for 180 degrees
    processing for 181 degrees
    processing for 182 degrees
    processing for 183 degrees
    processing for 184 degrees
    processing for 185 degrees
    processing for 186 degrees
    processing for 187 degrees
    processing for 188 degrees
    processing for 189 degrees
    processing for 190 degrees
    processing for 191 degrees
    processing for 192 degrees
    processing for 193 degrees
    processing for 194 degrees
    processing for 195 degrees
    processing for 196 degrees
    processing for 197 degrees
    processing for 198 degrees
    processing for 199 degrees
    processing for 200 degrees
    processing for 201 degrees
    processing for 202 degrees
    processing for 203 degrees
    processing for 204 degrees
    processing for 205 degrees
    processing for 206 degrees
    processing for 207 degrees
    processing for 208 degrees
    processing for 209 degrees
    processing for 210 degrees
    processing for 211 degrees
    processing for 212 degrees
    processing for 213 degrees
    processing for 214 degrees
    processing for 215 degrees
    processing for 216 degrees
    processing for 217 degrees
    processing for 218 degrees
    processing for 219 degrees
    processing for 220 degrees
    processing for 221 degrees
    processing for 222 degrees
    processing for 223 degrees
    processing for 224 degrees
    processing for 225 degrees
    processing for 226 degrees
    processing for 227 degrees
    processing for 228 degrees
    processing for 229 degrees
    processing for 230 degrees
    processing for 231 degrees
    processing for 232 degrees
    processing for 233 degrees
    processing for 234 degrees
    processing for 235 degrees
    processing for 236 degrees
    processing for 237 degrees
    processing for 238 degrees
    processing for 239 degrees
    processing for 240 degrees
    processing for 241 degrees
    processing for 242 degrees
    processing for 243 degrees
    processing for 244 degrees
    processing for 245 degrees
    processing for 246 degrees
    processing for 247 degrees
    processing for 248 degrees
    processing for 249 degrees
    processing for 250 degrees
    processing for 251 degrees
    processing for 252 degrees
    processing for 253 degrees
    processing for 254 degrees
    processing for 255 degrees
    processing for 256 degrees
    processing for 257 degrees
    processing for 258 degrees
    processing for 259 degrees
    processing for 260 degrees
    processing for 261 degrees
    processing for 262 degrees
    processing for 263 degrees
    processing for 264 degrees
    processing for 265 degrees
    processing for 266 degrees
    processing for 267 degrees
    processing for 268 degrees
    processing for 269 degrees
    processing for 270 degrees
    processing for 271 degrees
    processing for 272 degrees
    processing for 273 degrees
    processing for 274 degrees
    processing for 275 degrees
    processing for 276 degrees
    processing for 277 degrees
    processing for 278 degrees
    processing for 279 degrees
    processing for 280 degrees
    processing for 281 degrees
    processing for 282 degrees
    processing for 283 degrees
    processing for 284 degrees
    processing for 285 degrees
    processing for 286 degrees
    processing for 287 degrees
    processing for 288 degrees
    processing for 289 degrees
    processing for 290 degrees
    processing for 291 degrees
    processing for 292 degrees
    processing for 293 degrees
    processing for 294 degrees
    processing for 295 degrees
    processing for 296 degrees
    processing for 297 degrees
    processing for 298 degrees
    processing for 299 degrees
    processing for 300 degrees
    processing for 301 degrees
    processing for 302 degrees
    processing for 303 degrees
    processing for 304 degrees
    processing for 305 degrees
    processing for 306 degrees
    processing for 307 degrees
    processing for 308 degrees
    processing for 309 degrees
    processing for 310 degrees
    processing for 311 degrees
    processing for 312 degrees
    processing for 313 degrees
    processing for 314 degrees
    processing for 315 degrees
    processing for 316 degrees
    processing for 317 degrees
    processing for 318 degrees
    processing for 319 degrees
    processing for 320 degrees
    processing for 321 degrees
    processing for 322 degrees
    processing for 323 degrees
    processing for 324 degrees
    processing for 325 degrees
    processing for 326 degrees
    processing for 327 degrees
    processing for 328 degrees
    processing for 329 degrees
    processing for 330 degrees
    processing for 331 degrees
    processing for 332 degrees
    processing for 333 degrees
    processing for 334 degrees
    processing for 335 degrees
    processing for 336 degrees
    processing for 337 degrees
    processing for 338 degrees
    processing for 339 degrees
    processing for 340 degrees
    processing for 341 degrees
    processing for 342 degrees
    processing for 343 degrees
    processing for 344 degrees
    processing for 345 degrees
    processing for 346 degrees
    processing for 347 degrees
    processing for 348 degrees
    processing for 349 degrees
    processing for 350 degrees
    processing for 351 degrees
    processing for 352 degrees
    processing for 353 degrees
    processing for 354 degrees
    processing for 355 degrees
    processing for 356 degrees
    processing for 357 degrees
    processing for 358 degrees
    processing for 359 degrees
    processing for 360 degrees
    wally:~/tuesday/1$
    Unless you're doing the fork() wrong, there's some other bug in your code. It is possible that this bug causes the fork() situation to blow up with a segment violation, and that it causes the non-pthread, non-fork() situation to take a long time.

    We don't know that. But trying to optimize without fixing all known bugs is like pushing a rope.

    In short, I think that fork()ing here will not give you any speedup. But fork()ing will help you find that bug, because that's the only time the segment violation occurs.

    Hope this helps.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  3. #3
    Just Joined!
    Join Date
    Jan 2008
    Location
    Philippines
    Posts
    10
    hi. thank you for the pointers. yeah, i agree with you that the segmentation fault im getting from fork()ing maybe from bugs. some of the bugs ive experienced are due to wrong sizes of arrays. iv corrected them anyway. if there any more bugs that i cant see... well, il jz analyse the code again.

    btw, thank you for the code. ive written a simple code like that before and sure it is, i didnt hit segfault. got the same output.

    and, i also experimented further by, adding another 3nested loops inside:
    HTML Code:
    if(the_pid==0)
        {
          printf("processing for %d degree%s\n",
                 degrees,
                 "s"+(degrees==1)
                );
    
          exit(0);
        }
    and so it becomes:
    HTML Code:
    if(the_pid==0)
        {
           for(x from 0 to (width-1)){
           for(y from 0 to (height-1)){
           for(z from 0 to (depth-1)){
                 printf("processing for %d degree%s\n",
                 degrees,
                 "s"+(degrees==1)
                );
            }//for z
            }//for y
            }//for x
          exit(0);
        }
    at first i set width=height=depth=5 and it ran fine.
    but when I set them to width=640, height=480, depth=480(dimensions of my volume) .... blag.i hit segfault.

    as for the 3d rotation where i printed the first frame for 5-6 minutes... ahm...
    il go try what you said and see what happens. =>

  4. #4
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    Ok, here's the deal.

    Your images are 640 by 480 by 480, right? My calculator says that's about 480 megabytes, even if you have only one byte per pixel in your image.

    Going back to your previous thread, each creature of type struct spaceAttr has two of these images, an input one and an output one. So we're talking almost a gig right there. Per degree.

    So no matter whether you're running flat, fork()ing, or using threads, you're asking to use more than 300 gigs of memory. It's a strong possibility that you don't have 300 gigs of RAM, and possibly even don't have 300 gigs in your swap file.

    That being the case, when you do slow down, it's because you haven't yet run out of virtual memory, but you've run out of RAM already, and you're swapping portions of your heap to disk like crazy. A sure performance killer.

    And when you get a segmentation fault, chances are you've run out of even swap space, so the memory allocation fails. When you ask for memory and there ain't none, the system responds by giving you a NULL pointer. That's a feature, not a bug. It means that if you don't check for failure to allocate memory but then try to use that nonexistent memory, you'll get a segment violation. That's nature's way of telling you you're doing something wrong.

    So. As I said here in the previous thread:
    do you check the result of each and every malloc() in your program? I don't see any malloc() in the code you presented, but how about in functions readJpeg() and writeJpeg()?
    What you'd check for, of course is, whether malloc() returned NULL, and die with an error message if it did.

    This is very important. If you aren't already checking the result of each memory allocation, I'd drop everything else and fix that bug first.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  5. #5
    Just Joined!
    Join Date
    Jan 2008
    Location
    Philippines
    Posts
    10
    hey, thank you so much for all the suggestions.

    anyway, i happen to think of a new solution (i hope): i just scaled down my input images in proportion to the original dimensions.

    the ratio of 640x480 is 1.333. i desired to hav a height of just 200 so i multiplied it w/ 1.333 and my new width will be 267. with these new dimensions, i was able to run my program less than 20 seconds/frame. its much faster. ;D (though,il be able to print all the frames within 1.9 hrs)

    one of the things im trying to do here is compute the cubic volume (cm^3) of a real object through those 2D input images. so... with these adjustments and computations in the orig dimensions, maybe il let math and calculus do the work.

  6. #6
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    i just scaled down my input images
    Excellent!

    Woody Allen said, "80% of success is showing up."

    In our line of work, 80% of success is thinking outside the box.

    You've just done that.

    Feel free to come back with further lines of thought on this or other situations.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  7. #7
    Just Joined!
    Join Date
    Jan 2008
    Location
    Philippines
    Posts
    10
    hehe. thank you so much!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
...