  1. #1
    Linux Newbie SagaciousKJB's Avatar
    Join Date
    Aug 2007
    Location
    Yakima, WA
    Posts
    162

    Converting bytes to gigabytes in C


    I'm working on a keyfile generator, and I'm trying to give users an option to specify the block_size in bytes, kilobytes, megabytes, or gigabytes.

    I use the formula

    Code:
    size *= 1024*1024*
    And downward accordingly for kilobytes and megabytes, but

    Code:
    size *= 1024*1024*1024
    keeps producing inaccurate results. Sometimes, for instance with 3 gigs, it works; with 4, however, it comes out to 0, and with 5, to 1.


    I've tried using both signed and unsigned as well as long integers, thinking that perhaps I was overflowing the integer and causing it to cut off bits, but that doesn't have any effect. Is it an issue with memory, or just how I'm performing the calculation?

    Below is the "important" part of the code, or at least everything that affects this part of it. I am confident it isn't any other part of the program because the other options aside from gigabytes work fine.

    Code:
    unsigned int byte, i = 1, size;
            size = (argv[1][1]=='k')?atoi(argv[3]):atoi(argv[2]);
    
            if(block_size == 'b' || block_size == 'B') {
            /*bytes*/
            size=size;
            }
            else if(block_size == 'k' || block_size == 'K') {
            /*kilobytes*/
            size *= 1024;
            }
            else if(block_size == 'm' || block_size == 'M') {
            /*megabytes*/
            size *= 1024*1024;
            }
            else if(block_size == 'g' || block_size == 'G') {
            size *= 1024*1024*1024;
            }
    I'm thinking about just removing the gigabyte option, because whoever heard of a keyfile gigabytes in size, but it is for an XOR encryption program (which works best with keys the same size as the file). Even then, however, if a user tried to use "7500" megabytes to get a 7.5 gig file, they would still have this problem, so I need to narrow down what the issue is here.

    I'm reasonably sure that I'm not overflowing my integers, but it's been so long since I've had that happen I don't remember all the details surrounding it.

  2. #2
    Linux Engineer GNU-Fan's Avatar
    Join Date
    Mar 2008
    Posts
    935
    Hello,

    reasonably sure not to overflow your integers, huh?

    1024 = 2^10
    -> 1024*1024*1024 = 2^30
    4 = 2^2
    -> 4GB = 2^32

    32 Bits = 4 Bytes

    What does a sizeof(size) result in?
    And what is changed by making it long?
    And now long long?

  3. #3
    Linux Engineer wje_lf's Avatar
    Join Date
    Sep 2007
    Location
    Mariposa
    Posts
    1,192
    I'm reasonably sure that I'm not overflowing my integers
    Actually, you are indeed overflowing your integers.

    GNU-Fan is correct. On a typical 32-bit installation, a variable of type int or long occupies four bytes of memory.

    8 bits, you surely know, give you 256 possibilities. 32 bits give you 256*256*256*256 possibilities.

    If the variable is of type signed int or signed long, those values range from
    Code:
    -128*256*256*256
    to
    Code:
    128*256*256*256-1
    If the variable is of type unsigned int or unsigned long, those values range from
    Code:
    0
    to
    Code:
    256*256*256*256-1
    which is 4 gig minus one.

    GNU-Fan is further correct in suggesting that you try variables of type long long. Either signed or unsigned will work, because you're not going to find files whose size cannot easily be stored in 64 bits.

    But you're going to have to adjust other things as well. I can think of four.
    1. Just in case the user decides to present you with a huge file and specifies the file size in bytes (rather than gigabytes), you should use strtoll() instead of atoi(). Note that there are two ells at the end of the function name. There's another function with just one ell, but that's not what you want.
    2. If you're actually going to open these files, you'll need to make provisions for huge files. There's more than one way to do this, but the standard, portable way is to declare this at the top of every source file, whether a .c file or a .h file (or at the top of some file that every file includes, just so it gets compiled before everything else in every module that is part of this program):
      Code:
      #define _FILE_OFFSET_BITS 64
    3. Examine closely all file operations that involve specifying or retrieving a byte count, such as lseek() and stat(). While you're at it, review the man pages for those functions. You're already using variables of type off_t where the man pages suggest, right, instead of int or long? You should be, and since you are, you will be immediately rewarded, because defining _FILE_OFFSET_BITS to be 64 automatically makes every variable of type off_t occupy eight bytes, not four. You'll need those larger variables to deal with larger files.
    4. If you want to use functions such as sscanf() or fprintf() with variables of type off_t, you'll have to go in by hand and change the format from %d or %ld to %lld (and, to be portable, cast the value to long long when printing it).

    That should cover it.

    Hope this helps.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  4. #4
    Linux Newbie SagaciousKJB's Avatar
    I'm actually unfamiliar with the man pages for lseek and stat, but I picked up from reading mailing lists that I needed the _FILE_OFFSET_BITS definition. I'm really not sure at all what you mean by off_t.

    Changing the variables to type "unsigned long long" seems to be working perfectly according to gdb, but I'm still kind of curious about the rest of the points you made. I don't have the man pages for stat or lseek installed, so I'll have to go find them after I finish this post up. Right now I'm having a bit of a problem with strtoll. I can't figure out the proper number to use for "base". If I set it to 0 and enter "12884901888", I get 0; if I set it to 16, I get "18446744071638620296"; and with it at 8, I get "10". This looks like what I need to do, doesn't it?

    Code:
    size = (argv[1][1]=='k')?strtoll(argv[3],NULL,0):strtoll(argv[2],NULL,0);

  5. #5
    Linux Engineer wje_lf's Avatar
    off_t is a variable type, like int or long or long long. Its size varies depending on whether you define _FILE_OFFSET_BITS.

    I guess you already know that you can pick up the man pages by googling this:
    Code:
    linux man stat
    and
    Code:
    linux man lseek
    It's a good idea to use whatever variable type is recommended by a function's man page (in this case, off_t). In the case of off_t, I can think of two reasons for such goodness.
    1. It ensures that your variables will be of the correct type regardless of whether you define _FILE_OFFSET_BITS.
    2. It helps you avoid the errors that come from guessing a substitute variable type. For example, unsigned long long might seem to work well as a substitute for off_t, but it's not quite accurate. You want signed long long, though you can leave out the "signed" part. Why? Here's an example: If you call lseek() and you get an error, lseek() returns a result of minus one. At best, it's bad form to check whether an unsigned integer of any size is -1.

  6. #6
    Linux Engineer wje_lf's Avatar
    Right now I'm kind of having a little bit of a problem with strtoll. I can't figure out the proper number to use for "base".
    Read the man page for strtoll() and everything should be revealed.

    In general, you should install man pages on your system and be familiar with the man page for every function you call.

    The reason you get something really weird for base 8 is that in base 8, digits are restricted between 0 and 7, and you have an 8 in your input.

    I don't know why you get 0 when you specify base 0. Try putting that in a short, complete program; if you don't get what you expect, post that entire short, complete program here. (By "complete", I mean so I can copy and paste it into a new window and expect it to compile, so I can run it myself.)

  7. #7
    Linux Newbie SagaciousKJB's Avatar
    Actually, I was looking at the man page on Google, but I guess perhaps it's time to rest my tired eyes... I thought "stdlib.h" was "stdio.h" and only realized it when I opened a different man page on Google with a larger font. It works fine the way I posted once I included stdlib.h.

    So, the program seems patched up, I think I ought to give it a rest before reading other man pages.

    Also, I like to keep man pages on my system, but I've yet to find much info on how to know which ones you're supposed to install. For example, to install man pages for the functions in stdlib on Ubuntu, I tried "sudo apt-cache search stdlib man" and got nothing, and sadly I don't even know how I got the man pages that I do have installed.

    Edit

    Gotta love google, apparently I need to use "man 3 function" for some things.


    Thanks for the help guys.

  8. #8
    Linux Engineer wje_lf's Avatar
    apparently I need to use "man 3 function" for some things.
    Hey, man! Ya gotta do this, first chance ya get:
    Code:
    man man
    (Seriously. Do it.)

  9. #9
    Linux Newbie SagaciousKJB's Avatar
    Quote Originally Posted by wje_lf View Post
    Hey, man! Ya gotta do this, first chance ya get:
    Code:
    man man
    (Seriously. Do it.)
    Weird, I'd never thought about doing that one before... For some reason I had the idea man was just an alias or some kind of built-in version of less. This seems to be gradually shifting more towards man files than toward C... :P


    Anyway, thanks for the help once again.
