  1. #1
    Just Joined!
    Join Date
    Nov 2009
    Posts
    43

    In-memory decompression of binary data in gzip format

    Hi All,

    I have a C++ stringstream variable that stores some compressed binary data in gzip format.

    I want to decompress this stringstream variable in memory.

    First of all, for in-memory decompression of binary data in gzip format, which third-party library would you suggest?

    I noticed the zlib library, which handles compression and decompression of the gzip and deflate formats.

    However, the two decompression functions that zlib provides do not seem to meet my needs exactly:

    Code:
    int uncompress (Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen); 
    int gzread (gzFile file, voidp buf, unsigned len);
    The first one (uncompress) requires me to know the length of the decompressed data in advance to properly allocate enough memory for storage. In my case, it is unknown.

    On the other hand, the second one (gzread) takes a file as input, not a memory buffer.

    What do you suggest for an "efficient" in-memory decompression using zlib or some other library?

    Thanks.

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    These should be as efficient as anything for gzip'd data. Since you can get a pointer to the raw data in your stringstream object, uncompress() would be my preferred choice. You can allocate a decompression buffer at a reasonable maximum size that you expect to need. If uncompress() returns Z_BUF_ERROR, indicating that the output buffer was too small, you can reallocate the buffer to a bigger size and call uncompress() again. If your initial buffer size is sensible, this should not happen often enough to create a performance problem.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Nov 2009
    Posts
    43
    In my case, I cannot pick a sensible initial buffer size, so the retries may create a performance problem.

    Instead, I noticed the following set of zlib functions, which are meant to be used in combination:

    int inflateInit(z_streamp strm);
    int inflate(z_streamp strm, int flush);
    int inflateEnd(z_streamp strm);

    They seem to get the exact content size from the gzip header itself, and allocate all data structures accordingly.

    Do you have enough experience with them to recommend their use, considering the performance issues?

    Thanks.

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Quote Originally Posted by aryan_
    Do you have enough experience with them to recommend their use, considering the performance issues?
    No. I'm not personally familiar with the zlib APIs or C++ class interfaces. This sounds like a good starting point, however. Also, since the source code is easily available, you can see how they obtain the inflated size of the compressed data if you want to get at that value more directly.
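
On the question of where the inflated size comes from: per the gzip format specification (RFC 1952), the size is stored in the trailer, not the header. The last four bytes of a gzip member are the ISIZE field, the uncompressed length modulo 2^32 in little-endian order. A rough sketch of reading it directly, with the helper name `gzip_isize_hint` made up for the example:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: read the ISIZE field from the end of a gzip buffer.
// Returns false if the buffer is too short or lacks the gzip magic bytes.
bool gzip_isize_hint(const std::string& gz, std::uint32_t& isize) {
    const std::size_t kHeader = 10, kTrailer = 8;  // fixed header, CRC32 + ISIZE
    if (gz.size() < kHeader + kTrailer) return false;

    const unsigned char* p = reinterpret_cast<const unsigned char*>(gz.data());
    if (p[0] != 0x1f || p[1] != 0x8b) return false;  // gzip magic number

    const unsigned char* t = p + gz.size() - 4;      // ISIZE, little-endian
    isize = static_cast<std::uint32_t>(t[0])
          | (static_cast<std::uint32_t>(t[1]) << 8)
          | (static_cast<std::uint32_t>(t[2]) << 16)
          | (static_cast<std::uint32_t>(t[3]) << 24);
    return true;
}
```

The value is best treated as a hint for reserving memory rather than as a guarantee: it wraps for data of 4 GiB or more, and if the buffer holds several concatenated gzip members it describes only the last one.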
