  1. #1
    Just Joined!
    Join Date
    Nov 2004
    Posts
    6

    Bizarre sockets problem


    I have a distributed application for evaluating nodes in a large search tree (in the domain of musical chord progressions). Everything works very happily, but I am experiencing a problem trying to update it.

    Here is the basic structure of the project (in C++):

    Code:
    /* The class that takes care of moving through the tree and communicating */
    class CCHChordSearchTree;
    
    /* The class that represents a node in the tree */
    class CCHProgression;
    
    /* Lots of other classes that are used by these two, but not important to this discussion. */
    OK, so everything was working great until I decided I wanted CCHChordSearchTree to contain a new member that is an array of CCHProgression.

    Like this:
    Code:
    class CCHChordSearchTree
    {
         ...
         CCHProgression m_nBest[ NUM_BEST ];
         ...
    };
    With this new member, everything compiles beautifully. However, my calls to recvfrom() on the client side fail with an "Invalid argument" (EINVAL) error. I have checked that all the arguments are in fact valid. Here is the call, if you are interested:
    Code:
    void CCHChordSearchTree::ClientLoop( sockaddr_in p_aServer )
    {
         sockaddr_in l_aReceive;
         int l_iSocketReceive;
         socklen_t l_iReceiveLength;
         int l_iRetValue;
         CCHProgression * l_pReceive = new CCHProgression;
         l_iSocketReceive = socket( PF_INET, SOCK_DGRAM, IPPROTO_UDP );
    
         l_aReceive.sin_family = AF_INET;
         l_aReceive.sin_addr.s_addr = htonl( INADDR_ANY );
         l_aReceive.sin_port = htons( CLIENT_IN_PORT );
    
         l_iRetValue = bind( l_iSocketReceive, (sockaddr *)&l_aReceive, sizeof( l_aReceive ) );
         if( l_iRetValue < 0 )
              return;
    
         ...
    
         l_iRetValue = recvfrom( l_iSocketReceive, l_pReceive, sizeof( CCHProgression ), 0, (sockaddr *)&l_aReceive, &l_iReceiveLength );
         if( l_iRetValue < 0 )
         {
              perror( "recvfrom" );
         }
    
         ...
    }

    Anyway, if the CCHChordSearchTree class contains a member of type CCHProgression, it works. If the CCHChordSearchTree class contains a member array of CCHProgressions, it does not.

    Oh, another interesting thing: Although recvfrom is reporting an error, if I check the l_pReceive pointer, it does contain the information that the server sent to it. I could just start ignoring the return value of recvfrom, but that sounds a bit dangerous.

    By the way, I am running Debian, with kernel version 2.6.8.

    Does anyone have the slightest idea why this might occur?
    Thanks,
    Chad Hogg

    Mod Edit: Added code tags

  2. #2
    Just Joined!
    Join Date
    Nov 2004
    Posts
    6
    For anyone who may be interested, I found that I could make this bug go away by changing the order of the member variables in my class. Specifically, this order fails:
    Code:
    list<CCHProgression *> m_lQueue;
    CCHProgression m_nBest[ NUM_BEST ];
    If I reverse the order of those two members, it works perfectly. I have filed a GCC bug report about this, but do not know whether it will get fixed.


    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18457

  3. #3
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    I wouldn't be so quick to bet that it's a GCC bug. It might just be that you've been doing some core fandango somewhere on the far side of the program, which then comes back to haunt you at the recvfrom call, and switching the order of the variables just happens to make that core fandango behave differently.

    If I were you, I'd put on the hazmat suit and go in there with gdb and instruction-step through at least that part of the program.

  4. #4
    Just Joined!
    Join Date
    Nov 2004
    Posts
    6
    Thank you for your response. Could you clarify what you mean by "core fandango"? I have stepped through the section of code in the client up to the recvfrom call with gdb, and nothing looks in any way out of the ordinary. I would like to be able to step into the recvfrom call to see whether it receives the same arguments I passed it, but I haven't figured out how to do that yet.

  5. #5
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    Quote, originally posted by The Jargon File, v4.3.0:
    fandango on core n. Unix/C hackers, from the Iberian dance
    In C, a wild pointer that runs out of bounds, causing a core dump, or corrupts
    the `malloc(3)' arena in such a way as to cause mysterious failures
    later on, is sometimes said to have `done a fandango on core'. On
    low-end personal machines without an MMU (or Windows boxes, which have
    an MMU but use it incompetently), this can corrupt the OS itself,
    causing massive lossage. Other frenetic dances such as the cha-cha or
    the watusi, may be substituted. See aliasing bug, precedence
    lossage, smash the stack, memory leak, memory smash, overrun
    screw, core.
    In short: accidentally smashing some malloc fence or similarly primitive structure somewhere, which then causes hell to freeze over 50 timeslices later. Also known as the most annoying kind of bug ever.
    Valgrind is a great program for finding those wild pointers that dance out of control. It's an x86 emulator that emits warnings whenever the program does something seemingly fishy, like reading from memory that hasn't yet been written to, or writing to memory outside a malloc arena, .data segment or .bss segment.

    The problem with syscall debugging is that lately glibc makes every syscall through some stupid stub hidden in the last 64k bytes of the address space (especially on builds that use the sysenter instruction for making syscalls), which makes it very hard to isolate one particular syscall. However, you can look up the syscall number in arch/i386/kernel/entry.S in the kernel source and set up a conditional breakpoint in GDB that only breaks on the sysenter instruction when EAX contains that number. Note that on i386, recvfrom has no syscall of its own: it goes through the multiplexed socketcall syscall, so the condition would be EAX == __NR_socketcall and EBX == SYS_RECVFROM. You can also use strace to confirm that the syscall really does return that error.

    It's hard to truly validate the arguments outside of GDB, though. Maybe the best way is to add some printk's to the kernel source wherever recvfrom can return EINVAL, to see where it really goes wrong. The annoying part, of course, is that recvfrom is in the static part of the kernel, so you'd need to reboot for that kind of debugging (as opposed to just reloading a module), which is enough to make a sane man go nuts.

    Otherwise, are you sure that you're initializing l_iReceiveLength and similar variables correctly? You didn't include that code in your previous listing.

    Also, this looks strange to me:
    Code:
    sockaddr_in l_aReceive;
    
    ...
    
    ..., (sockaddr *)&l_aReceive, ...
    Have you typedef'd struct sockaddr{,_in} to primitive tokens?

  6. #6
    Just Joined!
    Join Date
    Nov 2004
    Posts
    6
    Er, yes, thank you. I was not initializing l_iReceiveLength to any value, because I had read the documentation as saying that the final argument to recvfrom is purely a place for the call to store a value (an out parameter). I was fairly sure I had checked this in the past, but apparently not.
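The fix comes down to treating the last argument of recvfrom() as a value-result parameter: before the call it must hold the size of the address buffer, and on return the kernel stores the real length of the sender's address there. A minimal sketch of the corrected pattern follows, assuming a Linux/POSIX system; the helper name LoopbackRoundTrip, the "chord" message and the trick of having one socket send a datagram to itself over loopback are illustrative assumptions, not part of the original program.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Hypothetical helper: sends one datagram to itself over the loopback
// interface, then reads it back with recvfrom(). Returns recvfrom()'s result.
ssize_t LoopbackRoundTrip(char *buf, size_t bufLen,
                          sockaddr_in *from, socklen_t *fromLen)
{
    int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (sock < 0) { perror("socket"); return -1; }

    sockaddr_in local;
    std::memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    local.sin_port = htons(0);                  // let the kernel choose a free port

    socklen_t localLen = sizeof(local);         // value-result: buffer size on input
    if (bind(sock, (sockaddr *)&local, sizeof(local)) < 0 ||
        getsockname(sock, (sockaddr *)&local, &localLen) < 0) {
        perror("bind/getsockname");
        close(sock);
        return -1;
    }

    const char msg[] = "chord";                 // 6 bytes including the trailing '\0'
    if (sendto(sock, msg, sizeof(msg), 0, (sockaddr *)&local, sizeof(local)) < 0) {
        perror("sendto");
        close(sock);
        return -1;
    }

    // The fix: tell recvfrom() how big the address buffer is *before* the call.
    // On return it holds the actual length of the sender's address.
    *fromLen = sizeof(*from);
    ssize_t n = recvfrom(sock, buf, bufLen, 0, (sockaddr *)from, fromLen);
    if (n < 0)
        perror("recvfrom");

    close(sock);
    return n;
}
```

With a 64-byte buffer, a call such as LoopbackRoundTrip(buf, sizeof(buf), &from, &fromLen) should return 6 and leave fromLen equal to sizeof(sockaddr_in). In the original ClientLoop, the equivalent fix is the single line l_iReceiveLength = sizeof( l_aReceive ); placed before each recvfrom() call.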

    Now it is just a bit frightening that it worked so well in the past.

    Anyway, thank you for your help.
    Chad Hogg

  7. #7
    Just Joined!
    Join Date
    Nov 2004
    Posts
    6
    I haven't done anything unusual with sockaddr and sockaddr_in. I can't currently find the resource I was using when writing this originally, but my understanding was that the sockaddr structure holds address information about any kind of address, while the sockaddr_in structure is the same size but splits the address into IP address and port number members. The man page for "ip" sort of describes it.
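The size relationship described here can be checked at compile time. A small sketch, assuming Linux/glibc headers (POSIX itself does not promise that the two structures are the same size): sockaddr_in is padded with sin_zero so that it matches the generic sockaddr, and both begin with the address family field, which is why a sockaddr_in can be handed to bind() or recvfrom() through a (sockaddr *) cast. The helper MakeAnyAddress is hypothetical; it just mirrors how ClientLoop fills in l_aReceive.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstddef>
#include <cstring>

// Assumption (true on Linux/glibc): both structures are 16 bytes,
// with sockaddr_in padded out by its sin_zero member.
static_assert(sizeof(sockaddr_in) == sizeof(sockaddr),
              "sockaddr_in is expected to match sockaddr in size");

// Both structures start with the address family, so the kernel can tell
// which concrete type it was handed through a generic sockaddr pointer.
static_assert(offsetof(sockaddr_in, sin_family) == offsetof(sockaddr, sa_family),
              "family fields are expected to line up");

// Hypothetical helper: fills in an IPv4 "any address" with the given port,
// the same way ClientLoop prepares l_aReceive before bind().
sockaddr_in MakeAnyAddress(unsigned short port)
{
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));       // also clears the sin_zero padding
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // IP address part
    addr.sin_port = htons(port);               // port part, in network byte order
    return addr;
}
```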

  8. #8
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    Yeah, but since they are structures, you have to write "struct sockaddr" and "struct sockaddr_in", unless you typedef them.

    On second thought, it might be a C++ "extension" (in contrast to C) to automatically typedef all structs. I haven't used C++ in a long while (doing my best to stay away from it), but I think I've heard something about that.

  9. #9
    Just Joined!
    Join Date
    Nov 2004
    Posts
    6
    I believe your second thought is correct. I am a C++ programmer who has only recently started writing some straight C code, and I found having to use the struct keyword everywhere ludicrous.

    I believe the same thing occurs with enums between C and C++.
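To illustrate the difference: in C++ the name of a struct or enum can be used on its own as a type name, while C requires the struct or enum keyword (or a typedef). The sketch below compiles as C++; the types Note and Quality are hypothetical examples, not taken from the original program, and the *Elaborated variants spell the parameter types the way C would require.

```cpp
// Hypothetical types, used only to show the two spellings.
struct Note {
    int pitch;  // e.g. a MIDI note number
};

enum Quality { Major, Minor };

// C++ lets the tag name stand alone as a type name...
int PitchOf(Note n) { return n.pitch; }
bool IsMinor(Quality q) { return q == Minor; }

// ...and still accepts the elaborated form that C requires.
int PitchOfElaborated(struct Note n) { return n.pitch; }
bool IsMinorElaborated(enum Quality q) { return q == Minor; }
```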

  10. #10
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    I think having to use the `struct' and `enum' keywords explicitly is important, since it cleans up the namespaces.
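A familiar case where the separate tag namespace still matters in C++ is POSIX stat: <sys/stat.h> declares both a type, struct stat, and a function, stat(). When a function and a struct share a name in the same scope, the function hides the type name in C++ as well, so the struct keyword is still needed to name the type. A small sketch; the helper IsDirectory is hypothetical:

```cpp
#include <sys/stat.h>

// Hypothetical helper: returns true if 'path' exists and is a directory.
bool IsDirectory(const char *path)
{
    // The function stat() hides the type name, so writing "stat info;" would
    // not compile; the elaborated specifier "struct stat" names the type.
    struct stat info;
    if (stat(path, &info) != 0)   // here "stat" refers to the function
        return false;
    return S_ISDIR(info.st_mode);
}
```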
