  1. #1
    Just Joined!
    Join Date
    May 2010
    Posts
    2

    Using grep and regular expression to find class references in C++ file

    I'm trying to match all references to a class in a C++ file using grep with a regular expression. I want to find out whether a specific include is useless, so I have to know whether the class is referenced anywhere in the .cpp file.
    I wrote this RE to search for references to class ABCZ, but unfortunately it isn't working as I expected:

    grep -E '^[^(\/\*)(\/\/)].*[^a-zA-Z]ABCZ[]*[\*\(\<:;,{& ]'

    ^[^(\/\*)(\/\/)] do not match comments at the beginning of the line ( // or /* )
    .* followed by any characters
    [^a-zA-Z] do not accept a letter directly before the name I'm searching for (like defABCZ)
    []* any white space (I can have something like ABCZ var; )
    [\*\(\<:;,{& ] followed by ( * < : ; , & { (I must not match #include "ABCZ.h" or ABCZdef, for example)

    With this, I can match lines like these:

    class Test: public ABCZ{
    class Test: public ABCZ {
    class Test : public ABCZ<T>
    class NameSpace::Test : public ABCZ<T>
    ABCZ var = ABCZ();
    ABCZ* var = new ABCZ();
    ABCZ* var = new ABCZ<T>();
    teste = ABCZ(var);
    teste = ABCZ(var, otherVar);
    teste = ABCZ(var,otherVar);
    teste = ABCZ();
    funct(*ABCZ av);
    struct(ABCZ, BBBB, CCCC);
    const ABCZ& getNot_before(void) const;

    That's fine, they are all acceptable matches. But I'm missing these, which are all acceptable too:

    ABCZ teste;
    ABCZ teste; //comment
    ABCZ teste; /* comment*/
    ABCZ teste; /* comment*/
    ABCZ teste;
    ABCZ *teste;

    And none of these should match:
    #include "ABCZ.h"
    //ABCZ var = ABCZ();
    ABCZdef a = ABCZdef();
    DefABCZ a = DefABCZ();
    ABCZdef a = ABCZdef();
    /*ABCZ var = ABCZ();

    I don't know if I'm missing any other patterns; if I am, please tell me =p
    What am I doing wrong?
    I'd be very grateful if someone could answer.
    Thanks in advance

  2. #2
    Just Joined!
    Join Date
    May 2010
    Posts
    7
    Sorry if this isn't helpful, but it looks like you know a bit about regular expressions. Is the problem that you're trying to do everything with one pattern?

    Might be easier to write a series of patterns:

    Code:
    grep -E 'foo'
    grep -E 'bar'
    Instead of doing it all in one go. It's certainly possible to combine them, but breaking up the problem will make it easier to maintain and debug.
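
    As a rough sketch of the series-of-patterns idea, assuming a POSIX-style grep with -E: one grep throws away lines that start with a preprocessor directive or a comment, and a second one keeps lines where ABCZ stands alone as an identifier. The sample file is hypothetical (its lines are copied from post #1), and the single pattern from post #1 is run unchanged for comparison.

```shell
# Hypothetical sample file built from lines quoted in post #1.
sample=$(mktemp)
cat > "$sample" <<'EOF'
class Test: public ABCZ{
ABCZ teste;
ABCZ *teste;
teste = ABCZ(var);
#include "ABCZ.h"
//ABCZ var = ABCZ();
ABCZdef a = ABCZdef();
DefABCZ a = DefABCZ();
EOF

# The single pattern from post #1, unchanged, for comparison.
original='^[^(\/\*)(\/\/)].*[^a-zA-Z]ABCZ[]*[\*\(\<:;,{& ]'
original_count=$(grep -Ec "$original" "$sample")

# Step 1: drop lines that start with '#', '//' or '/*' (after optional blanks).
# Step 2: count lines where ABCZ is not glued to other identifier characters.
split_count=$(grep -Ev '^[[:space:]]*(#|//|/\*)' "$sample" |
              grep -Ec '(^|[^A-Za-z0-9_])ABCZ([^A-Za-z0-9_]|$)')

echo "original: $original_count, split: $split_count"
rm -f "$sample"
```

    On this sample the split version also counts "ABCZ teste;" and "ABCZ *teste;". The original misses them because [^(\/\*)(\/\/)] is a bracket expression: it matches exactly one character that is not ( ) / * or \, it does not mean "not a comment". Together with [^a-zA-Z] it forces at least two characters in front of ABCZ, so a line that begins with ABCZ can only match through a later occurrence.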

  3. #3
    Linux Enthusiast Kloschüssel's Avatar
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    717
    +1 to zanerock. List all the kinds of reference you want to find and write a regex for each one. In the end you can combine them into one big regex if you want to.

    For example, these are different kinds of reference:

    class Test : public ABC { # definition
    ABC* abcVar = new ABC(); # instantiation
    ABC abcVar; # variable definition
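
    One pattern per kind of reference could be sketched like this. The three patterns and the sample file are only illustrative assumptions and far from exhaustive; the point is that each pattern can be checked on its own before the patterns are combined with repeated -e options.

```shell
# Hypothetical sample lines, one per kind of reference plus one unrelated line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
class Test : public ABC {
ABC* abcVar = new ABC();
ABC abcVar;
int unrelated = 0;
EOF

# One illustrative pattern per kind of reference.
inherit=':[[:space:]]*(public|protected|private)[[:space:]]+ABC([^A-Za-z0-9_]|$)'
create='new[[:space:]]+ABC[[:space:]]*[(<]'
declare_var='(^|[^A-Za-z0-9_])ABC[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*;'

# Check each pattern on its own first ...
inherit_count=$(grep -Ec "$inherit" "$sample")
create_count=$(grep -Ec "$create" "$sample")
declare_count=$(grep -Ec "$declare_var" "$sample")

# ... then combine them: a line counts if any one of the patterns matches.
combined_count=$(grep -Ec -e "$inherit" -e "$create" -e "$declare_var" "$sample")

echo "$inherit_count $create_count $declare_count $combined_count"
rm -f "$sample"
```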

    Much harder are things like struct(*ABC var, ..), because you don't know what ABC is. Is it a class, a struct, a union, a primitive? Only the compiler knows that, once it has parsed the complete source code (syntax check) and starts analysing it semantically.

    Unfortunately this problem is not solvable in general with a finite-state machine (which is what a regex is), because it needs some memory (somewhat like a Turing machine). You learn such things around the 4th or 5th semester of computer science at university. If you manage to do it anyway, collect your reward and become famous.
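
    A small illustration of the context problem (an example, not a proof): a line-by-line filter cannot tell that it is inside a multi-line comment or a string literal. The sample file and both patterns below are assumptions made up for this sketch.

```shell
# Hypothetical sample: ABCZ appears only inside a block comment and a string.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/* Old notes:
   ABCZ used to be created here;
*/
const char* msg = "ABCZ is gone";
EOF

# Drop lines that start with '#', '//' or '/*', then count standalone ABCZ.
false_hits=$(grep -Ev '^[[:space:]]*(#|//|/\*)' "$sample" |
             grep -Ec '(^|[^A-Za-z0-9_])ABCZ([^A-Za-z0-9_]|$)')

echo "lines reported although the class is never used: $false_hits"
rm -f "$sample"
```

    Both remaining ABCZ lines are reported, even though neither is a real reference. Catching them needs state carried across lines and tokens, which is what a lexer or parser provides.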

  4. #4
    Just Joined!
    Join Date
    May 2010
    Posts
    2
    Thank you, zanerock and Kloschüssel!
    You are both right. This is the first time I've used regular expressions, and I'm still in my 4th semester =p
    I will try to write a series of patterns.

    Thank you very much!

  5. #5
    Linux Enthusiast Kloschüssel's Avatar
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    717
    You're welcome.

    Anyway, you'll need more than regex can give you. If your professors haven't covered it so far, I recommend reading up on how parsers and compilers work. That will bring you to the point where you have to understand how lexers (scanners) tokenize code and how context-free and context-sensitive grammars are recognized. In the end you'll see that this is a problem that is proven to be unsolvable with state machines that lack memory.

    Have fun!

    PS: if this thread is resolved, please mark it as such with the thread tools
