  1. #1
    Just Joined!
    Join Date: Oct 2008
    Posts: 10

    Question Extract pattern from text line

    The text line has the following formats:

    what.ever.bla.bla.C01G06.BLA.BLA2
    what.ever.bla.bla.C11G33.BLA.BLA2
    what.ever.bla.bla.01x03.BLA.BLA2
    what.ever.bla.bla.03x05.BLA.BLA2
    what.ever.bla.bla.Part01.BLA.BLA2

    and other similar ones. I need a way to extract the "what.ever.bla.bla" part from each line.

    So basically it has to be based on a regex like this:

    Code:
    (.*?)(C[0-9]+G[0-9]+|[0-9]+x[0-9]+|Part[0-9]+)
    where (.*?) is the part I want to extract. Any ideas?

  2. #2
    Cabhan (Trusted Penguin)
    Join Date: Jan 2005
    Location: Seattle, WA, USA
    Posts: 3,230

    So it depends a great deal on what the format actually is.

    If the format of each line is 7 columns separated by dots and you always want the 5th, then all you need is:
    Code:
    .*\..*\..*\..*\.(.*)\..*\..*
    Alternatively, if it is simply some constant string, followed by the desired part, followed by a constant string, you could do:
    Code:
    CONST_STRING_1(part_you_want)CONST_STRING_2
    So basically, more information on the format would be nice.
    DISTRO=Arch
    Registered Linux User #388732
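
The seven-column idea from the post above can be tried out with sed. This is only a sketch under that post's assumption of exactly 7 dot-separated fields; the function name fifth_field is hypothetical, and [^.]* is used in place of .* so that no group can run across a dot.

```shell
# Hypothetical helper: print the 5th of exactly 7 dot-separated fields.
fifth_field() {
    # [^.]* matches a single field, so each group stays inside its own column.
    # -n plus the p flag prints only lines that actually match the 7-field shape.
    printf '%s\n' "$1" |
        sed -n 's/^[^.]*\.[^.]*\.[^.]*\.[^.]*\.\([^.]*\)\.[^.]*\.[^.]*$/\1/p'
}

fifth_field 'what.ever.bla.bla.C01G06.BLA.BLA2'   # prints C01G06
```

Note that this selects the C01G06 field itself, not the name in front of it, and prints nothing for lines with a different number of fields.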

  3. #3
    Linux User
    Join Date: Aug 2006
    Posts: 458

    Quote: Originally posted by TehOne
    The text line has the following formats:

    what.ever.bla.bla.C01G06.BLA.BLA2
    what.ever.bla.bla.C11G33.BLA.BLA2
    what.ever.bla.bla.01x03.BLA.BLA2
    what.ever.bla.bla.03x05.BLA.BLA2
    what.ever.bla.bla.Part01.BLA.BLA2

    and other similar ones. I need a way to extract the "what.ever.bla.bla" part from each line.

    So basically it has to be based on a regex like this:

    Code:
    (.*?)(C[0-9]+G[0-9]+|[0-9]+x[0-9]+|Part[0-9]+)
    where (.*?) is the part I want to extract. Any ideas?

    It is easier to split the lines on the field delimiter than to use regular expressions:
    Code:
    cut -f1-4 -d"." file
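
A quick check of the cut approach, assuming the name always spans exactly four dot-separated fields. The helper name first_four is hypothetical. When the name is shorter, as in the second call, cut keeps part of the suffix as well.

```shell
# Hypothetical helper: keep the first four dot-separated fields of a line.
first_four() {
    printf '%s\n' "$1" | cut -d. -f1-4
}

first_four 'what.ever.bla.bla.C01G06.BLA.BLA2'   # prints what.ever.bla.bla
first_four 'Example1.C11G33.BLA.BLA2'            # prints Example1.C11G33.BLA.BLA2
```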

  4. #4
    Just Joined!
    Join Date: Oct 2008
    Posts: 10

    Quote: Originally posted by Cabhan
    So it depends a great deal on what the format actually is.

    If the format of each line is 7 columns separated by dots and you always want the 5th, then all you need is:
    Code:
    .*\..*\..*\..*\.(.*)\..*\..*
    Alternatively, if it is simply some constant string, followed by the desired part, followed by a constant string, you could do:
    Code:
    CONST_STRING_1(part_you_want)CONST_STRING_2
    So basically, more information on the format would be nice.

    The what.ever.bla.bla was just an example; the name varies.
    It can be Example1.C11G33 or Bla123.Bla123.C11G33 and so on.
    I always need to extract the name that comes before the "C11G33", whatever it is (the .*? part).

    The only thing that stays constant is the C11G33, 01x03 or Part01 marker, just with different numbers. Take another look at my regex example.

    EDIT: Solved.

    Code:
    sed 's/\(.*\)\(C[0-9][0-9]*G[0-9][0-9]*\)\(.*\)/\1/'
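
As written, the command above handles only the C..G.. form, and \1 keeps the dot in front of the marker. A possible extension, offered as a sketch and not as the poster's own solution, uses sed -E (extended regular expressions, available in GNU and BSD sed) to cover all three markers from the original regex and drop that dot. The function name extract_name is hypothetical.

```shell
# Hypothetical helper: strip the marker field and everything after it.
extract_name() {
    # Delete from the dot before the first marker (CnnGnn, nnxnn or Partnn)
    # to the end of the line. The marker must be a whole dot-delimited field:
    # it is followed either by another dot or by the end of the line.
    printf '%s\n' "$1" |
        sed -E 's/\.(C[0-9]+G[0-9]+|[0-9]+x[0-9]+|Part[0-9]+)(\..*)?$//'
}

extract_name 'what.ever.bla.bla.C01G06.BLA.BLA2'   # prints what.ever.bla.bla
extract_name 'what.ever.bla.bla.01x03.BLA.BLA2'    # prints what.ever.bla.bla
extract_name 'Bla123.Bla123.C11G33'                # prints Bla123.Bla123
```

Lines without a marker field pass through unchanged, so it may be worth filtering them first if that matters.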
