- 08-31-2011 #1 Just Joined!
- Join Date: Aug 2011
- Posts: 1
Change parsing
I have a question about changing how parsing occurs currently for us:
Code:
input FILE123
TAGA01: 01
TAG02: daadsf
TAG03: adfasdf
TAGBBB04: 35
TAG05: asdfa
TAG07: adfd
TAG07: adfa3
TAG07: 234234
TAGCC08: 3525df
TAG09: adsfa
TAG10: 245
TAG11: nnnn
EOR:
TAGA01: 02
TAG02: abas
TAG03: asdfasd
TAGBBB04: E
TAG05: asdfasd
TAG07: acvasc
TAG07: czcvc
TAG07: 22
TAGCC08: adsfasd
TAG09: Y
TAG11: yyyy
EOR:
.
.
.
Note that some tags may not be in a record, and some tags may repeat in the same record.
I need to convert this to the following inline format and trim it so the tag doesn't appear. The delimiter doesn't matter; I can change it should the data in other files include the delimiter:
Code:
Format: TAGA01 TAGCC08 TAGBBB04 TAG09 TAG11
output.file:
01 3525df 35 adsfa nnnn
02 adsfasd E Y yyyy
.
.
.
Here is what is used currently (from memory, so the syntax isn't correct, but the idea is there):
Code:
cat FILE123 | egrep "^TAGA01 ^TAGBBB04 ^TAGCC08 ^TAG09 ^TAG11" | awk -F. -f awkfile.awk > output.file
where awkfile.awk contains if statements and a printf output statement (again, the syntax and the substring numbers are not correct, but the idea is there):
Code:
if ($1==TAGA01) {pTAGA01=substr($1,3)}
.
.
.
if ($1==TAG11) {
    pTAG11=substr($1,4)
    printf pTAGA01 ... pTAG11
}
I wanted to see different ideas for two reasons: first, to see if this could be more efficient, since every tag goes through multiple ifs every time, and second, just to learn something new.
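For comparison, here is one possible direction, offered as a rough sketch and not a tested replacement for the real job. It assumes every data line has the form `TAG: value`, that each record ends with an `EOR:` line, and that keeping the last value of a repeated tag is acceptable. The variable names `order`, `sep`, `want`, and `val_of` are made up for this illustration, and the sample FILE123 is a shortened copy of the data above.

```shell
# Shortened sample input in the layout shown in the post (two records).
cat > FILE123 <<'EOF'
TAGA01: 01
TAG02: daadsf
TAGBBB04: 35
TAG07: adfd
TAG07: adfa3
TAGCC08: 3525df
TAG09: adsfa
TAG11: nnnn
EOR:
TAGA01: 02
TAGBBB04: E
TAG07: 22
TAGCC08: adsfasd
TAG09: Y
TAG11: yyyy
EOR:
EOF

# Single awk pass: remember the wanted tags, print one row at each EOR line.
awk -v order="TAGA01 TAGCC08 TAGBBB04 TAG09 TAG11" -v sep=" " '
BEGIN {
    n = split(order, cols, " ")              # output column order
    for (i = 1; i <= n; i++) want[cols[i]] = 1
}
{
    p = index($0, ":")                       # first colon ends the tag name
    if (p == 0) next                         # skip lines that carry no tag
    tag = substr($0, 1, p - 1)
    val = substr($0, p + 1)
    sub(/^[ \t]+/, "", val)                  # trim leading blanks from the value
}
tag == "EOR" {
    line = ""
    for (i = 1; i <= n; i++)
        line = line (i > 1 ? sep : "") val_of[cols[i]]
    print line
    split("", val_of)                        # clear values before the next record
    next
}
(tag in want) { val_of[tag] = val }          # a repeated tag keeps its last value
' FILE123 > output.file

cat output.file
```

On this sample, output.file ends up with the two rows shown in the desired format above. The array test `(tag in want)` stands in for the chain of ifs, and awk reads the file directly, so the cat and egrep stages are not needed. A wanted tag that is missing from a record comes out as an empty field, so a visible delimiter such as `-v sep="|"` may be safer than a space in that case.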
Thanks for your time!

