- 12-17-2010 #1 Just Joined!
- Join Date
- Oct 2008
- Location
- St. Louis MO area
- Posts
- 2
trying to extract source email address - awk?
Hi,
I have a small bash/awk program that extracts the date, time, and size from thousands of email headers. I'm also trying to extract the last "Received: from" string from these headers, which will give me the sender's email server. Any suggestions on finding the last occurrence of this string and printing the information after it?
tia
Barry
- 12-20-2010 #2 Linux Newbie
- Join Date
- Apr 2007
- Posts
- 119
Do you have an example of the input?
- 12-21-2010 #3 Just Joined!
- Join Date
- Oct 2008
- Location
- St. Louis MO area
- Posts
- 2
extracting Received and Date from emails
Mark,
Below is an abbreviated message header. Patsie, from the programming forums, gave me the snippet below for the last occurrence. It works, but now I'm trying to find the first instance of "Date:". My little bash/awk program extracts this information and provides statistics about email usage.
thank you,
Barry
----------------code snippet from Patsie for 'last' occurrence------------------------
R=$(awk -F: '/^Received: from/ { sender = $2 } END { print sender }' "$b")
-------------abbreviated email header --------------
From: <MicrosoftExchange329e71ec88ae461536ab6ce41109e@my site.com>
To: <barry@mysite.com>
Date: Tue, 31 Mar 2009 10:29:48 -0500
----boundary-LibPST-iamunique-13546804_-_-
--alt---boundary-LibPST-iamunique-13566804_-
Delivery has failed to these recipients or distribution lists:
Received: from AAR-MV08-01.ffaa.aapps.com.com ([152.5.33.42]) by
52vejx-ht-002.ffaa.aapps.com.com ([152.5.32.28]) with mapi; Tue, 31 Mar
2009 10:29:48 -0500
Content-Type: application/ms-tnef; name="winmail.dat"
Date: Tue, 31 Mar 2009 10:29:45 -0500
--alt---boundary-LibPST-iamunique-135866804_-
Date: Tue, 31 Mar 2009 10:29:45 -0500
Subject: RE: NEGT ACTION ITEMS: 26 Mar fgr Transformation Stakeholders
Thread-Topic: NEGT ACTION ITEMS: 26 Mar fgr Transformation Stakeholders
Thread-Index: Acmts6ZYnnafgKN1SkmK5lOvXHt5aAAxIFDQP18AAAJB3YAACG IuA
----boundary-LibPST-iamunique-1354866804_-_-
Content-Type: message/rfc822
From "MAILER-DAEMON" Tue Mar 31 10:29:45 2009
Received: from AAR-MV08-01.ffaa.aapps.com.com ([152.5.33.42]) by
52vejx-ht-002.ffaa.aapps.com.com ([152.5.32.28]) with mapi; Tue, 31 Mar
2009 10:29:48 -0500
From: "vt, BARRY J US fGA fGA/EA"
<barry@mysite.com>
To: "Smit, Noland l Body" <Noland.Smit@mysite.com>, "Rogers,
Larry E Civ US fGA fGA/ECI"
CC: fGA/ECI NEGTOPS Intation Office <fga.eci@mysite.com>
Date: Tue, 31 Mar 2009 10:29:45 -0500
Subject: RE: NEGT ACTION ITEMS: 26 Mar fgr Transformation Stakeholders
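One possible refinement of the snippet above, offered as a sketch rather than a tested fix: with awk's default whitespace splitting, the host name is the third field of a "Received: from" line, so remembering the most recent match and printing it in END yields the last occurrence. The function name last_received_host is hypothetical, and the sketch assumes the host sits on the same physical line as "Received: from" (folded continuation lines are ignored).

```shell
#!/bin/sh
# Hypothetical helper: print the host named on the last
# "Received: from" line of the file given as $1.
last_received_host() {
    awk '
        # Each match overwrites the previous one, so the final value
        # belongs to the last matching line in the file.
        /^Received: from/ { host = $3 }
        # Print nothing at all if no Received line was seen.
        END { if (host != "") print host }
    ' "$1"
}
```

It could be called the same way as the original one-liner, for example R=$(last_received_host "$b").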
- 12-24-2010 #4 Just Joined!
- Join Date
- Jun 2010
- Posts
- 6
Have you tried grep -m 1 "^Date:" ?
From "man grep":
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
That should give you the first match of "Date:". Then you can pipe that output to something that strips the "Date:" label. Don't split on ":" the way the Received snippet does (-F: with $2), because the time itself contains colons and $2 would stop at the hour. I don't have a terminal to test with, so I haven't verified the syntax, but this should work:
$ grep -m 1 "^Date:" myfile.txt | sed 's/^Date: *//'
The awk code just prints the 2nd colon-separated field, which is fine for the Received line but would cut a date short, so I replaced it with a sed that removes the label.
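If you would rather stay in awk, the same idea can be sketched without grep: match the first "Date:" line, remove the label, print, and exit so later Date: lines are never read. The function name first_date is hypothetical, and this is a sketch under the assumption that the header line starts exactly with "Date:".

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "Date:" line
# of the file given as $1.
first_date() {
    awk '
        /^Date:/ {
            sub(/^Date:[ \t]*/, "")   # drop the label and following blanks
            print                     # the full date, colons included
            exit                      # stop at the first match
        }
    ' "$1"
}
```

Usage would look like D=$(first_date "$b"), mirroring how R is set from the Received snippet.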
Good luck!
-dufftime