- 11-06-2006 #1 Just Joined!
- Join Date
- Nov 2006
- Posts
- 3
Bash scripting - string searching
I am trying to extract text from a file, say the file looks like this:
</tr><tr><td>domain1.com</td><td>2006-11-06 14:43:37</td>
<td><a href="index.php?_host_id=47">1</a></td>
<td><a href="index.php?_host_id=47">2</a></td>
<td><a href="index.php?_host_id=47">3</a></td>
</tr><tr><td>domain2.com</td><td>2006-11-06 14:43:37</td>
<td><a href="index.php?_host_id=48">1</a></td>
<td><a href="index.php?_host_id=48">2</a></td>
<td><a href="index.php?_host_id=48">3</a></td>
and many other lines like this. What I need is a way to get each host_id value only once, one per line, like this:
47
48
I have found the domain string with this already:
awk '/domain/' file | cut -c 15-25
- 11-06-2006 #2
I would look into grep and sed.
Flies of a particular kind, i.e. time-flies, are fond of an arrow.
Registered Linux User #408794
- 11-06-2006 #3 Linux Newbie
- Join Date
- Aug 2006
- Posts
- 226
If all the lines follow that same format, then one possibility would be the following:
grep '_host_id=' file | cut -d '=' -f3 | sed 's/".*$//'
grep only returns the lines with the "_host_id" string
cut returns the text following the second = sign
sed strips everything from the " and beyond
You could probably get rid of cut and do the line processing with just sed, but I don't have the time to figure it out. I also haven't tested this so it may be slightly off.
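For what it's worth, here is a rough sketch of the sed-only idea mentioned above. It assumes the ids are purely numeric and are always followed by a closing double quote, as in the sample lines. The function name extract_host_ids and the inline sample data are made up for illustration, and the trailing sort -u is added only to collapse the repeats.

```shell
# Sample input in the same shape as the file from the question.
sample='</tr><tr><td>domain1.com</td><td>2006-11-06 14:43:37</td>
<td><a href="index.php?_host_id=47">1</a></td>
<td><a href="index.php?_host_id=47">2</a></td>
</tr><tr><td>domain2.com</td><td>2006-11-06 14:43:37</td>
<td><a href="index.php?_host_id=48">1</a></td>'

# extract_host_ids is a hypothetical helper name, not from the thread.
# sed -n suppresses the default output, and the trailing p prints only
# the lines where the substitution matched, so no separate grep is needed.
# Assumption: ids are digits only and end at the next double quote.
extract_host_ids() {
    sed -n 's/.*_host_id=\([0-9]*\)".*/\1/p' | sort -u
}

printf '%s\n' "$sample" | extract_host_ids
```

On a real file the same thing would be run as: extract_host_ids < file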
- 11-06-2006 #4Just Joined!
- Join Date
- Nov 2006
- Posts
- 3
Figured it out, thanks for the help.
Didn't think to just 'cut' it up.
Something useful too:
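'sort -um' treats the input as already sorted, so in practice it drops repeated lines that sit next to each other without reordering anything. That is enough here because identical host_ids appear on consecutive lines; duplicates that are further apart would not be removed.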
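If the duplicates are not next to each other, one common alternative (not something used in this thread) is an awk one-liner that remembers every line it has already printed. A minimal sketch, with dedupe_keep_order as a made-up helper name:

```shell
# dedupe_keep_order is a hypothetical helper name, not from the thread.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so
# !seen[$0]++ is true only for first occurrences and awk prints just those.
dedupe_keep_order() {
    awk '!seen[$0]++'
}

# Duplicates here are not adjacent; first occurrences keep their order.
printf '48\n47\n48\n47\n' | dedupe_keep_order
```

This keeps the first occurrence of each line in its original position, at the cost of holding every distinct line in memory.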