  1. #1

    Filtering out lines based on substrings

    Hello all,

    Here's what I'm trying to do:

    1. Gather several 'hosts' files.
    2. Merge them.
    3. Find and delete duplicates.
    4. Optimize the result.

    I found out how to do steps 1-3 with 'sort'. Step 4 is what I'm trying to figure out next.
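For reference, steps 1-3 can also be sketched in Python rather than with 'sort'. This is only a minimal sketch under assumptions not stated in the post: `merge_hosts` is a hypothetical helper name, each input is a plain-text hosts file, everything after a `#` is treated as a comment, and two lines count as duplicates once their whitespace is normalized.

```python
def merge_hosts(paths):
    """Merge several hosts files into one sorted list of unique lines."""
    unique = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for raw in handle:
                # Drop any trailing comment, then collapse runs of whitespace
                # so "addr   host" and "addr host" compare as equal.
                line = " ".join(raw.split("#", 1)[0].split())
                if line:
                    unique.add(line)
    return sorted(unique)
```

Calling `merge_hosts(["hosts_a.txt", "hosts_b.txt"])` (hypothetical file names) returns the merged, de-duplicated lines, ready to be written back out one per line.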

    Let's say I have this file: localhost

    I would like to remove the as it is made redundant by and because of

    I think it can be done by:

    For every line CUR_LINE in file
    - Take part of CUR_LINE that comes after " " -> CUR_PART //after space too!
    - For every line COMP_LINE in same file
    -- Take part of COMP_LINE that comes after " " -> COMP_PART
    -- If COMP_PART is a substring of CUR_PART that comes at the end of CUR_PART and Length(COMP_PART) is smaller than Length(CUR_PART)
    --- Delete (CUR_LINE)

    Do excuse my mangled up pseudocode.
    What I'm trying to show here is that the actual domain part (without the preceding " ") of every line is compared to the actual domain part of every other line. If one is a substring of another, but only at the end and of smaller length, the longer one is deleted.
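The comparison described above could be sketched in Python roughly as follows. This is a hedged illustration, not a definitive implementation: `prune_subdomains` is a hypothetical name, each line is assumed to have the form "address hostname", and the sample address 127.0.0.1 is an assumption. It also applies a stricter rule than a plain end-of-string test: a shorter entry only makes a longer one redundant at a dot boundary, so `example.com` removes `ads.example.com` but not `badexample.com`. Instead of comparing every line with every other line, it looks up each candidate parent domain in a set.

```python
def prune_subdomains(lines):
    """Drop entries whose hostname is a subdomain of another listed hostname."""
    entries = []
    for line in lines:
        parts = line.split()
        # Keep only "address hostname" lines; skip blanks and comments.
        if len(parts) >= 2 and not parts[0].startswith("#"):
            entries.append((parts[0], parts[1].lower()))

    hosts = {host for _, host in entries}
    kept = []
    for addr, host in entries:
        labels = host.split(".")
        # Candidate parents are the proper suffixes made of whole labels,
        # e.g. "a.b.example.com" -> "b.example.com", "example.com", "com".
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(parent in hosts for parent in parents):
            kept.append(f"{addr} {host}")
    return kept


sample = [
    "127.0.0.1 example.com",
    "127.0.0.1 ads.example.com",
    "127.0.0.1 badexample.com",
    "127.0.0.1 a.b.example.org",
]
print("\n".join(prune_subdomains(sample)))
```

With this sample, only `ads.example.com` is dropped, because `example.com` is also listed. The set lookup keeps the work proportional to the number of entries times the number of labels per hostname, which matters for block lists with many thousands of lines.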

    The problem is that being the newbie that I am, I have no idea how to achieve this, and frankly, I'm not even sure which search keywords to use.

    What commands would be useful here? How do I do this?


  2. #2
    Irithori
    My first impulse was to (again) suggest a DNS server, which might still be valid, as it offers a central point of configuration.

    But what is the deal with having all those hosts point to localhost?

  3. #3
    The hosts file pointing to or is an anti ad/tracking/malware/etc measure. There are several resources with such files ready to go. I wanted to include some links, but the forum wouldn't let me due to my low post count. A search for the keywords "hosts block ads" brings up relevant results, for those who may be interested.

    I want to create such a file to be used on Android phones. I'm looking to combine some selected files from the sites available with one particular file that includes common domains for mobile tracking specifically. I would also like to optimize it in the way I described above, just in case it makes a difference for the weaker mobile CPUs, and because it just makes sense (that is, I like things like that to be neat and tidy).

    This also means that using a DNS server is not an option.

    Based on this thread (note the missing dots) www linuxforums org/forum/networking/182629-does-dns-resolving-subdomains-go-through-main-domain.html this whole optimization idea may not work at all, but if someone wanted to give me some ideas about how this could be implemented, I'm still interested, just for the learning.
    Last edited by textscript; 09-15-2011 at 01:38 AM.
