Can you suggest improvements to this process of maintaining a large hosts file?

When I find an objectionable site (advertisement, redirect, Flash content, etc.), I simply add the domain to my hosts file and never have to see that content again.

But every few months I find a few more unwanted parasites slipping through, so I pick up a new (huge) hosts file from, say:
Code:
http://winhelp2002.mvps.org/hosts.htm
What I do is save the hosts.txt file to /tmp/hosts.txt and then strip out all but the redirect entries (remove double spaces, remove comments, and remove the localhost line):
Code:
http://winhelp2002.mvps.org/hosts.txt
grep -v \# /tmp/hosts.txt | sed -e 's/  / /g' | grep -v '127.0.0.1 localhost$' | sort -u >> /etc/hosts

NOTE: The sed removes the redundant spaces so the new entries are consistent with my existing hosts file.
My first problem is getting the syntax right for removing the localhost line (in practice I have to delete that one line manually, because some of the valid lines I want to keep also have localhost in the domain name).

Then I try to cull out duplicates using:
Code:
sort -u /etc/hosts -o /etc/hosts
Unfortunately, sorting moves the localhost and other header lines down into the sorted list, and I have to move them back to the top manually to keep a semblance of the original hosts order. The top of my file looks like this:

Code:
127.0.0.1 machine localhost.localdomain localhost
127.0.1.1 machine
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# begin parasitic blocks
This process works, but it could use some improvement.

Any ideas?