blairm
March 4th, 2009, 12:45 AM
Hi,
Looking at writing some form of web scraper to monitor the latest headlines.
Almost there with one site (which has no robots.txt), but the other I'm interested in has a robots.txt file which indicates - quite fairly - that the webmaster frowns upon scraping.
That site does have an RSS feed - do the same restrictions on scraping generally cover RSS feeds as well?
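robots.txt rules are written per path, so whether a feed is covered depends on what the file actually lists. One way to check a specific feed URL is Python's standard `urllib.robotparser`. This is only a minimal sketch: the robots.txt content, the `example.com` URLs and the `headline-notifier` agent name are all made up for illustration and are not taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real site's rules will differ.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /news/
Allow: /rss/
"""


def allowed(robots_text, url, agent="headline-notifier"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/rss/headlines.xml",
                "https://example.com/news/today.html"):
        print(url, allowed(SAMPLE_ROBOTS, url))
```

Against a live site, `set_url()` followed by `read()` on the parser would load the real file instead of the sample text.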
Considering scraping the RSS page every 60 minutes while I'm at work and being notified of new headlines by email.
(background: need notifications at work; our IT department refuses to install a feed reader and seems to have blocked the one in IE. They are however okay with me hosting something on one of my computers and using email notifications of new headlines).
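The polling idea above could be sketched roughly as follows. This assumes the feed is plain RSS 2.0 with `<item>` elements carrying `<title>` and `<link>`; the function names and the `seen` set are hypothetical, and the fetching and emailing steps are left as comments because they depend on the feed URL and mail setup.

```python
import xml.etree.ElementTree as ET


def extract_headlines(rss_xml):
    """Return (link, title) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((link, title))
    return items


def new_headlines(rss_xml, seen):
    """Return titles not seen on earlier polls and record them in `seen`."""
    fresh = []
    for link, title in extract_headlines(rss_xml):
        key = link or title  # fall back to the title if an item has no link
        if key not in seen:
            seen.add(key)
            fresh.append(title)
    return fresh


# Outline of the hourly loop (not run here):
#   1. fetch the feed text, e.g. with urllib.request.urlopen(feed_url)
#   2. fresh = new_headlines(feed_text, seen)
#   3. if fresh is non-empty, email it, e.g. with smtplib
#   4. time.sleep(3600) and repeat
```

Keeping `seen` in memory means a restart re-sends everything currently in the feed; writing it to a small file between polls would avoid that.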
Blair