Plan of action for WP Web Scraper

Posted May 27, 2009 at 12:20 pm by Akshay

My latest WordPress plugin for web scraping, WP Web Scraper, had a great launch: it recorded more than 200 downloads in the first two days alone! Thanks for all the appreciation and comments.

This post lays out my plan to extend WP Web Scraper into a standard scraping framework. Apart from making it a flexible framework, I also plan to introduce some pre-built modules that make specific, highly requested scraping tasks easy. The first such module will be a stock market data grabber. It will extend the plugin to fetch stock market data from the websites of several major exchanges (I plan to start with NSE, BSE and NASDAQ). The data will be near real-time (delayed by 1 to 10 minutes) and will include Open, High, Low, Last Price, Previous Close, Change, Change Percentage and Volume for all active symbols on these exchanges.
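
Of the fields listed above, Change and Change Percentage are derived from Last Price and Previous Close. Below is a minimal Python sketch of what one quote record might look like, assuming the conventional definitions (Change = Last Price minus Previous Close; Change Percentage = Change divided by Previous Close, times 100). The class and field names are hypothetical and not part of the plugin, and the plugin itself would be written in PHP.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical record for one symbol's quote; field names are illustrative only."""
    symbol: str
    open_price: float
    high: float
    low: float
    last: float
    previous_close: float
    volume: int

    @property
    def change(self) -> float:
        # Absolute change relative to the previous session's close.
        return self.last - self.previous_close

    @property
    def change_percent(self) -> float:
        # Change expressed as a percentage of the previous close.
        if self.previous_close == 0:
            raise ValueError("previous_close must be non-zero")
        return self.change / self.previous_close * 100
```

With made-up numbers, `Quote("ACC", 800.0, 812.5, 795.0, 810.0, 800.0, 125000)` gives a change of 10.0 and a change percentage of about 1.25. Deriving these two values, rather than scraping them, keeps them consistent with the scraped prices.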

The plugin API will provide a shortcode along these lines: [wpws mod="nse" symbol="acc" datatype="last"] should output the latest price of ACC listed on the NSE. The aim is to make it an extensible module framework, so I am taking the time to code it well. Apart from these features, I also plan to improve the core scraper with a regex-powered cleanup function that removes unwanted text strings from the scraped content, and with a more flexible algorithm for querying HTML tables returned by the scrape.
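
The shortcode idea and the regex cleanup described above can be sketched as follows. This is a hypothetical Python illustration, not the plugin's code. A real WordPress plugin would register the shortcode in PHP through WordPress's `add_shortcode` API. Here, the `MODULES` registry, `register_module`, `render_shortcodes`, `cleanup` and the stub `nse` module with its fake price table are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, Iterable

# Matches a [wpws ...] shortcode and captures its attribute string.
SHORTCODE_RE = re.compile(r'\[wpws\s+([^\]]*)\]')
# Matches key="value" pairs inside the attribute string.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical module registry: module name -> handler(attrs) -> rendered text.
MODULES: Dict[str, Callable[[Dict[str, str]], str]] = {}


def register_module(name: str):
    """Decorator that adds a handler to the module registry under `name`."""
    def wrap(fn):
        MODULES[name] = fn
        return fn
    return wrap


def parse_attrs(attr_text: str) -> Dict[str, str]:
    """Turn 'mod="nse" symbol="acc"' into {'mod': 'nse', 'symbol': 'acc'}."""
    return dict(ATTR_RE.findall(attr_text))


def render_shortcodes(content: str) -> str:
    """Replace every [wpws ...] shortcode with the output of its module."""
    def replace(match: re.Match) -> str:
        attrs = parse_attrs(match.group(1))
        handler = MODULES.get(attrs.get("mod", ""))
        # Unknown or missing module: render nothing instead of failing.
        return handler(attrs) if handler else ""
    return SHORTCODE_RE.sub(replace, content)


def cleanup(text: str, patterns: Iterable[str]) -> str:
    """Regex-powered cleanup: delete every match of each unwanted pattern."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    return text.strip()


@register_module("nse")
def nse_module(attrs: Dict[str, str]) -> str:
    # Stub only: a real module would scrape the exchange; this uses fake data.
    fake_prices = {"acc": {"last": "810.00"}}
    symbol = attrs.get("symbol", "").lower()
    return fake_prices.get(symbol, {}).get(attrs.get("datatype", ""), "")
```

For example, `render_shortcodes('ACC last price: [wpws mod="nse" symbol="acc" datatype="last"]')` returns `'ACC last price: 810.00'`, and `cleanup('Price: 810.00 (delayed)', [r'\(delayed\)', r'^Price:\s*'])` returns `'810.00'`. Because of the registry, a new exchange module can be added without touching the shortcode parser, which is one way to get the extensibility described above.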

3 Comments

  1. Hi Akshay, do you have a standalone version of WP Web Scraper that one could put on any site?

  2. That's actually a good idea. I will deploy a standalone version soon too.

  3. Interesting point on web scrapers. I use Python for simple HTML web scrapers, but for larger projects I have used the extractingdata.com web scraper, which builds custom web scrapers and data extraction programs simply and quickly.
