HTML Table Queries through WP Web Scraper

Posted August 11, 2009 at 3:07 pm by Akshay

Scraping HTML tables is easy, but parsing them has always been tricky. The next release of WP Web Scraper will let you do exactly that. It will add methods to query HTML tables within your scrape. For instance, the scraper will let you filter rows by the value of a specific table column and restrict the number of rows using a ‘from’ and ‘to’ index key.

Further, it will let you delete a certain column from the output and apply specific CSS classes to even and odd rows. This feature is designed for users who want to scrape and then filter or parse data extracted from HTML tables. It will be implemented as a module within WP Web Scraper.
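
To make the idea concrete, here is a minimal sketch of such table queries. It is written in Python rather than the plugin's PHP, and it is not the plugin's actual code: the function names, the parameter names, the zero-based ‘from’/‘to’ window and the choice to filter before slicing are all assumptions made for illustration. The table is assumed to be already extracted into a list of rows.

```python
from html import escape


def query_table(rows, filter_col=None, filter_value=None,
                row_from=0, row_to=None, drop_col=None):
    """Filter, slice, and trim a table given as a list of row lists.

    All names here are hypothetical; they only mirror the features
    described in the post.
    """
    # Keep only rows whose cell in filter_col equals filter_value.
    if filter_col is not None:
        rows = [r for r in rows if r[filter_col] == filter_value]
    # Restrict to the 'from'..'to' window (zero-based, 'to' exclusive).
    rows = rows[row_from:row_to]
    # Drop one column from the output.
    if drop_col is not None:
        rows = [r[:drop_col] + r[drop_col + 1:] for r in rows]
    return rows


def render_rows(rows, even_class="even", odd_class="odd"):
    """Render rows as <tr> elements with alternating CSS classes.

    Row index 0 is treated as "even" in this sketch.
    """
    lines = []
    for i, row in enumerate(rows):
        css = even_class if i % 2 == 0 else odd_class
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f'<tr class="{css}">{cells}</tr>')
    return "\n".join(lines)


# Made-up sample rows, purely for illustration.
table = [
    ["apple", "fruit", "3"],
    ["carrot", "vegetable", "5"],
    ["banana", "fruit", "7"],
]
fruit = query_table(table, filter_col=1, filter_value="fruit", drop_col=2)
print(render_rows(fruit))
```

Filtering first and slicing second means the ‘from’ and ‘to’ keys count matching rows, not rows of the original table. The opposite order would be just as reasonable.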

Commitment to Open Source

Posted July 24, 2009 at 1:23 am by Akshay

Ever since I started writing code in bits and pieces, I have dreamed of creating at least one major open source project. Although Open Source conceptually still revolves more around GNU/Linux and similar operating systems, I personally feel that any piece of code released with its complete source code for general public use can be broadly categorized as Open Source. For me, it simply brings the pride of being useful to someone I don’t even know. It’s a blissful feeling to check my mail after a day’s work and find appreciation notes, comments and suggestions on some plugin or widget I have developed.

This post is a small thank-you note to all those who downloaded my WordPress plugins, Flash Photo Gallery and WP Web Scraper. In all, these have received about 6,000 downloads in just about 5 months! Thanks for all your comments, suggestions, bug notifications and donations!

WP Web Scraper – A WordPress Stock Market plugin

Posted June 8, 2009 at 6:05 pm by Akshay

This is probably a major milestone in the lifecycle of the WP Web Scraper WordPress plugin. Technically speaking, the plugin gets its own ‘module architecture’ to incorporate unlimited extensions without touching the core codebase. Non-technically speaking, this opens WP Web Scraper up to non-techie WordPress users. To start off, this mod extends the plugin with a specific shortcode to get stock market data from NSE and NASDAQ (more exchanges coming soon). The data is scraped with a cache interval of one minute (which can be increased as per your requirement) and includes data types such as Open, High, Low, Last Price, Previous Close, Change, Change Percentage and Volume for all active symbols on these exchanges.

The plugin API provides a simple shortcode. For example: [wpws_market_data market="nse" symbol="acc" datatype="last"] or [wpws_market_data market="nasdaq" symbol="csco" datatype="open"]. NSE data is currently scraped from nseindia.com and NASDAQ data from reuters.com. The immediate plan is to implement all major stock markets in this API. Later, I plan to extend this modular architecture to other categories of scrapes such as weather, sports scores, etc.
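
For readers curious how a shortcode like the ones above and a cache interval fit together, here is a rough sketch in Python. It is not the plugin's PHP code: `parse_shortcode`, `TimedCache` and the `fetch` callable are hypothetical names, and the regular expressions only handle the simple double-quoted attribute form shown in the examples.

```python
import re
import time

# Matches [tag attr="value" ...] with double-quoted attribute values only.
SHORTCODE_RE = re.compile(r'\[(\w+)((?:\s+\w+="[^"]*")*)\s*\]')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_shortcode(text):
    """Return (tag, attributes) for the first shortcode in text, or None."""
    match = SHORTCODE_RE.search(text)
    if match is None:
        return None
    tag, raw_attrs = match.groups()
    return tag, dict(ATTR_RE.findall(raw_attrs))


class TimedCache:
    """Reuse a fetched value for `interval` seconds before fetching again."""

    def __init__(self, fetch, interval=60, clock=time.monotonic):
        self.fetch = fetch        # hypothetical callable that does the scrape
        self.interval = interval  # cache interval in seconds
        self.clock = clock        # injectable so the cache can be tested
        self._store = {}          # key -> (timestamp, value)

    def get(self, key):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.interval:
            return hit[1]         # still fresh, skip the scrape
        value = self.fetch(key)
        self._store[key] = (now, value)
        return value


tag, attrs = parse_shortcode(
    '[wpws_market_data market="nse" symbol="acc" datatype="last"]'
)
print(tag, attrs)
```

A cache key could be built from the parsed attributes, for example `tuple(sorted(attrs.items()))`, so that each market, symbol and data type combination is scraped at most once per interval.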

Plan of action for WP Web Scraper

Posted May 27, 2009 at 12:20 pm by Akshay

My latest WordPress plugin for web scraping, WP Web Scraper, had a grand launch. It recorded more than 200 downloads in the first two days alone! Thanks for all the appreciation and comments. This post mainly lists my plan to extend WP Web Scraper into a standard scraping framework. Apart from being a flexible framework, I also plan to introduce some pre-built modules to make specific and highly desired scraping tasks easy. The first such module will be a stock market data grabber. This module will extend the plugin to get stock market data from various big exchange websites easily (planning to support NSE, BSE and NASDAQ to start off with). The data will be almost real time (with a delay of 1 to 10 minutes) and will include Open, High, Low, Last Price, Previous Close, Change, Change Percentage and Volume information for all active symbols on these exchanges.

The plugin API will provide a shortcode something like this: [wpws mod="nse" symbol="acc" datatype="last"] should output the latest price for ACC listed at NSE. The aim is to make it an extendable module framework, and hence I am taking time to code it well. Apart from these features, I am also planning to improve the core scraper with functionality such as a regex-powered cleanup function to remove all unwanted text strings from the scrape, and a more flexible algorithm to query HTML tables returned by the scrape.
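
A regex-powered cleanup of this kind could look roughly like the sketch below. It is written in Python for brevity and is only an illustration under my own assumptions: the name `clean_scrape`, the list-of-patterns interface and the whitespace tidy-up at the end are hypothetical, not the planned plugin API.

```python
import re


def clean_scrape(text, patterns):
    """Remove every match of each regex in `patterns`, then tidy whitespace."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    # Collapse the whitespace runs left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


# Made-up scraped text, purely for illustration.
raw = "Price: 42 (ad)   Click here"
print(clean_scrape(raw, [r"\(ad\)", r"Click here"]))
```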

Bringing Web Scraping to WordPress!

Posted May 24, 2009 at 11:54 am by Akshay

Web scraping (or Web harvesting, Web data extraction) is a computer software technique of extracting information from websites. Web scraping focuses more on the transformation of unstructured Web content, typically in HTML format, into structured data that can be formatted and displayed or stored and analyzed. Example uses of Web scraping include online price comparison, weather data monitoring, market data tracking, Web content mashup and Web data integration.

Imagine what you can do with all this power in your WordPress blog! Pages and posts can display real-time content from other pages, letting you create a mashup of content. All this is now possible using my WP Web Scraper plugin. It’s an easy-to-implement professional web scraper for WordPress. It can be used to display real-time data from any website directly in your posts, pages or sidebar. Use it to include real-time stock quotes, cricket or soccer scores or any other generic content. The scraper is built using time-tested libraries: cURL for scraping and phpQuery for parsing HTML. Please post all your suggestions and thoughts about this on the WP Web Scraper project page.

Flash Photo Gallery

Posted May 18, 2009 at 2:21 am by Akshay

I have now put up an exclusive page for one of my plugin projects: Flash Photo Gallery. This is a WordPress plugin which creates a Flash photo gallery like the one provided in the Adobe Photoshop CS2 Flash Web Photo Gallery templates. All plugin ‘ToDos’ and release information will be available on this page, and any feedback is much appreciated.