Posted August 11, 2009 at 3:07 pm by Akshay
Scraping HTML tables is easy, but parsing them has always been tricky. My next release of WP Web Scraper will let you do exactly that: it adds methods to query HTML tables within your scrape. For instance, you will be able to filter rows by the value of a specific table column and restrict the number of rows returned using 'from' and 'to' index keys.
It will also let you delete a specific column from the output and apply separate CSS classes to even and odd rows. The feature is designed for users who want to scrape HTML tables and then filter or parse the extracted data, and it will be implemented as a module within WP Web Scraper.
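To make the planned table operations concrete, here is a minimal sketch in Python (the plugin itself is PHP). It assumes the scraped table is already held as a list of rows, and the function names and parameters (`query_table`, `render_rows`, `filter_col`, `row_from`, `row_to`, `drop_col`) are hypothetical, not the plugin's actual API. It also assumes the column filter runs before the 'from'/'to' range and that 'to' is inclusive.

```python
import html
from typing import List, Optional

Row = List[str]


def query_table(
    rows: List[Row],
    filter_col: Optional[int] = None,
    filter_value: Optional[str] = None,
    row_from: int = 0,
    row_to: Optional[int] = None,
    drop_col: Optional[int] = None,
) -> List[Row]:
    """Filter, slice and prune a table held as a list of rows (hypothetical helper)."""
    result = rows
    # Keep only rows whose cell in filter_col equals filter_value.
    if filter_col is not None and filter_value is not None:
        result = [r for r in result if len(r) > filter_col and r[filter_col] == filter_value]
    # Restrict to the 'from'..'to' index range ('to' treated as inclusive, an assumption).
    end = None if row_to is None else row_to + 1
    result = result[row_from:end]
    # Delete one column from every remaining row.
    if drop_col is not None:
        result = [r[:drop_col] + r[drop_col + 1:] for r in result]
    return result


def render_rows(rows: List[Row], even_class: str = "even", odd_class: str = "odd") -> str:
    """Emit <tr> elements with alternating CSS classes.

    Rows are numbered from 1, matching CSS :nth-child(odd/even),
    so the first row gets odd_class.
    """
    out = []
    for i, row in enumerate(rows, start=1):
        css = odd_class if i % 2 else even_class
        cells = "".join(f"<td>{html.escape(c)}</td>" for c in row)
        out.append(f'<tr class="{css}">{cells}</tr>')
    return "\n".join(out)
```

For example, `query_table(rows, filter_col=1, filter_value="A", drop_col=2)` keeps only rows whose second column is "A" and drops their third column; passing the result to `render_rows` produces table rows ready for even/odd styling.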
Posted June 8, 2009 at 6:05 pm by Akshay
This is probably a major milestone in the lifecycle of the WP Web Scraper WordPress plugin. Technically speaking, the plugin gets its own 'module architecture' that can incorporate any number of extensions without touching the core codebase. Non-technically speaking, this opens WP Web Scraper up to non-technical WordPress users. The first module extends the plugin with a dedicated shortcode for stock market data from NSE and NASDAQ, with more exchanges to come. The data is scraped with a cache interval of one minute (which can be increased as required) and includes Open, High, Low, Last Price, Previous Close, Change, Change Percentage and Volume for all active symbols on these exchanges.
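The one-minute cache interval means a quote is scraped at most once per minute, no matter how many page views request it. The sketch below illustrates that idea in Python with a simple time-to-live cache; the class name, the key format and the injectable clock are hypothetical and are not taken from the plugin, which is written in PHP.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache scraped values for a fixed number of seconds (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        # Serve the cached copy while it is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        # Otherwise scrape again and record when the value was fetched.
        value = fetch()
        self._store[key] = (now, value)
        return value
```

Raising `ttl_seconds` corresponds to increasing the cache interval: fewer requests hit the source site, at the cost of staler data.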
The plugin API provides a simple shortcode, for example [wpws_market_data market="nse" symbol="acc" datatype="last"] or [wpws_market_data market="nasdaq" symbol="csco" datatype="open"]. NSE data is currently scraped from nseindia.com and NASDAQ data from reuters.com. The immediate plan is to support all major stock markets in this API. Later, I plan to extend this modular architecture to other categories of scrapes, such as weather and sports scores.
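Each shortcode boils down to a name plus a set of key/value attributes that select the market, symbol and data type. Inside WordPress this parsing is done by the Shortcode API (`add_shortcode`); the Python sketch below only illustrates what that parsing yields, and its regular expressions handle just the double-quoted `key="value"` form used in the examples above (an assumption, not the full shortcode grammar).

```python
import re
from typing import Dict, Optional, Tuple

# A shortcode name followed by zero or more key="value" attributes.
SHORTCODE_RE = re.compile(r'\[(\w+)((?:\s+\w+="[^"]*")*)\s*\]')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_shortcode(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (name, attributes) for the first shortcode in text, or None."""
    m = SHORTCODE_RE.search(text)
    if m is None:
        return None
    name, attr_text = m.group(1), m.group(2)
    return name, dict(ATTR_RE.findall(attr_text))
```

Parsing `[wpws_market_data market="nse" symbol="acc" datatype="last"]` gives the name `wpws_market_data` and the attributes `market`, `symbol` and `datatype`, which a module could then use to pick the exchange, the symbol and the field to display.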
Posted May 27, 2009 at 12:20 pm by Akshay
My latest WordPress plugin for web scraping, WP Web Scraper, had a great launch: it recorded more than 200 downloads in the first two days alone! Thanks for all the appreciation and comments. This post outlines my plan to extend WP Web Scraper into a standard scraping framework. Apart from making it a flexible framework, I also plan to introduce some pre-built modules that make specific, highly requested scraping tasks easy. The first such module will be a stock market data grabber. It will extend the plugin to fetch stock market data from major exchange websites (I plan to support NSE, BSE and NASDAQ to start with). The data will be near-realtime (with a delay of 1 to 10 minutes) and will include Open, High, Low, Last Price, Previous Close, Change, Change Percentage and Volume for all active symbols on these exchanges.
The plugin API will provide a shortcode along these lines: [wpws mod="nse" symbol="acc" datatype="last"] should output the latest price for ACC listed on NSE. The aim is an extensible module framework, so I am taking the time to code it well. Apart from these features, I also plan to improve the core scraper with a regex-powered cleanup function to remove unwanted text strings from the scrape and a more flexible algorithm to query HTML tables returned by the scrape.
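A regex-powered cleanup step can be as simple as deleting every match of a list of user-supplied patterns and then tidying the leftover whitespace. The following Python sketch shows that idea; the function name `clean_scrape` and its signature are hypothetical and do not describe the plugin's eventual implementation.

```python
import re
from typing import Iterable


def clean_scrape(text: str, patterns: Iterable[str]) -> str:
    """Remove every match of each regex pattern, then collapse whitespace (hypothetical helper)."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

For instance, `clean_scrape("Last: 42 (delayed) Advertisement", [r"\(delayed\)", r"Advertisement"])` returns `"Last: 42"`. Applying the patterns in order keeps the behaviour predictable when one pattern's match overlaps another's.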
Posted May 24, 2009 at 11:54 am by Akshay
Web scraping (also called Web harvesting or Web data extraction) is a software technique for extracting information from websites. It focuses on transforming unstructured Web content, typically HTML, into structured data that can be formatted and displayed, or stored and analyzed. Example uses of Web scraping include online price comparison, weather data monitoring, market data tracking, Web content mashups and Web data integration.
Imagine what you can do with all this power in your WordPress blog! Pages and posts can display realtime content from other pages, letting you create a mashup of content. All of this is now possible with my WP Web Scraper plugin. It's an easy-to-implement professional web scraper for WordPress that can display real-time data from any website directly in your posts, pages or sidebar. Use it to include real-time stock quotes, cricket or soccer scores or any other generic content. The scraper is built on the time-tested libraries cURL for scraping and phpQuery for parsing HTML. Please post all your suggestions and thoughts on the WP Web Scraper project page.
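The core idea, turning unstructured HTML into structured data, can be illustrated with a small table extractor. WP Web Scraper uses cURL and phpQuery for this; the Python sketch below swaps in the standard library's `html.parser` instead, and the class and function names are hypothetical. It assumes well-formed markup with explicit `</td>` and `</tr>` end tags and no nested tables, which real pages do not always provide; a full DOM parser such as phpQuery copes with that better.

```python
from html.parser import HTMLParser
from typing import List


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is ignored.
        if self._in_cell:
            self._cell.append(data)


def extract_rows(html_text: str) -> List[List[str]]:
    """Parse an HTML fragment and return its table rows as lists of cell strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

To scrape a live page, the HTML could first be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` (assuming the page is UTF-8 encoded) and passed to `extract_rows`.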