Posted May 27, 2009 at 12:20 pm by Akshay
My latest WordPress plugin for web scraping, WP Web Scraper, had a grand launch. It was downloaded more than 200 times in the first two days alone! Thanks for all the appreciation and comments. This post outlines my plan to extend WP Web Scraper into a standard scraping framework. Apart from making it a flexible framework, I also plan to introduce some pre-built modules that make specific, highly requested scraping tasks easy. The first such module will be a stock market data grabber. This module will extend the plugin to fetch stock market data easily from the websites of major exchanges (I plan to support NSE, BSE and NASDAQ to start with). The data will be near real-time (delayed by 1 to 10 minutes) and will include Open, High, Low, Last Price, Previous Close, Change, Change Percentage and Volume for all active symbols on these exchanges.
The plugin API will provide a shortcode along these lines: [wpws mod="nse" symbol="acc" datatype="last"] should output the latest price for ACC listed on NSE. The aim is to make it an extensible module framework, so I am taking the time to code it well. Apart from these features, I am also planning to improve the core scraper with a regex-powered cleanup function to strip unwanted text strings from the scraped content, and a more flexible algorithm for querying HTML tables returned by the scrape.
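To illustrate how such a shortcode and cleanup step might work, here is a minimal Python sketch. The real plugin is written in PHP for WordPress, so this is only an illustration. It pulls the attributes out of a tag like [wpws mod="nse" symbol="acc" datatype="last"] into a dictionary and then strips unwanted strings with regular expressions. The function names `parse_shortcode` and `clean_scrape`, and the sample string and patterns, are hypothetical and are not the plugin's actual API.

```python
import re

# Matches key="value" pairs inside a shortcode such as [wpws mod="nse" ...].
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
# Matches a whole [wpws ...] shortcode and captures its attribute text.
SHORTCODE_RE = re.compile(r'\[wpws\s+([^\]]*)\]')


def parse_shortcode(text):
    """Return the attributes of the first [wpws ...] shortcode as a dict, or None."""
    match = SHORTCODE_RE.search(text)
    if match is None:
        return None
    return dict(ATTR_RE.findall(match.group(1)))


def clean_scrape(raw, patterns):
    """Delete every substring matching any regex in `patterns`,
    then collapse leftover whitespace runs into single spaces."""
    for pattern in patterns:
        raw = re.sub(pattern, "", raw)
    return re.sub(r"\s+", " ", raw).strip()


if __name__ == "__main__":
    print(parse_shortcode('Price: [wpws mod="nse" symbol="acc" datatype="last"]'))
    # {'mod': 'nse', 'symbol': 'acc', 'datatype': 'last'}
    print(clean_scrape("Last: 100.00 (delayed)", [r"Last:", r"\(delayed\)"]))
    # 100.00
```

Keeping the cleanup patterns as a list of regexes means each module could supply its own list without touching the core scraper.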
Posted May 24, 2009 at 11:54 am by Akshay
Web scraping (also called Web harvesting or Web data extraction) is a software technique for extracting information from websites. Web scraping focuses on transforming unstructured Web content, typically HTML, into structured data that can be formatted and displayed, or stored and analyzed. Typical uses of Web scraping include online price comparison, weather data monitoring, market data tracking, Web content mashups and Web data integration.
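As a small illustration of that transformation, the Python sketch below uses the standard library's `html.parser` to turn an HTML table into a list of records. It assumes a simple table whose first row holds the column names. Real pages are usually messier, and the class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html):
    """Use the first row as field names and turn every later row into a dict."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    sample = ("<table><tr><th>Symbol</th><th>Last</th></tr>"
              "<tr><td>AAA</td><td><b>10.5</b></td></tr></table>")
    print(table_to_records(sample))  # [{'Symbol': 'AAA', 'Last': '10.5'}]
```

Text inside nested tags such as `<b>` is still captured, because only the cell boundaries matter to the extractor.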
Imagine what you can do with all this power in your WordPress blog! Pages and posts can display real-time content from other pages, letting you create a mashup of content. All this is now possible with my WP Web Scraper plugin. It's an easy-to-implement, professional web scraper for WordPress that can display real-time data from any website directly in your posts, pages or sidebar. Use it to include real-time stock quotes, cricket or soccer scores, or any other generic content. The scraper is built on time-tested libraries: cURL for scraping and phpQuery for parsing HTML. Please post all your suggestions and thoughts about it on the WP Web Scraper project page.
Posted May 19, 2009 at 6:49 pm by Akshay
This continues my concept note on BSE and NSE syndication feeds. Here are some more (still sketchy) details about the planned service.
Data source: Currently I am considering scraping the stock exchange web pages using cURL and then parsing the DOM structure of the raw source. Once the service grows big enough, a subscription to an official data feed can be considered.
Output formats: I realized that RSS was a bad idea after all; however, I discovered a mature XML specification for market data representation called Market Data Definition Language (MDDL). Apart from MDDL, the other output formats could be a simpler XML (a home-grown format), good old CSV (light, easy and time-tested) and JSON/JSONP (for direct use in client-side apps). Since JSONP works across domains natively, it can be integrated without any server-side technology. This means that users of this service can build their own ‘Get Quote’ apps on their websites regardless of the server or host they use… just plain JavaScript.
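To make the format list concrete, here is a minimal Python sketch that serializes one quote record as CSV, JSON, JSONP and a simple home-grown XML. It does not attempt MDDL. The field names, the `quote` element name and the callback whitelist are illustrative assumptions, not a settled design. The whitelist matters because a JSONP callback name ends up being executed as script on the client.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

# Illustrative field order, mirroring the data points planned for the service.
FIELDS = ["symbol", "open", "high", "low", "last",
          "prev_close", "change", "change_pct", "volume"]

# Conservative whitelist for JSONP callback names (guards against script injection).
CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.]*")


def to_csv(quote):
    """One header line plus one data line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerow(quote)
    return buf.getvalue()


def to_json(quote):
    return json.dumps(quote)


def to_jsonp(quote, callback):
    """Wrap the JSON in a call to `callback` so a <script> tag can load it cross-domain."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    return f"{callback}({json.dumps(quote)});"


def to_simple_xml(quote):
    """A flat, home-grown XML layout: <quote><symbol>...</symbol>...</quote>."""
    root = ET.Element("quote")
    for field in FIELDS:
        ET.SubElement(root, field).text = str(quote[field])
    return ET.tostring(root, encoding="unicode")
```

On a web page, a JSONP response such as `showQuote({...});` would be pulled in with an ordinary `<script src="...">` tag, where `showQuote` is a function the page defines itself. That is why no server-side code is needed on the consumer's side.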
Let's get down to an example. This is the current get-quote output from NSE (exchange) for ACC (stock). An example output in MDDL would look like this sample mddl-xml. That, in short, is the plan, and I am eagerly awaiting feedback on it before I start on the actual code.
Posted May 18, 2009 at 2:21 am by Akshay
I have now put up a dedicated page for one of my plugin projects, Flash Photo Gallery. This is a WordPress plugin that creates a Flash photo gallery like the one provided in the Adobe Photoshop CS2 Flash Web Photo Gallery templates. All plugin ‘ToDos’ and release information will be available on this page, and any feedback is much appreciated.
Posted May 15, 2009 at 12:57 am by Akshay
Consider this: our stock market sites reinvent themselves as Web 2.0 services and start publishing data as RSS, XML or JSON feeds. Imagine receiving corporate announcements in your RSS aggregator or, even better, tracking your portfolio through your aggregator. Further, picture what the developer community could do with this. Stock feeds could be parsed by widget frameworks like Google Gadgets and OpenSocial, or by plugin architectures, and displayed on sites, blogs or social networking profile pages. Basically, the possibilities are limitless.
But the reality is that our stock markets won't do it, at least not so soon. To fill this gap, I am planning to start a service that will parse BSE or NSE pages and convert them into RSS, XML or JSON feeds. It will be a simple HTTP service that answers your request in the desired format. I will start work as soon as I find some time off my current projects. Till then, give it a thought and let me know what you think.
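Here is a minimal Python sketch of the request side of such a service. It maps a query string to an HTTP status, a content type and a response body. The query parameters `exchange`, `symbol` and `format` are hypothetical, as is the in-memory `QUOTES` table, which stands in for data scraped from BSE or NSE; no real fetching happens here. Only JSON and CSV are handled, for brevity.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory data standing in for quotes scraped from exchange pages.
QUOTES = {("nse", "ACC"): {"symbol": "ACC", "last": 1.5, "change": 0.3}}

CONTENT_TYPES = {"json": "application/json", "csv": "text/csv"}


def handle_request(query_string):
    """Map e.g. 'exchange=nse&symbol=acc&format=json' to (status, content type, body)."""
    params = parse_qs(query_string)
    exchange = params.get("exchange", ["nse"])[0].lower()  # default exchange: NSE
    symbol = params.get("symbol", [""])[0].upper()
    fmt = params.get("format", ["json"])[0].lower()         # default format: JSON

    if fmt not in CONTENT_TYPES:
        return 400, "text/plain", f"unsupported format: {fmt}"
    quote = QUOTES.get((exchange, symbol))
    if quote is None:
        return 404, "text/plain", f"unknown symbol: {exchange}/{symbol}"

    if fmt == "json":
        body = json.dumps(quote)
    else:
        keys = list(quote)
        body = ",".join(keys) + "\n" + ",".join(str(quote[k]) for k in keys) + "\n"
    return 200, CONTENT_TYPES[fmt], body
```

Keeping this dispatch as a plain function means it could sit behind any web server or framework. Adding another output format would mean one more entry in `CONTENT_TYPES` and one more branch.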
Posted May 12, 2009 at 1:22 pm by Akshay
So. I’m officially here. In the blogosphere. Finally.
I struggled with what my first official post should be. It should be profound after all this time, right? Grand in nature. Broad in scope. So, I struggled. And then I realized it's just like walking into a room full of people you've never met: I'll introduce myself to you. It seems like the most polite thing to do, right?
First, an introduction for the unfamiliar. Web design and development (Web.D) is what I have been doing for a while now. I'm sure most of you are not aware of this side of me, but what the heck… some of you might not even know me. So let's start from scratch. I am Akshay Raje, a self-taught freelance web designer and developer. I love travel and am a great movie buff too. Apart from all that, I try to spend the rest of my time with my wife and squeeze whatever time is left into the gym.