WP Web Scraper

An easy-to-implement professional web scraper for WordPress. It can display real-time data from any website directly in your posts, pages or sidebar. Use it to include real-time stock quotes, cricket or soccer scores, or any other generic content. The scraper is built on two time-tested libraries: cURL for scraping and phpQuery for parsing HTML. Features include:

  1. Can be easily implemented using the button in the post / page editor.
  2. Configurable caching of scraped data. The cache timeout can be defined in minutes for every scrap.
  3. A custom User-Agent header can be set for every scrap.
  4. Scrap output can be displayed through a custom template tag or a shortcode in a page, post or sidebar (text widget).
  5. Other configurable settings such as the cURL timeout and disabling the shortcode.
  6. Error handling: silent fail, standard error, custom error message or display of the expired cache.
  7. Option to clear or replace a certain regex pattern in the scrap before output.
  8. Built-in module for stock market data (NSE, BSE and NASDAQ currently supported); other markets to follow.

Download and usage

Download the latest copy from the WordPress plugin repository. Detailed FAQs and a usage manual are also hosted there.

Demo

The plugin has already been downloaded 34,192 times. The download count displayed is not static. It is dynamically scraped from this URL with a refresh rate of 60 minutes. The CSS selector used is ‘.last-child td’.


60 Comments

  1. This is a great idea for a plugin for cases where a site does not provide their data as an RSS feed.

    Cheers,
    Stu.

  2. Great plugin. I have tried it. Nice and great.

  3. Terrific plugin, thanks!

  4. Pretty cool how you use the plugin to show how many times it’s been downloaded on WordPress.

    • The shortcode for that is [wpws url="http://wordpress.org/extend/plugins/wp-web-scrapper/stats/" selector=".last-child td" cache="90" timeout="1"].

      Don't forget to replace the plugin URL with your own.

  5. Hello Akshay,

    I tried to figure out how your plugin should work but did not succeed.
    Could you make or will you make me a string for the right column on our website?
    http://www.internetwerkt.nl in order to use it in wordpress.

    regards
    Jets

  6. [...] WP Web Scraper » WebD :: Web design, developement, wordpress themes, plugins, web technologies [...]

  7. I have a website running on http://ublog.co.nr and I would like to put live cricket scores on it. Please help.

  8. Does it keep the other elements apart from text, or does it filter them out?
    For example, if I select all the content from “table XXX”, will it return the formatted table or only the text from the cells?

    Thank you.

    • In the shortcode, use the param output="html" if you want the scraper to return the scrap as HTML. If you want to drop HTML tags, use output="text".
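
To illustrate the difference between the two output modes described in the reply above, here is a minimal PHP sketch. It assumes that the text mode behaves roughly like PHP's strip_tags(); the function name format_scrap is hypothetical and is not part of the plugin.

```php
<?php
// Hypothetical helper, not the plugin's own code.
// Assumption: output="text" drops tags much like strip_tags() does.
function format_scrap(string $html, string $output = 'html'): string
{
    if ($output === 'text') {
        return strip_tags($html);
    }
    return $html;
}

$scrap = '<table><tr><td>34,192</td></tr></table>';
echo format_scrap($scrap, 'html'), "\n"; // markup is kept
echo format_scrap($scrap, 'text'), "\n"; // only the cell text remains
```

With the html mode the table markup survives and can be styled; with the text mode only the cell contents are left.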

  9. Dear Akshay,

    thank you very much for that quick answer !

    Your plugin is really great. Nice job, really !

    Denis.

    • Hi Denis, for such issues you can use the clear param of the shortcode. For example, clear="/�/" will clear (delete) all such characters in the scrap. This param takes a regular expression to find and replace with a blank string. Hope this helps.
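
The effect of the clear param described above can be sketched in plain PHP. This is only an approximation of the described behaviour (a regex match replaced with a blank string), not the plugin's actual code; clear_pattern is a hypothetical name, and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper: delete everything matching $pattern from $scrap.
// Mirrors the described behaviour of the clear param, not its real source.
function clear_pattern(string $scrap, string $pattern): string
{
    $result = preg_replace($pattern, '', $scrap);
    // preg_replace() returns null on a regex error; fall back to the input.
    return $result === null ? $scrap : $result;
}

// U+FFFD is the Unicode replacement character that often shows up
// when a page is decoded with the wrong character set.
$scrap = "Price: 42\u{FFFD} USD\u{FFFD}";
echo clear_pattern($scrap, '/\x{FFFD}/u'), "\n"; // prints "Price: 42 USD"
```

The /u modifier makes the pattern match whole UTF-8 characters rather than single bytes.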

  10. Hallo Akshay,

    I am still looking for a solution to get a complete URL from websites that only use href/file. Now I get myURL/file.
    Can I use 2 selectors, for instance selector1,selector2 like in jQuery?

  11. This plugin looks like it could be extremely useful. But it is in need of a help file big time.

  12. This is exactly what I’ve been looking for. Unfortunately I have no idea how to set this up from a coding stand point. I was wondering if I could pay you a fee to set up the selector for me?

    • For creation of a custom selector for your scrap, write to me at akshay[dot]raje[at]gmail[dot]com

  13. Hi,

    I have some problems with url like this

    http://www.website.com/index.php?search_type=7&g_cod_type=1

    The “[” and “]” are not accepted.

    regards,

    Alessandro

    • @ Alessandro

      This issue has been sorted in the new release (v. 1.2)

  14. Dear Akshay,

    Can you also please make this kind of plugin for joomla as well??

  15. Hello, I installed your plugin, but I can’t get it to work. Where do I find add scraper button? Please help.
    Thank you

    • WP Web Scraper sits as a single button in the WordPress rich text editor. The button looks like a gear and should help you get started with inserting a scrap.

  16. Is there a way to style the scraped data?

    • The CSS referencing the scraped data has to reside in your style.css (or any other CSS file / style tag). It's the same as styling anything else in your post / page.

  17. You can use the cache parameter of the shortcode for this. For example, cache="1440" will use cached content for a period of 1 day instead of scraping on each request.
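
The idea behind a cache timeout in minutes can be sketched as follows. This is a simplified file cache under stated assumptions, not the plugin's implementation: cached_scrap, the file naming scheme and the use of the system temp directory are all hypothetical.

```php
<?php
// Hypothetical sketch of a minutes-based file cache.
// $fetch is any callable that takes a URL and returns the scraped string.
function cached_scrap(string $url, int $cache_minutes, callable $fetch, ?string $cache_dir = null): string
{
    $cache_dir = $cache_dir ?? sys_get_temp_dir();
    $cache_file = $cache_dir . '/scrap_' . md5($url) . '.cache';

    clearstatcache(true, $cache_file);
    // Serve the cached copy while it is younger than the timeout.
    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $cache_minutes * 60) {
        $raw = file_get_contents($cache_file);
        if ($raw !== false) {
            $cached = unserialize($raw, ['allowed_classes' => false]);
            if (is_string($cached)) {
                return $cached;
            }
        }
    }

    // Cache missing or expired: fetch again and store the result.
    $response = (string) $fetch($url);
    file_put_contents($cache_file, serialize($response));
    return $response;
}

// Stand-in fetcher for illustration; a real one would make an HTTP request.
$fetch = function (string $url): string {
    return 'scraped at ' . date('H:i:s');
};
echo cached_scrap('http://example.com/scores', 1440, $fetch), "\n";
```

With a timeout of 1440 minutes, repeated calls within a day return the stored string without calling the fetcher again.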

  18. Hi Akshay, I want to thank you for your generosity. Sharing a resource like this is mega cool!

    However, Akshay, I would like to know if this “gem” can scrape text from a list of dynamically changing URLs, which are provided in RSS format.

    I ask this because I have a pipe which is aggregated, and I would like to scrape from each URL (dynamically changing).

    If not, how can I modify it to do so? Thanks

    PLease reply to my @mail – monasor28@gmail.com

  19. Where do I find information on which CSS Selector to use??

    • A CSS selector is the same thing you use when styling your HTML documents. For instance, using ‘body’ as a selector will scrap the whole page, and using ‘body div’ will scrap the first div element within the page. Please write to me at akshay[dot]raje[at]gmail[dot]com for assistance on CSS selectors for scraping.

  20. [...] so the first thing that you will need to do is download the WP Web Scraper plugin.  The plugin comes with a widget feature that we will edit to make our plugin.  In essence, the [...]

  21. [...] WP Web Scraper is “an easy to implement web scraper for WordPress. Display realtime data from any websites directly into your posts, pages or sidebar.” I’ve not had cause to use it yet, but could be very interesting. [...]

  22. Is anyone else having trouble with the basehref parameter? I’ve followed the directions and it’s still not working.

  23. I figured out my problem. The site I was scraping didn’t use a “/” for its relative links. Therefore, the basehref couldn’t replace that “/” with a new absolute path. As a fix for when this is the case, I created a new parameter called basehref2 that replaces nothing but simply prepends an absolute path. Seems to work well. Thanks.
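
The commenter's code for basehref2 is not shown, so the following is only a guess at the general approach: prefix every relative href with an absolute base, whether or not the link starts with “/”. The function name prepend_base_href and the example URLs are hypothetical, and the regex is a simplification that only handles quoted href attributes.

```php
<?php
// Hypothetical sketch: put an absolute base in front of relative links.
// Simplification: root-relative links ("/file") are also appended to $base.
function prepend_base_href(string $html, string $base): string
{
    $base = rtrim($base, '/') . '/';
    $result = preg_replace_callback(
        '/href=(["\'])(.*?)\1/i',
        function (array $m) use ($base): string {
            $url = $m[2];
            // Leave absolute URLs, protocol-relative URLs and anchors alone.
            if (preg_match('~^([a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
                return $m[0];
            }
            return 'href=' . $m[1] . $base . ltrim($url, '/') . $m[1];
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error.
    return $result ?? $html;
}

echo prepend_base_href('<a href="file.html">File</a>', 'http://www.example.com/docs'), "\n";
// prints <a href="http://www.example.com/docs/file.html">File</a>
```

A full solution would resolve links the way a browser does (handling “../” and root-relative paths); this sketch only covers the simple case discussed in the thread.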

  24. Fantastic Support.

    Thankyou for your help with my project.

    Akshay took the time to help me to implement this plugin, I would recommend this plugin to anyone, and Akshay is very helpful.

    Andy

  25. I have gotten the plugin to bring me the text from the page but am not sure how to get it to display with CSS.

    • @ Ryan

      Once you get the data on your page, it has to be styled just like any other element on your page. You can add the CSS code either in style.css or in any other CSS file you have linked in your template.

  26. [...] I hope this gets you interested enough to start exploring its limit less possibilities. Download your copy, install it and get started. You can find more info on this plugin in the FAQ section or its official page. [...]

  27. When I try to Activate the plugin, I get the following:

    Plugin could not be activated because it triggered a fatal error.

    Parse error: parse error, unexpected T_OBJECT_OPERATOR in /blog/wp-content/plugins/wp-web-scrapper/wpws_index.php on line 173

    I’m using WP v. 2.8.6

    Help!

    • Some users using PHP 4.x have reported this issue.

  28. Hey I can fix this for you Justyn

    Just open up your .htaccess file and add

    SetEnv DEFAULT_PHP_VERSION 5

    This changes the version to 5

    It now works for me! – GOOD LUCK
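
Before changing server settings, it can help to confirm which PHP version is actually running. The sketch below uses PHP's built-in version_compare(); the 5.0.0 minimum is an assumption drawn from this thread (the error was reported on PHP 4.x and went away on PHP 5), not a documented requirement, and php_meets_minimum is a hypothetical name.

```php
<?php
// Hypothetical check: is the given PHP version at least $minimum?
// The default of 5.0.0 is an assumption based on the comments above.
function php_meets_minimum(string $current, string $minimum = '5.0.0'): bool
{
    return version_compare($current, $minimum, '>=');
}

if (php_meets_minimum(PHP_VERSION)) {
    echo 'PHP ' . PHP_VERSION . " should be new enough.\n";
} else {
    echo 'PHP ' . PHP_VERSION . " is older than 5.0.0.\n";
}
```

Saving this as a standalone file on the server and opening it in a browser shows the version that WordPress itself would run under.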

  29. Hey Akshay

    Wonder if you could help me?

    I have it working within posts and pages, but I want to bring in a header from a sister site into the WordPress template itself.

    The example code you gave is:

    The code that works in the post is:

    [wpws url="http://www.domain.com" selector="#header" output="html"]

    So how do I write the PHP code in the header.php file? Everything I try shows a blank page…

    I’m sure I’m not the only non coder who may run into this. Looks like a great plugin, all it needs is an example for adding code to the template as well.

    Your help is appreciated – thanks

  30. For use within the theme you need to use it as a php function… something like this…

    <?php echo wpws_get_content($url, $postargs, $selector, $clear, $replace, $replace_text, $basehref, $output, $cache, $agent, $timeout, $error); ?>

    • Can I apply CSS to the scraped document?

      • Yes, CSS can be applied to the scraped HTML. You will have to include the CSS classes in the style.css file of your theme or embed them directly in your theme’s header.php.

  31. Hey Akshay,

    The feedback on your plugin is really good but unfortunately I can’t get it to run with WP 2.8.6

    I was wondering if you have any plans to release a version compatible with WordPress 2.8.6 ??

    Many thanks,
    Andrew.

    • The plugin works just fine on WP 2.8.6. In fact, webdlabs.com also currently runs on 2.8.6 and this page has a live sample of the scraper (displaying the number of downloads).

  32. Dear Akshay, how can I apply a different basehref to different links dynamically?

  33. [...] or some such software, very good and descriptive software for collection your reader stats. WP Web Scrapper While this plugin can easily be used for nefarious needs (getting content from other sites, [...]

  34. Hello.

    I recently upgraded to version 2 and the plugin has stopped working on my site. Are you able to help?

    [wpws url="http://www.poultonprimaryleague.org.uk/leagues.asp" selector="#table2:eq(2)" urldecode="0" cache="30" error="1" timeout="5" output="html" htmldecode="windows-1252"]

    thanks

  35. Hi all,
    I have scraped a site which shows some unknown characters such as �, and when I write the value to a MySQL database it does not get updated. Please help me.

  36. Hello,

    First of all, thanks for your great plugin.

    It worked perfectly except for formatted text contained in tags where newlines were dropped.

    Here is what i did to avoid that:
    in wpws-includes/functions.php, line 247 I replaced

    file_put_contents($cache_file, serialize( str_replace(array("\r","\n"), '', $response) ) );

    with

    file_put_contents($cache_file, serialize( $response ) );

    Is there a reason why newlines were dropped?

    Regards

  37. Magnificent Plug-in Akshay!
    The only thing I am missing is an option to scrape only once: have the scraped content inserted into the post where the shortcode was placed (just like it is now, but actually saved only once) and then either remove the shortcode (for a single scraping-to-post) or disable it for a chosen amount of time. That would open up a whole lot of possibilities; the scraped content would become searchable and could be used by automatic tagging etc.
    Perhaps you are not able to do that, but it would make the plugin far more useful.

    Keep up the good work!

  38. Love this plugin. Thanks

  39. Pretty cool how you use the plugin to show how many times it’s been downloaded on WordPress.

  40. hey very nice plugin.

    exactly suits my requirements

    thanks

  41. Love this plugin. Thanks

  42. [...] me very excited when I first heard about. It’s called WP Web Scraper. It’s created by Web Labs and available free in the WordPress Plugin Repository. WP Web Scraper works a little like an [...]

  43. Love this plugin, genius! I am having one problem figuring out how to remove words. Is there an option to remove a word or string of words? Thanks!

  44. [...] I hope this gets you interested enough to start exploring its limit less possibilities. Download your copy, install it and get started. You can find more info on this plugin in the FAQ section or its official page. [...]

  45. If you have this problem, go to your theme's functions.php and add this line:
    add_filter('widget_text', 'do_shortcode');

    And now the shortcode will work in the text widget.
