Gecko/Webkit Screenshots

Posted by tobi — 12:07 PM May 17

For our Shopify Product Search we needed a good way to screenshot web pages. There are some services on the web for this, but none of them fit our needs, so we ended up building it ourselves. They were either way too expensive, didn’t produce nearly the quality we needed, or didn’t offer an API at all.

Our solution was to install a headless X server in our server farm which runs Firefox 2.0. We used a Python GTK automation script which navigates the Firefox instance to the page and then dumps the framebuffer into a PNG file when done. This works well enough, but I’d like something more robust for a different project.

Ideally I’d like someone to build a screenshot tool based on Gecko or WebKit which can simply take a URL and spit out a PNG. A dependency on a running X server is acceptable, but I’d rather not have it running all the time because it complicates deployment a lot. It has to run on Linux and must not depend on a shared global resource, i.e. you should be able to take two screenshots at the same time.
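
One way to read the "no shared global resource" requirement, if an X server is involved at all, is that every screenshot job gets its own display number. The sketch below is only an illustration under that assumption: it relies on the convention that an X server on display :N leaves a lock file named `.XN-lock` in `/tmp`, and the function names `display_in_use` and `pick_free_display` are made up for this example.

```python
import os


def display_in_use(number, lock_dir="/tmp"):
    """Return True if a lock file for display :number exists in lock_dir.

    Assumption: the X server follows the usual /tmp/.X<N>-lock convention.
    """
    return os.path.exists(os.path.join(lock_dir, f".X{number}-lock"))


def pick_free_display(start=99, limit=200, lock_dir="/tmp"):
    """Return the first display number in [start, limit) with no lock file."""
    for number in range(start, limit):
        if not display_in_use(number, lock_dir):
            return number
    raise RuntimeError("no free X display number found")
```

Two jobs that run this check at the same instant could still pick the same number, so a real tool would need to retry when the X server fails to start on the chosen display.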

If you know of a tool like the one I describe or if you think you could build something like this for me please contact me. This may be paid open source work.

Comments

  • Jordan Elver 17 May 13:30

    Not sure if this is exactly what you need, but it sounds like it could be adapted.

    http://trac.browsershots.org/wiki

  • Scott M. 17 May 14:02

    Maybe this? http://www.paulhammond.org/webkit2png/

  • Tristan Dunn 17 May 14:04

    http://www.paulhammond.org/webkit2png/

    A linux version is mentioned and linked as well.

  • court3 17 May 14:44

    We use webkit2png on a headless mac mini. The mini just pings our web service and grabs all the items to snapshot off a queue, then POSTs the png back. Works quite well.

    We’d use linux, but unfortunately the linux version has terrible-looking fonts.
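
The queue-driven setup described in this comment (poll a web service, screenshot each queued item, POST the PNG back) could be sketched roughly as follows. Everything specific here is an assumption for illustration: the service is imagined to return a JSON list of URLs, the result endpoint is imagined to take the page URL as a `url` query parameter, and `take_screenshot` stands in for whatever actually renders the page (webkit2png in the commenter's case).

```python
import json
import urllib.parse
import urllib.request


def fetch_jobs(queue_url):
    """GET the pending jobs. Assumes the service answers with a JSON list of URLs."""
    with urllib.request.urlopen(queue_url) as response:
        return json.loads(response.read().decode("utf-8"))


def build_upload(result_url, page_url, png_bytes):
    """Build (but do not send) the POST request carrying one PNG.

    The `url` query parameter is a hypothetical convention, not a real API.
    """
    query = urllib.parse.urlencode({"url": page_url})
    return urllib.request.Request(
        f"{result_url}?{query}",
        data=png_bytes,
        method="POST",
        headers={"Content-Type": "image/png"},
    )


def drain_queue(jobs, take_screenshot, upload):
    """Screenshot every queued URL and pass each PNG to `upload`.

    Returns the list of URLs that were processed, in order.
    """
    done = []
    for page_url in jobs:
        png_bytes = take_screenshot(page_url)
        upload(page_url, png_bytes)
        done.append(page_url)
    return done
```

Passing `take_screenshot` and `upload` in as functions keeps the loop independent of the renderer, so the same worker could run on a Mac mini or a Linux box. A real `upload` would send the request with `urllib.request.urlopen(build_upload(...))`.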

  • Aydin Mirzaee 17 May 16:31

    not sure if they have an API, but… for my blogging I use: http://www.kwout.com

  • tobi 17 May 19:31

    Tristan: The Linux version of the webkit2png script you linked is the basis for the manual solution I describe above. webkit2png is very good but only works on Macs.

    For the project I have in mind it really has to run on Linux and be parallelizable, so both solutions are sadly out.

  • Casey 18 May 00:03

    I ran into the same issues (Girafa too expensive, Alexa not good enough) and ended up using Browsershots’ code – which is also headless X, Firefox, and Python for automation.

    I’d like something simpler as well, although I am fairly happy with Browsershots. It takes care of managing X and xvnc and I just drop little text files into a queue directory.

  • Thomas aylott 18 May 23:57

    Please do let us know what you come up with. John Butler came up with a great solution for crazyegg.com, but I’d like to see if there’s a better way.

  • Paul Goscicki 19 May 05:55

    I’m using Screengrab (http://www.screengrab.org/) for taking web screenshots. It’s a Firefox extension and as such is not automated, but I believe making it more automatic shouldn’t be very hard.

  • Johan Sørensen 19 May 07:30

    I wrote Paparazzi (http://www.derailer.org/paparazzi/) back in the day (now maintained by someone else). It’d be a walk in the park running it headless, and it’s still open source afaik.

    I know you said it has to run on linux, but I think you’re setting yourself up for a lot of wasted hours and headaches, compared to a macmini or vmware instance running somewhere.

  • Dave Hoover 19 May 07:38

    I use http://webthumb.bluga.net/home on one of my projects (client pays for credits) and it’s worked great.

  • tobias Lütke 19 May 12:13

    Good job on writing Paparazzi. We still use it almost daily for preparing the high quality screenshots for our http://blog.shopify.com/shop-of-the-moment showcase.

    The problem is that it really needs to be on linux. The application has to run at Amazon EC2. The latest Gecko releases now use Cairo as a rendering back-end. This should make it fairly straightforward to render to png.

    I’m also thinking about writing an Adobe AIR app with a telnet server built in which can take screenshots and return them over the wire.

  • Christian 20 May 04:02

    Why not just use the real firefox and put it in full screen mode? Use a plugin to eliminate menus etc. Save this config in prefs.js.

    The procedure for doing the screenshots would be as follows:
    • Start a virtual X server (Xvfb) with a large resolution.
    • Start Firefox with the URL in full screen mode.
    • Wait for some seconds to load the page.
    • Then do a screenshot with “xwd -root |convert -trim” to eliminate unnecessary borders.

    Of course, this needs an X server. But I do not know of any firefox version for Linux which runs without…
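
The four steps in this comment can be sketched in Python as below. This is a rough illustration, not a tested tool: it assumes `Xvfb`, `firefox`, `xwd`, and ImageMagick's `convert` are on the PATH, it uses fixed sleeps where a real script would detect that the page has loaded, and the full-screen setup is left to the profile configuration the comment mentions. The separate display number and throwaway profile per run are additions meant to let two captures run side by side. The function names are made up for this example.

```python
import os
import subprocess
import tempfile
import time


def xvfb_command(display, width=1280, height=1024, depth=24):
    """Command line for a virtual X server on display :<display>."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def firefox_command(url, profile_dir):
    """Command line for a Firefox instance that uses its own profile directory."""
    return ["firefox", "-no-remote", "-profile", profile_dir, url]


def capture(url, out_path, display=99, wait_seconds=10):
    """Start Xvfb and Firefox, wait, then dump the root window to a trimmed PNG."""
    env = dict(os.environ, DISPLAY=f":{display}")
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(1)  # crude: give Xvfb a moment to come up
        with tempfile.TemporaryDirectory() as profile_dir:
            firefox = subprocess.Popen(firefox_command(url, profile_dir), env=env)
            try:
                time.sleep(wait_seconds)  # crude: wait for the page to load
                # Equivalent of: xwd -root | convert xwd:- -trim out.png
                xwd = subprocess.Popen(
                    ["xwd", "-root"], env=env, stdout=subprocess.PIPE
                )
                subprocess.run(
                    ["convert", "xwd:-", "-trim", out_path],
                    stdin=xwd.stdout,
                    check=True,
                )
                xwd.stdout.close()
                xwd.wait()
            finally:
                firefox.terminate()
                firefox.wait()
    finally:
        xvfb.terminate()
        xvfb.wait()
```

A call such as `capture("http://example.com/", "example.png", display=101)` would then run one capture on display :101. The X server only lives for the duration of the call, which avoids keeping one running all the time.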

  • Hans de Graaff 20 May 08:22

    I’ve written this as a ~200 line ruby script (which includes starting Xvfb and collecting and managing information on which screenshots to take).

    Using the GTK::MozEmbed bindings it really isn’t that hard, and the bindings even allow you to grab the pixbuf of the rendered page directly, when it is done rendering. It’s currently powering our http://watvindenwijover.nl/ service.

  • Chris 23 May 07:31

    hi tobias

    take a look at http://www.thumbalizr.com it is not free, but perhaps it fits your needs.

    cheers, chris

Comments are now closed…