Saturday, 19 February 2011

Headless HTML page rendering with phantomjs

What is phantomjs?

phantomjs is a headless browser that can render HTML pages into images. It uses the WebKit rendering engine. That's cool because the generated page images include everything that is loaded dynamically via JavaScript. Even Flash movies are shown as expected.

Building phantomjs

The phantomjs homepage contains some very useful hints about getting and building phantomjs. If something doesn't work out for you, read the comments at the end of the build instructions page; they contain many additional tips.

Rendering my first page

phantomjs is controlled using JavaScript commands. You can launch your phantomjs script files from the command line:

$ phantomjs myScript.js

phantomjs runs your script again each time a page has finished loading. To react differently on different pages, you have to track your current "state". You can persist your script's state in the variable phantom.state: phantomjs restores its value even after another page is loaded, whereas other JavaScript variables are lost when a new page loads. Initially the state is empty.

To load a page, call phantom.open(url). phantomjs will load the page at the specified URL and execute your script after the page's DOM is loaded.

In my opinion, the coolest function is phantom.render(path). phantomjs will save a "screenshot" of your current page to the specified path on your hard drive. The path's file extension determines the image file format. You can define the viewport's size by setting phantom.viewportSize.

When you're done with your phantom things, you have to call phantom.exit(returnCode) to quit. The returnCode is passed to the parent process as the exit status.
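The re-run model described above can be simulated in plain JavaScript. The following is only a sketch and not phantomjs itself: fakePhantom, runScript, and the example.com URL are hypothetical names made up for illustration. It shows that a value stored in the state survives a "page load", while an ordinary variable starts over on every run.

```javascript
// Plain-JavaScript simulation of the re-run model; this is NOT phantomjs.
// `fakePhantom` and `runScript` are hypothetical names used for illustration.
var fakePhantom = {
  state: '',          // survives page loads, like phantom.state
  pendingUrl: null,   // URL requested by the script, if any
  exited: false,
  open: function (url) { this.pendingUrl = url; },
  exit: function () { this.exited = true; }
};

var visited = [];

// The "script": runs from the top on every simulated page load.
function runScript(phantom) {
  var counter = 0;    // ordinary variable: starts over on every run
  counter += 1;
  visited.push(phantom.state + ':' + counter);

  if (phantom.state.length === 0) {
    phantom.state = 'first';
    phantom.open('http://example.com/');
  } else if (phantom.state === 'first') {
    phantom.exit();
  }
}

// Driver: run the script once at start, then once per "loaded" page.
runScript(fakePhantom);
while (fakePhantom.pendingUrl !== null && !fakePhantom.exited) {
  fakePhantom.pendingUrl = null;   // pretend the page has finished loading
  runScript(fakePhantom);
}

console.log(visited.join(', '));   // prints ":1, first:1"
```

The counter is 1 in both runs, but the state has moved from empty to 'first'. That is why the state is the only place to remember how far the script has come.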

Putting all these functions together in a script gives us the following:

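// First run: no state yet, so remember where we are and load the home page.
if(phantom.state.length === 0){
  phantom.state = '0_home';
  phantom.open('http://www.mini.de');
}
// Second run: the home page has loaded, so take the screenshot and quit.
else if(phantom.state === '0_home'){
  phantom.viewportSize = {width: 800, height: 600};
  phantom.sleep(2000); // wait before rendering so dynamic content can load
  phantom.render('home.png');
  phantom.exit(0);
}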

Executing the script produces a screenshot of www.mini.de in a file named 'home.png'. The image shows the main stage with a Flash movie and a footer with three dynamically loaded HTML fragments. You can see the result below:

Clicking links

"Clicking links" like a human user is simulated by firing mouse click events. The following listing defines the clickElement(id) function, which can be used to "click" elements. This allows us to execute simple page flows.

function clickElement(id){
  var a = document.getElementById(id);

  var e = document.createEvent('MouseEvents');
  e.initMouseEvent('click', true, true, window, 0, 0, 0, 0, 0, false, false, false, false, 0, null);

  a.dispatchEvent(e);
}

if(phantom.state.length !== 0){
  // save screenshot for every page / state
  phantom.viewportSize = {width: 800, height: 600};
  phantom.sleep(2000);
  phantom.render('screen_' + phantom.state + '.png');
}

if(phantom.state.length === 0){
  phantom.state = '0_home';
  phantom.open('http://www.mini.de');
}
else if(phantom.state === '0_home'){
  phantom.state = '1_config';

  clickElement('quicklink_id1');
}
else if(phantom.state === '1_config'){
  phantom.exit();
}

The script loads http://www.mini.de, which is the home page. After the home page has loaded, the element with the ID 'quicklink_id1' becomes the target of a click event. This element should be the 'MINI KONFIGURATOR' link in the footer.
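One thing the listing above does not handle is a missing element: if 'quicklink_id1' is not on the page, getElementById returns null and the dispatchEvent call throws. A guarded variant might look like the sketch below. The name clickElementIfPresent and the explicit doc and win parameters are my own additions for illustration; inside a page you would pass document and window.

```javascript
// Guarded variant of clickElement. `doc` and `win` are passed in explicitly
// (an assumption of this sketch) so the function can also be exercised
// outside a browser; inside a page you would pass `document` and `window`.
function clickElementIfPresent(doc, win, id) {
  var element = doc.getElementById(id);
  if (!element) {
    return false; // nothing to click; let the caller decide what to do
  }

  // Same synthetic left-click event as in clickElement
  var event = doc.createEvent('MouseEvents');
  event.initMouseEvent('click', true, true, win, 0, 0, 0, 0, 0,
                       false, false, false, false, 0, null);
  element.dispatchEvent(event);
  return true;
}
```

In the page flow, a false return value could be turned into a non-zero phantom.exit so that a missing link ends the run with an error instead of leaving the script waiting for a page that never loads.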

Asserting stuff

Within phantomjs scripts you can access your page's DOM, the global JavaScript variables and the global JavaScript functions. This enables us to do some kind of unit testing.

I'm extending the listing once more:

function clickElement(id){ ... }

function fail(msg){
  console.log(msg);

  phantom.exit(1);
}

function assert(condition, msg){
  if(condition){
    return;
  }

  fail(msg);
}

if(phantom.state.length !== 0){
  phantom.viewportSize = {width: 800, height: 600};
  phantom.sleep(2000);
  phantom.render('screen_' + phantom.state + '.png');
}

if(phantom.state.length === 0){
  phantom.state = '0_home';
  phantom.open('http://www.mini.de');
}
else if(phantom.state === '0_home'){
  phantom.state = '1_config';

  clickElement('quicklink_id1');
}
else if(phantom.state === '1_config'){
  assert(document.getElementById('cake'), 'I am missing the cake!');

  phantom.exit();
}

Check this out: phantomjs exits with a status code of 1 when the assertion fails. This can be used to do some custom post-processing in the shell:

$ phantomjs myScript.js && echo 'Test successful' || echo 'Test failed'
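The same check can be done from a script instead of the shell. The following is a minimal sketch for Node.js, which is an assumption on my part and not something phantomjs requires. The helper name runAndReport is made up, and the child process here is Node itself calling process.exit(1) as a stand-in for phantomjs, so the example runs without phantomjs installed.

```javascript
// Sketch for Node.js: run a child process and branch on its exit status.
// The child here is Node itself calling process.exit(1), standing in for
// `phantomjs myScript.js`; swap in the real command on your machine.
var spawnSync = require('child_process').spawnSync;

function runAndReport(command, args) {
  var result = spawnSync(command, args, { stdio: 'inherit' });
  // result.status holds the child's exit code (null if it did not exit normally)
  return result.status === 0 ? 'Test successful' : 'Test failed';
}

console.log(runAndReport(process.execPath, ['-e', 'process.exit(1)']));
```

With the real tool, the call would be runAndReport('phantomjs', ['myScript.js']), assuming phantomjs is on your PATH.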

Further information

For further information see the following pages. They are sorted by what I think is important.

Comments:

  1. Do you have a use case for PhantomJS? I prefer Selenium for similar use cases, and Selenium can test FF and IE too.

  2. I only know Selenium from the Java world. In my opinion Selenium itself is a very mature solution. It has its own FF IDE and Hudson server continuous integration support. Many "real life" problems have already been solved in Selenium.
    I think phantomjs can be the base technology for a new browser UI testing approach.

    What I like about phantomjs is that you can automagically generate "screenshots" of your page. I think this can be very handy when you have a composite page which is based on many smaller components. When I find some time I will experiment with making screenshots of these page components and comparing them with former screenshots. I hope this will allow me to refactor my CSS and see the side effects in all the affected components.

    Also, phantomjs scripts are written in pure JavaScript. There's no split between Java or PHP on the server and JavaScript on the client. Therefore I think phantomjs is especially appealing in a nodejs environment.

    My last pro of phantomjs is that you can run it without UI. At least almost (see http://code.google.com/p/phantomjs/wiki/QuickStart) :-)
    This makes it perfect for server based continuous integration solutions.

  3. I have a use case for it: I made a content editor for a kind of CMS, and I want a fancy feature where a user can see what he did as a little preview on the overview pages of his articles ;-) So phantomjs is the perfect way to programmatically capture a screenshot of what the user did in the editor and make a nice preview thumbnail of it on the main website.

  4. I wrote a tutorial on how to take a screenshot using phantomjs here: http://devlup.com/programming/php/capture-webpage-screenshot-using-php-and-phantomjs/4484/

  5. Thanks, I was going really "nuts" about clicking a link.

    So, the PhantomJS JS objects in the page context don't have any of the mouse-related events, am I right?

    All of the other answers I've found around the globe are based on external libraries.

  6. Hi,
    I want to generate HTML pages depending on page events using phantomjs.
    Can anyone help me with how I can do this?
