2010/10/15

Using BaseX to grep through Charles output

BaseX is a fantastic tool for grepping through large XML files; it speeds up queries by creating indices for text, attributes and path summaries. I use it to analyze data generated by Charles, an HTTP proxy / HTTP monitor / reverse proxy. Both tools are written in Java, so they should run anywhere a Java runtime is available. Charles is not open source, but it comes at a reasonable price and may be used as a trial version for 30 days, after which you get a nagging dialog.

Charles may be used as a simple tool to run stress tests. Just choose it as a proxy, run your usual use cases and export the data to XML. Two of the power features Charles offers are man-in-the-middle inspection of SSL connections (by importing the Charles root CA certificate) and modifying your requests on the fly, so that they use the test systems of new software instead of the live ones. Afterwards you can check the output by grepping for your expected results in BaseX with XQuery or XPath. An example:

  • Start the Firefox profile manager with /Applications/Firefox.app/Contents/MacOS/firefox-bin -profileManager and create a new profile called Charles.
  • Download and install Charles' Firefox extension by visiting the download site and restart Firefox after installation.
  • Install the Charles CA certificate; Charles offers this in the Tools menu of Firefox.
  • Make sure Charles is running and choose to proxy Firefox in its Proxy menu.
  • Enable Charles in the Tools menu of Firefox. You should now see requests coming through Charles.
  • Search for hgkit in Google.
  • Drill down in Charles' tree view and find the request http://www.google.de/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=de&source=hp&q=hgkit&meta=&btnG=Google-Suche.
  • From the context menu of this request choose Repeat advanced and enter 100 iterations with 5 concurrent requests making sure to use a new session.
  • Export the new session in Charles as google-hgkit.xml.
  • Now start BaseX and create a new database referencing google-hgkit.xml. If you encounter the error "Invalid byte 1 of 1-byte UTF-8 sequence", switch to the built-in parser in the Parsing tab. Make sure Options/Realtime execution is enabled.
  • Analyze your data:
    • A search for /charles-session/transaction should result in 100 hits.
    • A search for /charles-session/transaction/response[@status="200"] should result in 100 hits.
    • As the HTML returned by the search is stored escaped in the body element, write markup characters in your search string as XML entities (e.g. &lt; and &quot;) when searching through the body.
    • A search for /charles-session/transaction/response[@status="200"]/body[contains(text(), "this_surely_will_not_show_up_will_it_dsddada")] should result in 0 hits.
    • A search for /charles-session/transaction/response[@status="200"]/body[contains(text(), "&lt;a href=&quot;http://hgkit.berlios.de/")] should result in 100 hits.
    • Return the durations of successful requests in milliseconds, in descending order:
      for $y in (for $x in /charles-session/transaction
        where $x/response/@status="200"
        return ($x/@endTimeMillis - $x/@startTimeMillis))
      order by $y descending
      return $y
    • Return the durations of all successful requests that took more than 300 milliseconds:
      for $y in (for $x in /charles-session/transaction
        where $x/response/@status="200"
        return ($x/@endTimeMillis - $x/@startTimeMillis))
      where $y > 300
      order by $y descending
      return $y
    • Return the count for the above requests:
      let $times := (for $x in /charles-session/transaction 
        where $x/response/@status="200" 
        return ($x/@endTimeMillis - $x/@startTimeMillis))
      let $slowQueries := for $y in ($times) where $y > 300 return $y
      return count($slowQueries)
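
To double-check what the timing queries compute, the same logic can be sketched outside BaseX. The following Python snippet is only an illustration under assumptions: SAMPLE is a hypothetical, heavily reduced stand-in for a Charles export (a real export carries many more elements and attributes), and the helper names successful_durations and count_slow are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a Charles XML export. Only the elements and
# attributes referenced by the queries in this post are included.
SAMPLE = """<charles-session>
  <transaction startTimeMillis="1000" endTimeMillis="1250">
    <response status="200"><body>ok</body></response>
  </transaction>
  <transaction startTimeMillis="2000" endTimeMillis="2400">
    <response status="200"><body>ok</body></response>
  </transaction>
  <transaction startTimeMillis="3000" endTimeMillis="3900">
    <response status="500"><body>error</body></response>
  </transaction>
</charles-session>"""


def successful_durations(root):
    """Durations (ms) of transactions answered with status 200, slowest first."""
    durations = []
    for tx in root.findall("transaction"):
        response = tx.find("response")
        if response is not None and response.get("status") == "200":
            start = int(tx.get("startTimeMillis"))
            end = int(tx.get("endTimeMillis"))
            durations.append(end - start)
    return sorted(durations, reverse=True)


def count_slow(durations, threshold_ms=300):
    """Number of durations strictly above the threshold."""
    return sum(1 for d in durations if d > threshold_ms)


root = ET.fromstring(SAMPLE)
durations = successful_durations(root)
print(durations)              # durations of the two successful requests
print(count_slow(durations))  # how many of them exceeded 300 ms
```

With real data you would parse the exported file (for instance with ET.parse) instead of the inline sample. For large exports BaseX and its indices remain the more comfortable choice.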
      

You could also trigger a second search with a different search term and verify that the search results are not mixed up, e.g. by querying
/charles-session/transaction[contains(@query, "q=hgkit")]/response[@status="200"]/body[contains(text(), "&lt;a href=&quot;http://hgkit.berlios.de/")].
You may select more than one request in Charles for repetition. For further instructions on XPath I recommend the W3Schools tutorial.
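
The "not mixed up" check can also be turned around: instead of counting matching bodies, list the successful responses for a search term whose body lacks the expected link. The Python sketch below illustrates that idea on hypothetical data. SAMPLE is an invented miniature of a Charles export, the example.org link and the q=basex term are placeholders, and the function name mismatched is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a Charles XML export with two search terms.
# The bodies are XML-escaped in the file, just as in the real export.
SAMPLE = """<charles-session>
  <transaction query="q=hgkit">
    <response status="200"><body>&lt;a href="http://hgkit.berlios.de/"&gt;hgkit&lt;/a&gt;</body></response>
  </transaction>
  <transaction query="q=basex">
    <response status="200"><body>&lt;a href="http://example.org/"&gt;other&lt;/a&gt;</body></response>
  </transaction>
  <transaction query="q=hgkit">
    <response status="200"><body>&lt;a href="http://example.org/"&gt;other&lt;/a&gt;</body></response>
  </transaction>
</charles-session>"""


def mismatched(root, term, expected):
    """Indices of transactions for `term` whose 200 response body lacks `expected`."""
    bad = []
    for index, tx in enumerate(root.findall("transaction")):
        if term not in tx.get("query", ""):
            continue  # belongs to a different search
        for response in tx.findall("response[@status='200']"):
            # After parsing, the escaped body is plain text again.
            body = response.findtext("body", default="")
            if expected not in body:
                bad.append(index)
    return bad


root = ET.fromstring(SAMPLE)
suspicious = mismatched(root, "q=hgkit", '<a href="http://hgkit.berlios.de/')
print(suspicious)  # positions of hgkit searches that returned the wrong page
```

An empty list means every successful response for that term contained the expected link.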