Friday, September 08, 2006

HTTP Partial Get

Part of our application allows the user to drag/drop document links onto a window. When the link is a web page shortcut, I wanted to get the page title as the description for that link. The only way I could find to do this was to download the page using an HttpWebRequest object, and parse for the <title> tag.

This is OK, except that I didn't want to download an entire page just to get a piece of information that, if present, will be somewhere near the top. Enter the partial GET capability of HTTP/1.1. If you add a Range header to the GET request, the server should return only the specified byte range of the document. Here is the code I used.

 

   ' Requires: Imports System.Net
   Private Function GetWebPageTitle(ByVal url As String) As String

       Dim request As HttpWebRequest
       Dim response As WebResponse
       Const bytesToGet As Integer = 1000

       request = DirectCast(WebRequest.Create(url), HttpWebRequest)
       ' Ask for the first bytesToGet bytes only; both ends of the range are inclusive
       request.AddRange(0, bytesToGet - 1)
       response = request.GetResponse()

       ...

   End Function
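
The listing stops short of the parsing step. Purely as an illustration (this is not the code from the application), here is a minimal sketch of how a title might be pulled out of whatever HTML comes back. ExtractTitle and the TitleSketch module are names I made up for the example, WebUtility.HtmlDecode assumes .NET 4.0 or later, and a regular expression is a shortcut here rather than a real HTML parser.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

Module TitleSketch

    ' Hypothetical helper: returns the text of the first <title> element in a
    ' (possibly truncated) chunk of HTML, or Nothing if no complete title is found.
    Function ExtractTitle(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return Nothing

        ' IgnoreCase accepts <TITLE> as well as <title>;
        ' Singleline lets "." match line breaks inside the title.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim m As Match = Regex.Match(html, "<title\b[^>]*>(.*?)</title>", options)
        If Not m.Success Then Return Nothing

        ' Decode entities such as &amp; and collapse runs of whitespace.
        Dim raw As String = WebUtility.HtmlDecode(m.Groups(1).Value)
        Return Regex.Replace(raw, "\s+", " ").Trim()
    End Function

End Module
```

If the closing </title> tag falls beyond the bytes that were fetched, the function returns Nothing, so the caller would need a fallback description (the URL itself, say).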

 

However, the operative word here is "should": honouring the Range header is not mandatory behaviour for a server. Of the half-dozen web sites I tried (including microsoft.com), only one took any notice of my range request.

Some things to note about its response:

  • The response code is 206 (Partial Content)
  • Content-Length = 1000
  • Content-Range = bytes 0-999 (the first 1000 bytes) out of a total of 126,444 bytes
  • Accept-Ranges: bytes tells us the server recognizes the Range header in the GET request (which we now know, but we could have issued a HEAD request to find this out beforehand)
  • The well-behaved web site is w3.org (of course!)
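
Since most of the servers I tried ignored the range, a client has to cope with both a 206 and a plain 200 response. The following is a minimal sketch of one way to do that, not the code from the application: it stops reading after the requested number of bytes whichever status comes back, and it parses the total length out of a Content-Range value. ReadAtMost, TotalLengthFromContentRange, FetchPrefix and the RangeSketch module are all hypothetical names. Whether the rest of a full 200 response still crosses the wire depends on the runtime and the server, so treat this as a cap on parsing work rather than a guaranteed bandwidth saving.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module RangeSketch

    ' Hypothetical helper: reads at most maxBytes from a stream.
    ' Returns fewer bytes if the stream ends first.
    Function ReadAtMost(ByVal source As Stream, ByVal maxBytes As Integer) As Byte()
        Dim data(maxBytes - 1) As Byte
        Dim total As Integer = 0
        While total < maxBytes
            Dim n As Integer = source.Read(data, total, maxBytes - total)
            If n = 0 Then Exit While ' end of stream
            total += n
        End While
        ReDim Preserve data(total - 1)
        Return data
    End Function

    ' Hypothetical helper: takes a Content-Range value such as
    ' "bytes 0-999/126444" and returns the total length, or -1 if the value
    ' is missing, malformed, or reports an unknown total ("*").
    Function TotalLengthFromContentRange(ByVal value As String) As Long
        If value Is Nothing Then Return -1
        Dim m As Match = Regex.Match(value.Trim(), "^bytes\s+[0-9]+-[0-9]+/([0-9]+)$", RegexOptions.IgnoreCase)
        If Not m.Success Then Return -1
        Dim total As Long
        If Long.TryParse(m.Groups(1).Value, total) Then Return total
        Return -1
    End Function

    ' Sketch of how the pieces might fit together (needs network access).
    Function FetchPrefix(ByVal url As String, ByVal bytesToGet As Integer) As Byte()
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AddRange(0, bytesToGet - 1)
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            ' StatusCode is PartialContent (206) if the range was honoured and
            ' OK (200) if the server is sending the whole page; cap the read either way.
            Using body As Stream = response.GetResponseStream()
                Return ReadAtMost(body, bytesToGet)
            End Using
        End Using
    End Function

End Module
```

The bytes returned by FetchPrefix would still need decoding into a string before any title parsing; which encoding to use depends on the page and is left out here.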

(The screenshot is from Wireshark, formerly Ethereal.)
