Part of our application allows the user to drag/drop document links onto a window. When the link is a web page shortcut, I wanted to get the page title as the description for that link. The only way I could find to do this was to download the page using an HttpWebRequest object, and parse for the <title> tag.
This is OK, except that I didn't want to download an entire page just to get a piece of information that, if present, will be somewhere near the top. Enter the partial GET capability of HTTP/1.1. If you add a Range header to the GET request, the server should return only the specified byte range of the document. Here is the code I used.
' Requires Imports System.Net at the top of the file
Private Function GetWebPageTitle(ByVal url As String) As String
    Dim request As HttpWebRequest
    Dim response As WebResponse
    Const bytesToGet As Integer = 1000
    request = DirectCast(WebRequest.Create(url), HttpWebRequest)
    ' Ask for the first bytesToGet bytes only (sends "Range: bytes=0-999")
    request.AddRange(0, bytesToGet - 1)
    response = request.GetResponse()
    ...
End Function
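The elided part of the function is not shown here, so the following is only a sketch of one way the rest might look, not the code actually used. It assumes the page is UTF-8 encoded and that a simple regular expression is good enough to find the title; the module name TitleSketch and the helper ExtractTitle are hypothetical names introduced for illustration. Because a server may ignore the Range header, the sketch stops reading after bytesToGet bytes regardless of how much the server sends.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module TitleSketch
    ' Hypothetical helper: returns the text between <title> and </title>,
    ' or Nothing if no complete title element is in the supplied HTML.
    Function ExtractTitle(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, "<title[^>]*>(.*?)</title>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return Nothing
    End Function

    Function GetWebPageTitle(ByVal url As String) As String
        Const bytesToGet As Integer = 1000
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AddRange(0, bytesToGet - 1)

        Using response As WebResponse = request.GetResponse()
            Using stream As Stream = response.GetResponseStream()
                Dim buffer(bytesToGet - 1) As Byte
                Dim total As Integer = 0
                ' Read at most bytesToGet bytes, even if the server sends the whole page.
                While total < bytesToGet
                    Dim n As Integer = stream.Read(buffer, total, bytesToGet - total)
                    If n = 0 Then Exit While
                    total += n
                End While
                ' Abort so the remainder of a full-page response is not downloaded.
                request.Abort()
                ' Assumption: the page is UTF-8; a real page may declare another charset.
                Return ExtractTitle(Encoding.UTF8.GetString(buffer, 0, total))
            End Using
        End Using
    End Function
End Module
```

If the title happens to sit beyond the first 1000 bytes, ExtractTitle returns Nothing, so a caller would need a fallback such as using the URL itself as the description.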
However, the operative word here is should (i.e. it's not mandatory behaviour). In half-a-dozen web sites I tried (including microsoft.com), only one took any notice of my range request:
Some things to note:
- The response code is 206 (Partial Content)
- Content-Length = 1000
- Content-Range = bytes 0-999/126444 (the first 1000 bytes of a total of 126,444)
- Accept-Ranges: Bytes tells us the server recognizes the Range header in the GET request (which we now know, but we could have issued a HEAD request to find this out)
- The well-behaved web site is w3.org (of course!)
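The same observations can be made from code rather than from a packet capture. The sketch below is an illustration under assumptions, not part of the original application: the module name RangeCheckSketch and the functions HonoursRange and AdvertisesByteRanges are hypothetical. The first checks for a 206 status after a ranged GET; the second issues a HEAD request and looks at the Accept-Ranges header.

```vbnet
Imports System
Imports System.Net

Module RangeCheckSketch
    ' Hypothetical helper: True when the server answered a ranged GET
    ' with 206 (Partial Content) instead of sending the whole page.
    Function HonoursRange(ByVal url As String) As Boolean
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AddRange(0, 999)
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Console.WriteLine("Status: {0} ({1})", CInt(response.StatusCode), response.StatusCode)
            Console.WriteLine("Content-Range: {0}", response.Headers("Content-Range"))
            Dim isPartial As Boolean = (response.StatusCode = HttpStatusCode.PartialContent)
            ' Abort so an ignored Range does not cause the full body to be fetched.
            request.Abort()
            Return isPartial
        End Using
    End Function

    ' Hypothetical helper: True when a HEAD request reports "Accept-Ranges: bytes".
    Function AdvertisesByteRanges(ByVal url As String) As Boolean
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "HEAD"
        Using response As WebResponse = request.GetResponse()
            Dim value As String = response.Headers("Accept-Ranges")
            Return value IsNot Nothing AndAlso value.Trim().ToLowerInvariant() = "bytes"
        End Using
    End Function
End Module
```

A server is not required to send Accept-Ranges even when it supports ranges, so a False result from AdvertisesByteRanges is weaker evidence than an actual 206 response from HonoursRange.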
(The screenshot is from Wireshark, formerly Ethereal.)