BBC Radio Downloader Project in C#
(by Lou Brown, last updated 28-DEC-2009)

The Brief
This is a guide for C# developers who want to create an iRecorder - an application that downloads radio programmes from the BBC iPlayer website. I will start by describing the technical hurdles and how they can be leaped, and finish with a description of the application flow.

The code for this project can be found in "Program.cs", a single C# file that compiles as a console application. The function names and code shown on this page are taken from this file. I have compiled and run this file on XP with VS2005.

An understanding of HTTP, socket communications and XML is required. However, these functions are neatly encapsulated in the .NET libraries, and I have commented the code to make it clear what is going on.

Throughout this guide, the word "programme" refers to a radio programme, such as "Woman's Hour". I will use the term "application" to refer to the compiled executable.

What is our brief? For a programme specified by the user, connect to the BBC's iPlayer website, download the streamed programme, and store it in a file (the "target file").

The RTMP Client Way
The BBC stream their programmes for a variety of machines - PCs and Apple Macs, mobile phones and iPods. Although iPlayer makes MP3 files available for download to mobile phones, the primary streaming protocol for PCs and Macs is RTMP (Real Time Messaging Protocol). RTMP is a streaming protocol that serves audio and video data from a media server to an RTMP client. The most widely used RTMP client is Adobe's Flash Player. If you watch TV on the Internet, or YouTube videos, the chances are you are using Flash Player and RTMP or one of its variations.

Although RTMP is the dominant streaming protocol, it is owned by Adobe, and the official RTMP specification is the sort of specification that only makes sense to someone with a thorough understanding of RTMP. If you want to write an RTMP client class, I suggest you make sure you have a lot of time, a GSOH, and are prepared to scour the Internet for hints and clues as to the mysteries of the RTMP message types. Alternatively, you could use our RTMP Client class for .NET developers.

Another drawback with RTMP streaming is that the audio is often delivered as packets of AAC (Advanced Audio Coding). AAC is an audio compression format, once named as the successor to MP3. When AAC is streamed, the packets sent are the frames of data found in the middle of an AAC file. So to make an MP3 file out of the AAC packets, one must combine all of the AAC packets with a valid AAC file header and footer into an AAC file, and then convert the AAC file to an MP3. It would be a lot easier if we could just download the file as a complete MP3 file. As I've mentioned above, the iPlayer only makes MP3 available for mobile phones, so either we learn about the internals of an AAC file, or we learn how to pretend to be a mobile phone.

Pretending to be a Mobile Phone (HTTP Requests and Cookies)
I'll start by briefly explaining how HTTP and Cookies work. (Most of you will know, so you can skip this bit.)

When you open a website with your browser (your browser might be Firefox or Internet Explorer), your browser sends a request to the server that hosts the website. The first request is usually for the content of the homepage. Once the homepage content is received, the browser requests any other data, such as images, that are contained in the website. The requests and responses that traverse the Internet connection are in HTTP format - a simple textual format.

This type of connection, between a browser and the web server, is known as a "stateless session" - the website server cannot tell the difference between your browser's requests and anyone else's. An analogy would be if a man called Tony sat behind a screen all day with a loud-hailer, and on the other side of the screen there are loads of people shouting out things like "What time is it?" and "What colour are your shoes?". Tony shouts out the answer to each question, but he never knows who's asking. If Mary, for example, shouts "How much money is in my bank account?" Tony cannot answer because he doesn't know who's asking. Here's Mary's question in HTTP format:

GET /whats_in_my_bank_account.html HTTP/1.1\r\n
Accept-Language: en\r\n
Connection: keep-alive\r\n
Accept: */*\r\n
User-Agent: Microsoft Internet Explorer\r\n
Host: www.man-with-loud-hailer.com\r\n
Pragma: no-cache\r\n
\r\n

The "\r\n" signifies newline characters. The top line of the request (the "GET...") is Mary's question. There is nothing in this request that identifies the requester as Mary. But if a small piece of data identifying Mary is attached to the HTTP request, Tony would know who the request was from. The bits of data that identify a user to a website are called Cookies. Here is the HTTP request with a cookie attached.

GET /whats_in_my_bank_account.html HTTP/1.1\r\n
Accept-Language: en\r\n
Connection: keep-alive\r\n
Cookie: MY-NAME=Mary\r\n
Accept: */*\r\n
User-Agent: Microsoft Internet Explorer\r\n
Host: www.man-with-loud-hailer.com\r\n
Pragma: no-cache\r\n
\r\n

The cookie has made this a "stateful session". It has given the session "statefulness".

Where does this cookie come from? It comes from Tony. When Mary makes her first request to Tony, the equivalent to opening Tony's home page, Tony's response would have included the cookie. When you open the homepage of a website, this is when the cookie value is defined. Cookies are often talked about as if they were little files of personal details that attach themselves to outgoing web traffic, but this is not realistic.
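
As a rough illustration of how the cookie arrives, here is a minimal sketch that pulls the cookie out of the text of a raw HTTP response. It assumes the server uses the standard "Set-Cookie" response header, and the class and function names (CookieParser, ExtractCookie) are mine, not taken from Program.cs, which may locate the cookie differently.

```csharp
using System;

public static class CookieParser
{
    // Returns the "NAME=value" part of the first Set-Cookie header in a raw
    // HTTP response, or null if the response headers set no cookie.
    // (Hypothetical helper for illustration; not part of Program.cs.)
    public static string ExtractCookie(string rawResponse)
    {
        // The headers end at the first blank line; ignore the body after it.
        int headerEnd = rawResponse.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string headers = headerEnd >= 0 ? rawResponse.Substring(0, headerEnd) : rawResponse;

        const string prefix = "Set-Cookie:";
        foreach (string line in headers.Split(new string[] { "\r\n" }, StringSplitOptions.None))
        {
            if (line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                string value = line.Substring(prefix.Length).Trim();
                // Attributes such as "path=/" follow the first semicolon.
                int semicolon = value.IndexOf(';');
                return semicolon >= 0 ? value.Substring(0, semicolon).Trim() : value;
            }
        }
        return null;
    }
}
```

Given Tony's first response, ExtractCookie would hand back "MY-NAME=Mary", which is exactly the text Mary then attaches to her later requests in a "Cookie:" line.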

The HTTP request-response has been encapsulated in the Microsoft .NET libraries as the classes HttpWebRequest and HttpWebResponse. But we won't be using them for our requests to the iPlayer mobile pages. Why? Because if you look at the HTTP in the above query, there is a "User-Agent" header which tells the website which type of browser we are using. In this case "Microsoft Internet Explorer". If we want to pretend to be a mobile phone, this "User-Agent" kind of gives the game away. We wouldn't have Microsoft Internet Explorer installed on a mobile phone. A mobile phone would have a User-Agent that looked more like this:

Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en)

And since we're looking at large chunks of gobbledygook, you should know cookies don't look like this "MY-NAME=Mary". They look like this:

GROUP-UFZZN=946b41c67329a2c53979217a711bb0de6e756e9710a021f
4444f30029cc8386c0Mozilla%2f5%2e0%20%28iPhone%3b%20U%3b%20C
PU%20like%20Mac%20OS%20X%3b%20en%29%20AppleWebKit%2f420%2b%
20%28KHTML%2c%20like%20Gecko%29%20Version%2f3%2e0%20Mobile%
2f1A543a%20Safari%2f419%2e3

Attractive!

We can retrieve a cookie from the iPlayer website by sending a request containing the User-Agent of a mobile phone's browser. For all subsequent requests to the iPlayer website we will be rolling our own HTTP, which will include the cookie, and the User-Agent for a mobile phone or mobile phone browser.
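
To make "rolling our own HTTP" concrete, here is a minimal sketch that assembles the text of a GET request with a chosen User-Agent and an optional cookie. The names RawHttp and BuildGetRequest are hypothetical, and the use of "Connection: close" (so the server closes the socket when it has finished sending) is my own simplification rather than something taken from Program.cs.

```csharp
using System;
using System.Text;

public static class RawHttp
{
    // Builds the text of an HTTP/1.1 GET request. A null or empty cookie is
    // simply left out, which is what we want for the very first request.
    // (Hypothetical helper for illustration; not part of Program.cs.)
    public static string BuildGetRequest(string host, string path, string userAgent, string cookie)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("GET " + path + " HTTP/1.1\r\n");
        sb.Append("Host: " + host + "\r\n");
        sb.Append("User-Agent: " + userAgent + "\r\n");
        sb.Append("Accept: */*\r\n");
        if (!string.IsNullOrEmpty(cookie))
        {
            sb.Append("Cookie: " + cookie + "\r\n");
        }
        // Assumption: ask the server to close the connection after replying.
        sb.Append("Connection: close\r\n");
        // A blank line marks the end of the request headers.
        sb.Append("\r\n");
        return sb.ToString();
    }

    // HTTP request headers are plain ASCII text, so this encoding is enough.
    public static byte[] ToBytes(string request)
    {
        return Encoding.ASCII.GetBytes(request);
    }
}
```

Calling BuildGetRequest with the iPhone User-Agent shown earlier and a null cookie gives the first request, and calling it again with the returned cookie gives every request after that.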

Where Is The Stream Of Audio Data?
We start with a programme id - a PID. These PIDs can be found on the BBC iPlayer website. If you roll your mouse over a programme link you will see a code in the form "b00XXXXX", an eight character lower case alphanumeric code, prefixed with b00. This is the programme code.
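
The PID format described above is easy to check before we go anywhere near the network. The sketch below assumes the format is exactly "b00" followed by five lower-case letters or digits, as stated; the class name PidValidator is hypothetical and the real check in Program.cs may differ.

```csharp
using System.Text.RegularExpressions;

public static class PidValidator
{
    // "b00" followed by five lower-case letters or digits, eight characters
    // in total. \A and \z anchor the match to the whole string.
    // (Assumed format, based only on the description in this guide.)
    private static readonly Regex PidPattern = new Regex(@"\Ab00[a-z0-9]{5}\z");

    public static bool IsValidPid(string pid)
    {
        return pid != null && PidPattern.IsMatch(pid);
    }
}
```

If the BBC ever issue PIDs that do not start with "b00", this pattern would need loosening.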

In BBC-speak, a programme cannot be downloaded, only an episode can be downloaded. So we need the location of the stream of data for the episode of our programme. This can be obtained from the page:

http://www.bbc.co.uk//mobile/iplayer/episode/<PID>

PID is the programme id, and the page returned is an HTML file which includes two pieces of information - the episode stream location URL, and a cookie for our session with the iPlayer website.

(Note: if we try to open this page in a browser on a PC we won't get anything useful back, because we are not using a mobile phone. Remember that all of our requests are sent whilst pretending to be a mobile phone.)

A typical episode stream location looks something like this:

http://download.iplayer.bbc.co.uk/iplayer_streaming_http_mp4
/5407399044672966201.mp4?token=iVXazJp%2BTtMualAm Hk4kMqdjtS%2F
1VaSft%2FmuLGJh8l2kCjSxqtss9K6%2Fy%2BtGoRq2f5NrcvGkSRuU%0Aos76
%2BxAG5YQUF2sbc1i%2Br4 sp3brWmLiZuOqjj4HusbLOUgt9Z%2FmGNBdQK
g%3D%3D%0A


Now we know how to locate the audio-data stream. We can initiate a flow of data by sending an HTTP request to the stream-location, adding a range parameter to determine the data we want. For example, we could add the following:

Range: bytes=10000-64000\r\n

This would return 54,001 bytes of data (byte ranges are inclusive at both ends), starting at position 10,000. If we want to know the length of the stream, we can add the line:

Range: bytes=0-1\r\n

This returns an HTTP response in which the Content-Range header gives the total length of the stream after the "/", in the form "bytes 0-1/<length>".
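
Here is a minimal sketch of reading the stream length out of a Content-Range value. It relies on the standard HTTP form "bytes <first>-<last>/<total>", where the total may be "*" if the server does not know it. The names ContentRange and ParseTotalLength are hypothetical and not taken from Program.cs.

```csharp
using System;

public static class ContentRange
{
    // Takes the value of a Content-Range header, e.g. "bytes 0-1/5407399",
    // and returns the total length after the slash. Returns -1 when the
    // value is missing, malformed, or the total is unknown ("*").
    // (Hypothetical helper for illustration; not part of Program.cs.)
    public static long ParseTotalLength(string headerValue)
    {
        if (headerValue == null)
        {
            return -1;
        }
        int slash = headerValue.LastIndexOf('/');
        if (slash < 0)
        {
            return -1;
        }
        string total = headerValue.Substring(slash + 1).Trim();
        long length;
        if (long.TryParse(total, out length) && length >= 0)
        {
            return length;
        }
        return -1;
    }
}
```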

The stream data may be very large, so we download it in chunks, requesting successive chunks with a maximum size of 32MB.
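
The chunking arithmetic can be sketched as follows: given the stream length and a chunk size, produce the Range line for each successive request. I am assuming the chunk size means 32 x 1024 x 1024 bytes; the names ChunkPlanner and BuildRangeHeaders are hypothetical, and Program.cs may organise its download loop differently.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Assumption: the chunk size is 32 x 1024 x 1024 bytes.
    public const long MaxChunkSize = 32L * 1024 * 1024;

    // Returns one "Range: bytes=first-last" line per chunk, covering the
    // whole stream in order. Both ends of each range are inclusive, so the
    // last byte of a chunk is (start + size - 1).
    public static List<string> BuildRangeHeaders(long streamLength, long chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException("chunkSize");
        }
        List<string> headers = new List<string>();
        for (long start = 0; start < streamLength; start += chunkSize)
        {
            long end = Math.Min(start + chunkSize, streamLength) - 1;
            headers.Add("Range: bytes=" + start + "-" + end + "\r\n");
        }
        return headers;
    }
}
```

For a 10-byte stream and a chunk size of 4, this yields the ranges 0-3, 4-7 and 8-9, with the final chunk shorter than the rest.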

Socket Communications
There are four functions provided in the Program.cs application file to assist with socket communications. By trapping exceptions for each socket function instead of wrapping whole socket sessions in a single try-catch, error messages can indicate exactly where the exceptions occur.

The four socket functions are prefixed with "_LowLevel" because they are only called from within private functions.

_LowLevel_OpenSocketConnection Create a socket object, resolve the host name, and connect the socket.

_LowLevel_SendSocketData Send an array of bytes using the socket.

_LowLevel_ReadSocketData Wait for data to arrive. Data received is added to the "bigBuffer" byte array.

_LowLevel_SendRawSocketRequest This function uses all three of the above functions to connect to a host, send a request, wait for the results, and close the socket connection.
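
To give a feel for the shape of these functions, here is a minimal sketch of the connect, send, read and close sequence using the .NET TcpClient class, with a separate try-catch around each step so the error message says which step failed. It is not the code from Program.cs: the names RawSocketSketch, ReadToEnd and SendRawRequest are mine, and it assumes the request asks the server to close the connection when it has finished (for example with "Connection: close"), so that reading until end-of-stream collects the whole response.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class RawSocketSketch
{
    // Reads from the stream until the other side closes it, collecting
    // everything received into one byte array (the "bigBuffer" idea).
    public static byte[] ReadToEnd(Stream stream)
    {
        MemoryStream bigBuffer = new MemoryStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            bigBuffer.Write(chunk, 0, read);
        }
        return bigBuffer.ToArray();
    }

    // Connects to the host, sends the request bytes, reads the response
    // and closes the connection. Each step traps its own exceptions.
    public static byte[] SendRawRequest(string host, int port, byte[] request)
    {
        TcpClient client;
        try
        {
            // This constructor resolves the host name and connects.
            client = new TcpClient(host, port);
        }
        catch (SocketException ex)
        {
            throw new IOException("Connect to " + host + " failed: " + ex.Message, ex);
        }

        try
        {
            NetworkStream stream = client.GetStream();
            try
            {
                stream.Write(request, 0, request.Length);
            }
            catch (IOException ex)
            {
                throw new IOException("Send failed: " + ex.Message, ex);
            }

            try
            {
                return ReadToEnd(stream);
            }
            catch (IOException ex)
            {
                throw new IOException("Read failed: " + ex.Message, ex);
            }
        }
        finally
        {
            client.Close();
        }
    }
}
```

Keeping the read loop separate from the socket means it can be exercised against any Stream, such as a MemoryStream, without a network connection.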

Application Flow
Now we know all we need to know, let's plot an application flow. Below is a table of the application functions in the same order they appear in the example code Program.cs.

A Read the application command-line parameters ("arguments"). To download a programme we definitely need the PID from the user. If one isn't provided as an argument in the form "-pid=b00XXXXX", the user can enter one. If the format of the PID entered is rubbish, there's no point going any further.

Also, there is a default target directory of "c:\". The user may want to store the programme somewhere else, so have an option of specifying an alternative target directory in the form "-targetfolder=<path>". This folder must exist.

_ReadApplicationArguments
B Look up the episode information for the PID using the .NET HttpWebRequest and HttpWebResponse classes. The returned data is in XML format, from which we retrieve the episode ID, the episode title, a summary of the episode, and what type of programme it is (radio or television).

_GetEpisodeDetails
C Combine the episode data with the target folder to make a unique filepath for the target file.

_MakeUniqueFilepath
D Open the mobile-phone homepage of the iPlayer website "/mobile/iPlayer", whilst pretending that the browser is Mozilla running on an iPhone. The response is an HTML file containing the cookie and stream location. If the programme contains adult content, a POST will be made to confirm that we are 16 years old or more.

_GetEpisodeStreamLocationAndCookie
E Use the stream-location to request the length of the programme data. We can use this later to plot the progress of our download.

_QueryStreamLength
F Open the target file, and download the stream of audio-data into it. This is done in chunks of 32MB. Plot progress as data is received, and bail out on errors.

_DownloadStream
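
As an example of the first step in the table, here is a minimal sketch of reading the "-pid=" and "-targetfolder=" arguments, falling back to the default target directory of "c:\". The class name AppArguments is hypothetical and this is not the body of _ReadApplicationArguments. Prompting the user for a missing PID, checking its format, and checking that the folder exists are left to the caller.

```csharp
using System;

public class AppArguments
{
    // Null until a "-pid=" argument is seen.
    public string Pid;

    // Default target directory, as described in step A.
    public string TargetFolder = "c:\\";

    // Picks out "-pid=<value>" and "-targetfolder=<path>" from the command
    // line. Unrecognised arguments are ignored in this sketch.
    public static AppArguments Parse(string[] args)
    {
        const string pidPrefix = "-pid=";
        const string folderPrefix = "-targetfolder=";

        AppArguments result = new AppArguments();
        foreach (string arg in args)
        {
            if (arg.StartsWith(pidPrefix, StringComparison.OrdinalIgnoreCase))
            {
                result.Pid = arg.Substring(pidPrefix.Length);
            }
            else if (arg.StartsWith(folderPrefix, StringComparison.OrdinalIgnoreCase))
            {
                result.TargetFolder = arg.Substring(folderPrefix.Length);
            }
        }
        return result;
    }
}
```

A caller would typically pass the "args" array from Main straight to Parse, then ask the user for a PID if the Pid field is still null.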


More Information
Here are some links for further reading:

RtmpClient class for .NET developers

BBC iPlayer

Wikipedia - RTMP

Wikipedia - AAC

HttpWebRequest and HttpWebResponse


NOTEBOOK SUPPORT
If you have any questions or suggestions, or have found an error, or would like more explanation of one of the items in the Software Developers Notebook please contact us by email.

Software notebook support: enquiries@broccoliproducts.com


Broccoli Products Ltd © 1998-2010 Broccoli Products Ltd
Reg Number: 2895355
Reg Office: 27 Old Gloucester Street, London. WC1N 3AX