BBC Radio Downloader Project in C#
(by Lou Brown, last updated 28-DEC-2009)

The Brief

This is a guide for C# developers who want to create an iRecorder - an application that downloads radio programmes from the BBC iPlayer website. I will start by describing the technical hurdles and how they can be cleared, and finish with a description of the application flow.

The code for this project can be found in "Program.cs", a single C# file that compiles as a console application. The function names and code shown on this page are taken from this file. I have compiled and run this file on XP with VS2005.

An understanding of HTTP, socket communications and XML is required. However, these functions are neatly encapsulated in the .NET libraries, and I have commented the code to make it clear what is going on.

Throughout this guide, the word "programme" refers to a radio programme, such as "Woman's Hour". I will use the term "application" to refer to the compiled executable.

What is our brief? For a programme specified by the user, connect to the BBC's iPlayer website, download the streamed programme, and store it in a file (the "target file").

The RTMP Client Way

The BBC stream their programmes for a variety of machines - PCs and Apple Macs, mobile phones and iPods. Although iPlayer makes MP3 files available for download to mobile phones, the primary streaming protocol for PCs and Macs is RTMP (Real Time Messaging Protocol).

RTMP is a streaming protocol that serves audio and video data from a media server to an RTMP client. The most widely used RTMP client is Adobe's Flash Player. If you watch TV on the Internet, or YouTube videos, the chances are you are using Flash Player and RTMP or one of its variations.

Although RTMP is the dominant streaming protocol, it is owned by Adobe, and the official RTMP specification is the sort of specification that only makes sense to someone who already has a thorough understanding of RTMP.
If you want to write an RTMP client class, I suggest you make sure you have a lot of time and a good sense of humour, and are prepared to scour the Internet for hints and clues as to the mysteries of the RTMP message types. Alternatively, you could use our RTMP Client class for .NET developers.

Another drawback with RTMP streaming is that the audio is often delivered as packets of AAC (Advanced Audio Coding). AAC is an audio compression format, once billed as the successor to MP3. When AAC is streamed, the packets sent are the frames of data found in the middle of an AAC file. So to make an MP3 file out of the AAC packets, one must combine all of the AAC packets with a valid AAC file header and footer into an AAC file, and then convert the AAC file to an MP3.

It would be a lot easier if we could just download the programme as a complete MP3 file. As I've mentioned above, the iPlayer makes MP3 files available only for mobile phones, so either we learn about the internals of an AAC file, or we learn how to pretend to be a mobile phone.

Pretending to be a Mobile Phone (HTTP Requests and Cookies)

I'll start by briefly explaining how HTTP and cookies work. (Most of you will know, so you can skip this bit.)

When you open a website with your browser (your browser might be Firefox or Internet Explorer), your browser sends a request to the server that hosts the website. The first request is usually for the content of the homepage. Once the homepage content is received, the browser requests any other data, such as images, that the page refers to. The requests and responses that traverse the Internet connection are in HTTP format - a simple textual format.

This type of connection, between a browser and the web server, is known as a "stateless session" - the web server cannot tell the difference between your browser's requests and anyone else's.
An analogy: imagine a man called Tony who sits behind a screen all day with a loud-hailer, while on the other side of the screen loads of people shout out things like "What time is it?" and "What colour are your shoes?". Tony shouts out the answer to each question, but he never knows who is asking. If Mary, for example, shouts "How much money is in my bank account?", Tony cannot answer because he doesn't know who is asking. Here's Mary's question in HTTP format:
The "\r\n" signifies newline characters. The top line of the request (the "GET...") is Mary's question. There is nothing in this request that identifies the requester as Mary. But if a small piece of data identifying Mary is attached to the HTTP request, Tony would know who the request was from. The bits of data that identify a user to a website are called Cookies. Here is the HTTP request with a cookie attached.
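As a rough sketch of how requests of this shape might be put together in C#, the helper below assembles the request text with or without a cookie. The class and method names, the host "tony.example" and the path "/balance" are hypothetical and are not taken from Program.cs; only the general layout (a GET line, header lines ending in "\r\n", and a blank line to finish) is standard HTTP.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only (not part of Program.cs).
public static class HttpRequestSketch
{
    // Builds the text of an HTTP/1.1 GET request.
    // Every line ends with "\r\n"; an empty line marks the end of the headers.
    // Pass null (or "") as the cookie to build the anonymous, stateless form.
    public static string BuildGetRequest(string host, string path, string userAgent, string cookie)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("GET " + path + " HTTP/1.1\r\n");      // the "question"
        sb.Append("Host: " + host + "\r\n");
        sb.Append("User-Agent: " + userAgent + "\r\n");  // which browser is asking
        if (!string.IsNullOrEmpty(cookie))
        {
            sb.Append("Cookie: " + cookie + "\r\n");     // who is asking
        }
        sb.Append("Connection: close\r\n");
        sb.Append("\r\n");
        return sb.ToString();
    }
}
```

Calling BuildGetRequest("tony.example", "/balance", "Microsoft Internet Explorer", "MY-NAME=Mary") gives the request with the cookie attached; passing null as the last argument gives the anonymous version.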
The cookie has made this a "stateful session". It has given the session "statefulness". Where does this cookie come from? It comes from Tony. When Mary makes her first request to Tony, the equivalent of opening Tony's home page, Tony's response would have included the cookie. In other words, the cookie value is defined when you open the homepage of a website. Cookies are often talked about as if they were little files of personal details that attach themselves to outgoing web traffic, but this is not accurate.

The HTTP request-response cycle has been encapsulated in the Microsoft .NET libraries as the classes HttpWebRequest and HttpWebResponse. But we won't be using them. Why? Because if you look at the HTTP in the above request, there is a "User-Agent" header which tells the website which type of browser we are using. In this case, "Microsoft Internet Explorer". If we want to pretend to be a mobile phone, this "User-Agent" rather gives the game away. We wouldn't have Internet Explorer installed on a mobile phone. A mobile phone would have a User-Agent that looked more like this:
And since we're looking at large chunks of gobbledygook, you should know cookies don't look like this "MY-NAME=Mary". They look like this:
Attractive! We can retrieve a cookie from the iPlayer website by sending a request containing the User-Agent of a mobile phone's browser. For all subsequent requests to the iPlayer website we will be rolling our own HTTP, which will include the cookie and the User-Agent of a mobile phone browser.

Where Is The Stream Of Audio Data?

We start with a programme id - a PID. These PIDs can be found on the BBC iPlayer website. If you roll your mouse over a programme link you will see a code in the form "b00XXXXX": an eight-character, lower-case alphanumeric code beginning with "b00". This is the programme code. In BBC-speak, a programme cannot be downloaded; only an episode can be downloaded. So we need the location of the stream of data for the episode of our programme. This can be obtained from the page:
PID is the programme id, and the page returned is an HTML file which includes two pieces of information - the episode stream location URL, and a cookie for our session with the iPlayer website. (Note: if we try to open this page in a browser on a PC we won't get anything useful back, because we are not using a mobile phone. Remember that all of our requests are sent whilst pretending to be a mobile phone.) A typical episode stream location looks something like this:
Now we know how to locate the audio-data stream. We can initiate a flow of data by sending an HTTP request to the stream location, adding a range header line to specify which bytes we want. For example, we could add the following:
This would return 54,000 bytes of data, starting at position 10,000. If we want to know the length of the stream, we can add the line:
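The arithmetic behind such range requests can be sketched in C# as below. This assumes the standard HTTP "Range: bytes=first-last" form, in which both positions are inclusive, and the standard "Content-Range: bytes first-last/total" response form; the class and method names are hypothetical and not taken from Program.cs. Under these assumptions, 54,000 bytes starting at position 10,000 corresponds to bytes 10000 to 63999, and the total stream length is the number after the slash in Content-Range.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers, for illustration only (not part of Program.cs).
public static class RangeSketch
{
    // Range header line for 'count' bytes starting at 'offset'.
    // HTTP byte ranges name the first and last byte, both inclusive.
    public static string BuildRangeHeader(long offset, long count)
    {
        if (offset < 0 || count <= 0) throw new ArgumentOutOfRangeException();
        long last = offset + count - 1;
        return "Range: bytes=" + offset + "-" + last;
    }

    // Extracts the total length from a Content-Range value such as
    // "bytes 0-0/123456". Returns -1 if the total is missing or is "*".
    public static long ParseTotalLength(string contentRange)
    {
        if (contentRange == null) return -1;
        int slash = contentRange.LastIndexOf('/');
        if (slash < 0) return -1;
        long total;
        if (long.TryParse(contentRange.Substring(slash + 1).Trim(), out total)) return total;
        return -1;
    }

    // Splits a stream of 'totalLength' bytes into successive Range header
    // lines, none of which covers more than 'maxChunk' bytes.
    public static List<string> PlanChunks(long totalLength, long maxChunk)
    {
        if (maxChunk <= 0) throw new ArgumentOutOfRangeException();
        List<string> headers = new List<string>();
        for (long offset = 0; offset < totalLength; offset += maxChunk)
        {
            long count = Math.Min(maxChunk, totalLength - offset);
            headers.Add(BuildRangeHeader(offset, count));
        }
        return headers;
    }
}
```

For example, BuildRangeHeader(10000, 54000) yields "Range: bytes=10000-63999", and PlanChunks(100, 40) yields three headers covering bytes 0-39, 40-79 and 80-99.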
Such a request returns an HTTP response with the length of the stream stored in the Content-Range header. The stream data may be very large, so we download it in chunks, requesting successive chunks with a maximum size of 32 MB each.

Socket Communications

There are four functions provided in the Program.cs application file to assist with socket communications. By trapping exceptions for each socket function instead of wrapping whole socket sessions in a single try-catch, error messages can indicate exactly where the exceptions occur. The four socket functions are prefixed with "_LowLevel" because they are only called from within private functions.
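The idea of trapping exceptions per socket call can be sketched as follows. This is not the code from Program.cs: the four names below merely borrow the "_LowLevel" prefix, and their signatures and error strings are assumptions made for illustration (they are public here only so the sketch can be exercised on its own). Each function has its own try-catch, so a failure message says which step went wrong.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical sketch, for illustration only (not the code in Program.cs).
public static class SocketSketch
{
    // Opens a TCP connection. Returns null and sets 'error' on failure.
    public static Socket _LowLevelConnect(string host, int port, out string error)
    {
        error = null;
        Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            socket.Connect(host, port);
            return socket;
        }
        catch (SocketException ex)
        {
            error = "Connect to " + host + ":" + port + " failed: " + ex.Message;
            socket.Close();
            return null;
        }
    }

    // Sends the whole buffer. Returns false and sets 'error' on failure.
    public static bool _LowLevelSend(Socket socket, byte[] data, out string error)
    {
        error = null;
        try
        {
            int sent = 0;
            while (sent < data.Length)
            {
                sent += socket.Send(data, sent, data.Length - sent, SocketFlags.None);
            }
            return true;
        }
        catch (SocketException ex)
        {
            error = "Send failed: " + ex.Message;
            return false;
        }
    }

    // Reads up to buffer.Length bytes. Returns the number of bytes read
    // (0 when the peer has closed the connection) or -1 on failure.
    public static int _LowLevelReceive(Socket socket, byte[] buffer, out string error)
    {
        error = null;
        try
        {
            return socket.Receive(buffer);
        }
        catch (SocketException ex)
        {
            error = "Receive failed: " + ex.Message;
            return -1;
        }
    }

    // Shuts the connection down and releases the socket.
    public static void _LowLevelClose(Socket socket, out string error)
    {
        error = null;
        try
        {
            socket.Shutdown(SocketShutdown.Both);
        }
        catch (SocketException ex)
        {
            error = "Shutdown failed: " + ex.Message;
        }
        socket.Close();
    }
}
```

A caller would check the return value after each step and report the corresponding error string, rather than catching one exception around the whole session.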
Application Flow

Now that we know all we need to know, let's plot an application flow. Below is a table of the application functions in the same order as they appear in the example code, Program.cs.
More Information

Here are some links for further reading:

RtmpClient class for .NET developers
BBC iPlayer
HttpWebRequest and HttpWebResponse

NOTEBOOK SUPPORT

If you have any questions or suggestions, have found an error, or would like more explanation of one of the items in the Software Developers Notebook, please contact us by email.

Software notebook support: enquiries@broccoliproducts.com
© 1998-2010 Broccoli Products Ltd Reg Number: 2895355 Reg Office: 27 Old Gloucester Street, London. WC1N 3AX