The inner workings and possibilities of XMPP and its multimedia enabling extension Jingle
Capgemini
Capgemini was founded in 1967 by Serge Kampf in Grenoble and started under the name Sogeti (Société pour la Gestion de l'Entreprise et le Traitement de l'Information). Its present name, Capgemini, is the result of mergers with CAP in 1974 and Gemini in 1975. Capgemini is one of the world's leaders in information technology, with a workforce of over 100,000 people in 39 countries. Capgemini has four divisions: Consultancy, Outsourcing, Technology Services and Financial Services. Consultancy gives business advice to companies that are facing important decisions. Outsourcing provides substitutes for the internal services of companies that don't belong to their core business. Technology Services focuses on delivering and supporting the physical side of IT. Financial Services provides all kinds of services for the finance industry. The department where I was stationed, TDI, was a subsection of B60, which is a section of FS. During my internship the Capgemini structure was reorganized and TDI became a part of TS.
Research objectives
My research objective was to investigate the possibilities for Unified Communications (UC) within The New Way of Working philosophy for the banking industry. The assignment was fairly open-ended, so I added the future of Unified Communications with regard to interoperability to the research plan, focusing primarily on interdomain interoperability. In that regard I ended up looking further into the potential of XMPP as a standard for UC, including real-time media such as voice and video. During an interview with Daniel Hillster, I got the impression that the facilitation of real-time streams is often neglected in the corporate network architecture. I therefore added the facilitation of real-time streams to my list of research objectives, to give an overview of techniques that could be used to safeguard real-time streams.
Build-up
After reading the available material and looking at the fundamentals of most UC systems, I came to the conclusion that there is no universal standard in use today to connect different UC systems together and thereby make them interoperable. An older protocol that is gaining new attention, XMPP, looked promising as the new standard for IM due to its simplicity and well-thought-through design. In chapter two, I researched its current and potential use and which companies use it for their Internet services. The potential use of UC in the banking sector for communicating directly with customers is also given a closer look in chapter two. Chapter three explains the technical workings of XMPP and SIP and how they compare against each other, to give the reader a better understanding of both. In an interview with Daniel Hillster I got the impression that handling RTP traffic is still problematic, so I included ways of handling RTP traffic and of ensuring that the quality of the media stream is up to par. In chapter four, a theoretical case is built to show what an XMPP infrastructure could look like, with examples of usable cases and references to extensions, so that the reader can get an impression of how XMPP could be applied in a corporate environment to suit business needs. The thesis ends with the conclusion in chapter five, where my findings are reported.
Acknowledgments
During the writing of this thesis I received help and support from people whom I would like to thank for their contribution. I interviewed two people, which might not seem like much, but they gave me a great deal of material to work with. First I interviewed Daniel Hillster, an employee of Didacticum who was involved in the introduction of The New Way of Working at SNS Reaal. He gave me an insight into what is involved when implementing UC systems, how they are used, and the needs and opportunities when it comes to improving current UC systems. I had my second interview with Thiago Camergo. He is an experienced SIP / XMPP Jingle engineer currently working at Nimbuzz. He is also in the process of developing a NAT-traversal extension for XMPP Jingle and is a strong advocate for XMPP Jingle, as can be seen from his blog, XMPPjingle. He helped me to understand the concept of Jingle and gave me an insight into the current developments in the UC field.
Name: R. (Roel) van de Wiel Function: Intern at Capgemini, student at HsZuyd Address: Theems 70, 5152 SN Drunen Tel: 06-28079738 Corporate email: Roel.vande.Wiel@capgemini.com Private email: wielrvd@gmail.com
Graduation Committee
Mentor
Name: R. (Ron) Mandjes Function: managing consultant Email: ron.mandjes@capgemini.com Phone: 00 31 (0)3 68 99 115
Supervisor (due to illness, not in function)
Name: A. (Arnoud) Vons Function: Principal consultant Email: arnoud.vons@capgemini.com Phone: 00 31 (0) 6 150 303 43
Hszuyd supervisor
Name: J.C.C. (Jean-Paul) Brands Email: j.brands@hszuyd.nl Phone: 00 31 (0) 45 400 6765
Hypothesis
XMPP will be the protocol of choice for instant messaging, presence and video communication in the private domain and in the public domain (the Internet). It will function as the lingua franca of the UC field. Jingle will be added to make video or voice communication possible.
Explanation
Jingle is an extension to XMPP that enables the setup of real-time media streams between two hosts. XMPP Jingle will coexist with SIP, and in the future UC vendors will include XMPP Jingle in their products; SIP will most likely be kept to maintain backwards compatibility with the PSTN network. Jingle can already be used with Cisco's CUCM (Cisco Unified Communication Manager) and the recently released Cisco Jabber client. XMPP Jingle is used for Google Talk. Google Talk is presently available for Android (only in the US), and given Google's large presence in the mobile market with Android, it is only a matter of time before Google Talk is available worldwide. The introduction of 4G technologies is probably going to be an accelerator of true IP-enabled voice services. The advantages of XMPP over SIP are the integration of IM functionality such as presence, resource identification in the URI, and a less complicated and clearer process for extending its functionality, owing to the setup of its managing organization as well as its technical architecture. The IT environment is evolving into a multi-screen environment where the distinction between the personal and the professional environment is blurred. The XMPP protocol, with its versatility and well-thought-through design, has the right architecture to meet the requirements of this new environment. Question marks could be placed over the maturity level reached by UC vendors when it comes to the XMPP integration in their products that is needed for Jingle support, because Jingle is not yet common in implementations in enterprise environments. At this moment, work is being done to meet those criteria.
But the XMPP fundamentals are solid and clear and, most importantly, they are complete. Roel van de Wiel Page 10
The office
New ways of multimedia communication will change the way the office is used. It will no longer be required to be physically present in the office in order to be a productive member of a team. That does not mean that offices will be a thing of the past; they will still fulfill a role. They will become more of a meeting centre, with a relaxed and productive atmosphere catering to the needs of the employees. These meeting centres will have a smart building system with more natural light, improved climate control and a more thought-out design, so it will feel more as a natural
New devices
Because of the commoditization of consumer electronics, new types of electronic devices will be brought into the business, with or without the permission of management. Instead of prohibiting the use of these devices in a business environment, companies should take advantage of the added value that these devices could bring to the office. The hardware of the devices is also an important factor in the equation when transitioning to the new way of working. There was a time when the main criterion for choosing a computer was raw processing power; now the criteria are shifting toward the form factor, the quality of its sensory input (microphone and camera) and the quality of its output (screen and speakers). For voice-only communication, a small device with a high-quality microphone and recording is considered best. A small form factor takes precedence over video output, so a 3-inch screen will do. If you need something small to take notes, edit graphic data with the touch of your finger or read books on an e-ink screen, then a tablet is your best choice. When writing a lengthy report or designing software, a laptop with a big 17-inch screen is best. For important videoconferences, sit down and relax in front of your display screen with an HD camera. For brainstorming sessions, use a 100-inch smart board and publish the end product immediately when you are finished. Devices will be used in conjunction with each other: editing and sharing data together on one screen while seeing each other face-to-face on the other.
The cloud
A popular word often heard in the IT world is Cloud. The Cloud is a broad concept, but in short it is about offering IT services through the Internet. In the same manner as electricity or water, services are provided without the need to invest in infrastructure on site.
Access to data and applications will not be limited to the one device that just happens to have the right software installed and the data on its hard drive. They will be accessible from every device and delivered hardware-independently through rich Internet applications, streaming virtualization or a combination of both. This is referred to as the Martini Principle: anytime, anyplace, any device.
The problems
In order for a new way of working to be successful, there are a number of problems which must be dealt with. Most of them are not of a technical nature. People are creatures of habit; they find it hard to change the way they have been doing things. These technologies will change the way we work and influence the business processes and organizations themselves. It will become easier to work across work boundaries, scale a process or
Hardware independent
As stated above, devices will be brought into the corporate domain by employees who want access to the same functionality as they have at home. But there are some issues that need to be taken into consideration, such as securing company data in a (semi-)controlled environment and maintaining the application landscape of these devices. One way of looking at it is to regard the devices as merely temporary containers for the user's applications and data. The IT department should make its support independent of the underlying hardware and should focus only on delivering added value. Detaching the OS from the device (which could be a laptop, tablet or mobile phone) using virtualization, and creating an isolated runtime environment to safeguard the data and the updates, makes the software more secure and easier to maintain. Another approach is to use the Cloud to provide all the necessary applications, including rich Internet applications which interact with the user and provide all the functionality local applications would normally provide. These Internet applications will be based on the new HTML5 standard. A Cloud-based infrastructure demands that devices always have access to the Internet.
1.2.1 History of digital communications systems
The ways of communicating have improved drastically in the last 200 years. In the year 1800 it could take up to a year to send a message through the postal system from Europe to a colonial country. Within a century, that time was reduced to a couple of minutes by phone. Nowadays we use email widely to communicate with friends, family and colleagues. The first truly electronic web of communications was the telegraph system. With the commercial use of the first intercontinental undersea cable in 1866, countries could react quickly to important matters and businessmen to trade fluctuations. Messages were brief due to the high cost involved in communicating over a single cable. Costs decreased later as the capacity and speed of transmission over the cable increased. The next big electronic web was the telephone, with a transcontinental link completed in 1915. In the beginning it was only used for verbal communication, but with the passage of time text messages were transmitted through a telex system. Later, fax became another popular means of communication. In the 60s the basis was laid for the Internet, at first only in the form of theories and then, in 1968, with the start of a digital network called Arpanet. This network eventually evolved into the Internet as we know it today. In the 90s the Internet became generally available to the public. Nowadays we use the World Wide Web for a large part of our communication. This communication can be of different types, e.g. email, IM, VOIP and video, but in essence it is the same thing: a large volume of bytes moving over a wire, going from one place to another at very high speed and passing through networks that serve as intermediaries. When the fundamentals of the Internet were designed, a packet-switched network was chosen over a circuit-switched network. Data is sent as a packet full of bits with a destination address attached to it.
That packet will pass through a series of network devices, e.g. routers, and each device it passes through looks up the destination address of the packet and decides in which direction it should be sent. This increases the flexibility of the network but makes it unsuited to real-time data like voice and video. This was so until about a decade ago, when the rapid increase in connection quality and the introduction of Quality of Service made voice, and later videoconferencing, viable over the Internet. These days there is still a separation between the Internet and the telephony network. Even though the telephone network has converged with the data network in the background, they still function as separate networks. A good example is Internet-enabled mobile phones. A person is able to reach his email on any mobile device as long as it has Internet access. But the owner can only call and be called on his personal number on one mobile phone at a time: the one which has his personal SIM card inserted.
1.2.2. UC functionality
A UC system is made up of different components that could be used, and are used, as autonomous systems in their own right. The concept of Unified Communications exploits these separate systems by combining them and offering them to users as one unified communication experience. At the heart of this unified communication experience is the UC core system itself. The UC core system manages the incoming and outgoing direct connections, like phone calls, video conferences or IM conversations. The core system should provide plugins, open standards and APIs so that it can be integrated into other enterprise applications like SharePoint and Outlook. This integration completes the UC experience. The connections made by the UC core system can be divided into real-time and near-real-time.
Real-time
Real-time communication is voice and video. With real-time communication there are two separate channels: the signaling channel, using SIP, H.323, XMPP Jingle or something else, and a data channel. The signaling channel is responsible for setting up a connection between two end points and negotiating the data stream, and its parameters, used to transport the actual voice or video data. RTP is used for the data stream. Most of the time the data stream is routed directly between the end points, whereas the signaling protocol itself is relayed through different stations. This is due to the low tolerance for latency and jitter when transporting real-time data.
VOIP
Phone infrastructure was originally managed by a PBX. The PBX was a large piece of machinery which used ISDN as the standard for setting up phone connections. With the rise of the Internet and the increased reliability of IP networks, the IP-PBX became popular from 2000 onwards. An IP-PBX uses SIP in an IP network instead of ISDN, thereby eliminating the cost associated with a separate ISDN infrastructure. Cost reduction was the primary reason for implementing IP-PBX.
Now the IP-PBX is gradually evolving into a UC manager with more functionality than simply managing phone calls. There is an important distinction that needs to be made when it comes to how VOIP is used. Phone calls are connections made with E.164 numbers (standard telephone numbers) that use the legacy PSTN infrastructure for interconnecting domains, and it needs to conform to the specification from
2.3 UC vendors
This section lists some interesting vendors and their UC experience. They are interesting because of their support for XMPP and their leadership position. The list of active UC vendors is too long to discuss them all. There is a clear distinction between UC vendors who provide UC infrastructure for a business environment and Internet service providers who deliver mostly consumer-oriented services.
2.3.1 Leading UC vendors
Cisco
Cisco says its policy is to use open standards as much as possible. This is partially marketing hype, but in the field of UC they are living up to that promise by fully supporting Jabber's XMPP technology. They bought the Jabber XMPP server in 2008, and the former CEO of Jabber and president of the XMPP foundation, Peter Saint-Andre, is leading the Jabber technology division at Cisco. Cisco UC technology natively supports XMPP for IM and presence. Cisco currently has an extensive portfolio of UC-related products. One of these products (Interoperability Media Engine) can be used as a mediator between different SIP dialects and as such can facilitate interoperability. The latest news is the release, on 1 March 2011, of an XMPP client called Cisco Jabber with support for Jingle.
Microsoft
Microsoft Lync technology does not use XMPP natively. Recently Microsoft has added an XMPP gateway to its product portfolio. This allows it to work with IM services like Gmail. The H.264 video codec from Microsoft uses proprietary techniques to make it error resistant, but this makes their video solution difficult to interoperate with.
2.3.2 Challengers/innovators
Process-one
Process-one is a developer of XMPP software, including servers and clients. Their open source XMPP server, Ejabberd, is written in Erlang, a programming language developed by Ericsson for building robust telecommunication applications. It even supports hot code loading, which means that the server can continue to run while code is being added to it.
Whatsapp
This startup lets you chat on your smartphone with a phone number as the user identifier. It adds the contacts who are already listed in your phonebook and have Whatsapp installed to your contact list.
Yammer
Due to the nature of their service, i.e. microblogging, there is heavy traffic between the client and its server. With XMPP implemented, updates get pushed to the client rather than the client polling the server every time.
Nokia ovi
Nokia's chat service is based on XMPP. It is somewhat comparable to Blackberry's ping service.
Facebook
Facebook is the biggest social media website. It added XMPP support in February 2010. Currently Facebook uses XMPP to let users connect to its IM services. It blocks any XMPP service that is not IM-related, like Jingle, and does not allow interdomain communication. The reasons for this
2.5 Institutes
Institutes are responsible for guiding technology development and defining open standards. There is a need for common agreement on the specifications of standards, so that when standards find their way into vendors' products, it is still possible to combine these products and exchange data between their services.
2.6 Reflection
There is still a clear distinction between service providers on the Internet, which have evolved from standard text-based communication (like IM and email) to more advanced forms of communication, and the telecom industry, which is responsible for the mobile phone network and the phone infrastructure used in businesses. The former has fewer restrictions placed on it and is more flexible when implementing new functionality than the latter. The reason for this is that the telecom industry is very institutionalized and needs to comply with many standards. Some of those are in place to remain backwards compatible with PSTN or legacy equipment. This distinction will most likely slowly disappear, which can already be observed in the growing number of applications on mobile phones that enable the customer to use the Internet for phone calls rather than the mobile phone network. Mobile phone network operators are struggling to adapt to these events. Banning these applications from the network, or limiting their network access, means risking customer dissatisfaction. One possible solution could be to implement IMS, which allows mobile operators to include those services (their own or those of a third-party service provider) in their mobile phone network, making them chargeable and providing QoS. But as yet there has been no successful introduction of IMS. Due to their adaptability and rich feature set, it is most likely that the future of communication will be dictated by service providers like Gtalk, Fring and Nimbuzz, which are all XMPP based (but are also SIP capable). Industry leaders in the UC sector, like Cisco, Microsoft and Avaya, are already moving towards XMPP support and UC interoperability. Cisco is ahead in this effort, as it recently (1 March 2011) announced Cisco Jabber, a UC client based on XMPP that supports Jingle and ultimately Gtalk.
3.1.1 Call setup
SIP is a session protocol for managing a connection for as long as it is required. A SIP session is set up when one end point, e.g. a mobile phone, initiates contact through an invitation and the other party sends an acknowledgement. The whole process will be explained in more detail later in this chapter. After this initial contact, the connection is considered established. While the connection is active, real-time data (mostly voice) is transmitted through a separate real-time channel using the RTP protocol. Voice and signaling are separated into two different channels which can be routed independently. When a session is to be ended, one of the parties sends a termination command to request the end of the session. In the main RFC for SIP, RFC3261, a SIP session is broken down into five facets:
User location: determining which end system will be used for communication.
User availability: determining whether or not the called party is willing to engage in communications.
User capabilities: determining the media and media parameters to be used for this communication.
Session setup: establishing the session parameters at both the called and calling parties.
Session management: including the transfer and termination of sessions, the modifying of session parameters, and the invoking of session services.
3.1.2 SIP software architecture
The SIP software architecture consists of two elements: a SIP client (User Agent Client) and a SIP server (User Agent Server). A SIP client sends SIP requests and receives SIP responses. A SIP server receives the requests and gives responses. For example, a SIP client sends a request in the form of an invitation; the server receives this request and determines whether to send an acknowledgement or deny the request with an error or unavailable message.
These are just server roles and most of the time they reside on one physical server.
3.1.4 SIP message format
There are six types of request messages defined, and they are referred to as Methods. These are sent by the UAC:
REGISTER: Is used by a client to register an address with a SIP server.
INVITE: Indicates that the user or service is being invited to participate in a session. The body of this message would include a description of the session to which the callee is being invited.
ACK: Confirms that the client has received a final response to an INVITE request, and is only used with INVITE requests.
CANCEL: Is used to cancel a pending request.
BYE: Is sent by a User Agent Client to indicate to the server that it wishes to terminate the call.
OPTIONS: Is used to query a server about its capabilities.
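To make the request format concrete, the following is a minimal Python sketch of how a request built from one of the six methods could be written out as text. The helper names `build_request` and `parse_method`, the example addresses and the header values are assumptions made for illustration; a real SIP stack requires additional mandatory headers (such as Via and Max-Forwards) that are left out here.

```python
# Minimal, illustrative sketch of the textual SIP request format.
# build_request/parse_method are hypothetical helpers, not part of any SIP library.

SIP_METHODS = {"REGISTER", "INVITE", "ACK", "CANCEL", "BYE", "OPTIONS"}


def build_request(method, target_uri, headers, body=""):
    """Format a request line, headers and optional body as one SIP message."""
    if method not in SIP_METHODS:
        raise ValueError(f"unknown SIP method: {method}")
    lines = [f"{method} {target_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # An empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_method(message):
    """Return the method named on the request line of a SIP message."""
    request_line = message.split("\r\n", 1)[0]
    method, _uri, version = request_line.split(" ")
    if version != "SIP/2.0" or method not in SIP_METHODS:
        raise ValueError(f"not a recognised SIP request line: {request_line}")
    return method


# Example addresses are made up for this sketch.
invite = build_request(
    "INVITE",
    "sip:bob@example.org",
    {
        "From": "<sip:alice@example.com>",
        "To": "<sip:bob@example.org>",
        "Call-ID": "demo-1",
        "CSeq": "1 INVITE",
    },
)
print(invite.splitlines()[0])
```

In a real INVITE the body would carry the session description, so `body` would normally not be empty.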
The response messages contain Status Codes and Reason Phrases that indicate the current condition of the request. These responses are sent by the UAS. The status code values are divided into six general categories:
1xx: Provisional: The request has been received and processing is continuing.
2xx: Success: The action was successfully received, understood, and accepted.
3xx: Redirection: Further action is required to process this request.
4xx: Client Error: The request contains bad syntax or cannot be fulfilled at this server.
5xx: Server Error: The server failed to fulfill an apparently valid request.
6xx: Global Failure: The request cannot be fulfilled at any server.
The inspiration for this code schema is the one used in the HTTP protocol, with its famous 404 "page not found" error.
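Because the category of a response is determined by the first digit of its status code, the classification can be sketched in a few lines of Python. The function names `classify_status` and `is_final` are chosen for this illustration; the category names for 5xx (Server Error) and 6xx (Global Failure) are taken from RFC3261.

```python
# Illustrative sketch: map a SIP status code to its general category.
# classify_status and is_final are hypothetical helper names.

STATUS_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status(code):
    """Return the category name for a status code in the range 100-699."""
    if not 100 <= code <= 699:
        raise ValueError(f"status code out of range: {code}")
    return STATUS_CLASSES[code // 100]


def is_final(code):
    """Every response except a 1xx (provisional) one ends the request."""
    classify_status(code)  # validates the range
    return code >= 200


for code in (100, 180, 200, 302, 486, 503, 603):
    print(code, classify_status(code), "final" if is_final(code) else "provisional")
```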
3.1.5 SDP
As mentioned earlier, the signaling and the data channels are separated. The SIP protocol is responsible for setting up the data channel, and it uses the SDP protocol to negotiate this. SDP originated with the Session Announcement Protocol (SAP), but has been reused in SIP and is defined in RFC4566. SDP is basically a standard which specifies the parameters for a real-time data channel that need to be published to other parties. Optional parameters are marked with an asterisk.
Session description
  v=  (protocol version)
  o=  (owner/creator and session identifier)
  s=  (session name)
  i=* (session information)
  u=* (URI of description)
  e=* (email address)
  p=* (phone number)
  c=* (connection information - not required if included in all media)
  b=* (bandwidth information)
  One or more time descriptions (see below)
  z=* (time zone adjustments)
  k=* (encryption key)
  a=* (zero or more session attribute lines)
  Zero or more media descriptions (see below)
Time description
  t=  (time the session is active)
  r=* (zero or more repeat times)
Media description
  m=  (media name and transport address)
  i=* (media title)
  c=* (connection information - optional if included at session-level)
  b=* (bandwidth information)
  k=* (encryption key)
  a=* (zero or more media attribute lines)
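The structure above, session-level lines followed by zero or more media descriptions that each start with an m= line, can be illustrated with a small Python sketch. The sample description and the names `SAMPLE_OFFER`, `parse_sdp` and `missing_mandatory` are assumptions made for this illustration; the addresses, ports and codecs are placeholder values and not taken from a real call.

```python
# Illustrative sketch: split an SDP description into session-level fields
# and media descriptions. All sample values below are placeholders.

SAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 host.example.com",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + "\r\n"


def parse_sdp(text):
    """Return {'fields': [(key, value)], 'media': [{'m': ..., 'fields': [...]}]}."""
    session = {"fields": [], "media": []}
    current = session["fields"]
    for line in text.splitlines():
        if not line:
            continue
        key, sep, value = line.partition("=")
        if sep != "=" or len(key) != 1:
            raise ValueError(f"malformed SDP line: {line!r}")
        if key == "m":
            # Every m= line opens a new media description.
            media = {"m": value, "fields": []}
            session["media"].append(media)
            current = media["fields"]
        else:
            current.append((key, value))
    return session


def missing_mandatory(session):
    """List the session-level keys without an asterisk that are absent."""
    present = {key for key, _value in session["fields"]}
    return [key for key in ("v", "o", "s", "t") if key not in present]


offer = parse_sdp(SAMPLE_OFFER)
print(offer["media"][0]["m"])
print(missing_mandatory(offer))
```

In this sample the m= line announces an audio stream over RTP on port 49170, and the two a=rtpmap lines name the codecs that the offering party is willing to use.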
And this is an example of how SDP is used to set up an RTP connection: [Offer]
3.1.6 Call setup
Now that the basics of SIP operation have been explained, it is easier to visualize how a call is made. When a voice call is made, the phone (UAC part) sends out an INVITE request with an SDP offer to the proxy server. The proxy server checks whether the URI is local, and thus registered with the registrar server, or belongs to another domain. If it is an interdomain URI, the redirect server is queried and its response is a 3XX redirection message that points to the next server in line. The proxy server of the other domain receives an INVITE with a URI identifying the receiving party and passes it on to the phone. The phone (UAS part) replies with a response message that will most likely be a 1XX provisional response. When the phone call is accepted, by picking up the phone for example, a 2XX response is sent back, which the caller confirms with an ACK. An RTP stream that directly connects the caller to the callee is then set up according to the approved SDP offer. It is also possible for the callee to respond with a 3XX redirection message; this can be used, for example, when the phone line is busy. If the connection attempt is unsuccessful, a 4XX, 5XX or 6XX error code is returned as a response.
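Seen from the caller, the sequence described here is a small state machine driven by the class of each response to the INVITE. The Python sketch below is a simplification under stated assumptions: the state names and the functions `on_response` and `run_call` are invented for this illustration, and proxies, retransmissions and timers are ignored.

```python
# Illustrative sketch of the caller's view of an INVITE transaction.
# State names and function names are hypothetical.

PENDING_STATES = ("calling", "proceeding")


def on_response(state, status_code):
    """Return the caller's next state after a response to its INVITE."""
    if state not in PENDING_STATES:
        raise ValueError(f"no INVITE pending in state {state!r}")
    status_class = status_code // 100
    if status_class == 1:
        return "proceeding"   # e.g. 180 Ringing: keep waiting
    if status_class == 2:
        return "established"  # caller sends ACK, RTP stream starts
    if status_class == 3:
        return "redirected"   # retry at the address given in the response
    if status_class in (4, 5, 6):
        return "failed"
    raise ValueError(f"invalid status code: {status_code}")


def run_call(responses):
    """Feed a list of status codes to the state machine and return the trace."""
    state = "calling"  # the INVITE with the SDP offer has just been sent
    trace = [state]
    for code in responses:
        state = on_response(state, code)
        trace.append(state)
        if state not in PENDING_STATES:
            break  # a final response ends the transaction
    return trace


print(run_call([100, 180, 200]))
print(run_call([486]))
```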
In this day and age, phone calling or video calling is simply not enough. There is a need for IM communication, or chatting as it is known, where people can have a text conversation with their colleagues. This has proven to be ideal in some situations, for example asking a quick question and receiving a short answer. It should also be possible to see the current availability status of a person to determine whether they are open to communication. To serve these needs, an extension to the SIP protocol called SIMPLE has been defined. SIMPLE is the acronym of Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions. The SIMPLE protocol adds two distinct features to the SIP protocol: instant messaging and presence sharing. Unfortunately, it is considered by many to be too complex to implement.
3.1.7 Why does SIP not provide the solution?
Understanding the deficiencies of SIP is easier when comparing it to another standard like SMTP. When sending an email, SMTP is used between two servers. SMTP is a straightforward protocol with only two different versions implemented in email servers worldwide: ESMTP (Extended Simple Mail Transfer Protocol) and SMTP (Simple Mail Transfer Protocol). ESMTP is the standard for email in use nowadays. When email server software is written, the SMTP protocol is followed exactly without making any twists or changes, resulting in full compatibility with the standard. Therefore, when sending an email from one server to another there is no risk of the email being rejected by the receiving server. SIP is based on HTTP and SMTP. It uses the same schemas and grammar, but SIP is designed for voice communications whereas HTTP and SMTP are for one-way and two-way text-based communications. The main difference between SIP and the other protocols is the looseness of the specifications, which results in vendors implementing SIP according to their own interpretation of the protocol. All the functionality that has been defined in the IETF standards in relation to SIP can be found in RFC5411, the hitchhiker's guide to SIP. It serves as a reference guide to the 100 SIP-related RFCs.
3.1.8 Problems with SIP
SIP was developed in the late nineties to make voice over the Internet possible. At that time there was already a digital voice standard, namely ISDN. ISDN is a great protocol by itself, but it was not designed for IP networks like the Internet. So there were still two segregated infrastructures being used alongside each other: the packet-switched IP network called the Internet and the circuit-switched telephone network. Maintaining two independent infrastructures is more expensive than maintaining one, hence the need for voice transportation over the Internet. There were two protocols competing to become the standard for voice over the Internet: H.323, developed by the ITU-T, and SIP, developed by the IETF.
Weak Terms
Can = 475
Option = 144
Should = 344
May = 381
Strong Terms
The excessive use of weak terms is a good indicator of how open the SIP standard is to interpretation by developers. As a result, when SIP is used in a heterogeneous environment, great effort has to be made to maintain compatibility for even the most basic functionality. The common architecture in enterprises that use SIP as a replacement for telephony is a central PBX system that communicates with the internal clients in the vendor-dependent dialect of SIP, plus a SIP gateway that makes SIP interworking possible and provides SIP trunking capabilities. One of the Internet's advantages over the original PSTN network is its mesh design and the ability to set up a connection on a peer-to-peer basis. With the need for SIP trunking to connect SIP infrastructure to telephony service providers, nothing has changed: the SIP infrastructure as a whole mimics the PSTN network over the Internet. Sources: (sip interoperability ) & (Real-world SIP Interoperability: Still an elusive quest , 2007)
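A count of weak terms like the one above can be approximated with a simple whole-word tally. The Python sketch below shows one way to do this; it is an assumption about the counting method, not the procedure used by the cited sources, and it only counts exact word forms (so "optional" or "options" would not be counted as "option"). The sample sentence is invented for the illustration.

```python
# Illustrative sketch: tally whole-word occurrences of weak terms in a text.
# The method and the sample text are assumptions made for this example.
import re
from collections import Counter

WEAK_TERMS = ("can", "option", "should", "may")


def count_terms(text, terms=WEAK_TERMS):
    """Count case-insensitive whole-word occurrences of each term."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return {term: counts[term] for term in terms}


sample = (
    "A proxy MAY forward the request. The client SHOULD retry "
    "and may give up. It can not recover."
)
print(count_terms(sample))
```

Running the same function over the full text of a specification gives a rough measure of how much is left to the implementer.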
3.2 XMPP
3.2.1 Brief History of XMPP
XMPP technology was invented by Jeremie Miller in 1998. His motivation came from the desire to open up IM services. The first release of a working product was on 4 January 1999. Soon a whole group of developers was designing clients and libraries for various programming languages. Work on the XMPP protocol
3.2.5 Stream
XMPP is in essence a streaming XML protocol that uses stanzas to communicate. When the negotiation of the XML stream is complete, stanzas are used to exchange messages. There are three types of stanzas:
<message>
A message stanza is used to send an IM message; a message is pushed to the other party. There are five different types of messages:
Normal: similar to an email message, where a reply might be given.
Chat: near-real-time message communication.
Groupchat: communication in a chat room.
Headline: used for alerts and notifications.
Error: used for error notification.
<presence>
Presence stanzas are used to indicate the presence of the client. They can also carry a standard status indicator like Away or Available, or a personal status text, e.g. "I'm on the train". Example:
<presence from="alice@wonderland.lit/pda">
  <show>xa</show>
  <status>down the rabbit hole!</status>
</presence>
<iq>
The Info/Query stanza is used to request and send information. A request may be, for example, a roster query. A dialog between a client (C) and a server (S) would look like this:
C: <stream:stream>
C: <presence/>
C: <iq type="get">
     <query xmlns="jabber:iq:roster"/>
   </iq>
S: <iq type="result">
     <query xmlns="jabber:iq:roster">
       <item jid="alice@wonderland.lit"/>
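The three stanza types described above can be built with a few lines of code. The following Python sketch uses only the standard library; the helper names (message, presence, roster_request), the recipient address and the stanza id are illustrative assumptions and not part of any XMPP library. It only constructs the XML and does not open a stream to a server.

```python
import xml.etree.ElementTree as ET


def message(to, body, kind="chat"):
    """Build a <message/> stanza of the given type with a text body."""
    stanza = ET.Element("message", {"to": to, "type": kind})
    ET.SubElement(stanza, "body").text = body
    return stanza


def presence(show=None, status=None):
    """Build a <presence/> stanza with optional <show/> and <status/> children."""
    stanza = ET.Element("presence")
    if show is not None:
        ET.SubElement(stanza, "show").text = show
    if status is not None:
        ET.SubElement(stanza, "status").text = status
    return stanza


def roster_request(stanza_id):
    """Build an <iq type="get"/> stanza that asks the server for the roster."""
    stanza = ET.Element("iq", {"type": "get", "id": stanza_id})
    ET.SubElement(stanza, "query", {"xmlns": "jabber:iq:roster"})
    return stanza


# Hypothetical values, chosen to match the examples in the text.
stanzas = [
    message("alice@wonderland.lit", "Are you there?"),
    presence(show="xa", status="down the rabbit hole!"),
    roster_request("roster1"),
]
for stanza in stanzas:
    print(ET.tostring(stanza, encoding="unicode"))
```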
The difference between an XML streaming protocol like XMPP and an SMTP- and HTTP-inspired protocol like SIP is that XMPP sets up a long-lived TCP connection, which is better suited to near-real-time IM communication. This is in contrast to SIP, which handles every information exchange as a separate request/response transaction.
3.2.6 Security
One of the requirements stated by the IETF for ratifying XMPP as an RFC was that it must have security built into its design. As a result, TLS and SASL have been incorporated into the specification of the core XMPP RFC. Communication between client and server and between servers can be secured with TLS, and credentials can be checked using SASL. This does not make XMPP secure end to end, because messages are unencrypted as they pass through the server. Work is in progress to make XMPP end-to-end secure; however, this would make the messages unreadable at the server itself. End-to-end security has its drawbacks because it obscures the stream, making it hard to control and audit.
Another security feature is the option to use CAPTCHA, which can be used to mitigate SPIM (spam on IM networks). When an XMPP account requests the addition of another XMPP account to the domain, the server can send a CAPTCHA as a data form to verify that the user is a real person.
3.2.7 XMPP Jingle
XMPP is a text-based XML protocol, so it is well suited to sending and receiving text messages. It is not well suited to sending binary data such as a file or a voice stream: binary data must first be converted to Base64, which makes the process inefficient. Another issue is that XMPP uses a client-server model, so data is sent indirectly via a path through the server. This is another reason why sending a large amount of data, or data that requires QoS, is more efficient with a different protocol. So there was a need for an extension to solve these problems.
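The inefficiency of Base64 can be quantified: every 3 bytes of binary data become 4 text characters, so the payload grows by about one third before any XML framing is added. A minimal Python check, using an arbitrary byte pattern as stand-in for real media data (the function name is an assumption made for this sketch):

```python
import base64


def base64_overhead(payload):
    """Return the size ratio between the Base64 text and the original bytes."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


# 3072 bytes of arbitrary binary data standing in for a file or audio fragment.
payload = bytes(range(256)) * 12
encoded = base64.b64encode(payload)

print(len(payload), "bytes become", len(encoded), "characters")
print(f"size ratio: {base64_overhead(payload):.2f}")
```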
When Google launched Google Talk in 2005 with voice support over XMPP, the XMPP community became serious about
An offer is two-sided:
Application type: states the type of data that is going to be exchanged and the protocol used, e.g. voice data over the Real-time Transport Protocol (RTP).
Transport method: states how the data is to be sent and which IP address it is going to, e.g. UDP on port 4043.
Each Jingle stanza has an action type, quite similar to the different action types of SIP:
session-initiate
session-accept
session-terminate
session-info: used to give additional information during the session.
There are some additional action types that can be sent through Jingle:
content-add: adds another content type, like video or voice, to the session.
content-remove: the opposite of content-add.
content-modify: changes the direction of the media exchange, e.g. sender-only or receiver-only.
description-info: carries additional information, e.g. suggested height and width.
transport-replace: suggests a change in transport method, e.g. IP address or port. This can be accepted or rejected by the other party.
When the initiator makes an offer, it starts a process in which a large amount of XMPP traffic is sent back and forth to negotiate the details of the offer, such as codecs, IP addresses and port numbers. The following steps are copied from the source:
1. The initiator sends an offer to the responder.
2. The offer consists of one or more application types (voice, video, file transfer, screen sharing etc.) and one or more transport methods (UDP, ICE, TCP, etc.).
3. The parties negotiate further parameters related to the application type(s) and work to set up the transport(s).
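The offer in step 1 is carried in an <iq> stanza with the session-initiate action. The following Python sketch builds such an offer with the standard library, assuming the namespaces of XEP-0166, XEP-0167 and the ICE-UDP transport; the function name, the stanza id, the session id and the choice of PCMU/PCMA as payload types are illustrative assumptions. A real offer would also carry transport candidates, which are omitted here.

```python
import xml.etree.ElementTree as ET

NS_JINGLE = "urn:xmpp:jingle:1"
NS_RTP = "urn:xmpp:jingle:apps:rtp:1"
NS_ICE_UDP = "urn:xmpp:jingle:transports:ice-udp:1"


def session_initiate(initiator, responder, sid, payloads):
    """Build a Jingle session-initiate offer with one audio content.

    payloads is a list of (id, name, clockrate) tuples describing the
    codecs the initiator is willing to use.
    """
    iq = ET.Element(
        "iq", {"from": initiator, "to": responder, "type": "set", "id": "offer1"}
    )
    jingle = ET.SubElement(
        iq,
        f"{{{NS_JINGLE}}}jingle",
        {"action": "session-initiate", "initiator": initiator, "sid": sid},
    )
    content = ET.SubElement(
        jingle, f"{{{NS_JINGLE}}}content", {"creator": "initiator", "name": "voice"}
    )
    # Application type: what is exchanged (audio over RTP) and with which codecs.
    description = ET.SubElement(
        content, f"{{{NS_RTP}}}description", {"media": "audio"}
    )
    for payload_id, name, clockrate in payloads:
        ET.SubElement(
            description,
            f"{{{NS_RTP}}}payload-type",
            {"id": str(payload_id), "name": name, "clockrate": str(clockrate)},
        )
    # Transport method: how the media is to be sent (candidates left out).
    ET.SubElement(content, f"{{{NS_ICE_UDP}}}transport")
    return iq


offer = session_initiate(
    "alice@wonderland.lit/rabbithole",
    "sister@realworld.lit/home",
    "hypothetical-session-id",
    [(0, "PCMU", 8000), (8, "PCMA", 8000)],
)
print(ET.tostring(offer, encoding="unicode"))
```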
The offered payloads are copied from the profile offerings of SDP. This makes XMPP Jingle compatible with SIP/SDP. Jingle is mostly used for voice or video, but it could be used to set up other streams, like gaming or application sharing. If everything goes well, the responder answers with a session-accept stanza:
<iq from="sister@realworld.lit/home" id="b18dh29f" to="alice@wonderland.lit/rabbithole" type="set">
  <jingle xmlns="urn:xmpp:jingle:1"
Bypassing NAT
The problem with setting up a real-time media stream is the inherent difficulty of traversing Network Address Translation (NAT). NAT hides a client's own IP address behind a translated address, which makes it hard for the other party to route the stream to that client. XMPP Jingle supports the traditional ways of bypassing NAT, e.g. STUN and TURN. The NAT traversal is negotiated using ICE. Recently an extension called XMPP Jingle relay node (XEP-0278) has been added to XMPP. It is still in the experimental phase but is already the most supported NAT bypass technique in XMPP software. It works by relaying the real-time stream. When a client notices that it is located behind a NAT device, it performs a service discovery to find any clients, or servers for that matter, that support a Jingle relay node and have direct access to the Internet without NAT. When an appropriate relay node is found, the client requests a Jingle relay channel. If all goes well, the relay node responds with a channel accept containing parameters like the maximum kbps, public IP address and port. The client includes the IP address and port number in the Jingle negotiation process. The sending client does not have to support the XMPP Jingle relay node; it only has to transmit the stream to the IP address and port given to it, and the relay node forwards the stream to the receiving party.
3.2.8 Advantages of XMPP
XMPP is an easy-to-understand protocol. At its core is a set of principles that are formalized in a very straightforward, standard and explicit way. Due to the restrictions of the protocol, the risk of it degenerating into separate dialects is small. One of the principles of XMPP is that it should be easy to add extensions to the protocol, which makes it easy to add functionality to any existing XMPP infrastructure. Another distinct advantage is the way it uses URIs to identify resources. A user can have several clients logged in at the same time. The URIs would look something like user@domain.nl/mobile and user@domain.nl/laptop. Every client has an 8-bit priority number between -128 and +127. This enables the user to select a preferred client as the default to be addressed. When needed, the conversation can be moved to a different client and changed to video mode, e.g. a chat with user@domain.nl/laptop could be moved to a video conversation with the client user@domain.nl/video. Security is also incorporated into the design through the use of TLS and SASL.
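The addressing and priority scheme can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class and function names are hypothetical, only the highest-priority client is returned, and clients with a negative priority are treated as not wanting messages sent to the bare address, which is how the author of this sketch understands the XMPP presence rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    full_jid: str  # e.g. "user@domain.nl/laptop"
    priority: int  # signed 8-bit value announced by the client

    def __post_init__(self):
        if not -128 <= self.priority <= 127:
            raise ValueError("priority must be between -128 and 127")


def split_jid(full_jid):
    """Split 'user@domain/resource' into the bare address and the resource."""
    bare, _, resource = full_jid.partition("/")
    return bare, resource


def preferred_resource(resources):
    """Return the client with the highest non-negative priority, or None."""
    eligible = [r for r in resources if r.priority >= 0]
    return max(eligible, key=lambda r: r.priority, default=None)


# Hypothetical set of clients logged in for one user.
clients = [
    Resource("user@domain.nl/mobile", 5),
    Resource("user@domain.nl/laptop", 10),
    Resource("user@domain.nl/video", -1),
]
best = preferred_resource(clients)
print(split_jid(best.full_jid))
```

Addressing user@domain.nl/video directly still reaches that client; the priority only decides the default when no resource is named.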
3.3.1 Codecs
Codecs (COder-DECoder) are algorithms used to encode analogue audio and video signals into a digital form for transmission. There are many codecs for audio and video streams, varying in complexity and serving different purposes. The increase in network bandwidth and computing power in the last few years has made it possible to employ advanced codecs, which lead to higher-quality video and audio. Until recently, there was no provision in place to make codecs more robust and resilient under unfavorable network conditions. A new variation of H.264, known as H.264 Annex G or H.264 SVC and standardized by the ITU-T, is specially designed to produce an acceptable video quality over slow and error-prone links. This is achieved by splitting the signal into several layers.
The base layer of the codec delivers a low-resolution, low-frame-rate video. The layers on top of this base layer supplement the data by enhancing the quality of the video that the base layer provides. When the data of one of the enhancement layers is incomplete, the resolution and frame rate of the video drop for a very short period and fall back to those of the next lower layer. The base layer has modest bandwidth requirements compared to the upper layers, so it is less likely that the reconstruction of the base layer will fail due to incomplete data.
Another advantage of SVC is seen with multiparty video conferencing. With this type of video conferencing, multiple streams are sent to several devices that might differ in their ability to render video, for example devices such as mobile phones that can only handle 480p. Instead of re-encoding the original video for that one device type, only the layers that the device supports have to be forwarded to it.
Another way of making the stream extra resilient is by applying Forward Error Correction (FEC). When parts of the data in packets are damaged along the way, the original information can be recreated by using the added FEC information, provided there is enough data to reconstruct the missing parts. FEC has the disadvantage of adding overhead to the data stream, but when it is used in combination with SVC and applied only to the base layer, the overhead is kept to a minimum while still providing a video stream that is resilient enough to withstand unfavorable network conditions.
3.3.2 Call admission control
Call Admission Control is a last resort for guaranteeing acceptable quality in the data stream. With Call Admission Control active, capacity in the network is closely monitored and guarded. When there is a danger of running out of bandwidth, Call Admission Control steps in and prohibits the setup of another (video) call.
Call Admission Control is not incorporated in the XMPP protocol because it is regarded as a function to be dealt with outside the scope of XMPP. Currently there is no XMPP server software that supports Call Admission Control, due to the fact that the primary function of these XMPP servers is to provide IM communication, and Jingle support is considered to be less important.
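Because XMPP itself offers no Call Admission Control, such a check would have to live in a separate component. The following Python sketch shows only the basic bookkeeping, under stated assumptions: the class name, the per-call bandwidth figures and the fixed link capacity are hypothetical, and a real deployment would measure the available capacity rather than configure it.

```python
class CallAdmissionControl:
    """Admit a call only while the reserved bandwidth stays within capacity."""

    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps
        self.active_calls = {}  # call id -> reserved bandwidth in kbps

    @property
    def used_kbps(self):
        return sum(self.active_calls.values())

    def admit(self, call_id, required_kbps):
        """Reserve bandwidth for the call, or refuse it if the link would be full."""
        if self.used_kbps + required_kbps > self.capacity_kbps:
            return False
        self.active_calls[call_id] = required_kbps
        return True

    def release(self, call_id):
        """Free the bandwidth of a call that has ended."""
        self.active_calls.pop(call_id, None)


# Hypothetical link of 2000 kbps and video calls of 1200 kbps each.
cac = CallAdmissionControl(capacity_kbps=2000)
print(cac.admit("call-1", 1200))  # first call fits
print(cac.admit("call-2", 1200))  # second call would exceed the capacity
cac.release("call-1")
print(cac.admit("call-2", 1200))  # fits again after the first call ended
```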
[Figure: intradomain Jingle call. Labels: Example.com, Jinglerelay.Example.com, media stream, user1@Example.com\laptop, user2@Example.com\mobile, user2@Example.com\videoscreen; arrows numbered 1, 2 and 3]
1: XMPP Jingle request with offer
2: XMPP Jingle accept
3: XMPP termination
Extensions used: XEP-0166 Jingle, XEP-0167 Jingle RTP session
Interdomain Jingle call
When setting up an interdomain video call, the NATs on both sides have to be traversed. Different techniques can be applied to overcome this. An interesting one is the Jingle relay node. A Jingle relay node is a Jingle client that is connected to the Internet with a public IP address and that has direct access, with no NAT in the path, to the Jingle client it is serving. When User 1 wants to set up a Jingle call with User 3, who is from a different domain, but notices that it is behind a NAT, it initiates a service discovery to search for an XMPP Jingle relay node. When an XMPP Jingle relay node is found, the client asks for an available channel to carry an RTP stream. If there is enough capacity available, a channel will be provided. The RTP stream of the external client will be redirected through the Jingle relay node.
For a multiparty video or audio conference there is currently no agreed standard. There is a company named Peoplelink which has based its video MCU product on XMPP. At this moment, it is not clear whether the product is based on XMPP with an open standard or whether the company has chosen its own proprietary solution for its XMPP-based MCU. It should be easy to add a multiparty video conferencing standard to XMPP. The possibility to redirect RTP to a different destination is included in Jingle. The Jingle relay node specifications could be used to provide such functionality. At this moment the author of
[Figure: interdomain Jingle call. Labels: test.com, Example.com, media stream, user1@Example.com\laptop; arrows numbered 1, 2 and 3]
The process is quite similar to the first example of intradomain communication, with some extra steps involved:
1: search for an accessible XMPP Jingle relay node
2: request an XMPP Jingle relay channel from an available XMPP relay node
3: send the XMPP Jingle relay channel details
Extensions used: XEP-0166 Jingle, XEP-0167 Jingle RTP session, XEP-0278 Jingle relay node
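The three extra steps can be sketched as plain Python functions. This is a simulation of the decision logic only, not an implementation of XEP-0278: the data classes, function names, addresses and port numbers are all hypothetical, and the actual service discovery and channel request would be XMPP <iq> exchanges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RelayNode:
    jid: str
    public_ip: str
    free_channels: int
    next_port: int


@dataclass
class RelayChannel:
    relay_jid: str
    ip: str
    port: int


def find_relay(nodes: List[RelayNode]) -> Optional[RelayNode]:
    """Step 1: pick the first discovered relay node that has a free channel."""
    return next((node for node in nodes if node.free_channels > 0), None)


def request_channel(node: RelayNode) -> RelayChannel:
    """Step 2: reserve a channel on the relay node."""
    if node.free_channels <= 0:
        raise RuntimeError("relay node has no capacity left")
    node.free_channels -= 1
    channel = RelayChannel(node.jid, node.public_ip, node.next_port)
    node.next_port += 1
    return channel


def channel_details(channel: RelayChannel) -> dict:
    """Step 3: the address details the client puts into the Jingle negotiation."""
    return {"ip": channel.ip, "port": channel.port, "type": "relay"}


# Hypothetical discovery result: one full node and one node with capacity.
discovered = [
    RelayNode("busy.example.com", "203.0.113.5", 0, 40000),
    RelayNode("jinglerelay.example.com", "203.0.113.10", 2, 40000),
]
node = find_relay(discovered)
channel = request_channel(node)
print(channel_details(channel))
```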
[Figure: customer contact design with separate XMPP domains. Labels: webserver, customerID@customercenter.com/applicationserver, Customercenter.com, Example.com, Jinglerelay.Example.com, user1@Example.com\laptop]
One of the choices made in this design is to separate the XMPP domains: one for the web site, which is used for the user accounts of the customers, and one for the company. The reason for this separation is that it increases the manageability of the solution. This way multiple sites can be managed by one company and vice versa.
Extensions: XEP-0004 Data Forms, XEP-0166 Jingle, XEP-0167 Jingle RTP session, XEP-0278 Jingle relay node
The advice given by Gartner is sound. In addition to this advice, which is fully supported by the author of this thesis, the capabilities of the XMPP multimedia extension Jingle should be taken into account when developing a communication architecture. The additional advice is to consider XMPP Jingle, when and where possible, as the protocol of choice for interdomain multimedia communication such as voice or video conferencing with partners or customers. Another important part of UC systems is the transmission of real-time traffic. Real-time traffic is very prone to suffer from unfavorable network conditions. Usually the network infrastructure is adapted to mitigate this problem, e.g. by implementing QoS. But often key parts of the