High-Level Protocol Design
The proposed high-level protocol is divided into 3 phases. In the first phase (Low-Level Connectivity) the client discovers all the available services offered by service devices (phase 1.1) and establishes a cryptographically secured connection to the service device whose service it will use (phase 1.2). We will also use this phase to verify the identity of the service device, so that the user can be confident that he uses the intended service and not that of a malicious third party.
After phase 1, the second phase (Authentication and Authorization) establishes an authenticated (phase 2.1) and authorized session (phase 2.2). This means that the service device first gets the identity of the client (provided by some trusted Identity Provider, IP) and then the authorization of the client to use the service (provided by some Access Control Manager, ACM, which is associated with the service device). As the service device may operate completely offline (only equipped with NFC), the communication with both the IP and the ACM has to be managed by the client itself.
The third phase is the phase in which the actual use of the service occurs. All three phases are described below in detail:
Phase 1: Low-Level Connectivity
Phase 1.1: Service Discovery
(Bluetooth built-in? IP-based: Bonjour?)
We should define a service discovery mechanism that is completely IP-based (most probably: use Bonjour for that) and completely agnostic of the lower layers. Then there should also be the possibility to use mechanisms that are specific to the respective lower layer, e.g. to speed up discovery and most importantly to optimize the operation (there's no sense in establishing a full IP connection with ZeroConf etc. when it can be determined at the Bluetooth layer that the remote side won't talk with us anyways). These lower layer mechanisms would have to be defined separately for each NFC technology. In the case of Bluetooth that would most probably be based on SDP.
All data of phase 1.1 may be cached for later use (other services which are already known from the same or other service devices...).
Phase 1.2: Secured Connection
Once we know which service on which device we want to talk to, a connection has to be established. It seems logical to start with encryption right here (using e.g. TLS), so that the amount of information potentially leaked to malicious third parties is minimized. It's also a good point to verify that the endpoint the client talks to is the actual service device, so that man-in-the-middle attacks are prevented. This can be done by using some public-key protocol, so that the client can present the fingerprint of the key received from the service device (encoded as some colored symbols? Is there some standard to do this?) to its user, who can compare it to some physical label on the service device.
Other ways to verify the fingerprint seem possible. E.g. the ACM could provide this information (in phase 2.2), so that the authenticity of the service device could be verified by the client later on, instead of asking the user to compare some colored symbols. But this would mean that the user would have to trust the ACM, and as the service device itself tells the client which ACM to use, the ACM could also be a compromised service (provided by e.g. an attacker who tries to MitM-attack the connection between client and actual service device), so we consider this bad design. Note that the fingerprint can't be provided by the IP, as the IP doesn't necessarily know about the ACM or even the service device (they may not have an identity in the sense of an IP).
In cases where it's not that important to ensure that the client talks to the right service device (it may be important in a photo print shop where the user doesn't want to give his photos to somebody else with e.g. much higher prices, but unimportant when accessing a beamer), one could think of a third option "unimportant" besides "matches" and "doesn't match" regarding the fingerprint (with "matches" cached for future use and "unimportant" only valid for one use or something like that).
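The three-way verdict described above could be handled on the client roughly as follows. This is only a sketch under assumptions: the choice of SHA-256, the colon-separated hex rendering (instead of colored symbols), and the names Verdict, fingerprint and FingerprintCache are all hypothetical and not part of the design so far.

```python
import hashlib
from enum import Enum


class Verdict(Enum):
    MATCHES = "matches"
    MISMATCH = "doesn't match"
    UNIMPORTANT = "unimportant"


def fingerprint(public_key_bytes: bytes) -> str:
    """Render a SHA-256 digest of the key as colon-separated hex pairs.

    Assumption: SHA-256 and hex output stand in for whatever encoding
    (e.g. colored symbols) is finally chosen.
    """
    digest = hashlib.sha256(public_key_bytes).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


class FingerprintCache:
    """Remembers fingerprints the user has confirmed as matching."""

    def __init__(self):
        self._trusted = {}  # device id -> confirmed fingerprint

    def record(self, device_id: str, fp: str, verdict: Verdict) -> bool:
        """Apply the user's verdict; return True if the connection may proceed."""
        if verdict is Verdict.MATCHES:
            self._trusted[device_id] = fp  # cached for future use
            return True
        if verdict is Verdict.UNIMPORTANT:
            return True  # valid for this one use only, nothing is cached
        return False  # mismatch: abort the connection

    def is_trusted(self, device_id: str, fp: str) -> bool:
        """True if this exact fingerprint was confirmed earlier."""
        return self._trusted.get(device_id) == fp
```

With such a cache the client would only need to ask the user when is_trusted() returns False for the fingerprint received in phase 1.2.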
So from here on, the client can be confident that all data he transmits can be read only by the service device he wants to use. We now have to ensure that the service device can be sure to be used by (i.e. receive commands from) a client who is authorized to do so. This is done by phase 2:
Phase 2: Authentication and Authorization
First, the client is identified by some Identity Provider chosen by the service device (which means that the client has to be known by some Identity Provider trusted by the service device). The assertion issued by the IP may be cached for future use. After that, the authorization information for the access has to be retrieved from an Access Control Manager (also chosen by the service device) who manages the access policies for the service device. IP and ACM will usually (?) be two different entities (e.g. mobile phone company and photo print shop).
One could think of methods for minimizing the communication effort of the client. Besides caching identity assertions (in the client's cache), one idea would be to let the ACM identify the client itself (with the help of an IP), so that the client doesn't need to communicate with the IP directly. But this would have some disadvantages, like leakage of information (used service, ...) to the IP or the difficulty of transmitting the needed credentials through the ACM, so we don't consider this a good design choice.
Both authentication and authorization messages between all four parties are transmitted using (signed) SAML assertions and protocols, as follows:
Phase 2.1: Authenticated Session
First, our client (mobile phone) opens an authenticated session by authenticating to the service device (beamer). This involves communication client<->service device and client<->Identity Provider:
- Client presents identity to service device
- Service device asks for proof
- Client asks Identity Provider for a signed authentication statement (SAML Authentication query)
- Identity Provider provides this (SAML Authentication statement)
- Client presents IP-signed identity
- -> authenticated session opened (service device issues some session-id)
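The exchange above can be sketched as a small in-process simulation. Everything here is an assumption made for illustration: an HMAC with a demo key stands in for the XML signature on a real SAML authentication statement, plain dicts stand in for SAML messages, and the names ip_issue_statement, ServiceDevice and open_authenticated_session are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumption: a shared demo key replaces the Identity Provider's real signing key.
IP_KEY = b"demo-key-known-to-ip-and-service-device"


def _ip_signature(identity: str) -> str:
    return hmac.new(IP_KEY, identity.encode(), hashlib.sha256).hexdigest()


def ip_issue_statement(identity: str) -> dict:
    """Identity Provider: answer an authentication query with a signed statement."""
    return {"identity": identity, "signature": _ip_signature(identity)}


class ServiceDevice:
    def __init__(self):
        self.sessions = {}  # session id -> authenticated identity

    def present_identity(self, identity: str) -> dict:
        """Client presents an identity; the device asks for proof."""
        return {"status": "proof-required", "identity": identity}

    def present_proof(self, statement: dict):
        """Client presents the IP-signed identity; open a session if it verifies."""
        expected = _ip_signature(statement["identity"])
        if not hmac.compare_digest(expected, statement["signature"]):
            return None
        session_id = secrets.token_hex(8)
        self.sessions[session_id] = statement["identity"]
        return session_id


def open_authenticated_session(device: ServiceDevice, identity: str):
    """Client side: run the whole phase 2.1 exchange."""
    reply = device.present_identity(identity)
    if reply["status"] != "proof-required":
        return None
    statement = ip_issue_statement(identity)  # client <-> Identity Provider
    return device.present_proof(statement)    # client <-> service device
```

Note that the service device never talks to the Identity Provider in this sketch; the client relays the signed statement, which matches the requirement that the service device may be offline.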
Phase 2.2: Authorized Session
Within this session, an authorized usage session (or multiple?) may be opened. This involves communication client<->service device and client<->Access Control Manager:
- Client asks service device for usage access
- Service device asks client for authorization
- Client asks Access Control Manager for authorization to access service device (SAML Authorization decision query)
- Client gets authorization from Access Control Manager (SAML Authorization decision statement)
- Client hands authorization over to service device
- -> authorized session opened (service device issues some session-id)
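A minimal sketch of the authorization steps above, under stated assumptions: an HMAC with a demo key replaces the signature on a real SAML authorization decision statement, the POLICY table and all function names are hypothetical, and only the decision values "Permit" and "Deny" are modelled.

```python
import hashlib
import hmac
import secrets

# Assumptions: demo key instead of a real ACM signing key, and a toy policy table.
ACM_KEY = b"demo-acm-key"
POLICY = {("alice@example.org", "beamer-1"): "Permit"}


def _acm_signature(subject: str, resource: str, decision: str) -> str:
    message = "|".join((subject, resource, decision)).encode()
    return hmac.new(ACM_KEY, message, hashlib.sha256).hexdigest()


def acm_decide(subject: str, resource: str) -> dict:
    """Access Control Manager: answer an authorization decision query."""
    decision = POLICY.get((subject, resource), "Deny")
    return {
        "subject": subject,
        "resource": resource,
        "decision": decision,
        "signature": _acm_signature(subject, resource, decision),
    }


def device_open_authorized_session(authenticated_identity: str,
                                   resource: str,
                                   statement: dict):
    """Service device: check the statement handed over by the client."""
    expected = _acm_signature(
        statement["subject"], statement["resource"], statement["decision"]
    )
    if not hmac.compare_digest(expected, statement["signature"]):
        return None  # not issued by the ACM
    if statement["subject"] != authenticated_identity:
        return None  # statement belongs to a different client
    if statement["resource"] != resource or statement["decision"] != "Permit":
        return None
    return secrets.token_hex(8)  # session id of the authorized session
```

The subject check ties the authorized session to the identity established in phase 2.1, so a statement issued for one client cannot be replayed by another.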
Phase 3: Using the Service
Phases 2 and 3 need a common base communications protocol. It seems logical to use XML for all protocol messages. One question still remains: How do we transfer our XML messages over the IP network? The common approach apparently is to use SOAP. Although SOAP can theoretically be used over any transport, in practice almost no one uses anything other than SOAP over HTTP. This approach doesn't fit our environment:
- HTTP fundamentally is a synchronous request-response protocol. Our implementation however will probably need asynchronous communications.
- Also not all exchanges will be initiated by the same side. E.g. most of the time the mobile phone will initiate a request-response type exchange with the beamer, however, sometimes the beamer might want to initiate an exchange, such as asking for authentication. This type of communication does not nicely map to HTTP operations. (It would be necessary to implement an HTTP server on the mobile phone, which just doesn't feel right.)
One candidate for an appropriate base protocol to wrap the XML messages up in would be SCTP, the Stream Control Transmission Protocol (RFC 2960). Another candidate would be BEEP (RFC 3080). A third choice would be XMPP (RFC 3920). (See SPAN Web Ressources)
I (Henryk 18:34, 3 Nov 2005 (CET)) personally would like to see SCTP used. However, there are a couple of drawbacks:
- SCTP sits directly on top of IP, and I'm not sure whether we can get down that far in the mobile phone's IP stack
- There is no existing SOAP over SCTP binding, so we would have to invent that
- Implementing SCTP might be more complex than is needed for our project
So from a purely rational standpoint BEEP (over TCP) seems to be the best choice:
- There is a SOAP over BEEP binding (RFC 3288)
- BEEP explicitly supports TLS.
- BEEP can be used over TCP, which should be comparatively easy to implement on the mobile phone.
- If need be, BEEP can be used over transports other than TCP/IP, e.g. Bluetooth RFCOMM.
(And of course: BEEP has the superior abbreviation compared with SCTP.)
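To give an impression of what carrying an XML message in BEEP looks like on the wire, here is a rough sketch of building and parsing a single MSG frame, following the frame layout in RFC 3080 (header line, payload, "END" trailer). It is not a BEEP implementation: channel management, MIME headers inside the payload, the other frame types and flow control are left out, and the function names are hypothetical.

```python
def build_msg_frame(channel: int, msgno: int, seqno: int,
                    payload: bytes, more: bool = False) -> bytes:
    """Build one BEEP MSG frame: header line, payload, END trailer."""
    continuation = "*" if more else "."  # "*" means further frames follow
    header = f"MSG {channel} {msgno} {continuation} {seqno} {len(payload)}\r\n"
    return header.encode("ascii") + payload + b"END\r\n"


def parse_frame(frame: bytes) -> dict:
    """Split a single complete frame back into its header fields and payload."""
    header, _, rest = frame.partition(b"\r\n")
    kind, channel, msgno, more, seqno, size = header.decode("ascii").split(" ")
    size = int(size)
    payload, trailer = rest[:size], rest[size:]
    if trailer != b"END\r\n":
        raise ValueError("frame does not end with the END trailer")
    return {
        "type": kind,
        "channel": int(channel),
        "msgno": int(msgno),
        "more": more == "*",
        "seqno": int(seqno),
        "payload": payload,
    }
```

Because the header carries an explicit payload size, the payload may itself contain line breaks or the string "END" without confusing the parser.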