Distributed Video Talk Environment

Web Enabled, CORBA Driven, Distributed VideoTalk Environment on the Java Platform
ACCESSIBILITY
{tomek, kz}@ics.agh.edu.pl

Abstract
Probably nothing gives people more accessibility than the ability to be everywhere without ever actually being there. This article presents a general overview of a Java driven, distributed framework for networked real motion multimedia on the World Wide Web. It presents the obstacles and difficulties one must inevitably face to make real video, web centric, networked Java applications a reality. The paper is primarily intended to outline what is perceived as one of the most feasible ways to meet the challenge at this stage of Sun's Java technology. It emphasizes the benefits brought and the limitations imposed by Java on the architecture of a multimedia system, and sketches out how the designers of the Java VideoTalk Environment (VTE) system tried to exploit the former and overcome the latter. The article also portrays an attempt to build the entire video system on top of OMG's CORBA computational model, which greatly facilitates VTE's interoperability with ABS from the Oracle/Olivetti Research Laboratory, Cambridge, UK and, to a certain degree, with IBM's Aglets.

Keywords: CORBA, Java, ORL's ABS, Streaming Audio and Video, IBM's Aglets

Introduction
The demand for audio and video on the web seems to grow, especially since the day when Java[1] mesmerised web users by offering easy and straightforward download and execution of applet code across many different platforms and environments. The interest sparked by the promise of much livelier web pages, and by the ubiquitous proclamation of Java as the web and multimedia oriented language, diminished somewhat when it turned out that the only multimedia support, for the time being, was cartoon animation and that real video libraries were scheduled for 1997. Starting with SunSoft's alpha3 version of Java, a group at UMM embarked upon integrating video and audio capabilities into Java software.
Implementing a real time networked video system is not a trivial task. An implementation often requires addressing many levels of software development, ranging from low level code (like code getting the images off the video devices) up to higher software levels (like APIs for initiating connections and, in general, controlling the system's execution flow). To tackle the problem of delivering real time video to the web efficiently, one must be able to provide viable solutions in all of these software areas.
The appearance of Sun's Java seems to be an attempt to answer the need for a common software platform that makes high level code portable. However, in its current state the technology offers only very modest programming facilities that could serve as a vehicle for real motion video applications of practical use and thus effectively bridge the accessibility gap between web users. The aim of the VideoTalk Environment development team was to create a flexible system that brings audio and video accessibility to users within a corporate LAN or, as work progresses, to a wider audience, in each case from within an ordinary Java enabled web browser.

Design Problems
During the development of a Java video system today, one encounters three main problem areas that in reality shape the overall structure of a Java multimedia system. The first is the difficulty of seamlessly integrating video handling code with Java applet software. The second is the selection of the framework used to provide all the facilities needed to harness the complexities of the system (such as initiating and closing connections and handling communication errors). The third is the choice of the transport protocol used to transfer data between video sources and sinks.
Ideally, a user of the system should need nothing more than an ordinary Java enabled browser for a Java network video application. An applet loaded into Netscape from a remote host would get the video image of the user, thanks to standard Java classes in the browser (supported by native libraries), and send the user's video to a destination (obviously with the user's permission to do so). Unfortunately, at this moment the Java packages do not provide anything that would enable getting images off cameras and sound off microphones. The only way to deal with this now is to use native methods and hook them up to Java code. This approach, however, introduces a substantial flaw into such Java software: classes containing native methods cannot be loaded by the classloaders of most popular browsers due to strict security restrictions. Consequently, dynamically fetching multimedia capturing and displaying code from a remote server, without support from native software already installed with the browser, turns out to be unrealistic with the applet concept. The reason is that allowing such code to be downloaded into a web browser in the form of a lightweight applet might be dangerous (not to mention non-portable). Some code might try to mimic the VTE system and execute native (unverified) code, possibly breaching security at a host. Indeed, were this made possible, the Java security model would be seriously deficient.
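The native-method hook mentioned above can be sketched in Java as follows. This is only an illustration under stated assumptions, not VTE code: the class `NativeCaptureSketch`, the library name `vtecapture` and the method `grabFrame0` are hypothetical. The sketch shows why the native library has to be present on the host beforehand: if it is missing, or if a security manager refuses to load it, the Java side can only report that capture is unavailable.

```java
/** Hypothetical wrapper around a native frame grabber (all names are assumptions). */
class NativeCaptureSketch {

    // True only if the (hypothetical) native library was found and could be loaded.
    static final boolean NATIVE_AVAILABLE = tryLoad("vtecapture");

    // Declared in Java, but would have to be implemented in C or C++.
    private static native byte[] grabFrame0();

    /** Tries to load a native library and reports whether that worked. */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library is not installed on this host
        } catch (SecurityException e) {
            return false; // loading refused, e.g. by a browser's security manager
        }
    }

    /** Returns one captured frame, or fails clearly when no native support exists. */
    static byte[] grabFrame() {
        if (!NATIVE_AVAILABLE) {
            throw new IllegalStateException("native capture library is not installed");
        }
        return grabFrame0();
    }
}
```

On a host without the library, `grabFrame()` fails immediately instead of attempting the native call, which mirrors the situation of an applet downloaded into a browser that ships no capture support.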
This limitation, inevitably enforced by the security provisions, provides the rationale for introducing at least a two layer architecture into a web based Java video system. The sensible way out for video system implementors is to provide native libraries on each host and to create a server that offers video capabilities to all locally started applets and applications. The server could be a Java application doing the real work in native methods, or it could simply be an application written entirely in C++, which should be the better implementation choice for performance reasons. This necessity to separate native (C and C++) code from applet code coincides with the benefit brought to an application by a very clear boundary between graphical user interface objects and functional objects. The VTE system follows the observation that a precise separation between all software components (i.e. not just between GUIs and the back end) renders an even clearer architecture. A natural consequence of this is employing OMG's CORBA[2] IDL to sketch out all the services offered by system components. Approached this way, a CORBA audio and video server makes up for the services missing from the native code that is accessible to applets through the Java (Netscape) Native Interface[3] and shipped with browser libraries.
An Object Request Broker also seems to provide a splendid framework for coordinating and managing the objects interacting in the VTE video system, thus addressing the second design problem: the need to manage the system components.
The following sections depict the implementation of most of the objects in a general way, presenting their capabilities and limitations.

High Level Overview of the System
The implementation obstacles imposed on developers by the integration of video handling code into Java applets are reflected in the layered structure of the VTE system. Two layers (the Browser Layer and the ConnectionRequester Layer) can be thought of as entirely Java layers. The AudioVideoSplitController layer can be seen as a native code layer (written, for example, in C++).
The VTE system uses Visigenic's VisiBroker for Java[5] ORB to manage its components in a network. A sensible alternative might be to base the application on Iona's OrbixWeb[6] CORBA 2.0 implementation. Unfortunately, at the time the team set out to implement the system, OrbixWeb didn't support the server side mapping for Java. There is, however, no reason why some components of the VTE system couldn't be built on top of OrbixWeb and others on top of VisiBroker and still be fully interoperable, thanks to the CORBA 2.0 IIOP protocol.
As a consequence of the chosen structure, every audio and video device server (AudioVideoSplitController) in VTE must be installed on a host before that host can be used as a multimedia access point. Right now, there seems to be no more elegant solution than to provide video managing facilities to clients through a neatly designed "native" CORBA server.
The picture presenting the overview of the VTE system reveals that there is a ConnectionRequester server object in the middle layer, between the separated AudioVideoSplitController and the Java applet code executing in the browser. It performs the extremely important function of coordinating, validating, registering and possibly authenticating communication between applets and AudioVideoSplitController. It is also the server that accepts or rejects connection requests and decides whether they should be handed over to AudioVideoSplitController or simply discarded. The Localization Site is also included in the middle layer. It provides location information about the users of the system.

A High Level Outlook on the Layered Structure of the VTE System

The next horizontal layer in the VTE system is the web based user interface layer, acting mainly as a client to the layer containing ConnectionRequester and the Localization Site. The applet GUIs communicate with the higher level through CORBA mechanisms. All of them can be executed either as applets or as standalone applications, and they all act as "stateless" software code. Most of VTE's object interfaces are contained in the system's main module, named VideoPackage. Almost all components in the system interact in terms of operations defined in this module and exchange data structures defined within it.

module VideoPackage
{
    // some more code defining structs ....

    interface AudioVideoSplitController {
        OpenInfoClass open_av_device();
        CloseInfoClass close_av_device();
        oneway void transmit(in TransmitInfoClass bic);
        oneway void receive(in ReceiveInfoClass ric);
        oneway void change_packet_size(in SizeChange sc);
        // ... less important code
    };

    interface ConnectionRequester {
        boolean request_connection_from(in ReceiveInfoClass ric, in TransmitInfoClass bic);
        boolean accept_connection_from(in ReceiveInfoClass ric, in TransmitInfoClass bic);
        oneway void change_transmit_param(in TransmitInfoClass bic);
        oneway void change_receive_param(in ReceiveInfoClass ric);
        boolean allow_connection(in string username);
        boolean deny_connection(in string username);
        boolean remove_queued_connection_request(in string username);
        boolean register_listener(in string username, in string ior);
        boolean unregister_listener(in string username, in string ior);
    };

    interface ConnectionEstablishCallback {
        boolean request_connection_from(in ReceiveInfoClass ric, in TransmitInfoClass bic);
    };

    // some more code
};

Cooperation of the Web Oriented GUI and Native Layers Using the ConnectionRequester Layer

The structuring of the system keeps the implementation of user interfaces separate from functional code. It is worth catching a glimpse of the interaction between VTE's layers and spotting the central role of the ConnectionRequester layer with its persistent functionality. ConnectionRequester acts as an entity relaying the requested attributes of a video connection, set up by user GUIs in applets (like "I want to open a connection to my friend on host tulip (149.156.97.25) with Jpeg video and linear PCM 44.1kHz sound"). It receives the parameters in CORBA calls from a client applet GUI and, after authorizing access, passes them up to AudioVideoSplitController for further execution. The applet Java code loaded into browsers performs a vital role by passing commands from the user, but it is also needed to register a user as a listener of a ConnectionRequester and is required to unregister when his or her browser exits. Client applets register by handing the IOR of a ConnectionEstablishCallback object (which executes within them as a lightweight thread) to the local ConnectionRequester. Not only do they act as clients of ConnectionRequester when requesting a connection to someone, but they also provide servers within browsers that are called by ConnectionRequester when an incoming connection request is put in the ConnectionRequester's queue. This way GUIs also serve to "wake up" the users of VTE. Obviously, this requires that the proper applet web page is being viewed, but a nice trick with a constantly executing applet thread is also sometimes possible, notifying a user of a requested connection even if he or she changes web pages.

Let us imagine that user A chooses to talk to user B, and follow the scenario without getting bogged down in unnecessary details. User A's applet calls request_connection_from on its local ConnectionRequester server object, passing data structures filled with information describing the needed connection. A's local ConnectionRequester server object invokes accept_connection_from on the ConnectionRequester at B's host, thus putting a connection request in its queue. Now B's ConnectionRequester calls request_connection_from on the local ConnectionEstablishCallback server object in user B's applet. Subsequently, ConnectionEstablishCallback notifies user B of the need to invoke either allow_connection or deny_connection on the local ConnectionRequester. Depending on the decision, the ConnectionRequester either triggers the operations open_av_device, transmit and receive (which send the outbound audio/video stream and accept the inbound one) on the AudioVideoSplitController server object, or simply removes the request from the queue by calling remove_queued_connection_request on the local ConnectionRequester server object. When a connection is accepted, the AudioVideoSplitControllers exchange audio/video information.
ConnectionRequester is designed as a fully multithreaded Java server application and also benefits greatly from the underlying ORB's threaded environment in servicing clients' requests and maintaining the behavioral consistency of the VTE system. This becomes evident especially when user A accesses VTE simultaneously from all 7 browsers he or she has started on the host and in each browser asks for the same connection but passes contradictory parameters for initiating it. As opposed to the "stateless" applet GUI code, ConnectionRequester acts as a persistent "storage server", keeping track of video sessions awaiting connection and of the sessions currently active.
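The call sequence of this scenario can be traced with a small in-process Java sketch. It is not VTE code and makes several simplifying assumptions: plain Java objects stand in for the CORBA servers, a single string replaces the ReceiveInfoClass and TransmitInfoClass structures, and the class names `HandshakeSketch`, `Requester`, `AvController` and `Callback` are hypothetical. Only the operation names follow the IDL module. How the calling side learns about the decision is not modelled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-process trace of the "A calls B" scenario; plain objects replace CORBA servers. */
class HandshakeSketch {

    /** Stand-in for ConnectionEstablishCallback running inside the user's applet. */
    interface Callback {
        boolean requestConnectionFrom(String caller, String params);
    }

    /** Stand-in for AudioVideoSplitController: it only records which operations ran. */
    static class AvController {
        final List<String> log = new ArrayList<String>();
        void openAvDevice() { log.add("open_av_device"); }
        void transmit(String params) { log.add("transmit " + params); }
        void receive(String params) { log.add("receive " + params); }
    }

    /** Stand-in for ConnectionRequester: the stateful part that owns the request queue. */
    static class Requester {
        final String user;
        final AvController av = new AvController();
        // caller -> requested parameters; assumption: a later request replaces an earlier one
        private final Map<String, String> queue = new HashMap<String, String>();
        private Callback listener;

        Requester(String user) { this.user = user; }

        synchronized void registerListener(Callback cb) { listener = cb; }

        /** Called by the local applet: forwards the request to the remote requester. */
        boolean requestConnectionFrom(Requester remote, String params) {
            return remote.acceptConnectionFrom(user, params);
        }

        /** Called by the remote requester: queues the request and wakes up the applet. */
        synchronized boolean acceptConnectionFrom(String caller, String params) {
            queue.put(caller, params);
            if (listener != null) {
                listener.requestConnectionFrom(caller, params);
            }
            return true;
        }

        /** User accepted: dequeue the request and start the audio/video controller. */
        synchronized boolean allowConnection(String caller) {
            String params = queue.remove(caller);
            if (params == null) {
                return false; // nothing was queued for this caller
            }
            av.openAvDevice();
            av.transmit(params);
            av.receive(params);
            return true;
        }

        /** User refused: the queued request is simply dropped. */
        synchronized boolean denyConnection(String caller) {
            return removeQueuedConnectionRequest(caller);
        }

        synchronized boolean removeQueuedConnectionRequest(String caller) {
            return queue.remove(caller) != null;
        }
    }
}
```

Keeping the queue inside `Requester` reflects the division described in the text: the applet side only forwards user decisions, while the requester holds the state, so conflicting requests from several browsers meet in one synchronized place.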

Localization Facilities Incorporated in VTE and Applet GUI
Without doubt, video accessibility is only part of the real accessibility challenge. The other part of the task that a system must be able to handle is easily finding the person one is looking for. Java, with its mobile facilities and dynamic code reloading, also seems to provide some excellent mechanisms to tackle the problem. This holds especially true when we take into account that the VTE system is designed to work on the Java/CORBA driven platform with IBM's Aglets[8] and the Active Badge System (ABS) from the Olivetti/Oracle Research Laboratory[7], Cambridge, UK. In VTE, the location of a collocutor can be established in two distinct ways. The first and most rudimentary is giving the email address of the person one wants to talk to. Obviously, this approach is extremely inefficient, as one may not have valid information about a person's location (especially which workstation the person is currently working at). This option is provided rather for those who long for a modern replacement for the ubiquitous talk utility on Unix systems.
The other, much more sophisticated way of locating users is ORL's ABS coupled with IBM's aglets, mobile pieces of Java code that can find the requested information for us. The Active Badge System equips its users with small badges that can be located by groups of sensors dispersed throughout a building, so the ABS sensors and software can track people's location within it. Obviously, ABS is usable only in places where sensors are installed.
VTE is being extended to integrate even more tightly with the ABS, not only to establish the physical location of potential collocutors, but also to collect some information about their current activities (like discerning whether John is moving fast between rooms) from aglets roaming a set of special servers. Each of the servers is responsible for managing a collection of sensors that intercept signals from badges carried by every member of the staff in an office or a building. The Localization Site acts as a server initiating the movement of lightweight aglets in search of information and as a server controlling the retrieval of the information gleaned by the mobile code. To obtain this information, a user interface client applet consults the Localization Site using CORBA calls and builds a tree depicting the current location of the users in the local domain. Right now, the Java user interface provides clickable names of hosts at which one may request a video connection to a given person.

Java VTE User Interface with Custom Crafted Java GUI Components

The graphical user interfaces of the VTE system provide the mechanism for the interaction needed between the web based layer and the middle system layer. They all act as "stateless" entities. Java turns out to be an excellent tool, providing facilities for creating packages of specialised components like knobs, trees and custom drawn lists, which makes interaction with the system more convenient than alternative solutions based on cumbersome HTML forms. The VTE graphical user interface components are also designed to conform to the requirements stipulated by the Java Beans[4] specification.

Audio and Video Handling Native Layer
In essence, AudioVideoSplitController is the only part of the system responsible for creating multimedia streams, directing (splitting) them to recipients and retrieving incoming streams. One can ask whether this can be handled efficiently by interpreted Java code, even when accelerated by Just In Time compilers. Based on the experience gathered during the implementation of the VTE system, the answer is that it certainly can, as almost all audio/video code is executed within native, compiled C video libraries anyway (like XIL on Solaris, for example). However, there is indeed no reason to implement it in Java, as the necessity to access video devices will always force it to delegate the real work to native code. A native implementation might also seem inevitable for performance reasons, but in theory nothing prevents using JIT accelerated Java code at this layer too, with the fundamental work done in native methods.
The VTE's AudioVideoSplitController implementation is capable of handling several audio and video formats unavailable in standard Java libraries. In its current state it can use linear PCM, u-law and A-law audio encoding together with the Jpeg or CellB video compression and decompression standards. Unfortunately, the VTE system currently doesn't support MPEG-1 compression, although MPEG-1 decompression is supported.

Video displaying is handled by native code while the applets provide GUI to localization facilities and relay user requests to management objects in the middle VTE layer.

The GUI also easily runs in Java enabled browsers.

Currently the transport protocol the VTE system uses to transfer data is the connectionless UDP/IP protocol, acting as a "plumbing" approach beside the rest of the system, which is structured on top of the CORBA protocol. One might also suggest basing the multimedia data transport entirely on CORBA with its underlying IIOP protocol. There is, however, a simple difference between developing a real motion video application and a traditional networked one: building video applications on top of an inherently request/reply computational model like CORBA inevitably requires implementing audio and video delivery outside CORBA requests. The ORB based model suits perfectly well most cases that involve searching databases, managing objects (as in this web application's case) and performing tasks on a request/reply basis, but modern video applications require much more than just this kind of behavior. They need streaming capabilities and call for sensible Quality of Service (QoS) in the delivery of data. It is generally well known that multimedia systems require substantial support from the underlying transport and signalling protocols to provide QoS. The IIOP protocol doesn't currently equip CORBA with any streaming capabilities. Considering that, one must openly admit that, without introducing proprietary extensions to CORBA, one cannot do without the so-called "plumbing approach", i.e. handling audio and video transmission on a different network protocol layer.
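To make the "plumbing" idea concrete, the following Java sketch sends one block of media bytes over UDP, outside any ORB request. It is a minimal illustration and not the VTE data path, which lives in native code: the class `UdpPlumbingSketch` and its method are hypothetical, and both endpoints run on the loopback interface purely so that the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

/** Sends media bytes as a UDP datagram, bypassing the request/reply path entirely. */
class UdpPlumbingSketch {

    /** Sends one payload to a receiver on the loopback interface and returns what arrived. */
    static byte[] roundTrip(byte[] payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try {
            DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: any free port
            DatagramSocket sender = new DatagramSocket();
            try {
                // UDP gives no delivery guarantee, so never wait forever for a datagram.
                receiver.setSoTimeout(2000);
                sender.send(new DatagramPacket(payload, payload.length,
                        loopback, receiver.getLocalPort()));
                byte[] buffer = new byte[1500];
                DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
                receiver.receive(incoming);
                return Arrays.copyOf(incoming.getData(), incoming.getLength());
            } finally {
                sender.close();
                receiver.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sender neither waits for a reply nor learns whether the datagram arrived, which is exactly what distinguishes this path from a CORBA invocation; connection setup and control would still travel through the ORB.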
There is currently no standard protocol on the internet that we know of which gives multimedia programmers the ability to control and negotiate the quality of service of a given stream of multimedia data. Even on the ATM networks existing right now, programmers cannot use the signalling benefits of a QoS oriented network unless they delve deep into the native ATM mode and thus step outside code usable on the internet.
Due to the lack of real multimedia oriented standard protocols, VTE's team implemented a test version in which servers exchanged audio and video using oneway CORBA operations. Unfortunately, some CORBA implementations seem to execute some subsequent oneway operations in a LIFO fashion. This resulted in the need to build yet another kind of "sequencing protocol" and introduced considerable jitter into the video stream, confirming the notion that the CORBA request/reply model is not appropriate when streaming of time dependent data is required. The uselessness of IIOP for transferring video data, pinpointed by VTE's test implementation, is further confirmed by the fact that oneway CORBA calls may not be asynchronous in reality, since the CORBA specification clearly permits the ORB to block while sending a oneway call. Such behavior is highly undesirable when high speed video is transmitted and displayed.
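The "sequencing protocol" is not described in more detail here. As a rough illustration of what such a layer has to do, the Java sketch below assumes that the sender tags every message with a sequence number and that the receiver releases messages strictly in order, holding back any that arrive early. The class `Resequencer` and its methods are hypothetical. The time a message spends waiting in the buffer is the kind of extra delay that shows up as jitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Restores sending order for messages that may be delivered out of order. */
class Resequencer {
    // Messages that arrived ahead of their turn, keyed by sequence number.
    private final TreeMap<Long, byte[]> pending = new TreeMap<Long, byte[]>();
    private long nextExpected = 0;

    /** Accepts one message and returns every message that can now be delivered in order. */
    synchronized List<byte[]> offer(long sequence, byte[] payload) {
        List<byte[]> ready = new ArrayList<byte[]>();
        if (sequence < nextExpected || pending.containsKey(sequence)) {
            return ready; // stale or duplicate message: ignore it
        }
        pending.put(sequence, payload);
        // Drain the buffer for as long as the next expected number is present.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    /** Number of messages currently held back. */
    synchronized int buffered() {
        return pending.size();
    }
}
```

With messages delivered in reverse order, nothing can be played until the first one finally arrives, at which point everything is released at once. A real-time variant would also need a rule for skipping a message that never arrives, which this sketch deliberately leaves out.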
There are generally two approaches to providing QoS in multimedia applications: reservation of resources and adaptation to operating under existing resources. Obviously, UDP/IP as used by VTE provides no QoS, let alone any sensible "multimedia oriented" approach with reservation of resources, but it definitely doesn't lock us out of the internet the way native ATM network programming does. Hence the VTE designers' decision to at least provide an illusion of real QoS by enabling the user to adapt the system to the existing resources at run time, when he or she gets annoyed by frequent frame dropping or unbearable sound jitter. In this way one can lower the number of colors displayed, switch to a smaller picture size and decrease the quality of the sound transmitted. The VTE system's AudioVideoSplitController also tries to provide a sensible "QoS" (though of course only a substitute for it) by maintaining synchronization between audio and video: it inserts voice and images together into packets about 1000 bytes long, which the receiving host's AudioVideoSplitController assembles into a bigger chunk of data. The chunk is immediately decompressed and displayed on the screen (and the sound played) using native function calls. This approach virtually eliminates the often terrible synchronization problem found in many multimedia systems that send audio and video separately. Such systems often work excellently on LANs, but scaling them to larger networks makes it almost impossible to see the lips moving in sync with the sound.
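The idea of carrying voice and images in the same packets can be sketched in Java as follows. The wire layout is an assumption made for illustration only, since the actual VTE packet format is not given: each chunk is the audio length, the audio bytes and the video bytes of one time slice, cut into packets of at most 1000 bytes with an 8 byte header (chunk id, packet index, packet count). The class `AvPacketizer` and its methods are hypothetical, and lost or duplicated packets are not handled.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Carries audio and video of one time slice in the same packets (assumed layout). */
class AvPacketizer {
    static final int PACKET_SIZE = 1000; // total bytes per packet, header included
    static final int HEADER = 8;         // chunkId (4) + packet index (2) + packet count (2)

    /** Joins audio and video of one time slice and cuts the result into packets. */
    static List<byte[]> split(int chunkId, byte[] audio, byte[] video) {
        ByteBuffer chunk = ByteBuffer.allocate(4 + audio.length + video.length);
        chunk.putInt(audio.length).put(audio).put(video);
        byte[] body = chunk.array();
        int room = PACKET_SIZE - HEADER;
        int count = (body.length + room - 1) / room; // ceiling division
        List<byte[]> packets = new ArrayList<byte[]>();
        for (int i = 0; i < count; i++) {
            int from = i * room;
            int length = Math.min(room, body.length - from);
            ByteBuffer packet = ByteBuffer.allocate(HEADER + length);
            packet.putInt(chunkId).putShort((short) i).putShort((short) count);
            packet.put(body, from, length);
            packets.add(packet.array());
        }
        return packets;
    }

    /** Rebuilds {audio, video} from all packets of one chunk, given in any order. */
    static byte[][] join(List<byte[]> packets) {
        int count = ByteBuffer.wrap(packets.get(0)).getShort(6);
        if (packets.size() != count) {
            throw new IllegalArgumentException("chunk is incomplete");
        }
        byte[][] parts = new byte[count][];
        int total = 0;
        for (byte[] raw : packets) {
            int index = ByteBuffer.wrap(raw).getShort(4);
            parts[index] = Arrays.copyOfRange(raw, HEADER, raw.length);
            total += parts[index].length;
        }
        ByteBuffer body = ByteBuffer.allocate(total);
        for (byte[] part : parts) {
            body.put(part);
        }
        body.flip();
        byte[] audio = new byte[body.getInt()];
        body.get(audio);
        byte[] video = new byte[body.remaining()];
        body.get(video);
        return new byte[][] { audio, video };
    }
}
```

Because sound and picture of the same instant travel in the same chunk, they are either both available to the decoder or both late, so they cannot drift apart the way two independent streams can.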
The AudioVideoSplitController interface also includes operations for changing the size of the packets being sent. Although increasing the packet size vastly improves network performance, it may introduce substantial jitter into the multimedia data flow. Obviously, all this also depends on the type of data being sent and on how fast the sources generate it. For example, when a VTE connection is configured for 16-bit stereo, linear PCM audio sampled at 44.1kHz, it requires approximately 10MB of data per 60 seconds. In this case the packet size could be increased with almost no sound jitter perceived by the user. However, 8-bit monaural, u-law encoded audio sampled at 8kHz would be unbearable when sent in 20KB data chunks: the delay needed to fill the audio buffers (after which data is sent) would be noticeable and annoying.
There is, however, some hope that IPv6 will provide video application programmers with a standard reservation protocol, RSVP. It is also worth noting that RSVP messages may be carried in UDP datagrams. One might hope that all this could provide solid ground for delivering QoS by reservation on the internet in the future. The IDL defined layered structure should make changes to the VTE system much easier when such an opportunity appears.
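The arithmetic behind these two audio examples can be checked with a short Java sketch. The class and method names are hypothetical, and 20KB is taken to mean 20 × 1024 bytes. The 44.1kHz, 16-bit stereo source produces 176,400 bytes per second, or 10,584,000 bytes per minute, and fills a 20KB chunk in roughly 0.12 seconds. The 8kHz, 8-bit mono source produces only 8,000 bytes per second and needs 2.56 seconds to fill the same chunk.

```java
/** Data rate and buffer fill delay for the two audio configurations discussed above. */
class AudioBudget {

    /** Bytes produced per second by an audio source with the given format. */
    static long bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * bitsPerSample * channels / 8;
    }

    /** Seconds the source needs to produce enough data to fill one packet. */
    static double fillDelaySeconds(int packetBytes, long bytesPerSecond) {
        return (double) packetBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        int chunk = 20 * 1024; // assumption: "20KB" means 20 * 1024 bytes
        long cdQuality = bytesPerSecond(44100, 16, 2);
        long telephone = bytesPerSecond(8000, 8, 1);
        System.out.println("44.1kHz stereo PCM: " + cdQuality * 60 + " bytes per minute");
        System.out.println("fill delay, 44.1kHz stereo PCM: " + fillDelaySeconds(chunk, cdQuality) + " s");
        System.out.println("fill delay, 8kHz mono u-law: " + fillDelaySeconds(chunk, telephone) + " s");
    }
}
```

The same chunk size therefore costs about a tenth of a second of buffering at the high data rate but more than two and a half seconds at the low one, which is why the packet size has to be chosen with the source's data rate in mind.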

Concluding Remarks
A growing number of applications will definitely require more and more networked multimedia capabilities. Java seems to pave the way in an attempt to create a neutral web based software platform upon which portable software solutions can be based. However, before one can detach oneself from writing non-portable, proprietary code, many Java libraries will have to become available and the set of native methods delivered with Java classes will have to become much more extensive. There is also definitely a lot of standardization to be done in the area of networking itself, bearing in mind the multimedia requirements imposed by audio and video applications, as the UDP solution on top of IPv4, commonly met in the industry, exhibits substantial flaws in this area. The VTE system clearly illustrates the observation that right now there is no way to build a Java video system without proprietary native code. The only remedy is to encapsulate such code, preferably in IDL interfaces, thus making the rest of the application independent of its implementation, so that it can be replaced with better solutions in the days to come. Until better solutions become a reality, one will also have to rely on software components that cannot simply be installed on the fly.
Java is an extremely promising technology, but there is still a lot to be added to it before one can immerse oneself in writing totally non-proprietary web based code with blissful ignorance of C.

References:

[1] The Java Language Specification, James Gosling, Bill Joy, Guy Steele, Addison Wesley, ISBN 0-201-63451-1, first printing, August 1996

[2] The Common Object Request Broker: Architecture and Specification, Object Management Group, 2.0 edition, July 1995

[3] The Java Native Interface Specification, Sheng Liang, Sun Microsystems, Inc., 2550 Garcia Avenue, Mountain View, CA 94043-1100, USA

[4] The Java Beans Specification, Sun Microsystems, Inc., 2550 Garcia Avenue, Mountain View, CA 94043-1100, USA

[5] Visigenic's VisiBroker for Java Reference Manuals, Visigenic Software, Inc., 951 Mariner's Island Blvd., Suite 120, San Mateo, CA 94404, USA

[6] OrbixWeb for Java, Iona Technologies Ltd., The Iona Building, 8-10 Pembroke St. 2, Dublin, Ireland

[7] The Active Badge Location System, Roy Want, Andy Hopper, Veronica Falcao, Jonathon Gibbons, Olivetti Research Ltd, 24a Trumpington Street, Cambridge CB2 1QA, England

[8] Programming Mobile Agents in Java, Danny B. Lange and Daniel T. Chang, IBM Corporation, September 9, 1996