Abstract
Probably nothing makes people more accessible
than the ability to be present everywhere without ever actually being there. The
article presents a general overview of a Java driven, distributed framework
for networked real motion multimedia on the world wide web. It presents
the obstacles and difficulties one must inevitably face to make real video,
web centric, network Java applications a reality. The paper is primarily
intended to outline what its authors perceive as one of the most feasible ways
to meet the challenge at this stage of Sun's Java technology. It emphasizes
the benefits brought and limitations imposed by Java on the architecture
of a multimedia system. It sketches out the ways the designers of Java
VideoTalk Environment (aka VTE) system tried to exploit the former and
to overcome the latter. The article also portrays an attempt to build the
entire video system on top of OMG's CORBA computational model, greatly
facilitating VTE's interoperability with ABS from Oracle/Olivetti Research
Laboratory, Cambridge, UK and, to a certain degree, with IBM's Aglets.
Keywords: CORBA, Java, ORL's ABS, Streaming
Audio and Video, IBM's Aglets
Introduction
The demand for audio and video on
the web seems to be growing, especially since the day when Java[1] mesmerised
web users by offering easy and straightforward executable applet code download
and execution across many different platforms and environments. The interest
sparked by the promise of much livelier web pages and by the ubiquitous
proclamation of Java as the web and multimedia oriented language somewhat
diminished when it turned out that the only multimedia support, for the
time being, was cartoon animation and that real video libraries were scheduled
for 1997. Starting with SunSoft's alpha3 version of Java, a group at UMM
embarked upon integrating video and audio capabilities into Java software.
Implementing a real time networked video system
is not a trivial task. An implementation often requires addressing many
levels of software development, ranging from low level code (like
code getting the images off the video devices) up to higher software levels
(like the APIs needed for initiating connections and,
in general, controlling the system's execution flow). To tackle the problem
of delivering real time video to the web efficiently, one must be able
to provide viable solutions in all of these software areas.
The appearance of Sun's Java seems to be an attempt
to answer the need for a common software
platform that makes high level code portable. However, in its current state
the technology offers only very modest programming facilities that could serve
as a vehicle for real motion video applications of practical use and thus
effectively bridge the accessibility gap between web users. The aim of the VideoTalk
Environment development team was the creation of a flexible system,
bringing audio and video accessibility to the user within a corporate LAN
or, as work progresses, to a wider audience, in each case from within an
ordinary Java enabled web browser.
Design Problems
During the development of a Java video system
today, one encounters three main problem areas that in reality shape the overall
structure of a Java multimedia system. The first of them is the difficulty
of seamless integration of video handling code with Java applet software,
the second is the selection of the framework used to provide all
the facilities needed to harness the complexities of the system (like
initiating/closing connections and handling communication errors), and the
third is the choice of transport protocol used to transfer data
between video sources and sinks.
We might imagine that ideally, a user of the
system should use an ordinary Java enabled browser and this would be all
that he or she should need in a Java network video application. An applet
loaded into Netscape from a remote host would get the video image of the
user, thanks to standard Java classes in the browser (supported by native
libraries), and send the user's video to a destination (obviously with the
user's permission to do so). Unfortunately, at this moment, the Java packages
do not provide anything that would enable getting images off cameras and
sound off microphones. The only way to deal with it now is to use native
methods and hook them up to Java code. This approach, however, introduces
a substantial flaw into such Java software: classes containing native methods
cannot be loaded by most popular classloaders in browsers due to strict
security restrictions. As a result, dynamically fetching multimedia capturing
and displaying code from a remote server, without support from
native software already installed with the browser, turns out to be unrealistic
with the applet concept. The reason is that allowing such code to be downloaded
into a web browser in the form of a lightweight applet might be dangerous
(not to say non-portable). Some code might try to mimic the VTE system and
execute native (unverified) code, possibly breaching security at
a host. Indeed, were this possible, the Java security model would be
seriously deficient.
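To make the native method binding concrete, here is a minimal sketch of what such a hook looks like on the Java side. The class name, the library name "vtecapture" and the grabFrame signature are assumptions made for illustration only; they are not taken from the VTE sources. The sketch also shows the practical consequence described above: on a host where the native library has not been installed, the Java code alone cannot capture anything.

```java
/**
 * Hypothetical sketch of a Java class bound to native capture code.
 * Names are illustrative assumptions, not VTE's actual classes.
 */
class NativeCaptureSketch {
    // Declared in Java, implemented in a platform specific C/C++ library.
    private static native byte[] grabFrame(int width, int height);

    /** Tries to load the (assumed) native library; false if it is not installed. */
    static boolean nativeLibraryAvailable() {
        try {
            System.loadLibrary("vtecapture"); // assumed library name
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Either the library is missing on this host or a security
            // manager (as in a browser) refuses to load native code.
            return false;
        }
    }

    /** Returns a captured frame, or null when no native support is present. */
    static byte[] captureOrNull(int width, int height) {
        if (!nativeLibraryAvailable()) {
            return null;
        }
        try {
            return grabFrame(width, height);
        } catch (UnsatisfiedLinkError e) {
            return null; // library loaded but does not export the method
        }
    }

    public static void main(String[] args) {
        byte[] frame = captureOrNull(320, 240);
        System.out.println(frame == null
                ? "no native capture library installed on this host"
                : "captured " + frame.length + " bytes");
    }
}
```

A class like this must be installed locally together with its library; it is exactly the kind of code a browser classloader will refuse to fetch from a remote server.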
The limitation that the security
provisions inevitably enforce provides the rationale behind introducing
at least a two layer architecture into a web based Java video system. The
sensible way out for video system implementors is providing native libraries
on each host and creating a server that would provide video capabilities
for all locally started applets and applications. The server could be a
Java application doing the real work in native methods or it could simply
be an application written entirely in C++, which should be the better implementation
choice for performance reasons. This necessity to separate
native (C and C++) code from applet code coincides with the benefit
that is brought to an application by a very clear boundary between
graphical user interface objects and functional objects. The VTE system follows
the observation that precise separation between all software components
(i.e. not just GUIs and the back end) renders the architecture even clearer.
A natural consequence of this is employing OMG's CORBA [2] IDL definition
language to sketch out all the services offered by system components. With
this approach, a CORBA audio and video server makes up for
the services missing from the native code that is shipped with
browser libraries and is accessible to applets through the Java (Netscape)
Native Interface[3].
An Object Request Broker also seems to provide
a splendid solution as a framework for coordinating and managing the objects
interacting in the VTE video system, thus addressing the second problem
of the multimedia system design which is the need for managing the system
components.
The following sections depict the implementation
of most of the objects in a general way, presenting their capabilities
and limitations.
High Level Overview of the System
Implementation obstacles imposed on developers
by the integration of video handling code into Java applets are reflected
by the layered structure of the VTE system. Two of the layers (the Browser Layer
and the ConnectionRequester Layer) can be thought of as entirely Java layers.
The AudioVideoSplitController layer can be seen as a native code layer (written,
for example, in C++).
The VTE system utilises Visigenic's VisiBroker
for Java[5] ORB to manage its components in a network. A sensible alternative
might be basing the application on Iona's OrbixWeb[6] CORBA 2.0
implementation. Unfortunately, at the time the team set out to implement
the system, OrbixWeb didn't support server side mapping for Java. There
is, however, no reason why some of the components in the VTE system couldn't
be built on top of OrbixWeb and others on top of VisiBroker while
remaining fully interoperable, thanks to the CORBA 2.0 IIOP protocol.
As a consequence of the chosen structure,
every audio and video device server (AudioVideoSplitController) in VTE
must be installed on a host before that host can be used as a multimedia access point.
Right now, there seems to be no more elegant solution than to provide
video managing facilities to clients through a neatly designed "native"
CORBA server.
The picture presenting the overview of the
VTE system reveals that there is a ConnectionRequester server object in
the middle layer, between the separated AudioVideoSplitController and the
Java applet code executing in the browser. It performs the extremely important
function of coordinating, validating, registering and possibly authenticating
communication between applets and AudioVideoSplitController. It is also
the server that accepts or rejects connection requests and decides whether they
should be handed over to AudioVideoSplitController or simply discarded.
The Localization Site is also included in the middle layer. It provides location
information about the users of the system.
A High Level Outlook on the Layered Structure of the VTE System
The next horizontal layer in the VTE system
is the web based user interface layer, acting mainly as a client
of the layer containing ConnectionRequester and the Localization Site. The
applet GUIs communicate with that layer through CORBA mechanisms.
All of them can be executed either as applets or as standalone applications
and they all act as "stateless" code. Most of VTE's object
interfaces are contained in the system's main module, named VideoPackage.
Almost all components in the system interact in terms of operations defined
in this module and exchange data structures defined within it.
module VideoPackage {
    // some more code defining structs ...

    // Native audio/video device server installed on each host.
    interface AudioVideoSplitController {
        OpenInfoClass open_av_device();
        CloseInfoClass close_av_device();
        oneway void transmit(in TransmitInfoClass bic);
        oneway void receive(in ReceiveInfoClass ric);
        oneway void change_packet_size(in SizeChange sc);
        // ... less important code
    };

    // Middle layer server that queues and authorizes connection requests.
    interface ConnectionRequester {
        boolean request_connection_from(in ReceiveInfoClass ric,
                                        in TransmitInfoClass bic);
        boolean accept_connection_from(in ReceiveInfoClass ric,
                                       in TransmitInfoClass bic);
        oneway void change_transmit_param(in TransmitInfoClass bic);
        oneway void change_receive_param(in ReceiveInfoClass ric);
        boolean allow_connection(in string username);
        boolean deny_connection(in string username);
        boolean remove_queued_connection_request(in string username);
        boolean register_listener(in string username, in string ior);
        boolean unregister_listener(in string username, in string ior);
    };

    // Callback object served from within the client applet.
    interface ConnectionEstablishCallback {
        boolean request_connection_from(in ReceiveInfoClass ric,
                                        in TransmitInfoClass bic);
    };
    // some more code
};
Cooperation of the Web Oriented GUI and Native
Layers Using the ConnectionRequester Layer
The structuring of the system keeps the implementation
of user interfaces and functional code separate. It is worth taking a
glimpse at the interaction between VTE's layers and noting the central role
of the ConnectionRequester layer with its persistent functionality. ConnectionRequester
acts as an entity relaying the requested attributes of a video connection,
set up by user GUIs in applets (like "I want to open a connection to my
friend on host tulip (149.156.97.25) with JPEG video and linear PCM 44.1 kHz
sound"). It receives the parameters in CORBA calls from a client applet
GUI and, after authorizing access, passes them up to AudioVideoSplitController
for further execution. The piece of applet Java code loaded into browsers
performs a vital role by passing commands from the user, but it is also needed
to register a user as a listener of a ConnectionRequester and is required
to unregister the user when his or her browser exits. Client applets register by
handing the IOR of a ConnectionEstablishCallback object (which executes within
them as a lightweight thread) to the local ConnectionRequester.
Not only do they act as clients of ConnectionRequester when requesting
a connection to someone, but they also provide servers within browsers that
are called by ConnectionRequester when an incoming connection request
is put in the ConnectionRequester's queue. This way GUIs also serve to
"wake up" the users of VTE. Obviously, this requires that the proper applet
web page is being viewed, but a nice trick with a constantly executing applet
thread is also sometimes possible, notifying a user of a requested connection
even if he or she changes web pages.
Let us imagine that user A chooses to talk
to user B and follow a scenario without getting bogged down in unnecessary details.
User A's applet calls request_connection_from on its local ConnectionRequester
server object, passing data structures filled with information describing the
needed connection. A's local ConnectionRequester server object invokes
accept_connection_from on the ConnectionRequester of B's host, thus
putting a connection request in its queue. Now, B's ConnectionRequester
calls request_connection_from on the local ConnectionEstablishCallback
server object in user B's applet. Subsequently, ConnectionEstablishCallback
notifies user B of the need to invoke either allow_connection
or deny_connection on the local ConnectionRequester. Depending on
the decision, the ConnectionRequester either triggers the operations open_av_device,
transmit and receive (which send the outbound audio/video stream
and accept the inbound one) on the AudioVideoSplitController
server object, or simply removes
the request from the queue by calling remove_queued_connection_request
on the local ConnectionRequester server object. When a connection is accepted,
the AudioVideoSplitControllers exchange audio/video information. ConnectionRequester
is designed as a fully multithreaded Java server application and also greatly
benefits from the underlying ORB's threaded environment in servicing clients'
requests and maintaining the behavioral consistency of the VTE system.
This becomes especially evident when user A accesses VTE simultaneously from
all 7 browsers he or she has started on the host and in each browser asks
for the same connection while passing contradictory parameters for
it. As opposed to the "stateless" applet GUI code, ConnectionRequester
acts as a persistent "storage server", keeping track of video sessions awaiting
connection and of the sessions currently active.
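The scenario above can be sketched as a small in-process Java model. This is only an illustration under stated assumptions: there is no ORB involved, the IDL operation names are rendered in Java camel case, ConnectionParams stands in for the ReceiveInfoClass/TransmitInfoClass pair, and the rule that a repeated request from the same user replaces the queued one is a guess at one possible way of keeping contradictory requests consistent, not a description of VTE's actual policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-process sketch of the connection handshake; all names are illustrative. */
class HandshakeSketch {
    /** Assumed stand-in for the ReceiveInfoClass/TransmitInfoClass pair. */
    static final class ConnectionParams {
        final String fromUser;
        final String toUser;
        final String videoFormat;
        ConnectionParams(String fromUser, String toUser, String videoFormat) {
            this.fromUser = fromUser;
            this.toUser = toUser;
            this.videoFormat = videoFormat;
        }
    }

    /** Mirrors ConnectionEstablishCallback, which lives inside the applet. */
    interface Callback {
        boolean requestConnectionFrom(ConnectionParams p);
    }

    /** Stand-in for AudioVideoSplitController: only records opened streams. */
    static final class SplitController {
        final List<String> log = new ArrayList<>();
        void openAndStream(ConnectionParams p) {
            log.add("stream " + p.fromUser + "<->" + p.toUser + " " + p.videoFormat);
        }
    }

    /** Persistent middle layer: listeners plus a queue of pending requests. */
    static final class ConnectionRequester {
        private final Map<String, Callback> listeners = new LinkedHashMap<>();
        private final Map<String, ConnectionParams> queue = new LinkedHashMap<>();
        private final SplitController controller;

        ConnectionRequester(SplitController controller) {
            this.controller = controller;
        }

        synchronized boolean registerListener(String user, Callback cb) {
            return listeners.put(user, cb) == null;
        }

        synchronized boolean unregisterListener(String user) {
            return listeners.remove(user) != null;
        }

        /** Called by the local applet; forwards to the callee's requester. */
        boolean requestConnectionFrom(ConnectionParams p, ConnectionRequester remote) {
            return remote.acceptConnectionFrom(p);
        }

        /** Called by the caller's requester; queues and wakes up the callee. */
        synchronized boolean acceptConnectionFrom(ConnectionParams p) {
            Callback cb = listeners.get(p.toUser);
            if (cb == null) {
                return false; // callee has no registered applet
            }
            queue.put(p.fromUser, p); // assumption: the latest request wins
            return cb.requestConnectionFrom(p);
        }

        synchronized boolean allowConnection(String fromUser) {
            ConnectionParams p = queue.remove(fromUser);
            if (p == null) {
                return false;
            }
            controller.openAndStream(p);
            return true;
        }

        synchronized boolean denyConnection(String fromUser) {
            return queue.remove(fromUser) != null;
        }

        synchronized int queuedRequests() {
            return queue.size();
        }
    }

    public static void main(String[] args) {
        SplitController avOnB = new SplitController();
        ConnectionRequester reqA = new ConnectionRequester(new SplitController());
        ConnectionRequester reqB = new ConnectionRequester(avOnB);
        reqB.registerListener("B", p -> {
            System.out.println("B notified of request from " + p.fromUser);
            return true;
        });
        reqA.requestConnectionFrom(new ConnectionParams("A", "B", "JPEG"), reqB);
        reqB.allowConnection("A");
        System.out.println(avOnB.log);
    }
}
```

The point of the sketch is that the applet side holds no state at all: everything about pending and accepted sessions lives in the requester, which is why it can arbitrate between several browsers of the same user.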
Localization Facilities Incorporated in
VTE and Applet GUI
Without doubt, video accessibility is only part
of the real accessibility challenge. The other part of the task that a system
must be able to handle is easily finding the person one is looking for.
Java, with its mobile code facilities and dynamic code loading, also seems to
provide an excellent mechanism to tackle this problem. This especially
holds true when we take into account that the VTE system is designed
to work on the Java/CORBA driven platform with IBM's Aglets[8] and the Active
Badge System (ABS) from Olivetti/Oracle Research Laboratory[7], Cambridge,
UK. In VTE, the location of a collocutor can be established in
two distinct ways. The first, and the most rudimentary, is giving the email
address of the person one wants to talk to. Obviously, this approach is
extremely inefficient, as one may not have valid information about the person's
location (especially which workstation or computer the person is currently working
at). This option is provided mainly for those who long for a modern
replacement for the ubiquitous talk utility on Unix systems.
The other, much more sophisticated way of
locating users is ORL's ABS system coupled with IBM's aglets: mobile
pieces of Java code that can find the requested information for us. The Active Badge
System equips its users with small badges that can be located by groups
of sensors dispersed throughout a building. The ABS sensors and software
can thus track people's location within the building. Obviously, the ABS system is
usable only in places where sensors are installed.
VTE is being extended to integrate even more tightly
with the ABS system, not only to establish the physical location
of potential collocutors, but also to collect some information about their current
activities (like discerning whether John is moving fast between rooms) from
aglets roaming a set of special servers. Each of the servers is responsible
for managing a collection of sensors that intercept signals from the badges
carried by every member of staff in an office or a building. The Localization
Site acts as a server that dispatches lightweight aglets in search of
information and controls the retrieval of the information
gleaned by the mobile code. To obtain this information, a user interface
client applet consults it using CORBA calls and builds a tree depicting
the current location of the users in the local domain. Right now, the Java
user interface provides clickable names of hosts at which one may request a
video connection to a given person.

Java VTE User Interface with Custom
Crafted Java GUI Components
The graphical user interfaces used by the VTE
system provide the mechanism for the interaction needed between the web
based and the middle system layer. They all act as "stateless" entities.
Java turns out to be an excellent tool and provides facilities that enable
creating packages with specialised components like knobs, trees and custom
drawn lists, thus making interaction with the system more convenient than
alternative solutions that might use cumbersome HTML forms. Also,
VTE's graphical user interface components are designed to conform to the requirements
stipulated by the Java Beans[4] specification.
Audio and Video Handling Native Layer
In essence, AudioVideoSplitController is the
only part of the system responsible for creating multimedia streams, directing
(splitting) them to recipients and retrieving the incoming streams. One
can ask whether this can be handled efficiently by interpreted Java code, even when
accelerated by Just In Time compilers. Based on the experience gathered
during the implementation of the VTE system, the answer is that it certainly
can, as almost all audio/video code is executed within compiled native
C video libraries anyway (like XIL on Solaris, for example). However, there is
little reason to implement it in Java, as the necessity to access video
devices will always force it to delegate the real work to native code. A native
implementation might also seem inevitable for performance reasons, but in theory there
are no obstacles to using JIT accelerated Java code at this layer too, with the
fundamental work done in native methods.
The VTE's AudioVideoSplitController implementation
is capable of handling several different types of audio and video formats
unavailable in standard Java libraries. In its current state it can use
linear PCM, u-law, and A-law audio encoding together with JPEG or CellB
video compression and decompression. Unfortunately, the VTE system currently does not
support MPEG-1 compression, although MPEG-1 decompression is supported.
Video displaying is handled by native code, while the applets provide a GUI
to the localization facilities and relay user requests to management objects
in the middle VTE layer. The GUI also runs easily in Java enabled browsers.
Currently the VTE system transfers multimedia data
over the connectionless UDP/IP protocol, which acts as "plumbing"
beside the rest of the system, which is structured on top of the CORBA
protocol. One might also suggest basing the multimedia data transport
entirely on CORBA with its underlying IIOP protocol. There is, however, a
simple difference between a real motion video application and a traditional
networked one that must be noticed: building video applications
on top of an inherently request/reply computational model like CORBA inevitably
requires implementing audio and video delivery outside CORBA requests.
The ORB based model suits perfectly well most cases in which searching databases,
managing objects (as in this web application's case) and performing tasks
on a request/reply basis is needed, but modern video applications require
much more than just this kind of behavior. They need streaming capabilities
and call for sensible Quality of Service in the delivery of data.
It is a well-known fact that multimedia systems require substantial
support from the underlying transport and signalling protocols to provide
QoS. The IIOP protocol doesn't currently equip CORBA with any streaming
capabilities. Considering that, one must openly admit that, without introducing
proprietary extensions to CORBA, one cannot do without the so-called "plumbing
approach", i.e. handling audio and video transmission on a different network
protocol layer.
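As a rough illustration of the "plumbing" side, the sketch below pushes one media packet through a UDP socket using only the standard java.net classes. It is not VTE's transport code: the loopback address, the two second timeout, the buffer size and the method name are assumptions chosen so that the example runs on a single machine. In a split design of this kind, the port number and peer address would be negotiated through the CORBA control calls, while the media bytes travel outside the ORB.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

/** Sketch of sending media data over UDP, outside any CORBA request. */
class UdpPlumbingSketch {
    /** Sends one packet over loopback UDP and returns what the receiver got. */
    static byte[] sendOverLoopback(byte[] mediaPacket) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so never wait forever.
            receiver.setSoTimeout(2000);
            sender.send(new DatagramPacket(mediaPacket, mediaPacket.length,
                    loopback, receiver.getLocalPort()));
            byte[] buffer = new byte[2048];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming);
            return Arrays.copyOf(incoming.getData(), incoming.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        byte[] received = sendOverLoopback(payload);
        System.out.println("received " + received.length + " bytes");
    }
}
```

Note that nothing here blocks on a reply from the peer; the sender fires the datagram and moves on, which is the property a request/reply model cannot offer.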
There is currently no standard protocol that we
know of on the internet that gives multimedia programmers the ability
to control and negotiate the quality of service of a given stream of multimedia
data. Even the ATM networks existing right now do not let
programmers use the signalling benefits of a QoS oriented network
unless they delve deep into native ATM mode and thus step outside code
usable on the internet.
Due to the lack of real multimedia oriented standard
protocols, VTE's team implemented a test version in which servers exchanged
audio and video using oneway CORBA operations. Unfortunately, some CORBA
implementations seem to execute some consecutive oneway operations in
LIFO fashion. This resulted in the need to build yet another kind of "sequencing
protocol" and introduced considerable jitter into the video stream, confirming
the notion that the CORBA request/reply model is not appropriate when
streaming of time dependent data is required. The conclusion that IIOP is
unsuitable for transferring video data, drawn from VTE's test implementation,
is further supported by the fact that oneway CORBA calls may
not be asynchronous in reality, since the CORBA specification
permits the ORB to block even while sending a oneway call. Such behavior
is highly undesirable when high speed video is transmitted and displayed.
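The text does not describe the "sequencing protocol" that the test version needed, so the following is only a guess at its simplest possible form: each packet carries a sequence number, and the receiver holds back packets until every earlier one has arrived. The class and method names are hypothetical. The sketch also makes the source of the jitter visible: when packets arrive in reverse order, nothing can be played until the first one finally shows up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical receiver side resequencing buffer keyed by sequence number. */
class ResequencerSketch {
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();
    private long nextExpected = 0;

    /**
     * Accepts one packet and returns every payload that can now be
     * delivered in order (possibly none).
     */
    List<byte[]> offer(long seq, byte[] payload) {
        List<byte[]> ready = new ArrayList<>();
        if (seq < nextExpected) {
            return ready; // duplicate or already delivered; drop it
        }
        pending.put(seq, payload);
        // Release the longest run of consecutive packets starting at nextExpected.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    /** Number of packets held back while waiting for an earlier one. */
    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ResequencerSketch r = new ResequencerSketch();
        // Packets delivered in LIFO order: 2, 1, 0.
        System.out.println("after 2: " + r.offer(2, new byte[] {2}).size() + " deliverable");
        System.out.println("after 1: " + r.offer(1, new byte[] {1}).size() + " deliverable");
        System.out.println("after 0: " + r.offer(0, new byte[] {0}).size() + " deliverable");
    }
}
```

A real time receiver would additionally need a deadline after which a missing packet is skipped rather than awaited; that policy is left out here.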
There are generally two approaches to providing
QoS in multimedia applications: reservation of resources and adaptation
to operating under existing resources. Obviously, UDP/IP as used by VTE
provides no QoS, let alone any sensible "multimedia oriented" approach
with reservation of resources, but it definitely doesn't lock us out of
the internet the way native ATM network programming does. Hence the VTE designers'
decision to at least provide an illusion of real QoS by enabling the
user to adapt the system to the existing resources at run time, when he
or she gets annoyed by frequent frame dropping or unbearable sound jitter.
In this way one can lower the number of colors displayed, switch to a smaller
picture size and decrease the quality of the sound transmitted. The VTE system's
AudioVideoSplitController also tries to provide sensible "QoS" (which of
course is only a substitute for the real thing) by maintaining synchronization between
audio and video: it inserts voice and images together into packets about 1000
bytes long that are assembled at the receiving host's AudioVideoSplitController
into a bigger chunk of data. The chunk of data is immediately decompressed
and displayed on the screen (and the sound played) using native function
calls. This approach virtually eliminates the often terrible synchronization
problem found in many multimedia systems that send audio and video separately.
Such systems often work excellently on LANs, but scaling them to larger networks
makes it almost impossible to see the lips moving in sync with the sound.
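The packing scheme can be illustrated with a short Java sketch. The wire layout used here is an assumption, since the paper does not give one: a chunk is a 4 byte audio length followed by the audio block and the video frame, and each packet carries an 8 byte header (chunk id, packet index, packet count) in front of at most 992 bytes of chunk data, giving packets of at most 1000 bytes. Because audio and video travel in the same chunk, they cannot drift apart in transit.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of joint audio/video packetizing; the header layout is assumed. */
class AvPacketSketch {
    static final int PACKET_SIZE = 1000; // "about 1000 bytes" in the text
    static final int HEADER_SIZE = 8;    // assumed: chunkId(4) index(2) total(2)

    /** Joins one audio block and one video frame, then cuts them into packets. */
    static List<byte[]> packetize(int chunkId, byte[] audio, byte[] video) {
        ByteBuffer chunk = ByteBuffer.allocate(4 + audio.length + video.length);
        chunk.putInt(audio.length).put(audio).put(video);
        byte[] data = chunk.array();
        int maxPayload = PACKET_SIZE - HEADER_SIZE;
        int total = (data.length + maxPayload - 1) / maxPayload;
        List<byte[]> packets = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int from = i * maxPayload;
            int len = Math.min(maxPayload, data.length - from);
            ByteBuffer p = ByteBuffer.allocate(HEADER_SIZE + len);
            p.putInt(chunkId).putShort((short) i).putShort((short) total);
            p.put(data, from, len);
            packets.add(p.array());
        }
        return packets;
    }

    /**
     * Rebuilds a chunk from a complete set of its packets (any order, no
     * duplicates) and returns {audio, video}.
     */
    static byte[][] reassemble(List<byte[]> packets) {
        int total = ByteBuffer.wrap(packets.get(0)).getShort(6) & 0xFFFF;
        if (packets.size() != total) {
            throw new IllegalArgumentException("incomplete chunk");
        }
        byte[][] parts = new byte[total][];
        int size = 0;
        for (byte[] raw : packets) {
            int index = ByteBuffer.wrap(raw).getShort(4) & 0xFFFF;
            parts[index] = Arrays.copyOfRange(raw, HEADER_SIZE, raw.length);
            size += parts[index].length;
        }
        ByteBuffer chunk = ByteBuffer.allocate(size);
        for (byte[] part : parts) {
            chunk.put(part);
        }
        chunk.flip();
        byte[] audio = new byte[chunk.getInt()];
        chunk.get(audio);
        byte[] video = new byte[chunk.remaining()];
        chunk.get(video);
        return new byte[][] {audio, video};
    }

    public static void main(String[] args) {
        List<byte[]> packets = packetize(1, new byte[400], new byte[3000]);
        byte[][] av = reassemble(packets);
        System.out.println(packets.size() + " packets, audio " + av[0].length
                + " bytes, video " + av[1].length + " bytes");
    }
}
```

A production receiver would also have to cope with lost packets, for example by discarding an incomplete chunk once packets of the next chunk id start arriving.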
The AudioVideoSplitController interface also includes
operations for changing the size of the packets being sent. Although
increasing the packet size vastly improves network performance, it may
introduce substantial jitter into the multimedia data flow. Obviously, all
this also depends on the type of data being sent and how fast the sources
generate it. For example, when a VTE connection is configured for 16-bit
stereo, linear PCM audio sampled at 44.1 kHz, it requires approximately
10MB of data per 60 second time frame. In this case the size of the packet
could be increased with almost no sound jitter perceived by the user.
However, 8-bit monaural, u-law encoded audio at 8 kHz sampling would be
unbearable when sent in 20KB data chunks: the delay needed to fill the audio
buffers (after which the data is sent) would be noticeable and annoying.
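The arithmetic behind these two examples can be checked with a few lines of Java. The method names are hypothetical and 20KB is taken to mean 20 * 1024 bytes, which is an assumption. Under it, 16-bit stereo PCM at 44.1 kHz produces 176,400 bytes per second, or 10,584,000 bytes per minute, and fills a 20KB buffer in roughly 0.12 seconds, whereas 8-bit mono u-law at 8 kHz produces 8,000 bytes per second and needs 2.56 seconds to fill the same buffer.

```java
/** Worked arithmetic for audio data rates and buffer fill delays. */
class AudioBudgetSketch {
    /** Uncompressed audio data rate in bytes per second. */
    static long bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * bitsPerSample / 8 * channels;
    }

    /** Time the source needs to produce one chunk, i.e. the added delay. */
    static double fillDelaySeconds(int chunkBytes, long bytesPerSecond) {
        return (double) chunkBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        int chunk = 20 * 1024; // assumed meaning of "20KB"

        long cdQuality = bytesPerSecond(44_100, 16, 2);
        long phoneQuality = bytesPerSecond(8_000, 8, 1);

        System.out.println("44.1 kHz stereo PCM: " + cdQuality + " B/s, "
                + (cdQuality * 60) + " B per minute, chunk delay "
                + fillDelaySeconds(chunk, cdQuality) + " s");
        System.out.println("8 kHz mono u-law:    " + phoneQuality + " B/s, "
                + (phoneQuality * 60) + " B per minute, chunk delay "
                + fillDelaySeconds(chunk, phoneQuality) + " s");
    }
}
```

The same chunk size therefore adds a barely perceptible delay to the high rate stream and a delay of several seconds to the low rate one, which is why the packet size has to be tuned per connection.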
There is, however, some hope that IPv6
will provide video application programmers with a standard: the RSVP reservation
protocol. It is also worth noting that RSVP messages may
be carried in UDP datagrams. One might hope that all this could provide
solid ground for delivering QoS by reservation on the internet in the future.
The IDL defined layered structure should make changes to the VTE system much
easier when such an opportunity appears.
Concluding Remarks
Definitely a growing number of applications
will require more and more networked multimedia capabilities. Java seems
to pave the way in an attempt to create a neutral web based software platform
upon which portable software solutions can be based. However, before one
can free oneself from writing non-portable, proprietary code,
many Java libraries will have to become available and the set of native
methods delivered with Java classes will have to become much more extensive.
There is also definitely a lot of standardization to be done in the area
of networking itself with regard to the multimedia requirements imposed by
audio and video applications, as the UDP solution on top of IPv4, commonly
met in the industry, exhibits substantial flaws in this area. The VTE system
clearly illustrates the observation that, right now, there is no way to build
a Java video system without proprietary native code. The only remedy is
to encapsulate it, preferably in IDL interfaces, thus making the rest
of the application independent of its implementation, so that it can
be replaced with better solutions in the days to come. Until better
solutions become a reality, one will also have to rely on software components
that cannot simply be installed on the fly.
Java is an extremely promising technology
but there is still a lot to be added to it before one can immerse oneself
in writing totally non-proprietary web based code with blissful ignorance
of C.
References:
[1] The Java Language Specification, James Gosling, Bill Joy, Guy Steele,
Addison Wesley, ISBN 0-201-63451-1, First printing, August 1996
[2] The Common Object Request Broker: Architecture and Specification,
Object Management Group, 2.0 edition, July 1995
[3] The Java Native Interface Specification, Sheng Liang,
Sun Microsystems, Inc., 2550 Garcia Avenue,
Mountain View, CA 94043-1100, USA
[4] The Java Beans Specification,
Sun Microsystems, Inc., 2550 Garcia Avenue,
Mountain View, CA 94043-1100, USA
[5] Visigenic's VisiBroker for Java Reference Manuals,
Visigenic Software, Inc., 951 Mariner's Island
Blvd., Suite 120, San Mateo, CA 94404, USA
[6] OrbixWeb for Java,
Iona Technologies Ltd., The Iona Building,
8-10 Pembroke St. 2, Dublin, Ireland
[7] The Active Badge Location System, Roy Want, Andy Hopper, Veronica Falcao,
Jonathon Gibbons, Olivetti Research Ltd, 24a Trumpington Street, Cambridge
CB2 1QA, England
[8] Programming Mobile Agents in Java, Danny B. Lange and Daniel T.
Chang, IBM Corporation, September 9, 1996