The Data-Centric Modus Operandi
First Published Tuesday, 17th August 2010 02:05 pm from Real-Time Innovations (RTI) : Rick Warren
The opinions expressed by this blogger and those providing comments are theirs alone and do not reflect the opinion of Automated Trader or any employee thereof. Automated Trader is not responsible for the accuracy of any of the information supplied by this article.
Data distribution is not messaging, and it
is not eventing. However, data
distribution subsumes messaging and eventing as use
cases to a large extent
(http://blogs.rti.com/2009/06/03/thinking-differently-about-messaging/),
and as a result it often gets lumped
into those categories.
Data distribution is about observing a changing world. A
system whose communication is based on this paradigm tends to
become data-centric: it becomes more
concerned with modeling the first-class concepts of its business
domain and less concerned with managing second-class
concepts like queues and messages. Along the way, it enjoys the
benefits of decreased coupling and improved reliability,
scalability, and performance.
Data Distribution and Its
Messaging is an evolution of the remote
method invocation (RMI) paradigm - an attempt to make
that paradigm less coupled and more scalable by making it
asynchronous. A message says "I tell you to do
this." When compared with RMI, "I"
and "you" are more abstract, both in identity
and multiplicity, and the request can be queued for processing at
a later time or by another party without making the sender wait.
These are improvements, but the interaction remains coupled,
because the roles of "I" and
"you" (often in the guises of
"client" and "server" or
the trendier "service consumer" and
"service provider"), as well as the intention
of what action should be performed, are still very much in
Eventing, like data
distribution, is preoccupied with changes to the world. An event
says "I changed in this way." It reduces
coupling by entirely removing both the recipient of that
information and any notion of intention from your business logic
and your mental model; who might receive an event, and what they
might choose to do as a result, are not the business of the event
source. But state management remains a problem, because in order
to understand the change that occurred, all recipients must have
an up-to-date understanding of the state of the world prior to
the latest event - "the price went up by a
dollar" doesn't do me any good if I
don't know what the price was before. This temporal
coupling means that every recipient must process every event in
order, whether those events are interesting or not, just in case
the interpretation of a subsequent interesting event should
happen to require the state established by a previous
processing and state management are complex and expensive. As a
mitigation, they are frequently factored out of the applications
that need the data and into state-management
"servers" that "clients"
must query using a message-centric or even RMI-based approach
- a huge regression in engineering practice! The system
becomes complicated by the presence of multiple interacting
communication paradigms, and the servers (which serve no business
role) introduce performance and fault-tolerance choke points.
A data-centric architecture eliminates
these problems by simplifying the interactions. A data sample
says simply "the world is like this." It
thereby eliminates coupling not only in terms of source,
recipients, and their intentions, but also in terms of time.
There's no longer any need for recipients to process or
store information they don't care about, because
samples don't implicitly encompass previous samples.
Therefore it becomes perfectly reasonable for one observer to
examine the state of the world every second, or every minute, or
every hour - and for another to observe every single
intermediate state, even if those states change from one to the
other many times a second.
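
The difference between "I changed in this way" and "the world is like this" can be sketched in a few lines. This is only an illustration under stated assumptions: the `PriceChanged` and `PriceSample` classes, the helper functions, and the price series are all made up for the example and are not part of DDS or any other API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceChanged:
    """Event: 'I changed in this way' (a delta relative to prior state)."""
    delta: float


@dataclass(frozen=True)
class PriceSample:
    """Data sample: 'the world is like this' (the complete current state)."""
    price: float


def price_from_events(initial, events):
    """An event consumer needs the starting state and every delta, in order."""
    price = initial
    for event in events:
        price += event.delta
    return price


def price_from_samples(samples, every_nth):
    """A sample consumer may look at any subset; the last one seen is the state."""
    seen = samples[every_nth - 1::every_nth]
    return seen[-1].price if seen else None


# Hypothetical price history, used to derive both representations.
prices = [10.0, 11.0, 10.5, 12.0, 13.0, 13.5]
events = [PriceChanged(b - a) for a, b in zip(prices, prices[1:])]
samples = [PriceSample(p) for p in prices[1:]]

full = price_from_events(prices[0], events)          # processed every event
skipped = price_from_events(prices[0], events[::2])  # ignored "boring" events
sparse = price_from_samples(samples, every_nth=5)    # looked only once
```

Here `full` and `sparse` agree on the current price, while `skipped` drifts away from it: the event consumer that ignores some events ends up with the wrong state, whereas the sample consumer can look as rarely as it likes.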
Modeling the World with
A set of DDS entities, and
the data they distribute and manage, define a view into this world:
- A "domain" defines the boundaries
of the world, the set of information that a collaborating group
of applications might find interesting. A "domain
participant" defines the presence of some application
in that world; it is the data-centric analogue to what is
frequently known as a "connection" in the
- A "type" is a structural description of some
part of the world - for example, an Antelope
is brown in color and has four legs and two horns; a Ferrari is
red in color and has four wheels and two seats. A type has a
formal definition, usually (though not always) in a declarative
language like XSD or OMG IDL, and it implies a corresponding
definition in the target programming language.
- A "quality-of-service" (QoS)
definition specifies the fidelity with which one or more parties
can describe the world. For example, will the
description contain every state the world passes through or only
a subset? Will observers have access to new states of the world
only, or will they be able to see previous states as well? If the
latter, how far back will those previous states go?
- A "topic" defines some aspect or
subset of the world consisting of similar objects. As such, it
combines a type, which defines the structure of those objects,
with a QoS definition, which defines how they can be observed to
- An "instance"
defines a single object in the group defined by a topic. For
example, a topic may be used to distribute the positions of
airplanes as detected by a radar. Each plane would be an
instance. All radar tracks have the same structure (type) and are
updated in the same way (QoS). But they are also distinct from
one another: it matters whether the plane at a given location
happens to be American Airlines flight 123 or Delta flight
- A "data writer"
defines a source of information about a particular subset of the
world (topic). As such, it may override the QoS of its topic
- multiple parties may provide information about the
same part of the world but with different degrees of
- A "data
reader" defines an observer of a particular subset of
the world (topic). As such, it may also override the QoS of its
topic. Furthermore, it may be able, or may wish, to observe only
certain states of the world. For example, it may be interested
only in airplanes flying over a particular geographic area
or in stocks trading at over $20/share.
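
To make the relationships among these entities concrete, here is a toy in-process model. It is a sketch, not a real DDS API: the class names mirror the concepts above, but the constructors, the dictionary-based QoS, the `on_sample` method, and the flight identifier "XY789" are all assumptions invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Track:
    """The 'type': the structure shared by every radar track."""
    flight: str  # key field: identifies the instance (one airplane)
    lat: float
    lon: float


@dataclass
class Topic:
    """A topic combines a type with a QoS definition."""
    name: str
    data_type: type
    qos: dict = field(default_factory=dict)
    readers: list = field(default_factory=list)


class DataReader:
    """An observer of a topic, optionally restricted by a content filter."""

    def __init__(self, topic, qos=None, content_filter=None):
        self.topic = topic
        self.qos = {**topic.qos, **(qos or {})}  # reader may override topic QoS
        self.content_filter = content_filter or (lambda sample: True)
        self.instances = {}  # latest observed state, per instance key
        topic.readers.append(self)

    def on_sample(self, sample):
        if self.content_filter(sample):
            self.instances[sample.flight] = sample


class DataWriter:
    """A source of information about a topic."""

    def __init__(self, topic, qos=None):
        self.topic = topic
        self.qos = {**topic.qos, **(qos or {})}  # writer may override topic QoS

    def write(self, sample):
        if not isinstance(sample, self.topic.data_type):
            raise TypeError(f"{self.topic.name} carries {self.topic.data_type.__name__}")
        for reader in self.topic.readers:
            reader.on_sample(sample)


tracks = Topic("RadarTracks", Track, qos={"history_depth": 1})
everything = DataReader(tracks)
northern = DataReader(tracks, qos={"history_depth": 5},
                      content_filter=lambda t: t.lat > 40.0)
radar = DataWriter(tracks)

radar.write(Track("AA123", 37.6, -122.4))
radar.write(Track("XY789", 44.9, -93.2))   # made-up flight identifier
radar.write(Track("AA123", 38.1, -121.9))  # new state of an existing instance
```

After these writes, `everything` holds the latest state of both airplanes, while `northern` holds only the one whose latitude passes its filter. Neither reader knows anything about the writer, and the writer knows nothing about what the readers do with the data.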
By creating a data reader with a certain QoS definition,
an application makes an affirmative statement that it wishes to
observe a certain portion of the world under a certain set of
circumstances. For example, it may state that it is interested in
observing the most recent five states (samples) of the objects
(instances) in its part of the world (topic), but it
doesn't need to process changes more frequently than
once every second.
This statement is one of
interest only; it in no way requires the observer to actually
observe a certain set of samples in a certain way or within a
certain period of time. On the one hand, the observer may choose
to be notified asynchronously of every new sample and to respond
to it immediately. On the other, it may "go
away" to other business and return hours later; when it
does, it will find the most recent five samples of each instance,
occurring no more frequently than once every second, waiting for
it. In the meantime, DDS will have taken care of all of the
necessary data reception, filtering, and replacement in order to
make that happen.
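
The "five most recent samples, no more often than once a second" behavior can be sketched as a small per-instance cache. It is loosely modeled on the ideas behind the DDS HISTORY (keep last) and TIME_BASED_FILTER QoS policies, but the `ReaderCache` class and its methods are hypothetical and exist only for this illustration; timestamps are passed in explicitly so the example does not depend on a real clock.

```python
from collections import defaultdict, deque


class ReaderCache:
    """Toy stand-in for the cache a data reader maintains (not a real DDS API)."""

    def __init__(self, depth=5, min_separation=1.0):
        self.min_separation = min_separation
        # One bounded history per instance: old samples fall off automatically.
        self._history = defaultdict(lambda: deque(maxlen=depth))
        self._last_accepted = {}  # instance key -> timestamp of last kept sample

    def receive(self, key, timestamp, value):
        """Called for every incoming sample, whether or not anyone is looking."""
        last = self._last_accepted.get(key)
        if last is not None and timestamp - last < self.min_separation:
            return False  # arrived too soon after the last kept sample
        self._last_accepted[key] = timestamp
        self._history[key].append((timestamp, value))
        return True

    def read(self, key):
        """Called whenever the application chooses to look, possibly hours later."""
        return list(self._history.get(key, ()))


cache = ReaderCache(depth=5, min_separation=1.0)

# A writer updates one instance four times a second for ten seconds
# while the application is busy elsewhere.
for i in range(40):
    cache.receive("AA123", timestamp=i * 0.25, value=i)

waiting = cache.read("AA123")
kept_times = [t for t, _ in waiting]
```

When the application finally reads, `waiting` contains five samples spaced one second apart (the timestamps 5.0 through 9.0), even though forty were written. The filtering and replacement happened on the application's behalf, which is the division of labor the paragraph above describes.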
DDS's ability to
combine notification and lightweight caching - in
effect, to maintain an application's observed state of
the world on its behalf - is something no other
standards-based technology provides. Developers of data-centric
systems reap the benefits: higher
performance and scalability
(http://www.rti.com/resources/product-tour/performance-scalability.html),
greater tolerance to dynamic network conditions
(http://www.rti.com/resources/product-tour/system-architecture.html),
and ultimately better ROI and time-to-market.