Lattice of event subscriptions
Some collections could represent continuous queries, as opposed to raw data. It would then be possible for a service to subscribe to high-level collection changes.

Originating email:

Since we're discussing federation tools, here's the long form of some of my requirements. (These are low-level requirements; high-level requirements and a shared data model would be the subject of another post.)

 

I've been looking a bit at OpenWhisk, Apex, and GTor, and I'm a bit puzzled.

I suspect I'm missing something, because I believe that my concern is simple and common, yet I do not see it being addressed by a common tool.

So I have a set of data. Some of the data falls into natural collections (e.g. by datatype). Other subsets, combinations, and derived data from this dataset are less natural, but clearly useful. In database parlance, those correspond to queries, views, and materialized views, in increasing order of materialization. Materialization is a time/space/latency optimization concern; the basic point is that we have derived collections of interest over a dataset.

As a client interested in one of those derived datasets, I want at least two things: a snapshot of this derived data at any given time, and signals when something gets added to or removed from the dataset for any reason. [1_]
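To make the two requirements concrete, here is a minimal in-memory sketch, not a proposal for an actual API. The names `DerivedCollection`, `snapshot`, `subscribe`, and `on_base_change` are all hypothetical, and I assume the base dataset reports each change as a key plus a new value (or `None` for a deletion).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Tuple[str, str]  # (kind, key), where kind is "added" or "removed"


class DerivedCollection:
    """Hypothetical derived collection: base items that match a predicate."""

    def __init__(self, predicate: Callable[[Any], bool]) -> None:
        self._predicate = predicate
        self._members: Dict[str, Any] = {}
        self._subscribers: List[Callable[[Event], None]] = []

    def snapshot(self) -> Dict[str, Any]:
        """Return a copy of the current membership."""
        return dict(self._members)

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        """Register a callback for membership transitions."""
        self._subscribers.append(callback)

    def on_base_change(self, key: str, value: Optional[Any]) -> None:
        """Handle an upsert (value given) or a deletion (value is None)."""
        was_member = key in self._members
        is_member = value is not None and bool(self._predicate(value))
        if is_member:
            self._members[key] = value
        elif was_member:
            del self._members[key]
        # Signal only on transitions, whatever caused them.
        if is_member and not was_member:
            self._emit(("added", key))
        elif was_member and not is_member:
            self._emit(("removed", key))

    def _emit(self, event: Event) -> None:
        for callback in self._subscribers:
            callback(event)


open_items = DerivedCollection(lambda item: item["status"] == "open")
events: List[Event] = []
open_items.subscribe(events.append)
open_items.on_base_change("a", {"status": "open"})    # enters the collection
open_items.on_base_change("b", {"status": "closed"})  # never a member: no signal
open_items.on_base_change("a", {"status": "closed"})  # leaves without being deleted
```

Note that "a" leaves the derived collection because it was edited, not because it was deleted from the base data; that is the "for any reason" part.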

Now the database space is very good at giving me natural collections and snapshots of derived data, but does not think much in terms of signals (except at the natural collection level).

Unless I'm missing something, the streaming space thinks in terms of signals and a lattice of heterogeneous transducers for these signals, which is great, but not in terms of an underlying dataset.

I saw some systems that seem to understand both concerns, but they sit squarely within a given technology and do not federate. (E.g. Stanford Stream, whence CQL; C as in continuous, not Cassandra! Stream is dead, but GTor might still use CQL.)

I could see both concerns in Spark, which has both big-data federating queries and streaming operators; I'm trying to see how well those two work together.

One other space I'm looking at is streaming of linked data fragments.

Both of those impose a lot of technological choices, but seem to be a step in the right direction.

(I very strongly believe that our data resources should otherwise obey linked data principles, but that's a separate concern.)

Am I missing something about OpenWhisk / Apex / some other technology, so they fit my bill better than I understand?

 

What I had started to discuss with Alec is a simple pubsub mechanism between federation servers. It is very much old technology, but conventions built on a lowest common denominator make it easier for heterogeneous platforms to share.

Ideally, federation processes could subscribe not just to a natural collection, but to a derived dataset of another process; eventually even one that they can define themselves (after vetting/negotiation to make sure that they do not hog resources.)

So basically we could agree to give a REST+pubsub endpoint for at least basic types, and eventually for materialized queries. 

Each platform could then use a lattice of queries internally to optimize those continuous queries; and the platforms could form such a lattice between them at a higher level.
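As a rough illustration of what I mean by a lattice, here is a sketch in which a narrower query subscribes to a broader one instead of to the raw data, so it only re-evaluates items that the broader query already accepted. `QueryNode` and its fields are hypothetical names, and everything runs in one process; between platforms, the parent-to-child call would go over pubsub.

```python
from typing import Any, Callable, Dict, List, Optional


class QueryNode:
    """Hypothetical lattice node: a filter fed by a broader node."""

    def __init__(
        self,
        name: str,
        predicate: Callable[[Any], bool],
        parent: Optional["QueryNode"] = None,
    ) -> None:
        self.name = name
        self.predicate = predicate
        self.members: Dict[str, Any] = {}
        self.children: List["QueryNode"] = []
        self.evaluations = 0  # number of times the predicate ran
        if parent is not None:
            parent.children.append(self)

    def update(self, key: str, value: Optional[Any]) -> None:
        """Apply an upsert (value given) or a removal (None) from upstream."""
        was_member = key in self.members
        if value is not None:
            self.evaluations += 1
            is_member = bool(self.predicate(value))
        else:
            is_member = False
        if is_member:
            self.members[key] = value
        elif was_member:
            del self.members[key]
        else:
            return  # never a member here: narrower queries need not know
        downstream = value if is_member else None
        for child in self.children:
            child.update(key, downstream)


documents = QueryNode("documents", lambda item: item["type"] == "document")
drafts = QueryNode("drafts", lambda item: item["state"] == "draft", parent=documents)

documents.update("d1", {"type": "document", "state": "draft"})
documents.update("c1", {"type": "comment", "state": "draft"})   # stops at documents
documents.update("d1", {"type": "document", "state": "final"})  # leaves drafts
```

In this run the comment "c1" is never evaluated by `drafts`, which is the optimization the lattice buys: each level filters the traffic seen by the levels below it.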

Is that too much to ask from federation participants? Is there a standard or tooling for this that escaped me?

 

_1: Sometimes, I also want a history of those transitions; but let's say that this is out of scope for now. On the other hand, materialization of signals in the datastore can be necessary but should be optional. A simple motivating example: objects that have changed since I last consulted them. This is per user, hence could create a combinatorial explosion if materialized for lurkers, whereas the reading flags are only applied to active users. But active users subscribing to this information as an event stream makes a lot of sense even if it's not stored.
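A sketch of that motivating example, under simplifying assumptions of mine: objects carry an integer version, a user supplies their own last-read versions when subscribing, and per-user state lives only for the duration of the subscription. `ChangeFeed` and its methods are hypothetical names.

```python
from typing import Callable, Dict, Tuple

Callback = Callable[[str, int], None]


class ChangeFeed:
    """Hypothetical feed: 'changed since I last read it', for active users only."""

    def __init__(self) -> None:
        # user -> (last-read version per object, callback); nothing kept for lurkers
        self._active: Dict[str, Tuple[Dict[str, int], Callback]] = {}

    def subscribe(self, user: str, last_read: Dict[str, int], callback: Callback) -> None:
        self._active[user] = (dict(last_read), callback)

    def unsubscribe(self, user: str) -> None:
        self._active.pop(user, None)

    def mark_read(self, user: str, object_id: str, version: int) -> None:
        if user in self._active:
            self._active[user][0][object_id] = version

    def object_changed(self, object_id: str, version: int) -> None:
        """Signal each active user whose last-read version is older."""
        for last_read, callback in self._active.values():
            if version > last_read.get(object_id, 0):
                callback(object_id, version)


feed = ChangeFeed()
seen = []
feed.subscribe("alice", {"doc": 2}, lambda obj, ver: seen.append((obj, ver)))
feed.object_changed("doc", 2)   # already read: no signal
feed.object_changed("doc", 3)   # newer than last read: signal
feed.mark_read("alice", "doc", 3)
feed.unsubscribe("alice")
feed.object_changed("doc", 4)   # no active subscriber: nothing computed or stored
```

The cost here scales with the number of active subscribers, not with the number of users times the number of objects.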
