Categories
TechReport

Streaming System Benchmarks

Streaming systems are complex; apart from correct functionality (which might differ between implementations and vendors), many non-functional aspects can be benchmarked, such as memory consumption, latency, and throughput. Several benchmarks exist for RDF Stream Processing and are listed below. Older benchmarks from data stream management are not specific to RDF data but might be adapted; some of them are listed below as well.

RDF Stream Benchmarks

  • SRBench 1
  • CSRBench 2
  • LSBench 3

Other Stream or CEP Benchmarks

  • BEAST 4
  • NEXMark 5
  • Linear Road 6
  • BiCEP 7 – a benchmarking framework
  • Fast Flower Delivery (FFD) 8 – a functional benchmarking scenario

  1. Zhang, Y.; Duc, P.; Corcho, O. & Calbimonte, J.-P. SRBench: A Streaming RDF/SPARQL Benchmark. The Semantic Web – ISWC 2012, Springer Berlin Heidelberg, 2012, 7649, 641-657 

  2. Dell’Aglio, D.; Calbimonte, J.-P.; Balduini, M.; Corcho, O. & Della Valle, E. On Correctness in RDF Stream Processor Benchmarking. The Semantic Web – ISWC 2013, Springer Berlin Heidelberg, 2013, 8219, 326-342 

  3. Le-Phuoc, D.; Dao-Tran, M.; Pham, M.-D.; Boncz, P.; Eiter, T. & Fink, M. Linked Stream Data Processing Engines: Facts and Figures. The Semantic Web – ISWC 2012, Springer Berlin Heidelberg, 2012, 7650, 300-312 

  4. Geppert, A.; Berndtsson, M.; Lieuwen, D. & Roncancio, C. Performance evaluation of object-oriented active database systems using the BEAST benchmark. Theor. Pract. Object Syst., John Wiley & Sons, Inc., 1998, 4, 135-149 

  5. Tucker, P.; Tufte, K.; Papadimos, V. & Maier, D. NEXMark – A benchmark for querying data streams. Oregon Health & Sciences University, 2002 

  6. Arasu, A.; Cherniack, M.; Galvez, E.; Maier, D.; Maskey, A. S.; Ryvkina, E.; Stonebraker, M. & Tibbetts, R. Linear road: a stream data management benchmark. VLDB ’04: Proceedings of the Thirtieth international conference on Very large data bases, VLDB Endowment, 2004, 480-491 

  7. Bizarro, P. BiCEP – Benchmarking Complex Event Processing Systems Event Processing, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2007 

  8. Etzion, O. & Niblett, P. Event Processing in Action Manning Publications Co., 2010  


An RDF Model for Events

Why do we need an event model? Many of the RDF streaming systems discussed have little or no model for the real-time data they ingest. These systems make only the lowest common assumption about the structure of the data, i.e. that the data consist of a stream of RDF triples. Thus, each piece of real-time data (event) is one triple. One triple, however, cannot hold much information. For example, flexibility in timestamping (one vs. two timestamps, or application time vs. system time) is only possible if timestamps can be attached to an event structure. Flat triples cannot do that. Another example is typing data: the triple <myInstance> rdf:type <MyClass> can introduce a type, but then the event (one triple) is "full". This means that any structure in the data must be inferred from more than one event. However, consumers cannot make assumptions about events which are not yet received: events occur spontaneously and event consumers are often decoupled from the senders (cf. publish/subscribe systems). Therefore, structure is needed in individual events.

Events should be self-describing. A common understanding of data is crucial for consumers and producers 1, especially in a distributed and heterogeneous system such as the Web. Therefore, a consumer must find a way to understand received events which entails the need for a universal event model 1.

Model

The figure shows the event model in a class diagram 2. The class "Event" at bottom left of the figure is the superclass for any event to conform to our model. This class makes use of related work by inheriting from the class "DUL:Event" from Dolce Ultralight based on DOLCE 3. That class provides a notion of time and helps distinguish events (things that happen) from facts (which are always valid).

Event Model

In accordance with our requirements 2 some properties are mandatory while the rest are optional. An instance of class Event MUST have (i) a type, (ii) at least one timestamp and (iii) a relevant stream. We describe the event properties in detail as follows.

The type of an event must be specified using rdf:type. The type must be the class Event or any subclass.

The event model supports interval-based events as well as point-based events by either using just the property :endTime for a point or both :startTime and :endTime for an interval. The property :endTime thus has a cardinality of [1..1] whereas :startTime has a cardinality of [0..1]. Both temporal properties are subproperties of DUL:hasEventDate from the super class. We improve the semantics by distinguishing start from end whereas the superclass has an alternative, more difficult way of formulating intervals using subobjects reifying the interval.

The property :stream associates an event with a stream. Streams are used in our system as a unit of organisation for events governing publish/subscribe and access control. Streams themselves are modelled using title, description and a topic needed for topic-based publish/subscribe.
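
The three mandatory properties can be checked mechanically before an event is accepted. The following is a minimal sketch in Python and not part of the system described here: it assumes that an event is held as a plain dictionary from property name to a list of values, and the names `validate_event`, `parse_time` and `known_types` are hypothetical.

```python
from datetime import datetime


def parse_time(value):
    """Parse an xsd:dateTime string such as 2012-03-28T06:04:26.522Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_event(event, known_types):
    """Return a list of violations of the mandatory parts of the event model."""
    problems = []
    # (i) rdf:type must name :Event or one of its subclasses.
    if not any(t in known_types for t in event.get("rdf:type", [])):
        problems.append("missing rdf:type of :Event or a subclass")
    # (ii) :endTime has cardinality [1..1], :startTime has [0..1].
    end_times = event.get(":endTime", [])
    start_times = event.get(":startTime", [])
    if len(end_times) != 1:
        problems.append(":endTime must occur exactly once")
    if len(start_times) > 1:
        problems.append(":startTime must occur at most once")
    # Extra sanity check (an assumption): an interval must not end before it starts.
    if len(end_times) == 1 and len(start_times) == 1:
        if parse_time(start_times[0]) > parse_time(end_times[0]):
            problems.append(":startTime is later than :endTime")
    # (iii) the event must be associated with a stream.
    if not event.get(":stream"):
        problems.append("missing :stream")
    return problems


# A point-based event: only :endTime is given.
point_event = {
    "rdf:type": [":FacebookStatusFeedEvent"],
    ":endTime": ["2012-03-28T06:04:26.522Z"],
    ":stream": ["http://streams.event-processing.org/ids/FacebookStatusFeed#stream"],
}
known = {":Event", ":FacebookStatusFeedEvent"}
print(validate_event(point_event, known))  # prints []
```

An interval-based event would additionally carry one :startTime value. The set of known types stands in for a real subclass check against the schema.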

The first optional property is :location. For geo-referencing of events (where necessary) we re-use the basic geo vocabulary from the W3C 4. The property may be used to locate events in physical locations on the globe. The property is a subproperty of DUL:hasLocation and geo:location to inherit the semantics from those schemas.

Inter-event relationships may be supported by linking a complex event to the simple events which caused it. Thus, RDF Lists may be used in :members to maintain an ordered and complete account of member events. The linked events are identified by their URI. These linked events could have further member events themselves. This facilitates modelling of composite events 5. The :members property is a subproperty of DUL:hasConstituent from the superclass.

The property :eventPattern may be used to link a complex event to the pattern which caused the event to be detected. Direct links to event patterns may be provided by RESTful services. Using such links can help in recording provenance of derived events.

The source of an event may be specified using the :source property. This is an optional property to record the creator of an event where needed. The property is a subproperty of DUL:involvesAgent. Agents may be human or non-human.

A human readable synopsis of an event may be added using the :message property. This proves useful in scenarios where events are received by human end users. The :message property is a subproperty of dc:title, a popular way of describing things using natural language. Multilingualism is provided by the feature of language tags for string literals in RDF 6.

N-ary predicates 7 may be used to maintain event properties which are valid only for a specific event, e.g. a volatile sensor reading such as the temperature measurement belonging to a specific event. For example, instead of plainly stating the disputable fact that "the city of Nice has a temperature in Celsius of 23 degrees" which looks like this:

dbpedia:Nice :curTemp "23" .

We can instead state that the city of Nice has said temperature but qualified by the conjunction with a given event "e2" in the following n-ary predicate:

dbpedia:Nice :curTemp [
    rdf:value "23" ;
    :event  <http://events...org/ids/e2#event>
] .

Endowment of further structure for events is left to domain-specific schemas. For example the W3C Semantic Sensor Network (SSN) Ontology may be added if fine-grained modelling of sensors and pertaining sensor readings is needed.

Example

The listing below shows several facts about our event model along an example. The listing uses the example of a Facebook event generated by our event adapter described in 2.

@prefix :       <http://events.event-processing.org/types/> .
@prefix e:      <http://events.event-processing.org/ids/> .
@prefix user:   <http://graph.facebook.com/schema/user#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

e:5534987067802526 {
    <http://events.event-processing.org/ids/5534987067802526#event>
        a :FacebookStatusFeedEvent ;
        :endTime "2012-03-28T06:04:26.522Z"^^xsd:dateTime ;
        :status "I bought some JEANS this morning" ;
        :stream <http://streams...org/ids/FacebookStatusFeed#stream> ;
        user:id "100000058455726" ;
        user:link <http://graph.facebook.com/roland.stuehmer#> ;
        user:location "Karlsruhe, Germany" ;
        user:name "Roland Stühmer" .
}
  1. The example shows an event using quadruples in TriG syntax 8. The graph name (a.k.a. context) before the curly braces is used as a unique identifier, e.g. to enable efficient indexing of contiguous triples in the storage backend for historic events.
  2. The event in this example has the ID 5534987067802526 as part of its URI. There is a distinction made between URIs for things and URIs for their information resources, i.e. the event object 5534987067802526#event and the Web document 5534987067802526 describing the event. The two URIs might carry, e.g. a different creation date, which is why it can be important to separate them. The fragment identifier #event is used to differentiate them. See 9 for an in-depth discussion of the matter of disambiguation (also known as the httpRange-14 issue).
  3. There is an event type hierarchy from which the type FacebookStatusFeedEvent is inherited. This hierarchy can be extended by any user by referencing the RDF type :Event as a super class.
  4. The event may link to entities from static Linked Data where further context for the event can be retrieved. In this example the event uses user:link, which in this case points to the Facebook Graph API. Facebook started publishing Linked Data as RDF 10.
  5. The event links to a stream which is a URI where current events can be obtained in real-time by dereferencing the link.
  6. The namespace event-processing.org is chosen as a generic home for this schema.
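
The distinction in item 2 between the event object and the Web document describing it can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper names `document_uri` and `thing_uri` are hypothetical, and the only library call used is `urllib.parse.urldefrag` from the standard library.

```python
from urllib.parse import urldefrag


def document_uri(thing):
    """Return the URI of the Web document (information resource) for a thing URI."""
    # Dropping the fragment identifier (#event) yields the document URI.
    return urldefrag(thing).url


def thing_uri(document, fragment="event"):
    """Return the URI of the thing described by a document URI."""
    return f"{urldefrag(document).url}#{fragment}"


event_uri = "http://events.event-processing.org/ids/5534987067802526#event"
graph_uri = document_uri(event_uri)
print(graph_uri)  # prints http://events.event-processing.org/ids/5534987067802526
```

In the example above the document URI is also the graph name of the event, so the same conversion gives the graph in which the triples about the event are stored.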

Conclusion

We are re-using and creating domain vocabularies to subclass the class Event. For example in the Facebook case we use the schema from the RDF/Turtle API provided by Facebook 10.

We developed this event model to satisfy requirements of an open platform where data from the Web can be re-used and which is extensible for open participation. Future updates to the event schema can be tracked on-line at 11.


  1. Rozsnyai, S.; Schiefer, J. & Schatten, A. Concepts and models for typing events for event-based systems Proceedings of the 1st ACM International Conference on Distributed Event-Based Systems, ACM, 2007, 62-70 

  2. Stühmer, R. Web-oriented Event Processing Karlsruhe Institute of Technology, KIT Scientific Publishing, Karlsruhe, 2014 

  3. Gangemi, A.; Guarino, N.; Masolo, C.; Oltramari, A. & Schneider, L. Sweetening Ontologies with DOLCE Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, Springer-Verlag, 2002, 166-181 

  4. Brickley, D. Basic Geo (WGS84 lat/long) Vocabulary, 2003 

  5. Luckham, D. C. & Schulte, R. Event Processing Glossary – Version 2.0, 2011 

  6. Klyne, G. & Carroll, J. J. Resource Description Framework (RDF): Concepts and Abstract Syntax 2004 

  7. Noy, N. & Rector, A. Defining N-ary Relations on the Semantic Web World Wide Web Consortium, 2006 

  8. Bizer, C. & Cyganiak, R. RDF 1.1 TriG, 2014 

  9. Berners-Lee, T. What HTTP URIs Identify? — Design Issues, 2005 

  10. Weaver, J. & Tarjan, P. Facebook Linked Data via the Graph API Semantic Web Journal, IOS Press, 2012 

  11. Harth, A. & Stühmer, R. Publishing Event Streams as Linked Data Karlsruhe Institute of Technology, FZI Forschungszentrum Informatik, 2011 


RDF Access Control

There are several approaches to modelling access control using RDF. On the one hand, the approaches use RDF as a modelling language for permissions, linking users with their rights. On the other hand, they are applied to RDF data, granting users access to it (linking permissions with data). All approaches grant access to RDF resources and assume that what is not granted is forbidden.

Related Work

The S4AC Vocabulary Specification 0.2 1 defines access rights tailored towards RDF query answering, i.e. SPARQL processing. The vocabulary defines the access rights Create, Read, Update and Delete. The model is very expressive: it allows fine-grained access conditions, modelled as contextual queries that are checked against arbitrary context data. However, the integration with SPARQL is not applicable to our system because not all operations require a query, for example a plain subscription to a stream.

SIOC Access is a part of the SIOC specification 2. It is a very simple but extensible vocabulary to define permissions in the scope of the social Web. The vocabulary does not have any predefined rights. The lack of rights, the focus on social communities and its lack of traction on the Web are the drawbacks of this candidate when choosing a model for access control in our system.

The W3C WebAccessControl (WAC) 3 is a generic vocabulary declaring some predefined rights (Read, Write, Append, Control) on Web information resources. Streams in our system are information resources, so the vocabulary can be used without change. The access rights, however, must be extended for our system: Notify and Subscribe govern real-time access, in addition to the predefined rights Read and Write for static data.

Our System, Using Access Control for Streaming Data

Data in our system 4 is organized in streams (cf. topic-based publish/subscribe). We chose to attribute access control at per-stream granularity. Finer granularity, such as per-event attribution, was discarded because the expected runtime cost of checking each event for each of its recipients before delivery was considered unnecessarily high. Coarser granularity, such as granting access to all streams at once, would contradict our requirement for multitenancy because users could not be separated.

After analysing the existing RDF models for access control mentioned above we concluded that W3C WebAccessControl was the most viable candidate of the three available candidates S4AC, SIOC Access and the W3C WebAccessControl. Reasons were its traction on the Web, its generality, and its ease of use compared to the other candidates (e.g. linking permissions with plain RDF resources instead of complex SPARQL queries to define rights).

The figure below shows the concepts of WebAccessControl (WAC). The bottom of the figure shows that a single permission (Authorization in WAC terms) is a ternary relation. It consists of an agent (who can access), an information resource (what) and a mode (how), cf. middle line of the figure. An example ternary relation is: Roland can access the TwitterFeed with permissions Subscribe and Read. The top left of the figure shows an agent can be either a group or an individual user’s account. User accounts can be members in groups. If accounts are defined in several locations, they can be declared to be the same.

Access Control Lists

In the figure the concepts from the WAC vocabulary are highlighted in blue colour. WAC has predefined access rights Read and Write for static data, cf. top right of the figure. For the use with real-time data we extended WAC with the rights Notify and Subscribe. The classes on white background in the figure are defined as part of this work. Finally, the classes in yellow are from the SIOC vocabulary.

The following listing in Turtle syntax shows two example authorizations p0001 and p0002 in the namespace permission starting on line 10 and 15. A user person:rs who is member of the group group:administrators is shown starting on line 20. Both permissions exhibit the ternary relation between who, what and how access is granted. The first permission states that Roland (rs) can access the TwitterFeed with permissions Subscribe and Read. The second permission states that group:administrators can access the FacebookStatusFeed with permission Write.


@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix group: <http://groups.event-processing.org/id/> .
@prefix permission: <http://permissions.event-processing.org/id/> .
@prefix person: <http://www.roland-stuehmer.de/profile#> .
@prefix s: <http://streams.event-processing.org/ids/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix wsnt: <http://docs.oasis-open.org/wsn/b-2/> .

permission:p0001
    acl:accessTo s:TwitterFeed ;
    acl:agent person:rs ;
    acl:mode wsnt:Subscribe , acl:Read .

permission:p0002
    acl:accessTo s:FacebookStatusFeed ;
    acl:agent group:administrators ;
    acl:mode acl:Write .

person:rs
    sioc:member_of group:administrators ;
    <http://www.w3.org/2002/07/owl#sameAs> <http://data.semanticweb.org/person/roland-stuehmer> .  # owl:sameAs in full, no owl: prefix is declared

When defining permissions, the streams are modelled as information resources (e.g. http://.../TwitterFeed on line 11 without the trailing #stream). Elsewhere, streams are modelled with their non-information resource (e.g. http://.../TwitterFeed#stream). Making this distinction (cf. the so-called httpRange-14 issue) we can attribute different metadata to the information for the stream (e.g. annotate permissions) and to the real-world stream (e.g. annotate its real-world event source or author).
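
The ternary relation of who, what and how, together with the rule that what is not granted is forbidden, can be sketched as a small check in Python. This is an illustration under assumptions and not the implementation of our system: the authorizations from the listing above are copied into plain Python structures, and the name `is_allowed` is hypothetical.

```python
# Authorizations as (agent, resource, modes) entries, mirroring the listing above.
AUTHORIZATIONS = [
    ("person:rs", "s:TwitterFeed", {"wsnt:Subscribe", "acl:Read"}),
    ("group:administrators", "s:FacebookStatusFeed", {"acl:Write"}),
]

# Group memberships (sioc:member_of in the listing).
MEMBER_OF = {"person:rs": {"group:administrators"}}


def is_allowed(agent, resource, mode):
    """Default deny: access is granted only if some authorization matches."""
    # An agent acts as itself and as every group it is a member of.
    identities = {agent} | MEMBER_OF.get(agent, set())
    return any(
        who in identities and what == resource and mode in modes
        for who, what, modes in AUTHORIZATIONS
    )


print(is_allowed("person:rs", "s:TwitterFeed", "wsnt:Subscribe"))    # prints True
print(is_allowed("person:rs", "s:FacebookStatusFeed", "acl:Write"))  # prints True (via the group)
print(is_allowed("person:rs", "s:TwitterFeed", "acl:Write"))         # prints False
```

Because permissions are attributed per stream, such a check is needed once per subscription or publication request rather than once per delivered event.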


Real-time Web

The idea of the Real-time Web is motivated by a Web that is situation-aware and operates in real time. This idea was developed as a grand challenge 1 for the field of event processing. The purpose of this challenge is "to identify a single, though broad challenge that impacts society and at the same time measures the progress of research" 1. The challenge is to create a decentralized, global, Internet-like infrastructure, built upon widely-accepted open standards 1.

There are a number of terms (synonyms) given for a Web which is situation-aware. Examples are Real-time Web 2, Web of Events 3, Active Web 4, Reactive Web 5 and Event Processing Fabric 1.

They have in common that data must be exchanged quickly after it is created. Moreover, Fromm 2 states that the Real-time Web (i) is a new form of communication which (ii) creates a new body of content, (iii) is real-time, (iv) is public and has an explicit social graph associated with it and (v) carries an implicit model of federation. Indeed, this work makes a contribution to the Real-time Web by enabling a new form of communication using event processing, working in real-time and supporting federated data-creation and consumption.

There are many technological developments on the Web today which can create a lot of events and thus support a Real-time Web. Such events are delivered in a push fashion as opposed to the traditional client–server Web of request and response. For one, there is the W3C Web Notification Working Group which is working on push notifications to actively notify running Web applications. Additionally, HTML5 defines two techniques to facilitate communication initiated by the server. These techniques are Server-Sent Events and WebSockets. They operate at different layers of the protocol stack to achieve push delivery to Web clients. Another approach to push-data on the Web is the Google PubSubHubbub protocol to enable mainly server-to-server notifications. It is designed to avoid inefficient polling of news feeds in Atom or RSS. Lastly, the Facebook Graph API provides an application-specific way to subscribe to Facebook real-time updates from changes to connected people’s profiles.


  1. Chandy, K. M.; Etzion, O. & von Ammon, R. (Eds.) 10201 Executive Summary and Manifesto — Event Processing Event Processing, Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, Germany, 2011 
  2. Fromm, K. The Real-Time Web: A Primer, 2009 
  3. Jain, R. Toward EventWeb IEEE Distributed Systems Online, IEEE Computer Society, 2007, 8 
  4. Ostrowski, K.; Birman, K. & Dolev, D. Live Distributed Objects: Enabling the Active Web IEEE Internet Computing, IEEE Educational Activities Department, 2007, 11, 72-78 
  5. Bry, F. & Eckert, M. Twelve theses on reactive rules for the web Proceedings of the Workshop on Reactivity on the Web, Munich, Germany, Springer, 2006 

Definition of “Event” for Event Processing

I collected some definitions of event from the view of event processing research and practice. The emphasis is mine:

  • (Etzion & Niblett 2010)1 wrote:

    An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened in that domain. The word event is also used to mean a programming entity that represents such an occurrence in a computing system.

  • (Luckham & Schulte 2011)2 wrote:

    Anything that happens, or is contemplated as happening. Or: An object that represents, encodes, or records an event, generally for the purpose of computer processing.

  • (Gupta and Jain 2011)3 wrote:

    Events are first-class objects which means a fundamental information unit which can be stored, queried and merged with other events

  • (Hinze & Voisard 2002)4 wrote:

    An event is the occurrence of a state transition at a certain point in time.

  • (Michelson 2006)5 wrote:

    A notable thing that happens inside or outside your business.

  • (Mühl et al. 2006)6 wrote:

    Any happening of interest that can be observed from within a computer is considered an event. […] A notification is a datum that reifies an event, i.e., it contains data describing the event.


  1. Etzion, O. & Niblett, P. Event Processing in Action Manning Publications Co., 2010 
  2. Luckham, D. C. & Schulte, R. Event Processing Glossary – Version 2.0, 2011 
  3. Gupta, A. & Jain, R. Managing Event Information: Modeling, Retrieval, and Applications, Morgan & Claypool Publishers, 2011 
  4. Hinze, A. & Voisard, A. A parameterized algebra for event notification services Ninth International Symposium on Temporal Representation and Reasoning, 2002. TIME 2002, 2002 
  5. Michelson, B. M. Event-Driven Architecture Overview Patricia Seybold Group, Feb, 2006 
  6. Mühl, G.; Fiege, L. & Pietzuch, P. Distributed Event-Based Systems Springer-Verlag New York, Inc., 2006 

Immutability and Event Derivation in RDF

"In many event processing systems […] events are immutable"1. This stems from the definition of what an event is: "An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened […]"2. So events cannot be made to unhappen.

Open Question: Does this apply to all systems, applications and use cases, or just to "many" as stated above?

I made immutability a general assumption in my work3. It is very useful for building systems (distributed systems, consistency, …).

Q: How can a stream processing agent process events if they are immutable?

A: Every processing task produces new derived events as results. Advantage: the original (underived) events are still available for other uses and remain immutable.

For RDF Stream Processing (RSP) this means: (1) create a new (unique) graph for the derived event and (2) possibly link back to the base event(s), thus enabling drill-down or root cause / provenance analysis of the derived event. The links can be made with DUL:hasConstituent from DOLCE Ultralight4. In my own work5 I use a new :members property to link from a derived event to its simple events. The property is a subproperty of the mentioned DUL:hasConstituent.
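
The two steps can be sketched in Python as follows. This is a minimal illustration under assumptions rather than the actual implementation: the class `Event`, the function `derive`, the event types and the `example.org` namespace are all hypothetical, and a frozen dataclass stands in for the immutability of events.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable event: assigning to a field after creation raises an error."""
    graph: str            # unique graph name identifying the event
    event_type: str
    end_time: str
    members: tuple = ()   # graph names of the base events (:members)


def derive(event_type, end_time, base_events):
    """Create a new event that links back to its base events instead of changing them."""
    # (1) A new unique graph name; the ID scheme here is a placeholder assumption.
    graph = f"http://events.example.org/ids/{uuid.uuid4().hex}"
    # (2) Link back to the base events, keeping their order.
    return Event(graph, event_type, end_time, tuple(e.graph for e in base_events))


e1 = Event("http://events.example.org/ids/e1", ":TemperatureEvent", "2012-03-28T06:04:26Z")
e2 = Event("http://events.example.org/ids/e2", ":TemperatureEvent", "2012-03-28T06:05:26Z")
alarm = derive(":HeatAlarmEvent", e2.end_time, [e1, e2])
print(alarm.members)
```

The base events e1 and e2 are left untouched and remain available to other consumers; following `members` from the derived event gives the drill-down path.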

Observation: Receiving agents may want to add "received time" and other metadata later. Adding such triples to the event graph afterwards, with the graph name as subject, can still be legal and be considered as amending the event header. This is much like email: headers can be added by intermediate mail servers, but the mail body and ID are immutable.


Stream Punctuation and RDF Stream Processing

Definition by Tucker et al.1 and Maier et al.2:

"A punctuation is a pattern p inserted into the data stream with the meaning that no data item i matching p will occur further on in the stream."

For event processing systems, events are the fundamental unit of information3. This means each event is processed atomically, i.e. completely or not at all. For RDF stream processing systems this can cause problems if events are modelled as graphs consisting of multiple quadruples: How can a receiver of an event know that all quadruples pertaining to the event are transmitted in order to start processing the event?

For streams of RDF graphs punctuation can be used like this: A punctuation is a pattern "p" inserted into the quadruple stream with the meaning that no quadruples i from graph p will occur further on in the stream.
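
A receiver that relies on such in-band punctuation could buffer quadruples per graph and release a graph only when its punctuation arrives. The following Python sketch illustrates this under assumptions: quadruples are plain 4-tuples of strings, and the marker `PUNCTUATION` as well as the function name `graphs_from_quads` are hypothetical, since no vocabulary term for punctuation is defined here.

```python
from collections import defaultdict

# Hypothetical marker predicate; a real stream would need an agreed vocabulary term.
PUNCTUATION = "urn:example:punctuation"


def graphs_from_quads(stream):
    """Yield (graph, triples) once the punctuation for that graph arrives."""
    pending = defaultdict(list)
    for s, p, o, g in stream:
        if p == PUNCTUATION:
            # No further quads for graph g will follow: release it atomically.
            yield g, pending.pop(g, [])
        else:
            pending[g].append((s, p, o))


# Quadruples of two events arrive interleaved.
quads = [
    ("e1#event", "rdf:type", ":Event", "e1"),
    ("e2#event", "rdf:type", ":Event", "e2"),
    ("e1#event", ":endTime", "2012-03-28T06:04:26.522Z", "e1"),
    ("e1", PUNCTUATION, "e1", "e1"),
    ("e2", PUNCTUATION, "e2", "e2"),
]
for graph, triples in graphs_from_quads(quads):
    print(graph, len(triples))
```

With this input the loop prints `e1 2` and then `e2 1`: each event is handed on only when it is complete, even though its quadruples were interleaved with those of another event.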

Punctuation could be implemented using special ("magic") quadruples, but when using the Web stack(!) we can do punctuation out-of-band, i.e. implement punctuation on a lower layer of the stack. For example, we can communicate through "chunked transfer encoding" (Fielding et al. 1999, Section 3.6.1)4 from HTTP 1.1. Each chunk contains a complete graph, and the receiver knows that once a chunk has been received the event is complete and can be processed further in an atomic fashion. There is a guarantee that no quads for this graph will arrive later. Using HTTP chunked connections, no special (or magic) quads are needed.
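
To illustrate how a receiver recovers one graph per chunk, here is a minimal Python sketch of a reader for a chunked message body. It is an assumption-laden illustration, not the code of any particular system: the function name `read_chunks` is hypothetical, the body is simulated with an in-memory byte stream instead of a network connection, and trailers after the last chunk are ignored.

```python
import io


def read_chunks(stream):
    """Yield the payload of each chunk of an HTTP/1.1 chunked message body."""
    while True:
        size_line = stream.readline()
        if not size_line:
            return  # connection closed
        # The chunk size is hexadecimal; optional chunk extensions follow a ';'.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            return  # last-chunk: the body is finished
        payload = stream.read(size)
        stream.readline()  # consume the CRLF that terminates the chunk data
        yield payload


# Two events, each serialized as one complete graph (content abbreviated).
graph1 = b"<e1#event> a :Event .\n"
graph2 = b"<e2#event> a :Event .\n"
body = b"".join(b"%x\r\n%s\r\n" % (len(g), g) for g in (graph1, graph2)) + b"0\r\n\r\n"

for event_graph in read_chunks(io.BytesIO(body)):
    # Each payload is a complete event and can be processed atomically.
    print(event_graph.decode("utf-8").strip())
```

Two caveats apply. Many HTTP client libraries decode chunking transparently and hide the chunk boundaries, and because the transfer coding is applied hop by hop, an intermediary may re-chunk the body. Treating one chunk as one event therefore assumes control over both endpoints and the path between them.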

"Chunked transfer encoding" is also used by the RDF publish/subscribe middleware Ztreamy5 to provide long-lived connections using pure HTTP with the goal of disseminating events to subscribers. Further related work6 investigates the exchange of RDF over different protocols such as XMPP on top of HTTP (and thus TCP), and even UDP. However, none of these protocols provides pure HTTP stream URIs which are easily referenced in Linked Data.


  1. Tucker, P.; Maier, D.; Sheard, T. & Fegaras, L. Exploiting punctuation semantics in continuous data streams Knowledge and Data Engineering, IEEE Transactions on, 2003, 15, 555-568 http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1198390
  2. Maier, D.; Li, J.; Tucker, P.; Tufte, K. & Papadimos, V. Semantics of Data Streams and Operators Proceedings of the 10th International Conference on Database Theory, Springer-Verlag, 2005, 37-52 http://datalab.cs.pdx.edu/niagaraST/icdt05.pdf
  3. Gupta, A. & Jain, R. Managing Event Information: Modeling, Retrieval, and Applications, Morgan & Claypool Publishers, 2011 
  4. Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P. & Berners-Lee, T. Hypertext Transfer Protocol -- HTTP/1.1 RFC Editor, 1999 http://www.w3.org/Protocols/rfc2616/rfc2616.html
  5. Fisteus, J. A.; García, N. F.; Fernández, L. S. & Fuentes-Lorenzo, D. Ztreamy: A middleware for publishing semantic streams on the Web Web Semantics: Science, Services and Agents on the World Wide Web, 2014, 25, 16-23 
  6. Shinavier, J. Optimizing real-time RDF data streams CoRR, 2010, abs/1011.3595 http://arxiv.org/abs/1011.3595