Expressive and extensible formalism for transducers
Version: 1.3
VoDoo/Stream is software that provides a high-level, expressive and extensible formalism for transducers over any kind of format. It is based on three major paradigms: a stream layer for tokenization, an automata layer for recognition, and a rule-based document transformation layer built on streams and automata.
Operating System: Linux
The VoDoo/Stream project is based on three concepts:
# The first, inspired by event-based programming styles such as SAX or the generic lexer in Objective Caml, provides a stream-based data denotation.
# The second provides expressive, classical automata to match and recognize patterns while analyzing streams.
# The last is a high-level structuring of automata that provides an expressive mechanism for data transformation.
Finally, an XSLT-like language is defined to express data transformations.
Stream is a simple formalism based on opening and closing a level, labels, and text. This simple grammar provides a stream denotation for trees (XML, for example); for XML, the stream is produced by a dedicated SAX handler. Currently supported formats are XML and free text, and further formalisms can be added through the stream extension facility. A stream interpretation is also provided for the Document Object Model, so a stream can manipulate plain text, an ad-hoc stream, or DOM-based data.
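As a rough illustration of the stream denotation, the following Java sketch turns XML into a flat stream of open-level, close-level and text tokens using a standard JDK SAX handler. The names `StreamDenotation`, `Kind`, `Token`, `StreamHandler` and `fromXml` are hypothetical and are not the VoDoo/Stream API. As simplifying assumptions, labels are reduced to element names (attributes are ignored) and text is trimmed, with whitespace-only text dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamDenotation {
    // Hypothetical token kinds: open a labelled level, close it, or carry text.
    enum Kind { OPEN, CLOSE, TEXT }

    record Token(Kind kind, String value) {
        @Override
        public String toString() {
            return switch (kind) {
                case OPEN -> "<" + value;
                case CLOSE -> value + ">";
                case TEXT -> "\"" + value + "\"";
            };
        }
    }

    // SAX handler that turns parser callbacks into a flat token stream.
    static class StreamHandler extends DefaultHandler {
        final List<Token> tokens = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // characters() may be called several times per text node, so text is buffered
        // and emitted (trimmed) before the next open/close token; blank text is dropped.
        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) tokens.add(new Token(Kind.TEXT, s));
            text.setLength(0);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            tokens.add(new Token(Kind.OPEN, qName)); // attributes are ignored in this sketch
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            tokens.add(new Token(Kind.CLOSE, qName));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static List<Token> fromXml(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StreamHandler handler = new StreamHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.tokens;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fromXml("<doc><title>Hello</title><p>World</p></doc>"));
    }
}
```

Running `main` prints `[<doc, <title, "Hello", title>, <p, "World", p>, doc>]`, i.e. the level structure of the document flattened into open, text and close tokens. A free-text or DOM front end could emit the same token kinds from a different source.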
By comparison, the StAX approach is a low-level XML matching integration based on a token-stream representation of XML fragments. Using the Stream representation with a classical switch/case conditional structure is similar to the StAX approach, but such an integration is too low-level: it does not provide an expressive layer for XML management and is in fact at the same level as SAX.
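To make the comparison concrete, here is what a hand-written switch/case integration looks like with the standard StAX API (`javax.xml.stream`). The task, collecting the text of every `title` element, and the class name `StaxTitles` are made up for illustration; this is not code from VoDoo/Stream, only an example of the low-level style being compared.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {

    // Collects the text of every <title> element with a switch/case over StAX
    // events; all matching state is tracked by hand.
    static List<String> titles(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        StringBuilder buf = null; // non-null only while inside a <title>
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (r.getLocalName().equals("title")) buf = new StringBuilder();
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (buf != null) buf.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (r.getLocalName().equals("title") && buf != null) {
                        out.add(buf.toString());
                        buf = null;
                    }
                    break;
                default:
                    break; // other events (document start/end, comments, ...) are ignored
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(titles("<doc><title>A</title><sec><title>B</title></sec></doc>"));
    }
}
```

Running `main` prints `[A, B]`. Even this small task requires tracking state (`buf`) manually; nothing in the loop expresses "match a `title` and bind its text" declaratively, which is the gap the automata and transducer layers are meant to fill.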
Automata for Stream recognition
The automata layer provides high-level pattern recognition and variable binding. It produces a DAG with specific attributes for denoting variables. Such an automaton can either find a pattern within a given stream or match the stream as a whole. An automaton is built from a given stream written in an extended formalism that includes patterns such as repetition, any label or text, and choice. This stream is analysed to produce a directed acyclic graph, which is then used to generate the automaton (the classical approach).
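The construction of the automaton is not shown here, so the following Java sketch only illustrates the intended semantics under stated assumptions: a pattern language with token tests (a `null` value stands for any label or text), sequence, choice, repetition and variable binding, interpreted by a backtracking matcher over a token stream. It deliberately does not reproduce the DAG-based compilation described above, and all names (`StreamAutomaton`, `Tok`, `Seq`, `Choice`, `Star`, `Bind`, `match`, `find`) are hypothetical. `match` requires the pattern to cover the whole stream, while `find` returns the first occurrence anywhere.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamAutomaton {
    enum Kind { OPEN, CLOSE, TEXT }
    record Token(Kind kind, String value) {}

    // Hypothetical pattern language: token test, sequence, choice, repetition, binding.
    interface Pattern {}
    record Tok(Kind kind, String value) implements Pattern {}    // value == null: any label or text
    record Seq(List<Pattern> items) implements Pattern {}
    record Choice(List<Pattern> alternatives) implements Pattern {}
    record Star(Pattern body) implements Pattern {}               // zero or more repetitions
    record Bind(String var, Pattern body) implements Pattern {}   // binds the matched sub-stream

    // Continuation: what must still succeed once a sub-pattern has matched up to pos.
    interface Cont { boolean apply(int pos, Map<String, List<Token>> env); }

    private final List<Token> in;
    private Map<String, List<Token>> result;
    StreamAutomaton(List<Token> in) { this.in = in; }

    private boolean m(Pattern p, int pos, Map<String, List<Token>> env, Cont k) {
        if (p instanceof Tok t) {
            return pos < in.size() && in.get(pos).kind() == t.kind()
                    && (t.value() == null || t.value().equals(in.get(pos).value()))
                    && k.apply(pos + 1, env);
        }
        if (p instanceof Seq s) return seq(s.items(), 0, pos, env, k);
        if (p instanceof Choice c) {
            // try each alternative in order, backtracking on failure
            for (Pattern alt : c.alternatives()) if (m(alt, pos, env, k)) return true;
            return false;
        }
        if (p instanceof Star st) {
            // Greedy: try one more repetition (it must consume input), else stop here.
            return m(st.body(), pos, env, (q, e) -> q > pos && m(st, q, e, k))
                    || k.apply(pos, env);
        }
        if (p instanceof Bind b) {
            return m(b.body(), pos, env, (q, e) -> {
                Map<String, List<Token>> e2 = new HashMap<>(e);
                e2.put(b.var(), in.subList(pos, q));
                return k.apply(q, e2);
            });
        }
        throw new IllegalArgumentException("unknown pattern: " + p);
    }

    private boolean seq(List<Pattern> ps, int i, int pos, Map<String, List<Token>> env, Cont k) {
        if (i == ps.size()) return k.apply(pos, env);
        return m(ps.get(i), pos, env, (q, e) -> seq(ps, i + 1, q, e, k));
    }

    // Match: the pattern must cover the whole stream. Returns the bindings, or null.
    Map<String, List<Token>> match(Pattern p) {
        result = null;
        m(p, 0, Map.of(), (q, e) -> {
            if (q != in.size()) return false;
            result = e;
            return true;
        });
        return result;
    }

    // Find: the first occurrence of the pattern, starting anywhere in the stream.
    Map<String, List<Token>> find(Pattern p) {
        for (int start = 0; start <= in.size(); start++) {
            result = null;
            if (m(p, start, Map.of(), (q, e) -> { result = e; return true; })) return result;
        }
        return null;
    }

    static Token open(String l) { return new Token(Kind.OPEN, l); }
    static Token close(String l) { return new Token(Kind.CLOSE, l); }
    static Token text(String s) { return new Token(Kind.TEXT, s); }

    // Stream for <doc><title>Hello</title><p>a</p><p>b</p></doc>
    static final List<Token> SAMPLE = List.of(open("doc"), open("title"), text("Hello"), close("title"),
            open("p"), text("a"), close("p"), open("p"), text("b"), close("p"), close("doc"));
    // A <p> level containing any text
    static final Pattern PARA = new Seq(List.of(
            new Tok(Kind.OPEN, "p"), new Tok(Kind.TEXT, null), new Tok(Kind.CLOSE, "p")));
    // doc = title (its text bound to t) followed by zero or more paragraphs
    static final Pattern DOC = new Seq(List.of(new Tok(Kind.OPEN, "doc"), new Tok(Kind.OPEN, "title"),
            new Bind("t", new Tok(Kind.TEXT, null)), new Tok(Kind.CLOSE, "title"),
            new Star(PARA), new Tok(Kind.CLOSE, "doc")));

    public static void main(String[] args) {
        StreamAutomaton a = new StreamAutomaton(SAMPLE);
        System.out.println(a.match(DOC).get("t"));                   // binding of the title text
        System.out.println(a.find(new Bind("x", PARA)).get("x"));    // first paragraph found
    }
}
```

An automaton compiled ahead of time, as in the classical approach, would avoid the repeated backtracking of this interpreter; the sketch trades efficiency for brevity.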
Transducer for Stream transformation
Transducers are ordered sets of rules. A rule has a selection part and a body. A selection can deal with paths (tree visitor) and with the current entity. The first kind of entity is the tree node, which can be selected by filtering on its name or attributes. The second kind is the string, which can be filtered using usual pattern matching. A body is a piece of Java code that decides whether or not to continue parsing (recursive descent).
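The rule mechanism can be sketched in Java as follows, assuming a minimal tree of element and text nodes. Rules are tried in order and the first whose selection accepts the current entity wins; its body writes output and returns `true` to continue the recursive descent into the node's children, or `false` to stop. Path-based selection is omitted. The names `Transducer`, `Rule`, `Body`, `element`, `text` and `demo`, as well as the default of descending into nodes that no rule selects, are assumptions for illustration rather than VoDoo/Stream's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class Transducer {
    // Minimal tree model (an assumption): elements with name, attributes, children; text leaves.
    interface Node {}
    record Element(String name, Map<String, String> attrs, List<Node> children) implements Node {}
    record Text(String value) implements Node {}

    // Body of a rule: emits output and returns true to continue descending into children.
    interface Body { boolean apply(Node node, StringBuilder out); }

    // A rule pairs a selection (predicate on the current entity) with a body.
    record Rule(Predicate<Node> select, Body body) {}

    private final List<Rule> rules; // ordered: the first selected rule wins

    Transducer(List<Rule> rules) { this.rules = rules; }

    void run(Node n, StringBuilder out) {
        boolean descend = true; // nodes that no rule selects are traversed (an assumption)
        for (Rule r : rules) {
            if (r.select().test(n)) {
                descend = r.body().apply(n, out);
                break;
            }
        }
        if (descend && n instanceof Element e) {
            for (Node child : e.children()) run(child, out);
        }
    }

    // Selection on a tree node: by name and, optionally, by one attribute value.
    static Predicate<Node> element(String name, String attr, String value) {
        return n -> n instanceof Element e && e.name().equals(name)
                && (attr == null || value.equals(e.attrs().get(attr)));
    }

    // Selection on a string: the whole text must match a regular expression.
    static Predicate<Node> text(String regex) {
        Pattern re = Pattern.compile(regex);
        return n -> n instanceof Text t && re.matcher(t.value()).matches();
    }

    static Element el(String name, Map<String, String> attrs, Node... kids) {
        return new Element(name, attrs, List.of(kids));
    }

    static String demo() {
        Node doc = el("doc", Map.of(),
                el("section", Map.of("status", "draft"),
                        el("title", Map.of(), new Text("Intro")), el("p", Map.of(), new Text("Hello"))),
                el("section", Map.of("status", "final"),
                        el("title", Map.of(), new Text("Results")),
                        el("p", Map.of(), new Text("Done")), el("p", Map.of(), new Text("42"))));
        Transducer t = new Transducer(List.of(
                // prune draft sections: do not descend
                new Rule(element("section", "status", "draft"), (n, out) -> false),
                new Rule(element("title", null, null), (n, out) -> { out.append("# "); return true; }),
                // numeric strings, tried before the catch-all text rule
                new Rule(text("\\d+"), (n, out) -> { out.append("N=").append(((Text) n).value()).append('\n'); return false; }),
                new Rule(text(".*"), (n, out) -> { out.append(((Text) n).value()).append('\n'); return false; })));
        StringBuilder sb = new StringBuilder();
        t.run(doc, sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

`demo()` returns `"# Results\nDone\nN=42\n"`: the draft section is pruned, titles are prefixed with `# `, and because the numeric-string rule precedes the catch-all text rule, `42` is handled by the former.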
Transducer Stream Processor language: XSP