mmx metadata framework
...the DNA of your data
The MMX metadata framework is a lightweight implementation of the OMG Meta-Object Facility (MOF) built on relational database technology. The MMX framework is based on three general concepts:
Metamodel | The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract than metadata models in general. The model consists of only a few abstract entities...
Access layer | Object-oriented inheritance is used to derive the whole data access layer from a small set of primitives written in SQL. The MMX Metadata Framework provides several methods of data access to fulfill different requirements...
Generic transformation | Many relationships between objects in a metadata model are too complex to be described through simple static relations. Instead, a universal data transformation concept is used, enabling the definition of transformations, mappings and transitions of any complexity...

Knowledge Management feat. Wiktionary

March 12, 2010 12:55 by marx

Wiktionary is about Knowledge Management.

Although the term itself has been around for ages, it would probably be hard to find two people who agree on precisely what it stands for. Knowledge management has come a long way, from the huge hierarchical file systems full of text files of the 1970s, to the dedicated document management systems of the 1980s, to the enterprise portals, intranets and content management systems of the 1990s. However, it has always been a balancing act between strengths and weaknesses in particular areas, to get the mix of collaborative, structural and navigational facets right.

As we see it, there are two burning issues in building a knowledge management infrastructure: how to define and access the knowledge we want to manage, and how to store the knowledge we have created or defined.

Regarding the first question, the keywords are collaborative effort in knowledge creation, and intuitive, effortless navigation during knowledge retrieval. On today's internet, one of the most successful technologies of the Web 2.0 era is Wikipedia, or more generally, the wiki. It is arguably the easiest to use, most widely recognised and probably the cheapest way to let a huge number of very different people all over the world efficiently manage a vast amount of complex and disparate information. So we found it to be good and put it to use.

One simple way to define knowledge management: it is about things (concepts, ideas, facts etc.) and the relationships between them. In today's internet-based world, most (or at least a large share) of the data, facts and figures we will ever need are probably freely available to us, anytime, anywhere. So it is not about the existence or accessibility of data; it is about navigating to it and finding it. Relationships are as important as the related items themselves, and sometimes more so: they tend to carry information of their own, which may be even more significant than the information carried by the items they connect.

Which brings us to the semantics (meaning) of the relationships. In Wikipedia (and on the Internet in general) links carry only one universal meaning: we can navigate from here to there. A human being clicking on a link has to guess its meaning and significance, using a combination of intuition, experience and creativity. This is a pretty limited and inefficient way to associate things with each other. Adding semantics to relationships enables us to understand why and how various ideas, concepts, topics and terms are related. Some very obvious examples: 'synonym', 'antonym', 'part of', 'previous version', 'owner', 'creator'. The shift towards technologies with semantically 'richer' relations is visible in the evolution from classifications to ontologies, from XML to RDF etc.
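
To make the difference concrete, here is a minimal sketch of relationships that carry a type, as opposed to plain links. The terms, the relation kinds and the helper name `related` are made up for illustration and are not taken from Wiktionary itself.

```python
from collections import namedtuple

# A relationship that states *how* two things are related,
# not just that one can navigate from one to the other.
Relation = namedtuple("Relation", ["source", "kind", "target"])

# Illustrative data only (hypothetical terms and relation kinds).
RELATIONS = [
    Relation("car", "synonym", "automobile"),
    Relation("wheel", "part of", "car"),
    Relation("engine", "part of", "car"),
    Relation("hot", "antonym", "cold"),
]


def related(term, kind):
    """Return the sources linked to `term` by a relationship of the given kind."""
    return sorted(r.source for r in RELATIONS if r.target == term and r.kind == kind)


# With typed relations we can ask a precise question: what is a part of a car?
print(related("car", "part of"))
```

With untyped links the same data could only answer "what points at car?"; the relation kind is what makes the question "what is a part of it?" answerable.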

Finally, simply by enumerating things and the relationships between them we have created a model, which forces us to think 'properly': we only define concepts and ideas that are meaningful in our domain of interest, and we only define relationships that are actually allowed and possible between those concepts and ideas. A model validates all our proceedings and forces us to 'do the right things'. Wiktionary employs this approach as the cornerstone of its technology; in fact, the metamodel underlying Wiktionary houses a multitude of different models, enabling Wiktionary to support management of knowledge in disparate subject domains simultaneously and even to link concepts belonging to different domains. So, regarding our second issue, the metamodel defines a structured storage mechanism for our knowledge repository.

In the data processing world, there is a long-standing tension between structured and unstructured data. Structured data is good for computers, and can be managed and processed very efficiently. However, we humans tend to think in an unstructured way, and most of us feel very uncomfortable when forced to squeeze the way we do things into rigid, structured patterns. Wiktionary aims to bridge those two opposites by building on a well-defined underlying structure while providing a comfortable, unstructured user experience. We have two rather conflicting goals, and the approach we have taken - Wiktionary - is arguably the cheapest route to reaching both of them.

Trees and Hierarchies the MMX Way

March 30, 2009 22:34 by marx

Implementing trees and hierarchies in a relational database is an issue that has been puzzling many and has triggered numerous posts, articles and even some books on the topic. 

As stated by Joe Celko in Chapter 26, Trees [1]: "Unfortunately, SQL provides poor support for such data. It does not directly map hierarchical data into tables, because tables are based on sets rather than on graphs. SQL directly supports neither the retrieval of the raw data in a meaningful recursive or hierarchical fashion nor computation of recursively defined functions that commonly occur in these types of applications. <...> Since the nodes contain the data, we can add columns to represent the edges of a tree. This is usually done in one of two ways in SQL: a single table or two tables." The single-table representation enables one-to-many relationships via self-references (parent-child), while the more general two-table representation handles many-to-many relationships of arbitrary cardinality. Based on the principles of the Meta-Object Facility (MOF), MMX implements both the M1 (model) and M2 (metamodel) layers of abstraction. The two most important relationship types defined by UML, Generalization and Association, are realized.

Generalization is defined on the M2 level and is implemented as a self-referencing (parent-child) relationship in SQL. Each class defined in M2 must belong to one class hierarchy, and only single inheritance is allowed. In terms of semantic relationship types in Controlled Vocabularies [2], this is an 'isA' relationship. Associations (as well as aggregations and compositions) are realized as a relationship table (an associative or 'join' table), allowing any class to be related to any other class through an arbitrary number of associations of different types (with support for mandatory and multiplicity constraints). This implementation enables straightforward translation of metamodels expressed as UML class diagrams into an equivalent representation as MMX M2 level class objects.
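
The two M2 structures described above can be sketched with SQLite from Python's standard library. The table names MD_OBJECT_TYPE and MD_RELATION_TYPE are the ones MMX uses; every column name, the sample classes and the helper functions are assumptions made for this illustration, not the actual MMX schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE md_object_type (
    object_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    -- one nullable self-reference: at most one parent, hence single inheritance
    parent_type_id INTEGER REFERENCES md_object_type(object_type_id)
);
-- the 'join table': any class can be associated with any other class
CREATE TABLE md_relation_type (
    relation_type_id INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    from_type_id     INTEGER NOT NULL REFERENCES md_object_type(object_type_id),
    to_type_id       INTEGER NOT NULL REFERENCES md_object_type(object_type_id)
);
""")

# Hypothetical metamodel: Table and Column both generalize to DataElement.
conn.executemany("INSERT INTO md_object_type VALUES (?, ?, ?)", [
    (1, "DataElement", None),
    (2, "Table", 1),
    (3, "Column", 1),
])
conn.executemany("INSERT INTO md_relation_type VALUES (?, ?, ?, ?)", [
    (10, "table-column", 2, 3),
    (11, "column-references-column", 3, 3),
])


def superclass_of(type_name):
    """Follow the self-reference one step up ('isA'); None for a root class."""
    row = conn.execute("""
        SELECT parent.name
        FROM md_object_type AS child
        JOIN md_object_type AS parent
          ON parent.object_type_id = child.parent_type_id
        WHERE child.name = ?""", (type_name,)).fetchone()
    return row[0] if row else None


def associations_from(type_name):
    """Names of the associations whose source end is the given class."""
    rows = conn.execute("""
        SELECT r.name
        FROM md_relation_type AS r
        JOIN md_object_type AS t ON t.object_type_id = r.from_type_id
        WHERE t.name = ?
        ORDER BY r.name""", (type_name,)).fetchall()
    return [name for (name,) in rows]


print(superclass_of("Column"), associations_from("Column"))
```

Because the parent reference is a single column, a class cannot have two superclasses, which is how the single-inheritance rule falls out of the table design.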

The M1 level deals with instances of M2 classes, and parent-child hierarchies here denote 'inclusion', 'broader-narrower' or structural relationships between objects (the 'partOf' relationship in the Controlled Vocabularies world). UML Links are implemented as a many-to-many relationship table, with both parent-child and link relationships being inherited from associations defined on the M2 level. This inheritance enables automatic validation of M1 models against M2 metamodels by defining general rules that enforce the integrity of models based on the characteristics of the respective metamodel elements.
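
One such general rule might look like the sketch below: a link between two M1 objects is accepted only if an M2 association exists whose ends match the classes of those objects. The metamodel, the object names and the choice to let a subclass instance stand in for its superclass are all assumptions for illustration; the actual MMX rules may differ.

```python
# M2: class name -> parent class name (single inheritance). Hypothetical metamodel.
PARENT = {"DataElement": None, "Table": "DataElement", "Column": "DataElement"}
# M2: association name -> (class at the source end, class at the target end)
ASSOCIATIONS = {"table-column": ("Table", "Column")}

# M1: object name -> the M2 class it is an instance of
OBJECTS = {"CUSTOMER": "Table", "CUSTOMER.NAME": "Column"}


def is_a(cls, ancestor):
    """True if cls is the ancestor itself or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def validate_link(association, source, target):
    """Return the rule violations for one M1 link; an empty list means valid."""
    if association not in ASSOCIATIONS:
        return [f"unknown association {association!r}"]
    src_cls, tgt_cls = ASSOCIATIONS[association]
    errors = []
    if not is_a(OBJECTS[source], src_cls):
        errors.append(f"{source} is not a {src_cls}")
    if not is_a(OBJECTS[target], tgt_cls):
        errors.append(f"{target} is not a {tgt_cls}")
    return errors


print(validate_link("table-column", "CUSTOMER", "CUSTOMER.NAME"))
print(validate_link("table-column", "CUSTOMER.NAME", "CUSTOMER"))
```

The rule itself never mentions tables or columns; it only consults the metamodel, so the same check works for any M2 model loaded into the repository.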

Level | 'single table', parent-child, one-to-many | 'two tables', relationship table, many-to-many
M2 | Class hierarchy ('isA'), UML Generalization | UML Associations
M1 | Object hierarchies ('whole-part'), UML Links | UML Links

There seems to be considerable controversy in the data management community over whether hierarchies in SQL should be implemented with the recursion support built into modern database systems. Joe Celko proposes a technique based on manual traversal and management of tree structures in [1], but the book is 15 years old, and the world (and databases) have changed a bit since. Recursion is now part of ANSI SQL-99, with most big players providing at least basic support for it, and in many cases the arguable performance gain from avoiding recursive processing is outweighed by the gain in ease and speed of application development that recursion brings.
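
For readers who have not used it, this is roughly what recursive traversal of a parent-child table looks like. The sketch runs on SQLite through Python's standard library; the `node` table, its contents and the `subtree` helper are invented for the example and are not part of MMX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES node(id)   -- NULL marks the root
    )""")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, "root", None),
    (2, "a", 1),
    (3, "b", 1),
    (4, "a1", 2),
    (5, "a2", 2),
])


def subtree(root_id):
    """Return (name, depth) for the given node and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            -- anchor member: the starting node
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            -- recursive member: children of rows already in the result
            SELECT n.id, n.name, t.depth + 1
            FROM node AS n
            JOIN tree AS t ON n.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name""", (root_id,)).fetchall()


print(subtree(2))
```

The whole traversal is one declarative query; no application-side loop or extra bookkeeping columns are needed, which is the development-speed argument made above.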

The MMX Framework encapsulates all the details of handling inheritance, traversing hierarchies, navigating linked object paths etc. in the MMX Metadata API, realized as a set of table functions (database functions that return a table as their result) that can be easily mapped by Object-Relational Mappers [3]. The performance penalty of recursion, which might matter in an enterprise-scale DWH, is not an issue here: after all, the MMX Framework is designed for (and mostly used in) metadata management, where data volumes remain manageable.

[1] Joe Celko's SQL For Smarties: Advanced SQL Programming, 1995.

[2] Zeng, Marcia Lei. Construction of Controlled Vocabularies, A Primer (based on Z39.19), 2005.

[3] Scott W. Ambler. Mapping Objects to Relational Databases, 2000.

Mapping UML class diagrams to MMX metamodel

November 21, 2008 22:23 by marx

An obvious choice of tool for designing metamodels for the MMX M2 level (MMX/M2) is the UML class diagram. A class diagram is a mixture of elements concerned with both data structures and behaviour, and we are only interested in the data structure aspect. The important elements that we need to consider while mapping a UML class diagram to MMX/M2 are: classes, interfaces, objects, attributes, packages, annotations, associations, generalizations, enumerations, data types and constraints. Here's how those elements are mapped to the constructs of the MMX Metamodel:

classes: A UML class is implemented as an instance of MD_OBJECT_TYPE. A mandatory name column contains the name of the class, and an indicator column denotes whether the class is abstract or concrete.
interfaces: An interface is implemented in exactly the same way as an abstract class.
objects: A UML object is an instance of a class. Objects are implemented as instances of MD_OBJECT that get their object types from MD_OBJECT_TYPE. The relationships between MD_OBJECT and MD_OBJECT_TYPE are essential for the consistency of the MMX model and are enforced by the referential integrity facilities of the underlying database.
attributes: A UML class attribute is implemented as an instance of MD_PROPERTY_TYPE. Each row in this table is related to the owner class of the attribute and to the domain class of the attribute (a data type or an enumeration). If a default value is provided, it is stored in the default value column.
packages: The package element is currently not mapped to the MMX metadata model, as it provides no additional benefit in this context. A metamodel is always assumed to belong to a single package with a single namespace.
annotations: Comments and notes are stored in a text column of the class diagram element instance they belong to.
associations: A UML association between two classes is implemented as an instance of MD_RELATION_TYPE. Each row in this table has two relations with MD_OBJECT_TYPE, one for each end of the association, and a name made up of both role names (all mandatory columns). An association type column indicates whether the row denotes an association, an aggregation or a composition, with a null value denoting an association. Note that relationships in the MMX metamodel are directional by design.
aggregations: An aggregation relationship is implemented exactly as an association, with the association type of aggregation ('A').
compositions: A composition relationship is implemented exactly as an association, with the association type of composition ('C').
multiplicity: The multiplicity of an association is stored in the multiplicity type column of MD_RELATION_TYPE and takes a value from the predefined set of multiplicity types. The following notation is used:
0..1 (optional, zero or one): 'Z'
1 (one, or an exact number n): '1' ('n')
0..* or * (zero, one or more): '*'
1..* (at least one): 'P'
generalization: Generalization has a very special role in the MMX metadata model architecture as the mechanism for maintaining class hierarchies. Inheritance is implemented as a parent-child relationship of MD_OBJECT_TYPE, realizing superclass and subclass relations between classes. Note that MD_PROPERTY_TYPE and MD_RELATION_TYPE also have this relationship and can therefore form hierarchies of their own. Multiple inheritance is not permitted due to the single-parent nature of the parent-child relationship.
enumerations: Enumerations are implemented as instances of MD_OBJECT_TYPE inherited from an abstract domain class. Enumeration literals are stored as instances of MD_OBJECT related to one particular MD_OBJECT_TYPE.
data types: Like enumerations, UML data types are instances of MD_OBJECT_TYPE inherited from an abstract data type class. Unlike enumerations, data types do not have an implicit set of possible values.
constraints: UML constraints are technically just informal annotations to the model that have to be taken care of during system implementation. The MMX Metamodel does not support constraints in any formal way, so they are left for the application to handle.
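
As an illustration of the association rules above, the following sketch turns a description of one UML association into a dictionary shaped like an MD_RELATION_TYPE row. The 'A', 'C', 'Z', '1', '*' and 'P' codes come from the mapping above; the dictionary keys, the `UmlAssociation` class and the way the two role names are joined with a hyphen are assumptions made for this example, and the exact-number multiplicity ('n') is left out.

```python
from dataclasses import dataclass

# Multiplicity codes as listed above; the exact-number case ('n') is not handled here.
MULTIPLICITY_CODES = {"0..1": "Z", "1": "1", "0..*": "*", "*": "*", "1..*": "P"}
# Association type codes as listed above; a plain association is stored as NULL (None).
ASSOCIATION_TYPES = {"association": None, "aggregation": "A", "composition": "C"}


@dataclass
class UmlAssociation:
    """Hypothetical in-memory description of one association in a class diagram."""
    kind: str          # 'association', 'aggregation' or 'composition'
    from_class: str
    from_role: str
    to_class: str
    to_role: str
    multiplicity: str  # UML notation, e.g. '0..*'


def to_relation_type_row(assoc):
    """Build a dict shaped like one MD_RELATION_TYPE row (keys are assumed names)."""
    if assoc.kind not in ASSOCIATION_TYPES:
        raise ValueError(f"unsupported association kind: {assoc.kind!r}")
    if assoc.multiplicity not in MULTIPLICITY_CODES:
        raise ValueError(f"unsupported multiplicity: {assoc.multiplicity!r}")
    return {
        # the name is made up of both role names; the '-' separator is an assumption
        "name": f"{assoc.from_role}-{assoc.to_role}",
        "from_object_type": assoc.from_class,
        "to_object_type": assoc.to_class,
        "association_type": ASSOCIATION_TYPES[assoc.kind],
        "multiplicity_type": MULTIPLICITY_CODES[assoc.multiplicity],
    }


row = to_relation_type_row(
    UmlAssociation("composition", "Table", "owner", "Column", "columns", "1..*")
)
print(row)
```

Since relationships in the MMX metamodel are directional, the order of `from_class` and `to_class` matters: swapping them would describe a different relation type.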


Not all UML class diagram elements and features (e.g. those designed for code generation) are relevant in the scope of metamodeling, so they are not considered here. As an example, the visibility property of class attributes is of no concern in a metamodel context. Similarly, not all MMX/M2 features are required for mapping UML class diagrams.