mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of OMG Metadata Object Facility built on relational database technology. MMX framework
is based on three general concepts:
Metamodel | MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general. The model consists of only a few abstract entities... see more.
Access layer | Object oriented methods can be exploited using inheritance to derive the whole data access layer from a small set of primitives created in SQL. MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements... see more.
Generic transformation | A large part of relationships between different objects in metadata model are too complex to be described through simple static relations. Instead, universal data transformation concept is put to use enabling definition of transformations, mappings and transitions of any complexity... see more.

Moving Data in XDTL: Wooden, Painted Red

August 4, 2011 00:37 by mmx

Transferring data from a place to another is the cornerstone of almost every data transformation job. XDTL has a very flexible and feature-ladden fetch command for that job. fetch element has the following attributes:

source

a query to a source database. source can contain either an implicit query or a variable containing the query. 

target

target table or output text file. Specifying destination connection attribute indicates that target is a table. In case target is a text file, all the common attributes of text files (delimiter, quote, null, encoding) also apply and can optionally be present.

overwrite

overwrite="1" causes an automatic TRUNCATE command against the target table. 

rowset

specifies an XDTL variable (internally a Java array) to store the resultset of the source query. rowset and target are mutually exclusive and should not be present in the same fetch element. 

connection

connection of the source query. 

destination 

connection of the target table. 

 

The most obvious use case for fetch would be transferring data from a database table to a text file. The following xdtl command would fetch the data specified by the query (source attribute) into a CSV (target attribute) file with ; as the delimiter and " as the quote symbol (defaults, they can be overriden):

<xdtl:fetch source="SELECT col, col, ... FROM table WHERE cond" target="textfile" connection="$conn"/> 

In case we need to process the results of the fetch operation further inside the xdtl package, fetch can store the result set in a variable (technically a Java array): 

<xdtl:fetch source="SELECT col, col, ... FROM table WHERE cond" rowset="myrowset" connection="$conn"/>
<xdtl:for item="myrow" rowset="$myrowset"> 
...do something here. columns have to be addressed via Java array syntax: ${myrow[0]}, ${myrow[1]}, etc...
</xdtl:for>

We can go even further and let the read command - normally used to read data from text file(s) into database table(s) - to consider this rowset variable as its input and store data into a table with compliant structure, ie. matching the rowset by position  (type="ROWSET" tells read that input data is coming from a Java array instead of a text file):

<xdtl:fetch source="SELECT * FROM sourcetable WHERE cond" rowset="myrowset" connection="$conn"/>
<xdtl:read source="$myrowset" target="targettable" type="ROWSET" overwrite="1" connection="$dest"/>

Finally, in case we have no need to process the data in the xdtl package, use fetch command to achieve the behavior of a typical DataPump task in most of the graphical ETL tools in just one step (here, the second connection attribute, destination is used to specify target connection):

<xdtl:fetch source="SELECT * FROM sourcetable WHERE cond" target="targettable" connection="$conn" destination="$dest"/> 

Caution! fetch can transfer data between different servers that could come from different vendors. It can do automatic datatype conversions within reasonable limits (*), but all 'weird', non-standard data types (**) need to be taken care of in the source query statement. Source and target data structures are assumed to be identical, this requirement must be handled in source query by presenting data in suitable format and structure for the target table. And, mondo importante: data columns are matched based on position, not column names!

Taking all this into consideration, all the basic data transfer tasks can be handled with ease. For more complex transformations extensions or customized library tasks should be used. Naturally fetch would work best for reasonable data amounts; for large data other approaches (eg. bulkload tools) should be employed.

(*) Datatype conversion is done in JDBC via Java Primitive Data Types.

(**)



MMX Framework: Extending with Extended Properties

December 10, 2010 13:36 by mmx

Pretty often there is a need to add some extra attributes to a metamodel that has already been deployed. The need for those changes usually arises from changed requirements for an application. So how to enable those changes without extending the metamodel (M2 layer) itself?

There is a special property type, AnyProperty, to handle this scenario. Whan an object type (class) has a property of type AnyProperty, an object (instance of the class) can have an arbitrary number of different properties of type AnyProperty that all share the same property_type_cd. Then how to distinguish between those properties? 

All the instances of an AnyProperty descend from the same property type of a class in M2 layer, but an application who 'owns' them can give them new names. Alternatively, it is possible to use the same name, but a distinguishing prefix directly in the value field of the property. So an AnyProperty instance 'belongs' to an application and its completely up to the application how it chooses to use, denote and present this property.

Downsides (there obviously are some, right?). First, the inheritance mechanisms built into MMX Framework (MMXMD API) obviously cannot provide full support to those properties, as the names of them are not known to M2 layer. Second, AnyProperty is defined as String in M2 so an application cannot rely on the type information if it chooses to have an instance of AnyProperty that is implemented with some other datatype (eg. an XML document or a date formatted as a string).

Anyway, this mechanism (which bears some resemblance with anyAttribute element in XML Schema) enables making run-time changes in metamodel when design-time is not an option (run-time and design-time here refer to modeling, not application design), just remember to use it with care.

 



XDTL Packages in Five Easy Steps: Beginner

November 14, 2010 13:24 by mmx

Continuing from the previous step, this one actually does something useful, namely transfers some stuff from one location to another.

<?xml version="1.0" encoding="UTF-8"?>
<xdtl:package name="loader.test"
  xmlns:xdtl="http://xdtl.org/xdtl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xdtl.org/xdtl http://xdtl.org/xdtl/xdtl.xsd">

  <xdtl:parameter name="filename" required="1"></xdtl:parameter>
  <xdtl:parameter name="tablename" required="1"></xdtl:parameter>
  <xdtl:parameter name="dbhost" default="localhost:5432" required="0"></xdtl:parameter>
  <xdtl:parameter name="dbname" default="postgres" required="0"></xdtl:parameter>
  <xdtl:parameter name="dbuser" default="postgres" required="0"></xdtl:parameter>
  <xdtl:parameter name="dbpass" default="" required="0"></xdtl:parameter>

  <xdtl:variable name="tmpdir">/tmp/incoming</xdtl:variable>
  <xdtl:variable name="bakdir">/tmp/incoming/bak</xdtl:variable>
  <xdtl:variable name="nocr">-e 's/\r//g'</xdtl:variable>
  <xdtl:variable name="tempdb">staging</xdtl:variable>

  <xdtl:connection name="local" type="DB">jdbc:postgresql://$dbhost/$dbname?user=$dbuser&amp;password=$dbpass&amp</xdtl:connection>

  <xdtl:tasks>
    <xdtl:task name="load_file">
      <xdtl:steps>
        <xdtl:get source="ftp://www.site.com/$filename.zip" target="$tmpdir" overwrite="1"/>
        <xdtl:unpack source="$tmpdir/$filename.zip" target="$tmpdir" overwrite="1"/>
        <xdtl:strip expr="$nocr" source="$tmpdir/$filename.txt" target="$tmpdir/$filename.tmp" overwrite="1"/>
        <xdtl:read source="$tmpdir/$filename.tmp" target="$tempdb.$filename" connection="$local" type="CSV" overwrite="1" delimiter=";" quote='"'/>
        <xdtl:query cmd="SELECT COUNT(*) FROM pg_tables WHERE tablename='$tablename'" target="exist" connection="$local"/>
        <xdtl:if expr="${exist != 0}">
        <xdtl:query cmd="INSERT INTO $tablename
          SELECT t0.* 
          FROM $tempdb.$filename t0
          LEFT OUTER JOIN $tablename t1 ON t1.id = t0.id
          WHERE t1.id IS NULL" connection="$local"/>
        </xdtl:if>
        <xdtl:move source="$tmpdir/$filename.zip" target="$bakdir/$filename.$xdtlDateCode.rar" overwrite="1"/>
        <xdtl:clear target="$tmpdir/$filename.*"/>
      </xdtl:steps>
    </xdtl:task>
  </xdtl:tasks>
</xdtl:package>

In short, a text file is downloaded (get), unzipped (unpack), stripped from CR's with sed (strip) and gets loaded into staging area (read). If the target table exists the rows that are not in the target table already get INSERTed. Finally, incoming file is archived away (move) and the temp files are trashed (clear). All in all, a typical plain vanilla data warehousing task. At this point, some explanations are probably needed.
 
1. The required parameters, filename and tablename get their values in some other (calling) package outside this one. If not, the package get an error.  The rest of the parameters are optional, ie. they might get their values from a calling procedure, if not they fall back to the defaults. 
 
2. The external utilities (get, unpack, strip etc.) get their context in the global configuration file (resources/globals.xml), like this:
 
<entry key="get">wget %target:-O :% %overwrite::-nc% %source%</entry>
 
%target%, %overwrite% and %source% are evaluated to the actual get element attribute values. 
 
3. xdtlDateCode is a variable that was evaluated to the current date in the startup.js script (there can be an arbitrary number of such variable definitions): 
 
var xdtlDateCode = java.lang.String.format("%1$tY%1$tm%1$td", new Array(new java.util.Date()));
 
To be continued...