Apache Daffodil Improves DFDL Compatibility
Written by Kay Ewbank   
Tuesday, 12 March 2019

Apache Daffodil, an open source implementation of the Data Format Description Language to convert between fixed format data and XML/JSON, has been updated to improve DFDL compatibility.

The Data Format Description Language (DFDL) is a specification that was developed by the Open Grid Forum to create a standard way of describing different data formats, including both textual and binary, scientific and numeric, legacy and modern, commercial record-oriented, and many industry and military standards.


The open-source implementation, Daffodil, is currently an Apache Incubator project. It has Java and Scala APIs, provides Apache NiFi processors for parsing and unparsing NiFi FlowFiles, and has an extension to XML Calabash that declares XProc pipeline steps to parse and unparse input data.

DFDL defines a language that is a subset of W3C XML Schema to describe the logical format of the data, and uses annotations within the schema to describe the physical representation. The Open Grid Forum was created by a merger between the Global Grid Forum and the Enterprise Grid Alliance, and is a group of developers and vendors interested in standardizing grid computing.

Daffodil uses these DFDL schemas to parse fixed format data into an infoset, which is most commonly represented as either XML or JSON, meaning developers can use XML or JSON to consume, inspect, and manipulate fixed format data. Daffodil can also be used in the reverse direction to serialize or “unparse” an XML or JSON infoset back to the original data format.
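To make the parse and unparse round trip concrete, here is a minimal sketch in plain Python. It does not use Daffodil or DFDL at all: the record layout, field names and helper functions are hypothetical, and the layout is hard-coded where a DFDL schema would normally describe it. It only illustrates the idea of turning a fixed-format record into a JSON-style infoset and back again.

```python
import json

# Hypothetical fixed-width layout: (field name, width, Python type).
# With DFDL this description would live in a schema, not in program code.
LAYOUT = [("id", 4, int), ("name", 10, str), ("qty", 3, int)]


def parse(record: str) -> dict:
    """Turn one fixed-width record into a dict-shaped infoset."""
    infoset, pos = {}, 0
    for name, width, typ in LAYOUT:
        raw = record[pos:pos + width]
        infoset[name] = typ(raw.strip())
        pos += width
    return infoset


def unparse(infoset: dict) -> str:
    """Serialize the infoset back into the original fixed-width form."""
    parts = []
    for name, width, typ in LAYOUT:
        value = str(infoset[name])
        # Numbers are zero-padded on the left, text is space-padded on the right.
        parts.append(value.rjust(width, "0") if typ is int else value.ljust(width))
    return "".join(parts)


# Build a sample record: 4-digit id, 10-character name, 3-digit quantity.
record = "0042" + "widget".ljust(10) + "007"
infoset = parse(record)
print(json.dumps(infoset))
print(unparse(infoset) == record)
```

Once the data is in this form, ordinary JSON tooling can inspect or modify it before it is written back out in the original layout.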
 

The updated release has a number of changes and bug fixes specifically made to improve IBM DFDL compatibility, including the TDML runner being improved to tolerate left-over data for IBM test compatibility.

Test Data Markup Language (TDML) is a way of specifying a DFDL schema, input test data, and expected result or expected error/diagnostic messages, all self-contained in an XML file. IBM created TDML to capture tests for their own DFDL implementation. Daffodil incorporated the idea and has extended it, though there is now an effort to reconcile TDML dialects so that all implementations can run the same tests.

This release of Daffodil incorporates TDML runner cross validation, meaning it is now possible to use the TDML runner to run tests against different DFDL implementations, including the IBM DFDL implementation. The TDML runner has also added type-aware infoset comparisons, meaning developers can now provide an xsi:type attribute in infoset elements, allowing the TDML runner to determine if two elements are logically the same even if their infoset values may differ.
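The idea behind a type-aware comparison can be sketched in a few lines of Python. This is not the TDML runner's implementation: the `logically_equal` function and the small table of converters are hypothetical, and only a handful of type names are handled. It shows how an xsi:type attribute lets two elements whose text differs, such as 1.0 and 1.00, be treated as the same value.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Fully qualified name of the xsi:type attribute as ElementTree reports it.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical mapping from xsi:type names to Python converters.
CONVERTERS = {"xs:int": int, "xs:double": float, "xs:decimal": Decimal}


def logically_equal(a: ET.Element, b: ET.Element) -> bool:
    """Compare by typed value when both elements carry the same known xsi:type."""
    type_a, type_b = a.get(XSI_TYPE), b.get(XSI_TYPE)
    convert = CONVERTERS.get(type_a) if type_a == type_b else None
    if convert is None:
        # No usable type information: fall back to plain text comparison.
        return (a.text or "") == (b.text or "")
    return convert(a.text) == convert(b.text)


XSI_NS = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
expected = ET.fromstring(f'<price {XSI_NS} xsi:type="xs:double">1.0</price>')
actual = ET.fromstring(f'<price {XSI_NS} xsi:type="xs:double">1.00</price>')
untyped_a = ET.fromstring("<price>1.0</price>")
untyped_b = ET.fromstring("<price>1.00</price>")

print(logically_equal(expected, actual))
print(logically_equal(untyped_a, untyped_b))
```

With the type attribute present the two price elements compare as equal; without it the comparison falls back to the raw text and they differ.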


More Information

Apache Daffodil

Related Articles

Flink Gets Event-time Streaming

The Significance Of Big Data

IBM Pure Data

IBM Hot Data In A Flash

Perform Data Queries Faster With Drill

BigQuery Now Open to All

 
