Apache Daffodil Improves DFDL Compatibility
Written by Kay Ewbank   
Tuesday, 12 March 2019

Apache Daffodil, an open source implementation of the Data Format Description Language that converts between fixed format data and XML/JSON, has been updated to improve DFDL compatibility.

The Data Format Description Language (DFDL) is a specification that was developed by the Open Grid Forum to create a standard way of describing different data formats, including both textual and binary, scientific and numeric, legacy and modern, commercial record-oriented, and many industry and military standards.


The open-source implementation, Daffodil, is currently an Apache Incubator project. It has Java and Scala APIs, provides Apache NiFi processors for parsing and unparsing NiFi FlowFiles, and has an extension to XML Calabash that declares XProc pipeline steps to parse and unparse input data.

DFDL defines a language that is a subset of W3C XML schema to describe the logical format of the data, and annotations within the schema to describe the physical representation. The Open Grid Forum was created by a merger between the Global Grid Forum and the Enterprise Grid Alliance, and is a group of developers and vendors interested in standardizing grid computing.

Daffodil uses these DFDL schemas to parse fixed format data into an infoset, which is most commonly represented as either XML or JSON, meaning developers can use XML or JSON to consume, inspect, and manipulate fixed format data. Daffodil can also be used in the reverse direction to serialize or “unparse” an XML or JSON infoset back to the original data format.
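To make the parse and unparse round trip concrete, here is a minimal sketch in Python. It does not use Daffodil or a DFDL schema: the `LAYOUT` table, the field names, and the sample record are all hypothetical stand-ins for what a real schema would describe, and the "infoset" is just a dictionary rendered as JSON.

```python
import json

# Hypothetical fixed-width layout: (field name, width, type).
# A stand-in for a DFDL schema, not Daffodil's format.
LAYOUT = [("id", 4, int), ("name", 10, str), ("qty", 3, int)]


def parse(record: str) -> dict:
    """Slice a fixed-width record into a dictionary 'infoset'."""
    infoset, pos = {}, 0
    for name, width, kind in LAYOUT:
        raw = record[pos:pos + width]
        # Numbers are zero-padded on the left, text is space-padded on the right.
        infoset[name] = int(raw) if kind is int else raw.rstrip()
        pos += width
    return infoset


def unparse(infoset: dict) -> str:
    """Serialize the infoset back to the fixed-width representation."""
    parts = []
    for name, width, kind in LAYOUT:
        value = infoset[name]
        if kind is int:
            parts.append(str(value).zfill(width))
        else:
            parts.append(value.ljust(width))
    return "".join(parts)


record = "0042Widget    007"
infoset = parse(record)
print(json.dumps(infoset))          # the infoset as JSON
assert unparse(infoset) == record   # the round trip restores the original data
```

In a real DFDL setup the layout lives in the schema rather than in code, so the same engine can handle many formats.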
 

The updated release includes a number of changes and bug fixes made specifically to improve compatibility with IBM DFDL. Among them, the TDML runner now tolerates left-over data, for compatibility with IBM tests.

Test Data Markup Language (TDML) is a way of specifying a DFDL schema, input test data, and the expected result or expected error/diagnostic messages, all self-contained in an XML file. IBM created TDML to capture tests for their own DFDL implementation. Daffodil incorporated the idea and has extended it, though there is now an effort to reconcile TDML dialects so that all implementations can run the same tests.

This release of Daffodil incorporates TDML runner cross validation, meaning it is now possible to use the TDML runner to run tests against different DFDL implementations, including the IBM DFDL implementation. The TDML runner has also added type-aware infoset comparisons, meaning developers can now provide an xsi:type attribute in infoset elements, allowing the TDML runner to determine if two elements are logically the same even if their infoset values may differ.
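The idea behind a type-aware comparison can be sketched in a few lines of Python. This is an illustration only, not the TDML runner's implementation: the `CONVERTERS` table, the `logically_equal` function, and the assumption that types are written with an `xs:` prefix are all hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# ElementTree expands the xsi: prefix to its namespace URI.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Assumed mapping from a few XSD type names to Python converters.
CONVERTERS = {
    "xs:int": int,
    "xs:decimal": Decimal,
    "xs:double": float,
    "xs:boolean": lambda s: s.strip() in ("true", "1"),
}


def logically_equal(a: ET.Element, b: ET.Element) -> bool:
    """Compare two infoset elements, using xsi:type when either side has one."""
    if a.tag != b.tag or len(a) != len(b):
        return False
    if len(a) > 0:
        # Complex element: compare children pairwise, in order.
        return all(logically_equal(x, y) for x, y in zip(a, b))
    # Simple element: convert both texts by the declared type, else compare strings.
    convert = CONVERTERS.get(a.get(XSI_TYPE) or b.get(XSI_TYPE), str)
    return convert(a.text or "") == convert(b.text or "")


expected = ET.fromstring(
    '<price xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="xs:double">1.0</price>'
)
actual = ET.fromstring("<price>1.00E0</price>")
untyped = ET.fromstring("<price>1.0</price>")

print(logically_equal(expected, actual))  # typed: both texts are compared as doubles
print(logically_equal(untyped, actual))   # untyped: the texts are compared as strings
```

With the type hint, `1.0` and `1.00E0` are treated as the same double; without it, the two strings differ and the comparison fails.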


More Information

Apache Daffodil

Related Articles

Flink Gets Event-time Streaming

The Significance Of Big Data

IBM Pure Data

IBM Hot Data In A Flash

Perform Data Queries Faster With Drill

BigQuery Now Open to All

 
