Apache Daffodil Improves DFDL Compatibility
Written by Kay Ewbank
Tuesday, 12 March 2019
Apache Daffodil, an open source implementation of the Data Format Description Language that converts between fixed format data and XML/JSON, has been updated to improve DFDL compatibility. The Data Format Description Language (DFDL) is a specification developed by the Open Grid Forum to create a standard way of describing different data formats, including both textual and binary, scientific and numeric, legacy and modern, commercial record-oriented, and many industry and military standards.
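To make the idea of converting between fixed format data and XML concrete, here is a toy Python sketch. It does not use Daffodil or DFDL at all: the `LAYOUT` table, the field names, and the `parse`/`unparse` helpers are hypothetical stand-ins for what a real DFDL schema and processor would provide, and the record format is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width record layout: (field name, width in characters).
# In DFDL this information would live in an annotated XML schema instead.
LAYOUT = [("id", 4), ("name", 10), ("amount", 6)]

def parse(record: str) -> ET.Element:
    """Turn one fixed-width record into a small XML infoset."""
    root = ET.Element("record")
    pos = 0
    for name, width in LAYOUT:
        # Slice out the field and drop the trailing padding spaces.
        ET.SubElement(root, name).text = record[pos:pos + width].rstrip()
        pos += width
    return root

def unparse(infoset: ET.Element) -> str:
    """Serialize the infoset back to the fixed-width form, re-padding fields."""
    return "".join(
        (infoset.findtext(name) or "").ljust(width) for name, width in LAYOUT
    )

raw = "0042Widget    001250"
infoset = parse(raw)
xml_text = ET.tostring(infoset, encoding="unicode")
print(xml_text)
print(unparse(infoset) == raw)
```

The round trip only works here because the layout is hard-coded and every field is left-aligned and space-padded; a DFDL schema is meant to describe such physical details declaratively so that one processor can handle many formats.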
The Open Grid Forum was created by a merger between the Global Grid Forum and the Enterprise Grid Alliance, and is a group of developers and vendors interested in standardizing grid computing.

The open-source implementation, Daffodil, is currently an Apache Incubator project. It has Java and Scala APIs, provides Apache NiFi processors for parsing and unparsing NiFi FlowFiles, and has an extension to XML Calabash that declares XProc pipeline steps to parse and unparse input data.

DFDL defines a language that is a subset of W3C XML Schema to describe the logical format of the data, with annotations within the schema to describe the physical representation. Daffodil uses these DFDL schemas to parse fixed format data into an infoset, which is most commonly represented as either XML or JSON, meaning developers can use XML or JSON to consume, inspect, and manipulate fixed format data. Daffodil can also be used in the reverse direction to serialize or “unparse” an XML or JSON infoset back to the original data format.

The updated release has a number of changes and bug fixes made specifically to improve IBM DFDL compatibility, including a TDML runner that now tolerates left-over data for IBM test compatibility. Test Data Markup Language (TDML) is a way of specifying a DFDL schema, input test data, and the expected result or expected error/diagnostic messages, all self-contained in an XML file. IBM created TDML to capture tests for its own DFDL implementation. Daffodil incorporated the idea and has extended it, though there is now an effort to reconcile TDML dialects so that all implementations can run the same tests.

This release of Daffodil incorporates TDML runner cross validation, meaning it is now possible to use the TDML runner to run tests against different DFDL implementations, including the IBM DFDL implementation. The TDML runner has also added type-aware infoset comparisons, meaning developers can now provide an