AJCC Technical Process For Converting the Cancer Staging Manual Into Interoperable Content
Building the Foundation
The AJCC needed to provide its staging manual in an electronic format where:
- Those who use the staging manual could access the specific information they needed
- Those who write the content for the staging manual would not experience interruptions to their workflow
Scriptorium Publishing worked with the AJCC to develop a DITA specialization based on the structure of the existing staging manual and the new chapter template:
- created information architecture
- developed/implemented specialization
- worked with the AJCC on converting content from Word to DITA
- worked with easyDITA on testing the specialization
- made adjustments to specialization as needed during the conversion process
- challenges included:
  - balancing the needs for structure and flexibility (the AJCC needed the specialization to have a more open structure than originally planned)
  - building a specialization that would allow for future changes in staging
The AJCC started the process of moving to structured content by developing a template for a typical Microsoft Word chapter in the Cancer Staging Manual. Each chapter covered specific types of cancers for a single disease site; most chapters presented the staging information for each disease site in the same order. The template showed the implied structure of the content, which Scriptorium used as a starting point to determine the explicit XML structure needed.
Because the structure of the content was very specific to the AJCC, Scriptorium recommended the Darwin Information Typing Architecture (DITA), a form of XML that could be customized through specialization (the process of creating new elements based on existing ones). Scriptorium developed a specialization for the AJCC by creating new elements with AJCC-specific names — for example, a simple table in DITA became a staging table in the AJCC’s custom structure. The specialization also included rules about the number and sequence of elements to ensure that any new content followed the structure.
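To make the idea of specialization concrete, here is a minimal Python sketch of how a processor can treat a specialized element as its base element. In DITA, each element carries a class attribute that lists its ancestry, so a tool that only knows about a simple table can still recognize a specialized one. The element name stagingtable and the module name ajcc are assumptions made for illustration, not the AJCC's actual markup, and the class attributes are written out explicitly here although they are normally supplied as defaults by the grammar files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "stagingtable" and the "ajcc" module name are invented
# for this sketch. The class attribute lists the base element first, then the
# specialized one.
SAMPLE = """
<topic id="example" class="- topic/topic ">
  <title class="- topic/title ">Example disease site</title>
  <body class="- topic/body ">
    <stagingtable class="- topic/simpletable ajcc/stagingtable ">
      <strow class="- topic/strow ">
        <stentry class="- topic/stentry ">Category</stentry>
        <stentry class="- topic/stentry ">Placeholder criteria</stentry>
      </strow>
    </stagingtable>
    <simpletable class="- topic/simpletable ">
      <strow class="- topic/strow ">
        <stentry class="- topic/stentry ">Plain table cell</stentry>
      </strow>
    </simpletable>
  </body>
</topic>
""".strip()


def is_a(element, base):
    """Return True if the element is `base` or a specialization of it."""
    return base in element.get("class", "").split()


root = ET.fromstring(SAMPLE)

# Both the specialized staging table and the plain simple table are found,
# because both declare topic/simpletable in their ancestry.
tables = [el.tag for el in root.iter() if is_a(el, "topic/simpletable")]
print(tables)  # ['stagingtable', 'simpletable']
```

Matching on the class attribute rather than the element name is what lets generic DITA tooling keep working after new, domain-specific element names are introduced.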
Developing the API Connector
The AJCC worked with easyDITA to set up an environment for content management and authoring, and to build an API connector. Because the content was now structured, very specific types of information within the Cancer Staging Manual could be programmatically identified and made available through the API.
As you might expect, content creators often approach a body of knowledge differently from the people who consume it. The API connector not only gave medical information systems a way to access specific information in the Cancer Staging Manual but also served as a conceptual bridge, transforming information from the creator model to the consumer model.
As an example, at one point in the development, over forty information objects had been identified as common across most of the 55+ diseases in the Manual. Eventually, the API bundled many of these information objects together into collections that better served the needs of information consumers. Also, as the specialized structure was redefined to allow for more alternatives in the formatting of disease information, the API was adapted to make it easy to retrieve the right type of disease information, regardless of the particular format it happened to be in.
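The bundling described above can be pictured with a short Python sketch. It is not the easyDITA API; every name in it (InfoObject, COLLECTIONS, get_collection, and the object kinds and format labels) is hypothetical. It only shows the general idea: creator-side information objects are grouped into consumer-facing collections, and the lookup ignores which format a given disease happens to use.

```python
from dataclasses import dataclass


@dataclass
class InfoObject:
    kind: str     # hypothetical object type, e.g. "staging-table"
    fmt: str      # hypothetical format label: "standard" or "alternate"
    content: str  # placeholder text standing in for the real content


# Hypothetical collection that bundles several creator-side objects into one
# consumer-facing unit.
COLLECTIONS = {"staging": ["definitions", "staging-table"]}


def get_collection(chapter, name):
    """Return the chapter's objects that belong to the named collection,
    whatever format each object happens to be in."""
    wanted = set(COLLECTIONS[name])
    return [obj for obj in chapter if obj.kind in wanted]


# Two invented chapters: one uses the typical format, one an alternate format.
chapter_a = [
    InfoObject("summary", "standard", "placeholder summary"),
    InfoObject("definitions", "standard", "placeholder definitions"),
    InfoObject("staging-table", "standard", "placeholder table"),
]
chapter_b = [
    InfoObject("definitions", "alternate", "placeholder definitions"),
    InfoObject("staging-table", "alternate", "placeholder table"),
    InfoObject("summary", "standard", "placeholder summary"),
]

# The same call works for both chapters; the caller never inspects `fmt`.
for chapter in (chapter_a, chapter_b):
    print([(obj.kind, obj.fmt) for obj in get_collection(chapter, "staging")])
```

Keeping the format label on the object, rather than in the lookup, means a consumer asks for "staging" once and receives the right content for any disease.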
Converting the Content
The AJCC converted its content from Microsoft Word to XML in small batches to ensure that the specialized structure worked as intended. The conversion process allowed Scriptorium and easyDITA to test the structure and adjust both the specialization and the authoring environment of the component content management system (CCMS) as needed. The first converted documents also served as a testbed for easyDITA to refine the API.
Converting the content revealed that the structure was too strict to allow for variations in the way staging information was presented. The AJCC worked with Scriptorium to identify patterns in these deviations from the typical structure; for example, some disease sites used a different set of definitions or an alternate format for their staging tables. Scriptorium redefined the specialization rules to reflect these patterns and allow more room for variations in the structure.
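The effect of loosening the rules can be sketched in Python. A real DITA specialization expresses its content models in DTD or RELAX NG grammar files; the regular expressions below are only a stand-in for those rules, and all element names (definitions, altdefinitions, stagingtable, altstagingtable) are assumptions chosen for illustration.

```python
import re

# Hypothetical strict rule: exactly one definitions block, then one staging table.
STRICT = r"definitions stagingtable"

# Hypothetical relaxed rule: either kind of definitions block, followed by one
# or more staging tables in either format. Order is still enforced.
RELAXED = r"(definitions|altdefinitions)( (stagingtable|altstagingtable))+"


def conforms(children, model):
    """Check a sequence of child element names against a content model."""
    return re.fullmatch(model, " ".join(children)) is not None


typical = ["definitions", "stagingtable"]
variant = ["altdefinitions", "altstagingtable", "stagingtable"]
out_of_order = ["stagingtable", "definitions"]

print(conforms(typical, STRICT))        # True
print(conforms(variant, STRICT))        # False
print(conforms(variant, RELAXED))       # True
print(conforms(out_of_order, RELAXED))  # False
```

The relaxed rule accepts the observed variations while still rejecting content that ignores the expected sequence, which is the balance between flexibility and structure described above.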
While loosening the specialization rules posed the risk of a less consistent structure, it also left room for future changes in staging. For future editions, the volunteer authors will update the content either in easyDITA or in Word, using a back-conversion from the DITA source. A more open structure will therefore make updates easier for these authors by minimizing their learning curve.