MainframeMaster

COBOL XML Processing

COBOL XML processing lets you read and produce XML from within a COBOL program. IBM Enterprise COBOL provides XML PARSE for reading XML (event-driven) and XML GENERATE for writing XML from a group item. This page explains how XML PARSE works, the events it raises, how to handle encoding (EBCDIC, ASCII, Unicode), and how to use XML GENERATE and basic exception handling.

Explain Like I'm Five: What Is XML Processing in COBOL?

Imagine XML as a nested set of labeled boxes: each box has a name (like "customer" or "name") and can contain text or more boxes. XML processing in COBOL is the program opening those boxes one by one and doing something with the names and the text inside. The program does not hold the whole document in memory like a tree; instead, the parser tells it "here is the start of a box called customer," then "here is the text inside the name box," then "here is the end of the name box." The program reacts to each message and saves or uses the data it needs. Going the other way, XML GENERATE is like the program filling in a form (a COBOL structure) and having COBOL turn that form into XML text at run time.

XML PARSE: Event-Driven Parsing

XML PARSE is a single statement that takes the XML input (a COBOL data item containing the document) and the name of a processing procedure. When you execute XML PARSE, the XML parser in the COBOL runtime scans the document and, for each significant part (document start/end, element start/end, text content, attributes), passes control to your procedure. Inside the procedure you check the special register XML-EVENT to see which event occurred, then read XML-TEXT (or XML-NTEXT when the document is national) to get the text that belongs to that event: the element name for START-OF-ELEMENT, the character data for CONTENT-CHARACTERS, and so on. This is similar to SAX-style parsing: low memory use and one pass through the document. You do not get a random-access tree; you handle each event in sequence and maintain your own state (e.g. which element you are in) if needed.

Basic XML PARSE Syntax

The form is: XML PARSE identifier PROCESSING PROCEDURE procedure-name. The identifier is the data item that holds the XML; it must be an alphanumeric or national data item, not a literal. You can add ON EXCEPTION and NOT ON EXCEPTION to handle parsing errors (e.g. invalid XML, encoding issues). The optional WITH ENCODING phrase names the code page (CCSID) of the document. The processing procedure is a paragraph or section in the same program and is invoked once per event; it must not execute another XML PARSE statement directly (no nested parse from inside the procedure).

```cobol
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-XML-BUFFER       PIC X(10000).
       01  WS-ELEMENT-NAME     PIC X(50).
       01  WS-CONTENT          PIC X(500).

       PROCEDURE DIVISION.
      *> Sample document; the tag names are illustrative
           MOVE '<item><name>Widget</name></item>' TO WS-XML-BUFFER
           XML PARSE WS-XML-BUFFER
               PROCESSING PROCEDURE XML-HANDLER
               ON EXCEPTION
                   DISPLAY 'XML PARSE ERROR, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
      *> XML-TEXT holds the element name for this event
                   MOVE XML-TEXT TO WS-ELEMENT-NAME
               WHEN 'CONTENT-CHARACTERS'
      *> XML-TEXT holds the character data for this event
                   MOVE XML-TEXT TO WS-CONTENT
      *> Process WS-CONTENT for the current element
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

The procedure name is specified without parentheses in PROCESSING PROCEDURE. The parser passes control to it for each event; when the procedure returns, the parser continues. Use EVALUATE XML-EVENT to branch on the event type. XML-TEXT is meaningful only for the event just delivered: it holds the element name on START-OF-ELEMENT and END-OF-ELEMENT, and the character data on CONTENT-CHARACTERS.

XML PARSE Events

The parser generates a sequence of events. START-OF-DOCUMENT occurs once at the beginning; END-OF-DOCUMENT once at the end. For each element, you get START-OF-ELEMENT (when the opening tag is seen), zero or more CONTENT-CHARACTERS events for the text content, and END-OF-ELEMENT when the closing tag is seen. If the element has attributes, an ATTRIBUTE-NAME event followed by ATTRIBUTE-CHARACTERS is raised for each one, right after START-OF-ELEMENT. The order is deterministic: you see the document in the same order as in the file. To "remember" where you are (e.g. "I am inside order/item") you can maintain a stack or level counter in working storage: increment or push on START-OF-ELEMENT and decrement or pop on END-OF-ELEMENT, and only process CONTENT-CHARACTERS when you are at the right level or element name.

XML PARSE event types

| Event | When | Typical use |
| --- | --- | --- |
| START-OF-DOCUMENT | Before any element | Initialize state, open output, or set flags. |
| END-OF-DOCUMENT | After all content | Finalize, close output, or validate. |
| START-OF-ELEMENT | Opening tag encountered | XML-TEXT has the tag name; push context or start collecting attributes. |
| END-OF-ELEMENT | Closing tag encountered | Pop context, finish current element handling. |
| CONTENT-CHARACTERS | Text between tags | XML-TEXT has the text; copy to your fields or process. |
| ATTRIBUTE-NAME / ATTRIBUTE-CHARACTERS | Attribute in start tag | XML-TEXT has the attribute name, then (on the next event) its value. |
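The push-on-start, pop-on-end idea can be sketched as a small program. This is a minimal illustration, assuming IBM Enterprise COBOL, where XML-TEXT carries the element name on START-OF-ELEMENT and the character data on CONTENT-CHARACTERS. The program name, data names, sample document, and the 20-level stack limit are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLDEPTH.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-XML-DOC          PIC X(200).
       01  WS-DEPTH            PIC 9(4) BINARY VALUE 0.
      *> One slot per nesting level; 20 is an arbitrary limit
       01  WS-NAME-STACK.
           05  WS-NAME         PIC X(30) OCCURS 20.
       01  WS-ITEM-NAME        PIC X(50) VALUE SPACES.

       PROCEDURE DIVISION.
      *> Hypothetical sample document
           MOVE '<order><item><name>Widget</name></item></order>'
             TO WS-XML-DOC
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE TRACK-EVENTS
               ON EXCEPTION
                   DISPLAY 'PARSE FAILED, XML-CODE=' XML-CODE
               NOT ON EXCEPTION
                   DISPLAY 'ITEM NAME: ' WS-ITEM-NAME
           END-XML
           GOBACK.

       TRACK-EVENTS.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
      *> Push: count the level, record the name if it fits
                   ADD 1 TO WS-DEPTH
                   IF WS-DEPTH <= 20
                       MOVE XML-TEXT TO WS-NAME(WS-DEPTH)
                   END-IF
               WHEN 'CONTENT-CHARACTERS'
      *> Keep text only when positioned at order/item/name
                   IF WS-DEPTH = 3
                      AND WS-NAME(2) = 'item'
                      AND WS-NAME(3) = 'name'
                       MOVE XML-TEXT TO WS-ITEM-NAME
                   END-IF
               WHEN 'END-OF-ELEMENT'
      *> Pop: leave the current level
                   IF WS-DEPTH > 0
                       SUBTRACT 1 FROM WS-DEPTH
                   END-IF
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

The depth counter is incremented even when the name does not fit in the table, so the matching END-OF-ELEMENT still pops the right level. Text can be delivered in more than one CONTENT-CHARACTERS event, so a production handler would likely append rather than overwrite.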

Special Registers: XML-EVENT, XML-TEXT, XML-CODE

XML-EVENT is set by the parser to a value that names the current event (e.g. 'START-OF-ELEMENT', 'CONTENT-CHARACTERS'). Your procedure should EVALUATE or IF on XML-EVENT and interpret the other registers according to that event. IBM Enterprise COBOL does not provide separate registers for element names, content, or attributes: XML-TEXT (XML-NTEXT for national documents) carries whichever text belongs to the event. It holds the element name for START-OF-ELEMENT and END-OF-ELEMENT, the character data for CONTENT-CHARACTERS, the attribute name for ATTRIBUTE-NAME, and the attribute value for ATTRIBUTE-CHARACTERS. XML-CODE holds the parser's status or exception code. Namespace information is in XML-NAMESPACE (the URI) and XML-NAMESPACE-PREFIX, with national counterparts XML-NNAMESPACE and XML-NNAMESPACE-PREFIX. Copy the values you need into your own fields before returning from the procedure, because the parser resets the special registers for the next event.
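As an illustration of copying register values before the next event replaces them, the following sketch captures attribute names and values. It assumes IBM Enterprise COBOL, where the attribute name arrives in XML-TEXT on an ATTRIBUTE-NAME event and the value arrives in XML-TEXT on the following ATTRIBUTE-CHARACTERS event. The program name, data names, and sample document are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLATTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-XML-DOC          PIC X(200).
       01  WS-ATTR-NAME        PIC X(30).
       01  WS-ATTR-VALUE       PIC X(50).

       PROCEDURE DIVISION.
      *> Hypothetical sample document with two attributes
           MOVE '<item id="A42" unit="each">Widget</item>'
             TO WS-XML-DOC
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE SHOW-ATTRIBUTES
               ON EXCEPTION
                   DISPLAY 'PARSE FAILED, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       SHOW-ATTRIBUTES.
           EVALUATE XML-EVENT
               WHEN 'ATTRIBUTE-NAME'
      *> Save the name; XML-TEXT changes on the next event
                   MOVE XML-TEXT TO WS-ATTR-NAME
               WHEN 'ATTRIBUTE-CHARACTERS'
      *> XML-TEXT now holds the value for the saved name
                   MOVE XML-TEXT TO WS-ATTR-VALUE
                   DISPLAY WS-ATTR-NAME '=' WS-ATTR-VALUE
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

The saved name is needed because the name and the value are delivered in separate events. An attribute value that contains character or entity references may be split across several events, which this sketch does not handle.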

Encoding: EBCDIC, ASCII, and Unicode

On z/OS, COBOL data is usually in EBCDIC. XML documents can be in EBCDIC, ASCII, or Unicode (UTF-8, UTF-16). The XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>) states the encoding; the parser may use it to interpret the bytes. If the XML arrives in ASCII and the program treats it as EBCDIC, the character codes will be wrong unless you convert. IBM documents a common approach based on two intrinsic functions: FUNCTION NATIONAL-OF converts alphanumeric data to Unicode (national, UTF-16) given the source code page, and FUNCTION DISPLAY-OF converts national data to a target code page. For ASCII input, convert to national with NATIONAL-OF and code page 819, then parse the national item or convert onward to EBCDIC with DISPLAY-OF. To produce ASCII from EBCDIC data, go the other way and finish with DISPLAY-OF and code page 819. For output (XML GENERATE), the ENCODING phrase sets the character set of the generated XML. Getting encoding wrong causes wrong characters or parser errors; always match the data in the buffer to what the parser expects.

Encoding considerations

| Encoding | Note |
| --- | --- |
| EBCDIC | Native on z/OS. XML in EBCDIC can be parsed directly if the document declares the correct encoding. |
| Unicode (UTF-8/UTF-16) | Supported via national data and conversion. Use NATIONAL-OF/DISPLAY-OF with the right code page. |
| ASCII | Not native; convert to Unicode or EBCDIC first. Code page 819 is often used for ASCII conversion. |
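The conversion route can be sketched with FUNCTION NATIONAL-OF and FUNCTION DISPLAY-OF. This is an illustration only, assuming IBM Enterprise COBOL: it first manufactures ASCII bytes from an EBCDIC literal (to stand in for data received from outside), then converts them to national (UTF-16) and parses the national item, reading text from XML-NTEXT. The program name, data names, and sample document are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLCONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> All three buffers hold 40 characters, so no padding
      *> is added in the wrong code page
       01  WS-EBCDIC-DOC       PIC X(40).
       01  WS-ASCII-DOC        PIC X(40).
       01  WS-NATL-DOC         PIC N(40) USAGE NATIONAL.

       PROCEDURE DIVISION.
      *> Hypothetical sample document
           MOVE '<name>Widget</name>' TO WS-EBCDIC-DOC
      *> Simulate ASCII input: EBCDIC to UTF-16, then CCSID 819
           MOVE FUNCTION DISPLAY-OF
                (FUNCTION NATIONAL-OF(WS-EBCDIC-DOC), 819)
             TO WS-ASCII-DOC
      *> ASCII (CCSID 819) to national (UTF-16) for parsing
           MOVE FUNCTION NATIONAL-OF(WS-ASCII-DOC, 819)
             TO WS-NATL-DOC
           XML PARSE WS-NATL-DOC
               PROCESSING PROCEDURE SHOW-CONTENT
               ON EXCEPTION
                   DISPLAY 'PARSE FAILED, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       SHOW-CONTENT.
      *> National documents deliver text in XML-NTEXT
           IF XML-EVENT = 'CONTENT-CHARACTERS'
               DISPLAY FUNCTION DISPLAY-OF(XML-NTEXT)
           END-IF.
```

Keeping the buffers the same length avoids a subtle problem: a MOVE into a longer alphanumeric field pads with EBCDIC spaces, which are not spaces in ASCII. With real input you would more likely track the document length and use reference modification.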

Exception Handling: ON EXCEPTION

XML PARSE can fail for several reasons: invalid XML (malformed tags, bad nesting), invalid characters for the encoding, or resource limits. Use ON EXCEPTION to run error-handling code and NOT ON EXCEPTION for the success path. When the exception path is taken, the XML-CODE special register contains a nonzero code that indicates the reason; the code values are listed in the IBM Enterprise COBOL documentation. In the exception block you should log the error and avoid using parsed data, because the document may be only partially processed. For production programs, always code ON EXCEPTION and decide whether to retry, use a default, or abend.
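A minimal sketch of the two paths, assuming IBM Enterprise COBOL and a deliberately malformed document (the closing tag for name is missing). The program name, data names, and the status flag are hypothetical; the parser also delivers an EXCEPTION event to the processing procedure, which the handler logs here.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLERR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-XML-DOC          PIC X(100).
       01  WS-CONTENT          PIC X(50) VALUE SPACES.
       01  WS-PARSE-STATUS     PIC X VALUE 'N'.
           88  WS-PARSE-OK     VALUE 'Y'.
           88  WS-PARSE-FAILED VALUE 'N'.

       PROCEDURE DIVISION.
      *> Malformed on purpose: <name> is never closed
           MOVE '<item><name>Widget</item>' TO WS-XML-DOC
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   SET WS-PARSE-FAILED TO TRUE
                   DISPLAY 'XML PARSE FAILED, XML-CODE=' XML-CODE
               NOT ON EXCEPTION
                   SET WS-PARSE-OK TO TRUE
           END-XML
      *> Use collected data only after a clean parse
           IF WS-PARSE-OK
               DISPLAY 'CONTENT: ' WS-CONTENT
           ELSE
               DISPLAY 'DISCARDING PARTIAL DATA'
           END-IF
           GOBACK.

       HANDLE-EVENT.
           EVALUATE XML-EVENT
               WHEN 'CONTENT-CHARACTERS'
                   MOVE XML-TEXT TO WS-CONTENT
               WHEN 'EXCEPTION'
                   DISPLAY 'EXCEPTION EVENT, XML-CODE=' XML-CODE
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

Note that WS-CONTENT may already hold 'Widget' by the time the error is detected, which is why the main line checks the flag instead of trusting whatever the handler collected.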

XML GENERATE: Producing XML from COBOL Data

XML GENERATE builds an XML document from a COBOL group item. You define a record structure (nested groups and elementary items); the compiler maps group and data names to XML element and attribute names. You move data into the group, then execute XML GENERATE receiving-item FROM group-item. The receiving item is an alphanumeric or national field that will hold the generated XML. You can control name mapping (e.g. replacing hyphens with underscores), suppressing low-values or spaces, and encoding. XML GENERATE is the inverse of parsing: good for producing XML for web services or file output when your data is already in a COBOL structure. The generated XML is well-formed but not necessarily pretty-printed; you can post-process if you need indentation.

```cobol
       01  ORDER-REC.
           05  ORDER-ID            PIC 9(6).
           05  CUSTOMER-NAME       PIC X(30).
           05  ITEM OCCURS 3.
               10  ITEM-NAME       PIC X(20).
               10  ITEM-QTY        PIC 9(4).
       01  WS-XML-OUT              PIC X(2000).
       01  WS-XML-LEN              PIC 9(9) BINARY.

      *> Give every item valid contents before generating
           INITIALIZE ORDER-REC
           MOVE 12345 TO ORDER-ID
           MOVE 'ACME Corp' TO CUSTOMER-NAME
           MOVE 'Widget' TO ITEM-NAME(1)
           MOVE 10 TO ITEM-QTY(1)
           XML GENERATE WS-XML-OUT FROM ORDER-REC
               COUNT IN WS-XML-LEN
               ON EXCEPTION
                   DISPLAY 'XML GENERATE ERROR'
           END-XML
```

COUNT IN gives the length of the generated XML. You can then write WS-XML-OUT(1:WS-XML-LEN) to a file or send it to a web service. Names in the group become element names; the hierarchy is preserved. Check your compiler manual for phrases that customize the output (e.g. WITH ATTRIBUTES, which renders eligible elementary items as attributes instead of child elements).

Namespaces

XML documents often use namespaces (prefix:localname). XML PARSE can be namespace-aware: for element and attribute events you get the local name in XML-TEXT, the namespace URI in XML-NAMESPACE, and the prefix in XML-NAMESPACE-PREFIX, so you can distinguish elements from different namespaces. When generating with XML GENERATE, the NAMESPACE and NAMESPACE-PREFIX phrases place the generated elements in a namespace. For complex namespace handling, refer to the IBM Enterprise COBOL documentation for these registers and phrases.
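A short sketch of reading namespace information during a parse, assuming a namespace-aware IBM Enterprise COBOL parser (the z/OS XML System Services parser; older compilers may need the XMLPARSE(XMLSS) option). The program name, data name, sample document, and the urn:demo URI are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLNS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-XML-DOC          PIC X(200).

       PROCEDURE DIVISION.
      *> Hypothetical document; prefix o is bound to urn:demo
           MOVE
             '<o:order xmlns:o="urn:demo"><o:id>7</o:id></o:order>'
             TO WS-XML-DOC
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE SHOW-NAMES
               ON EXCEPTION
                   DISPLAY 'PARSE FAILED, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       SHOW-NAMES.
           IF XML-EVENT = 'START-OF-ELEMENT'
      *> Local name, prefix, and URI come from three registers
               DISPLAY 'LOCAL NAME: ' XML-TEXT
               DISPLAY 'PREFIX:     ' XML-NAMESPACE-PREFIX
               DISPLAY 'URI:        ' XML-NAMESPACE
           END-IF.
```

Comparing XML-NAMESPACE (the URI) is generally safer than comparing the prefix, since two documents can bind different prefixes to the same namespace.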

Step-by-Step: Parsing an XML Document

  1. Load the XML into a COBOL field (e.g. by reading a file or receiving from a queue). Ensure the encoding matches what the parser expects; convert if necessary (e.g. ASCII to EBCDIC or to national).
  2. Write a processing procedure that EVALUATEs XML-EVENT. For START-OF-ELEMENT, save or check the element name in XML-TEXT. For CONTENT-CHARACTERS, copy XML-TEXT to your working fields and process according to the current element (tracked by your state). For END-OF-ELEMENT, update your state (e.g. pop a stack).
  3. Execute XML PARSE identifier PROCESSING PROCEDURE your-proc ON EXCEPTION perform error handling END-XML.
  4. After END-XML, if no exception, use the data you collected in working storage. If exception, do not rely on partial data; log and handle the error.

Step-by-Step: Generating XML with XML GENERATE

  1. Define a group item that mirrors the XML structure you want (nested groups for nested elements; elementary items for leaf content). Move all required values into the group.
  2. Provide a receiving field large enough for the generated XML (estimate from total size of data plus tag overhead, or use a large buffer and COUNT IN to get the actual length).
  3. Execute XML GENERATE receiving-field FROM group-item COUNT IN length-field ON EXCEPTION handle error END-XML.
  4. Use the content of the receiving field from 1 to length-field (from COUNT IN) for file output, CICS, or HTTP response.
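The four steps can be put together in one small program. This is a sketch under stated assumptions: IBM Enterprise COBOL, a hypothetical two-field record, and an arbitrary 500-byte receiving field. The optional WITH XML-DECLARATION phrase is included to show where such phrases go; it is not required.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLGEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Step 1: group item that mirrors the wanted XML
       01  ORDER-REC.
           05  ORDER-ID        PIC 9(6).
           05  CUSTOMER-NAME   PIC X(30).
      *> Step 2: receiving field and length field
       01  WS-XML-OUT          PIC X(500).
       01  WS-XML-LEN          PIC 9(9) BINARY.

       PROCEDURE DIVISION.
           INITIALIZE ORDER-REC
           MOVE 12345 TO ORDER-ID
           MOVE 'ACME Corp' TO CUSTOMER-NAME
      *> Step 3: generate, capturing the actual length
           XML GENERATE WS-XML-OUT FROM ORDER-REC
               COUNT IN WS-XML-LEN
               WITH XML-DECLARATION
               ON EXCEPTION
                   DISPLAY 'XML GENERATE FAILED, XML-CODE=' XML-CODE
               NOT ON EXCEPTION
      *> Step 4: use only the generated portion of the field
                   DISPLAY WS-XML-OUT(1:WS-XML-LEN)
           END-XML
           GOBACK.
```

If the receiving field is too small, the statement takes the ON EXCEPTION path, so sizing the buffer generously and relying on COUNT IN for the real length is usually the simpler choice.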

Best Practices

Test Your Knowledge

1. XML PARSE in COBOL uses:

  • DOM tree in memory
  • Event-driven parsing with a processing procedure
  • Only STRING and UNSTRING
  • JCL to call a utility

2. To get the element name during parsing you use:

  • XML-CODE
  • XML-TEXT (on a START-OF-ELEMENT event)
  • XML-FILE
  • READ FILE

3. For ASCII-encoded XML on a mainframe you typically:

  • Parse it directly
  • Convert to Unicode or EBCDIC before parsing
  • Use XML GENERATE only
  • Ignore encoding