COBOL XML processing lets you read and produce XML from within a COBOL program. IBM Enterprise COBOL provides XML PARSE for reading XML (event-driven) and XML GENERATE for writing XML from a group item. This page explains how XML PARSE works, the events it raises, how to handle encoding (EBCDIC, ASCII, Unicode), and how to use XML GENERATE and basic exception handling.
Imagine XML as a nested set of labeled boxes: each box has a name (like "customer" or "name") and can contain text or more boxes. XML processing in COBOL is the program opening those boxes one by one and doing something with the names and the text inside. The program does not hold the whole document in memory like a tree; instead, the parser tells it "here is the start of a box called customer," then "here is the text inside the name box," then "here is the end of the name box." The program reacts to each message and saves or uses the data it needs. Going the other way, XML GENERATE is like the program filling in a form (a COBOL structure) and having the compiler turn that form into XML text.
XML PARSE is a single statement that takes the XML input (a COBOL data item containing the document) and the name of a processing procedure. When you execute XML PARSE, the XML parser in the COBOL runtime scans the document and, for each significant part (document start/end, element start/end, text content, attributes), passes control to your procedure. Inside the procedure you test the special register XML-EVENT to see which event occurred, then read XML-TEXT (or XML-NTEXT when the document is in a national data item) to get the text that belongs to that event: the element name for element events, the character data for content events. This is similar to SAX-style parsing: low memory use and one pass through the document. You do not get a random-access tree; you handle each event in sequence and maintain your own state (e.g. which element you are in) if needed.
The form is: XML PARSE identifier PROCESSING PROCEDURE procedure-name. The identifier is the data item that holds the XML; it can be alphanumeric or national, but it cannot be a literal. You can add ON EXCEPTION and NOT ON EXCEPTION to handle parsing errors (e.g. invalid XML, encoding issues). With the ENCODING phrase you can specify a code page. The processing procedure must be in the same program and is invoked once per event; it must not itself execute XML PARSE (no nested parse from inside the procedure).
```cobol
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-XML-BUFFER    PIC X(10000).
       01  WS-ELEMENT-NAME  PIC X(50).
       01  WS-CONTENT       PIC X(500).

       PROCEDURE DIVISION.
      *> Sample document (illustrative); replace with your own XML.
           MOVE '<item>Widget</item>' TO WS-XML-BUFFER
           XML PARSE WS-XML-BUFFER
               PROCESSING PROCEDURE XML-HANDLER
               ON EXCEPTION
                   DISPLAY 'XML PARSE ERROR'
           END-XML
           GOBACK.

       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
      *>           XML-TEXT holds the element name for this event
                   MOVE XML-TEXT TO WS-ELEMENT-NAME
               WHEN 'CONTENT-CHARACTERS'
      *>           XML-TEXT holds the character data for this event
                   MOVE XML-TEXT TO WS-CONTENT
      *>           Process WS-CONTENT for current element
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```
The procedure name is specified without parentheses in PROCESSING PROCEDURE. The parser passes control to it for each event; when the procedure returns, the parser continues. Use EVALUATE XML-EVENT to branch on the event type. The meaning of XML-TEXT depends on the event: it holds the element name on START-OF-ELEMENT and END-OF-ELEMENT, and the character data on CONTENT-CHARACTERS, so read it only after you have checked XML-EVENT.
The parser generates a sequence of events. START-OF-DOCUMENT occurs once at the beginning; END-OF-DOCUMENT once at the end. For each element, you get START-OF-ELEMENT (when the opening tag is seen), zero or more CONTENT-CHARACTERS events for the text content, and END-OF-ELEMENT when the closing tag is seen. If the element has attributes, ATTRIBUTE-NAME and ATTRIBUTE-CHARACTERS events follow START-OF-ELEMENT, before any content. The order is deterministic: you see the document in the same order as in the file. To "remember" where you are (e.g. "I am inside order/item") you can maintain a stack or level counter in working storage: increment or push on START-OF-ELEMENT, decrement or pop on END-OF-ELEMENT, and process CONTENT-CHARACTERS only when you are at the right level or element name.
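The level-counter idea above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the program name, the sample document, and the WS- field names are all made up for the example, and it assumes IBM Enterprise COBOL with the XML-TEXT special register.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLDEPTH.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical sample document, exactly 47 characters long.
       01  WS-XML-DOC       PIC X(47) VALUE
           '<order><item><name>Widget</name></item></order>'.
      *> Nesting level: 0 outside the root, 1 inside <order>, etc.
       01  WS-DEPTH         PIC 9(4) COMP VALUE 0.
       01  WS-CURRENT-ELEM  PIC X(30) VALUE SPACES.
       01  WS-ITEM-NAME     PIC X(30) VALUE SPACES.

       PROCEDURE DIVISION.
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE TRACK-DEPTH
               ON EXCEPTION
                   DISPLAY 'XML PARSE ERROR, XML-CODE=' XML-CODE
           END-XML
           DISPLAY 'ITEM NAME: ' WS-ITEM-NAME
           GOBACK.

       TRACK-DEPTH.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
      *>           One level deeper; remember which element it is
                   ADD 1 TO WS-DEPTH
                   MOVE XML-TEXT TO WS-CURRENT-ELEM
               WHEN 'CONTENT-CHARACTERS'
      *>           Keep text only for order/item/name (level 3)
                   IF WS-DEPTH = 3 AND WS-CURRENT-ELEM = 'name'
                       MOVE XML-TEXT TO WS-ITEM-NAME
                   END-IF
               WHEN 'END-OF-ELEMENT'
      *>           Leaving the element; forget its name
                   SUBTRACT 1 FROM WS-DEPTH
                   MOVE SPACES TO WS-CURRENT-ELEM
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

A single counter is enough when the element name plus the depth identifies the data you want. If the same name can appear at the same depth under different parents, a table of element names indexed by depth (a simple stack) is the more robust choice.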
| Event | When | Typical use |
|---|---|---|
| START-OF-DOCUMENT | Before any element | Initialize state, open output, or set flags. |
| END-OF-DOCUMENT | After all content | Finalize, close output, or validate. |
| START-OF-ELEMENT | Opening tag encountered | XML-TEXT has the tag name; push context or start collecting attributes. |
| END-OF-ELEMENT | Closing tag encountered | Pop context, finish current element handling. |
| CONTENT-CHARACTERS | Text between tags | XML-TEXT has the text; copy to your fields or process. |
| ATTRIBUTE-NAME / ATTRIBUTE-CHARACTERS | Attribute in start tag | XML-TEXT has the attribute name, then the attribute value. |
XML-EVENT is set by the parser to a value that indicates the current event (e.g. 'START-OF-ELEMENT', 'CONTENT-CHARACTERS'). Your procedure should EVALUATE or IF on XML-EVENT and interpret the other registers according to that event. XML-TEXT contains the text for the event: the element name for start and end element events, the character data for CONTENT-CHARACTERS events, the attribute name for ATTRIBUTE-NAME, and the attribute value for ATTRIBUTE-CHARACTERS. When the document is in a national data item, the same information arrives in XML-NTEXT instead. XML-CODE carries the parser's status or exception code. Namespace information is available in XML-NAMESPACE and XML-NAMESPACE-PREFIX (and their national counterparts), depending on compiler version and parser options. Copy the values you need into your own fields if you will use them after returning from the procedure, because the parser overwrites the special registers on the next event.
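Because the attribute name and the attribute value arrive in separate events, the handler has to save the name before the value shows up. The sketch below illustrates that pattern; the program name, sample document, and WS-ATTR-NAME field are hypothetical, and IBM Enterprise COBOL is assumed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLATTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical sample document, exactly 38 characters long.
       01  WS-XML-DOC       PIC X(38) VALUE
           '<item sku="A100" qty="2">Widget</item>'.
       01  WS-ATTR-NAME     PIC X(30) VALUE SPACES.

       PROCEDURE DIVISION.
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE SHOW-ATTRS
               ON EXCEPTION
                   DISPLAY 'XML PARSE ERROR, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       SHOW-ATTRS.
           EVALUATE XML-EVENT
               WHEN 'ATTRIBUTE-NAME'
      *>           Save the name; the value comes in a later event
                   MOVE XML-TEXT TO WS-ATTR-NAME
               WHEN 'ATTRIBUTE-CHARACTERS'
      *>           XML-TEXT now holds the value for the saved name
                   DISPLAY 'ATTRIBUTE ' WS-ATTR-NAME
                   DISPLAY '    VALUE ' XML-TEXT
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

The same save-then-use approach applies to any pair of related events, since each register value is only valid until the parser raises the next event.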
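On z/OS, COBOL data is usually in EBCDIC. XML documents can be in EBCDIC, ASCII, or Unicode (UTF-8, UTF-16). The XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>) states the encoding, but whether the parser honors it depends on the compiler version and parser options, so the bytes in the buffer must match the encoding the parser actually assumes (typically the ENCODING phrase if present, otherwise the compiler's code page for alphanumeric items or UTF-16 for national items). If the XML arrives in ASCII and the program treats it as EBCDIC, the character codes are misread unless you convert first. IBM documents two intrinsic functions for this: NATIONAL-OF converts alphanumeric data in a given code page to Unicode (national), and DISPLAY-OF converts national data back to a given code page. To parse ASCII input, convert it with NATIONAL-OF and code page 819, then parse the resulting national item. To produce ASCII from EBCDIC data, convert to national with NATIONAL-OF and the source EBCDIC code page, then to ASCII with DISPLAY-OF and code page 819 (or the target encoding). For output (XML GENERATE), you can specify an encoding so the generated XML is in the desired character set. A mismatch produces wrong characters or parser errors; always match the data in the buffer to what the parser expects.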
| Encoding | Note |
|---|---|
| EBCDIC | Native on z/OS. XML in EBCDIC can be parsed directly if the document declares the correct encoding. |
| Unicode (UTF-8/UTF-16) | Supported via national data and conversion. Use NATIONAL-OF/DISPLAY-OF with the right code page. |
| ASCII | Not native; convert to Unicode or EBCDIC first. Code page 819 is often used for ASCII conversion. |
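As a rough sketch of the ASCII case, the program below converts an ASCII buffer to national with NATIONAL-OF and code page 819, then parses the national item. Everything named here is an assumption for illustration: the program name, the WS- fields, and the tiny ASCII document (written as a hex literal because the source itself is EBCDIC). Because the input is national, the handler reads XML-NTEXT rather than XML-TEXT.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLASCII.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> The 9 ASCII bytes of the document <a>hi</a>
       01  WS-ASCII-XML     PIC X(9)
                            VALUE X'3C613E68693C2F613E'.
      *> National (UTF-16) copy that XML PARSE reads
       01  WS-XML-NAT       PIC N(9).
       01  WS-NAT-TEXT      PIC N(20).

       PROCEDURE DIVISION.
      *>   Code page 819 identifies the source bytes as ASCII
           MOVE FUNCTION NATIONAL-OF(WS-ASCII-XML, 819)
             TO WS-XML-NAT
           XML PARSE WS-XML-NAT
               PROCESSING PROCEDURE SHOW-CONTENT
               ON EXCEPTION
                   DISPLAY 'XML PARSE ERROR, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       SHOW-CONTENT.
           IF XML-EVENT = 'CONTENT-CHARACTERS'
      *>       National input: the text arrives in XML-NTEXT
               MOVE XML-NTEXT TO WS-NAT-TEXT
      *>       Convert back to the program's code page to display
               DISPLAY 'CONTENT: '
                       FUNCTION DISPLAY-OF(WS-NAT-TEXT)
           END-IF.
```

In a real program the ASCII bytes would come from a file or a message rather than a literal, and the national buffer would need to be sized for the largest expected document.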
XML PARSE can fail for several reasons: invalid XML (malformed tags, bad nesting), characters that are invalid for the encoding, or resource limits. Use ON EXCEPTION to run error-handling code and NOT ON EXCEPTION for the success path. When an error occurs, the parser sets the special register XML-CODE to a nonzero exception code that indicates the reason, and the processing procedure receives an EXCEPTION event before control reaches ON EXCEPTION. In the exception block, log the error and do not trust the parsed data, because the document may have been only partially processed. For production programs, always code ON EXCEPTION and decide whether to retry, use a default, or abend.
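The following sketch shows both exception phrases together with XML-CODE, using a deliberately malformed document. The program name, the status flag, and the choice of return code 16 are illustrative assumptions, not a prescribed convention.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLERR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Deliberately malformed: <b> is closed by </a>
       01  WS-BAD-XML       PIC X(14) VALUE '<a><b>oops</a>'.
       01  WS-PARSE-STATUS  PIC X VALUE 'N'.
           88  PARSE-OK         VALUE 'Y'.
           88  PARSE-FAILED     VALUE 'N'.

       PROCEDURE DIVISION.
           XML PARSE WS-BAD-XML
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
      *>           XML-CODE holds the parser's exception code
                   SET PARSE-FAILED TO TRUE
                   DISPLAY 'PARSE FAILED, XML-CODE=' XML-CODE
               NOT ON EXCEPTION
                   SET PARSE-OK TO TRUE
           END-XML
           IF PARSE-FAILED
      *>       Hypothetical convention: signal failure to the caller
               MOVE 16 TO RETURN-CODE
           END-IF
           GOBACK.

       HANDLE-EVENT.
      *>   The procedure also sees the failure as an event
           IF XML-EVENT = 'EXCEPTION'
               DISPLAY 'EXCEPTION EVENT RECEIVED'
           END-IF.
```

Recording the outcome in a flag keeps the decision about what to do next (retry, default, abend) out of the XML PARSE statement itself, which tends to make the error path easier to test.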
XML GENERATE builds an XML document from a COBOL group item. You define a record structure (nested groups and elementary items); by default, group and data names become XML element names. You move data into the group, then execute XML GENERATE receiving-item FROM group-item. The receiving item is an alphanumeric or national field that will hold the generated XML. Depending on the compiler version, additional phrases let you control name mapping (NAME), choose whether a field becomes an element or an attribute (TYPE), suppress fields that contain spaces, zeros, or low-values (SUPPRESS), and select the encoding (ENCODING). XML GENERATE is the inverse of parsing: it suits producing XML for web services or file output when your data is already in a COBOL structure. The generated XML is well-formed but not pretty-printed; post-process it if you need indentation.
```cobol
       01  ORDER-REC.
           05  ORDER-ID        PIC 9(6).
           05  CUSTOMER-NAME   PIC X(30).
           05  ITEM OCCURS 3.
               10  ITEM-NAME   PIC X(20).
               10  ITEM-QTY    PIC 9(4).
       01  WS-XML-OUT          PIC X(2000).
      *> Receives the length of the generated XML
       01  WS-XML-LEN          PIC 9(8) COMP.

      *> Clear all fields so unused ITEM entries hold valid data
           INITIALIZE ORDER-REC
           MOVE 12345 TO ORDER-ID
           MOVE 'ACME Corp' TO CUSTOMER-NAME
           MOVE 'Widget' TO ITEM-NAME(1)
           MOVE 10 TO ITEM-QTY(1)
           XML GENERATE WS-XML-OUT FROM ORDER-REC
               COUNT IN WS-XML-LEN
               ON EXCEPTION
                   DISPLAY 'XML GENERATE ERROR'
           END-XML
```
COUNT IN gives the length of the generated XML. You can then write WS-XML-OUT (1:WS-XML-LEN) to a file or send it to a web service. Names in the group become element names; the hierarchy is preserved. Check your compiler manual for the phrases that customize the output (e.g. TYPE ... IS ATTRIBUTE for fields that should become attributes instead of child elements, or NAME to supply a different tag name).
XML documents often use namespaces (prefix:localname). XML PARSE can be namespace-aware: alongside the local name you get the namespace URI and prefix in special registers (XML-NAMESPACE and XML-NAMESPACE-PREFIX), so you can distinguish elements from different namespaces. When generating with XML GENERATE, namespace support depends on the compiler version; where available, the NAMESPACE and NAMESPACE-PREFIX phrases set the namespace of the generated document. For complex namespace handling, refer to the IBM Enterprise COBOL documentation for these registers and phrases.
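As an illustration of name and attribute customization, the sketch below renames the generated tags and emits one field as an attribute. It assumes a compiler level that supports the NAME and TYPE phrases of XML GENERATE (Enterprise COBOL 5.1 or later, to the best of my knowledge); the record layout, program name, and tag names are invented for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLGENAT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CUST-REC.
           05  CUST-ID      PIC 9(6).
           05  CUST-NAME    PIC X(30).
       01  WS-XML-OUT       PIC X(500).
       01  WS-XML-LEN       PIC 9(8) COMP.

       PROCEDURE DIVISION.
           MOVE 12345 TO CUST-ID
           MOVE 'ACME Corp' TO CUST-NAME
           XML GENERATE WS-XML-OUT FROM CUST-REC
               COUNT IN WS-XML-LEN
      *>       Replace the default tag names (the data names)
               NAME OF CUST-REC  IS 'customer'
                       CUST-ID   IS 'id'
                       CUST-NAME IS 'name'
      *>       Emit CUST-ID as an attribute of its parent element
               TYPE OF CUST-ID IS ATTRIBUTE
               ON EXCEPTION
                   DISPLAY 'XML GENERATE ERROR, XML-CODE=' XML-CODE
               NOT ON EXCEPTION
      *>           Show only the generated portion of the buffer
                   DISPLAY WS-XML-OUT(1:WS-XML-LEN)
           END-XML
           GOBACK.
```

With these phrases CUST-ID should appear as an id attribute on the customer element rather than as a child element. Older compiler levels that lack NAME and TYPE would need the data names themselves to match the desired tags.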