The UNSTRING statement is COBOL's main tool for parsing delimited strings. It splits a single source field into multiple receiving fields at delimiter characters, which makes it the usual choice for processing delimited data files, parsing formatted input, and handling CSV (comma-separated values) data. Any COBOL programmer working with text processing, data import/export, or file parsing in mainframe environments needs to understand it.
Beyond basic splitting, UNSTRING supports multiple delimiters, pointer-based parsing for processing a string in segments, tallying to count the receiving fields processed, delimiter detection to identify which separator ended each field, and count tracking to record how many source characters belonged to each field. These features make UNSTRING suitable for extraction tasks that need precise control over parsing behavior.
The UNSTRING statement implements a sophisticated parsing mechanism that reads characters from a source string and distributes them into multiple receiving fields, stopping and moving to the next field whenever a delimiter is encountered. This process continues until all receiving fields are filled, the source string is exhausted, or an overflow condition occurs. The statement provides fine-grained control over the parsing process through various clauses that allow you to specify delimiters, control starting positions, track parsing progress, and handle exceptional conditions.
At its core, UNSTRING performs character-by-character processing of the source string, copying characters into the current receiving field until a delimiter is found. When a delimiter is encountered, UNSTRING stops copying to the current field, moves to the next receiving field, and continues parsing from the character after the delimiter. This mechanism enables precise extraction of data fields from formatted strings where fields are separated by known delimiter characters such as commas, spaces, tabs, or custom separators.
The architectural design of UNSTRING supports both simple and complex parsing scenarios. For simple cases, you can parse a comma-delimited string into a fixed number of fields with minimal syntax. For complex cases, you can use multiple delimiters, process strings in segments using pointers, track parsing statistics, and handle various edge cases through overflow detection and delimiter identification. This flexibility makes UNSTRING suitable for a wide range of data processing tasks from simple field extraction to complex data transformation pipelines.
UNSTRING is fundamental to implementing data parsing and extraction patterns in COBOL applications. These patterns are essential for processing external data formats, importing data from other systems, parsing user input, and transforming data between different formats. Common patterns include CSV parsing, fixed-format record parsing, log file processing, configuration file reading, and data validation through field extraction.
In enterprise environments, UNSTRING lets COBOL applications consume delimited text formats produced by other systems, such as CSV and pipe-delimited files. It splits purely on delimiters and has no notion of quoting, escaping, or nesting, so quoted CSV fields that contain commas, or structured formats such as JSON, need additional logic around it. Within those limits it is well suited to data integration, ETL (Extract, Transform, Load) operations, and system interoperability where data must be parsed and validated before processing.
The parsing patterns enabled by UNSTRING also support data validation and error detection. By extracting fields and examining their contents, applications can validate data formats, detect malformed records, identify missing fields, and ensure data quality before further processing. This validation capability is essential for maintaining data integrity in enterprise applications where incorrect data can have significant business impact.
The performance characteristics of UNSTRING are important considerations in high-volume data processing scenarios. UNSTRING performs character-by-character processing, which can be efficient for most use cases but may require optimization for very large strings or high-frequency parsing operations. Understanding these performance characteristics enables developers to make informed decisions about when to use UNSTRING versus alternative parsing approaches.
Optimization strategies for UNSTRING include minimizing the number of UNSTRING operations in tight loops, using appropriate receiving field sizes to avoid unnecessary padding, leveraging pointer-based parsing for processing large strings in segments, and structuring parsing logic to minimize overhead. For very high-performance scenarios, preprocessing data or using specialized parsing routines may be more efficient, but UNSTRING provides an excellent balance of functionality, readability, and performance for most enterprise applications.
Memory efficiency is also a consideration when using UNSTRING. Receiving fields should be sized to the expected data: too large wastes memory, while too small silently truncates. The COUNT IN clause reports how many source characters belonged to each field, so a count larger than the receiving field reveals truncation.
UNSTRING source-field
    [DELIMITED BY [ALL] {delimiter-1|literal-1}
        [OR [ALL] {delimiter-2|literal-2}] ...]
    INTO receiving-field-1 [DELIMITER IN delimiter-field-1]
                           [COUNT IN count-field-1]
         receiving-field-2 [DELIMITER IN delimiter-field-2]
                           [COUNT IN count-field-2]
         ...
    [WITH POINTER pointer-field]
    [TALLYING IN tally-field]
    [ON OVERFLOW imperative-statement-1]
    [NOT ON OVERFLOW imperative-statement-2]
[END-UNSTRING]
The UNSTRING statement parses the source-field and distributes its contents into multiple receiving fields based on specified delimiters.
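source-field: The data item to be parsed. It must be an alphanumeric data item; standard COBOL does not accept a literal as the source, so move a literal into a working-storage field first. UNSTRING reads from this field and splits its contents into the receiving fields.
DELIMITED BY: Specifies one or more delimiters that mark where to split the source string. Each delimiter is a literal or data item and may be longer than one character. When UNSTRING encounters a delimiter, it stops copying to the current receiving field and moves to the next one. The delimiter itself is not copied to any receiving field. If DELIMITED BY is omitted, the source is split according to the sizes of the receiving fields, and DELIMITER IN and COUNT IN cannot be used.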
ALL keyword: When used (e.g., DELIMITED BY ALL ","), consecutive occurrences of the delimiter are treated as a single delimiter. This prevents empty fields between consecutive delimiters, which is useful for handling multiple spaces or tabs.
Multiple delimiters: Use OR to specify multiple delimiters (e.g., DELIMITED BY "," OR " "). UNSTRING will split on any of the specified delimiters.
INTO: Lists the fields that will receive the parsed data. UNSTRING fills these fields in order, stopping at each delimiter and moving to the next field. If every receiving field has been processed and the source still contains unexamined characters, the overflow condition is raised and the ON OVERFLOW clause (if specified) is executed.
Each receiving field can optionally have DELIMITER IN (to store which delimiter was found) and COUNT IN (to store how many characters were copied).
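WITH POINTER: Specifies a numeric integer field that holds the starting position in the source string (1-based). Initialize it to the desired starting position before the UNSTRING. UNSTRING updates it as processing proceeds, so on completion it holds the position of the first unexamined character. This enables parsing a string in multiple passes. The field must be large enough to hold the source length plus 1.
TALLYING IN: Counts the receiving fields that UNSTRING acted upon. UNSTRING adds to the existing value rather than resetting it, so initialize the tally field to zero before the statement. A field that receives an empty value between two adjacent delimiters still counts. Comparing the tally with the expected number of fields helps validate a record.
DELIMITER IN: For each receiving field, stores the delimiter that ended that field. This is useful with multiple delimiters to know which one was encountered. The delimiter field should be alphanumeric and at least as long as the longest delimiter; with ALL, only one occurrence of the delimiter is stored. If the field ended at the end of the source rather than at a delimiter, the delimiter field is filled with spaces.
COUNT IN: For each receiving field, stores the number of source characters examined for that field, excluding the delimiter. Because this is the length of the data in the source rather than the number of characters that fit in the receiving field, a count larger than the receiving field's size indicates truncation. The count field should be a numeric integer.
ON OVERFLOW / NOT ON OVERFLOW: The overflow condition is raised in two cases: when all receiving fields have been processed but the source still contains unexamined characters, or when the pointer value is less than 1 or greater than the length of the source at the start of the statement. ON OVERFLOW executes in those cases; NOT ON OVERFLOW executes when parsing completes without overflow. These clauses enable proper error handling and data validation.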
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PARSE-EXAMPLE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-LINE            PIC X(100)
           VALUE 'JOHN,SMITH,123 MAIN ST,NEW YORK,NY,10001'.
       01  PARSED-FIELDS.
           05  FIRST-NAME      PIC X(15).
           05  LAST-NAME       PIC X(20).
           05  STREET          PIC X(30).
           05  CITY            PIC X(20).
           05  STATE           PIC X(2).
           05  ZIP-CODE        PIC X(10).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           DISPLAY 'Original: ' CSV-LINE

           UNSTRING CSV-LINE
               DELIMITED BY ','
               INTO FIRST-NAME
                    LAST-NAME
                    STREET
                    CITY
                    STATE
                    ZIP-CODE
           END-UNSTRING

           DISPLAY 'First Name: ' FIRST-NAME
           DISPLAY 'Last Name: ' LAST-NAME
           DISPLAY 'Street: ' STREET
           DISPLAY 'City: ' CITY
           DISPLAY 'State: ' STATE
           DISPLAY 'ZIP: ' ZIP-CODE
           STOP RUN.
This example parses a simple comma-delimited string into individual fields. Each comma marks where one field ends and the next begins.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MULTI-DELIM.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-LINE          PIC X(80)
           VALUE 'JOHN SMITH|123 MAIN ST|NEW YORK|NY|10001'.
       01  PARSED-DATA.
           05  NAME-FIELD      PIC X(30).
           05  ADDRESS-FIELD   PIC X(30).
           05  CITY-FIELD      PIC X(20).
           05  STATE-FIELD     PIC X(2).
           05  ZIP-FIELD       PIC X(10).
       01  DELIMITER-FOUND     PIC X(1).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Split on either a pipe or a comma
           UNSTRING INPUT-LINE
               DELIMITED BY '|' OR ','
               INTO NAME-FIELD DELIMITER IN DELIMITER-FOUND
                    ADDRESS-FIELD
                    CITY-FIELD
                    STATE-FIELD
                    ZIP-FIELD
           END-UNSTRING

           DISPLAY 'Name: ' NAME-FIELD
           DISPLAY 'Delimiter found: ' DELIMITER-FOUND
           DISPLAY 'Address: ' ADDRESS-FIELD
           STOP RUN.
This example uses multiple delimiters (pipe or comma) and captures which delimiter was found for the first field.
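DELIMITER IN can be attached to every receiving field, not only the first. The following is a minimal sketch under assumed, hypothetical names (DELIM-CHECK, PART-n, DELIM-n) and made-up sample data. It relies on the standard rule that a field ended by the end of the source, rather than by a delimiter, gets spaces in its delimiter field.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIM-CHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Sample data is exactly 14 characters, so the last
      *> field ends at the end of the source.
       01  INPUT-LINE          PIC X(14) VALUE 'KEY=VALUE;LAST'.
       01  PART-1              PIC X(8).
       01  PART-2              PIC X(8).
       01  PART-3              PIC X(8).
       01  DELIM-1             PIC X.
       01  DELIM-2             PIC X.
       01  DELIM-3             PIC X VALUE '?'.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING INPUT-LINE
               DELIMITED BY '=' OR ';'
               INTO PART-1 DELIMITER IN DELIM-1
                    PART-2 DELIMITER IN DELIM-2
                    PART-3 DELIMITER IN DELIM-3
           END-UNSTRING
      *> DELIM-1 is '=', DELIM-2 is ';'. PART-3 ended at the
      *> end of the source, so DELIM-3 is set to a space.
           DISPLAY PART-1 ' ended by [' DELIM-1 ']'
           DISPLAY PART-2 ' ended by [' DELIM-2 ']'
           DISPLAY PART-3 ' ended by [' DELIM-3 ']'
           STOP RUN.
```

Checking the delimiter field for a space is one way to tell that a field was terminated by the end of the record rather than by a separator.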
       IDENTIFICATION DIVISION.
       PROGRAM-ID. POINTER-EXAMPLE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  LONG-STRING         PIC X(200)
           VALUE
           'FIELD1,FIELD2,FIELD3,FIELD4,FIELD5,FIELD6,FIELD7,FIELD8'.
       01  FIELD-1             PIC X(20).
       01  FIELD-2             PIC X(20).
       01  PARSE-POINTER       PIC 9(3) VALUE 1.
       01  FIELD-COUNT         PIC 9(2) VALUE 0.

       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Parse first two fields
           UNSTRING LONG-STRING
               DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
               WITH POINTER PARSE-POINTER
               TALLYING IN FIELD-COUNT
           END-UNSTRING

           DISPLAY 'Field 1: ' FIELD-1
           DISPLAY 'Field 2: ' FIELD-2
           DISPLAY 'Pointer after first parse: ' PARSE-POINTER
           DISPLAY 'Fields parsed: ' FIELD-COUNT

      *> Continue parsing from where we left off.
      *> FIELD-COUNT is not reset, so it keeps accumulating.
           UNSTRING LONG-STRING
               DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
               WITH POINTER PARSE-POINTER
               TALLYING IN FIELD-COUNT
           END-UNSTRING

           DISPLAY 'Field 1 (second pass): ' FIELD-1
           DISPLAY 'Field 2 (second pass): ' FIELD-2
           DISPLAY 'Final pointer: ' PARSE-POINTER
           STOP RUN.
This example demonstrates parsing a long string in segments using the POINTER to resume from where the previous UNSTRING stopped.
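The same idea extends to a loop that pulls one field per pass until the pointer moves past the end of the source. This is a sketch rather than a definitive pattern; the program and field names (POINTER-LOOP, SOURCE-LINE, ONE-FIELD, FIELD-NUMBER) and the sample data are assumptions made for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. POINTER-LOOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SOURCE-LINE         PIC X(40)
           VALUE 'ALPHA,BETA,GAMMA,DELTA'.
       01  ONE-FIELD           PIC X(10).
       01  PARSE-POINTER       PIC 9(3) VALUE 1.
       01  FIELD-NUMBER        PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Extract one field per pass. ALL SPACE swallows the
      *> trailing padding, which pushes the pointer past the
      *> end of the source and ends the loop.
           PERFORM UNTIL PARSE-POINTER > FUNCTION LENGTH(SOURCE-LINE)
               MOVE SPACES TO ONE-FIELD
               UNSTRING SOURCE-LINE
                   DELIMITED BY ',' OR ALL SPACE
                   INTO ONE-FIELD
                   WITH POINTER PARSE-POINTER
               END-UNSTRING
               ADD 1 TO FIELD-NUMBER
               DISPLAY 'Field ' FIELD-NUMBER ': ' ONE-FIELD
           END-PERFORM
           STOP RUN.
```

With a single receiving field, every pass except the last raises the overflow condition because data remains in the source. No ON OVERFLOW phrase is coded here, so the condition is simply ignored and the pointer carries the state between passes.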
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVERFLOW-EXAMPLE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-DATA          PIC X(100)
           VALUE 'VALUE1,VALUE2,VALUE3,VALUE4,VALUE5,VALUE6'.
       01  PARSED-VALUES.
           05  VALUE-1         PIC X(15).
           05  VALUE-2         PIC X(15).
           05  VALUE-3         PIC X(15).
       01  PARSE-POINTER       PIC 9(3) VALUE 1.
       01  OVERFLOW-FLAG       PIC X(1) VALUE 'N'.
           88  HAS-OVERFLOW    VALUE 'Y'.

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING INPUT-DATA
               DELIMITED BY ','
               INTO VALUE-1
                    VALUE-2
                    VALUE-3
               WITH POINTER PARSE-POINTER
               ON OVERFLOW
                   MOVE 'Y' TO OVERFLOW-FLAG
                   DISPLAY 'WARNING: More data than receiving fields'
                   DISPLAY 'Processing stopped at position: '
                           PARSE-POINTER
               NOT ON OVERFLOW
                   DISPLAY 'All data parsed successfully'
           END-UNSTRING

           DISPLAY 'Value 1: ' VALUE-1
           DISPLAY 'Value 2: ' VALUE-2
           DISPLAY 'Value 3: ' VALUE-3

           IF HAS-OVERFLOW
               DISPLAY 'Remaining data starts at position: '
                       PARSE-POINTER
           END-IF
           STOP RUN.
This example demonstrates detecting and handling overflow when the source still contains data after all receiving fields have been filled.
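Overflow is also raised when the pointer is outside the source at the start of the statement. The sketch below assumes hypothetical names (POINTER-RANGE, SOURCE-LINE, COLOR-1) and deliberately starts the pointer at 0 to show that case; in standard COBOL no data is transferred when this happens.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. POINTER-RANGE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SOURCE-LINE         PIC X(20) VALUE 'RED,GREEN,BLUE'.
       01  COLOR-1             PIC X(10) VALUE 'UNCHANGED'.
       01  PARSE-POINTER       PIC 9(3) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> A pointer of 0 is outside the source (valid: 1 to 20),
      *> so no data is transferred and ON OVERFLOW runs.
           UNSTRING SOURCE-LINE
               DELIMITED BY ','
               INTO COLOR-1
               WITH POINTER PARSE-POINTER
               ON OVERFLOW
                   DISPLAY 'Pointer out of range: ' PARSE-POINTER
               NOT ON OVERFLOW
                   DISPLAY 'Parsed: ' COLOR-1
           END-UNSTRING
           DISPLAY 'COLOR-1 is still: ' COLOR-1
           STOP RUN.
```

Because ON OVERFLOW covers both cases, code that needs to tell them apart can test the pointer value inside the ON OVERFLOW phrase.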
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COUNT-EXAMPLE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-RECORD        PIC X(80) VALUE 'JOHN,SMITH,12345'.
       01  PARSED-FIELDS.
           05  FIRST-NAME      PIC X(20).
           05  LAST-NAME       PIC X(20).
           05  ID-NUMBER       PIC X(10).
       01  FIELD-COUNTS.
           05  NAME-COUNT      PIC 9(2).
           05  LAST-COUNT      PIC 9(2).
           05  ID-COUNT        PIC 9(2).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> ALL SPACE ends the last field at the trailing padding.
      *> Without it, ID-COUNT would include every trailing
      *> space up to the end of the 80-character record.
           UNSTRING INPUT-RECORD
               DELIMITED BY ',' OR ALL SPACE
               INTO FIRST-NAME COUNT IN NAME-COUNT
                    LAST-NAME  COUNT IN LAST-COUNT
                    ID-NUMBER  COUNT IN ID-COUNT
           END-UNSTRING

           DISPLAY 'First Name: ' FIRST-NAME
                   ' (Length: ' NAME-COUNT ')'
           DISPLAY 'Last Name: ' LAST-NAME
                   ' (Length: ' LAST-COUNT ')'
           DISPLAY 'ID Number: ' ID-NUMBER
                   ' (Length: ' ID-COUNT ')'

      *> Validate field lengths
           IF NAME-COUNT < 2
               DISPLAY 'ERROR: First name too short'
           END-IF

           IF ID-COUNT NOT = 5
               DISPLAY 'ERROR: ID number must be 5 digits'
           END-IF
           STOP RUN.
COUNT IN helps validate that fields contain the expected amount of data, which is crucial for data quality checks.
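COUNT IN can also flag truncation, because it reports the characters examined in the source rather than the characters that fit in the receiving field. A minimal sketch, assuming hypothetical names (TRUNC-CHECK, SHORT-NAME, SHORT-NAME-COUNT, OTHER-NAME) and a receiving field that is deliberately too small for the sample data:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRUNC-CHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-RECORD        PIC X(40)
           VALUE 'CHRISTOPHER,LEE'.
       01  SHORT-NAME          PIC X(8).
       01  SHORT-NAME-COUNT    PIC 9(2).
       01  OTHER-NAME          PIC X(8).
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING INPUT-RECORD
               DELIMITED BY ',' OR ALL SPACE
               INTO SHORT-NAME COUNT IN SHORT-NAME-COUNT
                    OTHER-NAME
           END-UNSTRING
      *> COUNT IN holds the characters examined in the source
      *> (11 for CHRISTOPHER), not the 8 that fit in the field.
           IF SHORT-NAME-COUNT > FUNCTION LENGTH(SHORT-NAME)
               DISPLAY 'Truncated: ' SHORT-NAME
                       ' (source had ' SHORT-NAME-COUNT ' chars)'
           END-IF
           STOP RUN.
```

Comparing the count against the field length avoids hard-coding the size, so the check keeps working if the PICTURE clause changes.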
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALL-SPACE-EXAMPLE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Single-word values separated by runs of spaces
       01  SPACE-DELIMITED     PIC X(100)
           VALUE 'JOHN    SMITH     BROADWAY   BOSTON'.
       01  PARSED-FIELDS.
           05  NAME-PART       PIC X(20).
           05  LAST-PART       PIC X(20).
           05  STREET-PART     PIC X(30).
           05  CITY-PART       PIC X(20).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Without ALL, each extra space would create an empty field
      *> With ALL, a run of spaces is treated as one delimiter
           UNSTRING SPACE-DELIMITED
               DELIMITED BY ALL SPACE
               INTO NAME-PART
                    LAST-PART
                    STREET-PART
                    CITY-PART
           END-UNSTRING

           DISPLAY 'Name: ' NAME-PART
           DISPLAY 'Last: ' LAST-PART
           DISPLAY 'Street: ' STREET-PART
           DISPLAY 'City: ' CITY-PART
           STOP RUN.
The ALL keyword prevents empty fields when there are multiple consecutive delimiters, which is common with space-delimited data. Note that every run of spaces is a delimiter, so a value that itself contains a space (such as a multi-word street name) is split into separate fields; such data needs a different separator.
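To make the difference concrete, the sketch below runs the same input through UNSTRING with and without ALL on the comma. The names (EMPTY-FIELDS, PART-n, COUNT-2) and the sample value 'A,,B' are assumptions chosen for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EMPTY-FIELDS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-LINE          PIC X(20) VALUE 'A,,B'.
       01  PART-1              PIC X(5).
       01  PART-2              PIC X(5).
       01  PART-3              PIC X(5).
       01  COUNT-2             PIC 9(2).
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Without ALL: each comma ends a field, so PART-2 is
      *> empty (COUNT-2 = 0) and B lands in PART-3.
           UNSTRING INPUT-LINE
               DELIMITED BY ',' OR ALL SPACE
               INTO PART-1
                    PART-2 COUNT IN COUNT-2
                    PART-3
           END-UNSTRING
           DISPLAY 'Without ALL: [' PART-1 '] [' PART-2 '] ['
                   PART-3 '] count-2=' COUNT-2

      *> With ALL: the two commas act as one delimiter, so B
      *> lands in PART-2 and PART-3 is never filled.
           MOVE SPACES TO PART-1 PART-2 PART-3
           UNSTRING INPUT-LINE
               DELIMITED BY ALL ',' OR ALL SPACE
               INTO PART-1
                    PART-2 COUNT IN COUNT-2
                    PART-3
           END-UNSTRING
           DISPLAY 'With ALL:    [' PART-1 '] [' PART-2 '] ['
                   PART-3 '] count-2=' COUNT-2
           STOP RUN.
```

For CSV-style data, where an empty value between two commas is meaningful, leaving ALL off the comma is usually the safer choice; ALL fits padding characters such as spaces.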
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TALLY-EXAMPLE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  VARIABLE-RECORD     PIC X(200).
       01  PARSED-FIELDS.
           05  FIELD-1         PIC X(30).
           05  FIELD-2         PIC X(30).
           05  FIELD-3         PIC X(30).
           05  FIELD-4         PIC X(30).
           05  FIELD-5         PIC X(30).
       01  FIELDS-PARSED       PIC 9(2) VALUE 0.
       01  EXPECTED-FIELDS     PIC 9(2) VALUE 3.

       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Example with variable number of fields
           MOVE 'VALUE1,VALUE2,VALUE3' TO VARIABLE-RECORD

           UNSTRING VARIABLE-RECORD
               DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
                    FIELD-4
                    FIELD-5
               TALLYING IN FIELDS-PARSED
           END-UNSTRING

           DISPLAY 'Fields parsed: ' FIELDS-PARSED
           DISPLAY 'Expected: ' EXPECTED-FIELDS

           IF FIELDS-PARSED NOT = EXPECTED-FIELDS
               DISPLAY 'WARNING: Unexpected number of fields'
           END-IF

           DISPLAY 'Field 1: ' FIELD-1
           DISPLAY 'Field 2: ' FIELD-2
           DISPLAY 'Field 3: ' FIELD-3
           STOP RUN.
TALLYING helps validate that the expected number of fields were parsed, which is important for data validation and error detection.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSV-PARSER.
      *> Comprehensive CSV parsing example

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-RECORD              PIC X(500).
      *> ADDRESS is a COBOL reserved word, so the street
      *> field is named STREET-ADDRESS.
       01  PARSED-CUSTOMER.
           05  CUSTOMER-ID         PIC 9(8).
           05  CUSTOMER-NAME       PIC X(50).
           05  EMAIL               PIC X(50).
           05  PHONE               PIC X(20).
           05  STREET-ADDRESS      PIC X(100).
           05  CITY                PIC X(30).
           05  STATE               PIC X(2).
           05  ZIP-CODE            PIC X(10).

       01  PARSING-CONTROLS.
           05  PARSE-POINTER       PIC 9(4) VALUE 1.
           05  FIELDS-COUNT        PIC 9(2) VALUE 0.
           05  FIELD-LENGTHS.
               10  ID-LENGTH       PIC 9(2).
               10  NAME-LENGTH     PIC 9(2).
               10  EMAIL-LENGTH    PIC 9(2).

       01  VALIDATION-FLAGS.
           05  VALID-RECORD        PIC X(1).
               88  IS-VALID        VALUE 'Y'.
               88  IS-INVALID      VALUE 'N'.

       01  ERROR-MESSAGE           PIC X(100).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *> Build the example CSV record from two literals
           MOVE SPACES TO CSV-RECORD
           STRING '12345678,John Smith,john@email.com,555-1234,'
                  '123 Main St,New York,NY,10001'
                  DELIMITED BY SIZE
               INTO CSV-RECORD
           END-STRING

           PERFORM PARSE-CSV-RECORD
           PERFORM VALIDATE-PARSED-DATA
           PERFORM DISPLAY-RESULTS
           STOP RUN.

       PARSE-CSV-RECORD.
           MOVE 1 TO PARSE-POINTER
           MOVE 0 TO FIELDS-COUNT
           MOVE 'N' TO VALID-RECORD
           MOVE SPACES TO ERROR-MESSAGE

           UNSTRING CSV-RECORD
               DELIMITED BY ','
               INTO CUSTOMER-ID   COUNT IN ID-LENGTH
                    CUSTOMER-NAME COUNT IN NAME-LENGTH
                    EMAIL         COUNT IN EMAIL-LENGTH
                    PHONE
                    STREET-ADDRESS
                    CITY
                    STATE
                    ZIP-CODE
               WITH POINTER PARSE-POINTER
               TALLYING IN FIELDS-COUNT
               ON OVERFLOW
                   MOVE 'WARNING: Extra fields in record'
                       TO ERROR-MESSAGE
               NOT ON OVERFLOW
                   IF FIELDS-COUNT = 8
                       MOVE 'Y' TO VALID-RECORD
                   ELSE
                       MOVE 'ERROR: Missing required fields'
                           TO ERROR-MESSAGE
                   END-IF
           END-UNSTRING.

       VALIDATE-PARSED-DATA.
           IF IS-VALID
      *> Validate individual fields
               IF CUSTOMER-ID = ZERO
                   MOVE 'ERROR: Invalid customer ID'
                       TO ERROR-MESSAGE
                   MOVE 'N' TO VALID-RECORD
               END-IF

               IF NAME-LENGTH < 3
                   MOVE 'ERROR: Name too short'
                       TO ERROR-MESSAGE
                   MOVE 'N' TO VALID-RECORD
               END-IF

               IF EMAIL-LENGTH < 5
                   MOVE 'ERROR: Invalid email'
                       TO ERROR-MESSAGE
                   MOVE 'N' TO VALID-RECORD
               END-IF
           END-IF.

       DISPLAY-RESULTS.
           DISPLAY '=== CSV Parsing Results ==='
           DISPLAY 'Fields Parsed: ' FIELDS-COUNT
           DISPLAY 'Valid Record: ' VALID-RECORD

           IF IS-VALID
               DISPLAY 'Customer ID: ' CUSTOMER-ID
               DISPLAY 'Name: ' CUSTOMER-NAME
               DISPLAY 'Email: ' EMAIL
               DISPLAY 'Phone: ' PHONE
               DISPLAY 'Address: ' STREET-ADDRESS
               DISPLAY 'City: ' CITY
               DISPLAY 'State: ' STATE
               DISPLAY 'ZIP: ' ZIP-CODE
           ELSE
               DISPLAY 'Error: ' ERROR-MESSAGE
           END-IF.
This comprehensive example demonstrates parsing CSV data with validation, field counting, length tracking, and error handling - all essential for production CSV processing.
1. What does the UNSTRING statement do?
2. What does the POINTER in UNSTRING indicate?
3. What happens when UNSTRING has more delimiters than receiving fields?
4. What does TALLYING IN count?
5. What does DELIMITER IN store?
6. How do you specify multiple delimiters in UNSTRING?
7. What is the ALL keyword used for in UNSTRING?
8. What does COUNT IN store?