What is the UNSTRING statement in COBOL?

The UNSTRING statement in COBOL is used to parse (split) a source string into multiple receiving fields based on specified delimiters. It's the opposite of the STRING statement - while STRING combines multiple strings into one, UNSTRING breaks one string into multiple parts. UNSTRING is essential for parsing formatted input data like comma-separated values (CSV), parsing dates, extracting specific fields from formatted records, and processing delimited text files. It provides flexible parsing capabilities with support for multiple delimiters, pointer control, and overflow handling.

How does UNSTRING work with delimiters?

UNSTRING splits the source string wherever it finds a delimiter, which may be a single character or a longer string. You specify one or more delimiters using the DELIMITED BY clause. When UNSTRING finds a delimiter, it stops copying to the current receiving field and moves to the next one. You can specify multiple delimiters using OR (e.g., DELIMITED BY "," OR " "), and UNSTRING will split on any of them; they are tested in the order written. Without ALL, two adjacent delimiters produce an empty field between them (spaces in an alphanumeric receiving field, zeros in a numeric one). The ALL keyword makes UNSTRING treat consecutive occurrences of a delimiter as a single delimiter, which is useful for handling multiple spaces or tabs. The delimiter itself is not copied to the receiving field; it only marks where to split.

What is the POINTER in UNSTRING?

The POINTER in UNSTRING (specified with WITH POINTER) indicates the position in the source string where parsing should begin. It's an integer field that you must initialize before the UNSTRING statement, and it must be large enough to hold a value one greater than the length of the source. As UNSTRING processes the source string, it automatically updates the POINTER. This allows you to parse the string in multiple passes, resuming from where the previous UNSTRING left off. The POINTER is 1-based (first character is position 1). After UNSTRING completes, the POINTER holds the position of the next unexamined character, which is useful for continuing parsing or detecting where processing ended. If its initial value is less than 1 or greater than the length of the source, an overflow condition is raised.

What does TALLYING do in UNSTRING?

The TALLYING IN clause in UNSTRING counts how many receiving fields UNSTRING acted upon. This is useful when the source string holds fewer fields than the INTO list, or when you want to know how many fields were populated. UNSTRING adds to the tally field's existing value rather than resetting it, so set it to zero first. A field that is left empty because two delimiters were adjacent still counts; a receiving field that is never reached (because the source string was exhausted first) does not. This tells you how many fields were parsed, which is important for error handling and data validation.

What happens when UNSTRING runs out of receiving fields?

When every receiving field has been filled and the source string still contains unexamined characters, an overflow condition exists and the ON OVERFLOW clause (if specified) is executed. The same condition is raised when the POINTER starts outside the source string. Without ON OVERFLOW, UNSTRING simply stops processing when all receiving fields are filled, and any remaining data in the source string is ignored. If you specified WITH POINTER, it indicates where processing stopped, so you can check whether data remains. It's important to handle overflow conditions to ensure data isn't lost and to detect malformed input.

What is DELIMITER IN in UNSTRING?

The DELIMITER IN clause (e.g., DELIMITER IN delimiter-field) stores the delimiter that ended each receiving field. This is useful when you're using multiple delimiters and want to know which one was encountered for each field. For example, if you're parsing a string with both commas and spaces as delimiters, DELIMITER IN tells you whether a comma or a space ended each field. If a field ended because the source string ran out rather than at a delimiter, the delimiter field is filled with spaces. DELIMITER IN can be used only when DELIMITED BY is specified. This information can be important for data validation, logging, or further processing decisions.

What is COUNT IN in UNSTRING?

The COUNT IN clause (e.g., COUNT IN count-field) stores the number of source characters that were examined for each receiving field, not counting the delimiter. When the data fits, this is the length of the data placed in the field, which is useful when receiving fields are larger than the data being copied. When the count is larger than the receiving field, the data was truncated. COUNT IN helps with data validation and with handling variable-length data, and like DELIMITER IN it requires DELIMITED BY.

How do you parse CSV data with UNSTRING?

To parse CSV (comma-separated values) data with UNSTRING, use a comma as the delimiter: UNSTRING csv-line DELIMITED BY "," INTO field1 field2 field3 ... END-UNSTRING. UNSTRING has no notion of quoting, so CSV files with quoted fields or embedded commas need additional processing before or after the split. Use ON OVERFLOW to detect records with more fields than expected, and TALLYING to count how many fields were parsed. For CSV with a space after each comma, avoid DELIMITED BY "," OR " ": it treats the comma and the space as two separate delimiters and produces an empty field between them. Use the two-character delimiter first, as in DELIMITED BY ", " OR ",", or trim leading spaces after parsing.

Can UNSTRING handle multiple delimiters?

Yes, UNSTRING can handle multiple delimiters using the OR keyword. For example: UNSTRING source DELIMITED BY "," OR " " OR "|" INTO field1 field2 field3. UNSTRING will split on any of the specified delimiters. The DELIMITER IN clause can tell you which specific delimiter was found for each field. The ALL keyword can be used with any delimiter to treat consecutive occurrences as a single delimiter, which is useful for handling multiple spaces or tabs as a single separator.

What is the difference between UNSTRING and STRING?

UNSTRING and STRING are opposite operations. STRING combines multiple source strings into one receiving field, while UNSTRING splits one source string into multiple receiving fields. STRING is used for building formatted output (like combining name parts into a full name), while UNSTRING is used for parsing formatted input (like splitting a full name into parts). STRING uses DELIMITED BY to determine when to stop copying from each source, while UNSTRING uses DELIMITED BY to determine where to split the source. Both support POINTER, but STRING uses it to indicate where to start placing data, while UNSTRING uses it to indicate where to start reading data.

MainframeMaster

COBOL Tutorial

COBOL UNSTRING Statement

The UNSTRING statement is one of COBOL's most powerful and essential tools for parsing and extracting data from formatted strings. It enables you to split a single source string into multiple receiving fields based on delimiter characters, making it indispensable for processing delimited data files, parsing formatted input, extracting specific fields from complex strings, and handling CSV (comma-separated values) data. Understanding UNSTRING is crucial for any COBOL programmer working with text processing, data import/export, or file parsing tasks in mainframe environments.

Beyond basic string splitting, UNSTRING provides sophisticated parsing capabilities including support for multiple delimiters, pointer-based parsing for processing strings in segments, tallying to count parsed fields, delimiter detection to identify which separator was found, and count tracking to know how many source characters were examined for each field. These advanced features make UNSTRING suitable for complex data extraction scenarios where precise control over parsing behavior is required.

Understanding UNSTRING Statement Architecture

The UNSTRING statement implements a sophisticated parsing mechanism that reads characters from a source string and distributes them into multiple receiving fields, stopping and moving to the next field whenever a delimiter is encountered. This process continues until all receiving fields are filled, the source string is exhausted, or an overflow condition occurs. The statement provides fine-grained control over the parsing process through various clauses that allow you to specify delimiters, control starting positions, track parsing progress, and handle exceptional conditions.

At its core, UNSTRING performs character-by-character processing of the source string, copying characters into the current receiving field until a delimiter is found. When a delimiter is encountered, UNSTRING stops copying to the current field, moves to the next receiving field, and continues parsing from the character after the delimiter. This mechanism enables precise extraction of data fields from formatted strings where fields are separated by known delimiter characters such as commas, spaces, tabs, or custom separators.

The architectural design of UNSTRING supports both simple and complex parsing scenarios. For simple cases, you can parse a comma-delimited string into a fixed number of fields with minimal syntax. For complex cases, you can use multiple delimiters, process strings in segments using pointers, track parsing statistics, and handle various edge cases through overflow detection and delimiter identification. This flexibility makes UNSTRING suitable for a wide range of data processing tasks from simple field extraction to complex data transformation pipelines.

Comprehensive UNSTRING Statement Capabilities:

Delimiter-Based Parsing: Split strings based on one or more delimiter characters, with support for treating consecutive delimiters as a single separator using the ALL keyword.
Pointer Control: Specify starting positions and track parsing progress through the WITH POINTER clause, enabling segment-based processing of long strings.
Field Counting: Use TALLYING IN to count how many receiving fields were acted upon, essential for validating parsed data.
Delimiter Detection: Identify which specific delimiter was found for each field using DELIMITER IN, useful when multiple delimiters are specified.
Character Counting: Track how many source characters were examined for each field using COUNT IN, important for handling variable-length data and spotting truncation.
Overflow Handling: Detect and handle cases where the source string contains more data than can fit in the receiving fields through ON OVERFLOW clauses.
Flexible Delimiter Specification: Use literals, data items, or combinations of multiple delimiters to handle various data formats.
Automatic Pointer Management: UNSTRING automatically updates the pointer as it processes, enabling continuation of parsing in subsequent operations.

Data Parsing and Extraction Patterns

UNSTRING is fundamental to implementing data parsing and extraction patterns in COBOL applications. These patterns are essential for processing external data formats, importing data from other systems, parsing user input, and transforming data between different formats. Common patterns include CSV parsing, fixed-format record parsing, log file processing, configuration file reading, and data validation through field extraction.

In enterprise environments, UNSTRING enables COBOL applications to consume text formats produced by other systems. CSV files, pipe-delimited files, and other flat, delimiter-separated formats can be processed using UNSTRING to extract individual fields; nested formats such as JSON are beyond what delimiter splitting can handle on its own. This capability is crucial for data integration, ETL (Extract, Transform, Load) operations, and system interoperability where data must be parsed and validated before processing.

The parsing patterns enabled by UNSTRING also support data validation and error detection. By extracting fields and examining their contents, applications can validate data formats, detect malformed records, identify missing fields, and ensure data quality before further processing. This validation capability is essential for maintaining data integrity in enterprise applications where incorrect data can have significant business impact.

Performance and Efficiency Considerations

The performance characteristics of UNSTRING are important considerations in high-volume data processing scenarios. UNSTRING performs character-by-character processing, which can be efficient for most use cases but may require optimization for very large strings or high-frequency parsing operations. Understanding these performance characteristics enables developers to make informed decisions about when to use UNSTRING versus alternative parsing approaches.

Optimization strategies for UNSTRING include minimizing the number of UNSTRING operations in tight loops, using appropriate receiving field sizes to avoid unnecessary padding, leveraging pointer-based parsing for processing large strings in segments, and structuring parsing logic to minimize overhead. For very high-performance scenarios, preprocessing data or using specialized parsing routines may be more efficient, but UNSTRING provides an excellent balance of functionality, readability, and performance for most enterprise applications.

Memory efficiency is also a consideration when using UNSTRING. Receiving fields should be sized appropriately - too large wastes memory, while too small may truncate data. The COUNT IN clause helps identify truncation issues, and proper field sizing based on expected data characteristics optimizes both memory usage and processing efficiency.

Basic UNSTRING Syntax and Structure

Complete UNSTRING Syntax

cobol

UNSTRING source-field
    [DELIMITED BY [ALL] {delimiter-1|literal-1} 
         [OR [ALL] {delimiter-2|literal-2}] ...]
    INTO receiving-field-1 [DELIMITER IN delimiter-field-1]
                           [COUNT IN count-field-1]
         receiving-field-2 [DELIMITER IN delimiter-field-2]
                           [COUNT IN count-field-2]
         ...
    [WITH POINTER pointer-field]
    [TALLYING IN tally-field]
    [ON OVERFLOW imperative-statement-1]
    [NOT ON OVERFLOW imperative-statement-2]
    [END-UNSTRING]

The UNSTRING statement parses the source-field and distributes its contents into multiple receiving fields based on specified delimiters.

Syntax Components Explained

source-field

The string to be parsed. This must be a data item (typically alphanumeric); a literal is not allowed as the source. UNSTRING reads from this field and splits it into the receiving fields; the source itself is not modified.

DELIMITED BY

Specifies one or more delimiter characters that mark where to split the source string. When UNSTRING encounters a delimiter, it stops copying to the current receiving field and moves to the next one. The delimiter itself is not copied to any receiving field.

ALL keyword: When used (e.g., DELIMITED BY ALL ","), consecutive occurrences of the delimiter are treated as a single delimiter. This prevents empty fields between consecutive delimiters, which is useful for handling multiple spaces or tabs.

Multiple delimiters: Use OR to specify multiple delimiters (e.g., DELIMITED BY "," OR " "). UNSTRING will split on any of the specified delimiters.
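As a minimal sketch of why delimiter order matters, the program below splits a list whose commas are only sometimes followed by a space. The program name and the WS- data names are made up for illustration. Listing the two-character delimiter ', ' before ',' lets the longer form be tried first, so no stray leading space or empty field appears.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIM-ORDER.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       *> Hypothetical sample: a space follows only the first comma
       01 WS-LINE        PIC X(20) VALUE 'RED, GREEN,BLUE'.
       01 WS-COLOR-1     PIC X(10).
       01 WS-COLOR-2     PIC X(10).
       01 WS-COLOR-3     PIC X(10).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> ', ' is listed first so it is tried before ','
           UNSTRING WS-LINE
               DELIMITED BY ', ' OR ','
               INTO WS-COLOR-1
                    WS-COLOR-2
                    WS-COLOR-3
           END-UNSTRING

           DISPLAY 'Color 1: ' WS-COLOR-1
           DISPLAY 'Color 2: ' WS-COLOR-2
           DISPLAY 'Color 3: ' WS-COLOR-3

           STOP RUN.
```

With this input the three receiving fields hold RED, GREEN and BLUE, each padded with spaces. Writing the delimiters in the opposite order would leave a leading space in front of GREEN.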

INTO receiving-field

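Lists the fields that will receive the parsed data. UNSTRING fills these fields in order, stopping at each delimiter and moving to the next field. Each piece is moved as if by a MOVE statement, so alphanumeric receiving fields are padded with spaces or truncated on the right. If every receiving field has been filled and unexamined characters remain in the source, the ON OVERFLOW clause (if specified) is executed.

Each receiving field can optionally have DELIMITER IN (to store which delimiter ended the field) and COUNT IN (to store how many source characters were examined for it). Both require a DELIMITED BY clause.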

WITH POINTER

Specifies an integer field that indicates the starting position in the source string (1-based). It must be large enough to hold one more than the length of the source. Before UNSTRING, initialize this to the desired starting position; a value below 1 or beyond the end of the source raises an overflow condition. UNSTRING automatically updates it as processing proceeds, so after UNSTRING completes, it points at the next unexamined character. This enables parsing strings in multiple passes.

TALLYING IN

Counts how many receiving fields were acted upon, including fields left empty between adjacent delimiters. UNSTRING adds to the current value of the tally field, so initialize it to zero before UNSTRING. This helps validate that the expected number of fields were parsed.
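The following sketch shows how the tally behaves around an empty field, with and without ALL. The program name and WS- names are hypothetical; the receiving fields are pre-filled with '?' only so that an untouched field is easy to spot.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TALLY-EMPTY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       *> Hypothetical sample with an empty middle field
       01 WS-SOURCE      PIC X(4) VALUE 'A,,C'.
       01 WS-PART-1      PIC X(5).
       01 WS-PART-2      PIC X(5).
       01 WS-PART-3      PIC X(5).
       01 WS-TALLY       PIC 9(2).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> Without ALL: the empty field between commas counts
           MOVE 0 TO WS-TALLY
           MOVE ALL '?' TO WS-PART-1 WS-PART-2 WS-PART-3
           UNSTRING WS-SOURCE
               DELIMITED BY ','
               INTO WS-PART-1 WS-PART-2 WS-PART-3
               TALLYING IN WS-TALLY
           END-UNSTRING
           DISPLAY 'Without ALL, tally: ' WS-TALLY

           *> With ALL: ',,' is one delimiter, two fields are used
           MOVE 0 TO WS-TALLY
           MOVE ALL '?' TO WS-PART-1 WS-PART-2 WS-PART-3
           UNSTRING WS-SOURCE
               DELIMITED BY ALL ','
               INTO WS-PART-1 WS-PART-2 WS-PART-3
               TALLYING IN WS-TALLY
           END-UNSTRING
           DISPLAY 'With ALL, tally:    ' WS-TALLY
           DISPLAY 'Third field:        ' WS-PART-3

           STOP RUN.
```

The first statement acts on three fields (the middle one receives spaces), so the tally is 3. The second acts on two, leaving WS-PART-3 still filled with question marks.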

DELIMITER IN

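For each receiving field, stores the delimiter that ended it. This is useful when using multiple delimiters to know which one was encountered for each field. The delimiter field should be an alphanumeric field large enough for the longest delimiter (one character for single-character delimiters). When ALL is used, only one occurrence of the delimiter is stored. If the field ended at the end of the source rather than at a delimiter, the delimiter field is set to spaces.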
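A small sketch of DELIMITER IN with two different delimiters follows. The names and the sample value are invented for illustration; the source is sized to exactly fit the data so the last field ends at the end of the source.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIM-IN-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       *> Hypothetical sample, exactly 14 characters long
       01 WS-SOURCE      PIC X(14) VALUE 'KEY=VALUE;NEXT'.
       01 WS-PART-1      PIC X(10).
       01 WS-PART-2      PIC X(10).
       01 WS-PART-3      PIC X(10).
       01 WS-DELIM-1     PIC X(1).
       01 WS-DELIM-2     PIC X(1).
       01 WS-DELIM-3     PIC X(1).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING WS-SOURCE
               DELIMITED BY '=' OR ';'
               INTO WS-PART-1 DELIMITER IN WS-DELIM-1
                    WS-PART-2 DELIMITER IN WS-DELIM-2
                    WS-PART-3 DELIMITER IN WS-DELIM-3
           END-UNSTRING

           DISPLAY 'Part 1: ' WS-PART-1 ' ended by [' WS-DELIM-1 ']'
           DISPLAY 'Part 2: ' WS-PART-2 ' ended by [' WS-DELIM-2 ']'
           DISPLAY 'Part 3: ' WS-PART-3 ' ended by [' WS-DELIM-3 ']'

           *> A space here means the source ran out, not a delimiter
           IF WS-DELIM-3 = SPACE
               DISPLAY 'Part 3 ended at the end of the source'
           END-IF

           STOP RUN.
```

Here WS-DELIM-1 receives '=', WS-DELIM-2 receives ';', and WS-DELIM-3 is set to a space because no delimiter follows NEXT.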

COUNT IN

For each receiving field, stores the number of source characters examined for that field, excluding the delimiter. This helps determine how much of a field is real data, detect truncation (the count exceeds the size of the receiving field), and handle variable-length data. The count field should be an integer numeric field.
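Because the count reflects the source rather than the receiving field, comparing it with the field's size is one way to detect truncation. The sketch below assumes a deliberately undersized receiving field; the program name and WS- names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COUNT-TRUNC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       *> Hypothetical sample: the city is longer than WS-CITY
       01 WS-SOURCE      PIC X(13) VALUE 'ALEXANDRIA,VA'.
       01 WS-CITY        PIC X(5).
       01 WS-STATE       PIC X(2).
       01 WS-CITY-LEN    PIC 9(2).
       01 WS-STATE-LEN   PIC 9(2).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING WS-SOURCE
               DELIMITED BY ','
               INTO WS-CITY  COUNT IN WS-CITY-LEN
                    WS-STATE COUNT IN WS-STATE-LEN
           END-UNSTRING

           *> The count describes the source, not the receiver size
           IF WS-CITY-LEN > FUNCTION LENGTH(WS-CITY)
               DISPLAY 'City truncated: ' WS-CITY-LEN
                       ' characters in source, kept ' WS-CITY
           END-IF

           STOP RUN.
```

In this run WS-CITY-LEN is 10 while WS-CITY holds only ALEXA, so the truncation message is displayed.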

ON OVERFLOW / NOT ON OVERFLOW

ON OVERFLOW executes when all receiving fields have been filled and unexamined characters remain in the source, or when the POINTER value is less than 1 or greater than the length of the source at the start of the statement. NOT ON OVERFLOW executes when parsing completes without overflow. These clauses enable proper error handling and data validation.
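The pointer-related overflow case is easy to overlook, so here is a minimal sketch of it. The pointer is deliberately left at zero; the names are hypothetical and the example is only meant to show which branch runs.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTR-RANGE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-SOURCE      PIC X(10) VALUE 'ONE,TWO'.
       01 WS-PART-1      PIC X(5).
       01 WS-PART-2      PIC X(5).
       *> Deliberately invalid: the pointer must start at 1 or more
       01 WS-PTR         PIC 9(3) VALUE 0.

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING WS-SOURCE
               DELIMITED BY ','
               INTO WS-PART-1
                    WS-PART-2
               WITH POINTER WS-PTR
               ON OVERFLOW
                   DISPLAY 'Overflow: pointer out of range: ' WS-PTR
               NOT ON OVERFLOW
                   DISPLAY 'Parsed: ' WS-PART-1 ' / ' WS-PART-2
           END-UNSTRING

           STOP RUN.
```

Since the pointer is below 1 when the statement starts, the ON OVERFLOW branch is taken even though the source has fewer fields than the INTO list.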

Simple UNSTRING Examples

Example 1: Basic Comma-Delimited Parsing

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. PARSE-EXAMPLE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 CSV-LINE          PIC X(100) VALUE 
           'JOHN,SMITH,123 MAIN ST,NEW YORK,NY,10001'.
       01 PARSED-FIELDS.
           05 FIRST-NAME     PIC X(15).
           05 LAST-NAME      PIC X(20).
           05 STREET         PIC X(30).
           05 CITY           PIC X(20).
           05 STATE          PIC X(2).
           05 ZIP-CODE       PIC X(10).
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           DISPLAY 'Original: ' CSV-LINE
           
           UNSTRING CSV-LINE
               DELIMITED BY ','
               INTO FIRST-NAME
                    LAST-NAME
                    STREET
                    CITY
                    STATE
                    ZIP-CODE
           END-UNSTRING
           
           DISPLAY 'First Name: ' FIRST-NAME
           DISPLAY 'Last Name: ' LAST-NAME
           DISPLAY 'Street: ' STREET
           DISPLAY 'City: ' CITY
           DISPLAY 'State: ' STATE
           DISPLAY 'ZIP: ' ZIP-CODE
           
           STOP RUN.

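This example parses a simple comma-delimited string into individual fields. Each comma marks where one field ends and the next begins. The last field has no trailing comma, so it runs to the end of CSV-LINE, and ZIP-CODE receives '10001' followed by padding spaces.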

Example 2: Using Multiple Delimiters

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. MULTI-DELIM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 INPUT-LINE        PIC X(80) VALUE 
           'JOHN SMITH|123 MAIN ST|NEW YORK|NY|10001'.
       01 PARSED-DATA.
           05 NAME-FIELD     PIC X(30).
           05 ADDRESS-FIELD  PIC X(30).
           05 CITY-FIELD     PIC X(20).
           05 STATE-FIELD    PIC X(2).
           05 ZIP-FIELD      PIC X(10).
       01 DELIMITER-FOUND   PIC X(1).
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING INPUT-LINE
               DELIMITED BY '|' OR ','
               INTO NAME-FIELD DELIMITER IN DELIMITER-FOUND
                    ADDRESS-FIELD
                    CITY-FIELD
                    STATE-FIELD
                    ZIP-FIELD
           END-UNSTRING
           
           DISPLAY 'Name: ' NAME-FIELD
           DISPLAY 'Delimiter found: ' DELIMITER-FOUND
           DISPLAY 'Address: ' ADDRESS-FIELD
           
           STOP RUN.

This example uses multiple delimiters (pipe or comma) and captures which delimiter was found for the first field.

Example 3: Using POINTER for Segment Processing

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTR-SEGMENT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 LONG-STRING       PIC X(200) VALUE 
           'FIELD1,FIELD2,FIELD3,FIELD4,FIELD5,FIELD6,FIELD7,FIELD8'.
       01 FIELD-1           PIC X(20).
       01 FIELD-2           PIC X(20).
       01 PARSE-POINTER      PIC 9(3) VALUE 1.
       01 FIELD-COUNT       PIC 9(2) VALUE 0.
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> Parse first two fields
           UNSTRING LONG-STRING
               DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
               WITH POINTER PARSE-POINTER
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           
           DISPLAY 'Field 1: ' FIELD-1
           DISPLAY 'Field 2: ' FIELD-2
           DISPLAY 'Pointer after first parse: ' PARSE-POINTER
           DISPLAY 'Fields parsed: ' FIELD-COUNT
           
           *> Continue parsing from where we left off
           *> FIELD-COUNT is not reset, so it accumulates to 4
           UNSTRING LONG-STRING
               DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
               WITH POINTER PARSE-POINTER
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           
           DISPLAY 'Field 1 (second pass): ' FIELD-1
           DISPLAY 'Field 2 (second pass): ' FIELD-2
           DISPLAY 'Final pointer: ' PARSE-POINTER
           
           STOP RUN.

This example demonstrates parsing a long string in segments using the POINTER to resume from where the previous UNSTRING stopped.
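The same idea extends to a loop when the number of fields is not known in advance. The sketch below, with hypothetical names and a hypothetical ten-entry table, takes one field per pass and stops when the pointer moves past the end of the source or the table is full.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTR-LOOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-SOURCE      PIC X(40) VALUE 'ALPHA,BETA,GAMMA,DELTA'.
       01 WS-TOKEN-TABLE.
           05 WS-TOKEN   PIC X(10) OCCURS 10 TIMES.
       01 WS-PTR         PIC 9(3) VALUE 1.
       01 WS-USED        PIC 9(2) VALUE 0.
       01 WS-I           PIC 9(2).

       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> One field per pass until source or table runs out
           PERFORM UNTIL WS-PTR > FUNCTION LENGTH(WS-SOURCE)
                      OR WS-USED >= 10
               ADD 1 TO WS-USED
               UNSTRING WS-SOURCE
                   DELIMITED BY ','
                   INTO WS-TOKEN(WS-USED)
                   WITH POINTER WS-PTR
               END-UNSTRING
           END-PERFORM

           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > WS-USED
               DISPLAY 'Token ' WS-I ': ' WS-TOKEN(WS-I)
           END-PERFORM

           STOP RUN.
```

For this input the loop runs four times: after DELTA the pointer is 41, one past the 40-character source, which ends the loop. Every pass except the last leaves unexamined data behind, which is an overflow condition, but without an ON OVERFLOW clause execution simply continues.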

Example 4: Handling Overflow

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVERFLOW-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 INPUT-DATA        PIC X(100) VALUE 
           'VALUE1,VALUE2,VALUE3,VALUE4,VALUE5,VALUE6'.
       01 PARSED-VALUES.
           05 VALUE-1       PIC X(15).
           05 VALUE-2       PIC X(15).
           05 VALUE-3       PIC X(15).
       01 PARSE-POINTER     PIC 9(3) VALUE 1.
       01 OVERFLOW-FLAG     PIC X(1) VALUE 'N'.
           88 HAS-OVERFLOW  VALUE 'Y'.
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING INPUT-DATA
               DELIMITED BY ','
               INTO VALUE-1
                    VALUE-2
                    VALUE-3
               WITH POINTER PARSE-POINTER
               ON OVERFLOW
                   MOVE 'Y' TO OVERFLOW-FLAG
                   DISPLAY 'WARNING: More data than receiving fields'
                   DISPLAY 'Processing stopped at position: ' PARSE-POINTER
               NOT ON OVERFLOW
                   DISPLAY 'All data parsed successfully'
           END-UNSTRING
           
           DISPLAY 'Value 1: ' VALUE-1
           DISPLAY 'Value 2: ' VALUE-2
           DISPLAY 'Value 3: ' VALUE-3
           
           IF HAS-OVERFLOW
               DISPLAY 'Remaining data starts at position: ' PARSE-POINTER
           END-IF
           
           STOP RUN.

This example demonstrates detecting and handling overflow when the source still holds unexamined data after all receiving fields are filled.

Advanced UNSTRING Techniques

Using COUNT IN for Data Validation

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. COUNT-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 INPUT-RECORD      PIC X(80) VALUE 'JOHN,SMITH,12345'.
       01 PARSED-FIELDS.
           05 FIRST-NAME     PIC X(20).
           05 LAST-NAME      PIC X(20).
           05 ID-NUMBER      PIC X(10).
       01 FIELD-COUNTS.
           05 NAME-COUNT     PIC 9(2).
           05 LAST-COUNT     PIC 9(2).
           05 ID-COUNT       PIC 9(2).
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           UNSTRING INPUT-RECORD
               *> ALL SPACE also ends the last field before the padding
               DELIMITED BY ',' OR ALL SPACE
               INTO FIRST-NAME COUNT IN NAME-COUNT
                    LAST-NAME COUNT IN LAST-COUNT
                    ID-NUMBER COUNT IN ID-COUNT
           END-UNSTRING
           
           DISPLAY 'First Name: ' FIRST-NAME ' (Length: ' NAME-COUNT ')'
           DISPLAY 'Last Name: ' LAST-NAME ' (Length: ' LAST-COUNT ')'
           DISPLAY 'ID Number: ' ID-NUMBER ' (Length: ' ID-COUNT ')'
           
           *> Validate field lengths
           IF NAME-COUNT < 2
               DISPLAY 'ERROR: First name too short'
           END-IF
           
           IF ID-COUNT NOT = 5
               DISPLAY 'ERROR: ID number must be 5 digits'
           END-IF
           
           STOP RUN.

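COUNT IN helps validate that fields contain the expected amount of data, which is crucial for data quality checks. Keep in mind that a final field with no delimiter after it runs to the end of the source item, so its count includes the padding spaces unless a space delimiter such as ALL SPACE is also specified.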

Using ALL to Handle Multiple Spaces

cobol

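       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALL-SPACE-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       *> Single-word values: every space is a delimiter here
       01 SPACE-DELIMITED   PIC X(100) VALUE 
           'JOHN    SMITH    BROADWAY    ALBANY'.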
       01 PARSED-FIELDS.
           05 NAME-PART      PIC X(20).
           05 LAST-PART      PIC X(20).
           05 STREET-PART    PIC X(30).
           05 CITY-PART     PIC X(20).
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> Without ALL - would create empty fields
           *> With ALL - treats multiple spaces as one delimiter
           UNSTRING SPACE-DELIMITED
               DELIMITED BY ALL SPACE
               INTO NAME-PART
                    LAST-PART
                    STREET-PART
                    CITY-PART
           END-UNSTRING
           
           DISPLAY 'Name: ' NAME-PART
           DISPLAY 'Last: ' LAST-PART
           DISPLAY 'Street: ' STREET-PART
           DISPLAY 'City: ' CITY-PART
           
           STOP RUN.

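The ALL keyword prevents empty fields when there are multiple consecutive delimiters, which is common with space-delimited data. Because every space acts as a delimiter, a value with embedded spaces (such as '123 MAIN ST') would be split across several receiving fields, so this technique suits single-word values.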

Complex Parsing with TALLYING

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. TALLY-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 VARIABLE-RECORD   PIC X(200).
       01 PARSED-FIELDS.
           05 FIELD-1        PIC X(30).
           05 FIELD-2        PIC X(30).
           05 FIELD-3        PIC X(30).
           05 FIELD-4        PIC X(30).
           05 FIELD-5        PIC X(30).
       01 FIELDS-PARSED     PIC 9(2) VALUE 0.
       01 EXPECTED-FIELDS   PIC 9(2) VALUE 3.
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> Example with variable number of fields
           MOVE 'VALUE1,VALUE2,VALUE3' TO VARIABLE-RECORD
           
           UNSTRING VARIABLE-RECORD
               DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
                    FIELD-4
                    FIELD-5
               TALLYING IN FIELDS-PARSED
           END-UNSTRING
           
           DISPLAY 'Fields parsed: ' FIELDS-PARSED
           DISPLAY 'Expected: ' EXPECTED-FIELDS
           
           IF FIELDS-PARSED NOT = EXPECTED-FIELDS
               DISPLAY 'WARNING: Unexpected number of fields'
           END-IF
           
           DISPLAY 'Field 1: ' FIELD-1
           DISPLAY 'Field 2: ' FIELD-2
           DISPLAY 'Field 3: ' FIELD-3
           
           STOP RUN.

TALLYING helps validate that the expected number of fields were parsed, which is important for data validation and error detection.

Real-World Application: CSV File Processing

cobol

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSV-PARSER.
       *> Comprehensive CSV parsing example
       
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 CSV-RECORD        PIC X(500).
       01 PARSED-CUSTOMER.
           05 CUSTOMER-ID    PIC 9(8).
           05 CUSTOMER-NAME  PIC X(50).
           05 EMAIL          PIC X(50).
           05 PHONE          PIC X(20).
           *> ADDRESS is a reserved word, so use STREET-ADDRESS
           05 STREET-ADDRESS PIC X(100).
           05 CITY           PIC X(30).
           05 STATE          PIC X(2).
           05 ZIP-CODE       PIC X(10).
       01 PARSING-CONTROLS.
           05 PARSE-POINTER  PIC 9(4) VALUE 1.
           05 FIELDS-COUNT   PIC 9(2) VALUE 0.
           05 FIELD-LENGTHS.
               10 ID-LENGTH      PIC 9(2).
               10 NAME-LENGTH    PIC 9(2).
               10 EMAIL-LENGTH   PIC 9(2).
       01 VALIDATION-FLAGS.
           05 VALID-RECORD   PIC X(1).
               88 IS-VALID    VALUE 'Y'.
               88 IS-INVALID  VALUE 'N'.
       01 ERROR-MESSAGE     PIC X(100).
       
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           *> Example CSV record, built in two pieces with STRING
           *> because MOVE accepts only one sending item
           MOVE SPACES TO CSV-RECORD
           STRING '12345678,John Smith,john@email.com,555-1234,'
                  '123 Main St,New York,NY,10001'
                  DELIMITED BY SIZE
               INTO CSV-RECORD
           END-STRING
           
           PERFORM PARSE-CSV-RECORD
           PERFORM VALIDATE-PARSED-DATA
           PERFORM DISPLAY-RESULTS
           
           STOP RUN.
       
       PARSE-CSV-RECORD.
           MOVE 1 TO PARSE-POINTER
           MOVE 0 TO FIELDS-COUNT
           MOVE 'N' TO VALID-RECORD
           
           UNSTRING CSV-RECORD
               DELIMITED BY ','
               INTO CUSTOMER-ID COUNT IN ID-LENGTH
                    CUSTOMER-NAME COUNT IN NAME-LENGTH
                    EMAIL COUNT IN EMAIL-LENGTH
                    PHONE
                    STREET-ADDRESS
                    CITY
                    STATE
                    ZIP-CODE
               WITH POINTER PARSE-POINTER
               TALLYING IN FIELDS-COUNT
               ON OVERFLOW
                   MOVE 'WARNING: Extra fields in record' 
                        TO ERROR-MESSAGE
               NOT ON OVERFLOW
                   IF FIELDS-COUNT = 8
                       MOVE 'Y' TO VALID-RECORD
                   ELSE
                       MOVE 'ERROR: Missing required fields' 
                            TO ERROR-MESSAGE
                   END-IF
           END-UNSTRING.
       
       VALIDATE-PARSED-DATA.
           IF IS-VALID
               *> Validate individual fields
               IF CUSTOMER-ID = ZERO
                   MOVE 'ERROR: Invalid customer ID' 
                        TO ERROR-MESSAGE
                   MOVE 'N' TO VALID-RECORD
               END-IF
               
               IF NAME-LENGTH < 3
                   MOVE 'ERROR: Name too short' 
                        TO ERROR-MESSAGE
                   MOVE 'N' TO VALID-RECORD
               END-IF
               
               IF EMAIL-LENGTH < 5
                   MOVE 'ERROR: Invalid email' 
                        TO ERROR-MESSAGE
                   MOVE 'N' TO VALID-RECORD
               END-IF
           END-IF.
       
       DISPLAY-RESULTS.
           DISPLAY '=== CSV Parsing Results ==='
           DISPLAY 'Fields Parsed: ' FIELDS-COUNT
           DISPLAY 'Valid Record: ' VALID-RECORD
           
           IF IS-VALID
               DISPLAY 'Customer ID: ' CUSTOMER-ID
               DISPLAY 'Name: ' CUSTOMER-NAME
               DISPLAY 'Email: ' EMAIL
               DISPLAY 'Phone: ' PHONE
               DISPLAY 'Address: ' STREET-ADDRESS
               DISPLAY 'City: ' CITY
               DISPLAY 'State: ' STATE
               DISPLAY 'ZIP: ' ZIP-CODE
           ELSE
               DISPLAY 'Error: ' ERROR-MESSAGE
           END-IF.

This comprehensive example demonstrates parsing CSV data with validation, field counting, length tracking, and error handling - all essential for production CSV processing.

Best Practices for UNSTRING

Always Handle Overflow: Use ON OVERFLOW to detect when there's more data than receiving fields. This prevents silent data loss and enables proper error handling.
Validate Parsed Data: Use TALLYING to ensure the expected number of fields were parsed. Use COUNT IN to validate field lengths and detect truncation.
Size Receiving Fields Appropriately: Make receiving fields large enough for expected data but not excessively large. Use COUNT IN to detect truncation.
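Initialize Control Fields: Always initialize the POINTER and TALLYING fields before UNSTRING, because UNSTRING starts from their current values. COUNT IN and DELIMITER IN fields are overwritten only for the receiving fields UNSTRING reaches, so clear them too if leftover values could be misread.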
Use ALL for Space-Delimited Data: When parsing space-delimited strings, use DELIMITED BY ALL SPACE to handle multiple spaces correctly.
Document Delimiter Choices: Clearly document which delimiters are expected and why, especially when using multiple delimiters.
Handle Edge Cases: Consider empty fields, missing delimiters, trailing delimiters, and other edge cases in your parsing logic.
Test with Various Input Formats: Test UNSTRING with different input formats, including edge cases, to ensure robust parsing.

Test Your Knowledge

1. What does the UNSTRING statement do?

Combines multiple strings into one
Splits one string into multiple fields based on delimiters
Replaces characters in a string
Searches for text in a string

2. What does the POINTER in UNSTRING indicate?

The number of fields parsed
The starting position in the source string
The number of delimiters found
The size of receiving fields

3. What happens when UNSTRING has more delimiters than receiving fields?

The program terminates
The ON OVERFLOW clause is executed if specified
All data is lost
An error message is displayed

4. What does TALLYING IN count?

The number of delimiters found
The number of receiving fields that received data
The total characters processed
The number of errors

5. What does DELIMITER IN store?

The number of delimiters
The actual delimiter character found for each field
The position of delimiters
The size of delimiters

6. How do you specify multiple delimiters in UNSTRING?

Using commas
Using the OR keyword
Using multiple UNSTRING statements
You cannot use multiple delimiters

7. What is the ALL keyword used for in UNSTRING?

To process all fields
To treat consecutive delimiters as a single delimiter
To process all characters
To ignore delimiters

8. What does COUNT IN store?

The number of delimiters
The number of source characters examined for each receiving field
The total length of the source string
The number of receiving fields

