MainframeMaster

COBOL Tutorial

COBOL String Processing

String processing in COBOL involves manipulating character strings (text data) to build, parse, format, and transform text. COBOL provides powerful statements for string operations: STRING for concatenating strings, UNSTRING for parsing strings, and INSPECT for replacing, counting, and converting characters. Understanding string processing is essential for working with text data, building formatted output, parsing input records, and performing data transformations in mainframe COBOL applications.

What is String Processing?

String processing refers to operations that manipulate sequences of characters (strings). In COBOL, strings are typically stored in alphanumeric (PIC X) fields. String processing operations include:

  • Concatenation: Combining multiple strings into one (STRING statement)
  • Parsing: Splitting a string into multiple parts (UNSTRING statement)
  • Replacement: Replacing characters or patterns within a string (INSPECT REPLACING)
  • Counting: Counting occurrences of characters (INSPECT TALLYING)
  • Conversion: Translating characters using conversion tables (INSPECT CONVERTING)

These operations are fundamental for data formatting, input parsing, output generation, data cleaning, and text transformation tasks in COBOL programs.

The STRING Statement

The STRING statement concatenates (combines) multiple source strings or data items into a single receiving field. It allows you to build formatted output by combining literal strings with variable data.

Basic STRING Syntax

cobol
STRING source-1 DELIMITED BY {delimiter-1 | SIZE}
       source-2 DELIMITED BY {delimiter-2 | SIZE}
       ...
    INTO receiving-field
    [WITH POINTER pointer-field]
    [ON OVERFLOW imperative-statement]
END-STRING.

Key components:

  • source-1, source-2, etc.: The strings or data items to concatenate. These can be literals (like "Hello") or data items (like FIRST-NAME).
  • DELIMITED BY: Specifies how much of each source to copy. You can use DELIMITED BY SIZE (copy entire field) or DELIMITED BY identifier/literal (copy up to a delimiter).
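  • INTO receiving-field: The field where the concatenated result is stored. STRING overwrites only the positions it writes to; it does not pad the rest of the field with spaces.
  • WITH POINTER: Optional. Specifies a numeric field that indicates the starting position in the receiving field (1-based). The pointer is updated as characters are copied.
  • ON OVERFLOW: Optional. Executes if the receiving field runs out of space, or if the pointer value is less than 1 or greater than the length of the receiving field plus 1.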
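
The interaction between WITH POINTER and ON OVERFLOW is easiest to see with a receiving field that is deliberately too small. The following is a minimal sketch rather than production code; the program name OVFLDEMO and the fields SHORT-FIELD and OUT-POINTER are hypothetical names chosen for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVFLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical names chosen for this sketch.
       01  SHORT-FIELD    PIC X(8)  VALUE SPACES.
       01  OUT-POINTER    PIC 9(4)  VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> 'MAINFRAME' has 9 characters; SHORT-FIELD holds only 8.
           STRING 'MAINFRAME' DELIMITED BY SIZE
               INTO SHORT-FIELD
               WITH POINTER OUT-POINTER
               ON OVERFLOW
                   DISPLAY 'Overflow, pointer is ' OUT-POINTER
               NOT ON OVERFLOW
                   DISPLAY 'All data copied'
           END-STRING
           DISPLAY 'Result: [' SHORT-FIELD ']'
           STOP RUN.
```

On a standard-conforming compiler the ON OVERFLOW branch should run, SHORT-FIELD should hold the first eight characters, and OUT-POINTER should be one past the end of the field. Subtracting 1 from the pointer gives the number of characters actually stored.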

STRING Example: Building a Full Name

cobol
       WORKING-STORAGE SECTION.
       01  FIRST-NAME     PIC X(20) VALUE 'JOHN'.
       01  LAST-NAME      PIC X(20) VALUE 'SMITH'.
      *> STRING does not pad the receiving field, so start with spaces
       01  FULL-NAME      PIC X(42) VALUE SPACES.

       PROCEDURE DIVISION.
       MAIN-PARA.
           STRING FIRST-NAME DELIMITED BY SIZE
                  ' '        DELIMITED BY SIZE
                  LAST-NAME  DELIMITED BY SIZE
               INTO FULL-NAME
               ON OVERFLOW
                   DISPLAY 'ERROR: Name too long'
           END-STRING.

           DISPLAY 'Full Name: ' FULL-NAME.
           STOP RUN.

In this example:

  • FIRST-NAME is a 20-character field holding "JOHN" followed by 16 spaces
  • LAST-NAME is a 20-character field holding "SMITH" followed by 15 spaces
  • DELIMITED BY SIZE means copy the entire field (all 20 characters for each name field, padding included)
  • ' ' DELIMITED BY SIZE adds a single space between the names
  • INTO FULL-NAME stores 41 characters: "JOHN", its 16 padding spaces, the separator space, "SMITH", and its 15 padding spaces. Position 42 is not written by STRING and keeps whatever it held before.

STRING with DELIMITED BY SPACE

To avoid copying trailing spaces, use DELIMITED BY SPACE:

cobol
           STRING FIRST-NAME DELIMITED BY SPACE
                  ' '        DELIMITED BY SIZE
                  LAST-NAME  DELIMITED BY SPACE
               INTO FULL-NAME
           END-STRING.

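DELIMITED BY SPACE copies characters up to (but not including) the first space, so only the name itself is copied, not the padding. FULL-NAME then begins with "JOHN SMITH". STRING leaves the remaining positions of FULL-NAME unchanged, so move spaces to the field beforehand if it may hold old data. Note also that a value containing a space, such as "MARY ANN", is cut off at that first space.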
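
Because STRING never pads the receiving field, a common approach is to clear the field first and use WITH POINTER to learn how long the result is. The sketch below assumes a program name (NAMEDEMO) and a helper field (NAME-LENGTH) that are hypothetical; FULL-NAME is pre-filled with asterisks only to make leftover data visible if the MOVE SPACES is removed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NAMEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FIRST-NAME     PIC X(20) VALUE 'JOHN'.
       01  LAST-NAME      PIC X(20) VALUE 'SMITH'.
      *> Asterisks stand in for old data left in the field.
       01  FULL-NAME      PIC X(42) VALUE ALL '*'.
       01  NAME-POINTER   PIC 9(4)  VALUE 1.
       01  NAME-LENGTH    PIC 9(4)  VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Clear the receiving field; STRING will not do this.
           MOVE SPACES TO FULL-NAME
           MOVE 1 TO NAME-POINTER
           STRING FIRST-NAME DELIMITED BY SPACE
                  ' '        DELIMITED BY SIZE
                  LAST-NAME  DELIMITED BY SPACE
               INTO FULL-NAME
               WITH POINTER NAME-POINTER
               ON OVERFLOW
                   DISPLAY 'ERROR: Name too long'
           END-STRING
      *> The pointer ends one past the last character written.
           COMPUTE NAME-LENGTH = NAME-POINTER - 1
           DISPLAY 'Full Name: [' FULL-NAME(1:NAME-LENGTH) ']'
           DISPLAY 'Length: ' NAME-LENGTH
           STOP RUN.
```

With the values shown, the pointer moves from 1 to 11, so the program displays "Full Name: [JOHN SMITH]" and "Length: 0010".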

STRING with POINTER

The POINTER allows you to control where in the receiving field the concatenation starts:

cobol
       WORKING-STORAGE SECTION.
       01  MESSAGE-TEXT   PIC X(80) VALUE 'Hello, '.
       01  USER-NAME      PIC X(20) VALUE 'JOHN'.
       01  MSG-POINTER    PIC 9(4)  VALUE 8.

       PROCEDURE DIVISION.
           STRING USER-NAME DELIMITED BY SPACE
               INTO MESSAGE-TEXT
               WITH POINTER MSG-POINTER
           END-STRING.

           DISPLAY MESSAGE-TEXT.
      *> Displays: "Hello, JOHN"

In this example:

  • MESSAGE-TEXT initially contains "Hello, " (7 characters)
  • MSG-POINTER starts at 8, which is the position after "Hello, "
  • STRING copies USER-NAME starting at position 8
  • The result is "Hello, JOHN" with the name appended after the greeting
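
Since the pointer is left just past the last character written, the same pointer can be reused by several STRING statements to append to one field piece by piece. The following sketch builds a semicolon-separated line in a loop; the program name LISTDEMO and all field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LISTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical names chosen for this sketch.
       01  REPORT-LINE    PIC X(40) VALUE SPACES.
       01  LINE-POINTER   PIC 9(4)  VALUE 1.
       01  ITEM-TABLE.
           05  ITEM-NAME  PIC X(10) OCCURS 3 TIMES.
       01  ITEM-INDEX     PIC 9(2).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 'APPLE'  TO ITEM-NAME(1)
           MOVE 'PEAR'   TO ITEM-NAME(2)
           MOVE 'CHERRY' TO ITEM-NAME(3)
      *> Each pass appends one item where the previous pass stopped.
           PERFORM VARYING ITEM-INDEX FROM 1 BY 1
                   UNTIL ITEM-INDEX > 3
               STRING ITEM-NAME(ITEM-INDEX) DELIMITED BY SPACE
                      ';'                   DELIMITED BY SIZE
                   INTO REPORT-LINE
                   WITH POINTER LINE-POINTER
                   ON OVERFLOW
                       DISPLAY 'ERROR: Line full'
               END-STRING
           END-PERFORM
           DISPLAY REPORT-LINE
           DISPLAY 'Next free position: ' LINE-POINTER
           STOP RUN.
```

REPORT-LINE ends up starting with "APPLE;PEAR;CHERRY;" and LINE-POINTER is 19, the next free position.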

The UNSTRING Statement

The UNSTRING statement parses (splits) a source string into multiple receiving fields based on delimiters. It's useful for breaking apart formatted data like comma-separated values or parsing dates.

Basic UNSTRING Syntax

cobol
UNSTRING source-field
    DELIMITED BY [ALL] delimiter-1 [OR delimiter-2 ...]
    INTO receiving-field-1 [DELIMITER IN delim-field-1]
                           [COUNT IN count-field-1]
         receiving-field-2 [DELIMITER IN delim-field-2]
                           [COUNT IN count-field-2]
         ...
    [WITH POINTER pointer-field]
    [TALLYING IN tally-field]
    [ON OVERFLOW imperative-statement]
END-UNSTRING.

Key components:

  • source-field: The string to parse
  • DELIMITED BY: Specifies the delimiter character(s) that mark where to split
  • INTO receiving-field-1, receiving-field-2, etc.: Fields where parsed parts are stored
  • DELIMITER IN: Optional. Stores the delimiter that was found
  • COUNT IN: Optional. Stores the number of characters copied to each receiving field
  • WITH POINTER: Optional. Controls the starting position in the source field
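  • TALLYING IN: Optional. Adds the number of receiving fields that were filled to tally-field. The field is incremented, not reset, so set it to zero before the UNSTRING.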
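
WITH POINTER also makes it possible to pull tokens out of a source field one at a time, which helps when the number of fields is not known in advance. The sketch below is one way to do this, under the assumption that tokens contain no embedded spaces (the trailing padding is consumed by ALL SPACE). The program name TOKDEMO and the field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TOKDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical names chosen for this sketch.
       01  INPUT-LINE     PIC X(30) VALUE 'RED,GREEN,BLUE'.
       01  TOKEN-TEXT     PIC X(10).
       01  TOKEN-LENGTH   PIC 9(4).
       01  SCAN-POINTER   PIC 9(4)  VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Each pass extracts one token and leaves SCAN-POINTER just
      *> past the delimiter that ended it.
           PERFORM UNTIL SCAN-POINTER > FUNCTION LENGTH(INPUT-LINE)
               MOVE SPACES TO TOKEN-TEXT
               MOVE ZERO TO TOKEN-LENGTH
               UNSTRING INPUT-LINE
                   DELIMITED BY ',' OR ALL SPACE
                   INTO TOKEN-TEXT COUNT IN TOKEN-LENGTH
                   WITH POINTER SCAN-POINTER
               END-UNSTRING
               DISPLAY 'Token: ' TOKEN-TEXT ' Length: ' TOKEN-LENGTH
           END-PERFORM
           STOP RUN.
```

The loop runs three times, producing RED (length 3), GREEN (length 5), and BLUE (length 4). After the third pass the pointer is 31, one past the end of the 30-character field, which ends the loop.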

UNSTRING Example: Parsing a Comma-Separated Name

cobol
       WORKING-STORAGE SECTION.
       01  INPUT-RECORD    PIC X(50)
                           VALUE 'SMITH,JOHN,123 MAIN ST'.
       01  LAST-NAME       PIC X(20).
       01  FIRST-NAME      PIC X(20).
      *> ADDRESS is a reserved word, so the field needs another name
       01  STREET-ADDRESS  PIC X(30).
       01  DELIM-1         PIC X.
       01  DELIM-2         PIC X.
       01  CHAR-COUNT-1    PIC 9(4).
       01  CHAR-COUNT-2    PIC 9(4).
      *> TALLYING IN adds to this field, so it must start at zero
       01  FIELDS-FILLED   PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING INPUT-RECORD DELIMITED BY ','
               INTO LAST-NAME  DELIMITER IN DELIM-1
                               COUNT IN CHAR-COUNT-1
                    FIRST-NAME DELIMITER IN DELIM-2
                               COUNT IN CHAR-COUNT-2
                    STREET-ADDRESS
               TALLYING IN FIELDS-FILLED
           END-UNSTRING.

           DISPLAY 'Last Name: ' LAST-NAME.
           DISPLAY 'First Name: ' FIRST-NAME.
           DISPLAY 'Address: ' STREET-ADDRESS.
           DISPLAY 'Fields filled: ' FIELDS-FILLED.
           STOP RUN.

In this example:

  • INPUT-RECORD contains "SMITH,JOHN,123 MAIN ST"
  • DELIMITED BY ',' splits on commas
  • LAST-NAME receives "SMITH" (the part before the first comma)
  • FIRST-NAME receives "JOHN" (the part between the first and second comma)
  • STREET-ADDRESS receives "123 MAIN ST" (the part after the second comma, padded with spaces)
  • DELIM-1 and DELIM-2 both contain "," (the delimiter found)
  • CHAR-COUNT-1 contains 5 (length of "SMITH")
  • CHAR-COUNT-2 contains 4 (length of "JOHN")
  • FIELDS-FILLED contains 3 (three receiving fields were filled, counted up from its initial value of zero)

UNSTRING with Multiple Delimiters

You can specify multiple delimiters using OR:

cobol
           UNSTRING INPUT-LINE
               DELIMITED BY ',' OR ';' OR SPACE
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
           END-UNSTRING.

This splits the source string on commas, semicolons, or spaces. Each field ends at whichever of these delimiters occurs next in the source, so the delimiters can be mixed freely within one record.
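
When several delimiters are allowed, DELIMITER IN reports which one actually ended each field. A minimal sketch, with a hypothetical program name (DLMDEMO) and sample data invented for illustration:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DLMDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Sample data invented for this sketch.
       01  INPUT-LINE     PIC X(20) VALUE 'A1,B2;C3'.
       01  FIELD-1        PIC X(5).
       01  FIELD-2        PIC X(5).
       01  FIELD-3        PIC X(5).
       01  DELIM-1        PIC X.
       01  DELIM-2        PIC X.
       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING INPUT-LINE DELIMITED BY ',' OR ';'
               INTO FIELD-1 DELIMITER IN DELIM-1
                    FIELD-2 DELIMITER IN DELIM-2
                    FIELD-3
           END-UNSTRING
           DISPLAY 'Field 1: ' FIELD-1 ' ended by [' DELIM-1 ']'
           DISPLAY 'Field 2: ' FIELD-2 ' ended by [' DELIM-2 ']'
           DISPLAY 'Field 3: ' FIELD-3
           STOP RUN.
```

Here FIELD-1 receives "A1" with DELIM-1 set to a comma, FIELD-2 receives "B2" with DELIM-2 set to a semicolon, and FIELD-3 receives "C3".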

UNSTRING with ALL

The ALL keyword treats multiple consecutive delimiters as one:

cobol
       WORKING-STORAGE SECTION.
       01  INPUT-DATA     PIC X(30) VALUE 'ONE,,,TWO'.
       01  FIELD-1        PIC X(10).
       01  FIELD-2        PIC X(10).

       PROCEDURE DIVISION.
           UNSTRING INPUT-DATA DELIMITED BY ALL ','
               INTO FIELD-1
                    FIELD-2
           END-UNSTRING.

      *> FIELD-1 receives "ONE"
      *> FIELD-2 receives "TWO"
      *> The three commas are treated as a single delimiter

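Without ALL, every comma ends a field, so the second and third commas would each produce an empty field. In the example above, FIELD-2 would then receive spaces and "TWO" would never be stored, because only two receiving fields are listed. With ALL, the run of consecutive commas is treated as a single delimiter.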
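
The difference is easiest to see by running the same input through UNSTRING with and without ALL. The sketch below is illustrative only: the program name ALLDEMO and the table PART-TABLE are hypothetical, and the table is pre-filled with '?' so that receiving fields UNSTRING never touches remain visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical names chosen for this sketch.
       01  INPUT-DATA     PIC X(9)  VALUE 'ONE,,,TWO'.
       01  PART-TABLE.
           05  PART-TEXT  PIC X(5) OCCURS 4 TIMES.
       01  FIELD-COUNT    PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Without ALL: every comma ends a field.
           MOVE ALL '?' TO PART-TABLE
           MOVE ZERO TO FIELD-COUNT
           UNSTRING INPUT-DATA DELIMITED BY ','
               INTO PART-TEXT(1) PART-TEXT(2)
                    PART-TEXT(3) PART-TEXT(4)
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           DISPLAY 'Without ALL: ' FIELD-COUNT ' [' PART-TABLE ']'
      *> With ALL: the run of three commas is one delimiter.
           MOVE ALL '?' TO PART-TABLE
           MOVE ZERO TO FIELD-COUNT
           UNSTRING INPUT-DATA DELIMITED BY ALL ','
               INTO PART-TEXT(1) PART-TEXT(2)
                    PART-TEXT(3) PART-TEXT(4)
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           DISPLAY 'With ALL:    ' FIELD-COUNT ' [' PART-TABLE ']'
           STOP RUN.
```

Without ALL the tally is 4: "ONE", two space-filled entries, then "TWO". With ALL the tally is 2, and the third and fourth entries still hold question marks because UNSTRING never wrote to them.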

The INSPECT Statement

The INSPECT statement performs character-level operations on strings: replacing characters, counting occurrences, or converting characters using translation tables.

INSPECT REPLACING

INSPECT REPLACING replaces characters or patterns within a string:

cobol
INSPECT target-field
    REPLACING {ALL | LEADING | FIRST} old-chars BY new-chars
              [old-chars-2 BY new-chars-2 ...].

Options:

  • ALL: Replace all occurrences
  • LEADING: Replace only leading (beginning) occurrences
  • FIRST: Replace only the first occurrence
  • One of ALL, LEADING, or FIRST is required; there is no default. old-chars and new-chars must be the same length, because INSPECT never changes the length of the field.
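
INSPECT REPLACING also has a CHARACTERS BY form that replaces every position, and a BEFORE INITIAL (or AFTER INITIAL) phrase that limits the part of the field being changed. Neither is covered in the options above, so treat the following as a small supplementary sketch; the program name REPLDEMO and the sample data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REPLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Sample data invented for this sketch.
       01  DATE-TEXT      PIC X(10) VALUE '2024/01/15'.
       01  EMAIL-TEXT     PIC X(18) VALUE 'jsmith@example.com'.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> One-for-one replacement: each '/' becomes '-'.
           INSPECT DATE-TEXT REPLACING ALL '/' BY '-'
           DISPLAY DATE-TEXT
      *> CHARACTERS BY replaces every position; BEFORE INITIAL
      *> limits the change to the part in front of the first '@'.
           INSPECT EMAIL-TEXT
               REPLACING CHARACTERS BY '*' BEFORE INITIAL '@'
           DISPLAY EMAIL-TEXT
           STOP RUN.
```

The program displays "2024-01-15" and "******@example.com". In both cases the field length is unchanged; only individual characters are overwritten.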

INSPECT REPLACING Example

cobol
       WORKING-STORAGE SECTION.
       01  TEXT-FIELD     PIC X(11) VALUE 'HELLO WORLD'.
       01  PHONE-NUMBER   PIC X(15) VALUE '   555-123-4567'.

       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Replace all spaces with underscores. TEXT-FIELD is exactly
      *> 11 characters; in a longer field the trailing spaces
      *> would become underscores too.
           INSPECT TEXT-FIELD REPLACING ALL SPACE BY '_'.
           DISPLAY TEXT-FIELD.
      *> Displays: "HELLO_WORLD"

      *> Replace only leading spaces
           INSPECT PHONE-NUMBER REPLACING LEADING SPACE BY '0'.
           DISPLAY PHONE-NUMBER.
      *> Displays: "000555-123-4567" (leading spaces replaced)

      *> Replace first occurrence
           INSPECT TEXT-FIELD REPLACING FIRST '_' BY ' '.
           DISPLAY TEXT-FIELD.
      *> Displays: "HELLO WORLD" (first underscore replaced)

           STOP RUN.

INSPECT TALLYING

INSPECT TALLYING counts occurrences of characters:

cobol
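INSPECT target-field
    TALLYING counter-field
    FOR {CHARACTERS | {ALL | LEADING} char-1 [char-2 ...]}.

Options:

  • ALL char: Count all occurrences of the character
  • LEADING char: Count only leading occurrences
  • CHARACTERS: Count every character position in the field, including trailing spaces

INSPECT adds to counter-field rather than resetting it, so set the counter to zero before each INSPECT TALLYING.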

INSPECT TALLYING Example

cobol
       WORKING-STORAGE SECTION.
       01  TEXT-LINE      PIC X(19) VALUE 'THE QUICK BROWN FOX'.
       01  SPACE-COUNT    PIC 9(4)  VALUE ZERO.
       01  CHAR-COUNT     PIC 9(4)  VALUE ZERO.
       01  E-COUNT        PIC 9(4)  VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
      *> TEXT-LINE is sized to its content (19 characters). In a
      *> longer field the trailing spaces would be counted as well.

      *> Count all spaces
           INSPECT TEXT-LINE TALLYING SPACE-COUNT FOR ALL SPACE.
           DISPLAY 'Spaces: ' SPACE-COUNT.
      *> Displays: "Spaces: 0003"

      *> Count all characters
           INSPECT TEXT-LINE TALLYING CHAR-COUNT FOR CHARACTERS.
           DISPLAY 'Characters: ' CHAR-COUNT.
      *> Displays: "Characters: 0019"

      *> Count all 'E' characters
           INSPECT TEXT-LINE TALLYING E-COUNT FOR ALL 'E'.
           DISPLAY 'E characters: ' E-COUNT.
      *> Displays: "E characters: 0001"

           STOP RUN.
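
INSPECT TALLYING counts over the whole field, so a field that is larger than its content also counts the padding spaces. One widely used workaround is to reverse the field and count the leading spaces, which gives the content length. The sketch below assumes the intrinsic functions REVERSE and LENGTH are available; the program name LENDEMO and the helper fields are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LENDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical names chosen for this sketch.
       01  TEXT-LINE        PIC X(50) VALUE 'THE QUICK BROWN FOX'.
       01  REVERSED-TEXT    PIC X(50).
       01  TRAILING-SPACES  PIC 9(4)  VALUE ZERO.
       01  TEXT-LENGTH      PIC 9(4)  VALUE ZERO.
       01  INNER-SPACES     PIC 9(4)  VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Trailing spaces become leading spaces once reversed.
           MOVE FUNCTION REVERSE(TEXT-LINE) TO REVERSED-TEXT
           INSPECT REVERSED-TEXT
               TALLYING TRAILING-SPACES FOR LEADING SPACE
           COMPUTE TEXT-LENGTH =
               FUNCTION LENGTH(TEXT-LINE) - TRAILING-SPACES
           DISPLAY 'Content length: ' TEXT-LENGTH
      *> Count spaces only inside the content, not in the padding.
      *> The IF guards against an all-space field (length zero).
           IF TEXT-LENGTH > 0
               INSPECT TEXT-LINE(1:TEXT-LENGTH)
                   TALLYING INNER-SPACES FOR ALL SPACE
           END-IF
           DISPLAY 'Spaces in content: ' INNER-SPACES
           STOP RUN.
```

For the 50-character field shown, 31 trailing spaces are found, so the program displays a content length of 0019 and 0003 spaces in the content.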

INSPECT CONVERTING

INSPECT CONVERTING performs character-by-character translation using a conversion table:

cobol
INSPECT target-field CONVERTING from-chars TO to-chars.

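Each character in the target field is looked up in the from-chars string. If found at position N, it's replaced with the character at position N in to-chars. The two strings must therefore be the same length, and a character may appear only once in from-chars.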
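
The positional mapping is not limited to case conversion. As a small illustration, the sketch below masks every digit in an identifier by mapping all ten digits to the same character; the program name CONVDEMO and the sample value are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONVDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Sample data invented for this sketch.
       01  ACCOUNT-ID     PIC X(12) VALUE 'AB-1234-5678'.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Position N of the first literal maps to position N of the
      *> second: '0' to '#', '1' to '#', and so on for all digits.
           INSPECT ACCOUNT-ID
               CONVERTING '0123456789' TO '##########'
           DISPLAY ACCOUNT-ID
           STOP RUN.
```

The program displays "AB-####-####". Characters that do not appear in the first literal, such as the letters and dashes, are left unchanged.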

INSPECT CONVERTING Example: Case Conversion

cobol
       WORKING-STORAGE SECTION.
       01  UPPER-TEXT     PIC X(20) VALUE 'HELLO WORLD'.
       01  LOWER-TEXT     PIC X(20) VALUE 'hello world'.
       01  MIXED-TEXT     PIC X(20) VALUE 'HeLLo WoRLd'.

       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Convert to lowercase
           INSPECT UPPER-TEXT
               CONVERTING 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               TO         'abcdefghijklmnopqrstuvwxyz'.
           DISPLAY UPPER-TEXT.
      *> Displays: "hello world"

      *> Convert to uppercase
           INSPECT LOWER-TEXT
               CONVERTING 'abcdefghijklmnopqrstuvwxyz'
               TO         'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
           DISPLAY LOWER-TEXT.
      *> Displays: "HELLO WORLD"

      *> Convert mixed case
           INSPECT MIXED-TEXT
               CONVERTING 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               TO         'abcdefghijklmnopqrstuvwxyz'.
           DISPLAY MIXED-TEXT.
      *> Displays: "hello world"

           STOP RUN.

Combining String Operations

String processing statements can be combined to perform complex operations:

Example: Parsing and Reformatting Data

cobol
       WORKING-STORAGE SECTION.
       01  INPUT-RECORD     PIC X(50)
                            VALUE 'SMITH,JOHN,555-123-4567'.
       01  LAST-NAME        PIC X(20).
       01  FIRST-NAME       PIC X(20).
       01  PHONE-RAW        PIC X(15).
      *> Starts with "(" and is padded with spaces
       01  PHONE-FORMATTED  PIC X(14) VALUE '('.
       01  AREA-CODE        PIC X(3).
       01  EXCHANGE         PIC X(3).
      *> NUMBER is a reserved word, so the field needs another name
       01  SUBSCRIBER-NUM   PIC X(4).
       01  PHONE-PTR        PIC 9(4).

       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Step 1: Parse the comma-separated input
           UNSTRING INPUT-RECORD DELIMITED BY ','
               INTO LAST-NAME FIRST-NAME PHONE-RAW
           END-UNSTRING.

      *> Step 2: Normalize separators so 555.123.4567 also parses
           INSPECT PHONE-RAW REPLACING ALL '.' BY '-'.

      *> Step 3: Parse phone number (format: 555-123-4567)
           UNSTRING PHONE-RAW DELIMITED BY '-'
               INTO AREA-CODE EXCHANGE SUBSCRIBER-NUM
           END-UNSTRING.

      *> Step 4: Build formatted phone number after the "("
           MOVE 2 TO PHONE-PTR.
           STRING AREA-CODE      DELIMITED BY SIZE
                  ') '           DELIMITED BY SIZE
                  EXCHANGE       DELIMITED BY SIZE
                  '-'            DELIMITED BY SIZE
                  SUBSCRIBER-NUM DELIMITED BY SIZE
               INTO PHONE-FORMATTED
               WITH POINTER PHONE-PTR
           END-STRING.

           DISPLAY 'Name: ' FIRST-NAME ' ' LAST-NAME.
           DISPLAY 'Phone: ' PHONE-FORMATTED.
      *> Displays: "Phone: (555) 123-4567"
           STOP RUN.

This example demonstrates:

  • UNSTRING to parse comma-separated input into fields
  • INSPECT to clean up the phone number by normalizing its separators
  • UNSTRING again to split the phone number into its three parts
  • STRING with a pointer to build a formatted phone number with parentheses and a dash

Best Practices for String Processing

Follow these best practices for effective string processing:

  • Size Receiving Fields Appropriately: Ensure receiving fields in STRING and UNSTRING are large enough to hold the expected data. Use ON OVERFLOW to handle cases where data might exceed field sizes.
  • Use DELIMITED BY SPACE for Variable-Length Data: When working with fields that may have trailing spaces, use DELIMITED BY SPACE instead of DELIMITED BY SIZE to avoid copying unwanted spaces.
  • Handle Overflow Conditions: Always consider ON OVERFLOW for STRING operations to handle cases where the receiving field is too small. Check pointer values to determine how much data was actually copied.
  • Validate Delimiters: When using UNSTRING, consider what happens if expected delimiters are missing. Use COUNT IN to verify expected field lengths.
  • Initialize Pointer Fields: When using WITH POINTER, initialize the pointer field to 1 (or the desired starting position) before the STRING/UNSTRING statement.
  • Use TALLYING for Validation: Use TALLYING IN with UNSTRING to verify that the expected number of fields were parsed. Set the tally field to zero first, because UNSTRING adds to it rather than resetting it.
  • Test Edge Cases: Test with empty strings, missing delimiters, multiple consecutive delimiters, and data that exceeds field sizes.
  • Consider Performance: INSPECT examines every character of the field each time it runs. For large fields or frequent operations, inspect only the portion that holds data (for example, by using reference modification) instead of the whole field.
  • Document Complex Operations: Complex string processing logic can be hard to understand. Add comments explaining the purpose and expected format of data.
  • Use Meaningful Field Names: Name receiving fields, delimiters, and counters descriptively to make the code self-documenting.

Common String Processing Patterns

Pattern 1: Building Formatted Messages

cobol
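      *> ACCOUNT-BALANCE is assumed to be a numeric-edited or
      *> alphanumeric field so that it prints as readable text.
           STRING 'Hello, '             DELIMITED BY SIZE
                  USER-NAME             DELIMITED BY SPACE
                  '. Your balance is $' DELIMITED BY SIZE
                  ACCOUNT-BALANCE       DELIMITED BY SIZE
               INTO MESSAGE-LINE
           END-STRING.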

Pattern 2: Parsing Delimited Records

cobol
      *> TALLYING IN adds to FIELD-COUNT, so reset it first
           MOVE ZERO TO FIELD-COUNT.
           UNSTRING INPUT-RECORD DELIMITED BY '|'
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING.

           IF FIELD-COUNT NOT = 3
               DISPLAY 'ERROR: Invalid record format'
           END-IF.

Pattern 3: Data Cleaning

cobol
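      *> Replace leading spaces with zeros (INSPECT cannot remove
      *> characters; the field length never changes)
           INSPECT DATA-FIELD REPLACING LEADING SPACE BY ZERO.

      *> Replace tabs with spaces (X'05' is the EBCDIC tab;
      *> ASCII systems use X'09')
           INSPECT DATA-FIELD REPLACING ALL X'05' BY SPACE.

      *> Replace all spaces with underscores
           INSPECT DATA-FIELD REPLACING ALL SPACE BY '_'.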

Explain Like I'm 5: String Processing

Think of string processing like working with building blocks:

  • STRING is like gluing blocks together. You take separate blocks (like a first name block, a space block, and a last name block) and glue them together to make one big block (a full name).
  • UNSTRING is like taking apart a block tower. You have one big block with marks on it (delimiters like commas), and you break it apart at those marks to get separate smaller blocks (individual fields).
  • INSPECT is like painting or counting blocks. You can paint all the red blocks blue (REPLACING), count how many red blocks you have (TALLYING), or change blocks based on a pattern (CONVERTING).

So string processing is all about taking text apart, putting it together, changing it, and counting it—just like playing with building blocks, but with words and letters instead!

Practice Exercises

Complete these exercises to reinforce your understanding of string processing:

Exercise 1: Building a Formatted Address

Create a program that uses STRING to build a formatted address from separate fields: street number, street name, city, state, and zip code. Include commas and spaces in the appropriate places.

Exercise 2: Parsing a Date String

Create a program that uses UNSTRING to parse a date in the format "MM/DD/YYYY" into separate month, day, and year fields. Validate that you received three fields.

Exercise 3: Cleaning Input Data

Create a program that uses INSPECT to clean input data: replace leading spaces with zeros, replace tabs with spaces, and convert the entire string to uppercase.

Exercise 4: Phone Number Formatting

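Create a program that parses a phone number from the format "5551234567" (10 digits) and formats it as "(555) 123-4567". Because the input has no delimiters, use UNSTRING without a DELIMITED BY phrase (it then splits the source according to the size of each receiving field) to extract the parts, and STRING to rebuild them in the new format.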

Exercise 5: CSV Parser

Create a program that parses a comma-separated value (CSV) line with 5 fields, handles empty fields (consecutive commas), and displays each field. Use UNSTRING without ALL so that each empty field is preserved, then repeat the parse with ALL and compare the results to see how consecutive commas are collapsed.

Test Your Knowledge

1. What does the STRING statement do?

  • Splits a string into multiple parts
  • Concatenates multiple strings into one receiving field
  • Replaces characters in a string
  • Counts characters in a string

2. What is a delimiter used for in UNSTRING?

  • To mark the end of the source string
  • To mark where to split the source string into parts
  • To mark the beginning of the source string
  • To mark invalid characters

3. Which INSPECT phrase counts occurrences of characters?

  • REPLACING
  • TALLYING
  • CONVERTING
  • COUNTING

4. What happens when a STRING statement with an ON OVERFLOW phrase runs out of room in the receiving field?

  • The program terminates
  • The statement stops and executes the ON OVERFLOW phrase
  • The receiving field is cleared
  • An error message is displayed

5. What does INSPECT CONVERTING do?

  • Replaces specific patterns with other patterns
  • Performs character-by-character translation using a conversion table
  • Counts character occurrences
  • Splits strings into parts

6. In UNSTRING, what does the DELIMITED BY phrase specify?

  • The maximum length of receiving fields
  • The characters that mark where to split the source string
  • The minimum length of receiving fields
  • The characters to ignore
