String processing in COBOL involves manipulating character strings (text data) to build, parse, format, and transform text. COBOL provides powerful statements for string operations: STRING for concatenating strings, UNSTRING for parsing strings, and INSPECT for replacing, counting, and converting characters. Understanding string processing is essential for working with text data, building formatted output, parsing input records, and performing data transformations in mainframe COBOL applications.
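String processing refers to operations that manipulate sequences of characters (strings). In COBOL, strings are typically stored in alphanumeric (PIC X) fields. String processing operations include:
- Concatenating several strings into one (STRING)
- Parsing one string into separate fields (UNSTRING)
- Replacing, counting, and converting characters (INSPECT)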
These operations are fundamental for data formatting, input parsing, output generation, data cleaning, and text transformation tasks in COBOL programs.
The STRING statement concatenates (combines) multiple source strings or data items into a single receiving field. It allows you to build formatted output by combining literal strings with variable data.
```cobol
STRING source-1 DELIMITED BY {delimiter-1 | SIZE}
       source-2 DELIMITED BY {delimiter-2 | SIZE}
       ...
    INTO receiving-field
    [WITH POINTER pointer-field]
    [ON OVERFLOW imperative-statement]
END-STRING.
```
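Key components:
- source-1, source-2, ...: the literals or data items to concatenate, in order
- DELIMITED BY: how much of each source is copied. SIZE copies the whole field; a literal or data item copies characters up to (but not including) its first occurrence in the source
- INTO receiving-field: the field that receives the result. STRING leaves positions it does not write to unchanged
- WITH POINTER: a numeric field holding the position (starting at 1) where the next character is written; STRING advances it as it copies
- ON OVERFLOW: the statement to run when the receiving field is too small or the pointer is out of range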
```cobol
       WORKING-STORAGE SECTION.
       01  FIRST-NAME      PIC X(20) VALUE 'JOHN'.
       01  LAST-NAME       PIC X(20) VALUE 'SMITH'.
       01  FULL-NAME       PIC X(42).
       01  NAME-POINTER    PIC 9(4)  VALUE 1.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> STRING leaves unwritten positions unchanged: clear first
           MOVE SPACES TO FULL-NAME.
           STRING FIRST-NAME DELIMITED BY SIZE
                  ' '        DELIMITED BY SIZE
                  LAST-NAME  DELIMITED BY SIZE
               INTO FULL-NAME
               ON OVERFLOW
                   DISPLAY 'ERROR: Name too long'
           END-STRING.
           DISPLAY 'Full Name: ' FULL-NAME.
           STOP RUN.
```
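In this example:
- DELIMITED BY SIZE copies each source in full, so all 20 characters of FIRST-NAME (including its trailing spaces) are copied before the separator and LAST-NAME
- The literal ' ' adds one separating space between the two names
- ON OVERFLOW runs only if FULL-NAME cannot hold everything; here 20 + 1 + 20 = 41 characters fit in the 42-character field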
To avoid copying trailing spaces, use DELIMITED BY SPACE:
```cobol
           STRING FIRST-NAME DELIMITED BY SPACE
                  ' '        DELIMITED BY SIZE
                  LAST-NAME  DELIMITED BY SPACE
               INTO FULL-NAME
           END-STRING.
```
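DELIMITED BY SPACE copies characters up to (but not including) the first space, so only the name characters are transferred, not the trailing spaces. The characters written to FULL-NAME are "JOHN SMITH". STRING leaves the rest of the receiving field unchanged, so move SPACES to FULL-NAME beforehand if it may hold old data. A value with an embedded space, such as 'MARY ANN', is cut off at that space.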
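Because DELIMITED BY SPACE stops at the first space, a name that contains a space is truncated. One common workaround, sketched here, is to delimit by two consecutive spaces, on the assumption that real data never contains a double space. The program name TWOSPACE and the value 'MARY ANN' are illustrative and do not come from the original example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TWOSPACE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FIRST-NAME      PIC X(20) VALUE 'MARY ANN'.
       01  LAST-NAME       PIC X(20) VALUE 'SMITH'.
       01  FULL-NAME       PIC X(42).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Clear the target: STRING does not pad it with spaces
           MOVE SPACES TO FULL-NAME.
           *> A two-space delimiter keeps the single embedded space
           STRING FIRST-NAME DELIMITED BY '  '
                  ' '        DELIMITED BY SIZE
                  LAST-NAME  DELIMITED BY '  '
               INTO FULL-NAME
           END-STRING.
           DISPLAY 'Full Name: ' FULL-NAME.
           STOP RUN.
```

With these values, FULL-NAME holds 'MARY ANN SMITH' followed by spaces. If a source field contains no double space at all, the whole field is copied, trailing space included.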
The POINTER allows you to control where in the receiving field the concatenation starts:
```cobol
       WORKING-STORAGE SECTION.
       01  MESSAGE-TEXT    PIC X(80) VALUE 'Hello, '.
       01  USER-NAME       PIC X(20) VALUE 'JOHN'.
       01  MSG-POINTER     PIC 9(4)  VALUE 8.

       PROCEDURE DIVISION.
           STRING USER-NAME DELIMITED BY SPACE
               INTO MESSAGE-TEXT
               WITH POINTER MSG-POINTER
           END-STRING.
           DISPLAY MESSAGE-TEXT.
           *> Displays: "Hello, JOHN"
```
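In this example:
- MESSAGE-TEXT already contains 'Hello, ' in positions 1 through 7
- MSG-POINTER starts at 8, so STRING begins writing at position 8 and leaves the existing text intact
- After the statement, MSG-POINTER is 12, the next free position after 'JOHN'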
The UNSTRING statement parses (splits) a source string into multiple receiving fields based on delimiters. It's useful for breaking apart formatted data like comma-separated values or parsing dates.
```cobol
UNSTRING source-field
    DELIMITED BY [ALL] delimiter-1 [OR delimiter-2 ...]
    INTO receiving-field-1
             [DELIMITER IN delim-field-1] [COUNT IN count-field-1]
         receiving-field-2
             [DELIMITER IN delim-field-2] [COUNT IN count-field-2]
         ...
    [WITH POINTER pointer-field]
    [TALLYING IN tally-field]
    [ON OVERFLOW imperative-statement]
END-UNSTRING.
```
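Key components:
- DELIMITED BY: the character or string that separates fields in the source; ALL treats consecutive occurrences as one, and OR lists alternatives
- INTO: the receiving fields, filled in order
- DELIMITER IN: a field that receives the delimiter that ended the corresponding receiving field
- COUNT IN: a field that receives the number of characters examined for the corresponding receiving field, excluding the delimiter
- WITH POINTER: the position in the source (starting at 1) where scanning begins; it is updated as the source is scanned
- TALLYING IN: a counter that is incremented by the number of receiving fields acted upon; initialize it before the statement
- ON OVERFLOW: the statement to run when the source still has unexamined characters after all receiving fields are used, or when the pointer is out of range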
```cobol
       WORKING-STORAGE SECTION.
       01  INPUT-RECORD    PIC X(50) VALUE 'SMITH,JOHN,123 MAIN ST'.
       01  LAST-NAME       PIC X(20).
       01  FIRST-NAME      PIC X(20).
       *> ADDRESS is a reserved word, so the field is STREET-ADDRESS
       01  STREET-ADDRESS  PIC X(30).
       01  DELIM-1         PIC X.
       01  DELIM-2         PIC X.
       01  CHAR-COUNT-1    PIC 9(4).
       01  CHAR-COUNT-2    PIC 9(4).
       *> TALLYING adds to the current value, so start from zero
       01  FIELDS-FILLED   PIC 9(4)  VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING INPUT-RECORD
               DELIMITED BY ','
               INTO LAST-NAME
                        DELIMITER IN DELIM-1
                        COUNT IN CHAR-COUNT-1
                    FIRST-NAME
                        DELIMITER IN DELIM-2
                        COUNT IN CHAR-COUNT-2
                    STREET-ADDRESS
               TALLYING IN FIELDS-FILLED
           END-UNSTRING.

           DISPLAY 'Last Name: ' LAST-NAME.
           DISPLAY 'First Name: ' FIRST-NAME.
           DISPLAY 'Address: ' STREET-ADDRESS.
           DISPLAY 'Fields filled: ' FIELDS-FILLED.
           STOP RUN.
```
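In this example:
- The source is split at each comma: LAST-NAME receives 'SMITH', FIRST-NAME receives 'JOHN', and the third field receives '123 MAIN ST'
- DELIM-1 and DELIM-2 each receive ',', the delimiter that ended the field
- CHAR-COUNT-1 is 5 and CHAR-COUNT-2 is 4, the number of characters found before each delimiter
- TALLYING IN adds the number of receiving fields filled (3) to FIELDS-FILLED, so the counter must be zero before the statement runs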
You can specify multiple delimiters using OR:
```cobol
           UNSTRING INPUT-LINE
               DELIMITED BY ',' OR ';' OR SPACE
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
           END-UNSTRING.
```
This splits the source string on commas, semicolons, or spaces. UNSTRING will split on whichever delimiter it encounters first.
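When several delimiters are allowed, DELIMITER IN shows which one ended each field. The following is a minimal sketch; the program name MULTIDLM, the sample value 'A,B;C', and the field sizes are assumptions made for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MULTIDLM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-LINE      PIC X(20) VALUE 'A,B;C'.
       01  FIELD-1         PIC X(5).
       01  FIELD-2         PIC X(5).
       01  FIELD-3         PIC X(5).
       01  DELIM-1         PIC X.
       01  DELIM-2         PIC X.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Split on either a comma or a semicolon
           UNSTRING INPUT-LINE
               DELIMITED BY ',' OR ';'
               INTO FIELD-1 DELIMITER IN DELIM-1
                    FIELD-2 DELIMITER IN DELIM-2
                    FIELD-3
           END-UNSTRING.
           DISPLAY 'FIELD-1: ' FIELD-1 ' ended by ' DELIM-1.
           DISPLAY 'FIELD-2: ' FIELD-2 ' ended by ' DELIM-2.
           DISPLAY 'FIELD-3: ' FIELD-3.
           STOP RUN.
```

Here FIELD-1 receives 'A' with DELIM-1 set to ',', FIELD-2 receives 'B' with DELIM-2 set to ';', and FIELD-3 receives 'C'.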
The ALL keyword treats multiple consecutive delimiters as one:
```cobol
       WORKING-STORAGE SECTION.
       01  INPUT-DATA      PIC X(30) VALUE 'ONE,,,TWO'.
       01  FIELD-1         PIC X(10).
       01  FIELD-2         PIC X(10).

       PROCEDURE DIVISION.
           UNSTRING INPUT-DATA
               DELIMITED BY ALL ','
               INTO FIELD-1
                    FIELD-2
           END-UNSTRING.

           *> FIELD-1 receives "ONE"
           *> FIELD-2 receives "TWO"
           *> The three commas are treated as a single delimiter
```
Without ALL, each comma would create a separate empty field. With ALL, multiple consecutive commas are treated as a single delimiter.
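For comparison, the sketch below runs the same input without ALL. The program name NOALL, the two extra receiving fields, and the FIELDS-FILLED counter are illustrative additions rather than part of the example above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOALL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-DATA      PIC X(30) VALUE 'ONE,,,TWO'.
       01  FIELD-1         PIC X(10).
       01  FIELD-2         PIC X(10).
       01  FIELD-3         PIC X(10).
       01  FIELD-4         PIC X(10).
       01  FIELDS-FILLED   PIC 9(4)  VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> No ALL: every comma ends a field, even an empty one
           UNSTRING INPUT-DATA
               DELIMITED BY ','
               INTO FIELD-1 FIELD-2 FIELD-3 FIELD-4
               TALLYING IN FIELDS-FILLED
           END-UNSTRING.
           DISPLAY 'FIELD-1: [' FIELD-1 ']'.
           DISPLAY 'FIELD-2: [' FIELD-2 ']'.
           DISPLAY 'FIELD-3: [' FIELD-3 ']'.
           DISPLAY 'FIELD-4: [' FIELD-4 ']'.
           DISPLAY 'Fields filled: ' FIELDS-FILLED.
           STOP RUN.
```

FIELD-1 receives 'ONE', FIELD-2 and FIELD-3 are filled with spaces (the two empty fields between the commas), FIELD-4 receives 'TWO', and FIELDS-FILLED is 4. Use this form when empty fields carry meaning, as in CSV data.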
The INSPECT statement performs character-level operations on strings: replacing characters, counting occurrences, or converting characters using translation tables.
INSPECT REPLACING replaces characters or patterns within a string:
```cobol
INSPECT target-field
    REPLACING [ALL | LEADING | FIRST]
        old-char BY new-char
        [old-char-2 BY new-char-2 ...].
```
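Options:
- ALL: replaces every occurrence of old-char
- LEADING: replaces only the occurrences at the start of the field, stopping at the first character that does not match
- FIRST: replaces only the first occurrence

The old and new values must be the same length, so INSPECT can substitute characters but cannot insert or delete them.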
```cobol
       WORKING-STORAGE SECTION.
       *> Exact size: trailing spaces would be replaced too
       01  TEXT-FIELD      PIC X(11) VALUE 'HELLO WORLD'.
       01  PHONE-NUMBER    PIC X(14) VALUE '  555-123-4567'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Replace all spaces with underscores
           INSPECT TEXT-FIELD
               REPLACING ALL SPACE BY '_'.
           DISPLAY TEXT-FIELD.
           *> Displays: "HELLO_WORLD"

           *> Replace only leading spaces
           INSPECT PHONE-NUMBER
               REPLACING LEADING SPACE BY '0'.
           DISPLAY PHONE-NUMBER.
           *> Displays: "00555-123-4567" (leading spaces replaced)

           *> Replace first occurrence
           INSPECT TEXT-FIELD
               REPLACING FIRST '_' BY ' '.
           DISPLAY TEXT-FIELD.
           *> Displays: "HELLO WORLD" (first underscore replaced)

           STOP RUN.
```
INSPECT TALLYING counts occurrences of characters:
```cobol
INSPECT target-field
    TALLYING counter-field
        FOR [ALL | LEADING | CHARACTERS]
            [char-1 | char-2 ...].
```
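Options:
- ALL: counts every occurrence of the specified character or string
- LEADING: counts only the occurrences at the start of the field
- CHARACTERS: counts every character position in the field, so no character is specified

TALLYING adds to the counter rather than resetting it, so initialize the counter before each INSPECT.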
```cobol
       WORKING-STORAGE SECTION.
       *> Exact size: trailing spaces would be counted too
       01  TEXT-LINE       PIC X(19) VALUE 'THE QUICK BROWN FOX'.
       01  SPACE-COUNT     PIC 9(4)  VALUE ZERO.
       01  CHAR-COUNT      PIC 9(4)  VALUE ZERO.
       01  E-COUNT         PIC 9(4)  VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Count all spaces
           INSPECT TEXT-LINE
               TALLYING SPACE-COUNT
               FOR ALL SPACE.
           DISPLAY 'Spaces: ' SPACE-COUNT.
           *> Displays: "Spaces: 0003"

           *> Count all characters
           INSPECT TEXT-LINE
               TALLYING CHAR-COUNT
               FOR CHARACTERS.
           DISPLAY 'Characters: ' CHAR-COUNT.
           *> Displays: "Characters: 0019"

           *> Count all 'E' characters
           INSPECT TEXT-LINE
               TALLYING E-COUNT
               FOR ALL 'E'.
           DISPLAY 'E characters: ' E-COUNT.
           *> Displays: "E characters: 0001"

           STOP RUN.
```
INSPECT CONVERTING performs character-by-character translation using a conversion table:
```cobol
INSPECT target-field
    CONVERTING from-chars TO to-chars.
```
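Each character in the target field is looked up in the from-chars string. If it is found at position N, it is replaced with the character at position N in to-chars; characters that do not appear in from-chars are left unchanged. The two strings must be the same length.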
```cobol
       WORKING-STORAGE SECTION.
       01  UPPER-TEXT      PIC X(20) VALUE 'HELLO WORLD'.
       01  LOWER-TEXT      PIC X(20) VALUE 'hello world'.
       01  MIXED-TEXT      PIC X(20) VALUE 'HeLLo WoRLd'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Convert to lowercase
           INSPECT UPPER-TEXT
               CONVERTING 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               TO 'abcdefghijklmnopqrstuvwxyz'.
           DISPLAY UPPER-TEXT.
           *> Displays: "hello world"

           *> Convert to uppercase
           INSPECT LOWER-TEXT
               CONVERTING 'abcdefghijklmnopqrstuvwxyz'
               TO 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
           DISPLAY LOWER-TEXT.
           *> Displays: "HELLO WORLD"

           *> Convert mixed case
           INSPECT MIXED-TEXT
               CONVERTING 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               TO 'abcdefghijklmnopqrstuvwxyz'.
           DISPLAY MIXED-TEXT.
           *> Displays: "hello world"

           STOP RUN.
```
String processing statements can be combined to perform complex operations:
```cobol
       WORKING-STORAGE SECTION.
       01  INPUT-RECORD    PIC X(50) VALUE 'SMITH,JOHN,555-123-4567'.
       01  LAST-NAME       PIC X(20).
       01  FIRST-NAME      PIC X(20).
       01  PHONE-RAW       PIC X(15).
       *> Opening parenthesis is preset; STRING fills positions 2-14
       01  PHONE-FORMATTED PIC X(14) VALUE '('.
       01  AREA-CODE       PIC X(3).
       01  EXCHANGE        PIC X(3).
       *> NUMBER is a reserved word, so the field is LINE-NUM
       01  LINE-NUM        PIC X(4).
       01  PHONE-PTR       PIC 9(4).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Step 1: Parse the comma-separated input
           UNSTRING INPUT-RECORD
               DELIMITED BY ','
               INTO LAST-NAME
                    FIRST-NAME
                    PHONE-RAW
           END-UNSTRING.

           *> Step 2: Parse phone number (format: 555-123-4567)
           UNSTRING PHONE-RAW
               DELIMITED BY '-'
               INTO AREA-CODE
                    EXCHANGE
                    LINE-NUM
           END-UNSTRING.

           *> Step 3: Replace dashes in the raw value with spaces
           *> (INSPECT substitutes characters; it cannot delete them)
           INSPECT PHONE-RAW
               REPLACING ALL '-' BY SPACE.

           *> Step 4: Build formatted phone number: (555) 123-4567
           MOVE 2 TO PHONE-PTR.
           STRING AREA-CODE DELIMITED BY SIZE
                  ') '      DELIMITED BY SIZE
                  EXCHANGE  DELIMITED BY SIZE
                  '-'       DELIMITED BY SIZE
                  LINE-NUM  DELIMITED BY SIZE
               INTO PHONE-FORMATTED
               WITH POINTER PHONE-PTR
           END-STRING.

           DISPLAY 'Name: ' FIRST-NAME ' ' LAST-NAME.
           DISPLAY 'Phone: ' PHONE-FORMATTED.
           STOP RUN.
```
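This example demonstrates:
- UNSTRING splitting a comma-separated record into name and phone fields
- A second UNSTRING splitting the phone number on dashes
- INSPECT REPLACING substituting characters in the raw phone value
- STRING WITH POINTER writing the pieces into a receiving field that already holds part of the output format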
Follow these best practices for effective string processing:
Use STRING to build messages from literals and variable data:

```cobol
           STRING 'Hello, '             DELIMITED BY SIZE
                  USER-NAME             DELIMITED BY SPACE
                  '. Your balance is $' DELIMITED BY SIZE
                  ACCOUNT-BALANCE       DELIMITED BY SIZE
               INTO MESSAGE-LINE
           END-STRING.
```
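Validate the number of fields that UNSTRING fills. TALLYING IN adds to the counter, so reset it first:

```cobol
           MOVE ZERO TO FIELD-COUNT.
           UNSTRING INPUT-RECORD
               DELIMITED BY '|'
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING.

           IF FIELD-COUNT NOT = 3
               DISPLAY 'ERROR: Invalid record format'
           END-IF.
```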
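Use INSPECT to clean data in place. INSPECT substitutes characters one for one; it does not delete them or shorten the field:

```cobol
           *> Replace leading spaces with zeros
           INSPECT DATA-FIELD
               REPLACING LEADING SPACE BY ZERO.

           *> Replace tabs with spaces (X'05' is the EBCDIC tab;
           *> on ASCII systems the tab is X'09')
           INSPECT DATA-FIELD
               REPLACING ALL X'05' BY SPACE.

           *> Replace all spaces with underscores
           INSPECT DATA-FIELD
               REPLACING ALL SPACE BY '_'.
```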
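Think of string processing like working with building blocks:
- STRING snaps blocks together to make something bigger
- UNSTRING takes a big piece apart into smaller blocks
- INSPECT swaps blocks for different ones, or counts how many of a kind you have

So string processing is all about taking text apart, putting it together, changing it, and counting it, just like playing with building blocks, but with words and letters instead!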
Complete these exercises to reinforce your understanding of string processing:
Create a program that uses STRING to build a formatted address from separate fields: street number, street name, city, state, and zip code. Include commas and spaces in the appropriate places.
Create a program that uses UNSTRING to parse a date in the format "MM/DD/YYYY" into separate month, day, and year fields. Validate that you received three fields.
Create a program that uses INSPECT to clean input data: remove leading spaces, replace tabs with spaces, and convert the entire string to uppercase.
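Create a program that parses a phone number from the format "5551234567" (10 digits) and formats it as "(555) 123-4567" using UNSTRING to extract parts and STRING to rebuild in the new format. Hint: the input has no delimiters, so omit the DELIMITED BY phrase and let UNSTRING fill each receiving field according to its size.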
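Create a program that parses a comma-separated value (CSV) line with 5 fields, handles empty fields (consecutive commas), and displays each field. Use DELIMITED BY ',' without ALL so that each empty field is preserved, then rerun with DELIMITED BY ALL ',' to see how consecutive commas collapse into a single delimiter.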
1. What does the STRING statement do?
2. What is a delimiter used for in UNSTRING?
3. Which INSPECT phrase counts occurrences of characters?
4. What does the ON OVERFLOW phrase do in a STRING statement?
5. What does INSPECT CONVERTING do?
6. In UNSTRING, what does the DELIMITED BY phrase specify?