What is text processing in COBOL?

Text processing in COBOL involves parsing, normalizing, validating, and transforming text data. It includes operations like splitting delimited strings (CSV, pipe-delimited), normalizing case and whitespace, validating data formats, cleaning input data, and converting between different text formats. Text processing is essential for handling user input, parsing file records, and preparing data for storage or display.

How do you parse delimited data in COBOL?

You parse delimited data in COBOL using the UNSTRING statement. UNSTRING splits a source string into multiple receiving fields based on delimiter characters. For example, to parse CSV data: UNSTRING CSV-LINE DELIMITED BY ',' INTO FIELD-1 FIELD-2 FIELD-3. You can specify multiple delimiters using OR, and use ALL to treat consecutive delimiters as one. The TALLYING IN clause counts how many fields were filled, and COUNT IN stores the length of each parsed field.

What is text normalization in COBOL?

Text normalization in COBOL is the process of converting text data to a consistent format. This includes converting case (uppercase/lowercase), trimming leading and trailing spaces, removing or standardizing whitespace, and handling special characters. Normalization ensures data consistency for comparisons, storage, and processing. Common normalization operations include using FUNCTION UPPER-CASE or FUNCTION LOWER-CASE, locating the significant content of a fixed-length field with INSPECT TALLYING and reference modification (or FUNCTION TRIM where the compiler provides it), and using INSPECT to replace unwanted characters.

How do you validate text data in COBOL?

You validate text data in COBOL by checking format, length, content, and structure. Common validation techniques include: checking content length using FUNCTION LENGTH together with FUNCTION TRIM (FUNCTION LENGTH alone returns the declared size of a fixed-length field), verifying character classes with the NUMERIC and ALPHABETIC class tests, checking for required delimiters or patterns, validating date/time formats, ensuring required fields are not empty, and checking data ranges. You can use INSPECT TALLYING to count specific characters, UNSTRING with TALLYING to verify field counts, and conditional statements to check data validity.

How do you process CSV data in COBOL?

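To process CSV data in COBOL, use UNSTRING to split comma-separated values. Use DELIMITED BY ',' (without ALL) so that an empty field (two consecutive commas) produces a space-filled receiving field; DELIMITED BY ALL ',' collapses consecutive commas into one delimiter and shifts the remaining values into the wrong fields. Use TALLYING IN to count fields and validate the expected number, and ON OVERFLOW to detect extra fields. UNSTRING does not understand quoted fields, so a quoted value that contains a comma needs additional handling. Example: UNSTRING CSV-LINE DELIMITED BY ',' INTO FIELD-1 FIELD-2 FIELD-3 TALLYING IN FIELD-COUNT. Always validate that the field count matches expectations and handle edge cases like missing fields or extra commas.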

How do you trim spaces from text in COBOL?

COBOL alphanumeric fields are fixed-length, so trailing spaces are padding rather than data. Moving a field to a smaller PIC X field does not trim it: the move cuts off the rightmost characters whether or not they are spaces. To work with the trimmed content, count the trailing spaces (for example with INSPECT FUNCTION REVERSE(SOURCE-FIELD) TALLYING a counter FOR LEADING SPACE) and use reference modification such as SOURCE-FIELD(1:CONTENT-LEN). To trim leading spaces, count them with INSPECT TALLYING ... FOR LEADING SPACE and move SOURCE-FIELD(START-POS:) to the target. You can also use FUNCTION TRIM if available in your COBOL implementation; it removes leading spaces, trailing spaces, or both.

How do you convert text case in COBOL?

You convert text case in COBOL using intrinsic functions: FUNCTION UPPER-CASE(identifier) converts to uppercase, and FUNCTION LOWER-CASE(identifier) converts to lowercase. These functions handle the entire string and preserve non-alphabetic characters. Alternatively, you can use INSPECT CONVERTING with conversion tables, which changes the field in place; the intrinsic functions are usually simpler to read. Example: MOVE FUNCTION UPPER-CASE(USER-INPUT) TO UPPER-FIELD. The functions return a new value and leave their argument unchanged, so move the result to a new field or back to the same field to apply the conversion.

What is the difference between text processing and string processing in COBOL?

Text processing focuses on parsing, normalizing, and validating formatted text data like CSV, delimited records, and user input. It emphasizes data extraction, cleaning, and validation. String processing focuses on manipulating strings using STRING (concatenation), UNSTRING (parsing), and INSPECT (replacement, counting, conversion). While there's overlap, text processing is more about working with structured text data formats, while string processing is about general string manipulation operations.

COBOL Text Processing

Text processing in COBOL involves parsing, normalizing, validating, and transforming text data to prepare it for use in your programs. Unlike basic string manipulation, text processing focuses on working with structured text formats like CSV files, delimited records, user input, and formatted data. Understanding text processing is essential for handling real-world data that comes in various formats, cleaning input data, validating user entries, and converting between different text representations in mainframe COBOL applications.

What is Text Processing?

Text processing encompasses operations that work with formatted or structured text data. Key text processing operations include:

Parsing: Splitting delimited strings (CSV, pipe-delimited, tab-delimited) into individual fields
Normalization: Converting text to consistent formats (case, whitespace, encoding)
Validation: Checking data format, length, content, and structure
Cleaning: Removing unwanted characters, trimming spaces, standardizing formats
Transformation: Converting between different text formats and representations
Tokenization: Breaking text into tokens or words based on delimiters

These operations are fundamental for processing user input, parsing file records, handling data imports, validating forms, and preparing data for storage or display in business applications.

Parsing Delimited Data

One of the most common text processing tasks is parsing delimited data, where fields are separated by specific characters like commas, pipes, or tabs. This is essential for processing CSV files, log files, and formatted input records.

Parsing Comma-Separated Values (CSV)

CSV (Comma-Separated Values) is a common format where fields are separated by commas. Here's how to parse CSV data:

cobol

       WORKING-STORAGE SECTION.
       01  CSV-LINE          PIC X(200).
       01  CUSTOMER-ID       PIC X(10).
       01  CUSTOMER-NAME     PIC X(30).
       01  CITY              PIC X(20).
       01  STATE             PIC X(2).
       01  ZIP-CODE          PIC X(10).
       01  FIELD-COUNT       PIC 9(4) VALUE ZERO.
       01  CSV-POINTER       PIC 9(4) VALUE 1.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE '12345,JOHN SMITH,AUSTIN,TX,78701' TO CSV-LINE
           
           UNSTRING CSV-LINE
           DELIMITED BY ','
           INTO CUSTOMER-ID
                CUSTOMER-NAME
                CITY
                STATE
                ZIP-CODE
           WITH POINTER CSV-POINTER
           TALLYING IN FIELD-COUNT
           END-UNSTRING
           
           DISPLAY 'Customer ID: ' CUSTOMER-ID
           DISPLAY 'Name: ' CUSTOMER-NAME
           DISPLAY 'City: ' CITY
           DISPLAY 'State: ' STATE
           DISPLAY 'Zip: ' ZIP-CODE
           DISPLAY 'Fields parsed: ' FIELD-COUNT
           
           STOP RUN.

In this example:

CSV-LINE contains the comma-separated data: "12345,JOHN SMITH,AUSTIN,TX,78701"
DELIMITED BY ',' tells UNSTRING to split on commas
INTO specifies the receiving fields where parsed data goes
TALLYING IN FIELD-COUNT counts how many fields were successfully filled (should be 5)
WITH POINTER CSV-POINTER tracks the current position in the source string

After execution, CUSTOMER-ID contains "12345", CUSTOMER-NAME contains "JOHN SMITH", CITY contains "AUSTIN", STATE contains "TX", ZIP-CODE contains "78701", and FIELD-COUNT contains 5.
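The COUNT IN phrase can be attached to a receiving field to record how many characters were found before the delimiter. The following is a minimal sketch, assuming a compiler that accepts COBOL 2002-style syntax; the program name and the ID-LEN and NAME-LEN counters are illustrative names, not part of the example above. Because COUNT IN reports the length found in the source rather than the length stored, it can also reveal that a value was too long for its receiving field.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVCOUNT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-LINE          PIC X(200).
       01  CUSTOMER-ID       PIC X(10).
       01  CUSTOMER-NAME     PIC X(30).
      *> Hypothetical counters filled by COUNT IN
       01  ID-LEN            PIC 9(4) VALUE ZERO.
       01  NAME-LEN          PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE '12345,JOHN SMITH,AUSTIN' TO CSV-LINE

           *> Each COUNT IN follows the receiving field it measures
           UNSTRING CSV-LINE
               DELIMITED BY ','
               INTO CUSTOMER-ID   COUNT IN ID-LEN
                    CUSTOMER-NAME COUNT IN NAME-LEN
           END-UNSTRING

           DISPLAY 'ID length:   ' ID-LEN
           DISPLAY 'Name length: ' NAME-LEN

           EVALUATE TRUE
               WHEN NAME-LEN = 0
                   DISPLAY 'Name is empty'
               WHEN NAME-LEN > FUNCTION LENGTH(CUSTOMER-NAME)
                   DISPLAY 'ERROR: name was truncated'
               WHEN OTHER
                   DISPLAY 'Name: [' CUSTOMER-NAME(1:NAME-LEN) ']'
           END-EVALUATE
           STOP RUN.
```

With this input the counters hold 5 and 10. Only two receiving fields are listed, so the third value (AUSTIN) is simply left unread.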

Handling Empty Fields in CSV

CSV data often contains empty fields (consecutive commas). Use DELIMITED BY ',' without ALL so that every comma ends a field:

cobol

       WORKING-STORAGE SECTION.
       01  CSV-LINE          PIC X(200) VALUE '12345,,AUSTIN,TX,78701'.
       01  CUSTOMER-ID       PIC X(10).
       01  MIDDLE-NAME       PIC X(20).
       01  CITY              PIC X(20).
       01  STATE             PIC X(2).
       01  ZIP-CODE          PIC X(10).
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING CSV-LINE
           DELIMITED BY ','
           INTO CUSTOMER-ID
                MIDDLE-NAME
                CITY
                STATE
                ZIP-CODE
           END-UNSTRING
           
           *> MIDDLE-NAME is filled with spaces for the empty field
           DISPLAY 'ID: ' CUSTOMER-ID
           DISPLAY 'Middle: [' MIDDLE-NAME ']'
           DISPLAY 'City: ' CITY
           
           STOP RUN.

With DELIMITED BY ',', two consecutive commas mark an empty field, and UNSTRING fills the corresponding receiving field with spaces. DELIMITED BY ALL ',' treats a run of consecutive commas as a single delimiter, so the empty field disappears and every later value shifts one field to the left (here AUSTIN would land in MIDDLE-NAME). Reserve ALL for delimiters where repeats carry no meaning, such as runs of spaces between words.
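The effect of ALL on consecutive commas is easiest to see side by side. This is a small sketch with made-up field names (F1, F2, F3) and a short sample line; the receiving fields are pre-filled with question marks so that a field UNSTRING never touches stays visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVEMPTY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-LINE          PIC X(40) VALUE 'A,,C'.
       01  F1                PIC X(5).
       01  F2                PIC X(5).
       01  F3                PIC X(5).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Plain comma: ',,' produces a space-filled F2
           MOVE ALL '?' TO F1 F2 F3
           UNSTRING CSV-LINE DELIMITED BY ','
               INTO F1 F2 F3
           END-UNSTRING
           DISPLAY 'Without ALL: [' F1 '] [' F2 '] [' F3 ']'

           *> ALL: ',,' counts as one delimiter, so C moves into F2
           *> and F3 is never filled
           MOVE ALL '?' TO F1 F2 F3
           UNSTRING CSV-LINE DELIMITED BY ALL ','
               INTO F1 F2 F3
           END-UNSTRING
           DISPLAY 'With ALL:    [' F1 '] [' F2 '] [' F3 ']'
           STOP RUN.
```

The first DISPLAY shows `[A    ] [     ] [C    ]`, while the second shows `[A    ] [C    ] [?????]`: the empty field is lost and the last receiving field keeps its old contents.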

Parsing Pipe-Delimited Data

Pipe-delimited format uses the pipe character (|) as a separator. It's common in data exchange formats:

cobol

       WORKING-STORAGE SECTION.
       01  PIPE-LINE        PIC X(200)
               VALUE '12345|JOHN SMITH|AUSTIN|TX|78701'.
       01  CUSTOMER-ID      PIC X(10).
       01  CUSTOMER-NAME    PIC X(30).
       01  CITY             PIC X(20).
       01  STATE            PIC X(2).
       01  ZIP-CODE         PIC X(10).
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING PIPE-LINE
           DELIMITED BY '|'
           INTO CUSTOMER-ID
                CUSTOMER-NAME
                CITY
                STATE
                ZIP-CODE
           END-UNSTRING
           
           DISPLAY 'Parsed pipe-delimited data'
           STOP RUN.

Parsing with Multiple Delimiters

You can parse data that uses multiple possible delimiters:

cobol

       WORKING-STORAGE SECTION.
       01  INPUT-LINE       PIC X(200)
               VALUE 'ID=12345|NAME=JOHN|CITY=AUSTIN'.
       01  ID-FIELD         PIC X(10).
       01  NAME-FIELD       PIC X(30).
       01  CITY-FIELD       PIC X(20).
       01  FILLER-FIELD     PIC X(20).
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Parse alternating label=value pairs
           UNSTRING INPUT-LINE
           DELIMITED BY '=' OR '|'
           INTO FILLER-FIELD  *> Skip "ID"
                ID-FIELD      *> Get "12345"
                FILLER-FIELD  *> Skip "NAME"
                NAME-FIELD    *> Get "JOHN"
                FILLER-FIELD  *> Skip "CITY"
                CITY-FIELD    *> Get "AUSTIN"
           END-UNSTRING
           
           DISPLAY 'ID: ' ID-FIELD
           DISPLAY 'Name: ' NAME-FIELD
           DISPLAY 'City: ' CITY-FIELD
           
           STOP RUN.

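This pattern alternates between labels and values. By reusing a throwaway field for labels (FILLER-FIELD here is an ordinary data name; the FILLER keyword itself cannot be referenced in a statement) and dedicated fields for values, you can extract just the data you need from formatted input like "KEY=VALUE|KEY=VALUE". The approach assumes the keys always arrive in the same order, because the labels are discarded rather than checked.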
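When the number of pairs is not known in advance, one option is to pull a single token per pass and let DELIMITER IN report which delimiter ended it. The sketch below is illustrative only: TOKEN, TOKEN-DELIM, and SCAN-PTR are assumed names, and ALL SPACE is added as a third delimiter so the trailing padding of the line is consumed in one step.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KVDELIM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-LINE        PIC X(40) VALUE 'ID=12345|NAME=JOHN'.
       01  TOKEN             PIC X(20).
       01  TOKEN-DELIM       PIC X.
       01  SCAN-PTR          PIC 9(4) VALUE 1.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> One token per pass. DELIMITER IN records what ended
           *> it: '=' after a key, '|' or a space after a value.
           PERFORM UNTIL SCAN-PTR > FUNCTION LENGTH(INPUT-LINE)
               MOVE SPACES TO TOKEN
               MOVE SPACE  TO TOKEN-DELIM
               UNSTRING INPUT-LINE
                   DELIMITED BY '=' OR '|' OR ALL SPACE
                   INTO TOKEN DELIMITER IN TOKEN-DELIM
                   WITH POINTER SCAN-PTR
               END-UNSTRING
               IF TOKEN NOT = SPACES
                   DISPLAY 'Token [' TOKEN '] ended by ['
                           TOKEN-DELIM ']'
               END-IF
           END-PERFORM
           STOP RUN.
```

WITH POINTER carries the scan position from one UNSTRING to the next, so each pass resumes where the previous one stopped. A token ended by '=' can be treated as a key and the token that follows as its value.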

Text Normalization

Text normalization converts text data to a consistent format, making it easier to compare, search, and process. Common normalization operations include case conversion, whitespace handling, and character standardization.

Case Normalization

Converting text to consistent case (uppercase or lowercase) ensures comparisons work correctly:

cobol

       WORKING-STORAGE SECTION.
       01  USER-INPUT       PIC X(30) VALUE 'John Smith'.
       01  UPPER-NAME       PIC X(30).
       01  LOWER-NAME       PIC X(30).
       01  SEARCH-NAME      PIC X(30) VALUE 'JOHN SMITH'.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Convert to uppercase
           MOVE FUNCTION UPPER-CASE(USER-INPUT) TO UPPER-NAME
           
           *> Convert to lowercase
           MOVE FUNCTION LOWER-CASE(USER-INPUT) TO LOWER-NAME
           
           DISPLAY 'Original: ' USER-INPUT
           DISPLAY 'Uppercase: ' UPPER-NAME
           DISPLAY 'Lowercase: ' LOWER-NAME
           
           *> Now comparisons work correctly
           IF UPPER-NAME = SEARCH-NAME
               DISPLAY 'Match found!'
           END-IF
           
           STOP RUN.

FUNCTION UPPER-CASE converts all alphabetic characters to uppercase while preserving numbers, spaces, and special characters. FUNCTION LOWER-CASE does the opposite. These functions are essential for case-insensitive comparisons and data standardization.
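On compilers without the intrinsic functions, or when the conversion should happen in place, INSPECT CONVERTING is an alternative. A minimal sketch follows; the two alphabet tables are assumed helper fields introduced for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CASECONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  USER-INPUT        PIC X(30) VALUE 'John Smith'.
      *> Hypothetical conversion tables, matched position by position
       01  LOWER-ALPHABET    PIC X(26)
               VALUE 'abcdefghijklmnopqrstuvwxyz'.
       01  UPPER-ALPHABET    PIC X(26)
               VALUE 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Each character found in the first table is replaced
           *> by the character at the same position in the second.
           INSPECT USER-INPUT
               CONVERTING LOWER-ALPHABET TO UPPER-ALPHABET
           DISPLAY 'Converted: [' USER-INPUT ']'
           STOP RUN.
```

Unlike FUNCTION UPPER-CASE, this overwrites USER-INPUT itself, so copy the field first if the original value is still needed.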

Trimming Whitespace

Removing leading and trailing spaces normalizes text fields:

Trimming Trailing Spaces

cobol

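       WORKING-STORAGE SECTION.
       01  SOURCE-FIELD     PIC X(30) VALUE 'JOHN SMITH          '.
       01  TRAILING-COUNT   PIC 9(4) VALUE ZERO.
       01  CONTENT-LEN      PIC 9(4) VALUE ZERO.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Trailing spaces become leading spaces once reversed
           INSPECT FUNCTION REVERSE(SOURCE-FIELD)
               TALLYING TRAILING-COUNT
               FOR LEADING SPACE
           
           COMPUTE CONTENT-LEN =
               FUNCTION LENGTH(SOURCE-FIELD) - TRAILING-COUNT
           
           DISPLAY 'Source: [' SOURCE-FIELD ']'
           IF CONTENT-LEN > 0
               DISPLAY 'Trimmed: [' SOURCE-FIELD(1:CONTENT-LEN) ']'
           ELSE
               DISPLAY 'Trimmed: []'
           END-IF
           
           STOP RUN.

COBOL alphanumeric fields are fixed-length, so a field always holds its full declared size and trailing spaces are padding. Moving a field to a smaller PIC X field does not trim it: the move cuts off the rightmost characters whether or not they are spaces, and the target is still space-padded. To work with just the content, count the trailing spaces and use reference modification. Here CONTENT-LEN is 10, so SOURCE-FIELD(1:CONTENT-LEN) yields "JOHN SMITH".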

Trimming Leading Spaces

cobol

       WORKING-STORAGE SECTION.
       01  SOURCE-FIELD     PIC X(30) VALUE '     JOHN SMITH'.
       01  TRIMMED-FIELD    PIC X(30) VALUE SPACES.
       01  SPACE-COUNT      PIC 9(4) VALUE ZERO.
       01  START-POS        PIC 9(4).
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Count leading spaces
           INSPECT SOURCE-FIELD
               TALLYING SPACE-COUNT
               FOR LEADING SPACE
           
           *> Calculate starting position (1-based)
           COMPUTE START-POS = SPACE-COUNT + 1
           
           *> Copy from the first non-space character onward;
           *> skip the move when the field is all spaces
           IF START-POS <= FUNCTION LENGTH(SOURCE-FIELD)
               MOVE SOURCE-FIELD(START-POS:) TO TRIMMED-FIELD
           END-IF
           
           DISPLAY 'Original: [' SOURCE-FIELD ']'
           DISPLAY 'Trimmed: [' TRIMMED-FIELD ']'
           
           STOP RUN.

This approach uses INSPECT to count leading spaces, then reference modification to copy everything from the first non-space character onward; the MOVE pads TRIMMED-FIELD with spaces on the right. UNSTRING with DELIMITED BY ALL SPACE is not suitable here, because it stops at the first embedded space and would return only "JOHN".
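Where the compiler provides FUNCTION TRIM, the counting steps can be replaced by a single function call. The sketch below assumes such a compiler; CONTENT-LEN is an illustrative name. The optional LEADING and TRAILING keywords restrict trimming to one side.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRIMDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SOURCE-FIELD      PIC X(30) VALUE '     JOHN SMITH     '.
       01  CONTENT-LEN       PIC 9(4).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> No keyword: remove spaces from both ends
           DISPLAY 'Both:     ['
                   FUNCTION TRIM(SOURCE-FIELD) ']'
           *> LEADING / TRAILING: remove from one end only
           DISPLAY 'Leading:  ['
                   FUNCTION TRIM(SOURCE-FIELD LEADING) ']'
           DISPLAY 'Trailing: ['
                   FUNCTION TRIM(SOURCE-FIELD TRAILING) ']'

           *> Length of the content without surrounding spaces
           COMPUTE CONTENT-LEN =
               FUNCTION LENGTH(FUNCTION TRIM(SOURCE-FIELD))
           DISPLAY 'Content length: ' CONTENT-LEN
           STOP RUN.
```

The trimmed value exists only as a function result. Moving it into a PIC X(30) field pads it with spaces again, so the practical benefits are left-justified content and an accurate content length (10 in this case).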

Standardizing Whitespace

Normalize multiple spaces to single spaces:

cobol

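       WORKING-STORAGE SECTION.
       01  TEXT-FIELD       PIC X(50) VALUE 'JOHN    SMITH     DOE'.
       01  NORMALIZED-FIELD PIC X(50) VALUE SPACES.
       01  IN-POS           PIC 9(4).
       01  OUT-POS          PIC 9(4) VALUE ZERO.
       01  PREV-CHAR        PIC X VALUE SPACE.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> INSPECT REPLACING needs equal-length operands, so it
           *> cannot shrink two spaces to one. Copy character by
           *> character and skip a space that follows a space.
           PERFORM VARYING IN-POS FROM 1 BY 1
                   UNTIL IN-POS > FUNCTION LENGTH(TEXT-FIELD)
               IF TEXT-FIELD(IN-POS:1) NOT = SPACE
                  OR PREV-CHAR NOT = SPACE
                   ADD 1 TO OUT-POS
                   MOVE TEXT-FIELD(IN-POS:1)
                     TO NORMALIZED-FIELD(OUT-POS:1)
               END-IF
               MOVE TEXT-FIELD(IN-POS:1) TO PREV-CHAR
           END-PERFORM
           
           DISPLAY 'Original: [' TEXT-FIELD ']'
           DISPLAY 'Normalized: [' NORMALIZED-FIELD ']'
           
           STOP RUN.

INSPECT REPLACING cannot do this job, because the replacement must be the same length as the text it replaces. Instead, the loop copies the field one character at a time and skips any space that immediately follows another space. The result is text with normalized spacing: "JOHN SMITH DOE" instead of "JOHN    SMITH     DOE". Because PREV-CHAR starts as a space, leading spaces are dropped as well.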

Text Validation

Validating text data ensures it meets expected formats, lengths, and content requirements before processing or storage.

Validating Field Counts

When parsing delimited data, verify you received the expected number of fields:

cobol

       WORKING-STORAGE SECTION.
       01  CSV-LINE         PIC X(200).
       01  FIELD-1          PIC X(20).
       01  FIELD-2          PIC X(20).
       01  FIELD-3          PIC X(20).
       01  FIELD-COUNT      PIC 9(4) VALUE ZERO.
       01  EXPECTED-COUNT   PIC 9(4) VALUE 3.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 'VALUE1,VALUE2,VALUE3' TO CSV-LINE
           
           UNSTRING CSV-LINE
           DELIMITED BY ','
           INTO FIELD-1
                FIELD-2
                FIELD-3
           TALLYING IN FIELD-COUNT
           END-UNSTRING
           
           IF FIELD-COUNT NOT = EXPECTED-COUNT
               DISPLAY 'ERROR: Expected ' EXPECTED-COUNT 
                       ' fields, got ' FIELD-COUNT
               STOP RUN
           END-IF
           
           DISPLAY 'Validation passed: ' FIELD-COUNT ' fields'
           STOP RUN.

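TALLYING IN adds one to the counter for each receiving field that UNSTRING acts on, so set the counter to zero before each UNSTRING. If the source has fewer fields than expected, the count is lower and the unused receiving fields keep their previous contents. If the source has more fields than there are receiving fields, the count still equals the number of receiving fields, so the ON OVERFLOW phrase is needed to catch the extra data. Always validate that the field count matches expectations to catch malformed input.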
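A field count alone cannot reveal that the input held more values than there were receiving fields. One way to catch that case is the ON OVERFLOW phrase, sketched here with a sample line that deliberately contains a fourth value; the program name and sample data are assumptions made for the illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVOVFL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-LINE          PIC X(40) VALUE 'V1,V2,V3,V4'.
       01  FIELD-1           PIC X(20).
       01  FIELD-2           PIC X(20).
       01  FIELD-3           PIC X(20).
       01  FIELD-COUNT       PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Clear old contents and reset the tally before parsing
           MOVE SPACES TO FIELD-1 FIELD-2 FIELD-3
           MOVE ZERO TO FIELD-COUNT

           *> ON OVERFLOW runs when every receiving field has been
           *> used and part of the source is still unexamined
           UNSTRING CSV-LINE
               DELIMITED BY ','
               INTO FIELD-1 FIELD-2 FIELD-3
               TALLYING IN FIELD-COUNT
               ON OVERFLOW
                   DISPLAY 'ERROR: more than 3 fields in input'
               NOT ON OVERFLOW
                   DISPLAY 'Fields parsed: ' FIELD-COUNT
           END-UNSTRING
           STOP RUN.
```

With this input the overflow branch runs, even though FIELD-COUNT would report the expected value of 3. A trailing comma after the third value triggers the same condition.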

Validating Field Lengths

Check that parsed fields don't exceed maximum lengths:

cobol

       WORKING-STORAGE SECTION.
       01  INPUT-FIELD      PIC X(50).
       01  MAX-LENGTH       PIC 9(4) VALUE 20.
       01  ACTUAL-LENGTH    PIC 9(4).
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 'THIS IS A VERY LONG FIELD THAT EXCEEDS LIMIT'
               TO INPUT-FIELD
           
           *> Get actual length (trimmed)
           COMPUTE ACTUAL-LENGTH = FUNCTION LENGTH(
               FUNCTION TRIM(INPUT-FIELD)
           )
           
           IF ACTUAL-LENGTH > MAX-LENGTH
               DISPLAY 'ERROR: Field length ' ACTUAL-LENGTH 
                       ' exceeds maximum ' MAX-LENGTH
           ELSE
               DISPLAY 'Field length valid: ' ACTUAL-LENGTH
           END-IF
           
           STOP RUN.

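FUNCTION LENGTH returns the declared size of a fixed-length field (50 for INPUT-FIELD), including trailing spaces. Combined with FUNCTION TRIM (if available), it gives the length of the actual content, so you can validate that trimmed field lengths are within acceptable ranges. This catches oversized values before a move to a smaller field truncates them.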

Validating Data Formats

Verify that data matches expected formats (numeric, alphabetic, date format, etc.):

cobol

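       WORKING-STORAGE SECTION.
       01  ZIP-CODE        PIC X(10) VALUE '78701'.
       01  ZIP-NUMERIC     PIC 9(5).
       01  IS-VALID        PIC X VALUE 'N'.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> ZIP-CODE is space-padded to 10 characters, and spaces
           *> are not numeric, so test only the 5 digit positions
           IF ZIP-CODE(1:5) IS NUMERIC
              AND ZIP-CODE(6:5) = SPACES
               MOVE ZIP-CODE(1:5) TO ZIP-NUMERIC
               MOVE 'Y' TO IS-VALID
               DISPLAY 'ZIP code is valid: ' ZIP-CODE
           ELSE
               DISPLAY 'ERROR: ZIP code must be numeric: ' ZIP-CODE
           END-IF
           
           STOP RUN.

COBOL provides the class tests IS NUMERIC, IS ALPHABETIC, IS ALPHABETIC-UPPER, and IS ALPHABETIC-LOWER to validate field contents; there is no IS ALPHANUMERIC test. The NUMERIC test on an alphanumeric field is true only when every character is a digit, so trailing spaces make it fail; test just the significant positions, as above. Use these tests to ensure data matches expected formats before processing.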
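For formats the built-in tests do not cover, a program can define its own class in the SPECIAL-NAMES paragraph and use it like any other class test. The sketch below is illustrative: HEX-DIGIT and COLOR-CODE are invented names, and the example checks that a field holds only hexadecimal digits.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLASSCHK.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      *> Hypothetical user-defined class: digits plus A through F
           CLASS HEX-DIGIT IS '0' THRU '9' 'A' THRU 'F'.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  COLOR-CODE        PIC X(6) VALUE '1A2B3C'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> True only if every character belongs to the class
           IF COLOR-CODE IS HEX-DIGIT
               DISPLAY 'Valid hex value: ' COLOR-CODE
           ELSE
               DISPLAY 'ERROR: not a hex value: ' COLOR-CODE
           END-IF
           STOP RUN.
```

As with IS NUMERIC, every character of the field must belong to the class, so a space-padded field fails unless the space is part of the class or only the significant positions are tested.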

Validating Required Fields

Ensure required fields are not empty:

cobol

       WORKING-STORAGE SECTION.
       01  CUSTOMER-NAME   PIC X(30) VALUE SPACES.
       01  IS-EMPTY        PIC X VALUE 'N'.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Check if field is empty (all spaces)
           IF CUSTOMER-NAME = SPACES
               MOVE 'Y' TO IS-EMPTY
               DISPLAY 'ERROR: Customer name is required'
           ELSE
               DISPLAY 'Customer name is valid: ' CUSTOMER-NAME
           END-IF
           
           STOP RUN.

Comparing the field with the figurative constant SPACES is true only when every character is a space. INSPECT ... TALLYING ... FOR CHARACTERS is not a substitute: it counts every character regardless of its value, so the count always equals the field length and cannot detect an empty field. This validation ensures required fields have actual data.

Data Cleaning

Data cleaning removes unwanted characters, fixes formatting issues, and prepares data for processing:

Removing Special Characters

cobol

       WORKING-STORAGE SECTION.
       01  PHONE-NUMBER    PIC X(20) VALUE '(555) 123-4567'.
       01  CLEAN-PHONE     PIC X(20) VALUE SPACES.
       01  IN-POS          PIC 9(4).
       01  OUT-POS         PIC 9(4) VALUE ZERO.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> INSPECT REPLACING swaps characters one-for-one and
           *> cannot delete them, so copy only the digits instead
           PERFORM VARYING IN-POS FROM 1 BY 1
                   UNTIL IN-POS > FUNCTION LENGTH(PHONE-NUMBER)
               IF PHONE-NUMBER(IN-POS:1) IS NUMERIC
                   ADD 1 TO OUT-POS
                   MOVE PHONE-NUMBER(IN-POS:1)
                     TO CLEAN-PHONE(OUT-POS:1)
               END-IF
           END-PERFORM
           
           DISPLAY 'Original: ' PHONE-NUMBER
           DISPLAY 'Cleaned: ' CLEAN-PHONE
           DISPLAY 'Digits kept: ' OUT-POS
           
           STOP RUN.

INSPECT REPLACING substitutes characters one-for-one and never shortens a field, so replacing punctuation with spaces or zeros would leave stray characters behind. The loop instead copies only the digits, leaving "5551234567" in CLEAN-PHONE and the digit count (10) in OUT-POS.

Removing Control Characters

cobol

       WORKING-STORAGE SECTION.
       01  TEXT-LINE       PIC X(100) VALUE SPACES.
       01  CLEAN-LINE      PIC X(100).
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Build a sample line that contains a tab character
           STRING 'TEXT WITH' X'09' 'TABS' DELIMITED BY SIZE
               INTO TEXT-LINE
           END-STRING
           
           MOVE TEXT-LINE TO CLEAN-LINE
           
           *> Replace tab character (X'09') with space
           INSPECT CLEAN-LINE
               REPLACING ALL X'09' BY SPACE
           
           *> Replace other control characters
           INSPECT CLEAN-LINE
               REPLACING ALL X'0D' BY SPACE  *> Carriage return
           
           INSPECT CLEAN-LINE
               REPLACING ALL X'0A' BY SPACE  *> Line feed
           
           DISPLAY 'Cleaned text: ' CLEAN-LINE
           STOP RUN.

Control characters like tabs, carriage returns, and line feeds can cause issues. Replace them with spaces to clean input data. The hexadecimal values shown (X'09', X'0D', X'0A') are ASCII; on an EBCDIC system such as z/OS the tab is X'05' and the line feed is X'25', while the carriage return is still X'0D'.

Working with Formatted Text

Many text processing tasks involve converting between different text formats:

Converting Date Formats

cobol

       WORKING-STORAGE SECTION.
       01  DATE-INPUT      PIC X(10) VALUE '12/25/2023'.
      *> DAY is a reserved word, so the parts use an IN- prefix
       01  IN-MONTH        PIC X(2).
       01  IN-DAY          PIC X(2).
       01  IN-YEAR         PIC X(4).
       01  DATE-OUTPUT     PIC X(10) VALUE SPACES.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Parse MM/DD/YYYY format
           UNSTRING DATE-INPUT
           DELIMITED BY '/'
           INTO IN-MONTH
                IN-DAY
                IN-YEAR
           END-UNSTRING
           
           *> Convert to YYYY-MM-DD format
           STRING IN-YEAR DELIMITED BY SIZE
                  '-' DELIMITED BY SIZE
                  IN-MONTH DELIMITED BY SIZE
                  '-' DELIMITED BY SIZE
                  IN-DAY DELIMITED BY SIZE
           INTO DATE-OUTPUT
           END-STRING
           
           DISPLAY 'Input: ' DATE-INPUT
           DISPLAY 'Output: ' DATE-OUTPUT
           
           STOP RUN.

This parses a date from MM/DD/YYYY format and converts it to YYYY-MM-DD format. UNSTRING extracts the components, then STRING rebuilds them in the new format.
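Before rebuilding a date it is usually worth checking that the parsed components are plausible. The following sketch shows one possible set of checks; the PART-, MONTH-NUM, DAY-NUM, and PART-COUNT names are assumptions made for this example, and it does not verify the number of days in each month or leap years.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATECHK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  DATE-INPUT        PIC X(10) VALUE '12/25/2023'.
       01  DATE-PARTS.
           05  PART-MONTH    PIC X(2).
           05  PART-DAY      PIC X(2).
           05  PART-YEAR     PIC X(4).
       01  MONTH-NUM         PIC 9(2).
       01  DAY-NUM           PIC 9(2).
       01  PART-COUNT        PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE SPACES TO DATE-PARTS
           UNSTRING DATE-INPUT DELIMITED BY '/'
               INTO PART-MONTH PART-DAY PART-YEAR
               TALLYING IN PART-COUNT
           END-UNSTRING

           *> Structure check: three parts, all digits
           IF PART-COUNT NOT = 3
              OR PART-MONTH IS NOT NUMERIC
              OR PART-DAY   IS NOT NUMERIC
              OR PART-YEAR  IS NOT NUMERIC
               DISPLAY 'ERROR: expected MM/DD/YYYY: ' DATE-INPUT
               STOP RUN
           END-IF

           *> Range check on the numeric values
           MOVE PART-MONTH TO MONTH-NUM
           MOVE PART-DAY   TO DAY-NUM
           IF MONTH-NUM < 1 OR MONTH-NUM > 12
              OR DAY-NUM < 1 OR DAY-NUM > 31
               DISPLAY 'ERROR: month or day out of range'
           ELSE
               DISPLAY 'Date components look valid'
           END-IF
           STOP RUN.
```

Because the class test requires every position to be a digit, a one-digit month such as "1/5/2023" is rejected; that is consistent with a strict MM/DD/YYYY format but may be too strict for free-form input.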

Building Formatted Output

cobol

       WORKING-STORAGE SECTION.
       01  FIRST-NAME      PIC X(20) VALUE 'JOHN'.
       01  LAST-NAME       PIC X(20) VALUE 'SMITH'.
       01  FULL-NAME       PIC X(42) VALUE SPACES.
       01  FORMATTED-NAME  PIC X(50) VALUE SPACES.
       
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Build full name
           STRING FIRST-NAME DELIMITED BY SPACE
                  ' ' DELIMITED BY SIZE
                  LAST-NAME DELIMITED BY SPACE
           INTO FULL-NAME
           END-STRING
           
           *> Format as "Last, First"
           STRING LAST-NAME DELIMITED BY SPACE
                  ', ' DELIMITED BY SIZE
                  FIRST-NAME DELIMITED BY SPACE
           INTO FORMATTED-NAME
           END-STRING
           
           DISPLAY 'Full Name: ' FULL-NAME
           DISPLAY 'Formatted: ' FORMATTED-NAME
           
           STOP RUN.

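This demonstrates building formatted names from components. STRING with DELIMITED BY SPACE stops copying at the first space in each sending field, which leaves out the trailing padding but also cuts off a value with an embedded space such as "MARY ANN". STRING does not clear the receiving field, so initialize it to spaces before building the result.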
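STRING also accepts WITH POINTER and ON OVERFLOW, which together report how much was written and whether the target was large enough. A minimal sketch, where OUT-PTR and OUT-LEN are illustrative names:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRBUILD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FIRST-NAME        PIC X(20) VALUE 'JOHN'.
       01  LAST-NAME         PIC X(20) VALUE 'SMITH'.
       01  FORMATTED-NAME    PIC X(50).
       01  OUT-PTR           PIC 9(4).
       01  OUT-LEN           PIC 9(4).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> STRING does not clear the target, so blank it first
           MOVE SPACES TO FORMATTED-NAME
           MOVE 1 TO OUT-PTR
           STRING LAST-NAME  DELIMITED BY SPACE
                  ', '       DELIMITED BY SIZE
                  FIRST-NAME DELIMITED BY SPACE
               INTO FORMATTED-NAME
               WITH POINTER OUT-PTR
               ON OVERFLOW
                   DISPLAY 'ERROR: FORMATTED-NAME is too small'
           END-STRING

           *> The pointer ends one past the last character written
           COMPUTE OUT-LEN = OUT-PTR - 1
           DISPLAY 'Formatted: [' FORMATTED-NAME(1:OUT-LEN) ']'
           STOP RUN.
```

This displays `Formatted: [SMITH, JOHN]`. The pointer must hold a valid starting position (1 or higher) before the STRING statement runs.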

Best Practices for Text Processing

Follow these best practices for effective text processing:

Always Validate Field Counts: Use TALLYING IN with UNSTRING to verify you received the expected number of fields, and ON OVERFLOW to detect extra fields. Handle cases where fields are missing or extra.
Handle Empty Fields: Use DELIMITED BY without ALL when consecutive delimiters mark an empty field, as in CSV; ALL collapses them into one delimiter and shifts later values into the wrong fields. Always check if fields are empty before using them.
Normalize Before Comparing: Convert case and trim whitespace before comparing text values. This ensures comparisons work correctly regardless of input format.
Validate Data Types: Use class tests (IS NUMERIC, IS ALPHABETIC) to ensure data matches expected formats before processing.
Check Field Lengths: Validate that parsed fields don't exceed maximum lengths to prevent truncation and data loss.
Clean Input Data: Remove or replace unwanted characters (control characters, special formatting) early in processing to avoid issues later.
Handle Edge Cases: Test with empty strings, missing delimiters, extra delimiters, and data that exceeds field sizes.
Use Meaningful Field Names: Name parsing fields, counters, and work fields descriptively to make code self-documenting.
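Initialize Pointer and Counter Fields: When using WITH POINTER, initialize to 1 (or desired starting position) before UNSTRING operations. Reset the TALLYING IN counter to zero as well, because UNSTRING adds to its current value.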
Document Expected Formats: Add comments explaining the expected format of input data, delimiter characters, and field order.

Common Text Processing Patterns

Pattern 1: CSV Parser with Validation

cobol

       WORKING-STORAGE SECTION.
       01  CSV-LINE         PIC X(200).
       01  FIELD-1          PIC X(20).
       01  FIELD-2          PIC X(20).
       01  FIELD-3          PIC X(20).
       01  FIELD-COUNT      PIC 9(4) VALUE ZERO.
       01  EXPECTED-FIELDS  PIC 9(4) VALUE 3.
       
       PROCEDURE DIVISION.
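       PARSE-CSV.
           *> Reset state: TALLYING adds to the existing count and
           *> unused receiving fields keep their old contents
           MOVE ZERO TO FIELD-COUNT
           MOVE SPACES TO FIELD-1 FIELD-2 FIELD-3
           
           UNSTRING CSV-LINE
           DELIMITED BY ','
           INTO FIELD-1
                FIELD-2
                FIELD-3
           TALLYING IN FIELD-COUNT
           ON OVERFLOW
               DISPLAY 'ERROR: Too many fields in CSV line'
               STOP RUN
           END-UNSTRING
           
           IF FIELD-COUNT NOT = EXPECTED-FIELDS
               DISPLAY 'ERROR: Invalid CSV format'
               STOP RUN
           END-IF
           
           *> Process fields...
           EXIT.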

Pattern 2: Normalize and Validate Input

cobol

       WORKING-STORAGE SECTION.
       01  USER-INPUT       PIC X(30).
       01  NORMALIZED-INPUT PIC X(30).
       01  IS-VALID         PIC X VALUE 'N'.
       
       PROCEDURE DIVISION.
       NORMALIZE-INPUT.
           MOVE 'N' TO IS-VALID
           
           *> Convert to uppercase and drop leading spaces;
           *> the MOVE pads the result with spaces on the right
           MOVE FUNCTION TRIM(FUNCTION UPPER-CASE(USER-INPUT))
               TO NORMALIZED-INPUT
           
           *> Validate not empty
           IF NORMALIZED-INPUT NOT = SPACES
               MOVE 'Y' TO IS-VALID
           END-IF
           
           EXIT.

Pattern 3: Parse Key-Value Pairs

cobol

       WORKING-STORAGE SECTION.
       01  INPUT-LINE       PIC X(200)
               VALUE 'ID=123|NAME=JOHN|CITY=AUSTIN'.
       01  PAIR-TABLE.
           05  PAIR-ENTRY   OCCURS 3 TIMES.
               10  KEY-FIELD    PIC X(10).
               10  VALUE-FIELD  PIC X(20).
       01  FIELD-COUNT      PIC 9(4) VALUE ZERO.
       01  PAIR-COUNT       PIC 9(4) VALUE ZERO.
       
       PROCEDURE DIVISION.
       PARSE-KEY-VALUE.
           MOVE SPACES TO PAIR-TABLE
           MOVE ZERO TO FIELD-COUNT
           
           *> Parse alternating key=value pairs into the table
           UNSTRING INPUT-LINE
           DELIMITED BY '=' OR '|'
           INTO KEY-FIELD(1) VALUE-FIELD(1)
                KEY-FIELD(2) VALUE-FIELD(2)
                KEY-FIELD(3) VALUE-FIELD(3)
           TALLYING IN FIELD-COUNT
           END-UNSTRING
           
           *> TALLYING counts fields, so halve it to get pairs
           DIVIDE FIELD-COUNT BY 2 GIVING PAIR-COUNT
           
           *> Process key-value pairs...
           EXIT.

Explain Like I'm 5: Text Processing

Think of text processing like organizing a messy toy box:

Parsing is like sorting toys into separate bins. You have a big box with all toys mixed together (like "toy1,toy2,toy3"), and you separate them into individual bins (individual fields) based on markers (delimiters like commas).
Normalization is like making sure all your toys face the same direction. You turn all the cars to face forward, put all the blocks in the same orientation, so everything looks consistent and organized.
Validation is like checking your homework. You make sure you have the right number of toys (field count), that each toy is the right type (data format), and that nothing is missing (required fields).
Cleaning is like washing your toys. You remove dirt (unwanted characters), fix broken parts (formatting issues), and make everything ready to play with (process).

So text processing is all about taking messy, unorganized text data and making it neat, consistent, and ready to use—just like organizing and cleaning your toys!

Practice Exercises

Complete these exercises to reinforce your understanding of text processing:

Exercise 1: CSV Parser

Create a program that parses a CSV line with 5 fields (ID, First Name, Last Name, Email, Phone). Validate that all 5 fields are present, and display each field. Handle empty fields correctly.

Exercise 2: Text Normalizer

Create a program that normalizes user input: converts to uppercase, trims leading and trailing spaces, and normalizes multiple spaces to single spaces. Display the original and normalized versions.

Exercise 3: Phone Number Validator

Create a program that validates phone numbers. Accept input in various formats (with/without dashes, parentheses, spaces), clean the input to remove formatting, and validate that it contains exactly 10 digits.

Exercise 4: Date Format Converter

Create a program that parses a date in MM/DD/YYYY format, validates the components (month 1-12, day 1-31, year reasonable), and converts it to YYYY-MM-DD format. Handle invalid dates with error messages.

Exercise 5: Key-Value Parser

Create a program that parses a string in the format "KEY1=VALUE1|KEY2=VALUE2|KEY3=VALUE3". Extract each key-value pair, normalize the keys to uppercase, and display them in a formatted list.

Test Your Knowledge

1. What is the primary purpose of text processing in COBOL?

To perform mathematical calculations
To parse, normalize, validate, and transform text data
To read and write files
To control program flow

2. How do you parse a comma-separated value (CSV) line in COBOL?

Using STRING statement
Using UNSTRING DELIMITED BY ','
Using INSPECT statement
Using MOVE statement

3. How do you convert text to uppercase in COBOL?

INSPECT CONVERTING
FUNCTION UPPER-CASE
MOVE TO UPPERCASE
STRING TO UPPER

4. What does DELIMITED BY ALL do in UNSTRING?

Treats multiple consecutive delimiters as one
Splits on all possible delimiters
Ignores delimiters
Requires all delimiters to be present

5. How do you trim trailing spaces from a text field?

Use INSPECT REPLACING
Use FUNCTION TRIM, or reference modification with a computed content length
Use UNSTRING
Use STRING

6. What does TALLYING IN do in UNSTRING?

Counts characters in each field
Counts how many receiving fields were filled
Counts delimiters found
Counts total characters processed

Related Concepts

String Processing
Data Validation
File Operations
Working Storage Section