MainframeMaster
COBOL Tutorial

COBOL Text Processing

Text processing in COBOL involves parsing, normalizing, validating, and transforming text data to prepare it for use in your programs. Unlike basic string manipulation, text processing focuses on working with structured text formats like CSV files, delimited records, user input, and formatted data. Understanding text processing is essential for handling real-world data that comes in various formats, cleaning input data, validating user entries, and converting between different text representations in mainframe COBOL applications.

What is Text Processing?

Text processing encompasses operations that work with formatted or structured text data. Key text processing operations include:

  • Parsing: Splitting delimited strings (CSV, pipe-delimited, tab-delimited) into individual fields
  • Normalization: Converting text to consistent formats (case, whitespace, encoding)
  • Validation: Checking data format, length, content, and structure
  • Cleaning: Removing unwanted characters, trimming spaces, standardizing formats
  • Transformation: Converting between different text formats and representations
  • Tokenization: Breaking text into tokens or words based on delimiters

These operations are fundamental for processing user input, parsing file records, handling data imports, validating forms, and preparing data for storage or display in business applications.

Parsing Delimited Data

One of the most common text processing tasks is parsing delimited data, where fields are separated by specific characters like commas, pipes, or tabs. This is essential for processing CSV files, log files, and formatted input records.

Parsing Comma-Separated Values (CSV)

CSV (Comma-Separated Values) is a common format where fields are separated by commas. Here's how to parse CSV data:

cobol
       WORKING-STORAGE SECTION.
       01  CSV-LINE        PIC X(200).
       01  CUSTOMER-ID     PIC X(10).
       01  CUSTOMER-NAME   PIC X(30).
       01  CITY            PIC X(20).
       01  STATE           PIC X(2).
       01  ZIP-CODE        PIC X(10).
       01  FIELD-COUNT     PIC 9(4) VALUE ZERO.
       01  CSV-POINTER     PIC 9(4) VALUE 1.

       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE '12345,JOHN SMITH,AUSTIN,TX,78701' TO CSV-LINE

           UNSTRING CSV-LINE DELIMITED BY ','
               INTO CUSTOMER-ID
                    CUSTOMER-NAME
                    CITY
                    STATE
                    ZIP-CODE
               WITH POINTER CSV-POINTER
               TALLYING IN FIELD-COUNT
           END-UNSTRING

           DISPLAY 'Customer ID: ' CUSTOMER-ID
           DISPLAY 'Name: ' CUSTOMER-NAME
           DISPLAY 'City: ' CITY
           DISPLAY 'State: ' STATE
           DISPLAY 'Zip: ' ZIP-CODE
           DISPLAY 'Fields parsed: ' FIELD-COUNT
           STOP RUN.

In this example:

  • CSV-LINE contains the comma-separated data: "12345,JOHN SMITH,AUSTIN,TX,78701"
  • DELIMITED BY ',' tells UNSTRING to split on commas
  • INTO specifies the receiving fields where parsed data goes
  • TALLYING IN FIELD-COUNT counts how many fields were successfully filled (should be 5)
  • WITH POINTER CSV-POINTER tracks the current position in the source string

After execution, CUSTOMER-ID contains "12345", CUSTOMER-NAME contains "JOHN SMITH", CITY contains "AUSTIN", STATE contains "TX", ZIP-CODE contains "78701", and FIELD-COUNT contains 5.
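
When the number of fields is not known in advance, the pointer can drive a loop that pulls out one field per pass. The following is a minimal sketch of that idea, not a definitive implementation: the program name CSVLOOP and the fields ONE-FIELD, TRAIL-COUNT, and LINE-LENGTH are hypothetical, and it assumes the compiler supports FUNCTION REVERSE and FUNCTION LENGTH.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVLOOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-LINE       PIC X(80)
                          VALUE '12345,JOHN SMITH,AUSTIN,TX,78701'.
       01  ONE-FIELD      PIC X(30).
       01  TRAIL-COUNT    PIC 9(4) VALUE ZERO.
       01  LINE-LENGTH    PIC 9(4) VALUE ZERO.
       01  CSV-POINTER    PIC 9(4) VALUE 1.
       01  FIELD-COUNT    PIC 9(4) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Length of the data without its trailing padding
           INSPECT FUNCTION REVERSE(CSV-LINE)
               TALLYING TRAIL-COUNT FOR LEADING SPACE
           COMPUTE LINE-LENGTH =
               FUNCTION LENGTH(CSV-LINE) - TRAIL-COUNT
           *> Each pass extracts one field and leaves CSV-POINTER
           *> just past the delimiter that ended it
           PERFORM UNTIL CSV-POINTER > LINE-LENGTH
               MOVE SPACES TO ONE-FIELD
               UNSTRING CSV-LINE(1:LINE-LENGTH) DELIMITED BY ','
                   INTO ONE-FIELD
                   WITH POINTER CSV-POINTER
               END-UNSTRING
               ADD 1 TO FIELD-COUNT
               DISPLAY 'Field ' FIELD-COUNT ': ' ONE-FIELD
           END-PERFORM
           STOP RUN.
```

The loop ends once the pointer moves past the last data character, so a line that ends in a comma does not report its trailing empty field; add a separate check if that case matters.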

Handling Empty Fields in CSV

CSV data often contains empty fields (consecutive commas). Parse it with a plain DELIMITED BY ',' (without ALL) so that every comma still marks a field boundary:

cobol
       WORKING-STORAGE SECTION.
       01  CSV-LINE        PIC X(200)
                           VALUE '12345,,AUSTIN,TX,78701'.
       01  CUSTOMER-ID     PIC X(10).
       01  MIDDLE-NAME     PIC X(20).
       01  CITY            PIC X(20).
       01  STATE           PIC X(2).
       01  ZIP-CODE        PIC X(10).

       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING CSV-LINE DELIMITED BY ','
               INTO CUSTOMER-ID
                    MIDDLE-NAME
                    CITY
                    STATE
                    ZIP-CODE
           END-UNSTRING

           *> MIDDLE-NAME is all spaces because its field was empty
           DISPLAY 'ID: ' CUSTOMER-ID
           DISPLAY 'Middle: [' MIDDLE-NAME ']'
           DISPLAY 'City: ' CITY
           STOP RUN.

With DELIMITED BY ',', two adjacent commas delimit an empty field, and UNSTRING fills the corresponding alphanumeric receiving field with spaces. DELIMITED BY ALL ',' does the opposite: it treats a run of consecutive commas as a single delimiter, so the empty field disappears and every later value shifts one field to the left (MIDDLE-NAME would receive "AUSTIN"). Reserve ALL for formats where repeated delimiters carry no meaning, such as words separated by a variable number of spaces.

Parsing Pipe-Delimited Data

Pipe-delimited format uses the pipe character (|) as a separator. It's common in data exchange formats:

cobol
       WORKING-STORAGE SECTION.
       01  PIPE-LINE       PIC X(200)
                           VALUE '12345|JOHN SMITH|AUSTIN|TX|78701'.
       01  CUSTOMER-ID     PIC X(10).
       01  CUSTOMER-NAME   PIC X(30).
       01  CITY            PIC X(20).
       01  STATE           PIC X(2).
       01  ZIP-CODE        PIC X(10).

       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING PIPE-LINE DELIMITED BY '|'
               INTO CUSTOMER-ID
                    CUSTOMER-NAME
                    CITY
                    STATE
                    ZIP-CODE
           END-UNSTRING

           DISPLAY 'Parsed pipe-delimited data'
           STOP RUN.

Parsing with Multiple Delimiters

You can parse data that uses multiple possible delimiters:

cobol
       WORKING-STORAGE SECTION.
       01  INPUT-LINE      PIC X(200)
                           VALUE 'ID=12345|NAME=JOHN|CITY=AUSTIN'.
       01  ID-FIELD        PIC X(10).
       01  NAME-FIELD      PIC X(30).
       01  CITY-FIELD      PIC X(20).
       01  FILLER-FIELD    PIC X(20).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Parse alternating label=value pairs
           UNSTRING INPUT-LINE DELIMITED BY '=' OR '|'
               INTO FILLER-FIELD    *> Skip "ID"
                    ID-FIELD        *> Get "12345"
                    FILLER-FIELD    *> Skip "NAME"
                    NAME-FIELD      *> Get "JOHN"
                    FILLER-FIELD    *> Skip "CITY"
                    CITY-FIELD      *> Get "AUSTIN"
           END-UNSTRING

           DISPLAY 'ID: ' ID-FIELD
           DISPLAY 'Name: ' NAME-FIELD
           DISPLAY 'City: ' CITY-FIELD
           STOP RUN.

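This pattern alternates between labels and values. By receiving the labels into a throwaway field (FILLER-FIELD here) and the values into real fields, you can extract just the data you need from formatted input like "KEY=VALUE|KEY=VALUE". The throwaway field needs an ordinary data name, because an item declared with the reserved word FILLER cannot be referenced in the PROCEDURE DIVISION.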
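
With more than one delimiter in play, it can help to know which delimiter ended each field and how many characters the field held. UNSTRING offers the DELIMITER IN and COUNT IN phrases for that. The sketch below is illustrative only; the program name DELIMIN and the fields LABEL-FIELD, LABEL-DELIM, LABEL-SIZE, VALUE-DELIM, and VALUE-SIZE are hypothetical names chosen for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIMIN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-LINE     PIC X(40) VALUE 'ID=12345|NAME=JOHN'.
       01  LABEL-FIELD    PIC X(10).
       01  VALUE-FIELD    PIC X(20).
       01  LABEL-DELIM    PIC X.
       01  VALUE-DELIM    PIC X.
       01  LABEL-SIZE     PIC 9(4).
       01  VALUE-SIZE     PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Only the first label=value pair is examined here
           UNSTRING INPUT-LINE DELIMITED BY '=' OR '|'
               INTO LABEL-FIELD DELIMITER IN LABEL-DELIM
                                COUNT IN LABEL-SIZE
                    VALUE-FIELD DELIMITER IN VALUE-DELIM
                                COUNT IN VALUE-SIZE
           END-UNSTRING
           *> A well-formed pair ends its label with '=' and its
           *> value with '|'
           IF LABEL-DELIM = '=' AND VALUE-DELIM = '|'
               DISPLAY 'Label: ' LABEL-FIELD(1:LABEL-SIZE)
               DISPLAY 'Value: ' VALUE-FIELD(1:VALUE-SIZE)
           ELSE
               DISPLAY 'ERROR: unexpected delimiter order'
           END-IF
           STOP RUN.
```

Checking the delimiters catches input such as "ID|12345=NAME", which a plain DELIMITED BY '=' OR '|' would otherwise accept silently.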

Text Normalization

Text normalization converts text data to a consistent format, making it easier to compare, search, and process. Common normalization operations include case conversion, whitespace handling, and character standardization.

Case Normalization

Converting text to consistent case (uppercase or lowercase) ensures comparisons work correctly:

cobol
       WORKING-STORAGE SECTION.
       01  USER-INPUT      PIC X(30) VALUE 'John Smith'.
       01  UPPER-NAME      PIC X(30).
       01  LOWER-NAME      PIC X(30).
       01  SEARCH-NAME     PIC X(30) VALUE 'JOHN SMITH'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Convert to uppercase
           MOVE FUNCTION UPPER-CASE(USER-INPUT) TO UPPER-NAME

           *> Convert to lowercase
           MOVE FUNCTION LOWER-CASE(USER-INPUT) TO LOWER-NAME

           DISPLAY 'Original: ' USER-INPUT
           DISPLAY 'Uppercase: ' UPPER-NAME
           DISPLAY 'Lowercase: ' LOWER-NAME

           *> Now comparisons work correctly
           IF UPPER-NAME = SEARCH-NAME
               DISPLAY 'Match found!'
           END-IF
           STOP RUN.

FUNCTION UPPER-CASE converts all alphabetic characters to uppercase while preserving numbers, spaces, and special characters. FUNCTION LOWER-CASE does the opposite. These functions are essential for case-insensitive comparisons and data standardization.
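
On compilers without intrinsic functions, INSPECT CONVERTING is a common alternative for case conversion. The following is a small sketch under that assumption; the program name UPCONV and the fields LOWER-LETTERS and UPPER-LETTERS are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UPCONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  USER-INPUT     PIC X(30) VALUE 'John Smith'.
       01  UPPER-NAME     PIC X(30).
       01  LOWER-LETTERS  PIC X(26)
                          VALUE 'abcdefghijklmnopqrstuvwxyz'.
       01  UPPER-LETTERS  PIC X(26)
                          VALUE 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> INSPECT changes the field in place, so work on a copy
           MOVE USER-INPUT TO UPPER-NAME
           *> Each character found in LOWER-LETTERS is replaced by
           *> the character at the same position in UPPER-LETTERS
           INSPECT UPPER-NAME
               CONVERTING LOWER-LETTERS TO UPPER-LETTERS
           DISPLAY 'Uppercase: ' UPPER-NAME
           STOP RUN.
```

Swapping the two operands of CONVERTING gives the lowercase conversion. Characters that appear in neither list, such as digits and spaces, are left unchanged.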

Trimming Whitespace

Removing leading and trailing spaces normalizes text fields:

Trimming Trailing Spaces

cobol
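       WORKING-STORAGE SECTION.
       01  SOURCE-FIELD    PIC X(30) VALUE 'JOHN SMITH '.
       01  TRIMMED-FIELD   PIC X(15).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> MOVE keeps the leftmost 15 characters and drops the
           *> rest, which here are all trailing spaces
           MOVE SOURCE-FIELD TO TRIMMED-FIELD

           DISPLAY 'Source: [' SOURCE-FIELD ']'
           DISPLAY 'Trimmed: [' TRIMMED-FIELD ']'
           STOP RUN.

When you move an alphanumeric field to a shorter one, COBOL keeps the leftmost characters and truncates on the right. Here the dropped characters are all padding, so TRIMMED-FIELD (PIC X(15)) holds "JOHN SMITH" followed by five spaces instead of twenty. The receiving field is still fixed-width, and MOVE truncates non-space characters just as readily, so this technique only works when the content is known to fit. Where the compiler provides it, FUNCTION TRIM returns the value without its surrounding spaces.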
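
Because COBOL fields are fixed-width, using the text without its padding usually means computing the content length and applying reference modification. One common idiom is sketched below; it assumes FUNCTION REVERSE and FUNCTION LENGTH are available, and the program name TRIMLEN and the fields TRAIL-COUNT and TEXT-LENGTH are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRIMLEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SOURCE-FIELD   PIC X(30) VALUE 'JOHN SMITH'.
       01  TRAIL-COUNT    PIC 9(4) VALUE ZERO.
       01  TEXT-LENGTH    PIC 9(4) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Leading spaces of the reversed field are the
           *> trailing spaces of the original
           MOVE ZERO TO TRAIL-COUNT
           INSPECT FUNCTION REVERSE(SOURCE-FIELD)
               TALLYING TRAIL-COUNT FOR LEADING SPACE
           COMPUTE TEXT-LENGTH =
               FUNCTION LENGTH(SOURCE-FIELD) - TRAIL-COUNT
           *> A zero length is not valid in reference modification,
           *> so the all-space case is handled separately
           IF TEXT-LENGTH > 0
               DISPLAY 'Trimmed: [' SOURCE-FIELD(1:TEXT-LENGTH) ']'
           ELSE
               DISPLAY 'Field is empty'
           END-IF
           STOP RUN.
```

For the sample value, TRAIL-COUNT is 20 and TEXT-LENGTH is 10, so the brackets enclose exactly "JOHN SMITH".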

Trimming Leading Spaces

cobol
       WORKING-STORAGE SECTION.
       01  SOURCE-FIELD    PIC X(30) VALUE ' JOHN SMITH'.
       01  TRIMMED-FIELD   PIC X(30) VALUE SPACES.
       01  SPACE-COUNT     PIC 9(4) VALUE ZERO.
       01  START-POS       PIC 9(4).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Count leading spaces
           MOVE ZERO TO SPACE-COUNT
           INSPECT SOURCE-FIELD
               TALLYING SPACE-COUNT FOR LEADING SPACE

           *> Calculate starting position (1-based)
           COMPUTE START-POS = SPACE-COUNT + 1

           *> Copy from the first non-space character onward;
           *> an all-space source leaves TRIMMED-FIELD as spaces
           IF START-POS <= FUNCTION LENGTH(SOURCE-FIELD)
               MOVE SOURCE-FIELD(START-POS:) TO TRIMMED-FIELD
           END-IF

           DISPLAY 'Original: [' SOURCE-FIELD ']'
           DISPLAY 'Trimmed: [' TRIMMED-FIELD ']'
           STOP RUN.

This approach uses INSPECT to count the leading spaces, then reference modification to copy everything from the first non-space character onward. UNSTRING DELIMITED BY ALL SPACE is not a good fit for this job, because it stops at the first embedded space and would return only "JOHN".

Standardizing Whitespace

Normalize multiple spaces to single spaces:

cobol
       WORKING-STORAGE SECTION.
       01  TEXT-FIELD        PIC X(50)
                             VALUE 'JOHN   SMITH    DOE'.
       01  NORMALIZED-FIELD  PIC X(50) VALUE SPACES.
       01  CHAR-PTR          PIC 9(4) VALUE 1.
       01  OUT-PTR           PIC 9(4) VALUE ZERO.
       01  PREV-CHAR         PIC X VALUE SPACE.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Copy one character at a time, skipping any space
           *> that directly follows another space
           PERFORM VARYING CHAR-PTR FROM 1 BY 1
                   UNTIL CHAR-PTR > FUNCTION LENGTH(TEXT-FIELD)
               IF TEXT-FIELD(CHAR-PTR:1) NOT = SPACE
                  OR PREV-CHAR NOT = SPACE
                   ADD 1 TO OUT-PTR
                   MOVE TEXT-FIELD(CHAR-PTR:1)
                     TO NORMALIZED-FIELD(OUT-PTR:1)
               END-IF
               MOVE TEXT-FIELD(CHAR-PTR:1) TO PREV-CHAR
           END-PERFORM

           DISPLAY 'Original: [' TEXT-FIELD ']'
           DISPLAY 'Normalized: [' NORMALIZED-FIELD ']'
           STOP RUN.

INSPECT REPLACING cannot do this job, because the replacement must be the same length as the text it replaces, and COBOL has no CONTAINS condition to test for a substring. Instead, the loop copies characters one at a time and skips any space whose predecessor was also a space. The result is text with normalized spacing: "JOHN SMITH DOE" instead of "JOHN   SMITH    DOE". Because PREV-CHAR starts as a space, leading spaces are dropped as well.

Text Validation

Validating text data ensures it meets expected formats, lengths, and content requirements before processing or storage.

Validating Field Counts

When parsing delimited data, verify you received the expected number of fields:

cobol
       WORKING-STORAGE SECTION.
       01  CSV-LINE        PIC X(200).
       01  FIELD-1         PIC X(20).
       01  FIELD-2         PIC X(20).
       01  FIELD-3         PIC X(20).
       01  FIELD-COUNT     PIC 9(4) VALUE ZERO.
       01  EXPECTED-COUNT  PIC 9(4) VALUE 3.

       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 'VALUE1,VALUE2,VALUE3' TO CSV-LINE

           UNSTRING CSV-LINE DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING

           IF FIELD-COUNT NOT = EXPECTED-COUNT
               DISPLAY 'ERROR: Expected ' EXPECTED-COUNT
                       ' fields, got ' FIELD-COUNT
               STOP RUN
           END-IF

           DISPLAY 'Validation passed: ' FIELD-COUNT ' fields'
           STOP RUN.

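TALLYING IN adds one to the counter for each receiving field that UNSTRING fills, so set the counter to zero before every UNSTRING. If the source has fewer delimiters than expected, the remaining receiving fields are left unchanged (UNSTRING does not clear them), which is why they should be initialized beforehand. The counter can never exceed the number of receiving fields, so detecting extra fields requires the ON OVERFLOW phrase. Always validate that the field count matches expectations to catch malformed input.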
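
A field count alone cannot reveal input with too many fields, since the count stops at the number of receiving fields. The ON OVERFLOW phrase of UNSTRING covers that case. The sketch below is illustrative; the program name OVFLCHK and the sample data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVFLCHK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CSV-LINE       PIC X(80) VALUE 'A,B,C,D'.
       01  FIELD-1        PIC X(20).
       01  FIELD-2        PIC X(20).
       01  FIELD-3        PIC X(20).
       01  FIELD-COUNT    PIC 9(4) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE SPACES TO FIELD-1 FIELD-2 FIELD-3
           MOVE ZERO TO FIELD-COUNT
           *> Overflow is raised when all receiving fields are
           *> filled and unexamined characters remain in CSV-LINE
           UNSTRING CSV-LINE DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
               TALLYING IN FIELD-COUNT
               ON OVERFLOW
                   DISPLAY 'ERROR: more than 3 fields in input'
               NOT ON OVERFLOW
                   DISPLAY 'Fields parsed: ' FIELD-COUNT
           END-UNSTRING
           STOP RUN.
```

With the sample value, the fourth field is left unexamined after FIELD-3 is filled, so the ON OVERFLOW branch runs. A three-field line takes the NOT ON OVERFLOW branch, because the last receiving field consumes the rest of the line.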

Validating Field Lengths

Check that parsed fields don't exceed maximum lengths:

cobol
       WORKING-STORAGE SECTION.
       01  INPUT-FIELD     PIC X(50).
       01  MAX-LENGTH      PIC 9(4) VALUE 20.
       01  ACTUAL-LENGTH   PIC 9(4).

       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 'THIS IS A VERY LONG FIELD THAT EXCEEDS LIMIT'
             TO INPUT-FIELD

           *> Get actual length (trimmed)
           COMPUTE ACTUAL-LENGTH =
               FUNCTION LENGTH(FUNCTION TRIM(INPUT-FIELD))

           IF ACTUAL-LENGTH > MAX-LENGTH
               DISPLAY 'ERROR: Field length ' ACTUAL-LENGTH
                       ' exceeds maximum ' MAX-LENGTH
           ELSE
               DISPLAY 'Field length valid: ' ACTUAL-LENGTH
           END-IF
           STOP RUN.

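FUNCTION LENGTH returns the size of its argument. For a fixed-width field that is the declared size (50 for INPUT-FIELD), not the length of the content, so it is applied here to the result of FUNCTION TRIM (if available) to measure the text without its surrounding spaces. Checking this length before moving the value into a shorter field prevents silent truncation.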

Validating Data Formats

Verify that data matches expected formats (numeric, alphabetic, date format, etc.):

cobol
       WORKING-STORAGE SECTION.
       01  ZIP-CODE        PIC X(10) VALUE '78701'.
       01  ZIP-NUMERIC     PIC 9(5).
       01  IS-VALID        PIC X VALUE 'N'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Test only the five digit positions: the trailing
           *> spaces in ZIP-CODE would make a whole-field test fail
           IF ZIP-CODE(1:5) IS NUMERIC
              AND ZIP-CODE(6:5) = SPACES
               MOVE ZIP-CODE(1:5) TO ZIP-NUMERIC
               MOVE 'Y' TO IS-VALID
               DISPLAY 'ZIP code is valid: ' ZIP-CODE
           ELSE
               DISPLAY 'ERROR: ZIP code must be 5 digits: ' ZIP-CODE
           END-IF
           STOP RUN.

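COBOL provides class tests such as IS NUMERIC, IS ALPHABETIC, IS ALPHABETIC-UPPER, and IS ALPHABETIC-LOWER to check the characters in a field. There is no IS ALPHANUMERIC class test. For an alphanumeric field, IS NUMERIC is true only when every character is a digit, so trailing spaces make it fail, while the ALPHABETIC tests accept letters and spaces. Use these tests to ensure data matches expected formats before processing.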
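
When the built-in classes do not match a format, a user-defined class declared in the SPECIAL-NAMES paragraph can be tested the same way. The following is a minimal sketch; the class name PHONE-CHAR and the program name CLSTEST are hypothetical, and the set of allowed characters is only an example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLSTEST.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
           CLASS PHONE-CHAR IS '0' THRU '9' '-' '(' ')' ' '.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PHONE-NUMBER   PIC X(20) VALUE '(555) 123-4567'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           *> True only if every character belongs to PHONE-CHAR;
           *> the space in the class lets trailing padding pass
           IF PHONE-NUMBER IS PHONE-CHAR
               DISPLAY 'Phone number uses only allowed characters'
           ELSE
               DISPLAY 'ERROR: unexpected character in phone number'
           END-IF
           STOP RUN.
```

This test only checks which characters appear, not how many digits there are or where the punctuation sits, so it works best as a first filter before stricter checks.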

Validating Required Fields

Ensure required fields are not empty:

cobol
       WORKING-STORAGE SECTION.
       01  CUSTOMER-NAME   PIC X(30) VALUE SPACES.
       01  IS-EMPTY        PIC X VALUE 'N'.
       01  SPACE-COUNT     PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Check if field is empty (all spaces)
           MOVE ZERO TO SPACE-COUNT
           INSPECT CUSTOMER-NAME
               TALLYING SPACE-COUNT FOR ALL SPACE

           IF SPACE-COUNT = FUNCTION LENGTH(CUSTOMER-NAME)
               MOVE 'Y' TO IS-EMPTY
               DISPLAY 'ERROR: Customer name is required'
           ELSE
               DISPLAY 'Customer name is valid: ' CUSTOMER-NAME
           END-IF
           STOP RUN.

This counts the spaces in the field with FOR ALL SPACE. (FOR CHARACTERS would count every character and always equal the field length.) If the count equals the field length, the field contains only spaces and is therefore empty. The comparison IF CUSTOMER-NAME = SPACES gives the same answer in a single step. This validation ensures required fields have actual data.

Data Cleaning

Data cleaning removes unwanted characters, fixes formatting issues, and prepares data for processing:

Removing Special Characters

cobol
       WORKING-STORAGE SECTION.
       01  PHONE-NUMBER    PIC X(20) VALUE '(555) 123-4567'.
       01  CLEAN-PHONE     PIC X(20) VALUE SPACES.
       01  IN-POS          PIC 9(4).
       01  OUT-POS         PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Copy only the digits; parentheses, spaces, and
           *> dashes are skipped
           PERFORM VARYING IN-POS FROM 1 BY 1
                   UNTIL IN-POS > FUNCTION LENGTH(PHONE-NUMBER)
               IF PHONE-NUMBER(IN-POS:1) IS NUMERIC
                   ADD 1 TO OUT-POS
                   MOVE PHONE-NUMBER(IN-POS:1)
                     TO CLEAN-PHONE(OUT-POS:1)
               END-IF
           END-PERFORM

           DISPLAY 'Original: ' PHONE-NUMBER
           DISPLAY 'Cleaned: ' CLEAN-PHONE
           DISPLAY 'Digits found: ' OUT-POS
           STOP RUN.

This copies only the digits from the phone number, dropping the formatting characters (parentheses, spaces, dashes). INSPECT REPLACING is not suitable here because it substitutes characters one-for-one and cannot delete them; replacing spaces with zeros, for example, would also turn the trailing padding into zeros. OUT-POS ends up holding the number of digits found.

Removing Control Characters

cobol
       WORKING-STORAGE SECTION.
       01  TEXT-LINE       PIC X(100) VALUE SPACES.
       01  CLEAN-LINE      PIC X(100).

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Build sample text containing a tab; a VALUE clause
           *> cannot list several literals side by side
           STRING 'TEXT WITH' X'09' 'TABS' DELIMITED BY SIZE
               INTO TEXT-LINE
           END-STRING
           MOVE TEXT-LINE TO CLEAN-LINE

           *> Replace tab character (X'09') with space
           INSPECT CLEAN-LINE REPLACING ALL X'09' BY SPACE

           *> Replace other control characters
           INSPECT CLEAN-LINE REPLACING ALL X'0D' BY SPACE  *> CR
           INSPECT CLEAN-LINE REPLACING ALL X'0A' BY SPACE  *> LF

           DISPLAY 'Cleaned text: ' CLEAN-LINE
           STOP RUN.

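Control characters like tabs (X'09'), carriage returns (X'0D'), and line feeds (X'0A') can cause issues. Replace them with spaces or remove them to clean input data. These hexadecimal values are the ASCII codes. On an EBCDIC mainframe, tab is X'05', carriage return is X'0D', and line feed is X'25' (new line is X'15'), so adjust the literals to the platform's code page.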

Working with Formatted Text

Many text processing tasks involve converting between different text formats:

Converting Date Formats

cobol
       WORKING-STORAGE SECTION.
       01  DATE-INPUT      PIC X(10) VALUE '12/25/2023'.
       01  DATE-MONTH      PIC X(2).
       01  DATE-DAY        PIC X(2).
       01  DATE-YEAR       PIC X(4).
       01  DATE-OUTPUT     PIC X(10) VALUE SPACES.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> Parse MM/DD/YYYY format (DAY is a reserved word,
           *> so the parts use DATE- prefixed names)
           UNSTRING DATE-INPUT DELIMITED BY '/'
               INTO DATE-MONTH
                    DATE-DAY
                    DATE-YEAR
           END-UNSTRING

           *> Convert to YYYY-MM-DD format
           STRING DATE-YEAR   DELIMITED BY SIZE
                  '-'         DELIMITED BY SIZE
                  DATE-MONTH  DELIMITED BY SIZE
                  '-'         DELIMITED BY SIZE
                  DATE-DAY    DELIMITED BY SIZE
               INTO DATE-OUTPUT
           END-STRING

           DISPLAY 'Input: ' DATE-INPUT
           DISPLAY 'Output: ' DATE-OUTPUT
           STOP RUN.

This parses a date from MM/DD/YYYY format and converts it to YYYY-MM-DD format. UNSTRING extracts the components, then STRING rebuilds them in the new format.

Building Formatted Output

cobol
       WORKING-STORAGE SECTION.
       01  FIRST-NAME      PIC X(20) VALUE 'JOHN'.
       01  LAST-NAME       PIC X(20) VALUE 'SMITH'.
       01  FULL-NAME       PIC X(42) VALUE SPACES.
       01  FORMATTED-NAME  PIC X(50) VALUE SPACES.

       PROCEDURE DIVISION.
       MAIN-PARA.
           *> STRING does not pad its target, so both receiving
           *> fields start out as spaces (see VALUE SPACES above)

           *> Build full name
           STRING FIRST-NAME  DELIMITED BY SPACE
                  ' '         DELIMITED BY SIZE
                  LAST-NAME   DELIMITED BY SPACE
               INTO FULL-NAME
           END-STRING

           *> Format as "Last, First"
           STRING LAST-NAME   DELIMITED BY SPACE
                  ', '        DELIMITED BY SIZE
                  FIRST-NAME  DELIMITED BY SPACE
               INTO FORMATTED-NAME
           END-STRING

           DISPLAY 'Full Name: ' FULL-NAME
           DISPLAY 'Formatted: ' FORMATTED-NAME
           STOP RUN.

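This demonstrates building formatted names from components. STRING with DELIMITED BY SPACE avoids copying trailing spaces, keeping the output clean. It also stops at the first embedded space, so a first name such as "MARY ANN" would be cut to "MARY". STRING does not pad the receiving field, so initialize that field to spaces before building the output.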

Best Practices for Text Processing

Follow these best practices for effective text processing:

  • Always Validate Field Counts: Use TALLYING IN with UNSTRING to verify you received the expected number of fields. Handle cases where fields are missing or extra.
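  • Handle Empty Fields: Use a plain DELIMITED BY delimiter (without ALL) when parsing data that may have empty fields (consecutive delimiters), because ALL collapses consecutive delimiters into one and shifts later fields. Always check if fields are empty before using them.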
  • Normalize Before Comparing: Convert case and trim whitespace before comparing text values. This ensures comparisons work correctly regardless of input format.
  • Validate Data Types: Use class tests (IS NUMERIC, IS ALPHABETIC) to ensure data matches expected formats before processing.
  • Check Field Lengths: Validate that parsed fields don't exceed maximum lengths to prevent truncation and data loss.
  • Clean Input Data: Remove or replace unwanted characters (control characters, special formatting) early in processing to avoid issues later.
  • Handle Edge Cases: Test with empty strings, missing delimiters, extra delimiters, and data that exceeds field sizes.
  • Use Meaningful Field Names: Name parsing fields, counters, and work fields descriptively to make code self-documenting.
  • Initialize Pointer and Counter Fields: When using WITH POINTER, initialize the pointer to 1 (or the desired starting position) before each UNSTRING, and reset any TALLYING IN counter to zero, since UNSTRING adds to it.
  • Document Expected Formats: Add comments explaining the expected format of input data, delimiter characters, and field order.

Common Text Processing Patterns

Pattern 1: CSV Parser with Validation

cobol
       WORKING-STORAGE SECTION.
       01  CSV-LINE        PIC X(200).
       01  FIELD-1         PIC X(20).
       01  FIELD-2         PIC X(20).
       01  FIELD-3         PIC X(20).
       01  FIELD-COUNT     PIC 9(4) VALUE ZERO.
       01  EXPECTED-FIELDS PIC 9(4) VALUE 3.

       PROCEDURE DIVISION.
       PARSE-CSV.
           *> TALLYING adds to the counter, so reset it each time
           MOVE ZERO TO FIELD-COUNT
           *> Plain ',' (not ALL ',') keeps empty fields in place
           UNSTRING CSV-LINE DELIMITED BY ','
               INTO FIELD-1
                    FIELD-2
                    FIELD-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING

           IF FIELD-COUNT NOT = EXPECTED-FIELDS
               DISPLAY 'ERROR: Invalid CSV format'
               STOP RUN
           END-IF
           *> Process fields...
           CONTINUE.

Pattern 2: Normalize and Validate Input

cobol
       WORKING-STORAGE SECTION.
       01  USER-INPUT        PIC X(30).
       01  NORMALIZED-INPUT  PIC X(30).
       01  IS-VALID          PIC X VALUE 'N'.

       PROCEDURE DIVISION.
       NORMALIZE-INPUT.
           MOVE 'N' TO IS-VALID
           MOVE SPACES TO NORMALIZED-INPUT

           *> Validate not empty before trimming
           IF USER-INPUT NOT = SPACES
               *> Uppercase and strip surrounding spaces (assumes
               *> FUNCTION TRIM is available); MOVE left-justifies
               *> the result and pads it with spaces
               MOVE FUNCTION UPPER-CASE(FUNCTION TRIM(USER-INPUT))
                 TO NORMALIZED-INPUT
               MOVE 'Y' TO IS-VALID
           END-IF.

Pattern 3: Parse Key-Value Pairs

cobol
       WORKING-STORAGE SECTION.
       01  INPUT-LINE      PIC X(200)
                           VALUE 'ID=123|NAME=JOHN|CITY=AUSTIN'.
       01  PAIR-TABLE.
           05  PAIR-ENTRY  OCCURS 3 TIMES.
               10  KEY-FIELD    PIC X(10).
               10  VALUE-FIELD  PIC X(20).
       01  FIELD-COUNT     PIC 9(4) VALUE ZERO.
       01  PAIR-COUNT      PIC 9(4) VALUE ZERO.

       PROCEDURE DIVISION.
       PARSE-KEY-VALUE.
           MOVE SPACES TO PAIR-TABLE
           MOVE ZERO TO FIELD-COUNT
           *> Parse alternating key=value pairs into the table
           UNSTRING INPUT-LINE DELIMITED BY '=' OR '|'
               INTO KEY-FIELD(1) VALUE-FIELD(1)
                    KEY-FIELD(2) VALUE-FIELD(2)
                    KEY-FIELD(3) VALUE-FIELD(3)
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           *> TALLYING counts fields, so halve it to get pairs
           DIVIDE FIELD-COUNT BY 2 GIVING PAIR-COUNT
           *> Process key-value pairs...
           CONTINUE.

Explain Like I'm 5: Text Processing

Think of text processing like organizing a messy toy box:

  • Parsing is like sorting toys into separate bins. You have a big box with all toys mixed together (like "toy1,toy2,toy3"), and you separate them into individual bins (individual fields) based on markers (delimiters like commas).
  • Normalization is like making sure all your toys face the same direction. You turn all the cars to face forward, put all the blocks in the same orientation, so everything looks consistent and organized.
  • Validation is like checking your homework. You make sure you have the right number of toys (field count), that each toy is the right type (data format), and that nothing is missing (required fields).
  • Cleaning is like washing your toys. You remove dirt (unwanted characters), fix broken parts (formatting issues), and make everything ready to play with (process).

So text processing is all about taking messy, unorganized text data and making it neat, consistent, and ready to use—just like organizing and cleaning your toys!

Practice Exercises

Complete these exercises to reinforce your understanding of text processing:

Exercise 1: CSV Parser

Create a program that parses a CSV line with 5 fields (ID, First Name, Last Name, Email, Phone). Validate that all 5 fields are present, and display each field. Handle empty fields correctly.

Exercise 2: Text Normalizer

Create a program that normalizes user input: converts to uppercase, trims leading and trailing spaces, and normalizes multiple spaces to single spaces. Display the original and normalized versions.

Exercise 3: Phone Number Validator

Create a program that validates phone numbers. Accept input in various formats (with/without dashes, parentheses, spaces), clean the input to remove formatting, and validate that it contains exactly 10 digits.

Exercise 4: Date Format Converter

Create a program that parses a date in MM/DD/YYYY format, validates the components (month 1-12, day 1-31, year reasonable), and converts it to YYYY-MM-DD format. Handle invalid dates with error messages.

Exercise 5: Key-Value Parser

Create a program that parses a string in the format "KEY1=VALUE1|KEY2=VALUE2|KEY3=VALUE3". Extract each key-value pair, normalize the keys to uppercase, and display them in a formatted list.

Test Your Knowledge

1. What is the primary purpose of text processing in COBOL?

  • To perform mathematical calculations
  • To parse, normalize, validate, and transform text data
  • To read and write files
  • To control program flow

2. How do you parse a comma-separated value (CSV) line in COBOL?

  • Using STRING statement
  • Using UNSTRING DELIMITED BY ','
  • Using INSPECT statement
  • Using MOVE statement

3. How do you convert text to uppercase in COBOL?

  • INSPECT CONVERTING
  • FUNCTION UPPER-CASE
  • MOVE TO UPPERCASE
  • STRING TO UPPER

4. What does DELIMITED BY ALL do in UNSTRING?

  • Treats multiple consecutive delimiters as one
  • Splits on all possible delimiters
  • Ignores delimiters
  • Requires all delimiters to be present

5. Which technique from this tutorial drops excess trailing spaces when the content is known to fit in fewer characters?

  • Use INSPECT REPLACING
  • Move the field to a smaller PIC field
  • Use UNSTRING
  • Use STRING

6. What does TALLYING IN do in UNSTRING?

  • Counts characters in each field
  • Counts how many receiving fields were filled
  • Counts delimiters found
  • Counts total characters processed
