Record blocking is a fundamental concept in COBOL file processing that significantly impacts I/O performance and storage efficiency. Understanding the difference between physical records (blocks) and logical records, and how to specify blocking factors, is essential for optimizing file operations in mainframe environments. This guide covers physical and logical records, the BLOCK CONTAINS clause, blocking factors, and performance considerations.
Record blocking is the process of grouping multiple logical records into a single physical record (block) for storage and I/O operations. This technique:
Understanding the distinction between physical and logical records is crucial for effective file processing:
A logical record is the unit of data that your COBOL program processes:
123456701 CUSTOMER-RECORD. 05 CUSTOMER-ID PIC X(8). 05 CUSTOMER-NAME PIC X(30). 05 CUSTOMER-ADDRESS PIC X(50). 05 CUSTOMER-PHONE PIC X(12). *> This is a logical record - what your program processes
A physical record, also called a block, is what is actually stored on the storage device:
Consider a file with logical records of 80 characters each, blocked at 10 records per block:
Physical Block 1 (800 characters):
[Record 1: 80 chars][Record 2: 80 chars][Record 3: 80 chars]...
[Record 10: 80 chars]
Physical Block 2 (800 characters):
[Record 11: 80 chars][Record 12: 80 chars]...
[Record 20: 80 chars]
When your program reads Record 1, the system actually reads the entire first block (containing Records 1-10) into memory. Subsequent READ statements for Records 2-10 use the data already in memory, avoiding additional I/O operations.
The BLOCK CONTAINS clause in the File Description (FD) entry specifies how logical records are grouped into physical blocks. This clause is essential for optimizing I/O performance.
12345FD file-name LABEL RECORDS ARE STANDARD BLOCK CONTAINS [integer-1] [TO integer-2] {RECORDS | CHARACTERS | WORDS} RECORD CONTAINS [integer-3] [TO integer-4] CHARACTERS.
The most common approach is to specify the number of logical records per physical block:
12345678910111213FD CUSTOMER-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 20 RECORDS RECORD CONTAINS 100 CHARACTERS. 01 CUSTOMER-RECORD. 05 CUSTOMER-ID PIC 9(5). 05 CUSTOMER-NAME PIC X(30). 05 CUSTOMER-ADDRESS PIC X(50). 05 CUSTOMER-SALARY PIC 9(7)V99. 05 FILLER PIC X(5). *> This creates blocks of 2,000 characters (20 × 100) *> Each physical I/O operation transfers 20 logical records
You can also specify blocking by the total number of characters:
12345678FD DATA-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 1600 CHARACTERS RECORD CONTAINS 80 CHARACTERS. *> This creates blocks of 1,600 characters *> With 80-character records, this equals 20 records per block *> (1600 ÷ 80 = 20)
For variable-length records, you can specify a range:
12345678FD VARIABLE-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 100 TO 200 CHARACTERS RECORD CONTAINS 50 TO 100 CHARACTERS DEPENDING ON RECORD-LENGTH. *> Blocks can vary in size from 100 to 200 characters *> Each block contains one or more variable-length records
If you don't specify BLOCK CONTAINS, or specify BLOCK CONTAINS 0 RECORDS, each logical record becomes its own physical block:
12345678FD UNBLOCKED-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 0 RECORDS RECORD CONTAINS 80 CHARACTERS. *> Each logical record is a separate physical block *> Maximum I/O operations (one per record) *> Used when random access is more important than sequential performance
The blocking factor is the number of logical records contained in each physical block. Choosing the right blocking factor is crucial for optimal performance.
| File Type | Typical Blocking Factor | Reasoning |
|---|---|---|
| Sequential Files | 10-50 records | Optimize for sequential processing |
| Indexed Files (VSAM) | 5-20 records | Balance I/O efficiency with index performance |
| Large Batch Files | 50-100 records | Maximize I/O efficiency for bulk processing |
| Small Files | 1-10 records | Minimize memory usage |
| Tape Files | 20-100 records | Optimize for tape I/O characteristics |
Record blocking has a significant impact on file processing performance:
Consider a file with 10,000 records of 100 characters each:
| Blocking Factor | Block Size | Blocks Required | I/O Operations |
|---|---|---|---|
| 1 (unblocked) | 100 bytes | 10,000 | 10,000 |
| 10 records | 1,000 bytes | 1,000 | 1,000 (90% reduction) |
| 50 records | 5,000 bytes | 200 | 200 (98% reduction) |
| 100 records | 10,000 bytes | 100 | 100 (99% reduction) |
As you can see, increasing the blocking factor dramatically reduces the number of I/O operations, which directly improves performance.
While larger blocks reduce I/O operations, they also require more memory:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950IDENTIFICATION DIVISION. PROGRAM-ID. CUSTOMER-PROCESS. ENVIRONMENT DIVISION. INPUT-OUTPUT SECTION. FILE-CONTROL. SELECT CUSTOMER-FILE ASSIGN TO "CUSTMAST.DAT" ORGANIZATION IS SEQUENTIAL ACCESS MODE IS SEQUENTIAL FILE STATUS IS WS-CUST-STATUS. DATA DIVISION. FILE SECTION. FD CUSTOMER-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 25 RECORDS RECORD CONTAINS 100 CHARACTERS. 01 CUSTOMER-RECORD. 05 CUSTOMER-ID PIC 9(5). 05 CUSTOMER-NAME PIC X(30). 05 CUSTOMER-ADDRESS PIC X(50). 05 CUSTOMER-STATUS PIC X. 05 FILLER PIC X(14). WORKING-STORAGE SECTION. 01 WS-CUST-STATUS PIC XX. 01 RECORD-COUNT PIC 9(6) VALUE ZERO. 01 EOF-FLAG PIC X VALUE 'N'. 88 END-OF-FILE VALUE 'Y'. PROCEDURE DIVISION. MAIN-PROCESS. OPEN INPUT CUSTOMER-FILE PERFORM READ-CUSTOMERS UNTIL END-OF-FILE CLOSE CUSTOMER-FILE DISPLAY "Total records processed: " RECORD-COUNT STOP RUN. READ-CUSTOMERS. READ CUSTOMER-FILE AT END MOVE 'Y' TO EOF-FLAG NOT AT END ADD 1 TO RECORD-COUNT PERFORM PROCESS-CUSTOMER END-READ. *> This example uses BLOCK CONTAINS 25 RECORDS *> Each physical block contains 25 logical records (2,500 characters) *> Reading 25 records requires only 1 I/O operation instead of 25
1234567891011121314151617FD TRANSACTION-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 50 RECORDS RECORD CONTAINS 80 CHARACTERS. 01 TRANSACTION-RECORD. 05 TRANS-DATE PIC 9(8). 05 TRANS-TIME PIC 9(6). 05 TRANS-TYPE PIC X. 05 TRANS-AMOUNT PIC 9(7)V99. 05 ACCOUNT-NUMBER PIC 9(10). 05 TRANS-DESCRIPTION PIC X(40). 05 FILLER PIC X(6). *> High-volume transaction processing benefits from larger blocking *> 50 records per block optimizes I/O for bulk processing *> Each block contains 4,000 characters (50 × 80) *> Reduces I/O operations by 98% compared to unblocked records
1234567891011FD REPORT-FILE LABEL RECORDS ARE STANDARD BLOCK CONTAINS 10 RECORDS RECORD CONTAINS 132 CHARACTERS. 01 REPORT-RECORD. 05 REPORT-LINE PIC X(132). *> Report files use smaller blocking for flexibility *> 10 records per block allows for efficient line-by-line processing *> Each block contains 1,320 characters (10 × 132) *> Suitable for both sequential and random access patterns
Imagine you're moving books from one bookshelf to another:
If you move books one at a time, you have to walk back and forth many times. That's like unblocked records - each record requires a separate I/O operation (walking to the bookshelf).
But if you use a box to carry many books at once, you make fewer trips. That's like record blocking - you put many logical records (books) into one physical block (box), so you make fewer I/O operations (trips).
The bigger the box (block), the fewer trips you make, but you need a stronger person (more memory) to carry it. So you need to find the right size box - not too small (too many trips) and not too big (too heavy to carry)!
Calculate the physical block size for each scenario:
Answer: Block size = Record size × Blocking factor. So: 800 characters, 2,500 characters, and 10,000 characters respectively.
A file contains 5,000 records of 120 characters each. Compare the number of I/O operations required for:
Answer: Unblocked: 5,000 I/O operations. Blocking factor 20: 250 I/O operations (95% reduction). Blocking factor 50: 100 I/O operations (98% reduction).
Write a File Description entry for a customer file with:
Hint: Use BLOCK CONTAINS 30 RECORDS and RECORD CONTAINS 150 CHARACTERS.
Analyze the performance impact of different blocking factors:
1. What is the difference between a physical record and a logical record?
2. What does BLOCK CONTAINS 20 RECORDS mean?
3. Why is record blocking important for performance?
4. What happens if you specify BLOCK CONTAINS 0 RECORDS?
5. Which blocking factor would be most appropriate for a large sequential batch file?
6. What is the relationship between block size, record size, and blocking factor?