COBOL DUPLICATES Clause
Master duplicate record handling in COBOL SORT and MERGE operations for reliable data processing and maintaining data integrity in mainframe applications.
Overview
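The DUPLICATES phrase of the COBOL SORT statement, written [WITH] DUPLICATES [IN ORDER], controls how records with identical sort key values are ordered in the output. When it is specified, records with equal keys are returned in the order in which they were supplied to the sort; when it is omitted, their relative order is undefined. In standard COBOL the MERGE statement has no DUPLICATES phrase: it returns equal-key records in the order the input files are named in USING. Predictable handling of equal keys is crucial for maintaining data integrity and producing consistent results in business applications.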
Understanding duplicate handling is essential for data processing applications where the order of records with identical keys matters for business logic, audit trails, or regulatory compliance. The DUPLICATES clause provides control over whether the relative order of duplicate records is preserved during sorting operations.
This functionality is particularly important in mainframe environments where large volumes of data are processed and the stability of sort operations can affect downstream processing, reporting accuracy, and business decision-making processes.
Basic DUPLICATES Syntax
WITH DUPLICATES Clause
The WITH DUPLICATES clause ensures stable sorting behavior:
           SORT SORT-FILE
               ON ASCENDING KEY CUSTOMER-ID
               WITH DUPLICATES IN ORDER
               USING INPUT-FILE
               GIVING OUTPUT-FILE.
This maintains the original relative order of records with identical CUSTOMER-ID values.
Standard Sort Without DUPLICATES
Without the DUPLICATES clause, order of duplicates is not guaranteed:
           SORT SORT-FILE
               ON ASCENDING KEY CUSTOMER-ID
               USING INPUT-FILE
               GIVING OUTPUT-FILE.
Here the relative order of records with identical keys is undefined: the sort implementation may return them in any order, and that order may differ between compilers or sort utilities.
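The effect of the phrase can be checked with a small self-contained program. The following is a minimal sketch, not taken from the original material: the program name DUPDEMO, the record layout, and the data values are all illustrative assumptions. It releases five records from an input procedure, tagging each with a letter that marks its release order, and displays the sorted result.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DUPDEMO.
      * Hypothetical demo program; all names here are illustrative.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO "SORTWORK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  CUSTOMER-ID          PIC X(5).
           05  ARRIVAL-TAG          PIC X(1).
       WORKING-STORAGE SECTION.
       01  WS-EOF-SORT              PIC X(1) VALUE "N".
           88  NO-MORE-SORTED-DATA  VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT SORT-FILE
               ON ASCENDING KEY CUSTOMER-ID
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS LOAD-RECORDS
               OUTPUT PROCEDURE IS SHOW-RECORDS
           STOP RUN.
       LOAD-RECORDS.
      * The last character (A-E) marks the order of release.
           MOVE "C0002A" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C0001B" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C0002C" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C0001D" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C0002E" TO SORT-RECORD
           RELEASE SORT-RECORD.
       SHOW-RECORDS.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN SORT-FILE
                   AT END
                       SET NO-MORE-SORTED-DATA TO TRUE
                   NOT AT END
                       DISPLAY CUSTOMER-ID " " ARRIVAL-TAG
               END-RETURN
           END-PERFORM.
```

With the phrase present, the guarantee implies that the tags for C0001 come back as B then D, and those for C0002 as A, C, E. If WITH DUPLICATES IN ORDER is removed, the keys are still in ascending order but the tags within each key may appear in any order.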
Complete Sort Examples
Customer Transaction Processing
Processing customer transactions while preserving chronological order:
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT TRANSACTION-FILE ASSIGN TO "TRANS.DAT".
           SELECT SORTED-TRANS-FILE ASSIGN TO "SORTED.DAT".
           SELECT SORT-WORK-FILE ASSIGN TO "SORTWORK".

       DATA DIVISION.
       FILE SECTION.
       FD  TRANSACTION-FILE.
       01  TRANSACTION-RECORD.
           05  TRANS-CUST-ID       PIC X(5).
           05  TRANS-DATE          PIC 9(8).
           05  TRANS-TIME          PIC 9(6).
           05  TRANS-AMOUNT        PIC 9(7)V99.
           05  TRANS-TYPE          PIC X(1).

       SD  SORT-WORK-FILE.
       01  SORT-RECORD.
           05  SORT-CUST-ID        PIC X(5).
           05  SORT-DATE           PIC 9(8).
           05  SORT-TIME           PIC 9(6).
           05  SORT-AMOUNT         PIC 9(7)V99.
           05  SORT-TYPE           PIC X(1).

       FD  SORTED-TRANS-FILE.
      *    5 + 8 + 6 + 9 + 1 = 29 bytes, matching SORT-RECORD
       01  SORTED-RECORD           PIC X(29).

       PROCEDURE DIVISION.
       MAIN-SORT.
           SORT SORT-WORK-FILE
               ON ASCENDING KEY SORT-CUST-ID
               ON ASCENDING KEY SORT-DATE
               WITH DUPLICATES IN ORDER
               USING TRANSACTION-FILE
               GIVING SORTED-TRANS-FILE.
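Records with the same customer ID and date keep their input order. Same-day transactions therefore stay in chronological order, provided TRANSACTION-FILE was written chronologically; if it was not, SORT-TIME would need to be added as a third key.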
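With more than one key, two records count as duplicates only when every key matches. The sketch below illustrates this under stated assumptions: the program name KEYDEMO, the trailing SORT-TAG field, and the four data values are hypothetical and exist only to make the release order visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KEYDEMO.
      * Hypothetical demo; data values are illustrative.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-WORK-FILE ASSIGN TO "SORTWORK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-WORK-FILE.
       01  SORT-RECORD.
           05  SORT-CUST-ID         PIC X(5).
           05  SORT-DATE            PIC 9(8).
           05  SORT-TAG             PIC X(1).
       WORKING-STORAGE SECTION.
       01  WS-EOF-SORT              PIC X(1) VALUE "N".
           88  NO-MORE-SORTED-DATA  VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT SORT-WORK-FILE
               ON ASCENDING KEY SORT-CUST-ID
               ON ASCENDING KEY SORT-DATE
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS LOAD-TRANSACTIONS
               OUTPUT PROCEDURE IS SHOW-TRANSACTIONS
           STOP RUN.
       LOAD-TRANSACTIONS.
      * Layout: customer ID (5), date (8), release tag (1).
           MOVE "C000120240102A" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C000120240101B" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C000120240102C" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "C000120240101D" TO SORT-RECORD
           RELEASE SORT-RECORD.
       SHOW-TRANSACTIONS.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN SORT-WORK-FILE
                   AT END
                       SET NO-MORE-SORTED-DATA TO TRUE
                   NOT AT END
                       DISPLAY SORT-CUST-ID " " SORT-DATE " " SORT-TAG
               END-RETURN
           END-PERFORM.
```

Records B and D sort ahead of A and C because their date is earlier, regardless of release order. Release order matters only inside a group where both keys are equal: B before D for 20240101, and A before C for 20240102.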
Employee Payroll Processing
Sorting payroll records while maintaining entry sequence:
           SORT PAYROLL-SORT-FILE
               ON ASCENDING KEY EMP-DEPARTMENT
               ON ASCENDING KEY EMP-GRADE
               ON ASCENDING KEY EMP-ID
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS PAYROLL-INPUT-PROC
               OUTPUT PROCEDURE IS PAYROLL-OUTPUT-PROC.

       PAYROLL-INPUT-PROC.
           OPEN INPUT PAYROLL-INPUT-FILE
           PERFORM UNTIL END-OF-PAYROLL-INPUT
               READ PAYROLL-INPUT-FILE
                   AT END
                       MOVE "Y" TO EOF-FLAG
                   NOT AT END
                       PERFORM VALIDATE-PAYROLL-RECORD
                       IF VALID-RECORD
                           MOVE PAYROLL-RECORD TO SORT-RECORD
                           RELEASE SORT-RECORD
                       END-IF
               END-READ
           END-PERFORM
           CLOSE PAYROLL-INPUT-FILE.
This ensures that records with identical department, grade, and ID are returned in the order in which the input procedure released them.
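MERGE Operations and Duplicate Keys
Multi-File Customer Data Merge
Merging customer data from multiple sources:
           MERGE CUSTOMER-MERGE-FILE
               ON ASCENDING KEY CUST-ID
               ON ASCENDING KEY CUST-REGION
               USING EAST-CUSTOMERS WEST-CUSTOMERS CENTRAL-CUSTOMERS
               GIVING MERGED-CUSTOMERS.
In standard COBOL, MERGE does not accept a DUPLICATES phrase. When the same CUST-ID and CUST-REGION appear in more than one file, the records are returned in the order the files are named in USING: EAST-CUSTOMERS first, then WEST-CUSTOMERS, then CENTRAL-CUSTOMERS. Each input file must already be in key sequence.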
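How MERGE orders records that share a key can be seen with a small program that builds its own input. This is a minimal sketch under stated assumptions: the program name MRGDEMO, the file names EAST.DAT and WEST.DAT, the 9-byte record layout, and the data values are hypothetical, and it uses one key and two files to stay short. It relies only on the order of the files in USING.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MRGDEMO.
      * Hypothetical demo; file names and layout are illustrative.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EAST-CUSTOMERS ASSIGN TO "EAST.DAT".
           SELECT WEST-CUSTOMERS ASSIGN TO "WEST.DAT".
           SELECT CUSTOMER-MERGE-FILE ASSIGN TO "MERGEWRK".
       DATA DIVISION.
       FILE SECTION.
       FD  EAST-CUSTOMERS.
       01  EAST-RECORD              PIC X(9).
       FD  WEST-CUSTOMERS.
       01  WEST-RECORD              PIC X(9).
       SD  CUSTOMER-MERGE-FILE.
       01  MERGE-RECORD.
           05  CUST-ID              PIC X(5).
           05  CUST-SOURCE          PIC X(4).
       WORKING-STORAGE SECTION.
       01  WS-EOF-MERGE             PIC X(1) VALUE "N".
           88  NO-MORE-MERGED-DATA  VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM BUILD-INPUT-FILES
           MERGE CUSTOMER-MERGE-FILE
               ON ASCENDING KEY CUST-ID
               USING EAST-CUSTOMERS WEST-CUSTOMERS
               OUTPUT PROCEDURE IS SHOW-MERGED
           STOP RUN.
       BUILD-INPUT-FILES.
      * Each input file must already be in CUST-ID sequence.
           OPEN OUTPUT EAST-CUSTOMERS
           MOVE "C0001EAST" TO EAST-RECORD
           WRITE EAST-RECORD
           MOVE "C0003EAST" TO EAST-RECORD
           WRITE EAST-RECORD
           CLOSE EAST-CUSTOMERS
           OPEN OUTPUT WEST-CUSTOMERS
           MOVE "C0001WEST" TO WEST-RECORD
           WRITE WEST-RECORD
           MOVE "C0002WEST" TO WEST-RECORD
           WRITE WEST-RECORD
           CLOSE WEST-CUSTOMERS.
       SHOW-MERGED.
           PERFORM UNTIL NO-MORE-MERGED-DATA
               RETURN CUSTOMER-MERGE-FILE
                   AT END
                       SET NO-MORE-MERGED-DATA TO TRUE
                   NOT AT END
                       DISPLAY CUST-ID " " CUST-SOURCE
               END-RETURN
           END-PERFORM.
```

Customer C0001 appears in both files. Under the standard's rule for MERGE, the EAST record should be returned before the WEST record because EAST-CUSTOMERS is named first in USING; swapping the two file names should reverse that pair.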
Sales Data Consolidation
Consolidating sales data while maintaining temporal relationships. MERGE accepts input only through USING and has no DUPLICATES phrase, so a consolidation that is driven by an input procedure is written as a SORT:
           SORT SALES-MERGE-FILE
               ON ASCENDING KEY SALE-DATE
               ON ASCENDING KEY SALE-REGION
               ON ASCENDING KEY SALE-REP-ID
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS MERGE-INPUT-PROC
               OUTPUT PROCEDURE IS CONSOLIDATE-SALES.

       MERGE-INPUT-PROC.
           PERFORM PROCESS-MORNING-SALES
           PERFORM PROCESS-AFTERNOON-SALES
           PERFORM PROCESS-EVENING-SALES.

       CONSOLIDATE-SALES.
           PERFORM UNTIL NO-MORE-SALES-DATA
               RETURN SALES-MERGE-FILE
                   AT END
                       MOVE "Y" TO EOF-MERGE
                   NOT AT END
                       PERFORM CALCULATE-COMMISSION
                       PERFORM UPDATE-SALES-TOTALS
                       WRITE CONSOLIDATED-RECORD
               END-RETURN
           END-PERFORM.
Assuming the three PROCESS paragraphs release their records in turn, WITH DUPLICATES IN ORDER keeps the morning, afternoon, evening sequence within each date/region/rep combination.
Duplicate Detection and Removal
Identifying Duplicate Records
Using OUTPUT PROCEDURE to detect and handle duplicates:
       WORKING-STORAGE SECTION.
       01  WS-PREVIOUS-RECORD.
           05  WS-PREV-CUST-ID       PIC X(5).
           05  WS-PREV-CUST-NAME     PIC X(30).
       01  WS-DUPLICATE-COUNT        PIC 9(5) VALUE ZERO.
       01  WS-FIRST-RECORD-FLAG      PIC X(1) VALUE "Y".

       PROCEDURE DIVISION.
       SORT-WITH-DUPLICATE-CHECK.
           SORT CUSTOMER-SORT-FILE
               ON ASCENDING KEY CUST-ID
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS READ-CUSTOMERS
               OUTPUT PROCEDURE IS CHECK-DUPLICATES.

       CHECK-DUPLICATES.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN CUSTOMER-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       IF WS-FIRST-RECORD-FLAG = "Y"
                           MOVE "N" TO WS-FIRST-RECORD-FLAG
                           PERFORM SAVE-CURRENT-RECORD
                       ELSE
                           PERFORM COMPARE-WITH-PREVIOUS
                       END-IF
                       WRITE OUTPUT-RECORD FROM SORT-RECORD
               END-RETURN
           END-PERFORM.

       COMPARE-WITH-PREVIOUS.
           IF CUST-ID = WS-PREV-CUST-ID
               ADD 1 TO WS-DUPLICATE-COUNT
               DISPLAY "Duplicate found: " CUST-ID " - " CUST-NAME
           END-IF
           PERFORM SAVE-CURRENT-RECORD.

       SAVE-CURRENT-RECORD.
           MOVE CUST-ID TO WS-PREV-CUST-ID
           MOVE CUST-NAME TO WS-PREV-CUST-NAME.
This approach identifies duplicates while preserving all records in the output.
Removing Duplicate Records
Creating a unique record set by eliminating duplicates:
       REMOVE-DUPLICATES.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN PRODUCT-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       IF WS-FIRST-RECORD-FLAG = "Y"
                           MOVE "N" TO WS-FIRST-RECORD-FLAG
                           PERFORM WRITE-UNIQUE-RECORD
                           PERFORM SAVE-CURRENT-KEY
                       ELSE
                           IF PRODUCT-CODE NOT = WS-PREV-PRODUCT-CODE
                               PERFORM WRITE-UNIQUE-RECORD
                               PERFORM SAVE-CURRENT-KEY
                           ELSE
                               ADD 1 TO WS-DUPLICATES-REMOVED
                               DISPLAY "Removing duplicate: "
                                   PRODUCT-CODE
                           END-IF
                       END-IF
               END-RETURN
           END-PERFORM.

       WRITE-UNIQUE-RECORD.
           WRITE UNIQUE-PRODUCT-RECORD FROM SORT-RECORD
           ADD 1 TO WS-UNIQUE-COUNT.

       SAVE-CURRENT-KEY.
           MOVE PRODUCT-CODE TO WS-PREV-PRODUCT-CODE.
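This writes only the first record returned for each product code. Which record counts as the first is predictable only if the SORT statement that drives this output procedure specifies WITH DUPLICATES IN ORDER; without it, any record of a duplicate group may be the one that is kept.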
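A runnable version of the keep-first technique can make the dependency on sort stability concrete. The following is a minimal sketch rather than the original program: DEDUPDEM, SOURCE-TAG, KEEP-OR-DROP, and the data values are hypothetical, and records are displayed instead of written to an output file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEDUPDEM.
      * Hypothetical demo; record layout and data are illustrative.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT PRODUCT-SORT-FILE ASSIGN TO "SORTWORK".
       DATA DIVISION.
       FILE SECTION.
       SD  PRODUCT-SORT-FILE.
       01  SORT-RECORD.
           05  PRODUCT-CODE         PIC X(4).
           05  SOURCE-TAG           PIC X(1).
       WORKING-STORAGE SECTION.
       01  WS-PREV-PRODUCT-CODE     PIC X(4).
       01  WS-FIRST-RECORD-FLAG     PIC X(1) VALUE "Y".
       01  WS-DUPLICATES-REMOVED    PIC 9(5) VALUE ZERO.
       01  WS-EOF-SORT              PIC X(1) VALUE "N".
           88  NO-MORE-SORTED-DATA  VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT PRODUCT-SORT-FILE
               ON ASCENDING KEY PRODUCT-CODE
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS LOAD-PRODUCTS
               OUTPUT PROCEDURE IS REMOVE-DUPLICATES
           DISPLAY "Duplicates removed: " WS-DUPLICATES-REMOVED
           STOP RUN.
       LOAD-PRODUCTS.
      * The last character (A-D) marks the order of release.
           MOVE "P002A" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "P001B" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "P002C" TO SORT-RECORD
           RELEASE SORT-RECORD
           MOVE "P001D" TO SORT-RECORD
           RELEASE SORT-RECORD.
       REMOVE-DUPLICATES.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN PRODUCT-SORT-FILE
                   AT END
                       SET NO-MORE-SORTED-DATA TO TRUE
                   NOT AT END
                       PERFORM KEEP-OR-DROP
               END-RETURN
           END-PERFORM.
       KEEP-OR-DROP.
      * Keep the first record of each key group, drop the rest.
           IF WS-FIRST-RECORD-FLAG = "Y"
              OR PRODUCT-CODE NOT = WS-PREV-PRODUCT-CODE
               MOVE "N" TO WS-FIRST-RECORD-FLAG
               MOVE PRODUCT-CODE TO WS-PREV-PRODUCT-CODE
               DISPLAY "Kept:    " PRODUCT-CODE " " SOURCE-TAG
           ELSE
               ADD 1 TO WS-DUPLICATES-REMOVED
               DISPLAY "Dropped: " PRODUCT-CODE " " SOURCE-TAG
           END-IF.
```

Because the sort is stable, the record kept for P001 should be the one tagged B and the record kept for P002 the one tagged A, which are the earliest released for each code. Folding the first-record test into a single IF with OR is a design choice made here to keep the nesting shallow; it behaves the same as the nested form shown earlier.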
Performance Considerations
Memory Usage Optimization
Optimizing sort performance with large datasets:
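       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-WORK-FILE
               ASSIGN TO "SORTWORK".

       PROCEDURE DIVISION.
      *    SORT-CORE-SIZE is an IBM special register, set before SORT
           MOVE 64000 TO SORT-CORE-SIZE.
In IBM Enterprise COBOL, SORT-CORE-SIZE is a special register that is set with MOVE before the SORT statement rather than declared in SPECIAL-NAMES, and the SELECT entry for a sort work file needs only the ASSIGN clause. Other compilers may not provide this register. Configuring adequate sort memory helps maintain performance when preserving duplicate order.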
Efficient Duplicate Processing
Minimizing overhead in duplicate handling:
       WORKING-STORAGE SECTION.
       01  WS-SORT-STATISTICS.
           05  WS-RECORDS-READ       PIC 9(8) VALUE ZERO.
           05  WS-RECORDS-WRITTEN    PIC 9(8) VALUE ZERO.
           05  WS-DUPLICATE-SETS     PIC 9(6) VALUE ZERO.

       EFFICIENT-DUPLICATE-PROC.
           PERFORM UNTIL EOF-SORT = "Y"
               RETURN SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       ADD 1 TO WS-RECORDS-READ
                       IF CURRENT-KEY = PREVIOUS-KEY
      *                    Process duplicate efficiently
                           PERFORM HANDLE-DUPLICATE-INLINE
                       ELSE
      *                    New key group
                           PERFORM START-NEW-KEY-GROUP
                       END-IF
               END-RETURN
           END-PERFORM.
Because sorted output places equal keys next to each other, a single comparison of each record against the previous key is enough to detect duplicates; no lookup table or second pass is needed. For this to work, START-NEW-KEY-GROUP must save the current key into PREVIOUS-KEY.
Hands-on Exercise
Exercise: Customer Order Processing
Create a program that sorts customer orders while preserving the chronological order of orders placed on the same day.
Requirements:
- Sort by customer ID and order date
- Preserve order sequence for same-day orders
- Count and report duplicate order scenarios
- Generate summary statistics
Solution:
           SORT ORDER-SORT-FILE
               ON ASCENDING KEY SORT-CUST-ID
               ON ASCENDING KEY SORT-ORDER-DATE
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS READ-ORDERS
               OUTPUT PROCEDURE IS PROCESS-SORTED-ORDERS.

       READ-ORDERS.
           OPEN INPUT ORDER-INPUT-FILE
           PERFORM UNTIL EOF-INPUT = "Y"
               READ ORDER-INPUT-FILE
                   AT END
                       MOVE "Y" TO EOF-INPUT
                   NOT AT END
                       MOVE ORDER-RECORD TO SORT-RECORD
                       RELEASE SORT-RECORD
               END-READ
           END-PERFORM
           CLOSE ORDER-INPUT-FILE.

       PROCESS-SORTED-ORDERS.
           OPEN OUTPUT SORTED-ORDER-FILE
      *    No record has been returned yet, so the flag starts at "Y"
           MOVE "Y" TO WS-FIRST-RECORD
           PERFORM UNTIL EOF-SORT = "Y"
               RETURN ORDER-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
      *                CHECK-SAME-DAY-ORDERS is not shown here
                       PERFORM CHECK-SAME-DAY-ORDERS
                       WRITE SORTED-ORDER-RECORD FROM SORT-RECORD
               END-RETURN
           END-PERFORM
           CLOSE SORTED-ORDER-FILE.
Quiz
Test Your Knowledge
1. What does WITH DUPLICATES IN ORDER guarantee?
2. Can the DUPLICATES clause be used with MERGE operations?
3. How are records considered duplicates in multi-key sorts?
Answers:
1. Preserves relative order of records with identical keys - The DUPLICATES clause maintains stable sorting behavior.
2. No, not in standard COBOL - The DUPLICATES phrase belongs to the SORT statement. MERGE returns records with equal keys in the order the input files are named in USING.
3. When ALL specified keys have identical values - Records are duplicates only when all sort keys match exactly.