Data masking means hiding or obscuring sensitive data—such as account numbers, names, or Social Security numbers—so that the output file can be used for testing, reporting, or sharing without exposing real values. In DFSORT you can mask data using INREC or OUTREC with the OVERLAY parameter: replace specific positions with a constant (e.g. asterisks, spaces, or X). You can mask entire fields, partially mask (show first few characters, mask the rest), or use conditional masking with IFTHEN so only certain records or record types get masked. FINDREP can replace patterns (e.g. digits with X) but can change record length; OVERLAY at fixed positions keeps the record layout stable. This page explains how to mask with OVERLAY, when to use INREC vs OUTREC, partial and conditional masking, and practical examples.
In batch and reporting jobs, you often need to produce files that look like production data but must not contain real sensitive values. Masking replaces those values with harmless characters—asterisks (*), spaces, or a single character like X—so that structure and length are preserved but the content is hidden. For example, a 16-digit account number might appear as ************1234 (last four visible) or **************** (fully masked). DFSORT does not have a dedicated "mask" verb; you achieve masking by overlaying the sensitive positions with a constant using OVERLAY in INREC or OUTREC.
The OVERLAY parameter lets you modify specific portions of a record without rebuilding the whole record. For masking, you overlay the sensitive field with a constant. Each OVERLAY item has the form position:item, where position is the output column at which the item is placed. To mask, the item is a constant: either a literal written out in full, as in OVERLAY=(30:C'**********'), or a repetition factor in front of a one-byte literal, as in OVERLAY=(30:10C'*'). Both replace the 10 bytes at positions 30–39 with asterisks. Note that in DFSORT an item of the form start,length copies an input field; it does not give the length of the mask. So 30:30,10,C'**********' would copy bytes 30–39 onto themselves and place the asterisks at position 40. Syntax details can vary between sort products; check your manual. Typical forms:
| Syntax | Meaning | Example |
|---|---|---|
| pos:C'...' | Replace starting at pos with the character constant (e.g. asterisks, spaces) | 30:C'**********' masks 10 bytes at 30 |
| pos:nC'x' | Replace n bytes starting at pos with the one-byte constant repeated n times | 25:16C'*' masks bytes 25–40 |
| X'...' | Hexadecimal constant (e.g. X'00' for binary zeros) | 40:X'40404040' overlays 4 bytes with EBCDIC spaces |
Example: mask the 20-byte customer name at positions 21–40 with asterisks. The rest of the record stays unchanged.
    OPTION COPY
    OUTREC OVERLAY=(21:C'********************')
After this, every record has positions 21–40 filled with asterisks; the original name is no longer in the output. Record length is unchanged.
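To show where such control statements live, here is a minimal sketch of a complete job step. The step name and both data set names are hypothetical, and the space allocation is only a placeholder. The statement 21:20C'*' uses a repetition factor, which is shorthand for 20 asterisks starting at position 21.

```jcl
//MASKSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Input: production file (hypothetical name)
//SORTIN   DD DSN=PROD.CUSTOMER.DATA,DISP=SHR
//* Output: masked copy for testing (hypothetical name)
//SORTOUT  DD DSN=TEST.CUSTOMER.MASKED,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Copy every record and mask the 20-byte name at positions 21-40
  OPTION COPY
  OUTREC OVERLAY=(21:20C'*')
/*
```

The output record format and length are taken from the input here, which fits masking: OVERLAY changes bytes in place and leaves the layout as it was.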
INREC runs before the sort but after the INCLUDE/OMIT statement. So if you mask in INREC, the sort key sees the masked record, while INCLUDE or OMIT conditions still test the original input. Use INREC when you want the sort to operate on masked data (e.g. so that sort order does not depend on the real sensitive value). OUTREC runs after the sort; it only affects what is written to SORTOUT. Use OUTREC when you need to keep the real data for sorting but want the output file to show masked values. In that case, the sort sees the real data; only the written record is masked.
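As a minimal sketch of the OUTREC case, assume a fixed-length file with the customer name at positions 21–40 (the layout used in the earlier example). The sort uses the real name as its key, and only the written record is masked. The 20C'*' form is a repetition factor, shorthand for 20 asterisks.

```jcl
* Sort on the real customer name (positions 21-40, character, ascending)
  SORT FIELDS=(21,20,CH,A)
* Mask the name only after the sort has used it
  OUTREC OVERLAY=(21:20C'*')
```

Changing OUTREC to INREC in this sketch would mask the name before the sort, so every record would carry the same key and the output order would no longer reflect the names.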
Often you need to show only part of a field—e.g. last four digits of an account number—and mask the rest. With OVERLAY you overwrite only the bytes you want to hide. For example, account number in positions 1–16: show positions 1–4, mask 5–16. First ensure the record is built or copied (e.g. OPTION COPY or BUILD/FIELDS), then overlay only 5–16:
    OPTION COPY
    OUTREC OVERLAY=(5:C'************')
Positions 1–4 are unchanged; positions 5–16 become asterisks. So you get a partial mask without changing record length or shifting columns.
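The opposite layout, with only the last four digits visible, follows the same pattern. This sketch assumes the same 16-byte account number at positions 1–16 and uses the repetition form 12C'*' (12 asterisks).

```jcl
  OPTION COPY
* Mask positions 1-12; positions 13-16 keep their original digits
  OUTREC OVERLAY=(1:12C'*')
```

With this statement an input value of 1234567890123456 would be written as ************3456.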
You can list multiple overlay items in one OVERLAY to mask several fields. For example mask name at 21–40 and SSN at 50–58:
    OUTREC OVERLAY=(21:C'********************',
                    50:C'*********')
Each item is applied in order. Non-contiguous ranges (e.g. mask 10–15 and 30–35) are done by specifying each range in the OVERLAY list. The record layout stays fixed; only the specified positions change.
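Common mask characters are asterisks (*), spaces, and a single letter such as X. All of them hide the value while preserving the field length.
When you write the constant out in full, its length determines how many bytes are overlaid, so it must match the field length (e.g. 20 asterisks for a 20-byte field). DFSORT also accepts a repetition factor in front of a one-byte constant, such as 20C'*'; check your manual if you use a different sort product.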
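The constant forms can be mixed in one OVERLAY list. The sketch below assumes the name at 21–40 and the SSN at 50–58 from the earlier example, plus a hypothetical 4-byte code at 60–63 that should be blanked. In DFSORT, nC'x' repeats a one-byte character n times and nX writes n blanks.

```jcl
  OPTION COPY
* 21-40: asterisks, 50-58: letter X, 60-63: blanks
  OUTREC OVERLAY=(21:20C'*',
                  50:9C'X',
                  60:4X)
```

Blanks make a masked field look empty, while asterisks or X make it obvious that a value was deliberately hidden; which is preferable depends on how the output will be read.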
You may want to mask only for certain record types or when a field has a value. Use IFTHEN with WHEN=(logical expression) and OVERLAY in that branch. For example: mask the 20-byte name at 30 only when record type (byte 1) is 'D' (detail):
    OUTREC IFTHEN=(WHEN=(1,1,CH,EQ,C'D'),
                   OVERLAY=(30:C'********************'))
Records with byte 1 = 'D' get the name masked; all other records are left unchanged because no WHEN clause matches them. If you want a default action for those records, add an IFTHEN=(WHEN=NONE,...) clause. You can have multiple IFTHEN=(WHEN=(...),...) clauses for different conditions, each with its own OVERLAY.
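A sketch with two conditions, assuming a hypothetical layout in which detail records ('D' in byte 1) carry a 20-byte name at 30–49 and trailer records ('T') carry a 9-byte identifier at 10–18. The repetition forms 20C'*' and 9C'*' stand for 20 and 9 asterisks.

```jcl
  OPTION COPY
* Detail records: mask the name at 30-49
  OUTREC IFTHEN=(WHEN=(1,1,CH,EQ,C'D'),
                 OVERLAY=(30:20C'*')),
* Trailer records: mask the identifier at 10-18
         IFTHEN=(WHEN=(1,1,CH,EQ,C'T'),
                 OVERLAY=(10:9C'*'))
```

Records of any other type match neither WHEN clause and are written unchanged.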
FINDREP (find and replace) searches for a string and replaces it with another. For example you could try to replace every digit 0–9 with 'X' to mask numbers. A drawback is that FINDREP scans the record as a whole by default, so it also changes matching bytes outside the sensitive field. In addition, if the replacement string length differs from the find string, the bytes that follow are shifted, which moves column positions and can change the length of variable-length records. For stable, field-level masking, OVERLAY at fixed positions is preferred. Use FINDREP when you need pattern-based replacement and can control or accept length/position changes (e.g. by limiting the search to a column range with STARTPOS and ENDPOS, if your product supports them).
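A sketch of range-limited FINDREP, assuming the IN, OUT, STARTPOS, and ENDPOS operands are available at your DFSORT level and that a 9-byte SSN sits at positions 50–58 (a hypothetical layout). Each digit is replaced by a one-byte X, so nothing shifts.

```jcl
  OPTION COPY
* Replace any digit with X, but only within positions 50-58
  OUTREC FINDREP=(IN=(C'0',C'1',C'2',C'3',C'4',
                      C'5',C'6',C'7',C'8',C'9'),
                  OUT=C'X',
                  STARTPOS=50,ENDPOS=58)
```

Unlike OVERLAY, this leaves non-digit characters in the range (such as blanks or hyphens) as they were, which can be useful when the shape of the value should stay recognizable.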
For variable-length (VB) records, do not overlay bytes 1–4 (the RDW—Record Descriptor Word). Overlay only data positions (e.g. 5 onward). The RDW must remain valid so the system can interpret record length correctly.
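For a VB file, positions in the control statement count the 4-byte RDW. The sketch below assumes a hypothetical layout in which the name occupies the 21st to 40th data bytes, which are record positions 25–44 once the RDW is included.

```jcl
  OPTION COPY
* Data bytes 21-40 are record positions 25-44 (RDW occupies 1-4)
  OUTREC OVERLAY=(25:20C'*')
```

If some records can be shorter than the masked range, check in your manual how OVERLAY treats columns beyond the end of a variable-length record before relying on this.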
Imagine you have a piece of paper with a secret word on it. You don't want to tear the paper or change its size—you just want to cover the word so nobody can read it. So you take a sticker (like a row of stars) and paste it right over the word. The paper is the same size; the word is just hidden. Data masking in DFSORT is like that: we put a "sticker" (a constant like asterisks) over the part of the record we want to hide. OVERLAY is the way we say "at this position, put these characters instead of what was there." We can cover the whole field or only part of it, and we can choose to cover only some lines (records) using IFTHEN.
1. What is the main DFSORT mechanism for masking a field by replacing its bytes with a constant?
2. When should you use INREC for masking instead of OUTREC?
3. How do you partially mask a field (e.g. show first 4 characters, mask the rest)?
4. What is a drawback of using FINDREP to replace digits with a character for masking?
5. Can you conditionally mask a field (e.g. mask only when record type is detail)?