MainframeMaster

Data Masking in DFSORT

Data masking means hiding or obscuring sensitive data, such as account numbers, names, or Social Security numbers, so that the output file can be used for testing, reporting, or sharing without exposing real values. In DFSORT you mask data with the OVERLAY parameter of INREC or OUTREC, which replaces specific positions with a constant (e.g. asterisks, spaces, or X). You can mask entire fields, mask part of a field (show the first or last few characters, mask the rest), or use IFTHEN for conditional masking so that only certain records or record types are masked. FINDREP can replace specific strings (e.g. each digit with X), but a replacement of a different length shifts the data after it; OVERLAY at fixed positions keeps the record layout stable. This page explains how to mask with OVERLAY, when to use INREC vs OUTREC, partial and conditional masking, and practical examples.


What Is Data Masking?

In batch and reporting jobs, you often need to produce files that look like production data but must not contain real sensitive values. Masking replaces those values with harmless characters—asterisks (*), spaces, or a single character like X—so that structure and length are preserved but the content is hidden. For example, a 16-digit account number might appear as ************1234 (last four visible) or **************** (fully masked). DFSORT does not have a dedicated "mask" verb; you achieve masking by overlaying the sensitive positions with a constant using OVERLAY in INREC or OUTREC.
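To preview what masked values should look like before writing any control statements, the two patterns above (fully masked, and last four visible) can be sketched in Python. This is a minimal illustration with hypothetical function names and a made-up sample value; it is not part of DFSORT. Both functions preserve the length of the value, which is the property that makes OVERLAY-style masking safe for fixed layouts.

```python
def mask_all(value: str, mask: str = "*") -> str:
    """Replace every character with the mask character; length is preserved."""
    return mask * len(value)


def mask_keep_last(value: str, visible: int, mask: str = "*") -> str:
    """Mask all but the last `visible` characters; length is preserved."""
    if not 0 <= visible <= len(value):
        raise ValueError("visible must be between 0 and len(value)")
    hidden = len(value) - visible
    return mask * hidden + value[hidden:]


account = "4000123412341234"  # made-up 16-digit sample value
print(mask_keep_last(account, 4))  # twelve '*' followed by 1234
print(mask_all(account))           # sixteen '*'
```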

OVERLAY: Replacing Positions with a Constant

The OVERLAY parameter lets you change specific positions of a record while every other byte stays as it was, so you do not have to rebuild the whole record. For masking, you overlay the sensitive field with a constant. The basic form is c:constant, which writes the constant starting at column c; the length of the constant decides how many bytes are replaced. For example, OVERLAY=(30:C'**********') replaces the 10 bytes at positions 30–39 with asterisks, and OVERLAY=(30:10C'*') does the same with DFSORT's repetition form nC'x'. Be careful with the c:p,m form: it copies m bytes of input starting at position p to column c, and an item with no column of its own starts right after the previous item. So 30:30,10,C'**********' copies 30–39 back unchanged and then writes the asterisks at 40–49, leaving the real data visible. Other sort products may differ in details; check your manual. Typical forms:

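OVERLAY forms used for masking (c = starting column in the output record)

| Syntax | Meaning | Example |
|---|---|---|
| c:C'...' | Write the character constant starting at column c; the constant's length sets how many bytes change | 30:C'**********' masks 10 bytes at 30–39 |
| c:nC'x' | Write the character x repeated n times starting at column c | 25:16C'*' masks 16 bytes at 25–40 |
| c:X'...' | Write a hexadecimal constant (e.g. X'00' for binary zeros, X'40' for an EBCDIC space) | 40:X'40404040' overlays 40–43 with EBCDIC spaces |
| c:p,m | Copy m input bytes starting at position p to column c; this copies data, it does not mask it | 25:25,16 leaves 25–40 unchanged |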
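In DFSORT, c:p,m copies m input bytes starting at position p to column c, and an OVERLAY item written without a column starts immediately after the previous item. Combining the two as c:p,m,C'...' therefore copies the field back unchanged and writes the constant just past it. The Python model below illustrates that rule. It is a sketch under stated assumptions: it supports only those two item kinds, encodes them as hypothetical tuples rather than DFSORT syntax, treats records as character strings, copies from the unmodified input as a simplification, and assumes that writing past the end lengthens the record with blank padding.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical encoding of two OVERLAY item kinds (not DFSORT syntax):
#   (c, "text")   ~ c:C'text'  write a constant at column c
#   (c, (p, m))   ~ c:p,m      copy m input bytes starting at position p
# c=None means "no column given": start right after the previous item.
Item = Tuple[Optional[int], Union[str, Tuple[int, int]]]


def overlay(record: str, items: List[Item]) -> str:
    """Apply items left to right; simplification: copies read the unmodified input."""
    out = list(record)
    next_col = 1  # a first item without a column starts at column 1
    for column, value in items:
        col = next_col if column is None else column
        if isinstance(value, tuple):
            p, m = value
            data = record[p - 1:p - 1 + m]
        else:
            data = value
        end = col - 1 + len(data)
        if end > len(out):
            out.extend(" " * (end - len(out)))  # assumption: pad with blanks
        out[col - 1:end] = data
        next_col = col + len(data)
    return "".join(out)


rec = "A" * 20 + "JOHN Q SAMPLE".ljust(20) + "Z" * 40  # 80-byte test record

# Like 21:21,20,C'<20 asterisks>': the copy rewrites 21-40 unchanged and
# the constant lands at 41-60, so the name is NOT masked.
wrong = overlay(rec, [(21, (21, 20)), (None, "*" * 20)])

# Like 21:20C'*': the asterisks replace positions 21-40.
right = overlay(rec, [(21, "*" * 20)])

print(wrong[20:40])  # still the name
print(right[20:40])  # asterisks
```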

Example: mask the 20-byte customer name at positions 21–40 with asterisks. The rest of the record stays unchanged.

```text
* mask the 20-byte customer name in positions 21-40
  OPTION COPY
  OUTREC OVERLAY=(21:20C'*')
```

After this, every record has positions 21–40 filled with asterisks; the original name is no longer in the output. Record length is unchanged.

INREC vs OUTREC for Masking

DFSORT applies INCLUDE/OMIT first, then INREC, then the sort (or copy), then OUTREC. If you mask in INREC, the sort keys, SUM, OUTREC, and OUTFIL all see the masked record; INCLUDE and OMIT still test the original values, because they run before INREC. Use INREC when the sort should operate on masked data (e.g. so that sort order does not depend on the real sensitive value) or when you want the real values removed as early as possible. Use OUTREC when you need the real data for sorting but want the output records to show masked values. In that case the sort sees the real data; only the records written to SORTOUT (and passed to any OUTFIL statements) are masked.
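The phase order can be made concrete with a toy Python model. It assumes DFSORT's processing order (INCLUDE/OMIT, then INREC, then the sort, then OUTREC) and a hypothetical 19-byte layout with a 9-byte SSN in positions 1–9 that is also the sort key; the function names are illustrative only. Masking in OUTREC sorts on the real SSNs, while masking in INREC turns every key into the same string of asterisks, so the real values no longer influence the order. In both jobs the INCLUDE test still sees the real SSN.

```python
from typing import Callable, List, Optional

# Hypothetical 19-byte layout: SSN in positions 1-9, name in 10-19.
Reformat = Optional[Callable[[str], str]]


def mask_ssn(rec: str) -> str:
    """Like OVERLAY=(1:9C'*'): hide positions 1-9."""
    return "*" * 9 + rec[9:]


def run_job(records: List[str], include: Callable[[str], bool],
            inrec: Reformat = None, outrec: Reformat = None) -> List[str]:
    # 1. INCLUDE/OMIT tests the original input records.
    kept = [r for r in records if include(r)]
    # 2. INREC reformats the kept records.
    if inrec is not None:
        kept = [inrec(r) for r in kept]
    # 3. Sort on positions 1-9. Python's sort is stable; DFSORT keeps the
    #    input order of equal keys only when EQUALS is in effect.
    kept.sort(key=lambda r: r[:9])
    # 4. OUTREC reformats the sorted records on the way out.
    if outrec is not None:
        kept = [outrec(r) for r in kept]
    return kept


data = ["222222222" + "BOB".ljust(10),
        "333333333" + "CAROL".ljust(10),
        "111111111" + "ALICE".ljust(10)]


def not_carol(rec: str) -> bool:
    return not rec.startswith("333")  # tests the REAL SSN in both jobs


print(run_job(data, not_carol, outrec=mask_ssn))  # ordered by real SSN, then masked
print(run_job(data, not_carol, inrec=mask_ssn))   # all keys equal after masking
```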

Partial Masking (Show First N, Mask the Rest)

Often you need to show only part of a field, such as the first few or the last four digits of an account number, and mask the rest. With OVERLAY you overwrite only the bytes you want to hide; all other bytes keep their input values, so no BUILD is needed. For example, with an account number in positions 1–16, show positions 1–4 and mask the 12 bytes in 5–16:

```text
* keep positions 1-4, mask positions 5-16
  OPTION COPY
  OUTREC OVERLAY=(5:12C'*')
```

Positions 1–4 are unchanged; positions 5–16 become asterisks. You get a partial mask without changing the record length or shifting columns. To show the last four digits instead, mask positions 1–12 with OVERLAY=(1:12C'*').
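DFSORT positions are 1-based and inclusive, so positions 5–16 span 12 bytes and correspond to the Python slice [4:16]; an off-by-one mistake either leaves a digit visible or masks part of a neighboring field. The helper below is hypothetical and treats a record as a character string. It covers both the show-first-N and show-last-N cases for a 16-byte account number in positions 1–16, followed by other data.

```python
def mask_positions(record: str, start: int, end: int, mask: str = "*") -> str:
    """Overlay 1-based inclusive positions start..end with the mask character."""
    if not 1 <= start <= end <= len(record):
        raise ValueError("range must lie inside the record")
    length = end - start + 1  # e.g. 5..16 -> 12 bytes
    return record[:start - 1] + mask * length + record[end:]


rec = "4000123412345678" + "REST OF RECORD"
first4 = mask_positions(rec, 5, 16)  # like OVERLAY=(5:12C'*')
last4 = mask_positions(rec, 1, 12)   # like OVERLAY=(1:12C'*')
print(first4)
print(last4)
```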

Multiple Fields and Non-Contiguous Ranges

You can list multiple overlay items in one OVERLAY to mask several fields. For example, to mask the name at 21–40 and the SSN at 50–58:

```text
* mask name (21-40) and SSN (50-58)
  OUTREC OVERLAY=(21:20C'*',
                  50:9C'*')
```

Each item is applied in order. Non-contiguous ranges (e.g. mask 10–15 and 30–35) are done by specifying each range in the OVERLAY list. The record layout stays fixed; only the specified positions change.
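Because the items are applied in order, two ranges that accidentally overlap are easy to miss: the later item simply overwrites part of the earlier one. A minimal Python sketch (hypothetical helpers, records as character strings) that checks a list of 1-based ranges against the record length and for overlaps before masking them, using the name (21–40) and SSN (50–58) layout from this section:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end)


def check_ranges(ranges: List[Range], lrecl: int) -> None:
    """Raise if a range falls outside the record or two ranges overlap."""
    ordered = sorted(ranges)
    for start, end in ordered:
        if not 1 <= start <= end <= lrecl:
            raise ValueError(f"range {start}-{end} outside 1-{lrecl}")
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            raise ValueError(f"ranges {s1}-{e1} and {s2}-{e2} overlap")


def mask_ranges(record: str, ranges: List[Range], mask: str = "*") -> str:
    """Validate the ranges, then overwrite each one with the mask character."""
    check_ranges(ranges, len(record))
    chars = list(record)
    for start, end in ranges:
        chars[start - 1:end] = mask * (end - start + 1)
    return "".join(chars)


rec = "X" * 80
print(mask_ranges(rec, [(21, 40), (50, 58)]))  # name and SSN masked
```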

Choice of Mask Character

Common choices and their effect:

  • C'*' or C'**********': asterisks. They clearly indicate "masked" and are visible in reports. Use them when it should be obvious that data was hidden.
  • C' ' (spaces): blanks. The field looks empty. Use them when the field should appear blank rather than show a pattern.
  • C'X': a single character repeated (e.g. XXXXX). Some shops use X to mean "redacted."
  • X'00' or X'40': binary zeros or EBCDIC blanks (X'40' is the same byte as C' '). Use them when a downstream program needs a specific hex value.

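In OVERLAY the constant's own length decides how many bytes are replaced, so it must be exactly as long as the field (e.g. 20 asterisks for a 20-byte field). A constant that is too short leaves part of the real value visible; one that is too long overwrites the start of the next field. DFSORT's repetition forms avoid counting characters by hand: nC'x' repeats a character constant n times (e.g. 20C'*'), and nX'hh' repeats a hexadecimal constant.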
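On z/OS the mask bytes are EBCDIC, so C' ' and X'40' write the same byte, and C'*' is X'5C'. Python's cp037 codec (EBCDIC US/Canada; an assumption, since your installation may use a different code page) can show the hex value of each candidate mask character, which helps when choosing between a C'...' and an X'...' constant. The helper name is hypothetical.

```python
def ebcdic_hex(text: str, codepage: str = "cp037") -> str:
    """Return the EBCDIC bytes of text as an uppercase hex string."""
    return text.encode(codepage).hex().upper()


for label, char in [("asterisk", "*"), ("space", " "), ("letter X", "X"), ("zero", "0")]:
    print(f"{label:9} C'{char}' = X'{ebcdic_hex(char)}'")

# A 20-byte mask built by repetition, comparable to 20C'*' in DFSORT.
mask = "*".encode("cp037") * 20
print(len(mask), mask.hex().upper())
```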

Conditional Masking with IFTHEN

You may want to mask only certain record types, or only records where a field has a particular value. Use an IFTHEN clause with WHEN=(logical expression) and put the OVERLAY inside that clause. For example, to mask the 20-byte name at positions 30–49 only when the record type in byte 1 is 'D' (detail):

```text
* mask the name (30-49) on detail records only
  OUTREC IFTHEN=(WHEN=(1,1,CH,EQ,C'D'),
                 OVERLAY=(30:20C'*'))
```

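Records with 'D' in byte 1 get the name masked; all other records satisfy no clause and pass through unchanged, so no WHEN=NONE clause is needed. Add a WHEN=NONE clause only if the non-matching records need their own reformatting. You can code several IFTHEN clauses with different WHEN=(...) conditions, each with its own OVERLAY; once a record satisfies one WHEN=(...) clause, later clauses are skipped for that record unless you code HIT=NEXT.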
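The clause-by-clause logic can be modeled in a few lines. This Python sketch is a simplification: it covers only WHEN=(condition) clauses and HIT=NEXT (not WHEN=INIT or WHEN=NONE), and the clause tuples are a hypothetical encoding. Clauses are tried in order, the first satisfied clause is applied, and later clauses are tried only if that clause is marked hit-next; records that satisfy no clause pass through unchanged.

```python
from typing import Callable, List, Tuple

# (when, action, hit_next): a hypothetical stand-in for one IFTHEN clause.
Clause = Tuple[Callable[[str], bool], Callable[[str], str], bool]


def apply_ifthen(record: str, clauses: List[Clause]) -> str:
    """Apply clauses in order; stop after the first hit unless hit_next is True."""
    for when, action, hit_next in clauses:
        if when(record):
            record = action(record)
            if not hit_next:
                break
    return record  # no clause matched: record passes through unchanged


def is_detail(rec: str) -> bool:
    return rec[0] == "D"  # like WHEN=(1,1,CH,EQ,C'D')


def mask_name(rec: str) -> str:
    return rec[:29] + "*" * 20 + rec[49:]  # like OVERLAY=(30:20C'*')


clauses = [(is_detail, mask_name, False)]
records = ["H" + "HEADER".ljust(48),
           "D" + " " * 28 + "JANE DOE".ljust(20)]
for rec in records:
    print(apply_ifthen(rec, clauses))
```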

FINDREP and Pattern-Based Replacement

FINDREP (find and replace) searches the record for constant strings and replaces them. It has no digit wildcard, so to turn every digit into 'X' you list each digit as its own find/replace pair (for example with INOUT). Because each digit is replaced by exactly one character, that particular replacement keeps columns in place. The drawback appears when a replacement string is longer or shorter than the string it replaces: the data after it shifts, which moves column positions and can change the record length. FINDREP also scans the whole record by default, so it can alter fields you did not mean to mask. For stable, field-level masking, OVERLAY at fixed positions is preferred. Use FINDREP when you need value-based replacement and can control or accept those effects; DFSORT's STARTPOS and ENDPOS options limit the search to a range of positions.
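Whether FINDREP shifts data depends only on the lengths of the find and replace strings. The Python sketch below models FINDREP-style replacement on a character record under simplifying assumptions: replacement is restricted to a 1-based start..end range (in the spirit of STARTPOS and ENDPOS), and text after the range simply moves with the replacement. It does not model how DFSORT pads or truncates a fixed-length record, and the function name is hypothetical.

```python
from typing import Dict


def findrep(record: str, pairs: Dict[str, str], start: int = 1, end: int = 0) -> str:
    """Replace find strings inside 1-based positions start..end (end=0: record end).

    Text after the range is kept as-is, so it moves left or right whenever a
    replacement is shorter or longer than the string it replaces.
    """
    if any(find == "" for find in pairs):
        raise ValueError("find strings must not be empty")
    stop = len(record) if end == 0 else end
    head, body, tail = record[:start - 1], record[start - 1:stop], record[stop:]
    out, i = [], 0
    while i < len(body):
        for find, rep in pairs.items():
            if body.startswith(find, i):
                out.append(rep)
                i += len(find)
                break
        else:  # no find string matched at position i
            out.append(body[i])
            i += 1
    return head + "".join(out) + tail


rec = "ACCT 4000123412345678 END"            # account number in positions 6-21
digits_to_x = {d: "X" for d in "0123456789"}  # one pair per digit, like INOUT

same_len = findrep(rec, digits_to_x, start=6, end=21)
longer = findrep(rec, {"4000": "[REDACTED]"}, start=6, end=21)
print(same_len, len(same_len))  # columns unchanged
print(longer, len(longer))      # " END" pushed 6 columns to the right
```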

Variable-Length Records

For variable-length (VB) records, do not overlay bytes 1–4 (the RDW—Record Descriptor Word). Overlay only data positions (e.g. 5 onward). The RDW must remain valid so the system can interpret record length correctly.
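A variable-length record starts with a 4-byte RDW: a 2-byte binary length that counts the RDW itself, followed by 2 bytes that are zero for an ordinary (non-spanned) record, so position 5 is the first data byte. In the following Python sketch (hypothetical helper names; data held as cp037 EBCDIC bytes) a mask over data positions leaves the RDW untouched. A guard rejects any range that starts before position 5, and, because VB records differ in length, this sketch also rejects a range that runs past the end of a short record instead of extending it.

```python
import struct


def make_vb_record(data: bytes) -> bytes:
    """Prefix data with an RDW: 2-byte big-endian length (incl. RDW) + 2 zero bytes."""
    return struct.pack(">HH", len(data) + 4, 0) + data


def mask_vb(record: bytes, start: int, end: int, mask: bytes = b"\x5c") -> bytes:
    """Overlay 1-based positions start..end (the RDW occupies positions 1-4)."""
    if start < 5:
        raise ValueError("positions 1-4 are the RDW; start masking at 5 or later")
    if end > len(record):
        raise ValueError("range runs past the end of this record")
    return record[:start - 1] + mask * (end - start + 1) + record[end:]


# Name in data positions 5-14, 9-byte SSN in positions 15-23.
rec = make_vb_record(("SMITH".ljust(10) + "123456789").encode("cp037"))
masked = mask_vb(rec, 15, 23)  # X'5C' is the EBCDIC asterisk
print(struct.unpack(">HH", masked[:4]), masked[4:].decode("cp037"))
```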

Explain It Like I'm Five

Imagine you have a piece of paper with a secret word on it. You don't want to tear the paper or change its size—you just want to cover the word so nobody can read it. So you take a sticker (like a row of stars) and paste it right over the word. The paper is the same size; the word is just hidden. Data masking in DFSORT is like that: we put a "sticker" (a constant like asterisks) over the part of the record we want to hide. OVERLAY is the way we say "at this position, put these characters instead of what was there." We can cover the whole field or only part of it, and we can choose to cover only some lines (records) using IFTHEN.

Exercises

  1. Write OUTREC OVERLAY control to mask the 11-byte field at positions 45–55 with asterisks. Assume OPTION COPY is used.
  2. When would you use INREC for masking instead of OUTREC? Give a concrete scenario.
  3. How do you partially mask a 30-byte field at 10–39 so that only the first 6 bytes (10–15) are visible and the rest are asterisks?
  4. What is one advantage of OVERLAY over FINDREP for masking a fixed-length field?

Quiz

Test Your Knowledge

1. What is the main DFSORT mechanism for masking a field by replacing its bytes with a constant?

  • INCLUDE
  • OVERLAY with a constant (e.g. C'*****') at the field position
  • SUM
  • MERGE

2. When should you use INREC for masking instead of OUTREC?

  • Always use OUTREC
  • When the masked record must be used for sorting or SUM (INREC runs after INCLUDE/OMIT and before the sort)
  • Only for MERGE
  • INREC cannot mask

3. How do you partially mask a field (e.g. show first 4 characters, mask the rest)?

  • You cannot
  • Use OVERLAY to replace only the positions you want to hide (e.g. overlay positions 35–50 with 16C'*' while leaving 31–34 unchanged)
  • Use FINDREP only
  • Use INCLUDE

4. What is a drawback of using FINDREP to replace digits with a character for masking?

  • FINDREP does not exist
  • FINDREP can change record length if the replacement string length differs from the find string, shifting data
  • FINDREP only works in INREC
  • FINDREP cannot replace digits

5. Can you conditionally mask a field (e.g. mask only when record type is detail)?

  • No
  • Yes—use IFTHEN with WHEN=(condition) and OVERLAY in that branch so only records meeting the condition get the overlay
  • Only with INCLUDE
  • Only in OUTREC