VSAM Cluster Concept

In VSAM, the cluster is the single logical dataset you work with. When you allocate a VSAM file in JCL or open it in a program, you use the cluster name. Under the hood, that cluster is made up of one or two components—the data component and, for Key Sequenced Data Sets (KSDS), the index component. Understanding the cluster concept is essential: it defines what a VSAM "file" is, how attributes are applied at the cluster and component level, and how the cluster is created, used, and deleted over its lifecycle. This page explains the VSAM cluster concept in depth: definition, attributes, components, and lifecycle.

What Is a VSAM Cluster?

A VSAM cluster is the logical unit that represents a VSAM dataset. It has one name (e.g. USERID.CUSTOMER.VSAM.KSDS) that appears in the catalog and that you use in JCL as DSN= and in application programs when you open the file. The cluster is not a single physical file on disk in the way a sequential dataset might be one dataset with one extent. Instead, the cluster is a catalog object that ties together one or two components: the data component (which holds the actual records or, for Linear Data Sets, the byte stream) and, for KSDS only, the index component (which holds the B-tree style index that maps key values to control intervals in the data component).

So when someone says "VSAM file" or "VSAM dataset," they usually mean the cluster. You never open the data component or the index component directly; you open the cluster. The access method then uses the catalog to find where the data and index components reside and performs I/O on them. This separation lets VSAM support different organization types (INDEXED for KSDS, NONINDEXED for ESDS, NUMBERED for RRDS, LINEAR for LDS) with a consistent model: one cluster name, one or two components.

Cluster vs Data vs Index

It is important to keep the three ideas distinct. The cluster is the logical dataset. The data component is the physical dataset that holds the records (or the byte stream for LDS). The index component exists only for KSDS and holds the index structure (sequence set and index set) that allows key-based access. In the catalog you may see three entries: one for the cluster and one for each component. In JCL and in programs you only ever reference the cluster name. If you look at LISTCAT output, you will see the cluster and, under it, the data component entry (and for KSDS the index component entry). Component names are often derived from the cluster name (e.g. USERID.CUSTOMER.VSAM.KSDS.DATA and USERID.CUSTOMER.VSAM.KSDS.INDEX), but you do not put those component names in your DD statements.

Cluster vs components
Level    Typical parameters (summary)
Cluster  NAME, VOLUMES, CYLINDERS/TRACKS/RECORDS, INDEXED/NONINDEXED/NUMBERED/LINEAR, RECORDSIZE, KEYS (KSDS), FREESPACE, SHAREOPTIONS, REUSE, READPW/UPDATEPW, FOR/TO, CATALOG
Data     NAME (data component name), RECORDSIZE, FREESPACE, CISZ, VOLUMES, etc.
Index    NAME (index component name), VOLUMES, CISZ; used only for KSDS
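
The cluster and component entries described above can be inspected with IDCAMS LISTCAT. The following is a minimal sketch that assumes the example cluster name used on this page; the step name LISTCLU is arbitrary.

```jcl
//LISTCLU  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* LIST THE CLUSTER ENTRY AND ITS COMPONENT ENTRIES */
  LISTCAT ENTRIES(USERID.CUSTOMER.VSAM.KSDS) ALL
/*
```

Naming the cluster in ENTRIES is enough: the SYSPRINT output should show the cluster entry followed by its data component and, for a KSDS, its index component.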

How a Cluster Is Defined

Clusters are created with the IDCAMS utility and its DEFINE CLUSTER command. You run a batch job with EXEC PGM=IDCAMS, supply SYSPRINT for messages, and pass the commands in SYSIN. DEFINE CLUSTER accepts parameters at three levels: cluster level (apply to the whole cluster), data component level (apply only to the data component), and index component level (apply only to the index, KSDS only). Cluster-level parameters include the cluster name (NAME), the organization type (INDEXED, NONINDEXED, NUMBERED, LINEAR), record size (RECORDSIZE), the key for KSDS (KEYS), free space (FREESPACE), space allocation (CYLINDERS, TRACKS, or RECORDS), share options (SHAREOPTIONS), and optional settings for security (READPW, UPDATEPW), retention (FOR, TO), reuse (REUSE), and catalog selection (CATALOG). Parameters under the DATA clause apply to the data component (e.g. the data component name); parameters under the INDEX clause apply to the index component. If you omit the DATA and INDEX subparameters, IDCAMS generates component names from the cluster name.

Example: Defining a KSDS Cluster

```jcl
//DEFCLU   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER ( -
    NAME(USERID.CUSTOMER.KSDS) -
    INDEXED -
    RECORDSIZE(200 300) -
    KEYS(10 0) -
    FREESPACE(10 5) -
    SHAREOPTIONS(2 3) -
    CYLINDERS(10 5) -
    VOLUMES(SYSVOL)) -
    DATA (NAME(USERID.CUSTOMER.KSDS.DATA)) -
    INDEX (NAME(USERID.CUSTOMER.KSDS.INDEX))
/*
```

This creates a cluster named USERID.CUSTOMER.KSDS. INDEXED means it is a KSDS, so it will have both a data and an index component. RECORDSIZE(200 300) sets the average and maximum record lengths in bytes. KEYS(10 0) defines a 10-byte key at offset 0. FREESPACE(10 5) leaves 10% of each control interval, and 5% of the control intervals in each control area, free for later inserts. SHAREOPTIONS(2 3) sets cross-region option 2 (any number of readers plus one updater at a time) and cross-system option 3 (full sharing, with the applications responsible for integrity). CYLINDERS(10 5) allocates 10 cylinders of primary space and 5 cylinders for each secondary extent. The DATA and INDEX clauses give explicit names to the components; if omitted, IDCAMS would generate names such as USERID.CUSTOMER.KSDS.DATA and USERID.CUSTOMER.KSDS.INDEX. After a successful run, the cluster is cataloged and ready to use. In JCL you reference DSN=USERID.CUSTOMER.KSDS, not the component names.
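
For contrast with the KSDS definition, here is a minimal sketch of an ESDS define, which has only a data component. The cluster name USERID.CUSTOMER.ESDS is a hypothetical name chosen for illustration, and the sizes and volume are simply carried over from the KSDS example.

```jcl
//DEFESDS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* NONINDEXED = ESDS: NO KEYS, NO INDEX CLAUSE */
  DEFINE CLUSTER ( -
    NAME(USERID.CUSTOMER.ESDS) -
    NONINDEXED -
    RECORDSIZE(200 300) -
    SHAREOPTIONS(2 3) -
    CYLINDERS(10 5) -
    VOLUMES(SYSVOL)) -
    DATA (NAME(USERID.CUSTOMER.ESDS.DATA))
/*
```

KEYS, FREESPACE, and the INDEX clause are left out because they apply to key-sequenced data; the cluster is still opened by its single cluster name.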

Cluster Attributes in Practice

Cluster-level attributes define the kind of dataset (INDEXED, NONINDEXED, NUMBERED, LINEAR), how big records are (RECORDSIZE), and for KSDS how the key is defined (KEYS length and offset). RECORDSIZE and KEYS are fixed at define time for most uses; you cannot change them later with ALTER. FREESPACE affects how much room is left in each control interval and control area for future inserts; it reduces how often VSAM must split CIs and CAs. SHAREOPTIONS control whether multiple jobs or regions can open the cluster at the same time and for what (read-only vs update). REUSE allows the cluster to be opened for output and reset to empty without deleting and redefining it. READPW and UPDATEPW are the old password-protection parameters; current z/OS releases accept them but ignore them, and access is controlled through the security product (for example RACF) instead. FOR and TO set a retention period or expiration date, before which the cluster cannot be deleted unless PURGE is specified. CATALOG specifies which catalog will hold the cluster entry; if omitted, the catalog is chosen by the normal catalog search, typically the alias for the high-level qualifier in the master catalog (JOBCAT and STEPCAT DD statements are not supported on current z/OS releases).

Data component attributes include the data component name and any overrides (e.g. CISZ, FREESPACE) that apply only to the data component. Index component attributes apply only to the index (e.g. index component name, CISZ for index CIs). In many DEFINEs you specify only cluster-level parameters and let IDCAMS default or generate the rest; for KSDS you often explicitly name the DATA and INDEX components for clarity in LISTCAT and in documentation.
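
Component-level overrides can be sketched as follows. This is an illustrative variant of the earlier KSDS define, not a recommendation: the control interval sizes 4096 and 2048 are assumed values chosen only to show where the parameters go.

```jcl
//DEFCLU2  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER ( -
    NAME(USERID.CUSTOMER.KSDS) -
    INDEXED -
    RECORDSIZE(200 300) -
    KEYS(10 0) -
    CYLINDERS(10 5) -
    VOLUMES(SYSVOL)) -
    DATA ( -
      NAME(USERID.CUSTOMER.KSDS.DATA) -
      CONTROLINTERVALSIZE(4096) -
      FREESPACE(10 5)) -
    INDEX ( -
      NAME(USERID.CUSTOMER.KSDS.INDEX) -
      CONTROLINTERVALSIZE(2048))
/*
```

Here FREESPACE and the data CI size sit in the DATA clause, while the index gets its own CI size; everything that describes the dataset as a whole stays at the cluster level.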

Lifecycle of a Cluster

The cluster has a clear lifecycle: define, use, optionally alter, and delete. The following table summarizes the phases.

Cluster lifecycle
Phase   Action
Define  IDCAMS DEFINE CLUSTER creates the cluster and components, allocates space, and catalogs the object.
Use     Programs open the cluster by name (JCL DSN=). VSAM uses the catalog to find components and performs I/O on data (and index for KSDS).
Alter   IDCAMS ALTER changes attributes (e.g. SHAREOPTIONS, FREESPACE). Some attributes cannot be changed after creation (e.g. RECORDSIZE, KEYS).
Delete  IDCAMS DELETE name CLUSTER removes the cluster and components from the catalog and frees space. Use PURGE if the dataset has not expired.

Define is a one-time (or occasional) step: you run IDCAMS DEFINE CLUSTER and the cluster and its components are created and cataloged. Use is ongoing: every job or program that needs the file allocates or opens the cluster by name. Alter is used when you need to change attributes such as SHAREOPTIONS or FREESPACE; not all attributes are alterable. Delete removes the cluster and its components from the catalog and frees the space; if the dataset has a retention date and has not yet expired, you must use the PURGE option on the DELETE command to force deletion.
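
The alter phase can be sketched as below. One assumption to flag: as far as the IDCAMS rules go, FREESPACE is recorded in the data component's catalog entry, so this is one of the few places where a component name is given instead of the cluster name. The values 20 and 10 are illustrative; check the ALTER description in the Access Method Services reference for the entry types each parameter accepts.

```jcl
//ALTCLU   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* ASSUMPTION: FREESPACE IS ALTERED ON THE DATA COMPONENT */
  ALTER USERID.CUSTOMER.KSDS.DATA -
        FREESPACE(20 10)
/*
```

Application JCL and programs are unaffected by this: they continue to open the cluster by its cluster name.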

Deleting a Cluster

To remove a VSAM dataset you delete the cluster with IDCAMS. The command is DELETE cluster-name CLUSTER. This removes the cluster and its components from the catalog and frees the space on the volumes. If the cluster has a retention period (FOR/TO) and has not yet expired, the delete will fail unless you specify PURGE, which overrides the retention and deletes anyway. Another option is ERASE, which overwrites the data component with binary zeros before the space is freed. FORCE also exists on DELETE, but it is meant for objects that are not empty, such as generation data groups and user catalogs, rather than for ordinary clusters. In normal use you run DELETE with the cluster name and CLUSTER; PURGE is added when the dataset is not expired.

```jcl
//DELCLU   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE USERID.CUSTOMER.KSDS CLUSTER PURGE
/*
```

After a successful DELETE, the cluster name is no longer in the catalog. Any job that tries to allocate that dataset name will get a "dataset not found" type condition. To recreate the file you would run DEFINE CLUSTER again (and optionally load data with REPRO).
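
Deleting and then recreating the cluster is often done in one IDCAMS step. The following is a sketch of that pattern, reusing the DEFINE parameters from the earlier example. It assumes that a condition code of 8 from DELETE means the cluster did not exist yet; other errors can also produce 8, so treat the reset as a convenience rather than a guarantee.

```jcl
//REDEFINE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE USERID.CUSTOMER.KSDS CLUSTER PURGE
  /* ASSUMPTION: CC 8 HERE MEANS "ENTRY NOT FOUND" */
  IF LASTCC = 8 THEN SET MAXCC = 0
  DEFINE CLUSTER ( -
    NAME(USERID.CUSTOMER.KSDS) -
    INDEXED -
    RECORDSIZE(200 300) -
    KEYS(10 0) -
    FREESPACE(10 5) -
    SHAREOPTIONS(2 3) -
    CYLINDERS(10 5) -
    VOLUMES(SYSVOL))
/*
```

Resetting MAXCC lets the step run cleanly the first time, when there is nothing to delete. The DATA and INDEX clauses are omitted here, so IDCAMS generates the component names.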

Attributes You Cannot Change After Define

Some cluster attributes are fixed at define time and cannot be changed with ALTER. RECORDSIZE (average and maximum) is fixed for the data component. For KSDS, KEYS (length and offset) is fixed. The organization type (INDEXED, NONINDEXED, NUMBERED, LINEAR) cannot be changed; to get a different type you must define a new cluster and move the data. Control interval size (CISZ) is also set at define time for the components. If you need different record sizes or key definitions, you define a new cluster with the desired attributes and use REPRO or an application to copy the data. Attributes such as SHAREOPTIONS, FREESPACE, and REUSE can be altered.
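
Copying data into a redefined cluster can be sketched with REPRO. The target name USERID.CUSTOMER.KSDS.NEW is hypothetical, and the sketch assumes that cluster has already been defined with the new attributes.

```jcl
//COPYCLU  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* COPY ALL RECORDS FROM THE OLD CLUSTER TO THE NEW ONE */
  REPRO INDATASET(USERID.CUSTOMER.KSDS) -
        OUTDATASET(USERID.CUSTOMER.KSDS.NEW)
/*
```

If the new cluster uses a different key position or length, the records may first need to be sorted into the new key order, since an empty KSDS is loaded in ascending key sequence.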

Why the Cluster Matters for Applications

From an application perspective, the cluster is the only handle you have. Your COBOL SELECT assigns the cluster (via the DD name that points to it). Your JCL has one DD with DSN=cluster.name. You do not care how many components there are or what their internal names are. The access method and the catalog take care of resolving the cluster name to the right components and volumes. So when you design or document a VSAM file, you document the cluster name and its attributes (type, key, record size, etc.); the component structure is an implementation detail of how VSAM stores the data and index.
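
The application view can be sketched as a single DD statement. The program name CUSTPGM and the DD name CUSTFILE are hypothetical, and a real job would also need a JOB statement and usually a STEPLIB for the load library.

```jcl
//* CUSTPGM AND CUSTFILE ARE HYPOTHETICAL NAMES FOR ILLUSTRATION.
//* A COBOL PROGRAM WOULD PAIR THIS DD WITH A CLAUSE SUCH AS
//*   SELECT CUSTOMER-FILE ASSIGN TO CUSTFILE
//RUNAPP   EXEC PGM=CUSTPGM
//CUSTFILE DD DSN=USERID.CUSTOMER.KSDS,DISP=SHR
//SYSOUT   DD SYSOUT=*
```

Only the cluster name appears in DSN=; neither the data nor the index component is mentioned anywhere in the job.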

Key Takeaways

  • The cluster is the single logical VSAM dataset; you always use the cluster name in JCL and in programs.
  • KSDS has two components (data and index); ESDS, RRDS, and LDS have only a data component.
  • DEFINE CLUSTER creates the cluster and components; parameters can be specified at cluster, data, and index level.
  • Lifecycle: define (IDCAMS DEFINE CLUSTER), use (open by cluster name), alter (IDCAMS ALTER when allowed), delete (IDCAMS DELETE name CLUSTER).
  • RECORDSIZE and KEYS are fixed at define time; to change them you must define a new cluster and reload data.

Explain Like I'm Five

Think of the cluster as the name on the front of a filing cabinet. When you want to use the cabinet, you use that name. Inside there might be one drawer (just data) or two drawers (data and an index that helps find things by key). You never open "drawer one" or "drawer two" by their own names—you always ask for the cabinet by its name. The cluster is that cabinet name: one name, one place to go, and the system knows which drawers to use behind the scenes.

Test Your Knowledge

1. What do you reference in JCL when accessing a VSAM dataset?

  • The data component name
  • The index component name
  • The cluster name
  • The catalog name

2. Which VSAM type has two components (data and index)?

  • ESDS
  • RRDS
  • LDS
  • KSDS

3. How do you create a VSAM cluster?

  • JCL DISP=NEW with SPACE=
  • IDCAMS DEFINE CLUSTER
  • ISPF 3.2
  • IEBGENER

Author: MainframeMaster
Reviewed by: MainframeMaster team
Verified: IBM z/OS 2.5 documentation
Sources: IBM DFSMS Access Method Services, z/OS VSAM documentation
Applies to: z/OS 2.5