The index set is the upper level (or levels) of the index component in a Key-Sequenced Data Set (KSDS). It sits above the sequence set. The sequence set has one entry per data control interval and points directly to the data component. The index set does not point to data; it contains separator keys and pointers that direct the search to the correct part of the sequence set (or to a lower index set level). So when you do a random read by key, VSAM starts at the top of the index set, compares the search key to the separators, follows the right pointer down, and repeats until it reaches the sequence set. The sequence set then gives the pointer to the data CI. This multi-level structure keeps the number of index reads small even when there are many data CIs. This page explains what the index set is, how it is organized (B-tree style), how many levels it can have, and how VSAM uses it for random access.
The index set is the part of the KSDS index that sits above the sequence set. The index component as a whole has two kinds of levels: the sequence set (bottom) and the index set (top). The index set can have one or more levels. Each index set record (stored in an index CI) contains separator keys and pointers. A separator key is a value that divides the key space; for example, “keys up to and including this value are in this branch, higher keys are in a later branch.” Each pointer points to an index CI at the next level down: either another index set level or the sequence set. The index set is therefore a tree: you start at the root (the top index set record), compare your search key to the separators, follow one pointer down, and repeat until you reach the sequence set. The sequence set then gives the pointer to the data CI that might contain your key. The index set’s job is to reduce the search: instead of scanning every sequence set entry, you do a few index set lookups and then one sequence set lookup.
If the sequence set had only a few entries (e.g. 10 or 20 data CIs), you could scan the sequence set to find the right CI without an index set. But when the sequence set has hundreds or thousands of entries, scanning would be slow. The index set groups the sequence set entries (or groups of them) and stores a separator and a pointer for each group. So one index set record might say: “keys A–M are in this branch, keys N–Z in that branch.” With one read you eliminate half (or more) of the sequence set. Another level of index set can subdivide further. So with two or three index reads you can narrow down to the one sequence set entry (and thus the one data CI) that might contain your key. That is why the index set exists: to make random key lookup efficient when there are many data CIs.
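The saving described above can be put into rough numbers. The following is a minimal sketch, not VSAM's actual algorithm: it assumes a single grouping level and a hypothetical `group_size` (how many sequence set entries one separator covers), and it counts entries examined rather than real I/Os.

```python
import math

def worst_case_entries_examined(sequence_entries: int, group_size: int) -> tuple:
    """Return (flat_scan, grouped) worst-case counts of entries examined.

    flat_scan: look at every sequence set entry in order.
    grouped:   look at one separator per group, then at the entries of one group.
    group_size is a hypothetical parameter used only for this illustration.
    """
    flat_scan = sequence_entries
    groups = math.ceil(sequence_entries / group_size)
    grouped = groups + min(group_size, sequence_entries)
    return flat_scan, grouped

print(worst_case_entries_examined(sequence_entries=1000, group_size=40))
print(worst_case_entries_examined(sequence_entries=10, group_size=40))
```

With 1000 entries the grouped search examines far fewer entries than the flat scan; with only 10 entries the extra level buys nothing, which matches the point that a tiny sequence set can simply be scanned.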
| Level | Content |
|---|---|
| Top index set | A single index record (the root). Contains separator keys and pointers to the next level (lower index set or sequence set). |
| Lower index set (if any) | More index records. Each has separators and pointers to the next level down. Repeats until the sequence set is reached. |
| Sequence set | One entry per data CI; high key and pointer to data CI. The index set does not point to data; it points to the sequence set. |
The number of index set levels depends on how many sequence set entries (and thus data CIs) you have and how many entries fit in each index CI. A very small KSDS might have only the sequence set—no index set, or a single “root” that is just the sequence set. As the cluster grows, VSAM adds index set levels. Typically there are one to three index set levels. The top level (root) has one index record (one index CI) in the smallest case; that record points to the sequence set. When the sequence set grows too large to be reached in one step, another level is added: the root then points to several index CIs at the next level, each of which points to a range of sequence set entries. So the structure is like a B-tree: the root splits the key space, each child splits a subset, and the leaves (sequence set) point to the data. The exact number of levels is managed by VSAM; you do not specify it. You only define the cluster; VSAM builds and maintains the index set as the data grows.
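To see how the number of levels follows from the number of data CIs, here is a rough sketch. It assumes every index CI holds exactly `entries_per_index_ci` entries (a made-up, fixed fan-out); real VSAM index records vary in how many entries they hold, so treat the results as an estimate only.

```python
import math

def index_levels(data_cis: int, entries_per_index_ci: int) -> list:
    """Return the number of index records at each level, sequence set first.

    Assumption for illustration: every index CI is filled to exactly
    entries_per_index_ci entries, which real VSAM does not guarantee.
    """
    if data_cis < 1 or entries_per_index_ci < 2:
        raise ValueError("need at least 1 data CI and a fan-out of 2 or more")
    levels = [math.ceil(data_cis / entries_per_index_ci)]  # sequence set records
    while levels[-1] > 1:
        # Each higher level needs one entry per record of the level below it.
        levels.append(math.ceil(levels[-1] / entries_per_index_ci))
    return levels

for cis in (50, 5_000, 1_000_000):
    print(cis, index_levels(cis, entries_per_index_ci=100))
```

The first element of each list is the sequence set; every further element is one index set level, ending with the single root record. With a fan-out of 100, 50 data CIs need no index set level at all, 5,000 need one, and 1,000,000 need two.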
Each index set record (in an index CI) contains multiple entries. Each entry typically has a separator key (or high key) and a pointer. The separator indicates the boundary: “keys in this range are in the branch pointed to by this pointer.” So when VSAM is searching for key K, it looks at the separators in the current index record, finds the entry whose range includes K (e.g. K is less than or equal to the separator, or between this separator and the next), and follows that entry’s pointer to the next level. The pointer is usually an RBA or similar—the location of the index CI (or sequence set CI) at the next level. So the index set is just “separators + pointers”; no data records are stored there. All the actual key and record data are in the data component; the index set is only for navigation.
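The lookup inside a single index record can be sketched as follows. This is an illustration only: it models each entry as a plain high key plus a pointer and uses a binary search, whereas real index records store compressed keys in their own layout. The function name `choose_pointer` and the pointer values are hypothetical.

```python
from bisect import bisect_left
from typing import Optional

def choose_pointer(high_keys: list, pointers: list, search_key: str) -> Optional[int]:
    """Return the pointer of the first entry whose high key is >= search_key.

    high_keys must be in ascending order; pointers[i] belongs to high_keys[i].
    Returns None when search_key is higher than every key in the record.
    """
    i = bisect_left(high_keys, search_key)
    if i == len(high_keys):
        return None
    return pointers[i]

high_keys = ["F", "M", "T", "Z"]
pointers = [4096, 8192, 12288, 16384]  # hypothetical locations of next-level CIs

print(choose_pointer(high_keys, pointers, "H"))   # falls in the range ending at "M"
print(choose_pointer(high_keys, pointers, "ZZ"))  # beyond the highest key
```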
The index component (index set + sequence set) is often described as a B-tree or B+-tree. In a B+-tree, the internal nodes (here, the index set) hold separators and pointers, and the “leaves” (here, the sequence set) hold the final pointers to the data. The tree is balanced, so every path from the root to a leaf has the same length. The number of I/Os to find a key is therefore proportional to the height of the tree: usually two, three, or four reads for the index, plus one read for the data CI. That is why random access stays fast even with millions of records: you do not scan the file; you follow a short path down the index. The index set is what makes that path short by grouping the sequence set into a hierarchy.
When your program issues a keyed READ, VSAM does the following. It starts at the root of the index (the top index set record), which is often already in a buffer. It compares the search key to the separators in the root and chooses the pointer for the range containing the key. It follows that pointer to the next level (another index set CI or a sequence set CI). If that is an index set CI, it repeats: compare the key to the separators, follow the pointer. When it reaches the sequence set, it finds the sequence set entry whose key range includes the search key. That entry’s pointer gives the location of the data CI. VSAM then reads that data CI (if it is not already in a buffer), searches within the CI for the record with the exact key, and returns the record (or “not found”). The index set is used only in this top-down traversal; the sequence set is the final step that points to the data.
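The traversal can be modelled with a small in-memory tree. This is a toy sketch, not VSAM's implementation: `IndexRecord`, `random_read`, and the sample keys are invented for illustration, data CIs are plain dictionaries, and "reads" simply counts the index records visited.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class IndexRecord:
    high_keys: list                # highest key reachable through each entry
    children: list                 # IndexRecord objects, or data CI ids at the sequence set
    is_sequence_set: bool = False

def random_read(root, data_cis, key):
    """Return (record or None, number of index records visited)."""
    node, reads = root, 0
    while True:
        reads += 1
        i = bisect_left(node.high_keys, key)      # first entry with high key >= key
        if i == len(node.high_keys):
            return None, reads                    # key is above every key in the file
        if node.is_sequence_set:
            return data_cis[node.children[i]].get(key), reads
        node = node.children[i]                   # go one level down

# Hypothetical data: four data CIs, two sequence set records, one root.
data_cis = {
    0: {"A01": "rec A01", "B07": "rec B07"},
    1: {"C11": "rec C11", "D02": "rec D02"},
    2: {"M30": "rec M30", "P05": "rec P05"},
    3: {"T09": "rec T09", "Z99": "rec Z99"},
}
seq_low = IndexRecord(["B07", "D02"], [0, 1], is_sequence_set=True)
seq_high = IndexRecord(["P05", "Z99"], [2, 3], is_sequence_set=True)
root = IndexRecord(["D02", "Z99"], [seq_low, seq_high])

print(random_read(root, data_cis, "M30"))   # found after root + one sequence set record
print(random_read(root, data_cis, "C00"))   # correct data CI located, but key not in it
```

Note that a missing key still costs the full descent: the index can only say which data CI might hold the key, and the "not found" answer comes from searching that CI.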
For sequential access (e.g. READ NEXT in key order), VSAM typically uses the sequence set, which is already in key order. The application might START at a key and then READ NEXT; VSAM finds the position in the sequence set (using the index set if needed for the START) and then walks the sequence set forward, reading each data CI in turn. So the index set is used for the initial positioning (finding the starting sequence set entry); after that, sequential read follows the sequence set order and may not need to go back to the index set. So the index set is most important for random access; for sequential access the sequence set does most of the work.
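A START followed by repeated READ NEXT can be sketched like this. It is a simplified model under stated assumptions: the sequence set is a flat, ordered list of `(high_key, data_ci_id)` pairs, data CIs are lists of `(key, record)` pairs in key order, and the initial `bisect_left` stands in for the index set lookup that positions the START. The name `browse_from` is hypothetical.

```python
from bisect import bisect_left

def browse_from(sequence_set, data_cis, start_key):
    """Yield (key, record) pairs in key order, beginning at the first key >= start_key."""
    high_keys = [high for high, _ in sequence_set]
    pos = bisect_left(high_keys, start_key)       # positioning step (START)
    for _, ci_id in sequence_set[pos:]:           # walk the sequence set forward
        for key, record in data_cis[ci_id]:       # read each data CI in turn
            if key >= start_key:
                yield key, record

sequence_set = [("B07", 0), ("D02", 1), ("P05", 2)]
data_cis = {
    0: [("A01", "a"), ("B07", "b")],
    1: [("C11", "c"), ("D02", "d")],
    2: [("M30", "m"), ("P05", "p")],
}

for key, record in browse_from(sequence_set, data_cis, "C00"):
    print(key, record)
```

After the positioning step, the loop never consults anything above the sequence set, which is the behaviour the paragraph describes.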
As you add records to the KSDS, CI and CA splits can bring more data control intervals into use, and the sequence set gains entries for them. When the sequence set or an index set level gets too large, VSAM may split index CIs or add a new index set level. This is all done automatically. You do not define the number of index set levels, and you normally do not need to specify the index CI size: IDCAMS lets you code one for the index component, but if you omit it VSAM derives a suitable value. VSAM keeps the tree balanced so that searches remain efficient. From an application perspective you just use the cluster; the index set and sequence set are maintained by the access method.
The sequence set has one entry per data CI and points directly to data CIs. The index set has one or more levels above the sequence set and points only to the sequence set (or to lower index set levels). The index set reduces the search space so that finding the right sequence set entry takes only a few index reads. Together they form the index component that makes key-based access efficient. This structure is what defines a KSDS; ESDS, fixed-length RRDS, and LDS have no index component.
Think of the index set as the “table of contents” at the top of a big book. You don’t read every page to find a topic; you look at the table of contents (index set), which says “chapter 1 is here, chapter 2 is there,” and you jump to the right chapter. The “chapters” are like the sequence set: each one points to the actual pages (data CIs). So the index set is the quick way to get to the right part of the list; the sequence set is the list that points to the real data.
1. What does the index set point to?
2. What is the main purpose of the index set?
3. Can a KSDS have no index set level?