When you insert a record into a KSDS (or add data that expands a record), VSAM must place it in the correct control interval in key order. If that CI has no free space left, VSAM performs a control interval split: it moves about half of the records from the full CI to another CI in the same control area, then inserts the new record. If there is no free CI in the control area, VSAM must perform a control area split: allocate a new CA, move half the CIs from the full CA to the new one, and update the index. Splits cost extra I/O and can hurt performance. Understanding when splits happen and how FREESPACE reduces them helps you size and tune VSAM files. This page explains CI splits, CA splits, when they occur, and how to minimize them.
Splits occur only when a new record is inserted or when an existing record is updated so that it becomes longer. They apply to key-sequenced data sets (KSDS) and to variable-length RRDS, which VSAM manages with an index much like a KSDS; a fixed-length RRDS uses preformatted slots and does not split. In a KSDS, every insert must go into the correct CI in key order. When that CI is full (no free space left from FREESPACE), VSAM cannot simply append the record; it must create room, and it does that by splitting the CI. So the trigger is always: "need to put a record (or more data) in a CI that has no room." Reads (sequential or direct) and deletes never cause splits; only inserts and record expansion do.
Entry-sequenced data sets (ESDS) do not have splits in this sense. Records are appended at the end of the file; there is no "insert in the middle" by key. So ESDS never performs CI or CA splits. Linear data sets (LDS) have no record structure, so there are no VSAM record inserts and no splits.
A CI split happens when the control interval where the new record belongs is full. VSAM finds a free CI in the same control area (for example, one that was reserved by the FREESPACE CA percentage). It then moves approximately half of the records from the full CI to that free CI, in key order. After the move, both CIs have room, and VSAM inserts the new record into whichever of the two now covers its key. The sequence set (and possibly the index set) must be updated so that the index points to both CIs with the correct key ranges. So one insert that triggers a CI split results in several I/O operations: read the full CI, write the original CI (after moving some records out), write the other CI (with the moved records), and update the index. That is why CI splits are expensive and why FREESPACE in the CI is important: it gives room for many inserts before a split is needed.
Example: Suppose CI-1 in CA-1 holds records with keys 10, 20, 30, 40, 50 and has no free space left. You insert a record with key 45. VSAM must put it in CI-1 in key order, but CI-1 is full, so VSAM does a CI split. It might move the records with keys 40 and 50 to a free CI (e.g. CI-2) in the same CA. Now CI-1 holds keys 10, 20, 30 and has room; CI-2 holds 40, 50 and the new record 45. The sequence set is updated to show the new key ranges for CI-1 and CI-2.
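The CI split described above can be sketched as a toy model in Python. This is only an illustration, not how VSAM is implemented: a CI is modelled as a sorted list of keys, `capacity` is a hypothetical record count standing in for the CI's byte size, and the function name `insert_with_ci_split` is made up for this sketch. The exact split point VSAM chooses can differ from the simple midpoint used here.

```python
from bisect import insort

def insert_with_ci_split(full_ci, new_key, capacity):
    """Insert new_key into a CI modelled as a sorted list of keys.

    Returns (original_ci, receiving_ci). receiving_ci is empty when the
    CI still had room and no split was needed.
    """
    if len(full_ci) < capacity:
        ci = list(full_ci)
        insort(ci, new_key)          # room available: plain insert
        return ci, []
    # CI is full: keep the lower half, move the upper half to a free CI.
    mid = (len(full_ci) + 1) // 2
    low, high = list(full_ci[:mid]), list(full_ci[mid:])
    # The new record goes to whichever CI now covers its key.
    target = low if new_key <= low[-1] else high
    insort(target, new_key)
    return low, high

ci_1, ci_2 = insert_with_ci_split([10, 20, 30, 40, 50], new_key=45, capacity=5)
print(ci_1, ci_2)  # [10, 20, 30] [40, 45, 50]
```

A key such as 25 would instead stay in the original CI, which shows that the new record does not always follow the moved records.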
A CA split occurs when a CI split is needed but there is no free control interval in the current control area. All CIs in the CA are in use (either full of records or already used in prior splits). VSAM cannot do a CI split without a free CI. So it allocates a new control area (from the space allocated to the cluster). It then moves approximately half of the control intervals from the full CA to the new CA. After the move, the original CA has a free CI that can be used for the CI split. VSAM then performs the CI split as described above. The sequence set and index set must be updated to include the new CA and the new key distribution. A CA split is more expensive than a CI split because it involves moving many CIs (each of which may need read and write), allocating new space, and updating more index structure.
Example: All CIs in CA-3 are in use; there is no free CI. You insert a record that belongs in a full CI in CA-3. VSAM needs to do a CI split but there is no free CI in CA-3. So VSAM does a CA split first: allocate a new CA (e.g. CA-5), move half of the CIs from CA-3 to CA-5, update the index. Now both CA-3 and CA-5 have free CIs. VSAM then does the CI split in whichever CA now holds the target CI (CA-3 in this example) and inserts your record.
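A CA split can be sketched in the same toy style. The model below is an assumption made for illustration: a CA is a fixed-length list of CIs, each CI is a sorted list of keys, and an empty list stands for a free CI. The function name `ca_split` is hypothetical, and index maintenance is left out entirely.

```python
def ca_split(full_ca):
    """Split a CA that has no free CI.

    Returns (original_ca, new_ca). Each keeps the same number of CI
    slots as before; slots vacated or not yet used are empty lists.
    """
    n = len(full_ca)
    mid = (n + 1) // 2
    # Lower half of the CIs stays; the vacated slots become free CIs.
    original = [list(ci) for ci in full_ca[:mid]] + [[] for _ in range(n - mid)]
    # Upper half moves to the newly allocated CA; the rest is free.
    new = [list(ci) for ci in full_ca[mid:]] + [[] for _ in range(mid)]
    return original, new

ca_3 = [[10, 20], [30, 40], [50, 60], [70, 80]]   # every CI in use
ca_3, ca_5 = ca_split(ca_3)
print(ca_3)  # [[10, 20], [30, 40], [], []]
print(ca_5)  # [[50, 60], [70, 80], [], []]
```

After the CA split, both CAs contain free CIs, so the pending CI split can proceed in whichever CA holds the full CI.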
| Aspect | CI split | CA split |
|---|---|---|
| Trigger | Insert or lengthen record; no free space in the target CI | CI split needed but no free CI in the current CA |
| What moves | About half the records in the full CI move to another CI in the same CA | About half the CIs in the full CA move to a newly allocated CA |
| Index updated | Sequence set (and possibly index set) updated for the new CI | Sequence set and index set updated; new CA gets index entries |
| Cost | Multiple I/O: read full CI, write two CIs, update index | Higher: new CA allocation, move many CIs, more index updates |
When you define a cluster with FREESPACE(ci-percent ca-percent), you reserve space for future inserts. The CI percentage reserves that proportion of each control interval for new or expanded records. So when you insert, there is room in the CI for many records before it becomes full. The CA percentage reserves that proportion of each control area as free CIs. So when a CI does become full and a CI split is needed, there is an empty CI in the same CA to receive the records that are moved. If you use FREESPACE(0 0), every CI is filled to capacity at load time, and the first insert in the middle of the file may trigger a CI split immediately; if there are no free CIs, you get a CA split. For files that will grow after load, typical values are FREESPACE(20 10) or (25 15)—enough to absorb a lot of inserts before splits occur.
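To get a feel for what a given FREESPACE setting buys, the arithmetic can be sketched as follows. This is a simplified estimate under stated assumptions, not VSAM's exact algorithm: records are fixed-length and unspanned, CI control information is taken as 10 bytes (a 4-byte CIDF plus two 3-byte RDFs), percentages are rounded down, and at least one record per CI and one CI per CA are always loaded. The function name, parameter names, and the figure of 180 CIs per CA are illustrative.

```python
def freespace_plan(cisz, reclen, cis_per_ca, ci_pct, ca_pct, control_bytes=10):
    """Estimate load-time layout for FREESPACE(ci_pct ca_pct)."""
    usable = cisz - control_bytes                 # bytes available for records
    reserved = cisz * ci_pct // 100               # bytes left free in each CI
    capacity = usable // reclen                   # records a CI can hold at most
    loaded = max(1, (usable - reserved) // reclen)  # records placed at load time
    free_cis = min(cis_per_ca - 1, cis_per_ca * ca_pct // 100)
    return {
        "records_loaded_per_ci": loaded,
        "insert_room_per_ci": capacity - loaded,
        "free_cis_per_ca": free_cis,
    }

# 4096-byte CIs, 200-byte records, 180 CIs per CA, FREESPACE(20 10)
print(freespace_plan(4096, 200, 180, 20, 10))
# {'records_loaded_per_ci': 16, 'insert_room_per_ci': 4, 'free_cis_per_ca': 18}
```

Under these assumptions each CI can absorb four inserts before it splits, and each CA can absorb 18 CI splits before a CA split is needed.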
FREESPACE does not eliminate splits forever. As you keep inserting, the free space is consumed. Once it is gone, the next insert that would go in a full CI will cause a split. So FREESPACE delays and reduces splits; it does not remove the need for them if the file keeps growing. For very dynamic files, you may also need to reorganize periodically (REPRO to a new cluster with the same or higher FREESPACE) to regain free space.
Each CI split costs multiple I/Os: reading the full CI, writing two CIs (the split CI and the CI that receives the moved records), and updating the index. A CA split costs more: reading and writing many CIs, allocating the new CA, and updating the index at multiple levels. So applications that do many inserts (especially inserts scattered across the key range of a tightly packed file) can see noticeable slowdowns if splits are frequent. Monitoring split activity helps: LISTCAT reports the SPLITS-CI and SPLITS-CA counts for the data component, and SMF type 64 records carry similar statistics. Tuning FREESPACE and load order then reduces the split rate. For batch loads, loading in key order and using adequate FREESPACE for later online inserts is a common approach.
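The effect of free space on split frequency can be explored with a small simulation. The sketch below is a rough model under several assumptions: CIs are sorted lists of keys with a record-count `capacity`, free CIs are always available (so CA splits are ignored), and a record is placed in the first CI whose highest key is not lower than the new key. The function `count_ci_splits` and its parameters are hypothetical names for this illustration.

```python
import random
from bisect import bisect_left, insort

def count_ci_splits(capacity, loaded_per_ci, n_loaded, inserts):
    """Load even keys in key order, then apply inserts and count CI splits."""
    keys = [2 * i for i in range(n_loaded)]
    cis = [keys[i:i + loaded_per_ci] for i in range(0, n_loaded, loaded_per_ci)]
    splits = 0
    for key in inserts:
        # Locate the target CI by its high key, as a sequence set would.
        highs = [ci[-1] for ci in cis]
        idx = min(bisect_left(highs, key), len(cis) - 1)
        ci = cis[idx]
        if len(ci) >= capacity:
            # Full CI: split at the midpoint, then pick the half for the key.
            mid = (len(ci) + 1) // 2
            low, high = ci[:mid], ci[mid:]
            cis[idx:idx + 1] = [low, high]
            splits += 1
            ci = low if key <= low[-1] else high
        insort(ci, key)
    return splits

random.seed(1)
new_keys = random.sample(range(1, 800, 2), 40)   # odd keys, no duplicates
packed = count_ci_splits(capacity=10, loaded_per_ci=10, n_loaded=400, inserts=new_keys)
spaced = count_ci_splits(capacity=10, loaded_per_ci=8, n_loaded=400, inserts=new_keys)
print("no CI free space:", packed, "splits; 20% CI free space:", spaced, "splits")
```

With fully packed CIs, the first insert into any CI forces a split, whereas a CI loaded to 80% absorbs two inserts before a third one splits it.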
Imagine a row of boxes (CIs) in a shelf (CA). When you add a new toy (record) to a box that is full, you have to take half the toys out and put them in an empty box on the same shelf—that's a CI split. If the shelf has no empty box, you get a new shelf, move half the boxes there, then put half the toys from the full box into one of the empty boxes—that's a CA split. Leaving some space in each box and some empty boxes on each shelf (FREESPACE) means you can add lots of toys before you have to do that work.
1. When does a CI split occur?
2. Which VSAM type has CI and CA splits?
3. How does FREESPACE help avoid splits?