Dataset Types Explained

Mainframe datasets come in different types, each designed for specific purposes and access patterns. Understanding dataset types is essential for effective data management. This tutorial covers PS (Physical Sequential), PDS (Partitioned Data Set), PDSE (Partitioned Data Set Extended), VSAM (Virtual Storage Access Method), and GDG (Generation Data Group), explaining their characteristics, uses, and differences.

Each dataset type has unique properties that make it suitable for different scenarios. Understanding when to use each type helps you design efficient data structures and choose the right organization for your data.

Understanding Dataset Organization

Dataset organization (DSORG) determines how data is structured and accessed.

What is DSORG?

DSORG (Data Set Organization):

  • Specifies how data is organized in the dataset
  • Determines access methods available
  • Affects dataset capabilities
  • Is set when the dataset is created
  • Is normally fixed for the life of the dataset; changing it usually means copying the data into a new dataset

Common DSORG Values

Common DSORG values include:

  • PS: Physical Sequential
  • PO: Partitioned Organization (PDS)
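  • PO-E: Partitioned Organization Extended (PDSE), as shown in ISPF dataset lists; in JCL a PDSE is allocated as DSORG=PO with DSNTYPE=LIBRARY
  • VS: VSAM
  • DA: Direct Access (BDAM, an older non-VSAM organization)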
  • Other specialized organizations

PS - Physical Sequential

PS (Physical Sequential) datasets are simple sequential files.

What is PS?

PS datasets:

  • Are simple sequential files
  • Store records one after another
  • Have no internal structure
  • Are accessed sequentially from start to end
  • Are the simplest dataset type

PS Characteristics

PS datasets have:

  • DSORG=PS
  • Single file structure
  • Sequential access only
  • No members or partitions
  • Simple structure

When to Use PS

Use PS for:

  • Simple data files
  • Report files
  • Log files
  • Data that is processed sequentially
  • Files that don't need internal structure

PS Example

Example PS dataset:

    USERID.DATA.INPUT
    DSORG=PS
    RECFM=FB
    LRECL=80

This is a simple sequential data file.
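
To make "records one after another" concrete, here is a minimal Python sketch of reading fixed-length records in order. It is only a model, not a z/OS API: the function name `iter_fixed_records` is hypothetical, and the sample uses ASCII for readability, whereas real mainframe data is normally EBCDIC.

```python
from typing import Iterator

LRECL = 80  # record length from the example above (RECFM=FB, LRECL=80)


def iter_fixed_records(data: bytes, lrecl: int = LRECL) -> Iterator[bytes]:
    """Yield fixed-length records in order, first to last."""
    if len(data) % lrecl != 0:
        raise ValueError("data length is not a multiple of LRECL")
    for offset in range(0, len(data), lrecl):
        yield data[offset:offset + lrecl]


# Build three 80-byte records, each padded with blanks to the full length.
sample = b"".join(text.ljust(LRECL).encode("ascii")
                  for text in ("REC 1", "REC 2", "REC 3"))

records = list(iter_fixed_records(sample))
first_text = records[0].decode("ascii").rstrip()
```

There is no index or directory in this model: to reach the third record, the reader passes over the first two, which is what sequential access means.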

PDS - Partitioned Data Set

A PDS (Partitioned Data Set) holds multiple members in one dataset.

What is PDS?

PDS datasets:

  • Contain multiple members (individual files)
  • Have a directory structure
  • Each member is like a separate file
  • Are commonly used for source code libraries
  • Are the traditional partitioned format

PDS Characteristics

PDS datasets have:

  • DSORG=PO
  • Directory and data areas
  • Multiple members with names (up to 8 characters)
  • A directory whose size is fixed when the dataset is allocated
  • Space from deleted or replaced members that is reclaimed only by compressing the dataset

PDS Structure

PDS structure includes:

  • Directory: Contains member names and locations
  • Data Area: Contains actual member data
  • The directory size is fixed when the dataset is allocated
  • Space left by deleted or replaced members is not reused until the dataset is compressed
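
The effect of a fixed directory and of compression can be shown with a toy Python model. This is a deliberate simplification and not how z/OS stores a PDS: the class `ToyPDS` and its methods are hypothetical, and the directory limit is counted in members here, whereas a real directory is sized in directory blocks.

```python
from typing import Dict, List


class ToyPDS:
    """Toy model of a PDS: fixed-size directory plus an append-only data area."""

    def __init__(self, max_members: int) -> None:
        self.max_members = max_members       # directory size fixed at creation
        self.directory: Dict[str, int] = {}  # member name -> slot in data area
        self.data_area: List[str] = []       # member contents, in write order

    def save(self, name: str, content: str) -> None:
        if name not in self.directory and len(self.directory) >= self.max_members:
            raise RuntimeError("directory full")
        # A saved member always goes at the end; an older copy becomes dead space.
        self.data_area.append(content)
        self.directory[name] = len(self.data_area) - 1

    def delete(self, name: str) -> None:
        # Only the directory entry is removed; the data stays until compress().
        del self.directory[name]

    def dead_slots(self) -> int:
        return len(self.data_area) - len(self.directory)

    def compress(self) -> None:
        # Keep only the copies the directory still points to.
        live = [(name, self.data_area[i]) for name, i in self.directory.items()]
        self.data_area = [content for _, content in live]
        self.directory = {name: i for i, (name, _) in enumerate(live)}


pds = ToyPDS(max_members=2)
pds.save("PROG1", "v1")
pds.save("PROG1", "v2")   # replacing a member leaves the old copy behind
pds.save("PROG2", "v1")
wasted_before = pds.dead_slots()
pds.compress()
wasted_after = pds.dead_slots()
```

In this model, saving a third member fails with "directory full" even though the data area could hold it, which mirrors why directory size has to be planned when a PDS is allocated.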

When to Use PDS

Use PDS for:

  • Source code libraries on systems where PDSE is not available or not supported
  • Program libraries
  • Related files grouped together
  • Legacy compatibility requirements

PDS Example

Example PDS dataset:

    USERID.SOURCE.COBOL
    DSORG=PO
    RECFM=FB
    LRECL=80
    Members: PROG1, PROG2, PROG3, ...

This PDS contains multiple COBOL program members.
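
Member names such as PROG1 follow a simple pattern that is easy to check. The Python sketch below assumes the usual z/OS rule: 1 to 8 characters, starting with a letter or one of the national characters #, @, $, followed by letters, digits, or national characters. The helper `is_valid_member_name` is a hypothetical name used only for illustration.

```python
import re

# 1-8 characters; the first is A-Z or a national character (#, @, $),
# the remaining characters may also be digits.
MEMBER_NAME = re.compile(r"[A-Z#@$][A-Z0-9#@$]{0,7}")


def is_valid_member_name(name: str) -> bool:
    """Return True if the name fits the assumed member-name pattern."""
    return MEMBER_NAME.fullmatch(name) is not None


checks = {name: is_valid_member_name(name)
          for name in ("PROG1", "PROG12345", "1PROG", "")}
```

Here only PROG1 passes: PROG12345 has nine characters, 1PROG starts with a digit, and the empty string has no characters at all.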

PDSE - Partitioned Data Set Extended

PDSE (Partitioned Data Set Extended) is an enhanced version of PDS.

What is PDSE?

PDSE datasets:

  • Are enhanced partitioned datasets
  • Contain multiple members like PDS
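  • Use an indexed directory, so member lookups stay fast as the library grows
  • Reuse space from deleted or replaced members automatically
  • Are not limited by a preallocated directory, so they can hold more members than a PDS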
  • Have improved recovery capabilities

PDSE Characteristics

PDSE datasets have:

  • DSORG=PO-E as shown in ISPF dataset lists (allocated as DSORG=PO with DSNTYPE=LIBRARY)
  • Dynamic directory growth
  • Automatic space recovery
  • Better performance
  • More members supported
  • Improved error recovery

PDSE Advantages

PDSE advantages over PDS:

  • Performance: Faster access and operations
  • Space Management: Automatic, no compression needed
  • Capacity: Supports more members
  • Recovery: Better error recovery
  • Efficiency: More efficient space usage

When to Use PDSE

Use PDSE for:

  • New source code libraries (preferred)
  • Program libraries
  • Any new partitioned dataset
  • When better performance is needed
  • When automatic space management is desired

PDSE Example

Example PDSE dataset:

    USERID.SOURCE.COBOL
    DSORG=PO-E
    RECFM=FB
    LRECL=80
    Members: PROG1, PROG2, PROG3, ... (many more supported)

This PDSE contains multiple COBOL program members with enhanced features.

PDS vs PDSE Comparison

Understanding the differences helps you choose the right type.

Key Differences

Comparison:

  • Performance: PDSE is faster
  • Space Management: PDSE is automatic, PDS requires compression
  • Capacity: PDSE supports more members
  • Directory: PDSE has dynamic directory, PDS has fixed
  • Recovery: PDSE has better recovery
  • Compatibility: PDS has wider compatibility

Choosing Between PDS and PDSE

Choose PDSE when:

  • Creating new datasets
  • Better performance is needed
  • Automatic space management is desired
  • Many members are expected

Choose PDS when:

  • Compatibility with older systems is required
  • Working with existing PDS datasets
  • PDSE is not available
  • Legacy requirements mandate PDS
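
The two lists above amount to a short decision rule, sketched here in Python. The function name `choose_partitioned_type` and its parameters are hypothetical and simply restate the criteria from this section.

```python
def choose_partitioned_type(pdse_available: bool = True,
                            needs_legacy_compat: bool = False,
                            extending_existing_pds: bool = False) -> str:
    """Return "PDS" only when one of the PDS conditions applies, else "PDSE"."""
    if not pdse_available or needs_legacy_compat or extending_existing_pds:
        return "PDS"
    return "PDSE"


new_library = choose_partitioned_type()
old_system = choose_partitioned_type(needs_legacy_compat=True)
```

PDSE is the default in this sketch, and PDS is chosen only when a specific constraint calls for it.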

VSAM - Virtual Storage Access Method

VSAM provides advanced data organization and access methods.

What is VSAM?

VSAM:

  • Stands for Virtual Storage Access Method
  • Provides indexed, direct, or sequential access
  • Offers advanced data management
  • Supports multiple access methods
  • Is more complex than PS or PDS

VSAM Types

VSAM dataset types include:

  • KSDS: Key Sequenced Data Set (indexed by key)
  • ESDS: Entry Sequenced Data Set (records kept in arrival order; read sequentially or by relative byte address)
  • RRDS: Relative Record Data Set (accessed by relative record number)
  • LDS: Linear Data Set (byte-addressable)

VSAM Characteristics

VSAM datasets have:

  • Organization VS (VSAM); clusters are defined with IDCAMS DEFINE CLUSTER rather than a DCB DSORG value
  • Index structures for efficient access
  • Support for keys and alternate indexes
  • Advanced access methods
  • More complex structure

When to Use VSAM

Use VSAM for:

  • Databases and large data files
  • When indexed access is needed
  • When direct access by key is required
  • High-performance data access
  • Complex data structures

VSAM Example

Example VSAM dataset:

    USERID.DATA.CUSTOMER
    Organization: VS (VSAM)
    Type: KSDS
    Key: CUSTOMER-ID
    Indexed access by customer ID

This VSAM dataset provides indexed access to customer data.
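
The idea of keyed access can be modeled in a few lines of Python. This is a toy illustration of KSDS behavior, not the VSAM implementation or any real API: the class `ToyKSDS`, its methods, and the customer IDs are all hypothetical.

```python
import bisect
from typing import Iterator, List, Optional, Tuple


class ToyKSDS:
    """Toy model of a KSDS: records kept in key order, looked up by key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._records: List[str] = []

    def write(self, key: str, record: str) -> None:
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            raise KeyError(f"duplicate key: {key}")  # primary keys are unique
        self._keys.insert(pos, key)
        self._records.insert(pos, record)

    def read(self, key: str) -> Optional[str]:
        """Direct access: locate one record by its key."""
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._records[pos]
        return None

    def browse(self, start_key: str = "") -> Iterator[Tuple[str, str]]:
        """Sequential access: yield records in key order from start_key on."""
        pos = bisect.bisect_left(self._keys, start_key)
        for i in range(pos, len(self._keys)):
            yield self._keys[i], self._records[i]


customers = ToyKSDS()
for cust_id, name in (("C003", "Chen"), ("C001", "Adams"), ("C002", "Baker")):
    customers.write(cust_id, name)

found = customers.read("C002")
in_key_order = [key for key, _ in customers.browse()]
from_c002 = [key for key, _ in customers.browse("C002")]
```

The records were written out of order, yet a browse returns them in key order and a read goes straight to one key. Those are the two access styles a KSDS combines.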

GDG - Generation Data Group

A GDG (Generation Data Group) organizes related datasets under a common base name, with each dataset identified by a generation number.

What is a GDG?

GDG:

  • Stands for Generation Data Group
  • Is a collection of related datasets
  • Shares a common base name
  • Uses generation numbers (.G0001V00, .G0002V00, etc.)
  • Maintains historical versions

GDG Structure

GDG structure:

  • GDG Base: The base name (e.g., USERID.REPORTS.DAILY)
  • Generations: Individual datasets with generation numbers
  • Relative References: +1 (a new generation to be created), 0 (current), -1 (previous), etc.
  • Limit: Maximum number of generations to keep
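
Relative references are resolved against the generations that currently exist. The Python sketch below models that lookup for 0 and negative references only; the function `resolve_relative` is a hypothetical helper, and the list of generations stands in for what the catalog would hold.

```python
from typing import List


def resolve_relative(generations: List[str], relative: int) -> str:
    """Map 0, -1, -2, ... to an existing generation (list is oldest first)."""
    if relative > 0:
        raise ValueError("positive references name a generation not yet created")
    index = len(generations) - 1 + relative
    if index < 0:
        raise LookupError("no generation that far back")
    return generations[index]


base = "USERID.REPORTS.DAILY"
cataloged = [f"{base}.G{n:04d}V00" for n in (1, 2, 3)]

current = resolve_relative(cataloged, 0)
previous = resolve_relative(cataloged, -1)
```

With three generations present, (0) resolves to G0003V00 and (-1) to G0002V00. A job can therefore always read "the latest" without knowing its absolute name.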

GDG Naming

GDG datasets are named like:

    USERID.REPORTS.DAILY.G0001V00 (First generation)
    USERID.REPORTS.DAILY.G0002V00 (Second generation)
    USERID.REPORTS.DAILY.G0003V00 (Third generation)

    Referenced as:
    USERID.REPORTS.DAILY(+1) (Next generation)
    USERID.REPORTS.DAILY(0) (Current generation)
    USERID.REPORTS.DAILY(-1) (Previous generation)
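
The absolute names follow a fixed pattern: the base name, then G plus a four-digit generation number, then V plus a two-digit version number. A small Python sketch can build and take apart such names; `generation_name` and `parse_generation` are hypothetical helper names, not part of any mainframe tool.

```python
import re
from typing import Tuple

GEN_SUFFIX = re.compile(r"\.G(\d{4})V(\d{2})$")


def generation_name(base: str, generation: int, version: int = 0) -> str:
    """Build an absolute generation name such as BASE.G0002V00."""
    if not 1 <= generation <= 9999 or not 0 <= version <= 99:
        raise ValueError("generation or version out of range")
    return f"{base}.G{generation:04d}V{version:02d}"


def parse_generation(dsname: str) -> Tuple[str, int, int]:
    """Split an absolute name into (base, generation, version)."""
    match = GEN_SUFFIX.search(dsname)
    if match is None:
        raise ValueError(f"not a generation dataset name: {dsname}")
    return dsname[:match.start()], int(match.group(1)), int(match.group(2))


name = generation_name("USERID.REPORTS.DAILY", 2)
parsed = parse_generation(name)
```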

When to Use GDG

Use GDG for:

  • Daily backups or reports
  • Historical data versions
  • Data snapshots
  • Versioned datasets
  • Time-based data organization

GDG Example

Example GDG for daily reports:

    GDG Base: USERID.REPORTS.DAILY
    Limit: 7 generations

    Generations:
    .G0001V00 (Day 1 report)
    .G0002V00 (Day 2 report)
    .G0003V00 (Day 3 report)
    ...

With a limit of 7, this GDG keeps the seven most recent daily reports; older generations roll off as new ones are created.
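
The effect of the limit can be simulated in Python with a bounded queue. This sketch assumes that only the oldest generation rolls off when the limit is exceeded; what actually happens to a rolled-off generation depends on how the GDG base was defined. The variable names are illustrative only.

```python
from collections import deque

LIMIT = 7
base = "USERID.REPORTS.DAILY"

# A deque with maxlen drops its oldest entry when a new one is appended.
kept = deque(maxlen=LIMIT)
for day in range(1, 11):          # ten daily runs
    kept.append(f"{base}.G{day:04d}V00")

oldest_kept = kept[0]
newest_kept = kept[-1]
```

After ten daily runs, generations 1 to 3 have rolled off and generations 4 to 10 remain.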

Dataset Type Comparison

Comparing all dataset types helps you choose appropriately.

Comparison Table

Key characteristics:

  • PS: Simple, sequential, single file, no members
  • PDS: Multiple members, fixed directory, requires compression
  • PDSE: Multiple members, dynamic directory, automatic space management
  • VSAM: Advanced access methods, indexed, complex structure
  • GDG: Related datasets with generation numbers, versioning

Access Methods

Access methods by type:

  • PS: Sequential only
  • PDS/PDSE: Sequential and by member name
  • VSAM: Sequential, direct, indexed
  • GDG: Depends on underlying dataset type

Choosing the Right Dataset Type

Guidelines for choosing dataset types.

For Source Code

Use:

  • PDSE (preferred) or PDS
  • Multiple members for different programs
  • Easy member management

For Data Files

Use:

  • PS for simple sequential data
  • VSAM for indexed or complex data
  • GDG for versioned data

For Reports

Use:

  • PS for single reports
  • GDG for time-based report series
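
The guidelines in this section can be summarized as a small decision function. The Python sketch below only restates the rules above; `recommend_dataset_type` and its parameters are hypothetical, and real choices usually weigh more factors than these three.

```python
def recommend_dataset_type(purpose: str,
                           needs_key_access: bool = False,
                           keep_versions: bool = False) -> str:
    """Suggest a dataset type from the guidelines in this section."""
    if purpose == "source":
        return "PDSE"      # preferred over PDS for new libraries
    if keep_versions:
        return "GDG"       # versioned data or time-based report series
    if needs_key_access:
        return "VSAM"      # indexed or keyed data
    return "PS"            # simple sequential data or a single report


source_choice = recommend_dataset_type("source")
daily_report_choice = recommend_dataset_type("report", keep_versions=True)
customer_file_choice = recommend_dataset_type("data", needs_key_access=True)
log_choice = recommend_dataset_type("data")
```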

Best Practices

Following best practices helps you work effectively with dataset types:

  • Choose Appropriate Types: Select the right type for your needs
  • Prefer PDSE for New Partitioned Datasets: Use PDSE when creating new partitioned datasets
  • Understand Limitations: Know the limitations of each type
  • Plan for Growth: Consider future needs when choosing types
  • Use GDG for Versioning: Use GDG when you need historical versions
  • Consider Access Patterns: Choose types based on how data will be accessed
  • Maintain PDS Regularly: Compress PDS datasets regularly to reclaim space left by deleted and replaced members
  • Document Type Choices: Document why specific types were chosen

Explain Like I'm 5: Dataset Types

Think of dataset types like different kinds of storage containers:

  • PS (Sequential) is like a single long scroll. You write things on it one after another, and to read it, you start at the beginning and read to the end. It's like one big continuous piece of paper!
  • PDS (Partitioned) is like a filing cabinet with drawers. Each drawer has a label (member name), and you can open any drawer to get that specific file. It's like having a cabinet where you can organize many files!
  • PDSE (Extended Partitioned) is like a super-smart filing cabinet. It's like a regular filing cabinet, but it organizes itself automatically, can hold more files, and works faster. It's like having a magic filing cabinet that takes care of itself!
  • VSAM is like a super-organized library with an index. You can find things by looking them up in an index (like a book index), or you can go directly to a specific shelf. It's like having a library with a smart catalog system!
  • GDG (Generation Data Group) is like a series of photo albums for the same event. You have "Vacation Day 1", "Vacation Day 2", "Vacation Day 3", etc. Each album is numbered, and you can easily refer to "today's album" or "yesterday's album"! It's like having numbered versions of the same thing!

So dataset types are like different ways of organizing and storing your information, each designed for different needs: simple files, organized cabinets, smart systems, or versioned collections!

Practice Exercises

Complete these exercises to reinforce your understanding of dataset types:

Exercise 1: Identifying Dataset Types

Practice identification: view dataset attributes using Data Set Utility, identify DSORG values, determine dataset types, and understand type characteristics. Build familiarity with dataset types.

Exercise 2: Working with PDS and PDSE

Practice partitioned datasets: work with PDS and PDSE datasets, compare their behavior, understand member operations, and learn the differences. Master partitioned dataset concepts.

Exercise 3: Understanding VSAM

Practice VSAM: explore VSAM datasets if available, understand VSAM characteristics, learn about VSAM types, and understand when VSAM is appropriate. Learn VSAM concepts.

Exercise 4: Working with GDG

Practice GDG: identify GDG datasets, understand generation numbering, use relative references, and understand GDG concepts. Master GDG operations.

Exercise 5: Type Selection

Practice selection: identify which dataset type to use for different scenarios (source code, data files, reports, etc.), understand selection criteria, and make appropriate choices. Build decision-making skills.

Test Your Knowledge

1. What does PS stand for in dataset types?

  • Partitioned Sequential
  • Physical Sequential
  • Partitioned Set
  • Physical Set

2. What is the difference between PDS and PDSE?

  • They are identical
  • PDSE is an enhanced version with better performance
  • PDS is newer
  • PDSE is obsolete

3. What does VSAM stand for?

  • Virtual Storage Access Method
  • Variable Storage Access Method
  • Virtual Sequential Access Method
  • Variable Sequential Access Method

4. What is a GDG?

  • A dataset type
  • A collection of related datasets with generation numbers
  • A command
  • A utility

5. Which dataset type is best for source code?

  • PS
  • PDS or PDSE
  • VSAM
  • GDG

Related Concepts