Working with Permanent SAS Data Sets
There are important advantages to using permanent SAS data sets. These include faster execution and improved organization of your computing tasks. When working with large data sets, the faster execution often is very noticeable. The primary disadvantage is that a permanent SAS data set requires disk storage space.
This section explains how to
- Create a permanent SAS data set.
- Access it in a different SAS session than the one which created it.
This summary explains how to use a SAS data step to create a permanent SAS data set. Many SAS procedures also can create SAS data sets as output. For example, the regression procedure can create an output data set containing the regression coefficients and other statistics, and it can create another data set containing information like predicted values, prediction errors, and confidence intervals for predicted values. In addition, the Output Delivery System (ODS) will produce a SAS data set containing any statistic tabled by any SAS procedure.
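As a rough illustration of procedure output, the sketch below asks the regression procedure to save its coefficients and its predicted values as permanent data sets, and uses ODS to capture one of the printed tables. It assumes the survey libref and the income variables used elsewhere in this section. The output data set names (est, pred, parms) and the new variable names (yhat, resid, lower, upper) are made up for the example.

```sas
/* Assumes ~/income_survey exists and already holds income.sas7bdat */
libname survey '~/income_survey';

/* Capture the printed parameter estimates table as a data set */
ods output ParameterEstimates=survey.parms;

proc reg data=survey.income outest=survey.est;   /* regression coefficients */
   model persinc = gender race;
   /* one row per observation: predicted value, residual, prediction limits */
   output out=survey.pred p=yhat r=resid lcl=lower ucl=upper;
run;
quit;
```

Because every output name here has two parts, each data set is stored as a sas7bdat file in ~/income_survey.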
The SAS data step starts with a data statement. The syntax of the data statement for creating one data set is
    data <dataset-name>;
Replace <dataset-name> with a valid name for the data set. (More than one output data set can be created by listing more than one dataset name after the data keyword, e.g. data dsone dstwo dsthree;).
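A minimal sketch of a data step with two output data sets follows. It reuses the names dsone and dstwo from the text and the raw file income.data from the examples in this section; the assumption that gender is coded 1 for one group is hypothetical, since the coding is not documented here.

```sas
/* Route each input record to one of two output data sets */
data dsone dstwo;
   infile 'income.data';
   input gender race persinc;
   if gender = 1 then output dsone;   /* hypothetical coding of gender */
   else output dstwo;                 /* all remaining records */
run;
```

Without the explicit output statements, SAS would write every record to every data set named on the data statement.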
There are two types of syntax for the <dataset-name>:
- Replace <dataset-name> with the UNIX filename enclosed in quotes, for example:
    data "income.sas7bdat";
The file extension must be sas7bdat. Here the connection between the SAS syntax and the UNIX filename is direct: A UNIX file called income.sas7bdat will be created or replaced in your current working directory.
- Use a SAS libname statement and a libref on the data statement. The second method is not quite so transparent; here is an example:
    libname survey '~/income_survey';
    data survey.income;
The libname statement establishes a libref called survey. In this example, the libref, survey, becomes a synonym or alias for the UNIX directory ~/income_survey, which is a subdirectory of your home directory. (~/ is shorthand for your home directory.)
For method 2, the syntax for the <dataset-name> must contain two parts, separated by a period. The first part is the libref. The second part is the SAS dataset name which also becomes the root part of the UNIX file name.
Assuming the current working directory is ~/income_survey for method 1, both methods for this example produce a SAS dataset with the UNIX filename
    income.sas7bdat
in your directory named income_survey.
Using method 2, SAS automatically adds the extension (sas7bdat) to the UNIX file name. This extension is required for SAS to access the file. So if you use UNIX commands to move or copy the file, be sure the new version retains the sas7bdat extension.
Note: The libref exists only during the current SAS session, but the UNIX file name is permanent.
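Because the libref does not survive the session, a later session has to assign it again before using the two-part name. The sketch below assumes the data set income.sas7bdat already exists in ~/income_survey; it only inspects the file and does not change it.

```sas
/* New SAS session: the libref must be assigned again */
libname survey '~/income_survey';

/* Describe the stored data set (variables, number of observations) */
proc contents data=survey.income;
run;

/* Print the first five observations as a quick check */
proc print data=survey.income (obs=5);
run;
```

The libref does not have to be the same word each time. Any valid libref that points to the same directory reaches the same file.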
Here are two complete examples, one for each of the two syntaxes for the file name.
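    /* Method 1: quoted UNIX file name */
    data "income.sas7bdat";
       infile 'income.data';
       input gender race persinc;
       if persinc = 99 then persinc=.;   /* treat 99 as missing */
    run;

    /* Method 2: libname statement and libref */
    libname survey '~/income_survey';
    data survey.income;
       infile 'income.data';
       input gender race persinc;
       if persinc = 99 then persinc=.;   /* treat 99 as missing */
    run;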
In both cases, the data step reads data from an ASCII data file called income.data and creates a permanent SAS data set with a UNIX file name of income.sas7bdat. If the current working directory for method 1 is ~/income_survey, the SAS data set for both methods resides in the UNIX directory ~/income_survey.
As these examples illustrate, you may use either single or double quotes to contain a file name, but the quotes on each side of the file name must be the same.
There are several contexts for accessing an existing permanent SAS data set. These include the data= option on a SAS procedure statement and set and merge statements in a SAS data step.
In each context, give the name of the SAS data set using one of the two formats described in How to Create a Permanent SAS Data Set. Here are three examples, each showing both ways to identify the SAS data set name:
- Accessing a permanent SAS data set with a procedure. The name of the procedure in these examples is glm.
  - Enclose the UNIX file name of the SAS data set in quotes (current UNIX working directory is ~/income_survey):
        proc glm data="income.sas7bdat";
  - Use a libname statement and a libref:
        libname survey '~/income_survey';
        proc glm data=survey.income;
In both of these examples, the UNIX file containing the SAS data set resides in your directory ~/income_survey, and the UNIX file name is income.sas7bdat.
- Reading a permanent SAS data set in a data step using a set statement.
  - Enclose the file name of the SAS data set in quotes (current UNIX working directory is ~/rainfall):
        data desert;
           set "rain.sas7bdat";
           where annual < 10;
        run;
  - Use a libname statement and a libref:
        libname rainfall '~/rainfall';
        data desert;
           set rainfall.rain;
           where annual < 10;
        run;
In this example, the SAS data set name is rainfall.rain. The UNIX file name for rainfall.rain is rain.sas7bdat. It resides in your UNIX directory ~/rainfall. The data step creates a temporary SAS data set called desert defined by regions with less than 10 inches of annual rainfall. No permanent UNIX file is created by this program—when the job completes, the data set desert is removed.
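If the subset should be kept after the job ends, one option is to give the output a two-part name as well. This is a sketch of that variation, not part of the original program; it assumes you have write access to ~/rainfall.

```sas
libname rainfall '~/rainfall';

/* Two-part output name: the result is stored as ~/rainfall/desert.sas7bdat */
data rainfall.desert;
   set rainfall.rain;
   where annual < 10;   /* regions with less than 10 inches of annual rainfall */
run;
```

A one-part name such as desert refers to the temporary WORK library, which is why the original version leaves no permanent file behind.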
- Merging two permanent SAS data sets.
  - Enclosing the file names of the SAS data sets in quotes:
        proc sort data="~/student_survey/data.sas7bdat" out=student;
           by schoolid;
        run;
        proc sort data="~/school_data/data.sas7bdat" out=school;
           by schoolid;
        run;
        data stu_sch;
           merge student school;
           by schoolid;
        run;
        proc glm;
           class schltype;
           model colplan = gpa schltype / solution;
        run;
  - Using libname statements and librefs:
        libname student '~/student_survey';
        libname school '~/school_data';
        proc sort data=student.data out=student;
           by schoolid;
        run;
        proc sort data=school.data out=school;
           by schoolid;
        run;
        data stu_sch;
           merge student school;
           by schoolid;
        run;
        proc glm;
           class schltype;
           model colplan = gpa schltype / solution;
        run;
There are two permanent SAS data sets in this example. The first referenced, student.data, resides in your UNIX file ~/student_survey/data.sas7bdat. The second, school.data, resides in your UNIX file ~/school_data/data.sas7bdat.
Note that when the file name is enclosed in quotes (the first approach), the directory name is part of the quoted string. This is necessary in at least one of the two proc sort statements, since only one directory can be the current working directory.
The two proc sort steps each read a permanent SAS data set, sort it by schoolid, and create a temporary SAS data set (student and school, respectively). The data step merges these two sorted data sets by schoolid. The statement by schoolid; tells SAS to match the schoolid on each student record to the schoolid on the school record, so the school information attached to each student comes from the school that student attends. The output of the data step is a temporary SAS data set called stu_sch, which is deleted at the end of the SAS session.
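A plain merge keeps every schoolid that appears in either input, so a student whose school is missing from the school file still gets a record, with missing school variables. If only matched records are wanted, one common variation uses the in= data set option. This sketch assumes the temporary data sets student and school created by the proc sort steps above; the flag names in_student and in_school are arbitrary.

```sas
data stu_sch;
   /* in= creates temporary 0/1 flags showing which input contributed */
   merge student (in=in_student) school (in=in_school);
   by schoolid;
   /* keep only records whose schoolid appears in both data sets */
   if in_student and in_school;
run;
```

The flag variables exist only while the data step runs and are not written to stu_sch.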
The following tabulation summarizes the examples in the preceding subsections.
|Item|Example|
|---|---|
|SAS libname statement|libname survey '~/income_survey';|
|SAS data statement|data survey.income;|
|SAS name for the data set|survey.income|
|Root part of UNIX filename|income|
|Extension part of UNIX filename|sas7bdat|
|Reference by UNIX filename|data "income.sas7bdat";|