As a dialect of SQL, PROC SQL can really play a lot of tricks in SAS, which are mostly carried out by DATA step or PROCs. Here I summarize the 10 interesting ones among them. Let's start with the free SAS help dataset SASHELP.CLASS. This dataset contains the weight, height, sex and other information of 19 fake teenagers. In this demo, I primarily paly with the WEIGHT variable from it.
data class; set sashelp.class; obs = _n_; run;
1. Calculate the median of a variable
This is the task usually done by PROC MEANS. With the aggregating HAVING clause, PROC SQL can fulfill this purpose as well.
proc sql; select avg(weight) from (select e.weight from class e, class d group by e.weight having sum(case when e.weight = d.weight then 1 else 0 end) ge abs(sum(sign(e.weight - d.weight)))); quit;
2. Draw a horizontal bar chart
It is a substitute in case that we want to have a look at the distribution of a variable but don’t know PROC GPLOT or other plotting procedures.
proc sql; select age, '|', repeat('*',count(*)*4) as frequency from class group by age order by age; quit;
3. Return the running total for a variable
DATA step and PROC IML both can get this job done, like what I have shown at a previous post. PROC SQL is the 3rd choice, just with the little help of a subquery at the SELECT clause.
proc sql; select name, weight, (select sum(a.weight) from class as a where a.obs <= b.obs) as running_total from class as b; quit;
4. Report data with subtotal
The first thing in my mind is the powerful REPORT procedure. However, PROC SQL is an alternative with the set operator.
proc sql; select name, weight from class union all select 'Total', sum(weight) from class; quit;
5. Find the column information from metadata
SAS stores the metadata at its DICTIONARY datasets. PROC SQL can visit the column information easily, without using the CONTENTS procedure.
proc sql; select name, type, varnum from sashelp.vcolumn where libname = 'WORK' and memname = 'CLASS'; quit;
6. Rank a variable
PROC RANK is a really handy tool for this functionality. Besides it, PROC SQL can do some simple ranking as well.
proc sql; select name, a.weight, (select count(distinct b.weight) from class b where b.weight <= a.weight) as rank from class a; quit;
7. Random sampling
PROC SURVEYSELECT involves with too much statistics. If we just need a simple random sampling, PROC SQL is a more popular option. For example, if I want to pick out 8 observations, I can use PROC SQL like:
proc sql outobs = 8; select * from class order by ranuni(1234); quit;
8. Replicate a data set without data
In PROC SQL, it is a fairly simple one-sentence statement to create an empty data set.
proc sql; create table class2 like class; quit;
9. Transpose data
DATA step ARRAY is a big headache for most people. Suppose that we want to list the names of the teenagers by their genders, PROC SQL has a way to do it, although it may require some complicated queries/subqueries.
proc sql; select max(case when sex='F' then name else ' ' end) as Female, max(case when sex='M' then name else ' ' end) as Male from (select e.sex, e.name, (select count(*) from class d where e.sex=d.sex and e.obs < d.obs) as level from class e) group by level; quit;
10. Count the missing values
The great thing in PROC SQL is that the NMISS and N functions under it work for both numeric and character variables. We can also write a macro based on them to search the missing values for all variables.
proc sql select count(*), nmiss(weight), n(weight) from class; quit;