Joins or merges two sets of vectors together, based on the values of sets of classifying keys (C.F. Johnston & D.B. Baird).
Options
NINDEX = scalar |
Number of index vectors in structures (up to 10); default 1 |
---|---|
METHOD = string token |
Type of join (inner , left , right , full ); default full |
REPEATS = string token |
How to handle repeats of matches (combinations , single ); default sing outputs one row per match |
INCLUDE = string token |
How to handle restrictions on the input vectors (all , nonrestricted ); default all uses all the data rows |
SORT = string token |
Whether NEWVECTORS should be sorted on the index vectors (ascending , descending , unsorted ); default unsorted keeps the same ordering as the input sets |
Parameters
LEFTVECTORS = pointer |
Pointer to a list of vectors in left set (keys and variables) |
---|---|
RIGHTVECTORS = pointer |
Pointer to a list of vectors in right set (keys and variables) |
NEWVECTORS = pointer |
Pointer to a list of output vectors (keys and variables) |
Description
This procedure can be used to produce a set of NEWVECTORS
, which is the result of joining (or merging) two sets according to index (or key) vectors in each set. JOIN
supports SQL style joins, as well as merges, as implemented in Genstat for Windows, SAS and SPSS.
The number of index vectors is given by the NINDEX
option (up to 10). Each of LEFTVECTORS
and RIGHTVECTORS
is a pointer to NINDEX
keys followed by any number of extra vectors. The NEWVECTORS
parameter is a pointer to NINDEX
keys followed by the total number of non-index vectors in the two input sets. The output order in NEWVECTORS
will be the combined keys from the left and right sets, then the non-index vectors from the left set, followed by the non-index vectors from the right set. You need not have declared the pointer already; it will be declared automatically if necessary. The vectors may be variates, factors or texts. Warnings are given if the types of index vectors in each set do not match, although a factor can be matched with a text. Attempting to match a text with a factor or variate will result in a fault.
The METHOD
option controls the type of join and determines which rows from each input set will be output. METHOD=inner
outputs only those rows where the keys from both sets match. METHOD=left
outputs all rows from the LEFTVECTORS
set and only those rows from RIGHTVECTORS
where the keys from both sets match. METHOD=right
outputs all rows from the RIGHTVECTORS
set and only those rows from LEFTVECTORS
where the keys from both sets match. METHOD=full
outputs all rows from both sets. Where keys do not match, missing values are inserted into the non-index vectors from the set without that key value.
The REPEATS
option determines what happens when both input sets have repeats of the same matching key values. REPEATS=single
outputs one row for each match, so that if there are M
repeats in LEFTVECTORS
and N
repeats in RIGHTVECTORS
, MAX(M,N)
rows will be output. This is the same behaviour as the merge statements of SAS and SPSS and the Merge Spreadsheets menu of Genstat for Windows. REPEATS=combinations
outputs all combinations of the repeats, giving M*N
rows. This is equivalent to an SQL join and may produce very large output sets.
The INCLUDE=nonrestricted
option allows the use of restrictions on the vectors in each input set to be used to subset the rows. The SORT=unsorted
option allows the resulting vectors to be returned in the original order of the input data set, or sorted on the key vectors in either ascending or descending direction.
Note: this procedure may take some time to complete for joins of large data sets. You should also ensure the data space is large enough for the resultant vectors, especially if using the option REPEATS=combinations
.
Options: NINDEX
, METHOD
, REPEATS
, INCLUDE
, SORT
.
Parameters: LEFTVECTORS
, RIGHTVECTORS
, NEWVECTORS
.
Method
The LEFTVECTORS
and RIGHTVECTORS
are sorted by the index variables. If there are restrictions on the vectors in either input set, this is used to subset the input vectors if INCLUDE=nonrestricted
. For each of the rows of the two sets, the keys are compared and output rows are appended according to the METHOD
option. If repeats of the same matching key values in both sets occur and REPEATS=combinations
the procedure loops through all combinations of the matching rows. The resulting vectors are then sorted into the order specified in the SORT
option.
Action with RESTRICT
Any of the input vectors may be restricted. If INCLUDE=nonrestricted
, only those rows which are not excluded by any restriction on vectors in each input set will be processed, otherwise the restrictions will be ignored.
See also
Directive: EQUATE
.
Procedures: APPEND
, STACK
, UNSTACK
, VEQUATE
.
Commands for: Calculations and manipulation.
Example
CAPTION 'JOIN example'; STYLE=meta VARIATE [NVALUES=7] K11,K12,V11 TEXT [NVALUES=7] T13 READ K11,K12,V11,V12,T13 1 1 1 6.2 'Red' 1 1 2 5.7 'Green' 1 1 3 4.5 'Blue' 1 2 4 7.3 'Red' 2 1 5 4.1 'Yellow' 3 2 6 5.1 'Blue' 3 3 7 1.9 'Black' : VARIATE [NVALUES=5] K21,K22,V21 READ K21,K22,V21 1 1 1 1 1 2 2 1 3 2 3 4 5 2 5 : PRINT K11,K12,V11,V12,T13; DECIMALS=3(0),1,0 & K21,K22,V21; DECIMALS=0 JOIN [NINDEX=2; METHOD=inner; REPEATS=single]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=left; REPEATS=single]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=right; REPEATS=single]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=full; REPEATS=single; SORT=descending]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=full; REPEATS=comb; SORT=ascending]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 RESTRICT K11; K11 < 3 JOIN [NINDEX=2; METHOD=full; REPEATS=comb; INCLUDE=nonrestricted]\ LEFT=!p(K11,K12,V11,V12); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,V3) PRINT K1,K2,V1,V2,V3; FIELDWIDTH=8; DECIMALS=0,0,0,1,0 RESTRICT K21; K21 < 3 JOIN [NINDEX=1; METHOD=full; REPEATS=single; INCLUDE=all] \ LEFT=!p(K11,V11,V12,T13); RIGHT=!p(K21,V21);\ NEW=!p(K1,V1,V2,T3,V4) PRINT K1,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,1,0,0