Thursday, November 1, 2018

How to Transpose / Pivot millions of records efficiently in SQL/Netezza

SAS has the transpose process, SQL server and Oracle has pivot function, while both  transpose and pivot functions are not supported in Netezza version. This short document is intended to introduce a very efficient way to query data to retrieve the pivot result. Please see sample below

CREATE TABLE STUDENTS (SID VARCHAR(11)) ;
INSERT INTO STUDENTS VALUES ('ADAM') ;
INSERT INTO STUDENTS VALUES ('BRIAN') ;
INSERT INTO STUDENTS VALUES ('CHARLES') ;
INSERT INTO STUDENTS VALUES ('DAVE') ;
SELECT * FROM STUDENTS;
 

SID
DAVE
CHARLES
BRIAN
ADAM

 

CREATE TABLE MARKS (SID VARCHAR(11), SUBJECT VARCHAR(11), MARK INT2) ;
INSERT INTO MARKS VALUES ('ADAM', 'ENGLISH', 20) ;
INSERT INTO MARKS VALUES ('ADAM', 'HISTORY', 40) ;
INSERT INTO MARKS VALUES ('ADAM', 'SCIENCE', 60) ;
INSERT INTO MARKS VALUES ('BRIAN', 'HISTORY', 60) ;
INSERT INTO MARKS VALUES ('BRIAN', 'SCIENCE', 80) ;
INSERT INTO MARKS VALUES ('CHARLES', 'ENGLISH', 60) ;
INSERT INTO MARKS VALUES ('CHARLES', 'SCIENCE', 20) ;
INSERT INTO MARKS VALUES ('DAVE', 'ENGLISH', 80) ;
INSERT INTO MARKS VALUES ('DAVE', 'HISTORY', 20) ;
INSERT INTO MARKS VALUES ('DAVE', 'SCIENCE', 40) ;
SELECT * FROM MARKS;

 
SID
SUBJECT
MARK
ADAM
ENGLISH
20
ADAM
HISTORY
40
ADAM
SCIENCE
60
BRIAN
HISTORY
60
BRIAN
SCIENCE
80
CHARLES
ENGLISH
60
CHARLES
SCIENCE
20
DAVE
ENGLISH
80
DAVE
HISTORY
20
DAVE
SCIENCE
40

 

SELECT S.SID, E.MARK AS ENGLISH, H.MARK AS HISTORY, C.MARK AS SCIENCE
FROM STUDENTS AS S
LEFT OUTER JOIN MARKS AS E ON (S.SID = E.SID AND E.SUBJECT = 'ENGLISH')
LEFT OUTER JOIN MARKS AS H ON (S.SID = H.SID AND H.SUBJECT = 'HISTORY')
LEFT OUTER JOIN MARKS AS C ON (S.SID = C.SID AND C.SUBJECT = 'SCIENCE')
ORDER BY 1 ;

 
SID
ENGLISH
HISTORY
SCIENCE
ADAM
20
40
60
BRIAN
NULL
60
80
CHARLES
60
NULL
20
DAVE
80
20
40

  

Actually, this sample above is the excerpt from Netezza support. However,the most efficient way is use the query below with two advantages:
1.    Only one table is used
2.    Not any joins are needed
3.    The performance is much better than approach above

 
SELECT
SID
,MAX(CASE WHEN SUBJECT = 'ENGLISH' THEN MARK ELSE NULL END) AS ENGLISH
,MAX(CASE WHEN SUBJECT = 'HISTORY' THEN MARK ELSE NULL END) AS HISTORY
,MAX(CASE WHEN SUBJECT = 'SCIENCE' THEN MARK ELSE NULL END) AS SCIENCE
FROM MARKS
GROUP BY SID
ORDER BY 1 ;

SID
ENGLISH
HISTORY
SCIENCE
ADAM
20
40
60
BRIAN
NULL
60
80
CHARLES
60
NULL
20
DAVE
80
20
40

 

No comments:

Post a Comment