CROSS TAB QUERY IN MYSQL

MySQL Multi-Aggregated Rows in Crosstab Queries

MySQL’s crosstabs contain aggregate functions on two or more fields, presented in a tabular format. In a multi-aggregate crosstab query, two different functions can be applied to the same field or the same function can be applied to multiple fields on the same (row or column) axis. Rob Gravelle shows you how to apply two different functions to the same field in order to create grouping levels in the row axis.

Today’s topic of discussion is crosstabs, which contain multiple aggregate functions in the row axis of a tabular resultset. Recall from the the All About the Crosstab Query article that an aggregate function is one that summarizes a group of related data in some way. Examples of aggregate functions include COUNT, SUM, AVG, MIN, and MAX. In a multi-aggregate crosstab query, two different functions can be applied to the same field or the same function can be applied to two or more fields. Today we’ll create a query that applies two different functions to the same field in order to create grouping levels in the row axis.

Recap of Crosstab Basics

In the Tips for Simplifying Crosstab Query Statements article, we took the complex SQL statement that we created in the All About the Crosstab Query article and simplified it to a more streamlined version:

SELECT CASE WHEN Month_Num IS NULL 
            THEN 'TOTAL' 
            ELSE Month 
       END      AS 'Month',
  REGION_1 AS 'REGION 1',  
  REGION_2 AS 'REGION 2',
  REGION_3 AS 'REGION 3',
  REGION_4 AS 'REGION 4',
  REGION_5 AS 'REGION 5',       
  TOTAL
FROM  (SELECT MONTH(CREATION_DATE)                          AS Month_Num,
     MONTHNAME(CREATION_DATE)                               AS Month,
     COUNT(CASE WHEN REGION_CODE ='01' THEN FEE_NUMBER END) AS REGION_1,
     COUNT(CASE WHEN REGION_CODE ='02' THEN FEE_NUMBER END) AS REGION_2,
     COUNT(CASE WHEN REGION_CODE ='03' THEN FEE_NUMBER END) AS REGION_3,
     COUNT(CASE WHEN REGION_CODE ='04' THEN FEE_NUMBER END) AS REGION_4,
     COUNT(CASE WHEN REGION_CODE ='05' THEN FEE_NUMBER END) AS REGION_5,
     COUNT(*)                                               AS TOTAL 
  FROM   TA_CASES
  WHERE  YEAR(CREATION_DATE)=1998
  GROUP BY Month_Num WITH ROLLUP) AS CA;

The subquery fetched all of the fields that we needed, plus the month number, for sorting.  We then selected from it by placing the code after the FROM of a second query. Performing a query in a two pass process in this way is called Pre (or Partial)-Aggregation. The first pass creates a derived table or resultset that performs most of the aggregation work, while the next pass does some formatting and any additional calculations that may be required.  The neccessitating of pre-aggregation here was the result of two factors: The ROLLUP GROUP BY modifier inserted a Null row into the resultset, which was difficult to replace with the “TOTAL” row header because of its late evaluation in the query process.  A second challenge was presented by the grouping on the output of date functions because grouping on the MONTHNAME() sorted the rows in alphabetical order, rather than chronological.

The above query produced the following desired crosstab, including chronological row sorting and totals:

Month REGION 1 REGION 2 REGION 3 REGION 4 REGION 5 TOTAL
April 13 33 76 2 47 171
May 17 55 209 1 143 425
June 8 63 221 1 127 420
July 13 104 240 6 123 486
August 18 121 274 9 111 533
September 25 160 239 2 88 514
October 9 88 295 2 127 521
November 2 86 292 2 120 502
December 1 128 232 6 155 522
TOTAL 106 838 2078 31 1041 4094

Multi-Aggregate Pivots

This query was not terribly complex as it only hit one table and pivoted between two fields: the Month and Region.  That’s called a Single Pivot.  If we wanted to breakdown the time periods further into other time periods, we would now be looking at a two-to-one pivot (eg: months and weeks per region).  On the column side, regions could likewise be broken down into specific cities, giving us a one-to-two pivot.  Adding the new fields to both the columns and rows would produce a true many-to-many multi-aggregate pivot.  The following chart illustrates the inherent complexety of multi-aggregate pivots:

A B C1 C2 Total column header 1
D1 D2 Total D1 D2 Total column header 2
E1 E2 Total E1 E2 Total E1 E2 Total E1 E2 Total column header 3
A1 B1
B2
Total row sub total
A2 B1
B2
Total row sub total
Total row total
row
header
1
row
header
2
col
sub
total
col
sub
total
col
sub
total
col
sub
total
col
sub
total
col
sub
total
col
total

 

Reporting on Two Row Fields

We’re going to create a Multi-Aggregate row crosstab by adding the Year to the row data, making it the new A row in the above diagram, thus displacing the months to the B field. That will allow us to report on multiple years, by calling the YEAR() function on the CREATION_DATE, much like we did for displaying the months. We’ll insert it as the first field in the SELECT list (new code appears in Red):

...
FROM  (SELECT YEAR(CREATION_DATE)              AS Year,
              MONTH(CREATION_DATE)             AS Month_Num,
              MONTHNAME(CREATION_DATE)         AS Month,
		 ...

We can also easily include totals for each year by adding it to the GROUP BY clause:

       GROUP BY Year, Month_Num WITH ROLLUP) AS CA;

In addition to needing a column heading for the years, we also need to alter the months CASE statement, because rows which display the yearly totals will contain a NULL Month_num value:

SELECT CASE WHEN Year IS NULL
            THEN 'GRAND TOTAL' 
            ELSE Year 
       END      AS 'Year',
       CASE WHEN Month_Num IS NULL     
            THEN CASE WHEN Year IS NULL
                      THEN ''                                         
                      ELSE CONCAT(Year, ' TOTAL')                      
                 END
            ELSE Month 
       END      AS 'Month',  
	 ...

Here then is the full SQL statement to include the Year rows.

SELECT CASE WHEN Year IS NULL
            THEN 'GRAND TOTAL' 
            ELSE Year 
       END      AS 'Year',
       CASE WHEN Month_Num IS NULL     
            THEN CASE WHEN Year IS NULL
                      THEN ''                                         
                      ELSE CONCAT(Year, ' TOTAL')                      
                 END
            ELSE Month 
       END      AS 'Month',  
       REGION_1 AS 'REGION 1',  
       REGION_2 AS 'REGION 2',
       REGION_3 AS 'REGION 3',
       REGION_4 AS 'REGION 4',
       REGION_5 AS 'REGION 5',       
       TOTAL
FROM  (SELECT YEAR(CREATION_DATE)                                    AS Year,
              MONTH(CREATION_DATE)                                   AS Month_Num,
              CONVERT(MONTHNAME(CREATION_DATE) USING latin1)         AS Month,
              COUNT(CASE WHEN REGION_CODE ='01' THEN FEE_NUMBER END) AS REGION_1,
              COUNT(CASE WHEN REGION_CODE ='02' THEN FEE_NUMBER END) AS REGION_2,
              COUNT(CASE WHEN REGION_CODE ='03' THEN FEE_NUMBER END) AS REGION_3,
              COUNT(CASE WHEN REGION_CODE ='04' THEN FEE_NUMBER END) AS REGION_4,
              COUNT(CASE WHEN REGION_CODE ='05' THEN FEE_NUMBER END) AS REGION_5,
              COUNT(*)                                               AS TOTAL 
       FROM   TA_CASES
       WHERE  YEAR(CREATION_DATE)>2003
       GROUP BY Year, Month_Num WITH ROLLUP) AS CA;

The revised SQL code produces the following result, which includes the YEAR headers in the first column, and the TOTAL summary row for each year:

Year Month REGION 1 REGION 2 REGION 3 REGION 4 REGION 5 TOTAL
2004 January 8 41 156 1 42 248
2004 February 1 38 140 0 29 212
2004 March 0 44 115 0 50 209
2004 April 4 45 119 0 42 210
2004 May 1 57 151 0 84 294
2004 June 2 63 142 0 48 259
2004 July 6 47 110 0 33 199
2004 August 10 38 150 1 53 256
2004 September 3 45 146 7 34 235
2004 October 2 55 112 0 34 204
2004 November 4 36 110 1 43 195
2004 December 1 30 165 0 31 227
2004 2004 TOTAL 42 539 1616 10 523 2748
2005 January 3 34 107 2 34 182
2005 February 0 24 103 0 15 144
2005 March 1 30 101 0 24 159
2005 April 3 27 149 0 27 212
2005 May 1 36 110 0 27 177
2005 June 3 52 113 0 28 196
2005 July 1 38 131 3 22 195
2005 August 4 51 146 1 36 238
2005 September 9 52 149 0 49 259
2005 October 10 29 100 0 47 186
2005 November 11 11 162 0 18 204
2005 December 4 66 142 0 19 231
2005 2005 TOTAL 50 450 1513 6 346 2383
2006 January 2 68 132 0 30 235
2006 February 1 43 94 0 24 162
2006 March 3 30 134 0 29 196
2006 April 1 47 129 0 21 199
2006 May 11 52 124 0 31 220
2006 June 9 49 126 0 30 214
2006 July 4 43 125 0 42 217
2006 August 3 50 132 0 51 241
2006 September 6 56 149 0 45 262
2006 October 5 31 121 2 25 184
2006 November 3 42 146 3 63 258
2006 December 0 47 115 0 23 194
2006 2006 TOTAL 48 558 1527 5 414 2582
2007 January 2 37 152 1 35 229
2007 February 1 35 86 2 31 156
2007 March 5 78 132 0 51 268
2007 April 2 41 125 1 41 211
2007 May 0 50 122 0 33 206
2007 June 1 63 107 0 38 209
2007 July 5 41 65 1 31 150
2007 August 0 63 110 2 43 219
2007 September 2 35 134 0 55 227
2007 October 2 39 120 2 52 225
2007 November 10 22 141 0 36 215
2007 December 51 26 60 0 17 156
2007 2007 TOTAL 81 530 1354 9 463 2471
2008 January 1 52 154 1 49 277
2008 February 0 12 84 1 34 136
2008 March 0 30 85 1 27 148
2008 April 3 22 73 0 40 168
2008 May 0 48 90 2 29 170
2008 June 10 63 122 0 24 225
2008 July 19 38 148 3 28 238
2008 August 54 50 105 0 20 230
2008 September 42 34 143 2 44 268
2008 October 73 41 112 1 26 253
2008 November 11 15 101 0 20 152
2008 December 285 62 160 2 29 541
2008 2008 TOTAL 498 467 1377 13 370 2806
2009 January 686 27 123 3 42 882
2009 February 2 9 87 0 22 120
2009 March 4 19 106 0 25 154
2009 April 0 10 95 0 12 117
2009 May 4 31 93 0 21 151
2009 June 63 71 94 0 21 251
2009 July 1 42 92 6 21 165
2009 August 3 53 116 3 19 195
2009 September 1 12 25 0 11 49
2009 December 0 0 1 0 10 17
2009 2009 TOTAL 764 274 832 12 204 2101
GRAND TOTAL   1483 2818 8219 55 2320 15091

Adding row fields is not all that difficult because queries naturally group data by rows. Therefore, all that is required is to add the new row to the SELECT and GROUP BY field lists and include a column for the row headers, as we did above. In the next installment, we will take on the more imposing challenge of adding a new column. It’s not as straighforward as might initially appear.

SOURCE: http://www.databasejournal.com/features/mysql/article.php/3865811/MySQL-Multi-Aggregated-Rows-in-Crosstab-Queries.htm

crosstab

PICTURE SOURCE: http://sqlfiddle.com/#!9/ca16b/76

 

Advertisements
This entry was posted in MYSQL. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s