Medium Challenge 4: Using LISTAGG() And LEVEL() Function

365 Days of Daily Coding: Day 6 and Day 7

I did not post for Day 6 as I was trying to crack the solution of Day 5 that had been submitted by others and I can finally say that I understand the codes. I was able to break down the query and was able to understand the logic of the queries.

I am afraid this is going to be a rather long post. But I hope it will be worth the read. Let’s begin with the first solution from Day 5.

Solution 1:

 SELECT LISTAGG(L1,'&') WITHIN GROUP (ORDER BY L1) 
 FROM (
     SELECT L1 FROM (
         SELECT LEVEL L1 
         FROM DUAL CONNECT BY LEVEL<=1000) 
     WHERE L1 <> 1 MINUS 
     SELECT L1 FROM (SELECT LEVEL L1 FROM DUAL CONNECT BY LEVEL<=1000) A, 
     (SELECT LEVEL L2 FROM DUAL CONNECT BY LEVEL<=1000) B 
     WHERE L2<=L1 and MOD(L1,L2)=0 AND L1<>L2 AND L2<>1);

Breaking down the code:

Query 1: Let’t begin with the subqueries.

(SELECT LEVEL L1 FROM DUAL CONNECT BY LEVEL<=1000) A; 
(SELECT LEVEL L2 FROM DUAL CONNECT BY LEVEL<=1000) B ;

This method in oracle is used to select numbers from 1 to 1000 from a dummy table called ‘Dual’.

Another efficient way of selecting row numbers is by using rownum.

 SELECT LEVEL L1 FROM DUAL CONNECT BY ROWNUM <=4;

Query 2:

 SELECT L1, L2 FROM 
 (SELECT LEVEL L1 FROM DUAL CONNECT BY LEVEL<=1000) A, 
 (SELECT LEVEL L2 FROM DUAL CONNECT BY LEVEL<=1000) B 
 WHERE L2<=L1;

L1	L2
1	1
2	1
2	2
3	1
3	2
3	3
4	1
4	2
4	3
4	4
5	1
5	2
5	3
5	4
5	5

Sample of output from Query 2

Query 3:

 SELECT L1, L2 FROM 
 (SELECT LEVEL L1 FROM DUAL CONNECT BY LEVEL<=1000) A, 
 (SELECT LEVEL L2 FROM DUAL CONNECT BY LEVEL<=1000) B 
 WHERE L2<=L1 AND L1<>L2 ;

L1	L2
2	1
3	1
3	2
4	1
4	2
4	3
5	1
5	2
5	3
5	4

Sample of output of Query 3

Query 4:

 SELECT L1, L2 FROM 
 (SELECT LEVEL L1 FROM DUAL CONNECT BY LEVEL<=1000) A, 
 (SELECT LEVEL L2 FROM DUAL CONNECT BY LEVEL<=1000) B 
 WHERE L2<=L1 AND L1<>L2 AND L2<>1;

L1	L2
3	2
4	2
4	3
5	2
5	3
5	4

Sample of output of Query 4

Query 5:

    SELECT L1, L2 FROM 
        (SELECT LEVEL L1 FROM DUAL CONNECT BY LEVEL<=1000) A, 
        (SELECT LEVEL L2 FROM DUAL CONNECT BY LEVEL<=1000) B 
     WHERE L2<=L1 AND L1<>L2 AND L2<>1 AND MOD(L1,L2)=0 ;

L1	L2
4	2
6	2
6	3
8	2
8	4
10	2
10	5
12	2

Sample of output of Query 5

So basically, this query is to generate a set of non-prime numbers. When this set is excluded from the list of 1 to 1000, we get the prime numbers.

The ultimate solution looks like below using the ‘LISTAGG’ function:

2&3&5&7&11&13&17&19&23&29&31&37&41&43&47&53&59&61&67&71&73&79&83&89&97&101&103&107&109&113&127&131&137&139&149&151&157&163&167&173&179&181&191&193&197&199&211&223&227&229&233&239&241&251&257&263&269&271&277&281&283&293&307&311&313&317&331&337&347&349&353&359&367&373&379&383&389&397&401&409&419&421&431&433&439&443&449&457&461&463&467&479&487&491&499&503&509&521&523&541&547&557&563&569&571&577&587&593&599&601&607&613&617&619&631&641&643&647&653&659&661&673&677&683&691&701&709&719&727&733&739&743&751&757&761&769&773&787&797&809&811&821&823&827&829&839&853&857&859&863&877&881&883&887&907&911&919&929&937&941&947&953&967&971&977&983&991&997

Solution 2:

 WITH numSel As (
     SELECT LEVEL num FROM dual CONNECT BY LEVEL <= 1000
 ),
 Primes AS (
     SELECT a.num p
      FROM numSel a, numSel b
      WHERE b.num <= a.num
      GROUP BY a.num
      HAVING COUNT(
      CASE a.num/b.num 
           WHEN Trunc(a.num/b.num) THEN 
           'Y' 
      END) = 2)
 SELECT LISTAGG(p, '&') WITHIN GROUP (ORDER BY p)
   FROM primes
 ;

 I got curious because of the two tables that were being selected without the use of any keyword 'JOIN'.
 SELECT a.num p
      FROM numSel a, numSel b

I did some dig up and got to know that this is another way of doing a joins without using join keywords.

When the join keyword is not used, the result would be a cross join also known as a cartesian join between the two tables. If a ‘WHERE’ clause follows ‘FROM’ then it becomes an inner join. This way is however, not encouraged for readability purpose and quite evidently by my frustration of not being able to understand as a beginner 😂.

Another reason why it is not encouraged is because “A Cartesian product always generates many rows and is rarely useful. For example, the Cartesian product of two tables, each with 100 rows, has 10,000 rows. Always include a join condition unless you specifically need a Cartesian product. If a query joins three or more tables and you do not specify a join condition for a specific pair, then the optimizer may choose a join order that avoids producing an intermediate Cartesian product.” Source

Another element to this query is 
'HAVING  
     COUNT(CASE a.num/b.num 
               WHEN Trunc(a.num/b.num) THEN 
                 'Y' 
               END) = 2);'
which I will explain as I break down the codes and reach that level. As for now, please understand the definition of prime numbers.

What is a prime number?

A positive integer that has exactly two positive divisors.
1 is not considered as a prime number.

Breaking down the code:

Query 1:

 WITH numSel As (
     SELECT LEVEL num FROM dual CONNECT BY LEVEL <= 1000
 ),
 primes AS (
     SELECT a.num p, b.num q
      FROM numSel a, numSel b
      WHERE b.num <= a.num
      GROUP BY a.num, b.num)
 SELECT p, q
   FROM primes;

p	q
1	1
2	1
2	2
3	1
3	2
4	1
4	2
4	3
4	4
5	1
5	2
5	3
5	4
5	5

Sample of output of Query 1

Query 2:

WITH numSel As (
     SELECT LEVEL num FROM dual CONNECT BY LEVEL <= 1000
 ),
 primes AS (
     SELECT a.num p, b.num q, a.num/b.num r, Trunc(a.num/b.num) s
      FROM numSel a, numSel b
      WHERE b.num <= a.num
      GROUP BY a.num, b.num
 )
 SELECT p, q, r, s
   FROM primes order by p, q;

p	q	r	s
1	1	1	1
2	1	2	2
2	2	1	1
3	1	3	3
3	2	1.5	1
3	3	1	1
4	1	4	4
4	2	2	2
4	3	1.33	1
4	4	1	1
5	1	5	5

Sample of output of Query 2

Query 3:

 WITH numSel As (
     SELECT LEVEL num FROM dual CONNECT BY LEVEL <= 1000
 ),
 primes AS (
     SELECT a.num p, b.num q, a.num/b.num r, Trunc(a.num/b.num) s, 
     CASE a.num/b.num WHEN Trunc(a.num/b.num) THEN 'Y' END t
      FROM numSel a, numSel b
      WHERE b.num <= a.num
      GROUP BY a.num, b.num
 )
 SELECT p, q, r, s, t
   FROM primes order by p, q;

p	q	r	s	t
1	1	1	1	Y
2	1	2	2	Y
2	2	1	1	Y
3	1	3	3	Y
3	2	1.5	1	N
3	1	3	3	Y
4	1	4	4	Y
4	2	2	2	Y
4	3	1.33	1	N
4	4	1	1	Y
5	1	5	5	N

Sample of output of Query 3

Query 4: The below query introduces the ‘HAVING’ clause to filter out the results from Query 3. Basically, those numbers ‘p’ that have count of ‘t’ equal to 2, are displayed which fits the definition of prime numbers, I outlined earlier i.e. prime numbers have only 2 divisors.

WITH numSel As (
      SELECT LEVEL num FROM dual CONNECT BY LEVEL <= 1000
  ),
  primes AS (
      SELECT a.num p
       FROM numSel a, numSel b
       WHERE b.num <= a.num
       GROUP BY a.num
       HAVING COUNT(
         CASE a.num/b.num 
             WHEN Trunc(a.num/b.num) THEN 
                 'Y' 
         END) = 2
  )
  SELECT p
    FROM primes order by p;

Sample of output Query 4

At this point, I would also like to raise another point of confusion I had on the ‘CASE’ logic. My particular confusion was on ‘a.num/b.num’ and ‘Trunc(a.num/b.num)’. I did some research again and found that in the above ‘CASE’ logic, the ‘a.num/b.num’ is an input expression being evaluated against ‘Trunc(a.num/b.num)’ and if found ‘True’ the row was assigned ‘Y’.

This type of case expression is called a simple case expression. Another type of case expression is called a searched case expression and involves evaluating a list of Boolean expressions.

Simple CASE expression example

 CASE e 
     WHEN e1 THEN
        r1 
     WHEN e2 THEN
        r2 
     WHEN en THEN
        rn 
     [ ELSE r_else ] 
 END

SELECT
   product_name,
   list_price,
   CASE category_id
     WHEN 1
     THEN ROUND(list_price * 0.05,2) -- CPU
     WHEN 2
     THEN ROUND(List_price * 0.1,2)  -- Video Card
     ELSE ROUND(list_price * 0.08,2) -- other categories
   END discount
 FROM
   products
 ORDER BY
   product_name;

Searched CASE expression example

CASE      
WHEN e1 THEN r1     
     [ WHEN e2 THEN r2]
     ...     
     [ELSE
         r_else]
 END

SELECT
   product_name,
   list_price,
   CASE
     WHEN list_price > 0 AND list_price  < 600         
        THEN 'Mass'        
     WHEN list_price >= 600 AND list_price < 1000         
        THEN 'Economy'     
     WHEN list_price >= 1000 AND list_price < 2000
         THEN 'Luxury'
     ELSE 
         'Grand Luxury'
   END product_group
 FROM
   products
 WHERE
   category_id = 1
 ORDER BY
   product_name;

Solution 3:

 SELECT LISTAGG(Numerator, '&') WITHIN GROUP (ORDER BY Numerator)
   FROM (
       SELECT A.Numerator
       FROM (SELECT LEVEL AS Numerator 
              FROM DUAL CONNECT BY LEVEL <= 1000) A
            ,(SELECT LEVEL AS Denominator 
              FROM DUAL WHERE LEVEL > 1 CONNECT BY LEVEL <= 1000) B
       WHERE A.Numerator >= B.Denominator
       AND MOD(A.Numerator , B.Denominator) = 0
       GROUP BY A.Numerator
       HAVING COUNT(*) <= 1
   );

Query 1: Here for the column ‘B’, 1 is excluded so we would have every other combination for columns ‘A’ and ‘B’ except for number ‘1’ and numbers ‘A’ that are less than or equal to ‘B’.

SELECT A.Numerator, B.Denominator
 FROM (SELECT LEVEL AS Numerator 
       FROM DUAL CONNECT BY LEVEL <= 1000) A,
      (SELECT LEVEL AS Denominator                
      FROM DUAL WHERE LEVEL > 1 CONNECT BY LEVEL <= 1000) B  
WHERE A.Numerator >= B.Denominator;

A.Numerator	B.Denominator
2	2
3	2
3	3
4	2
4	3
4	4
5	2

Sample of output of Query 1

Query 2

SELECT A.Numerator, B.Denominator, A.Numerator/B.Denominator
FROM (SELECT LEVEL AS Numerator 
FROM DUAL CONNECT BY LEVEL <= 1000) A 
,(SELECT LEVEL AS Denominator 
FROM DUAL WHERE LEVEL > 1 CONNECT BY LEVEL <= 1000) B 
WHERE A.Numerator >= B.Denominator
AND MOD(A.Numerator , B.Denominator) = 0
GROUP BY A.Numerator, B.Denominator
ORDER BY A.Numerator, B.Denominator;

A.Numerator	B.Denominator
2	2
3	3
4	2
4	4
5	5
6	2
6	3
6	6

Sample of output of Query 2

Query 3: Adding the below ‘HAVING COUNT(A.Numerator) <= 1’ clause and selecting only ‘A.Numerator’ displays results that have only been repeated once. Please do remember here that unlike our second solution where there were exactly two divisors, in this query we have removed 1 from the B.Numerator by adding a ‘WHERE LEVEL > 1’. Hence, the prime numbers have only one divisors.

SELECT A.Numerator
FROM (SELECT LEVEL AS Numerator 
FROM DUAL CONNECT BY LEVEL <= 1000) A 
,(SELECT LEVEL AS Denominator 
FROM DUAL WHERE LEVEL > 1 CONNECT BY LEVEL <= 1000) B 
WHERE A.Numerator >= B.Denominator
AND MOD(A.Numerator , B.Denominator) = 0
GROUP BY A.Numerator
HAVING COUNT(A.Numerator) <= 1
ORDER BY A.Numerator;

A.Numerator

Sample of output of Query 3

The below is another way to write down the Solution 3. I could remove the ‘WHERE LEVEL > 1’; change the ‘COUNT’ logic to take A.Numerator that are repeated only twice and that would give me all the prime numbers including ‘1’

SELECT A.Numerator
FROM (SELECT LEVEL AS Numerator 
FROM DUAL CONNECT BY LEVEL <= 1000) A 
,(SELECT LEVEL AS Denominator 
FROM DUAL WHERE LEVEL > 1 CONNECT BY LEVEL <= 1000) B 
WHERE A.Numerator >= B.Denominator
AND MOD(A.Numerator , B.Denominator) = 0
GROUP BY A.Numerator
HAVING COUNT(A.Numerator) <= 1 2
ORDER BY A.Numerator;

A.Numerator

Sample of output of Query 3 without the ‘WHERE’ clause and ‘COUNT <=2’

There’s one more solution that I had listed in my Day 5 post but I am just going to stop here since the above three solution does pretty good job of helping to understand the basic logic and functions that can be used to solve the challenge.

I did some python recap besides catching up on the solution but I will not be posting on this for today since I am quite tired and I expect to be pretty jammed up with work tomorrow. May be another post series on Python will follow but my main focus will be to continue my HackerRank SQL challenge for the remaining days.

Medium Challenge 4: Using LISTAGG() And LEVEL() Function – Part 2

Leave a comment Cancel reply

Medium Challenge 4: Using LISTAGG() And LEVEL() Function – Part 2

Share this:

Leave a comment Cancel reply