Introduction

The Average Population of Each Continent problem on HackerRank focuses on calculating the average population of cities within each continent. This problem requires using SQL INNER JOIN to combine data from two tables, the CITY and COUNTRY tables, and then applying the AVG aggregate function to compute the average population. To round off the result, we also use the FLOOR function.

Problem Statement

You are given two tables:

  • CITY: Contains information about various cities, including their population and associated country code.
  • COUNTRY: Contains information about countries, including their country code and continent.

The task is to calculate the average population of cities in each continent and round down the result to the nearest integer. To achieve this, you need to join the two tables on their respective country codes and group the result by continent.

Solution

The solution involves performing an INNER JOIN between the CITY and COUNTRY tables. Then, we use the AVG function to calculate the average population for each continent, followed by the FLOOR function to round the average population to the nearest integer. Finally, the query groups the results by the continent to provide the desired output for each continent.


SELECT COUNTRY.CONTINENT, FLOOR(AVG(CITY.POPULATION))
FROM CITY
INNER JOIN COUNTRY ON COUNTRY.CODE = CITY.COUNTRYCODE
GROUP BY COUNTRY.CONTINENT;

Explanation

Let's break down the query step-by-step:

  • SELECT COUNTRY.CONTINENT, FLOOR(AVG(CITY.POPULATION)): We select the CONTINENT from the COUNTRY table and the average population of the cities in that continent from the CITY table. The FLOOR function is applied to round down the average population to the nearest integer.
  • FROM CITY: We are starting with the CITY table, which contains the population data for various cities.
  • INNER JOIN COUNTRY ON COUNTRY.CODE = CITY.COUNTRYCODE: We use an INNER JOIN to combine the rows from the CITY table with those from the COUNTRY table where the COUNTRYCODE in the CITY table matches the CODE in the COUNTRY table. This join gives us access to the continent information for each city.
  • GROUP BY COUNTRY.CONTINENT: The GROUP BY clause groups the rows by continent, so that we can calculate the average population for each continent separately.

Example Test Case

Consider the following data in the CITY and COUNTRY tables:

-- CITY Table
+-------------------+------------------+------------+
| CITY              | COUNTRYCODE      | POPULATION |
+-------------------+------------------+------------+
| Casablanca        | MAR              | 2000000    |
| Cairo             | EGY              | 5000000    |
| Lagos             | NGA              | 8000000    |
| Nairobi           | KEN              | 3500000    |
| Sydney            | AUS              | 5000000    |
| Melbourne         | AUS              | 4000000    |
| Johannesburg      | ZAF              | 3000000    |
+-------------------+------------------+------------+

-- COUNTRY Table
+------------+-------------------+-----------+
| CODE       | NAME              | CONTINENT |
+------------+-------------------+-----------+
| MAR        | Morocco           | Africa    |
| EGY        | Egypt             | Africa    |
| NGA        | Nigeria           | Africa    |
| KEN        | Kenya             | Africa    |
| AUS        | Australia         | Oceania   |
| ZAF        | South Africa      | Africa    |
+------------+-------------------+-----------+

In this case, the cities in Africa are Casablanca, Cairo, Lagos, Nairobi, and Johannesburg, while Sydney and Melbourne are in Oceania. The result of the query would be:

Africa    4300000
Oceania   4500000

Real-World Use Case

This query is useful in situations where you need to analyze the population distribution across continents. For example, in demographic studies, organizations may want to know the average population size of cities in each continent to understand trends in urbanization. This query can also be useful for organizations working with international policies or those interested in comparing regional population growth rates.

Complexity Analysis

The time complexity of this query is O(n + m), where n is the number of rows in the CITY table and m is the number of rows in the COUNTRY table. The INNER JOIN operation requires scanning both tables to match rows based on the join condition. The GROUP BY operation will group the rows by continent and apply the AVG function to each group, which requires additional scanning of the grouped data.

The space complexity is O(1) as we only retrieve the results for each continent, which doesn't require additional space allocation beyond the result set.

Conclusion

The Average Population of Each Continent problem is an excellent way to practice using SQL aggregate functions like AVG and FLOOR. By joining tables and grouping data by continent, you can easily calculate and analyze the population distribution across continents. Mastering these concepts is essential for working with large datasets and performing data analysis tasks.

Related Problems for Practice