Introduction
The Average Population of Each Continent problem on HackerRank focuses on calculating the average population of cities within each continent. This problem requires using SQL INNER JOIN
to combine data from two tables, the CITY
and COUNTRY
tables, and then applying the AVG
aggregate function to compute the average population. To round off the result, we also use the FLOOR
function.
Problem Statement
You are given two tables:
CITY
: Contains information about various cities, including their population and associated country code.COUNTRY
: Contains information about countries, including their country code and continent.
The task is to calculate the average population of cities in each continent and round down the result to the nearest integer. To achieve this, you need to join the two tables on their respective country codes and group the result by continent.
Solution
The solution involves performing an INNER JOIN
between the CITY
and COUNTRY
tables. Then, we use the AVG
function to calculate the average population for each continent, followed by the FLOOR
function to round the average population to the nearest integer. Finally, the query groups the results by the continent to provide the desired output for each continent.
SELECT COUNTRY.CONTINENT, FLOOR(AVG(CITY.POPULATION))
FROM CITY
INNER JOIN COUNTRY ON COUNTRY.CODE = CITY.COUNTRYCODE
GROUP BY COUNTRY.CONTINENT;
Explanation
Let's break down the query step-by-step:
- SELECT COUNTRY.CONTINENT, FLOOR(AVG(CITY.POPULATION)): We select the
CONTINENT
from theCOUNTRY
table and the average population of the cities in that continent from theCITY
table. TheFLOOR
function is applied to round down the average population to the nearest integer. - FROM CITY: We are starting with the
CITY
table, which contains the population data for various cities. - INNER JOIN COUNTRY ON COUNTRY.CODE = CITY.COUNTRYCODE: We use an
INNER JOIN
to combine the rows from theCITY
table with those from theCOUNTRY
table where theCOUNTRYCODE
in theCITY
table matches theCODE
in theCOUNTRY
table. This join gives us access to the continent information for each city. - GROUP BY COUNTRY.CONTINENT: The
GROUP BY
clause groups the rows by continent, so that we can calculate the average population for each continent separately.
Example Test Case
Consider the following data in the CITY
and COUNTRY
tables:
-- CITY Table +-------------------+------------------+------------+ | CITY | COUNTRYCODE | POPULATION | +-------------------+------------------+------------+ | Casablanca | MAR | 2000000 | | Cairo | EGY | 5000000 | | Lagos | NGA | 8000000 | | Nairobi | KEN | 3500000 | | Sydney | AUS | 5000000 | | Melbourne | AUS | 4000000 | | Johannesburg | ZAF | 3000000 | +-------------------+------------------+------------+ -- COUNTRY Table +------------+-------------------+-----------+ | CODE | NAME | CONTINENT | +------------+-------------------+-----------+ | MAR | Morocco | Africa | | EGY | Egypt | Africa | | NGA | Nigeria | Africa | | KEN | Kenya | Africa | | AUS | Australia | Oceania | | ZAF | South Africa | Africa | +------------+-------------------+-----------+
In this case, the cities in Africa are Casablanca, Cairo, Lagos, Nairobi, and Johannesburg, while Sydney and Melbourne are in Oceania. The result of the query would be:
Africa 4300000 Oceania 4500000
Real-World Use Case
This query is useful in situations where you need to analyze the population distribution across continents. For example, in demographic studies, organizations may want to know the average population size of cities in each continent to understand trends in urbanization. This query can also be useful for organizations working with international policies or those interested in comparing regional population growth rates.
Complexity Analysis
The time complexity of this query is O(n + m), where n is the number of rows in the CITY
table and m is the number of rows in the COUNTRY
table. The INNER JOIN
operation requires scanning both tables to match rows based on the join condition. The GROUP BY
operation will group the rows by continent and apply the AVG
function to each group, which requires additional scanning of the grouped data.
The space complexity is O(1) as we only retrieve the results for each continent, which doesn't require additional space allocation beyond the result set.
Conclusion
The Average Population of Each Continent problem is an excellent way to practice using SQL aggregate functions like AVG
and FLOOR
. By joining tables and grouping data by continent, you can easily calculate and analyze the population distribution across continents. Mastering these concepts is essential for working with large datasets and performing data analysis tasks.