Introduction
The "Weather Observation Station 4" problem on HackerRank focuses on using aggregate functions like COUNT()
along with DISTINCT
to calculate how many city names appear more than once in the STATION
table.
Problem Statement
You are given a table named STATION with several city entries. Some cities may appear multiple times. Your task is to find the total number of non-distinct city entries — in other words, how many duplicates exist across the CITY
column.
Solution
The idea is to subtract the number of distinct city names from the total number of city records to determine how many duplicate city names exist.
SQL Query:
SELECT (COUNT(CITY) - COUNT(DISTINCT CITY))
FROM STATION;
Explanation:
COUNT(CITY)
counts all non-null entries in theCITY
column.COUNT(DISTINCT CITY)
counts only the unique city names.- The difference between the two gives the number of city names that appear more than once.
Example Test Cases
Test Case 1
Input Table:
ID | CITY ----------- 1 | Tokyo 2 | Osaka 3 | Tokyo 4 | Kyoto 5 | Osaka
Expected Output:
2
Explanation: The cities Tokyo and Osaka both appear twice. So, total entries = 5, distinct cities = 3, hence duplicates = 5 - 3 = 2.
Complexity Analysis
The time complexity is O(n) where n is the number of rows in the table. SQL engines efficiently optimize the DISTINCT
and COUNT
operations, especially if indexes are applied on the CITY
column.
Conclusion
This problem demonstrates how simple aggregate functions can be combined to extract meaningful insights from data. Knowing how to compare total counts versus unique values is a fundamental SQL skill for data analysis and cleaning.
Example Test Cases
Test Case 2
Input: All cities are unique.
ID | CITY ----------- 1 | Paris 2 | Berlin 3 | Madrid
Output:
0
Explanation: No city is repeated, so duplicates = 3 - 3 = 0.
Related Problems for Practice
- Weather Observation Station 3 – Use DISTINCT and filtering.
- Select By ID – Filtering specific row entries.
- Weather Observation Station 1 – Selecting basic columns from a table.