Introduction

The "Weather Observation Station 4" problem on HackerRank focuses on using aggregate functions like COUNT() along with DISTINCT to calculate how many city names appear more than once in the STATION table.

Problem Statement

You are given a table named STATION with several city entries. Some cities may appear multiple times. Your task is to find the total number of non-distinct city entries — in other words, how many duplicates exist across the CITY column.

Solution

The idea is to subtract the number of distinct city names from the total number of city records to determine how many duplicate city names exist.

SQL Query:


SELECT (COUNT(CITY) - COUNT(DISTINCT CITY))
FROM STATION;

Explanation:

  • COUNT(CITY) counts all non-null entries in the CITY column.
  • COUNT(DISTINCT CITY) counts only the unique city names.
  • The difference between the two gives the number of city names that appear more than once.

Example Test Cases

Test Case 1

Input Table:

ID | CITY
-----------
1  | Tokyo
2  | Osaka
3  | Tokyo
4  | Kyoto
5  | Osaka

Expected Output:


2

Explanation: The cities Tokyo and Osaka both appear twice. So, total entries = 5, distinct cities = 3, hence duplicates = 5 - 3 = 2.

Complexity Analysis

The time complexity is O(n) where n is the number of rows in the table. SQL engines efficiently optimize the DISTINCT and COUNT operations, especially if indexes are applied on the CITY column.

Conclusion

This problem demonstrates how simple aggregate functions can be combined to extract meaningful insights from data. Knowing how to compare total counts versus unique values is a fundamental SQL skill for data analysis and cleaning.

Example Test Cases

Test Case 2

Input: All cities are unique.

ID | CITY
-----------
1  | Paris
2  | Berlin
3  | Madrid

Output:


0

Explanation: No city is repeated, so duplicates = 3 - 3 = 0.

Related Problems for Practice