Deleting duplicate records from a table

Jun 03 2017 11:22 PM by Kishan Mashru

So, first of all, we will create a table and insert some records in it. Then we will write a query to delete the duplicate records from the table. In this article, we will show an example of deleting duplicate records from a table using three different techniques such as using GROUP BY and HAVING, an analytical function such as row_number, and the third one by using the correlated subquery. The reader can decide on which way he is comfortable with doing the job.

They key to delete duplicate records lies in the following two steps:

  1. Decide on your Primary Key(How will you consider a record duplicate, which column(s) data if repeated will be considered as duplicate)
  2. Find the duplicate records
  3. Delete them using rowid

So let's begin.

STEP-1 Create a table

-- Create  table

CREATE TABLE COUNTRY_MAS

COUNTRY_ID NUMBER,
COUNTRY_NAME VARCHAR2(50)
);

 

STEP-2 Insert some records in the table

--  insert some records with duplicates in the table
INSERT ALL
INTO COUNTRY_MAS VALUES(100,'INDIA')
INTO COUNTRY_MAS VALUES(101,'UK')
INTO COUNTRY_MAS VALUES(102,'USA')
INTO COUNTRY_MAS VALUES(103,'KSA')
INTO COUNTRY_MAS VALUES(100,'INDIA')
INTO COUNTRY_MAS VALUES(104,'UAE')
INTO COUNTRY_MAS VALUES(101,'UK')
INTO COUNTRY_MAS VALUES(105,'CANADA')
INTO COUNTRY_MAS VALUES(106,'GERMANY')
SELECT * FROM DUAL;

COMMIT;

Here I will consider a record as duplicate if the country_name column data is being repeated. Now, as you can see you have two records with COUNTRY_NAME as 'INDIA' and 'UK'.

 

1. DELETE DUPLICATE RECORD USING GROUP BY AND HAVING CLAUSE

 

So we would write a query to find the duplicate with GROUP BY and HAVING clause.

 

SELECT COUNTRY_NAME, COUNT(1) FROM COUNTRY_MAS GROUP BY COUNTRY_NAME;

--- OUTPUT

COUNTRY_NAME COUNT(1) 

---------------------------------------------------

GERMANY              1
USA                  1
CANADA               1
INDIA                2
UAE                  1
UK                   2
KSA                  1

 

Now, we know that for COUNTRY_NAME such as 'UK' and 'INDIA' the count(1) is 2, meaning they appear twice in the table. So now we can get all the records with count(1) > 1 and put them into an IN clause and delete them as below

 

DELETE FROM COUNTRY_NAME WHERE COUNTRY_NAME IN ('UK','INDIA');

 

If you want to take the game to the next level, instead of manually writing the COUNTRY_NAME in the IN clause, we can target the ROWID of all the duplicate records and delete them, What we will do is we will take the MIN(ROWID) for all the unique COUNTRY_NAME records.

SELECT MIN(ROWID)
FROM COUNTRY_MAS
GROUP BY COUNTRY_NAME;

-- output

ROWID
------------

AAAU3NAAEAAAAndAAI
AAAU3NAAEAAAAndAAC
AAAU3NAAEAAAAndAAH
AAAU3NAAEAAAAndAAA
AAAU3NAAEAAAAndAAF
AAAU3NAAEAAAAndAAB
AAAU3NAAEAAAAndAAD

As you can see we got 7 rowid, one for each record (there are 9 records in the table). Now we can delete all the extra records from the table as follows:

 

DELETE FROM COUNTRY_MAS WHERE ROWID NOT IN (
SELECT MIN(ROWID)
FROM COUNTRY_MAS
GROUP BY COUNTRY_NAME);

 

-- 2 records deleted

 

2. DELETE DUPLICATE RECORD USING CORRELATED SUBQUERY

The user can find the duplicate even by using the correlated subquery such as below:

SELECT COUNTRY_NAME FROM COUNTRY_MAS C2 WHERE ROWID IN ( 
SELECT MIN(ROWID)
FROM COUNTRY_MAS C1
WHERE C1.COUNTRY_NAME = C2.COUNTRY_NAME);

Now delete the duplicate:

DELETE FROM COUNTRY_MAS WHERE ROWID NOT IN (
SELECT ROWID FROM COUNTRY_MAS CM WHERE ROWID IN (
SELECT MIN(ROWID)
FROM COUNTRY_MAS C
WHERE CM.COUNTRY_ID = C.COUNTRY_ID
AND CM.COUNTRY_NAME = C.COUNTRY_NAME));

 

3. DELETE DUPLICATE USING ANALYTICAL FUNCTION

The user can find the duplicate even by using the  analytical function such as below:

SELECT ROWID FROM (
SELECT ROWID, ROW_NUMBER() OVER(PARTITION BY COUNTRY_ID, COUNTRY_NAME ORDER BY COUNTRY_ID) RANK
FROM COUNTRY_MAS)
WHERE RANK=2;

Now delete the duplicate:

DELETE FROM COUNTRY_MAS WHERE ROWID IN (
SELECT ROWID FROM (
SELECT ROWID, ROW_NUMBER() OVER(PARTITION BY COUNTRY_ID, COUNTRY_NAME ORDER BY COUNTRY_ID) RANK
FROM COUNTRY_MAS)
WHERE RANK=2);