Log in

View Full Version : Deleting duplicate email addresses from a database


slloyd
21-08-09, 12:01 PM
We have a very small DB with 15K rows made up of email addresses for mailshots, and have another spreadsheet with 3K rows
of unsubscribed requests, with email addresses included.

I want to be able to strip this information from the main DB, thus not sending a mailshot to those who have requested not to receive this information.

Does anyone have any code, formula, script or CSV parser to carry out such command.

As I have opened the files in excel, I've been looking at VLOOKUP but can't get my head around.

thanks in advance.

Baph
21-08-09, 12:28 PM
My advice, would be to add a column in the DB to allow each row to specify mailshot preferences, lets call it "allowMailShot" for now, with a boolean value.

Then you just run:
SELECT mailAddr FROM table WHERE allowMailShot=1;

Importing the data from CSV should be pretty simple, even if it's an MSAccess "database."

HTH.

ooger
21-08-09, 02:12 PM
Ah, the MS office solution:

If you've opened them in Excel and are having a can't be *rsed Friday afternoon (and can reimport back to the db) assuming you don't want to use SQL:

Make column A the values from the 15K rows
Make column B the formula =MATCH(A1,C:C,0)
Make column C the 3K unsubscribed peoples emails.

Copy and paste row B down to as many rows as column C has

On any row the number displayed in column B indicates on which row that email address is found in column A, if it is found - therefore, convert all formulas to their text results, replace all blanks in column B with zeroes , sort the range A,B,C by column B and the values that are >=1 are the matches where an unsubscribed email is in the 15K list - delete them and reimport.


Failing that, write the lot in Lotus Notes and be done with it!

Kinvig
21-08-09, 03:30 PM
SELECT DISTINCT EMAILADDRESS FROM TABLE

will do it.

slloyd
21-08-09, 03:44 PM
[QUOTE=ooger;2011808]Ah, the MS office solution:

If you've opened them in Excel and are having a can't be *rsed Friday afternoon (and can reimport back to the db) assuming you don't want to use SQL:
QUOTE]

Ooger thanks for this, you must know my Friday afternoon antics.

Defo, especially after 2hrs of Friday lunchtime watering hole ventures ;)

Anyway, managed to the numbered results in column B but a few are reporting back as #N/A is this correct? or does this mean no matches found.

Now need to do the following but not sure how to:

convert all formulas to their text results, replace all blanks in column B with zeroes , sort the range A,B,C by column B and the values that are >=1 are the matches where an unsubscribed email is in the 15K list - delete them and reimport

slloyd
21-08-09, 03:46 PM
Baph & Kinvig, thanks for your input and advice but right now my SQL knowledge and workings are a little rusty so the Excel option seems like the one for me, especially on a Friday afternoon - when I wanted to spend my time looking for a replacement end can :(

Kinvig
21-08-09, 03:48 PM
what format's the data in at the moment?

ooger
21-08-09, 03:50 PM
#n/a means no matches found for the MATCH formlua, its fine as long as you can identify them as being seperate. I'd replace them with zeroes though cos itll make the sorting better.

Now you've done that select all, and choose copy, the select all and choose "Paste Special" and past the VALUES of the calcs only into the sheet ontop of whats there already. That removes the formulas, then you can begin sorting and deleting without worrying about it trying to recalc the lot as you work.

HTH

ooger
21-08-09, 03:51 PM
what format's the data in at the moment?


Courier New, size 14. Why?


:p

slloyd
21-08-09, 03:54 PM
CSV sitting in Excel 2007

Kinvig
21-08-09, 03:56 PM
CSV sitting in Excel 2007

So - you just want a list of unique email addresses - is that right?

slloyd
21-08-09, 04:10 PM
Well sort of...just want to delete email addresses in Column A from existing Email Addresses from Column B

slloyd
21-08-09, 04:15 PM
Ah, the MS office solution:

...sort the range A,B,C by column B and the values that are >=1 are the matches where an unsubscribed email is in the 15K list - delete them and reimport.

Can you just break this down, and I'll be home free...thanks.

slloyd
21-08-09, 04:22 PM
Sort the range A,B,C by column B and the values that are >=1 are the matches ???? where an unsubscribed email is in the 15K list - delete them and reimport.