How do you know they are duplicates? Because of the timestamp alone? Because of some other ID (a device ID, I'm guessing here) plus the timestamp? If so, create a UNIQUE KEY on those two or three fields. Any insert or update that duplicates those fields will then fail.
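Here is a minimal sketch of the idea. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The table and column names readings, device_id, ts and value are hypothetical. In MySQL the constraint would be declared the same way, as UNIQUE KEY (device_id, ts) in CREATE TABLE or added later with ALTER TABLE ... ADD UNIQUE KEY. The second, identical insert is rejected and only one row remains.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- server-generated PK
        device_id TEXT NOT NULL,                      -- hypothetical column
        ts        TEXT NOT NULL,                      -- hypothetical column
        value     REAL,
        UNIQUE (device_id, ts)                        -- composite UNIQUE KEY
    )
    """
)

insert_sql = "INSERT INTO readings (device_id, ts, value) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("dev-1", "2013-01-01 12:00:00", 1.5))

duplicate_rejected = False
try:
    # Same device_id and ts as the first row: violates the UNIQUE constraint.
    conn.execute(insert_sql, ("dev-1", "2013-01-01 12:00:00", 1.5))
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(duplicate_rejected, row_count)
```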
Link: Creating UNIQUE KEY using CREATE TABLE
Link: Creating UNIQUE KEY using ALTER TABLE
Every field in the record is EXACTLY the same as in the previous one (when it is a duplicate), except for the primary key, which is generated on the server.
I assume a UNIQUE key is a secondary index on the table. Yes, I could create one over several fields, but would REPLACE INTO work here as well (with that second index)?
I don't want the batch to fail (that is, insert nothing when a dupe is found, which is what happens now on non-indexed tables without the REPLACE INTO statement). I want it to REPLACE the existing record and carry on with the new, valid rows.
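On the REPLACE INTO question: as the MySQL manual describes it, REPLACE fires on a collision with the PRIMARY KEY or with any UNIQUE index, so a secondary UNIQUE key is enough. The catch is that REPLACE works as delete-then-insert, so the surviving row gets a new auto-increment id. The sketch below again uses sqlite3 as a stand-in for MySQL (SQLite accepts REPLACE INTO as an alias for INSERT OR REPLACE) with the same hypothetical readings table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        device_id TEXT NOT NULL,
        ts        TEXT NOT NULL,
        value     REAL,
        UNIQUE (device_id, ts)  -- secondary UNIQUE index that REPLACE checks
    )
    """
)

batch = [
    ("dev-1", "2013-01-01 12:00:00", 1.5),
    ("dev-1", "2013-01-01 12:00:00", 1.5),  # exact duplicate of the row above
    ("dev-2", "2013-01-01 12:00:00", 2.0),  # new, valid row
]

# On a UNIQUE conflict the old row is deleted and the new one inserted,
# so the batch does not abort; the replaced row receives a fresh id.
conn.executemany(
    "REPLACE INTO readings (device_id, ts, value) VALUES (?, ?, ?)", batch
)

rows = conn.execute(
    "SELECT id, device_id, ts FROM readings ORDER BY device_id"
).fetchall()
print(rows)
```

The whole batch goes through and no duplicate remains. However, the dev-1 row no longer has id 1, which matters if anything else references that primary key.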
I have another table that doesn't use a PK (an auto-increment primary key). REPLACE INTO works well there, since there is no auto-increment (AI) primary key to interfere with it. YET good practice says every table SHOULD have a PK to be optimized.
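On the table without a PK, it is worth checking one thing. Per the MySQL manual, REPLACE only replaces when the new row collides with an existing row on a PRIMARY KEY or UNIQUE index. With neither, it behaves exactly like a plain INSERT, so identical rows would still accumulate. Here is a quick sqlite3 check of the same behaviour, using a hypothetical table readings_nokey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY and no UNIQUE index on this table.
conn.execute("CREATE TABLE readings_nokey (device_id TEXT, ts TEXT, value REAL)")

row = ("dev-1", "2013-01-01 12:00:00", 1.5)
for _ in range(3):
    # Nothing to conflict with, so each REPLACE just inserts another copy.
    conn.execute("REPLACE INTO readings_nokey VALUES (?, ?, ?)", row)

count = conn.execute("SELECT COUNT(*) FROM readings_nokey").fetchone()[0]
print(count)  # prints 3
```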
Essentially: add all NEW data. Don't duplicate a record that already exists (one where a set of fields holds exactly the same data, which means a new index to maintain). Overwrite it (REPLACE) if it does exist (no harm, no foul).
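The duplicates are identical apart from the server-generated PK, so skipping them gives the same end result as overwriting them, without changing the id. In MySQL the corresponding statements are INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. The sqlite3 sketch below uses SQLite's INSERT OR IGNORE as the stand-in, with the same hypothetical readings table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        device_id TEXT NOT NULL,
        ts        TEXT NOT NULL,
        value     REAL,
        UNIQUE (device_id, ts)
    )
    """
)

batch = [
    ("dev-1", "2013-01-01 12:00:00", 1.5),
    ("dev-1", "2013-01-01 12:00:00", 1.5),  # exact duplicate: skipped
    ("dev-2", "2013-01-01 12:00:00", 2.0),  # new row: inserted
]

# Rows that hit the UNIQUE constraint are silently skipped and the batch
# carries on; the first dev-1 row keeps its original id.
conn.executemany(
    "INSERT OR IGNORE INTO readings (device_id, ts, value) VALUES (?, ?, ?)",
    batch,
)

rows = conn.execute("SELECT id, device_id FROM readings ORDER BY id").fetchall()
print(rows)
```

MySQL's INSERT IGNORE also downgrades some unrelated errors (such as bad data conversions) to warnings. If you only want to tolerate key collisions, ON DUPLICATE KEY UPDATE is the narrower choice.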
In my experience, (other, secondary) indexes are prone to corruption. When they get out of sync, all hell breaks loose. I try to keep data retrieval and writes (updates, inserts) simple.
Also, as I've seen in the past, the additional indexes can grow larger than the table they index...?
I know this is open-ended: what is best for what?
Thanks for your expert input. I'm still looking for alternatives (such as Erel's post above).