We have a large table that is very old and that nobody really looks after. Recently a report that queries this table started showing performance problems. Eventually we found that the table is missing its primary key and contains duplicate data, which makes "alter table add primary key" fail.
Besides, the table is large enough that something like "insert into new_table_with_pk select distinct * from old_table" takes an unacceptable amount of time to run.
Do you have any recommendation for fixing this? The application runs on Oracle, Sybase, and SQL Server, so will a cross-database approach work?|||How big is your table? Have you tried creating a nonclustered index on the field(s) of the table that your report uses in its filter conditions? You should make this change when the database is under light load.
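For example, a minimal sketch of such an index (old_table, report_date, and region are placeholders for your real table and whatever columns the report filters on; this is SQL Server/Sybase syntax, on Oracle it would simply be "create index"):
-- hypothetical table and column names; replace with your report's filter columns
create nonclustered index ix_old_table_report_filter
on old_table (report_date, region)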
It is about 2 million rows. We haven't created a nonclustered index because we would like to remove the duplicates and create the primary key first. So you mean first create the index and then do [insert into new_table select distinct * from old_table]?|||First of all, 2 million records is not that big. The distinct keyword is the killer here; it is what hurts the performance of your query.
Some options are:
Change the database option 'select into/bulkcopy' to 'true' before you do the select * into:
exec sp_dboption 'urdatabase','select into/bulkcopy',true
This will improve the performance of the 'select into' query.
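For example, with that option enabled, the copy you described can be written as a select into (which can be minimally logged) instead of an insert into a pre-created table. A rough sketch, with old_table and new_table_with_pk as placeholder names:
-- hypothetical table names; select into creates new_table_with_pk itself
select distinct *
into new_table_with_pk
from old_table
You would still add the primary key to the new table afterwards.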
You can also try the checksum function in SQL to remove the duplicates.
Before that, add a new identity, primary key column to the existing table:
alter table urtable add [id] int identity(1,1)
constraint id_pk primary key
-- example: find rows in the pubs..authors table that share the same
-- checksum of (city, state, zip, contract), and delete the duplicates
use pubs
go
delete authors from (
    -- for each checksum value that appears more than once, pick the highest au_id;
    -- that row is the duplicate to delete (rerun if a group has 3 or more copies)
    select max(au_id) as au_id, max(au_lname) as au_lname, checksumvalue
    from (
        select au_id, au_lname, checksum(city, state, zip, contract) as checksumvalue
        from authors
    ) as t
    group by checksumvalue
    having count(*) > 1
) as t1
where authors.au_id = t1.au_id and authors.au_lname = t1.au_lname
-- Note: you should compare at least one real column in the where clause in addition
-- to the checksum (the checksum function is not 100% accurate); that is why the
-- query above also matches on the au_lname field.
In the same way, on your own table you would compare against the new [id] column you created, as in the sketch below.
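A rough sketch of the same pattern applied to your table (urtable is your table; col1, col2, col3 are placeholders for the columns that define a duplicate row):
-- hypothetical names; keeps the lowest [id] in each duplicate group and
-- deletes the highest; rerun until no rows are affected if a group can
-- contain more than two copies
delete urtable from (
    select max([id]) as [id], max(col1) as col1, checksumvalue
    from (
        select [id], col1, checksum(col1, col2, col3) as checksumvalue
        from urtable
    ) as t
    group by checksumvalue
    having count(*) > 1
) as t1
where urtable.[id] = t1.[id] and urtable.col1 = t1.col1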
Finally, build an index on the table for better performance.
Come back to me if you have doubts about how to remove duplicates using the checksum function. Try to post the table's DDL as well.
NB: don't forget to change the database option back with
exec sp_dboption 'urdatabase','select into/bulkcopy',false
after you finish this job.