I have created/modified a data import and format process for reports (web pages displaying statistical outputs, *not* Crystal or SSRS) on our intranet, but I am having trouble streamlining the process and as a result the thing is failing on me due to a lack of disk space for the TempDB in SQL Server. I am asking for help on how to improve what I have done.
A quick bit of background....
We have a series of tables on our SQL Server (SQL2K) that are used by our website, which are refreshed nightly by copying down data from our iSeries AS/400. Separate from the website, but related to the data content, we have a series of reports that need to be run. The previous developer set up a few DTS packages that would copy down AS/400 data into a second set of tables on the SQL Server. I noticed that the tables for the web are basically 80% of the data being copied into these other tables. So, by adding a few columns to some of the original tables, creating a couple smaller tables to grab the rest, and modifying the DTS packages that copy down the "web data", I could avoid the need to go back to the AS/400 for the second run. The idea here is to have as small a time footprint on the 400 as possible.
What I have done....
I have tried to modify the way this process had been set up under the previous person for getting the data to run these reports. That process would copy some data down from the 400, have the SQL Server churn it over, then copy the new data back up to the 400 into a temp table there, have the 400 do some churning, copy the new data down from the 400 again, and so on. I wanted to grab the data from the 400 (as most of it was for the web anyway, as I mentioned) then let the SQL Server format it up, letting the 400 go on about its way.
These reports deal with our business data formatted from a specific point of view, as it were. To speed up the displaying of the information in the reports, there is a base table that holds the formatted data that each report can query from. To get the appropriate data into the base table I set up a three step process involving a Temp table (2 columns), Difference table (2 columns), the actual base table (many columns), and the other tables that contain the actual business data.
Step1) Insert into the Temp table a pair of identifiers that will be used to single out data in the "business tables":
INSERT INTO All_Clai_Temp(Clai_Numb, Proc_Numb)
SELECT DISTINCT
t.TCLAIM
, x.TPROC#
FROM SQLTRAN t
JOIN SQLUB92 u ON ( t.TCLAIM = u.UCLAIM
AND t.TMBR# = u.UMBR# )
JOIN SQLTRANXTR x ON ( t.TCO# = x.TCO#
AND t.TICO# = x.TICO#
AND t.TMBR# = x.TMBR#
AND x.TPROC# > 0
AND x.TPAYDT >= '2001-10-01' )
JOIN SQLMEMB m ON t.TMBR# = m.MMBR#
WHERE t.TSTATS = '3'
Step2) If there were any records inserted into the Temp table from the step above, then copy only those rows from the Temp table into the Difference table where the second identifier is not in the main table:
IF (SELECT Count(Clai_Numb) FROM All_Clai_Temp) > 0
BEGIN
INSERT INTO All_Clai_Diff(Clai_Numb, Proc_Numb)
SELECT a.Clai_Numb, a.Proc_Numb
FROM All_Clai_Temp a
WHERE a.Proc_Numb NOT IN (SELECT b.Proc_Numb
FROM All_Clai b)
END
Step3) Lastly, and only if there were any records copied in the previous step, copy a select set of business data from three tables matching the identifiers within the Difference table:
IF (SELECT Count(Clai_Numb) FROM All_Clai_Diff) > 0
BEGIN
INSERT INTO All_Clai(Clai_Numb, Busi_Line_Numb, Proc_Numb, Prov_Refe_Numb, Memb_ID, Serv_Prov_ID, Bill_Prov_ID, Refe_Prov_ID, Bill_Type, Place_of_Serv, Tier_Leve, From_DOS, To_DOS, Admi_Date, Admi_Hour, Admi_Type, Admi_Sour, Disc_Hour, Pati_Stat, Reve_Code, Diag_Code_1, Diag_Code_2, Diag_Code_3, Diag_Code_4, Diag_Code_5, Diag_Code_6, Diag_Code_7, Diag_Code_8, Diag_Code_9, E_Code, Admi_Diag_Code, Prin_Proc_Code, Prin_Proc_Code_Date, Proc_Code, Proc_Modi_1, Proc_Modi_2, Othe_Proc_Code_1, Othe_Proc_Code_1_Date, Othe_Proc_Code_2, Othe_Proc_Code_2_Date, Othe_Proc_Code_3, Othe_Proc_Code_3_Date, Othe_Proc_Code_4, Othe_Proc_Code_4_Date, Othe_Proc_Code_5, Othe_Proc_Code_5_Date, Clai_Rece_Date, Paid_Date, Quan, Unit, Bill_Amou, Auth_Amou, Othe_Paid_Amou, Tota_Paid, Clai_Stat, Clai_Acti_Code, Auth_Numb, Form_Type, Invo_Numb, Chec_Numb, Crea_By, Crea_Date, Upda_By, Upda_Date)
SELECT t.TCLAIM
, 1 AS Busi_Line_Numb
, x.TPROC#
, x.TREF#
, t.TMBR#
, t.TADOC#
, t.TPAYE#
, x.TRFBY#
, u.UBTYPE
, x.TSRVPL
, x.TLEVEL
, t.TSRVDT
, t.TSRVTDT
, u.UADMDT
, u.UADMHR
, u.UADMTY
, u.UADMSR
, u.UDISHR
, u.UPSTAT
, t.TREVCD
, x.TDIA1
, x.TDIA2
, x.TDIA3
, x.TDIA4
, x.TDIA5
, x.TDIA6
, x.TDIA7
, x.TDIA8
, x.TDIA9
, u.UECODE
, u.UADMDG
, u.UPPROC
, u.UPPROD
, t.TPROC
, t.TMOD2
, t.TMOD3
, u.UOPRC1
, u.UOPRD1
, u.UOPRC2
, u.UOPRD2
, u.UOPRC3
, u.UOPRD3
, u.UOPRC4
, u.UOPRD4
, u.UOPRC5
, u.UOPRD5
, t.TRCVDT
, t.TPADDT
, t.TQTY
, x.TRVSU
, t.TCHRGS
, t.TAUTH
, t.TCOB
, t.TFLD16
, t.TSTATS
, t.THOLD
, t.TAUTH#
, x.TFORM
, t.TINV#
, t.TCHK#
, x.TLOGUS
, x.TLOGDT
, t.TADDUS
, t.TADDDT
FROM SQLTRAN t
JOIN SQLTRANXTR x ON (t.TCO# = x.TCO# AND t.TICO# = x.TICO# AND t.TMBR# = x.TMBR#)
LEFT JOIN SQLUB92 u ON (t.TCLAIM = u.UCLAIM AND t.TMBR# = u.UMBR# )
JOIN All_Clai_Diff d ON (t.TCLAIM = d.Clai_Numb AND x.TPROC# = d.Proc_Numb)
END
Columns and Indexes:
SQLMemb table has 33 columns
UNIQUE CLUSTERED INDEX [CL_SQLMEMB_ID] ON [dbo].[SQLMEMB]([SQLMEMB_ID]) ON [PRIMARY]
INDEX [NCL_SQLMEMB_MCo#,MMbr#] ON [dbo].[SQLMEMB]([MCO#], [MMBR#]) ON [PRIMARY]
INDEX [NCL_SQLMEMB_MMbr#] ON [dbo].[SQLMEMB]([MMBR#]) ON [PRIMARY]
SQLTran table has 34 columns
UNIQUE CLUSTERED INDEX [CL_SQLTRAN_ID] ON [dbo].[SQLTRAN]([SQLTRAN_ID]) ON [PRIMARY]
INDEX [NCL_SQLTRAN_TCo#,TMbr#,TaDoc#,TPaye#,TsRvDt,THold] ON [dbo].[SQLTRAN]([TCO#], [TMBR#], [TADOC#], [TPAYE#], [TSRVDT], [THOLD]) ON [PRIMARY]
SQLTranXtr table has 24 columns
UNIQUE CLUSTERED INDEX [CL_SQLTRANXTR_ID] ON [dbo].[SQLTRANXTR]([SQLTRANXTR_ID]) ON [PRIMARY]
INDEX [NCL_SQLTRANXTR_TCo#,TICo#,TMbr#] ON [dbo].[SQLTRANXTR]([TCO#], [TICO#], [TMBR#]) ON [PRIMARY]
INDEX [NCL_SQLTRANXTR_TCo#,TICo#,TMbr#, TProc#] ON [dbo].[SQLTRANXTR]([TCO#], [TICO#], [TMBR#], [TPROC#]) ON [PRIMARY]
INDEX [NCL_SQLTRANXTR_TPayDt] ON [dbo].[SQLTRANXTR]([TPAYDT]) ON [PRIMARY]
SQLUB92 table has 27 columns
UNIQUE CLUSTERED INDEX [CL_SQLUB92_ID] ON [dbo].[SQLUB92]([SQLUB92_ID]) ON [PRIMARY]
INDEX [NCL_SQLUB92_UCo#, UICo#, UMbr#, UClaim] ON [dbo].[SQLUB92]([UCO#], [UICO#], [UMBR#], [UCLAIM]) ON [PRIMARY]
All_Clai table has 81 columns
CONSTRAINT [PK_All_Clai] PRIMARY KEY CLUSTERED ([Clai_Numb],[Busi_Line_Numb],[Proc_Numb])
INDEX [All_Clai1] ON [dbo].[All_Clai]([Proc_Numb], [Clai_Numb], [Memb_ID]) ON [PRIMARY]
INDEX [All_Clai3] ON [dbo].[All_Clai]([Memb_ID]) ON [PRIMARY]
I have tried to eliminate as many unnecessary steps and queries as possible, I have run the Index Tuning Wizard on all the tables based on the workload, but I am running into issues. The first two steps each take several minutes (around 10 minutes or so) to complete and seem to be pretty intensive to run. I watched the TempDB .mdf file spin up to over 3GB in size by the time those two stored procs had finished.
But the failure comes in step 3.....it fails due to lack of disk space. I only have 30GB on the disk. This is our Dev SQL Server that I am working on, and the production box has 10 times that amount, but I don't need to run queries that could possibly expand the TempDB to hundreds of Gigs in size. I watched the free space on the disk drop to 10Meg after many, many minutes of step 3 running, and then fails out. This thing runs and runs until I run out of space, could be 20 mins or more.
I call upon the SQL Server gurus out here for help on how to make this process better. What can I do to speed it up the queries, use less disk space in the TempDB, reduce the processing cost, etc? Your input is greatly appreciated.
-- Andrew
My new philosophy!
Life's journey is not to arrive at the grave safely in a well preserved body, but rather to skid in sideways, totally worn out, shouting "...holy ***...what a ride!"