2012年3月26日星期一

Flatfiles with garbage data

I was testing my packages today and my packages were running sucessfully when I didnt have any valid data in the flat file. One the reason was as not rows were returned from the flat file none of the validation script components returned any err.

How do I count the # of rows which were read from flat file from the package and continue only if there is more than one row.

I tried using conditional split but as I wont have the row count value availble till the dataflow task runs this didnt help.

Is it best for me to have two dataflow tasks one resturns the count of records from flat file and the other starts if there are any rows. Now my problem is if I have rows to process how to I transfer the flatfile data to validate from DataFlowTask1 to DataFlowTask2?

I have a script task whcih counts the rows and decides to the TaskResult but once the TaskResult is sucess how do I use the values read in DataFlowTask1?

Appreciate ur help in advance

ray_newbie_SSIS wrote:

I was testing my packages today and my packages were running sucessfully when I didnt have any valid data in the flat file. One the reason was as not rows were returned from the flat file none of the validation script components returned any err.

How do I count the # of rows which were read from flat file from the package and continue only if there is more than one row.

I tried using conditional split but as I wont have the row count value availble till the dataflow task runs this didnt help.

Is it best for me to have two dataflow tasks one resturns the count of records from flat file and the other starts if there are any rows. Now my problem is if I have rows to process how to I transfer the flatfile data to validate from DataFlowTask1 to DataFlowTask2?

You are on the right track with using two data-flows. the answer to your predicament is to use raw files to transfer data between the data flows.

ray_newbie_SSIS wrote:

I have a script task whcih counts the rows and decides to the TaskResult but once the TaskResult is sucess how do I use the values read in DataFlowTask1?

Appreciate ur help in advance

Have you considered using the RowCount component to count the number of rows?

-Jamie

|||Have your first data flow have a flat file input and a rowcount operator, and then have the second data flow have all the processing. If the reading of the flat file is very expensive, i.e. over a slow network, you could store the data read in data flow 1 to a raw file and then use the raw file as the source in data flow 2. You can then put a conditional clause between the two data flows based on the variable used in the rowcount in the first data flow.|||

This is what I have as of now

DataFlowtask1 has a FlatFileSource and RowCount Component

Script Task which will check the value of Row count from DataFlowTask1 and decide is the PackageResult Sucess or Failure

If Sucess I have to start validating the Flat file, and blah blah

So you guys suggest read the data into Rawdata files and use it in DataFlowTask2. Never knew I could do that,I will research on this...

IS this possible?

In one of my packages where I am doing similar flat file processing but with one dataflow

I have Flat file Source and script component and olddb destination

In scriptcomponent can i not declare a variable in preexecute and increment in Input0Buffer and check this val in post execute and if this value is 0 then raise an err....dont u think this is much simpler for me than using two dataflows...Need ur suggestion
Regards

|||

ray_newbie_SSIS wrote:

Script Task which will check the value of Row count from DataFlowTask1 and decide is the PackageResult Sucess or Failure

Don't do that. Instead, put a conditional precedence constraint between DF1 and DF2 that checks the rowcount value. If its greater than 0, DF2 executes.

Precedence Constraints: http://www.sqlis.com/default.aspx?306

-Jamie

|||

The sugestion basically is to read the file twice. Once to find out how many rows are in the file; the second to actually move the data.

I would recommend you to use an expression in the precedence constraint that goes to the 2nd dataflow (using the variable) that holds the number of rows in the file to decide whether to execute the 2nd dataflow or not; as opposed of using the script task to fail the package.

http://msdn2.microsoft.com/en-US/library/ms140153.aspx

Rafael Salas

|||

Rafael Salas wrote:

The sugestion basically is to read the file twice. Once to find out how many rows are in the file; the second to actually move the data.

Rafael,

That's not quite true. The suggestion is to read the file once and pass the data between data-flows using raw files.

-Jamie

|||

Jamie,

I don't understand how to 'pass the data between data-flows using raw files'...at some point the data gets read a second time right? Could you please clarify?

Rafael Salas

|||

Rafael Salas wrote:

Jamie,

I don't understand how to 'pass the data between data-flows using raw files'...at some point the data gets read a second time right? Could you please clarify?

Rafael Salas

Correct. When you said "The sugestion basically is to read the file " I read that to mean you read the same file twice. I wanted to clarify for the original poster that you read the sourcec file once and the raw file once.

Sorry for the confusion.

-Jamie

|||

It is clear now. Thanks for the clarification and sorry for the confusion

Rafael Salas

没有评论:

发表评论