2012年3月26日星期一

Flat File With Fixed Length Header and No Delimeter

Hi,

I'm trying to extract data from a Flat File which is as fixed length as they come. The file has a header, which simply contains the number of records in the file, followed by the records, with no header delimeter (No CR/LF, nothing).

For example a file would look like the following:

00000003Name1Address1Name2Address2Name3Address3

So this has 3 records (indicated by the first 8 characters), each consisting of a Name and Address.

I can't see a way to extract the data using a flat file connection, unless we add a delimeter for the header (not possible at this stage). Am I wrong?

Any suggestions on possible solution would be much appreciated - I'm thinking Ill have to write a script to parse the file manually.

Thanks in advance,

Scott

Do you need the data in the first row?

You can just ignore it by setting the "Header rows to skip" setting to 1.

K

|||

Yes. Essentially the file is just one row..Which would include the header details (number fo records) then all of the fixed length records follow (on the same line)

Scott

|||

Scott,

Given the unstructured nature of this file I think you will have to parse it out yourself in a script task. This isn't as daunting as it sounds. First clue I can give you is that it will have to be an asynchronous script task.

You can still import it into the pipeline using a Flat File Connection Manager though. It'll be a 1-column, 1-row file that's all.

-Jamie

|||

Using a script component of type source should be easiest. A little example follows.

Here is my sample file, representing an 8 byte header, followed by three rows of two columns, 10 and 20 bytes respectively.

00000003A234567890B234567890C234567890D234567890E234567890F234567890G234567890H234567890I234567890

The results table will look a bit like this-

Name Address
A234567890 B234567890C234567890
D234567890 E234567890F234567890
G234567890 H234567890I234567890

You will need to create the two columns in the script component , Name and Address as DT_WSTR 10 and 20 in length.

Now the code-

Public Class ScriptMain

Inherits UserComponent

Private stream As StreamReader

Public Overrides Sub CreateNewOutputRows()

Dim headerRecordCount As Integer

Dim recordCount As Integer = 0

'// Get filename from connection, using full acquire method

Dim filename As String = CType(Me.Connections.Connection.AcquireConnection(Nothing), String)

'// Open source file

stream = New StreamReader(filename)

'// Reader header block, 8 characters

Dim headerBuffer(7) As Char

If stream.ReadBlock(headerBuffer, 0, 8) = 8 Then

'// Store record count for later use in validation

headerRecordCount = CType(New String(headerBuffer), Integer)

Else

Throw New Exception("Invalid file format, header not valid.")

End If

With Output0Buffer

While stream.Peek > 0

'// Add data rows

.AddRow()

.Name = ReadColumn(10)

.Address = ReadColumn(20)

recordCount = recordCount + 1

End While

'// Close down buffer

.SetEndOfRowset()

'// Check record count

If recordCount = headerRecordCount Then

Me.Log(String.Format("Header row count ({0}) matched toital rows found.", headerRecordCount), 1, Nothing)

Else

Throw New Exception(String.Format("Invalid file format, header row count ({0}) not equal to rows found ({1}).", headerRecordCount, recordCount))

End If

End With

End Sub

Private Function ReadColumn(ByVal length As Integer) As String

Dim buffer(length - 1) As Char

If stream.Read(buffer, 0, length) = length Then

Return New String(buffer)

Else

Throw New Exception("Invalid file format, full column length not found.")

End If

End Function

End Class

|||Thanks for the answers guys..

Have gone with using a script component of type source, as per code above. With one change...Just ensured that the stream is closed after processing to ensure the resources are released...

I also added an extra Output for the script which holds the header details - my real data file has extra (useful) details in the header.

Thanks again..

Scott

没有评论:

发表评论