Hi,
I'm trying to extract data from a Flat File which is as fixed length as they come. The file has a header, which simply contains the number of records in the file, followed by the records, with no header delimeter (No CR/LF, nothing).
For example a file would look like the following:
00000003Name1Address1Name2Address2Name3Address3
So this has 3 records (indicated by the first 8 characters), each consisting of a Name and Address.
I can't see a way to extract the data using a flat file connection, unless we add a delimeter for the header (not possible at this stage). Am I wrong?
Any suggestions on possible solution would be much appreciated - I'm thinking Ill have to write a script to parse the file manually.
Thanks in advance,
Scott
Do you need the data in the first row?
You can just ignore it by setting the "Header rows to skip" setting to 1.
K
|||Yes. Essentially the file is just one row..Which would include the header details (number fo records) then all of the fixed length records follow (on the same line)
Scott
|||Scott,
Given the unstructured nature of this file I think you will have to parse it out yourself in a script task. This isn't as daunting as it sounds. First clue I can give you is that it will have to be an asynchronous script task.
You can still import it into the pipeline using a Flat File Connection Manager though. It'll be a 1-column, 1-row file that's all.
-Jamie
|||
Using a script component of type source should be easiest. A little example follows.
Here is my sample file, representing an 8 byte header, followed by three rows of two columns, 10 and 20 bytes respectively.
00000003A234567890B234567890C234567890D234567890E234567890F234567890G234567890H234567890I234567890
The results table will look a bit like this-
Name Address
A234567890 B234567890C234567890
D234567890 E234567890F234567890
G234567890 H234567890I234567890
You will need to create the two columns in the script component , Name and Address as DT_WSTR 10 and 20 in length.
Now the code-
Public Class ScriptMain
Inherits UserComponent
Private stream As StreamReader
Public Overrides Sub CreateNewOutputRows()
Dim headerRecordCount As Integer
Dim recordCount As Integer = 0
'// Get filename from connection, using full acquire method
Dim filename As String = CType(Me.Connections.Connection.AcquireConnection(Nothing), String)
'// Open source file
stream = New StreamReader(filename)
'// Reader header block, 8 characters
Dim headerBuffer(7) As Char
If stream.ReadBlock(headerBuffer, 0, 8) = 8 Then
'// Store record count for later use in validation
headerRecordCount = CType(New String(headerBuffer), Integer)
Else
Throw New Exception("Invalid file format, header not valid.")
End If
With Output0Buffer
While stream.Peek > 0
'// Add data rows
.AddRow()
.Name = ReadColumn(10)
.Address = ReadColumn(20)
recordCount = recordCount + 1
End While
'// Close down buffer
.SetEndOfRowset()
'// Check record count
If recordCount = headerRecordCount Then
Me.Log(String.Format("Header row count ({0}) matched toital rows found.", headerRecordCount), 1, Nothing)
Else
Throw New Exception(String.Format("Invalid file format, header row count ({0}) not equal to rows found ({1}).", headerRecordCount, recordCount))
End If
End With
End Sub
Private Function ReadColumn(ByVal length As Integer) As String
Dim buffer(length - 1) As Char
If stream.Read(buffer, 0, length) = length Then
Return New String(buffer)
Else
Throw New Exception("Invalid file format, full column length not found.")
End If
End Function
End Class
|||Thanks for the answers guys..Have gone with using a script component of type source, as per code above. With one change...Just ensured that the stream is closed after processing to ensure the resources are released...
I also added an extra Output for the script which holds the header details - my real data file has extra (useful) details in the header.
Thanks again..
Scott
没有评论:
发表评论