显示标签为“csv”的博文。显示所有博文
显示标签为“csv”的博文。显示所有博文

2012年3月25日星期日

Flat File Source Problem...

I have a weird thing happening.
I have an .csv file. When I try to load it into a table, I can do it easily in DTS 2000. But when I am trying to do it in SSIS 2005 with exactly the same settings (like Text qualifier, row delimiter etc.), I am getting an error: "The last row in the sampled data is incomplete. The column or the row delimiter may be missing or the text is qualified incorrectly." I looked at the file and it looks complete to me.

What could be the problem ?

P.S. DTS 2000 is on 32-bit Windows, and SSIS 2005 is on 64-bit Windows 2003. Could that we a problem ?

Victor_V wrote:

I have a weird thing happening.
I have an .csv file. When I try to load it into a table, I can do it easily in DTS 2000. But when I am trying to do it in SSIS 2005 with exactly the same settings (like Text qualifier, row delimiter etc.), I am getting an error: "The last row in the sampled data is incomplete. The column or the row delimiter may be missing or the text is qualified incorrectly." I looked at the file and it looks complete to me.

What could be the problem ?

P.S. DTS 2000 is on 32-bit Windows, and SSIS 2005 is on 64-bit Windows 2003. Could that we a problem ?

I suppose it could be a problem, but DTS has historically been more forgiving when processing files. Are you sure you don't have a carriage/line-feed as the last line in the file? Sometimes this happens, and SSIS will think that is an incomplete row (It is, technically).|||It is true that the parsing behavior in SSIS is very different from DTS 2000. DTS was very forgiving as Phil mentioned, and there could be an extra carriage/line-feed which is causing this. Also DTS used to process files even if not all columns were populated in a row. However, SSIS would bail out on such files.|||

How do I check for this extra carriage return character ? It's not really visible. What editor would I use ?

Also, I usually get those "pipe" delimited files from Excel files, saving them as .CSV files.

Thanks,

Victor.

|||

Victor_V wrote:

How do I check for this extra carriage return character ? It's not really visible. What editor would I use ?

Also, I usually get those "pipe" delimited files from Excel files, saving them as .CSV files.

Thanks,

Victor.

Go to the last line of the file. Does the cursor sit at the end of a row of data, or on it's own line? That's one way.|||It sits on it's own line (at the beginning of a new line)..|||

Victor_V wrote:

It sits on it's own line (at the beginning of a new line)..

Hit backspace on that line and then save the file. Try running it again. Or try a ragged-right format.|||

Actually, I just opened this file in UltraEdit in HEX view, and the last 2 characters in the file are "OD OA", which is the end of the line, just like at the end of every row...

That's very strange...

|||

Victor_V wrote:

Actually, I just opened this file in UltraEdit in HEX view, and the last 2 characters in the file are "OD OA", which is the end of the line, just like at the end of every row...

That's very strange...

That's not strange, that's the point. SSIS sees that as a row, but there aren't any columns in it.|||No, what strange is that those 2 characters ARE the last characters in the row (and each row), and there is nothing else there. That's how it looks like in UltraEdit HEX editor. But when I open the same file in Notepad and hit Ctrl+End to go to the end of the file, that's when the cursor goes to the new row, instead of stopping at the end of the last row.|||I want to be sure on a few things...

The format is: Delimited
The text qualifier is: " (if needed)
The Header row delimiter is: {CR}{LF}

Under columns:

The Row delimiter is: {CR}LF}
The Column delimiter is: Comma {,}|||

The format is: Delimited
The text qualifier is: <none>

The Header row delimiter is: {CR}{LF}

Under columns:

The Row delimiter is: {CR}LF}
The Column delimiter is: Comma {|}

|||Phil ? Anyone ? Any thoughts ?|||

Victor_V wrote:

Phil ? Anyone ? Any thoughts ?

Can you post a sample line from the csv file?|||

Here are the last 5 rows in my .csv file:

12|Community Network Services, Inc|MI|C|T1020-Low|Personal care services, per diem |Blank||Per Diem||49|13461|13461|359677.92|26.72|26.72||0|13,461|13,461|0|615013.79|99.60%||0.00%|433.3|0.07%|2031.64|0.33%|617,479|0.00|45.69|0.00|0.03|0.15|45.87|24.96|20.63|33.51|22.81|29.7|same method from prior year
12|Community Network Services, Inc|MI|C|T1020-Moderate|Personal care services, per diem |TF||Per Diem||29|7391|7391|619809.26|83.86|83.86||0|7,391|7,391|0|490943.23|99.63%||0.00%|265.36|0.05%|1554.76|0.32%|492,763|0.00|66.42|0.00|0.04|0.21|66.67|68.16|53.58|68.53|54.88|67.43|rates reflect similar to prior years
12|Community Network Services, Inc|MI|C|T2003|Non Emergency Transportation|||Trip||2|148|148||0.00|0.00||0|148|148|0|544.16|100.00%||0.00%||0.00%||0.00%|544|0.00|3.68|0.00|0.00|0.00|3.68|0|7.99|0|7.92|0|
12|Community Network Services, Inc|MI||GF Pharmacy|GF Pharmacy||||||||||||||||||||||||753,848||||||||||||
12|Community Network Services, Inc|MI||H0038|Drop-in Center||||||||||||||||||||||||447744.18||||||||||||

Thanks, Phil.

2012年3月22日星期四

Flat file CSV problem

Hi all,

I hope someone can help with a problem i'm having.
I want to process a large number of CSV files into various tables in an SQL database.
The CSV file contains entries on a row by row basis relating to specific events (indicated by an eventID in column 0).
Eventually i think i want to be using a conditional split to process each row seperately depending on the eventID but before i get this far i am having a problem with the source data.

Each event can have varying amounts of columns filled in in the CSV file. And each CSV file can have multiple event types in it.

The flat file manager seems to merge a number of different rows into one within the preview pane. It seems to ignore the end of row delimiter of CrLf.

Can anyone please help me to sort this so that each row is on its own and will allow me to pass the structured data set to a conditional split task?

Many thanks in advance,

GrantOk,

I seem to have managed to set all the data for each row into one component and have a script to extract the EventID to one output column and the remaining parameters to a second column. I need to perform a check on the EventID by passing it and another variable into a stored procedure. How do i go about doing this in the data flow section? is it possible or do i have to look at using the control flow section for this?

Thanks,

Grant|||

Grant,

What kind of check does that procedure performed? Keep in mind that any operation you define in the dataflow will be performed in row by row basis; so a call to a procedure in a data flow will be executed as many times as rows you have in the file. Since you already succeed on separating the eventID from the rest I would try to use a conditional split transformation based on the EventID value and then to performed specific transformations/checks to every data pipeline.

Rafael Salas

|||Hi Rafael,

Thanks for the response. The stored procedure i was talking about returns a value based on the event ID. Having thought about what you said i can set this value manually after the conditional split has been carried out and i know what the event ID is. Does that sound more plausible?

I am trying to rewrite a windows application that processes these files, initially this used a stored procedure to process each row of data. The main stored procedure calls other sub procedures within it. The problem was that the stored procedure itself was getting very unwieldly with a large number of if and nested if statement which meant that the addition of new events was complex and time consuming. Using the SSIS package a believe i can make this a much easier process to manage.

Is it still possible to call a stored procedure in the dataflow task once the conditional split has been performed? i realise that there may be other stored procedures required once i have my data row?

I have just attempted to run some SQL code on a per row basis and have discovered that i cannot seem to user variables or parameters in the OLE DB Command task. How would i go about either returning a value dependant on if the current eventID exists in another table in the database or indeed; how to insert data into a specific table if it doesn't exist.
After i do that then i can get on to processing the event data to the table where this is stored. I hope that makes sense.

Many thanks,

Grant|||

Wow a lot of questions!

First to all let me clarify that there is nothing wrong with having OLE DB Command tasks in your Dataflow; it is just that as personal pratice I try to use bulk operations against the DB when possible.

If what you are trying to use thr stored procedures for is to check if a row exists in a table; you can use the lookup transformation in your dataflow; then use the error output as your insert pipeline and the output as the 'already exists' kind of pipeline (or just not use it if you want to ignore them). The lookup can be also returned other columns from your lookup table if that is what you need

Rafael Salas

|||I tend to ask a lot of them yes :)

I looked into the lookup transformation which i can see how i would use the error output etc. I was then using an OLEDB destinbation to insert rows to the table. The problem with the OLEDB destination is that i cannot loop back to the lookup. With an oledb command would i script the insert command and then be able to loop back to the lookup transformation.
E.g.

Should this be ok to enter as a SQL Command:

if (select count(*) from SerialPartRev Where SerialPartRev.SerialNo = ? and SerialPartRev.PartNo = ?) = 0
begin
Insert into SerialPartRev (SerialNo, PartNo, RevisionNo)
Values(?,?,?)
end

The reason i want to do this is so that the SerialPartRev table is update automatically. I require to do the same thing with a Username table. The big problem is how to check again for the entry before processing further. As i have found looping back to the lookup isn't possible due to it only accepting one input path.

I'm quite happy to accept that i am doing this wrong and that maybe someone could suggest another process for implementing this.

Cheers,

Grant|||Rafael,

I have been reading you're previous post again and whilst i understand what you are saying about using the output and error output in the pipeline depending on if the row exists but one question still remains. Firstly i have to say that there will be a couple of instances where i have to check if data from the columns in the flat file exists in SQL tables. Regardless of if i have to insert the row manually or if it already exists after both of these operations the output still has to go to the one conditional split task. Effectively it splits the path in two and then rejoins after carrying out the necessary functionality. Is this possible without the use of scripting? I believe i can achieve this via a script although it will mean exposing a password in plain taxt so that the database connection string will work properly.

Thank you,

Grant|||Actually, forget it. I have managed to call the stored procedures form the OLE DB commands. I have no idea what i was doing wrong previously but it seems to be working now.

Thanks for you're help on this matter it was most appreciated. I'm sure i'll have more questions in time.

Cheers,

Grant

2012年3月21日星期三

Flat file (CSV) source format

I have a SSIS package loading a lot of CSV file, which first line is the column head. Some file are ordered differently. However, package still try to load the file use predefined column order (it seems it doesn't check the head of each file see if it matchs the predefined column order).

Any way to force the package the check each file's head? or I had to manually check it using VB.Net script?

No, the Flat File connection manager cannot adapt to dynamic file formats. You can workaround with script. These threads should be helpful.

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1408850&SiteID=1
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1438953&SiteID=1