Maddening problem parsing text file, with reproduceable example

porthos

New member
Local time
Today, 05:31
Joined
May 23, 2010
Messages
1
Hello-
I've setup a database that parses large amounts of incoming text feeds. It works great but I've noticed that occasionally some corrupt data comes in. However when looking back through a specific example it actually appears the data is fine and when access is importing the file (via the transfertext method), it mysteriously drops a character. I traced back to the file in question and have figured out how to easily replicate the problem, and it does it every time. For the life of me I can't figure out why. I'm wondering if perhaps there are special characters in here that I cannot see that is causing that single line not to parse correctly? I would be extremely grateful if anyone could point me in the right direction. And if you don't want to try and recreate it, any general advice on a text viewer that could help me rule out special characters (i.e., new line / carriage return) would be helpful. I'm using notepad/wordpad.

Here's how to reproduce the problem:
Take the attached file, do a new table import in access. Default settings, first line is headers, comma delimited. The file imports fine and all the data is ok EXCEPT one line... I believe it's line 159, but the easiest way to know for sure is to do a find for the number "1598486731" which appears only once in the "orderID" column. The field that ends up being incorrect in access is the first number in that row, the "price" field. The text file clearly shows the value to be 110999999.94 but access imports it as 10999999.94. I.e., it drops the 1 in the front. If you view it in excel or wordpad it looks fine.

help! Thanks so much in advance.


Note: I am running this file through a perl script before it gets to access, so I can probably manipulate it to filter out what's causing the parsing to fail, if I knew what it was :/
 

Attachments

Last edited:
there appears to be something wrong with the text file.

it is just showing as a single line, rather than multi line

it has small square characters in there every so often, which will be non printing chars, of some sort.
 
I actually can't get it to import at all into Acess2007 because I get a message saying it would create duplicate values in an index. This happens even after I remove the indexes from every field in the import.:confused:

The file is Unix format with line breaks as CR only without LineFeed. These are the little square characters. Microsoft prefers CRLF line breaks. Hence it appears as a single line. I think if you look carefully you might find other errors in your import.

I replaced the CR with CRLF and it imports fine without loosing the digit.

Here is the very crude Sub I knocked up.

Code:
Sub fixfile()
 
Const ForReading = 1
Const ForWriting = 2
 
Dim fso As Object
Dim infile As Object
Dim outfile As Object
Dim intext As String
Dim outtext As String
 
   Set fso = CreateObject("Scripting.FileSystemObject")
   Set infile = fso.OpenTextFile("c:\test.txt", ForReading)
   Set outfile = fso.CreateTextFile("c:\out.txt", ForWriting)
 
   intext = infile.Readall
   outtext = Replace(intext, vbCr, vbCrLf)
   outfile.writeline (outtext)
 
Set infile = Nothing
Set outfile = Nothing
Set fso = Nothing
 
End Sub
 
Last edited:

Users who are viewing this thread

Back
Top Bottom