Validate csv file for illegal delimiter before import

Morten

Hi

I have a big csv/text file that contains 148 fields and around 160,000 records at the moment. The file uses ; as the delimiter and " as the text qualifier.

Some of the records look like this:

"Text";"text and; text"

As you can see, there is an extra delimiter within the quotation marks, which results in records that are not imported correctly into my table in Access.

My questions are:

1. How can I validate the csv/text file for the extra delimiter and present the records that contain the extra delimiter in a query?
2. How can I then clean up the csv/text file in Access, so I get rid of the extra delimiter?

Hope you can help me.

Best regards
Morten
 
It shouldn't. The text qualifier should treat everything between the "" as text.

given

"Text";"text and; text"

you should end up with

field 1: Text
field 2: text and; text
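This can be checked outside Access too: any quote-aware CSV parser keeps the inner semicolon as field text. A minimal sketch in Python, using the sample line from the post above:

```python
import csv
import io

# Parse the sample line with ; as delimiter and " as text qualifier.
line = '"Text";"text and; text"'
reader = csv.reader(io.StringIO(line), delimiter=";", quotechar='"')
fields = next(reader)
print(fields)  # ['Text', 'text and; text']
```

The inner ; never splits the second field, which is the behaviour the text qualifier setting is supposed to give.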
 
I have a similar issue where some fields were quoted and some weren't and that threw off the import field positioning. Unfortunately, the code is at my other job. I will post the code solution on Monday.
 
I agree with Dave that quoted text should prevail. As long as that "extra delimiter" is within the quoted text, it should NOT be considered a delimiter. The delimiter (field separator) sits between the quoted field values.

"AAAAAAA";"bbb";"WWWW;XXX;YY"
Here only the two ; between the quoted fields should act as delimiters.

Do you have some specific examples that "fail"?

Are you using the Import wizard?
 
Out of interest, there are other quote characters with different character codes.

The typographic (curly) quotes slope slightly differently, but they can easily be confused with the straight quote.

Are you perhaps picking the wrong character? The "normal" straight quote char is ASCII 34.
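To illustrate (a Python sketch, outside Access): the straight qualifier is ASCII 34, while the typographic quotes that word processors often substitute are different characters entirely, so a file can be scanned for them:

```python
# Straight quote (ASCII 34) vs. typographic quotes that can sneak in
# from Word/Excel exports and break the text-qualifier setting.
straight = '"'
curly_open, curly_close = '\u201c', '\u201d'
print(ord(straight))     # 34
print(ord(curly_open))   # 8220
print(ord(curly_close))  # 8221

# Quick scan of a (hypothetical) line for the wrong quote characters:
sample_line = '\u201cText\u201d;"ok"'
bad = [ch for ch in sample_line if ch in (curly_open, curly_close)]
print(len(bad))  # 2 suspicious characters found
```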
 
Do you have some specific examples that "fail"?

Are you using the Import wizard?

I can upload some examples on Monday, because the file is at my work.

Yes, I'm using the import wizard and linking the csv file to the database.
 
I just tested an import through the import wizard.

raw data in file testsemi.txt

Code:
"field1";"field2"
"Text";"text and; text"
"textstring";"some;; semicolons;;"

resulting table

Code:
ID	field1	field2
1	Text	text and; text
2	textstring	some;; semicolons;;
 
Hi again,

What a fool I am... I found out the problem is not the delimiter ; after all. Instead, one of the columns (long text) contains an extra qualifier " inside the text.

E.g.

"short text";"long long" long text";"text"

Best regards
Morten
 
Not stupid. It probably should have occurred to us as well.

It's very difficult to deal with problems like that.

If you are in charge of generating the csv, you may be able to find a way to fix it.

I would use a delimiter that will never be found in the text (maybe a pipe symbol), and then use no text qualifier, which avoids problems with embedded quotes.
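As a stopgap for question 1, the suspect records can be found before import with a simple qualifier count: in a well-formed line every quoted field contributes exactly two " characters, so a stray embedded qualifier makes the total odd. A minimal Python sketch (the sample lines are hypothetical, modeled on Morten's example; note an embedded *pair* of quotes would slip past this check):

```python
# Flag suspect lines: with ; as delimiter and " as qualifier, every
# well-formed record has an even number of " characters (two per
# quoted field). A stray qualifier inside a field makes the count odd.
def is_suspect(line: str) -> bool:
    return line.count('"') % 2 != 0

good = '"short text";"long text";"text"'
bad = '"short text";"long long" long text";"text"'
print(is_suspect(good))  # False
print(is_suspect(bad))   # True
```

Run over the whole file, this gives the list of line numbers to inspect or repair before linking the table.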
 
Here is the code. What it does is take the csv file and use Excel automation to convert it to an Excel file, as Excel seems to be able to parse the quotes within a field as part of the field rather than producing unbalanced delimiters. I call it from a button on a form, but it can be called directly.

Since I took out some parts specific to our process, if an error occurs, I may have missed something.

This is the button on the form click event
Code:
Private Sub btnUpdateLinks_Click()
'Convert CSV to XLSX to get around quotation issue
'since the link doesn't work well if quotes appear as
'part of the field rather than as part of the
'delimiter.
'
'--------Example: these are ok:-------
'xyz,"Find the Fraud | 12345", abc
'xyz,Find the Fraud | 12345, abc
'-------------------------------------
'-----Example: these are not ok:------
'xyz,"Find the "Fraud" | 12345", abc
'xyz,Find the "Fraud" | 12345, abc
'-------------------------------------
'Converting and formatting the Excel file
'seems to resolve this issue. Since this is a
'linked table, to avoid notifications, delete
'the table before recreating it with the new data
'20160216
    Dim stfilename As String

    'delete the existing file before importing the new data
    stfilename = "ExistingFileName.xlsx"
    If Dir(stfilename) <> "" Then Kill stfilename
    Call ExcelToText(stfilename)

End Sub
This next part goes in a global module. You may not need the public variable, but I left it in. I also left in some of the formatting and renaming of field headers, in case you need an idea of how to get around Excel and Access assumptions about data format; if the field headers are properly named, you don't need to do that here. With this code, you do not need to set a reference to Excel.

Code:
Public globalintExcelVer As Integer

Public Sub ExcelToText(ByVal stfilepath As String)
'http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28277710.html
'http://www.developark.com/1529_17747527/
'http://www.mrexcel.com/forum/excel-questions/235867-using-column-numbers-define-range-visual-basic-applications.html
'Change the formatting of an excel file to text for all used cols (last col)
'Save as to another name (filename + txt.xlsx)
'Need to set a reference to Excel (tools references)
'20150228 ss
'Incorporated Version Check so file does not need to be reopened before import
'20150521
'Changed so no longer needs to set reference to excel
'Found that last col wasn't getting all data, switched to .cells without
'range in order to capture entire sheet.
'Changed to convert all reformatted files into Excel 2007-2013 format xlsx.
'-previously kept reformat in original version, but was causing issues down
'the road.  Simplifies import process as all imported files are now the
'same format.
'201506??

    Dim objapp As Object
    Dim wb As Object
    Dim lastcol As Long
    Dim ExcelVersion As Long
    
    On Error Resume Next
    Set objapp = CreateObject("Excel.Application")
    objapp.Visible = True
    
    If Dir(stfilepath) <> "" Then
        Set wb = objapp.workbooks.Open(stfilepath, True, False)
    End If
    If wb Is Nothing Then    'file not found: nothing to convert
        objapp.Quit
        Exit Sub
    End If
    With wb.sheets(1)
        .cells.NumberFormat = "@"
        .Name.Delete    'Delete Named range
        .cells(1, 1) = "PrimaryKey"
        .cells(1, 2) = "FirstName"
        .cells(1, 3) = "MiddleName"
        .cells(1, 4) = "LastName"
        .Columns(5).NumberFormat = "General"
        .cells(1, 5) = "Age"
        .cells(1, 6) = "CType"
        .cells(1, 7) = "EventCode"
        .cells(1, 8) = "EventTitle"
        .Columns(9).NumberFormat = "m/d/yyyy"
        .cells(1, 9) = "StartDate"
    End With
        'http://www.rondebruin.nl/win/s5/win001.htm
        'file format stuff 20150405
    'filePath and FileNameNoExt are helper functions (not shown in this
    'post) returning the folder path and the base file name respectively
    wb.SaveAs filePath(stfilepath) & FileNameNoExt(stfilepath) & "txt.xlsx", FileFormat:=51
    ExcelVersion = wb.FileFormat

    wb.Close 'savechanges:=False

    objapp.Quit

    Set objapp = Nothing
    Select Case ExcelVersion
        Case 39
            ExcelVersion = 5 'excel7
        Case 50, 51
            ExcelVersion = 9 'excel12 (2007-2013, xlsx)
        Case 56
            ExcelVersion = 8 'excel8 (97-2003 format in Excel 2007-2013, xls)
    End Select
    globalintExcelVer = ExcelVersion
End Sub
 
I would never ever take a csv through Excel. It is much better to use the csv directly (DoCmd.TransferText) - you can design an import specification with a csv, which you can't do with Excel.


xyz,"Find the "Fraud" | 12345", abc
xyz,Find the "Fraud" | 12345, abc

Given this, use a field separator of comma and no text qualifier, and your data should come in OK.

The problem then is if any of the strings have embedded commas.
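That field shift is easy to demonstrate (a toy Python sketch; a plain split stands in for a no-qualifier import, and the lines are hypothetical):

```python
# With no text qualifier, embedded quotes just ride along as text...
safe = 'xyz,Find the "Fraud" | 12345, abc'
print(len(safe.split(",")))  # 3 fields, as expected

# ...but an embedded comma inside a value shifts every later field.
shifted = 'xyz,"Smith, John" | 12345, abc'
print(len(shifted.split(",")))  # 4 fields -- one too many
```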
 
While I agree setting up the import spec is better, we often get data that doesn't cleanly fit into a spec, and we found that going this route allowed some flexibility - we don't have to constantly create/edit specs, or go into the csv and try to clean it up for import, which is time consuming and may still miss something - since this converts all columns to text, except for the ones we specify otherwise.

The problem then is if any of the strings have embedded commas.
We do have the embedded comma issue, which is what led to the excel conversion.
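For what it's worth, question 2 (cleaning the file) can also be attacked directly: drop every qualifier that is not adjacent to a delimiter or a line boundary. A rough Python sketch, assuming a ; never sits directly beside a legitimate quote inside field text:

```python
def strip_embedded_quotes(line: str) -> str:
    """Remove " characters embedded inside field text, keeping only
    those at the line edges or directly beside the ; delimiter."""
    out = []
    n = len(line)
    for i, ch in enumerate(line):
        if ch == '"':
            at_edge = i == 0 or i == n - 1
            near_delim = (i > 0 and line[i - 1] == ";") or \
                         (i < n - 1 and line[i + 1] == ";")
            if not (at_edge or near_delim):
                continue  # stray qualifier inside a field: drop it
        out.append(ch)
    return "".join(out)

broken = '"short text";"long long" long text";"text"'
print(strip_embedded_quotes(broken))
# "short text";"long long long text";"text"
```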
 
