External Data Parsing

PaulO · May 17, 2011

I've been working steadily on a project involving the connection of an Access database to some external RFID hardware, invoking and then reading a .txt file and then parking parsed data in my database.

All has gone well thus far albeit I was working under the misunderstanding that there would only be one dataset to read consisting of one Header record and many Data records. My queries treated the Header record separately to all others and everything was sweet ...

However, it transpires that there could be multiple (unlimited) datasets and the resulting .txt file doesn't have common code between header and data records that I can use, simply stacking the data, presenting me with the difficulty of how to group/separate individual datasets.

An example .txt file is attached ...

Visually, it's easy to see which data belongs to which set and where the dataset breaks fall, but my question to you all is how can I write a Query (much preferred to VB code) to assign the same prefix to associated Header and Data records.

The prefix I'd like to use is the 56th character in each header row i.e. in my example data file this would be "T", "1", "2"

Thanks for taking the time to look at this ...

Galaxiom · May 17, 2011

Sorry but this task really requires code.
Best show an example of how you want the completed data to appear.

PaulO · May 17, 2011

OK good point ... modified (desired) .txt file attached

Galaxiom · May 17, 2011

You seem to be aiming at preprocessing the textfile so you can import it as External data. I expect then you will be manipulating it further to get it into a relational structure.

It is probably better to parse the data straight from the original text file directly into the final structure by opening the textfile as a TextStream and writing into recordsets.

PaulO · May 17, 2011

Galaxiom said:
You seem to be aiming at preprocessing the textfile so you can import it as External data. I expect then you will be manipulating it further to get it into a relational structure.

It is probably better to parse the data straight from the original text file directly into the final structure by opening the textfile as a TextStream and writing into recordsets.

Yes ... ultimately I end up with multiple fields of data extracted from the long text strings ... this part of the process is complete and works and even works with the three datasets now being worked with ... but ... they are not referenced correctly to their Headers which is the current problem.

gemma-the-husky · May 17, 2011

i would do this
since the structure is

Code:

header
  detail
  detail

you need to read the data in sequence, and process accordingly

sort of

Code:

read first line
while not eof
   if header then
      processheader
  end if

  if line then
     processline (using stored details of last header)
  end if

  read next line
wend

PaulO · May 17, 2011

That looks kinda logical, but you're talking to a right VB numpty!

Looking again at my text file, and referring to the hardware manufacturer's Protocol document to which I have to refer to, it seems that Characters 56 and 57 define the Header set (not just character 56). However these characters, in the detail records, can also be populated, so maybe it's best not to use these characters to define the Headers.

Instead, the first 19 characters i.e. "RPRASO0217000008273" will be the same in each header row. So where a replication of the same 19 characters arises this defines a new recordset. Characters 56 and 57 can then be used to differentiate between the headers.

How would the VB code look, given this updated view?

gemma-the-husky · May 17, 2011

I am just assuming the rows beginning RP are header rows, and the rows beginning GB are associated data for that header.

the bracketed numbers, and the stars on row1, indicate it might be slightly trickier - but it's your data - so you should know how it is structured.

Have you added the bracketed numbers, or are they part of the data stream?

PaulO · May 17, 2011

Apologies, the original .txt file is attached to my first posting. The second attachment shows how, perhaps, I'd like to amend that file to look like.

I'm afraid that we can't assume that "RP*" defines a header row, nor that " GB" prefixes each detail record, as all these codes may vary.

What I deduce from the protocol is that the first 19 characters, where replicated, establish a separate (unique) recordset header.

Galaxiom · May 17, 2011

The crux of any parsing is a solid definition of what constitutes the difference between a header and a detail line.

Would the absence of whitespace in the first few characters be a reliable indicator of a header?

gemma-the-husky · May 17, 2011

It doesn't change things. If you "know" what the structure looks like, you can tell Access/VBA how to process it.

eg - the headers have something in character 1 - and the detail lines don't. But if you cannot explain the format of the data, you will not be able to process it.

PaulO · May 17, 2011

I have a full data dictionary/map in the Protocol, which I'm following as best I can.

However the text file itself is a bit of a mash with no clearly defined or consistent structure (especially when looking at the detail records which on occasion hold 'like' data in different character positions, and sometimes has leading spaces and sometimes not. Nonetheless I have managed to tell Access how to process the detail data records correctly using some select queries based on rules from the Protocol.

What's missing is the clear definition and addition of a Header 'tag' to each detail record.

VB's not my specialty but in my laymans VB code I'm looking for:-

first record must be recognised as [header1]
next record ... if left([nextrecord],19) <> left([header1],19), then update to mid([header1],56,2)&")"&[nextrecord]
next record ... if left([nextrecord],19) <> left([header1],19), then update to mid([header1],56,2)&")"&[nextrecord]
etc (loop)
else [nextrecord] must be recognised as [header2]
etc
end

hope that makes some sense?

PaulO · May 18, 2011

Thanks to both of you for your interest and assistance ... much appreciated.

For now I've developed a classic 'PaulO' workaround whereby I parse and load all the data into a holding table, linking to the first header row, and then display all the parsed records in a Form and allow the user to select/change an alternative header as required.

In most cases there will only be the ONE dataset so the user will perform a visual validation of the data before clicking a POST button to transfer the records from the holding table into the master data table. Where there are two or more datasets, the number of records in each dataset will on most occasions be less than 10 so it's not too onerous a task for the user.

Not perfect, I know, but hey we can only do what we know.

PaulO · Jun 15, 2011

Unfortunately my workaround is not workable ... it puts too much onus on the user to manually delete unwanted records ... so I'm back looking for a solution that identifies 'header' records (the first 19 characters will be exactly the same e.g. in our example "RPRASO0217000008273") and then associates relevant 'detail' records

e.g.

first record must be recognised as [header1]
next record ... if left([nextrecord],19) <> left([header1],19), then update to mid([header1],56,2)&")"&[nextrecord]
next record ... if left([nextrecord],19) <> left([header1],19), then update to mid([header1],56,2)&")"&[nextrecord]
etc (loop)
else [nextrecord] must be recognised as [header2]
etc
end

Hope someone can translate that into workable code ... ?

Beetle · Jun 15, 2011

Not sure if this will handle all possible scenarios in your source text file, but using the example files you posted, the following code will import the text file into a temporary table, then loop through the records and add the correct prefix to each row based on the "header" row as defined by the first 19 characters of each row. It allows for the fact that the desired prefix character is in position 56 in the first "header" row and in position 57 in the other "header" rows (this would need to be modified as well if the desired prefix character could be in a position other than 56 or 57). I've tested it and it results in a record set that looks exactly like the second text file you posted, minus some white space.

Code:

Dim strFileName As String
Dim rs As DAO.Recordset
Dim strHeader As String
Dim strPrefix As String
Dim intPos As Integer
 
strFileName = "C:\YourFolder\racedata.txt"
 
'import the text file
DoCmd.TransferText , , "tblTemp", strFileName
 
'create a recordset based on the Temp table
Set rs = CurrentDb.OpenRecordset("tblTemp", dbOpenDynaset)
 
If rs.RecordCount <> 0 Then
'loop through the recordset
With rs
    .MoveFirst
    strHeader = Left(!F1, 19) 'determine the header from the first row
    Do While Not .EOF
        If Left(!F1, 19) = strHeader Then
            'set the position for the prefix character
            If Mid(!F1, 56, 1) <> "0" Then
                intPos = 56
            Else
                intPos = 57
            End If
        'get the prefix character
        strPrefix = Mid(!F1, intPos, 1) & ")"
        End If
        'update the current row
        rs.Edit
        !F1 = strPrefix & !F1
        rs.Update
        .MoveNext
    Loop
End With
End If
 
Set rs = Nothing
 
End Sub

gemma-the-husky · Jun 15, 2011

paulo - given that this is called "racedata" can you give an indication of what sort of race

human race, horse race, or does race imply some socio-political thing.

my first thought was it was horse-race data - but can you confirm?

----
to this end, it might be worthwhile you confirming what the sample represents, in general

and in particular
1. what are the three "header" rows
2. what are the subrows associated with these headers
3. why is the very first header row not formatted like the others.

PaulO · Jun 15, 2011

Sean

I'll give your code a whirl and report back!

Dave

Welcome to the world of PIGEON racing!

The three header rows represent a Training flight plus two Race flights
The detail records represent information about each pigeon timed in each flight
The first header record is a Training flight (indicated by "T") and training flight header records simply don't hold as much data as Race flights

gemma-the-husky · Jun 15, 2011

PaulO said:
Sean

I'll give your code a whirl and report back!

Dave

Welcome to the world of PIGEON racing!

there you are then - the same as horse racing, only slightly faster.

so what does the data actually represent. Is rows 1-4 a result, or something else. It looks like subsequent sets are individual races. but maybe each set is set of timings for a single bird.

we need some more info!

can you give us a link to the explanation?

PaulO · Jun 15, 2011

Apologies Dave I edited my last posting #17 giving a bit more info ... happy to answer any additional questions you have

PaulO · Jun 15, 2011

Sean

DELIGHTED!! Running your code gets me the following:-

T)RPRASO0217000008273120511140759************************T
T)GB10 N099529HC92A20B100O12233012050000000000000000000000000000000000000000
T)GB10 N099507HC924E08400M13091312050000000000000000000000000000000000000000
T)GB10 N099519HC927209400J13103612050000000000000000000000000000000000000000
T)GB05 J035642HC916D64000M09441713050000000000000000000000000000000000000000
2)RPRASO021700000827312051114075900000000000000000000000002 14/05 MESSAC BBCR
2)GB07 F032063HC915164300X18264814050000000000000000000000000000000000000000
2)GB07 F032097CC919163D00J20480014050000000000000000000000000000000000000000
2)GB09 N029333HC910D64200J08384115050000000000000000000000000000000000000000
2)GB09 N029325CC912763E00M00000000000000000000000000000000000000000000000000
2)GB10 N075114CC92A605A00M00000000000000000000000000000000000000000000000000
2)GB09 R046446CC916164100J00000000000000000000000000000000000000000000000000
2)GB04 B008868CC915F64000J00000000000000000000000000000000000000000000000000
2)GB10 N099517CC92E409F00.00000000000000000000000000000000000000000000000000
2)GB08 J005938CC91EF4EE00M00000000000000000000000000000000000000000000000000
2)GB10 R044282CC92DA09400M00000000000000000000000000000000000000000000000000
1)RPRASO021700000827312051118454800000000000000000000000001 14/05 FALAISE BIFCR
1)GB06 N016107HC915764100(13242914050000000000000000000000000000000000000000
1)GB10 N099508HC924C09200.13254714050000000000000000000000000000000000000000
1)GB04 B008702HC91A363E00P13392914050000000000000000000000000000000000000000
1)GB10 N099514HC92CA06500O14005214050000000000000000000000000000000000000000
1)GB08 J005926HC91CB64100M16521014050000000000000000000000000000000000000000
1)GB10 N099501HC924A0D700N17023914050000000000000000000000000000000000000000
1)GB09 N005332HC912562900M18113414050000000000000000000000000000000000000000
1)GB07 F032068HC91F563E00P00000000000000000000000000000000000000000000000000
1)GB10 R044281CC924E09C00M00000000000000000000000000000000000000000000000000

So, using the groupings T) 1) and 2) I can now amend my many Select Queries that go on to split these data strings into readable and meaningful output. Because it's a RACE program I'm reading the data into I can now completely IGNORE the Training flight data (never required) and focus on the birds that have arrived from the races (in this case two races). I think the maximum number of races likely to be encountered is FIVE, but using the existing coding we already have the scope and growth for 9.

The one adjustment I'd probably want to make to your code is to store the racedata.txt file location in a table field, viewed and accessed from a Form Field call '[filelocation]'. This is just in case the file location changes. Are you able to advise the coding change to achieve this?

Once again DELIGHTED

External Data Parsing

Registered User.

Attachments

Super Moderator

Registered User.

Attachments

Super Moderator

Registered User.

Super Moderator

Registered User.

Super Moderator

Registered User.

Super Moderator

Super Moderator

Registered User.

Registered User.

Registered User.

Duly Registered Boozer

Super Moderator

Registered User.

Super Moderator

Registered User.

Registered User.

Similar threads

Users who are viewing this thread