Manipulating huge string / importing xml to Access (1 Viewer)

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
Hello everyone :)

I have a large (90Mb) text file in xml format to import into Access. I can’t simply use the XML importer, because the xml is in this format:

Code:
<record>
   <recordid>id</recordid>
   <attribute.1>value</attribute.1>
   <attribute.n>value</attribute.n>
      <group1...n>
         <attribute1...n>value</attribute1>
      </group>
</record>
If Access’s inbuilt xml importer is used, then ‘group’ tables are created with no foreign key linking them back to the pk in the ‘record’ table.
There are ~15k records, and both attributes and groups are optional and repeatable, so there’s no set pattern. My ‘solution’ is to import the file to string, manipulate to the file below, which can then obviously be imported as I’ve added the keys.

Code:
<record>
   <recordid>id</recordid>
   <attribute.1>value</attribute.1>
   <attribute.n>value</attribute.n>
      <group1...n>
        <recordid>id</recordid>
         <attribute1...n>value</attribute1>
      </group>
</record>
However the speed it’s running seems to indicate it will take a day to finish!

So:
1/ can anyone think /know of a better way to import xml into access and have the tables linked
2/ what is the way to deal with 100Mb strings optimally?

Current code, which I accept may be no good (although it works!)!:

Code:
'sXML: entire file
'sXMLPart: single record

Loopage:

'isolate this record markers
iStartRecord = InStr(iStart, sXML, sRecordOpener) + Len(sRecordOpener) 'VBA InStr returns 0 if not found
iEndRecord = InStr(iStartRecord, sXML, sRecordClose)

k = iStartRecord - Len(sRecordOpener)
If k = 0 Then: k = 1
sXMLpart = Mid$(sXML, k, iEndRecord - iStartRecord + Len(sRecordOpener) + Len(sRecordClose))

'VBA Instr returns 0 if not found
If iStartRecord > (0 + Len(sRecordOpener)) Then
      
      'find recordid
      iStart = InStr(iStartRecord, sXML, sRecordOpener) + Len(sRecordOpener)
      iEnd = iStart + 13
      sRecordID = Mid$(sXML, iStart, iEnd - iStart)
      'Debug.Print sRecordID
      

      
      For x = 1 To UBound(sTagToAddKeyTo) 'for each tag we want to add a key to

        sFind = sTagToAddKeyTo(x) 'the opening tag we're looking for
        sFindEnd = sTagToAddKeyToClose(x) 'the closing of that tag
        iPointer = iStartRecord 'temp marker
        
        Do While InStr(iPointer, sXMLpart, sFind) < iEndRecord And InStr(iPointer, sXMLpart, sFind) <> 0 'do while instances are found
          iStart = InStr(iPointer, sXMLpart, sFind) 'where occurance is found
          sXMLpart = Left$(sXMLpart, iPointer - 1) & Replace$(sXMLpart, sFind, sFind & sRecordOpener & sRecordID & sRecordClose, iPointer, 1) 'add key
          iPointer = iStart 'move pointer
          iPointer = InStr(iPointer, sXMLpart, sFindEnd) + Len(sFindEnd) 'move pointer beyond the key we've added
        Loop 'loop per instance of tag
      
      Next x 'next tag within record
      
iStart = iEndRecord


  vff = FreeFile
  Open sOutput For Append As #vff
  Print #vff, sXMLpart
  Close #vff

GoTo Loopage 'next record
Thanks in advance; any help appreciated!
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 03:46
Joined
Sep 12, 2006
Messages
15,713
you can use an XSLT file to analyse the XML file - but this is an area in which I have no experience.

What I do, is treat XML files as large "trees" and walk them using functionality from the MSXML library

This example generates an XML structure table, ie the "node pattern" in the XML file which may help. I have updated it slightly since, so if it isn't quite right, let me know.

http://www.access-programmers.co.uk/forums/showthread.php?t=166861
 

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
Hi,

Thanks for that. Although it works fine for a smaller file, it simply doesn't load the 96Mb. It passes over the 'oxmldoc.Load importfile' line much too fast (when stepping through), and the oxmldoc.xml value is either empty, or rarely, just the XML decs pre-header.

Any ideas?

ps Just FYI, that line needs to have a wait on it because, without stepping through the codes, it seems to run before it's loaded, giving a '"Tree Produced - 0 nodes" msg. Step though and the table is populated fine. There are a few DLookups where you need to change the Expr to recid rather than id, and the 's' string for the INSERT SQL needs a slight change.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 03:46
Joined
Sep 12, 2006
Messages
15,713
I wasn't sure about the upload. As I say, I have tweaked it slightly since uploading it. I will upload a new version as soon as I can.

However, I hadn't tried to use this with a very large file. I use this sort of stuff generally with XML files that happen to be maybe 1-2Mb, and it has never errored.

Is there any way I can get hold of your large file to test it?
 

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
Yea other than that it worked well, very nice.

Variants of the file can be found from here: [wanted to insert link!][FONT=&quot] they're ONIX-format xml files.

Hmm, can't post the link as have less than 10 posts... Any ideas?! The file can be FTP'd from Springer (publisher)'s website.
[/FONT]
 

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
I'm happy to post the link by circumventing the restriction, but don't want to if it'll get me banned..?!
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 03:46
Joined
Sep 12, 2006
Messages
15,713
I can't see a download sample on their website. Please let me have some details about how to obtain the file. You won't get banned!


OK - I selected this file from a downloads area, which came in about 70Mb.

Springer_MARC_20110901_201636.xml (name, not a link)

is this the sort of file I am looking for
 
Last edited:

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 03:46
Joined
Sep 12, 2006
Messages
15,713
I have started running this file against my analyser. It is certainly not erroring straight away, but is processing the file.

However, it is a large file that needs parsing a record at a time, so I guess it will take a fair while. I will leave it to complete and see what happens.

I think a progress indicator for large files would be useful - to do this, you would need to add a record count at an appropriate point and include a doevents statement so you can be sure it is progressing, and is not "stuck"

More to follow.


OK - it finished after about an hour, examining 1.3m nodes - with only 6 tag types seen. Does that sound like the correct file?
 
Last edited:

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
Hi,

Thanks for your time!

Not sure which file that is, and I'm not on the machine which has a link to my actual file, but at springer dot com > Services > Booksellers > Datafeeds there is a 'sample files' zip under the Format Samples header. In this is an xml file, albeit only a Mb. C&Ping the <product>..</product> tags 10k (or less...) times and you've got a rough example of the file.

Having said all that, any large file will do I guess!
 

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
There should be more tags - but I guess the principle remains that it did it, which is what matters.


How did you 'know' it was working? - when I ran against a large file and stepped through the code, it didn't wait on the 'oxmldoc . Load importfile' line at all. After that I did wait a while, although probably not an hour.
 

katerina

Registered User.
Local time
Today, 03:46
Joined
Aug 30, 2011
Messages
11
I can send the link to the actual file via PM (except I can't see a PM system?!?) if you're OK with that - I'd need to post the ftp pw which I don't overly want to publicly
 

Users who are viewing this thread

Top Bottom