XML Import

namliam

Does anyone happen to have a (semi-)generic piece of code that knows how to handle (semi-)random XML files and structures, preferably creating tables and columns on the fly...

Or some code that will read an XML tree and generically add the data to whatever tables it can find?

Failing that I will have to go make it myself, but I am hoping to save myself a day or so of work if someone has a general idea or place to start :)
 

WayneRyan

namliam,

We've had to import several large XML datasets.
One in particular contains about 20 tables worth of data.

I have never seen a product that could import these acceptably.
This includes Excel, Access, MATLAB, and Altova XMLSpy.
It doesn't really seem to be a fault of theirs, just the way the XML is constructed.

Sometimes child tables are completely divorced in the XML.

Sometimes the introductory tag is not just a tag; it carries additional data such
as a security classification. That data tends to get lost in the import.

About the only thing I can suggest is to:

1) Have a rough idea of what the data should look like.
2) Eyeball the XML data for obvious flaws.
3) Import using some tool (like Access) and see what tables are produced.

In some cases, you can pre-parse (rewrite) the XML and produce something that can be
automatically imported.
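
For step 3, the Access side of a pre-parsed file is a one-liner. A minimal sketch, with a placeholder file path:

Code:
' Minimal sketch: import a (pre-parsed) XML file with Access's
' built-in importer; acStructureAndData creates the tables and
' loads the data in one go. The path is a placeholder.
Sub ImportCleanedXml()
    Application.ImportXML DataSource:="C:\temp\sepa_clean.xml", _
                          ImportOptions:=acStructureAndData
End Sub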

In other cases, you'll just have to write your own parser.

hth,
Wayne
 

namliam

I am trying to process (import and export) SEPA files. Due to their nature I can't actually share an example file for privacy reasons, and filling them with gibberish is "kinda shitty"....

Two links describing the format:
http://docs.oracle.com/cd/E16582_01/doc.91/e15104/sepa_dir_debit_xml_appx.htm
https://www.abnamro.nl/nl/images/Ge...Betaalvereniging_IG_SEPA_Direct_Debit_5-0.pdf

The second also contains a sample at the end.

My question though is much broader; I am/was trying to think about this in a general-setup kind of way....
Trying to contemplate how one could work with these stupid XML files/formats more generically and extract the useful information from them.
In the past I made my own parsers, simply reading the file line by line and, where needed, tag by tag or character by character, handling each tag as and when it comes along.
Development along those lines is rather detailed and kind of messy to "get right", using a long If/Then/ElseIf/ElseIf/etc. or Select Case statement to handle each instance.

The shitty parts of XML files are things like this:
Code:
            <PmtTpInf>
                <SvcLvl>
                    <Cd>SEPA</Cd>
                </SvcLvl>
                <LclInstrm>
                    <Cd>CORE</Cd>
                </LclInstrm>
                <SeqTp>FRST</SeqTp>
            </PmtTpInf>
Here tags like Cd repeat in name but sit under a different "container",
making it (near) impossible to really work off the tag itself; in cases like this you need to trigger off the container...
And if made generic (enough) you probably need to make a "translation" table that "maps" the tags / containers.
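
One way to carry the container along without a full hand-rolled parser is to let MSXML walk the tree recursively and build the path as it goes. A rough sketch, assuming MSXML 6 is available and using a placeholder file path; it just prints leaf paths to the Immediate window:

Code:
' Rough sketch: walk the DOM recursively, building the full container
' path for every leaf element, e.g. PmtTpInf/SvcLvl/Cd = SEPA.
Sub WalkNode(node As Object, ByVal path As String)
    Dim child As Object
    If Len(path) > 0 Then path = path & "/"
    path = path & node.nodeName
    If node.SelectNodes("*").Length = 0 Then
        Debug.Print path & " = " & node.Text    ' leaf: emit path and value
    Else
        For Each child In node.childNodes
            If child.nodeType = 1 Then WalkNode child, path    ' element nodes only
        Next child
    End If
End Sub

Sub TestWalk()
    Dim doc As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If doc.Load("C:\temp\sepa.xml") Then    ' placeholder path
        WalkNode doc.documentElement, ""
    End If
End Sub

Triggering off the container then becomes a string compare on the full path instead of a Select Case on the bare tag name.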

Another possible solution I was considering is a table with column names like:
PmtTpInf/SvcLvl/Cd

That encodes the full container path in the column name, but it would (potentially) result in very long column names, and not always the best kind of names either.
Also, when re-exporting the XML, one could derive the nesting and XML structure from the column names.... However, the long names make me worry about running out of the 64-character limit on column names.
Other conventions I've considered have issues with keeping individual tags unique.
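
If the full paths threaten the 64-character limit, a translation table could sit between the walker and the actual column names. A hypothetical sketch; the short names below are made up, not any official SEPA mapping:

Code:
' Hypothetical mapping from full container paths to short column names.
Function ColumnFor(ByVal fullPath As String) As String
    Static colMap As Object
    If colMap Is Nothing Then
        Set colMap = CreateObject("Scripting.Dictionary")
        colMap.Add "PmtTpInf/SvcLvl/Cd", "SvcLvlCd"
        colMap.Add "PmtTpInf/LclInstrm/Cd", "LclInstrmCd"
        colMap.Add "PmtTpInf/SeqTp", "SeqTp"
    End If
    If colMap.Exists(fullPath) Then
        ColumnFor = colMap(fullPath)
    Else
        ' Fallback keeps the path recoverable for re-export, though it
        ' may still exceed Access's 64-character field name limit.
        ColumnFor = Replace(fullPath, "/", "_")
    End If
End Function

Because the map is two-way, re-exporting can invert it to rebuild the nesting from the short names.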

I can kind of envision working with the default XML import and tweaking that a little.... The default import will generate 15 or so separate tables; the only tweak would be to add key fields during a manual import, again going line by line / tag by tag / char by char, but actually splitting the data into sub-tables and re-generating the actual record using a query.
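
If that default-import route wins, the key fields could be bolted on afterwards with plain DDL. A sketch; the table and column names are assumptions about what the import happens to generate, not guaranteed output:

Code:
' Sketch: after the default ImportXML, add a key column to one of the
' generated tables so child rows can be joined back to their parent.
' "PmtTpInf" and "PmtInfKey" are assumed names.
Sub AddKeyToImportedTable()
    CurrentDb.Execute "ALTER TABLE PmtTpInf ADD COLUMN PmtInfKey LONG", dbFailOnError
End Sub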

All in all, in two words or less: I am spinning my wheels trying to find some sort of "standard" that will work easily and generically, some sort of best practice I guess....

Kind regards
 
