import data from a XML file

teja

Registered User.
Local time
Today, 09:41
Joined
Nov 28, 2011
Messages
19
Hi all,

I have data in a XML file. I have to import the data into the Access database. In the XML file each node has 13 child nodes. Hence, when I try to import the data, 13 new tables are formed.

Since no 2 tables have a common column I am finding it difficult to merge the data into 1 big master table.

Consider the example if 2 students are enrolled into the 6 subjects each and the first tab contains the list of students and the second tab contains the list of subjects, then while importing 2 tables are formed. The first table has 2 rows of students and the second table has 12 rows of their corresponding subjects. If I were to create a master table having students and their corresponding subjects I think I need a way to recognize the students with their corresponding subject using some unique identifier. Here we know the number of subjects hence we can split it into 6 and 6. but if the number of subjects is unknown then i think we need a unique identifier.

My question - Is there any other way to make the master table with the corresponding student data without the unique identifier if the number of subjects each takes in variable.
 
If the relation between students and courses is not in the XML data, or perhaps elsewhere, then I fail to see how Access should automagically know what data to marry to what :-)
 
i wanted to know is there a way in Access to import the XML file data line by line into a variable and fill the tables indivisually.
 
If the relation between students and courses is not in the XML data, or perhaps elsewhere, then I fail to see how Access should automagically know what data to marry to what :-)


Is there a way to import the data line by line through VBA code????
 
I haven't looked into working with XML in the Access environment so my knowledge may not be relevant ;) But I hear a lot of people struggle to import XML into Access databases. Seamless in other bigger db engines though :)

How is your XML formed? Is it Element centric or Attribute centric?

Perhaps a quick example of what it looks like.
 
i don't know if importxml has changed, but i had the same issue. access splits an xml file in a not very helpful way.

OK - you cannot read an xml file as a text file becuase an xml file is not structured as a line-by-line entitiy (well you could, but it would be very hard - since you have to find the corresponding closing tag for each opening tag, so you can't do it line by line, as you could with a normal text file/ csv file, say - you would end up duplicating much of the following )

you need to use recursion, which makes an otherwise very complex task ridiculously easy. it's basically this, consider each tag pair [mytag] [/mytag] as a node, which may contain either data, (node.txt or node.xml) or further embedded nodes (node.childnodes)

So - I use the msxml library, which is downloadable form MS, and wrote a recursive algorithm to walk the tree of node tags, and extract data manaually in the format I wanted.

Code:
sub treewalk(thisnode as node)
dim childnode as node
    if thisnode.haschildren then 
         for each childnode in thisnode.childnodes
               treewalk(childnode)
        next
    else
        process thisnode
    end if
end sub
 
sub main
dim root as node
    open the xml file
    ensure it is a valid xml file (msxml includes a test for this)
    set the root node
    treewalk(root)
end sub

this is aircode, and the syntax might be slightly off, but the basic tree walk code is not very long
 
Last edited:
I haven't looked into working with XML in the Access environment so my knowledge may not be relevant ;) But I hear a lot of people struggle to import XML into Access databases. Seamless in other bigger db engines though :)

How is your XML formed? Is it Element centric or Attribute centric?

Perhaps a quick example of what it looks like.


Here is the sample data

<?xml version="1.0"?>
<STUDENT>
<StudentInfo>
<RollNumber>1</RollNumber>
<Name>DON COBB</Name>
</StudentInfo>
<StudentSubjects>
<SUB1>PHYSICS</SUB1>
<SUB2>CHEMISTRY</SUB2>
<SUB3>MATHEMATICS</SUB3>
<SUB4>ENGLISH</SUB4>
</StudentSubjects>
<StudentInfo>
<RollNumber>2</RollNumber>
<Name>ALBERT EINSTEIN</Name>
</StudentInfo>
<StudentSubjects>
<SUB1>PHYSICS</SUB1>
<SUB2>CHEMISTRY</SUB2>
<SUB3>MATHEMATICS</SUB3>
</StudentSubjects>
</STUDENT>

here the StudentInfo will form 1 table and StudentSubjects will be in another table. Without a common identifier, i need to map the two tables such that in the master table roll number 1 will have his 4 subject and 2 will have his 3 subjects displayed.

Also, is there a way i can import the data row by row from XML to Access database tables
 
My understanding is Access struggles to read Attribute centric XML whereas XML based on Elements (where the data value is between the tags) are much easier.

I'm guessing the latter is the type that gemma-the-husky was working with.
 
i don't know if importxml has changed, but i had the same issue. access splits an xml file in a not very helpful way.

OK - you cannot read an xml file as a text file becuase an xml file is not structured as a line-by-line entitiy (well you could, but it would be very hard - since you have to find the corresponding closing tag for each opening tag, so you can't do it line by line, as you could with a normal text file/ csv file, say - you would end up duplicating much of the following )

you need to use recursion, which makes an otherwise very complex task ridiculously easy. it's basically this, consider each tag pair [mytag] [/mytag] as a node, which may contain either data, (node.txt or node.xml) or further embedded nodes (node.childnodes)

So - I use the msxml library, which is downloadable form MS, and wrote a recursive algorithm to walk the tree of node tags, and extract data manaually in the format I wanted.

Code:
sub treewalk(thisnode as node)
dim childnode as node
    if thisnode.haschildren then 
         for each childnode in thisnode.childnodes
               treewalk(childnode)
        next
    else
        process thisnode
    end if
end sub
 
sub main
dim root as node
    open the xml file
    ensure it is a valid xml file (msxml includes a test for this)
    set the root node
    treewalk(root)
end sub

this is aircode, and the syntax might be slightly off, but the basic tree walk code is not very long

which version of MSXML do i need to download for access 2007????
 
I am shortly facing the same problem as OP: need to import related data, and have just discovered, that Access has no clue about the relations (if that is based on the node structure of the XML-file by itself).

Albert D. Kallal has a very interesting idea for dealing with this, outlined in his comments here: http://www.accessmonster.com/Uwe/Forum.aspx/access/126534/Import-XML-to-Access

His idea is to use that what Access does do well, and avoid wrting code for parsing the XML . Either split the XML in small lumps, ie. one parent record and only the children of that parent, so that we know that they are related, or alternatively tag each parent in the xml, and add the parent's tag to each child. In the latter case the imported child table would then contain the FK to link the children to the parent.
 
which version of MSXML do i need to download for access 2007????

any - (ie just accept the latest)

it may very well already be included in your library references. look for microsoft xml.

lots of help on line
 
I didn't notice your last post.

In addition to gemma-the-husky's advice, perhaps the following link might give you some pointers:

Upload a sample XML file and (if I have time later) I will explore the import options available in Access.

In the link you sent, the tables mentioned have a unique identifier PRBLM_ID. But i do not have any unique identifier. Hence is it possible to import the data as GEMMA-the-husky because in the net I ahev read that MSXML comes with MS Office 2007 but when i run the code is is throwing the error

user defined type MSXML.DOMDocument not defined.
 
Is this a one-off exercise? If so, I would simply read the XML into word ( as TEXT file, not XML!), make a macro that adds the RollNumber to each child's data, and save the file. Then when you import it, each record in each child table is tagged with the parent's id. If not one-off, I'd look at Kallal's code in the link.
 
In the link you sent, the tables mentioned have a unique identifier PRBLM_ID. But i do not have any unique identifier. Hence is it possible to import the data as GEMMA-the-husky because in the net I ahev read that MSXML comes with MS Office 2007 but when i run the code is is throwing the error

user defined type MSXML.DOMDocument not defined.
gemma-the-husky will be able to advise you about his approach.

Upload a sample text file.
 
i still think it comes back to the point that XML is not quite the panacea it was supposed to be, and that handling XML files is very difficult.

[edit
just to take this further - the fact that an xml file contains it's own definition is only helpful to automate processing the file, if you have foreknowledge of that structure]

becuase they carry so much text (descriptive tags) they are also far bigger than an equivalent csv file for instance.

without reading through the thread, Albert Kallal's idea of splitting the XML ionto sub records, is surely only applicable for "simple" xml files. if the xml files themselves include sub-structures, then you face the same problem within each substructure. .. so you bring yourself back to a recursive algortihm.

Trying to amend xml files by modfiying the text must be an enormous exercise in itself, as it involves reading and writing potentially large files.

the truth is that if access cannot give you an automated way of importing the data in a "normalised manner", the only way, it seems to me, is to do this yourself.

- you can do this with an XSLT file if you are proficient, although I think you need to be pretty expert in XML to do this, from what I have seen

- if not then examine the structure in code, and manually deal with each node (tag-pair) within the structure.


[edit again

[i deleted some comments here - as I had misread the structure of the XML file]

[The positive side of doing whichever solution you decide to use, is that once you have constructed the tree-walker, or the XML file cleaner, etc - you will find the code is generally re-usable]


[with regard to the MSXML library, you need to tick one of the examples (i would use the highest one you can see), to include the library in your project.]
 
Last edited:
Trying to amend xml files by modfiying the text must be an enormous exercise in itself, as it involves reading and writing potentially large files.
I would normally do this with a scripting language, e.g. VBScript or Perl (if it's installed on the host machine).
 
The file presented thus far is simple. But yes, maybe something else is hiding there - only OP knows. And Word can gobble up substantial sizes, and then you have a programming environment that is easy to access, and a macro is no biggy. But of course, I dont't know the size of the file or how often the process is to be repeated.

So which method is the best suited depends on the exact circumstances, presumably shortly to be revealed :)

Parsing the XML in code has the huge advantage that you can stuff the data where it belongs. Looking at the records we can see that eg the student's choices are listed as separate columns, which, if using built-in import, will require some additional postprocessing to normalise.
 
Last edited:
vbaInet: Nay, I just look at the sample in #7.And read the text in #1, implying that there is a bunch of different children for each parent, but no more levels than that.
 

Users who are viewing this thread

Back
Top Bottom