Hey guys, Im currently studying in my third year of a computing degree, and as part of my final year dissertation i have decided to use database software that would predict the possibility of students passing the course based on their first year results. My idea was to produce a database that would hold students information and course information, course module information and also hold the students results in that particular module, these would be the tables (Im fine with this part). I need help with the predictive part, any ideas on how I could do this iv researched possibly connecting my database to external software such as sql or maybe connecting it to programming software such as Visual Basic, but before I actually do anything I needed to be sure so if anybody has any idea that would be great.
What are you looking for exactly? A specific methodology? I think only a person familiar with program could do that. I can provide a high level idea of how I would accomplish it.
You know what data you have, you know what the output will be. The trick now is figuring out the recipe and how to mix your ingredients. I'd make a list of components that are indicative of success in the course. Maybe that's attendance, maybe its test scores, maybe its overall module grades. After that list is complete I'd rank how indicative each of those components are of overall success.
Then I'd go through the process of assigning each component a percentage of overall importance so that the overall importance adds up to 100%. From there I'd look at each component and try and create a scoring system (from 0-100) such that you can determine a sub-score for each component using the data you have. Once you can determine each sub-score you simply add everything up:
okay, that all makes sense, obviously i would rank overall grade as highest importance, I understand the technique you just explained, so would all that work in access then? I was convinced that i would need to use some additional software because i couldnt think how i could possibly do it all just in access. Where would i store these components anyway, would they be stored within the tables or would it involve using a seperate tables for just the components?
There are two issues here: a minor one of storing/handling data and the major one of creating a predictive model. Access is good at storing and retrieving data. During development of the model you may wish to use Excel, as Excel is much better at playing with numbers. Once you have your model, you can write it up it in VBA, and use Access to run it, and to manage data and reports.
obviously i would rank overall grade as highest importance
There is nothing obvious about this. Intuitively perhaps, but mathematically no. You'd need to construct a predictive model based on past data, see eg. http://en.wikipedia.org/wiki/Multiple_correlation. This involves postulating the relevant variables, and checking how large a role - your "ranking" - they actually play. Then there is the issue of evaluating whether your variables truly are independent, otherwise you'll get cross-correlation issues that need to be catered for.
You could do it all in text files and javascript. VBA would be sufficient to do any programming work you would need on this. I'd start with spike's suggestion of hammering out the system in excel then building the logic you determine you want in VBA.
Grab data for 5 people--a horrible student, a bad student, an average student, a good student and an excellent student. Build your system using one of the student's data. Then run the data for the other 4 using that system to see where they get spit out at. If the horrible student has a greater predictive score than the average student then you know your system needs some tweaking. Keep tweaking until the system spits out scores that seem correct for your 5 students.
Both Spikepl and plog are pointing you to the right direction. In short – they are asking you to first analyse the available data, and see if you can work our some pattern/correlation/algorithm.
I want to add to the suggestion that: if by magic somehow you can get hold of a large bunch of old student data (e.g. last few years) of different parameters, e.g.:
Code:
Student ID (anonymous)
Nationality
Age
Sex
Course Info
Module1
Total number of Attendance
Assignment1 Grade
Assignment2 Grade
...
Lab1 Grade
Lab2 Grade
...
Exam1 Grade
Exam2 Grade
...
Final Exam Grade
PassFail
You can put them all in an Access table or Excel. Now spotting the relationship between pass or fail with that many variables is sometime impossible with human eyes/brain. So if you are unable to figure out a model based on the above parameters, then your need a self-learning algorithm based on the data that you have, and let it works it out itself. Neutral Networks (NN) are good at this kind of situation because
a) you have the historic input data
b) you know the output data (pass/fail – historically)
You can use all the past data to train the NN program to distinguish between a pass or fail. The NN will adjust itself as it goes along, assigning “weights” and “correlations” from the inputs.
Once training is done then you can feed it some random old data again to check if the NN’s guesses are correct, or maybe it needs fine tuning or more variables.
You should have no problem in finding resources about NN on the internet. Most electronic engineering students are taught with applying this method, so you might like to have a word with them and seek collabrations/ideas. I recently came across with the following link and it was extremely useful.
There is an example that explained detecting cancer patients, and it brings light to the subject as to how multi-variable, multi-space problems can be solved by using NN.
But there are many more parameters that may be of predictive significance: address (ie location), marital status and changes therein, kids (or not), funding (grants, selfsponsored, family) ... etc etc. So if this model is to actually have a predictive strength then that's no small task, me thinks
If this is an excercise in creating a DB-application, then you could include some more fun factors: Zodiac-sign, number of letters in the first name, hyphenated (or not) surname, shoe size , height, etc, and see how all that correlates