See attached example, same line item repeated for the Student in the original Excel file.Step 1: Identify and remove duplicates: in what sense are they duplicates?
See attached example, same line item repeated for the Student in the original Excel file.Step 1: Identify and remove duplicates: in what sense are they duplicates?
Thank you! I will use this to create the new database.A star schema might look like below, however I do not have the subject matter expertise to be sure about the placement of data in the tables (eg AdmissionID and UNSWStudentID (I took some liberty with some names)).
The dimension tables would generally contain every combination of the attributes logically possible, or at least that occur within your data set. Where an item can be attributed to a student in the 1131 course (a fact) - such as RAW ATAR score - then it should be placed there. I don't think it is applicable to the UAC table where you originally had it (for eg)
Note that the OtherEnrolledCourse table does not follow this pattern as there is an unknown number of other courses the student may have enrolled in. A query would need to run to obtain the course codes for OtherEnrolledCourses for say the first 10 sorted by course code or other filter.
UNSW Maths course would, on the basis you only are interested in Course 1131 holds only one record, which you could migrate into the UNSW_Maths table as four columns. It just gives a structure for analysis focussing on any course where the data has been extracted using the same rules.
Ignore/adjust the data type assignments
I hope this gives you a start to the design you need.
View attachment 118255
From the picture there are 6 duplicates of each of the 2 records shown, however the data set is much wider than shown. Are the columns shown the only ones of interest in determining a duplicate?See attached example, same line item repeated for the Student in the original Excel file.
SELECT [Student Profile].UNSW_MATHS_COURSE_CODE
,Grade.GRADE_DESCR
,Qualifications.SCHOOL_LEAVER_IND
SELECT [Student Profile].UNSW_MATHS_COURSE_CODE
,Grade.GRADE_DESCR
,Qualifications.SCHOOL_LEAVER_IND
FROM [Student Profile]
INNER JOIN (
UAC INNER JOIN (
Student INNER JOIN (
Qualifications INNER JOIN (
[High School] INNER JOIN (
Grade INNER JOIN (
Flags INNER JOIN (
Enrolled INNER JOIN (
Admit INNER JOIN [Student Profile] ON Admit.ADMIT_ID = [Student Profile].FK_ADMIT_ID
) ON Enrolled.ENROLLED_ID = [Student Profile].FK_ENROLLED_ID
) ON Flags.FLAGS_ID = [Student Profile].FK_FLAGS_ID
) ON Grade.GRADE_ID = [Student Profile].FK_GRADE_ID
) ON [High School].HS_ID = [Student Profile].FK_HS_ID
) ON Qualifications.QUALS_ID = [Student Profile].FK_QUALS_ID
) ON Student.STUDENT_DEMO_ID = [Student Profile].FK_STUDENT_DEMO_ID
) ON UAC.UAC_ID = [Student Profile].FK_UAC_ID
) ON [UNSW Maths].MATHS_ID = [Student Profile].FK_MATHS_ID
WHERE [Student Profile].UNSW_MATHS_COURSE_CODE = 'Math1131'
AND Grade.GRADE_DESCR = 'Fail'
AND Qualifications.SCHOOL_LEAVER_IND = "School Leaver";
Yes - all the inner joins are unnecessary if the desired Student Profile records required always contain valid FK references. Any Student Profile records that do not match to the other table on the FK (eg a null FK in Student Profile) are excluded in the result.upload your db for anyone to see.
creating unnecessary Joins will make the query slow.
Try this - I do not think you need to use UNSW_Maths table - appears to be trying to retrieve course code twice. The focus is to id the students in the course, that are school leavers and failed the course.
Code:SELECT [Student Profile].UNSW_MATHS_COURSE_CODE ,Grade.GRADE_DESCR ,Qualifications.SCHOOL_LEAVER_IND SELECT [Student Profile].UNSW_MATHS_COURSE_CODE ,Grade.GRADE_DESCR ,Qualifications.SCHOOL_LEAVER_IND FROM [Student Profile] INNER JOIN ( UAC INNER JOIN ( Student INNER JOIN ( Qualifications INNER JOIN ( [High School] INNER JOIN ( Grade INNER JOIN ( Flags INNER JOIN ( Enrolled INNER JOIN ( Admit INNER JOIN [Student Profile] ON Admit.ADMIT_ID = [Student Profile].FK_ADMIT_ID ) ON Enrolled.ENROLLED_ID = [Student Profile].FK_ENROLLED_ID ) ON Flags.FLAGS_ID = [Student Profile].FK_FLAGS_ID ) ON Grade.GRADE_ID = [Student Profile].FK_GRADE_ID ) ON [High School].HS_ID = [Student Profile].FK_HS_ID ) ON Qualifications.QUALS_ID = [Student Profile].FK_QUALS_ID ) ON Student.STUDENT_DEMO_ID = [Student Profile].FK_STUDENT_DEMO_ID ) ON UAC.UAC_ID = [Student Profile].FK_UAC_ID ) ON [UNSW Maths].MATHS_ID = [Student Profile].FK_MATHS_ID WHERE [Student Profile].UNSW_MATHS_COURSE_CODE = 'Math1131' AND Grade.GRADE_DESCR = 'Fail' AND Qualifications.SCHOOL_LEAVER_IND = "School Leaver";
SELECT SP.UNSW_MATHS_COURSE_CODE AS CC
,SP.Grade.GRADE_DESCR AS GD
,SP.Qualifications.SCHOOL_LEAVER_IND AS SL
FROM [Student Profile] AS SP
INNER JOIN (
UAC INNER JOIN (
Student INNER JOIN (
Qualifications INNER JOIN (
[High School] INNER JOIN (
Grade INNER JOIN (
Flags INNER JOIN (
Enrolled INNER JOIN (
Admit INNER JOIN SP ON Admit.ADMIT_ID = SP.FK_ADMIT_ID
) ON Enrolled.ENROLLED_ID = SP.FK_ENROLLED_ID
) ON Flags.FLAGS_ID = SP.FK_FLAGS_ID
) ON Grade.GRADE_ID = SP.FK_GRADE_ID
) ON [High School].HS_ID = SP.FK_HS_ID
) ON Qualifications.QUALS_ID = SP.FK_QUALS_ID
) ON Student.STUDENT_DEMO_ID = SP.FK_STUDENT_DEMO_ID
) ON UAC.UAC_ID = SP.FK_UAC_ID
) ON [UNSW Maths].MATHS_ID = SP.FK_MATHS_ID
WHERE CC = "Math1131"
AND GD = "Fail"
AND SL = "School Leaver";