SQL Hangs

ppines · Jan 28, 2025

GaP42 said:
Step 1: Identify and remove duplicates: in what sense are they duplicates?

See attached example, same line item repeated for the Student in the original Excel file.

ppines · Jan 28, 2025

GaP42 said:
A star schema might look like below, however I do not have the subject matter expertise to be sure about the placement of data in the tables (eg AdmissionID and UNSWStudentID (I took some liberty with some names)).

The dimension tables would generally contain every combination of the attributes logically possible, or at least that occur within your data set. Where an item can be attributed to a student in the 1131 course (a fact) - such as RAW ATAR score - then it should be placed there. I don't think it is applicable to the UAC table where you originally had it (for eg)

Note that the OtherEnrolledCourse table does not follow this pattern as there is an unknown number of other courses the student may have enrolled in. A query would need to run to obtain the course codes for OtherEnrolledCourses for say the first 10 sorted by course code or other filter.

UNSW Maths course would, on the basis you only are interested in Course 1131 holds only one record, which you could migrate into the UNSW_Maths table as four columns. It just gives a structure for analysis focussing on any course where the data has been extracted using the same rules.
Ignore/adjust the data type assignments
I hope this gives you a start to the design you need.

View attachment 118255

Thank you! I will use this to create the new database.

GaP42 · Jan 28, 2025

ppines said:
See attached example, same line item repeated for the Student in the original Excel file.

From the picture there are 6 duplicates of each of the 2 records shown, however the data set is much wider than shown. Are the columns shown the only ones of interest in determining a duplicate?

From within Excel:
To identify and retain a single record from a set of duplicates in a dataset, you can use data analysis tools like Excel's "Remove Duplicates" function, where you first select the columns to check for duplicates, then use the "Remove Duplicates" option to identify and remove all but one record from each duplicate set, effectively keeping only a single record per duplicate group.

Key steps:

Select the data range:
Highlight the columns in your dataset that you want to check for duplicates.
- In Excel: Go to the "Data" tab, then in Data Tools click "Remove Duplicates" icon.
Choose the columns to compare:
Select the specific columns that should be used to identify duplicates.
Confirm the operation:
Click "OK" to remove all duplicate records except for one from each duplicate group.

Important considerations:

Identifying the "correct" duplicate to keep:
If you need to keep a specific record within a duplicate set based on additional criteria (like the latest date or highest value in another column), you might need to use advanced filtering or sorting options before removing duplicates.
Data cleaning before analysis:
Always ensure your data is clean and consistent before identifying duplicates to avoid unintended results.

You might find your dataset is much smaller and amenable to manipulation in Access.
On import to Access assign an autonumber PK to each record.

Mark_ · Jan 28, 2025

From your data, it looks like you have 6 columns per record that cover "Semester". It looks like you can keep the enrolled term and delete the rest. Before doing this, make a list of the "Enrolled term" codes and have a separate list that tells you what each means. Cutting down the redundant data will help you focus on what you really need to work with, speed up processing, and reduce resource usage as you'll be dealing with much smaller files.

ppines · Feb 8, 2025

I have used Excel to clean the data (using 'advanced filter' and 'remove duplicates'), redesigned the database based on the start schema suggestion, and created a new query. At least this SQL runs! However, there is no data, so I could use some help with redesigning the query and/or schema.

Design

Code

SELECT [Student Profile].UNSW_MATHS_COURSE_CODE, Grade.GRADE_DESCR, Qualifications.SCHOOL_LEAVER_IND, [UNSW Maths].UNSW_MATHS_COURSE_CODE

FROM

[UNSW Maths] INNER JOIN (UAC INNER JOIN (Student INNER JOIN (Qualifications INNER JOIN ([High School] INNER JOIN (Grade INNER JOIN (Flags INNER JOIN (Enrolled INNER JOIN (Admit INNER JOIN [Student Profile]

ON Admit.ADMIT_ID = [Student Profile].FK_ADMIT_ID)

ON Enrolled.ENROLLED_ID = [Student Profile].FK_ENROLLED_ID)

ON Flags.FLAGS_ID = [Student Profile].FK_FLAGS_ID)

ON Grade.GRADE_ID = [Student Profile].FK_GRADE_ID)

ON [High School].HS_ID = [Student Profile].FK_HS_ID)

ON Qualifications.QUALS_ID = [Student Profile].FK_QUALS_ID)

ON Student.STUDENT_DEMO_ID = [Student Profile].FK_STUDENT_DEMO_ID)

ON UAC.UAC_ID = [Student Profile].FK_UAC_ID)

ON [UNSW Maths].MATHS_ID = [Student Profile].FK_MATHS_ID

WHERE

((([Student Profile].UNSW_MATHS_COURSE_CODE)='Math1131')

AND ((Grade.GRADE_DESCR)='Fail')

AND ((Qualifications.SCHOOL_LEAVER_IND)="School Leaver")

AND (([UNSW Maths].UNSW_MATHS_COURSE_CODE)='Math1131'));

Results

GaP42 · Feb 8, 2025

Try this - I do not think you need to use UNSW_Maths table - appears to be trying to retrieve course code twice. The focus is to id the students in the course, that are school leavers and failed the course.

Code:

SELECT [Student Profile].UNSW_MATHS_COURSE_CODE
    ,Grade.GRADE_DESCR
    ,Qualifications.SCHOOL_LEAVER_IND

SELECT [Student Profile].UNSW_MATHS_COURSE_CODE
    ,Grade.GRADE_DESCR
    ,Qualifications.SCHOOL_LEAVER_IND
FROM [Student Profile]
INNER JOIN (
    UAC INNER JOIN (
        Student INNER JOIN (
            Qualifications INNER JOIN (
                [High School] INNER JOIN (
                    Grade INNER JOIN (
                        Flags INNER JOIN (
                            Enrolled INNER JOIN (
                                Admit INNER JOIN [Student Profile] ON Admit.ADMIT_ID = [Student Profile].FK_ADMIT_ID
                                ) ON Enrolled.ENROLLED_ID = [Student Profile].FK_ENROLLED_ID
                            ) ON Flags.FLAGS_ID = [Student Profile].FK_FLAGS_ID
                        ) ON Grade.GRADE_ID = [Student Profile].FK_GRADE_ID
                    ) ON [High School].HS_ID = [Student Profile].FK_HS_ID
                ) ON Qualifications.QUALS_ID = [Student Profile].FK_QUALS_ID
            ) ON Student.STUDENT_DEMO_ID = [Student Profile].FK_STUDENT_DEMO_ID
        ) ON UAC.UAC_ID = [Student Profile].FK_UAC_ID
    ) ON [UNSW Maths].MATHS_ID = [Student Profile].FK_MATHS_ID
WHERE [Student Profile].UNSW_MATHS_COURSE_CODE = 'Math1131'
    AND Grade.GRADE_DESCR = 'Fail'
    AND Qualifications.SCHOOL_LEAVER_IND = "School Leaver";

arnelgp · Feb 8, 2025

upload your db for anyone to see.
creating unnecessary Joins will make the query slow.

GaP42 · Feb 8, 2025

arnelgp said:
upload your db for anyone to see.
creating unnecessary Joins will make the query slow.

Yes - all the inner joins are unnecessary if the desired Student Profile records required always contain valid FK references. Any Student Profile records that do not match to the other table on the FK (eg a null FK in Student Profile) are excluded in the result.
Using a Left join for these may show/reveal different counts of records retrieved, or the OP might investigate the occurrence of null foreign keys in the Student Profile table to review data quality and make ensure that nulls for some FK in the Student Profile table are acceptable or need amendment (through the import/assignment process).

ppines · Feb 10, 2025

When I make this amendment I receive the following message: "Syntax error in FROM Clause."

GaP42 said:

Try this - I do not think you need to use UNSW_Maths table - appears to be trying to retrieve course code twice. The focus is to id the students in the course, that are school leavers and failed the course.

Code:

SELECT [Student Profile].UNSW_MATHS_COURSE_CODE
    ,Grade.GRADE_DESCR
    ,Qualifications.SCHOOL_LEAVER_IND

SELECT [Student Profile].UNSW_MATHS_COURSE_CODE
    ,Grade.GRADE_DESCR
    ,Qualifications.SCHOOL_LEAVER_IND
FROM [Student Profile]
INNER JOIN (
    UAC INNER JOIN (
        Student INNER JOIN (
            Qualifications INNER JOIN (
                [High School] INNER JOIN (
                    Grade INNER JOIN (
                        Flags INNER JOIN (
                            Enrolled INNER JOIN (
                                Admit INNER JOIN [Student Profile] ON Admit.ADMIT_ID = [Student Profile].FK_ADMIT_ID
                                ) ON Enrolled.ENROLLED_ID = [Student Profile].FK_ENROLLED_ID
                            ) ON Flags.FLAGS_ID = [Student Profile].FK_FLAGS_ID
                        ) ON Grade.GRADE_ID = [Student Profile].FK_GRADE_ID
                    ) ON [High School].HS_ID = [Student Profile].FK_HS_ID
                ) ON Qualifications.QUALS_ID = [Student Profile].FK_QUALS_ID
            ) ON Student.STUDENT_DEMO_ID = [Student Profile].FK_STUDENT_DEMO_ID
        ) ON UAC.UAC_ID = [Student Profile].FK_UAC_ID
    ) ON [UNSW Maths].MATHS_ID = [Student Profile].FK_MATHS_ID
WHERE [Student Profile].UNSW_MATHS_COURSE_CODE = 'Math1131'
    AND Grade.GRADE_DESCR = 'Fail'
    AND Qualifications.SCHOOL_LEAVER_IND = "School Leaver";

GaP42 · Feb 10, 2025

No syntax errors are reported by Poor SQL (a useful online tool you might use) for the supplied SQL, however I made some adjustments - without change to the inner join clauses - see this code below (now Amended) . Also if you require review of SQL syntax: W3School - SQL - Inner Join

Found Admit ID inner join was using Student Profile on both sides of the join (which would not make much sense) due to my change to reference Student Profile - changed that particular reference to the Admit table.

The WHERE clause in your original SQL references values in other tables, however each of these appears in your fact table: Student Profile. I presume these are populated and can be used to select the records, rather than the values in the dimension tables. Note that if these values are in fact duplicated (accurately) in the dimension tables then they do not need to be in the fact table.

Possible typo in translation to your query? Possible typo in the references in the FROM clauses - table names, field names?

You could, if below continues to return a syntax error, remove all the inner joins. The SQL will then be a simple query of your central "fact" table, returning all records meeting where criteria. Introduce the inner join clauses progressively. Presumably later you will retrieve descriptive terms from the inner join tables to help interpret the output needed.

Code:

SELECT SP.UNSW_MATHS_COURSE_CODE AS CC
    ,SP.Grade.GRADE_DESCR AS GD
    ,SP.Qualifications.SCHOOL_LEAVER_IND AS SL
FROM [Student Profile] AS SP
INNER JOIN (
    UAC INNER JOIN (
        Student INNER JOIN (
            Qualifications INNER JOIN (
                [High School] INNER JOIN (
                    Grade INNER JOIN (
                        Flags INNER JOIN (
                            Enrolled INNER JOIN (
                                Admit INNER JOIN SP ON Admit.ADMIT_ID = SP.FK_ADMIT_ID
                                ) ON Enrolled.ENROLLED_ID = SP.FK_ENROLLED_ID
                            ) ON Flags.FLAGS_ID = SP.FK_FLAGS_ID
                        ) ON Grade.GRADE_ID = SP.FK_GRADE_ID
                    ) ON [High School].HS_ID = SP.FK_HS_ID
                ) ON Qualifications.QUALS_ID = SP.FK_QUALS_ID
            ) ON Student.STUDENT_DEMO_ID = SP.FK_STUDENT_DEMO_ID
        ) ON UAC.UAC_ID = SP.FK_UAC_ID
    ) ON [UNSW Maths].MATHS_ID = SP.FK_MATHS_ID
WHERE CC = "Math1131"
    AND GD = "Fail"
    AND SL = "School Leaver";

SQL Hangs

ppines

New member

Attachments

ppines

New member

GaP42

Active member

Mark_

Longboard on the internet

ppines

New member

GaP42

Active member

arnelgp

..forever waiting... waiting for jellybean!

GaP42

Active member

ppines

New member

GaP42

Active member

Similar threads

Users who are viewing this thread