SQL Hangs

ppines

New member
Local time
Today, 11:21
Joined
Oct 15, 2012
Messages
13
The query runs; however, it’s a big database and tends to hang:
1737691003655.png


Design
1737691024394.png


Code

TRANSFORM Count(Numeracy_UAC.UAC_QUAL_SUBJECTS) AS CountOfUAC_QUAL_SUBJECTS

SELECT Numeracy_Enrolled.ACADEMIC_YEAR, Numeracy_Enrolled.ENROLLED_TERM

FROM (((((((Numeracy_Student

INNER JOIN Numeracy_Enrolled ON Numeracy_Student.[STUDENT] = Numeracy_Enrolled.[STUDENT])

INNER JOIN Numeracy_Flags ON Numeracy_Student.[STUDENT] = Numeracy_Flags.[STUDENT])

INNER JOIN Numeracy_Grade ON Numeracy_Student.[STUDENT] = Numeracy_Grade.[STUDENT])

INNER JOIN [Numeracy_High School] ON Numeracy_Student.[STUDENT] = [Numeracy_High School].[STUDENT])

INNER JOIN Numeracy_Qualifications ON Numeracy_Student.[STUDENT] = Numeracy_Qualifications.[STUDENT])

INNER JOIN [Numeracy_UNSW Maths] ON Numeracy_Student.[STUDENT] = [Numeracy_UNSW Maths].[STUDENT])

INNER JOIN Numeracy_Admit ON Numeracy_Student.[STUDENT] = Numeracy_Admit.[STUDENT])

INNER JOIN Numeracy_UAC ON Numeracy_Student.STUDENT = Numeracy_UAC.STUDENT

WHERE (((Numeracy_Grade.GRADE_DESCR)='Fail')

AND ((Numeracy_Qualifications.SCHOOL_LEAVER_IND)='School Leaver')

AND (([Numeracy_UNSW Maths].UNSW_MATHS_COURSE_CODE)='MATH1131'))

GROUP BY Numeracy_Admit.ADMIT_TERM,

Numeracy_Enrolled.ACADEMIC_YEAR,

Numeracy_Enrolled.ENROLLED_TERM,

Numeracy_Enrolled.ENROLLED_TERM_DESCR,

Numeracy_Flags.HS_YEAR11_FLAG,

Numeracy_Flags.HS_YEAR12_FLAG,

Numeracy_Grade.GRADE_POINTS,

[Numeracy_High School].HIGH_SCHOOL,

[Numeracy_High School].SELECTIVE_SCHOOL_FLAG,

[Numeracy_High School].GATEWAY_SCHOOL,

[Numeracy_UNSW Maths].UNSW_MATHS_COURSE_CODE

PIVOT Numeracy_Student.STUDENT;



I then created another query to delete duplicates before running the first one:

DELETE FROM Numeracy_UAC

WHERE ID NOT IN

( SELECT MIN(ID)

FROM Numeracy_UAC

GROUP BY STUDENT, UAC_QUAL_SUBJECTS );



But this also hangs. The database is 1,622 MB. Do you have any ideas on how to cleanse the data so I can run the query?
 
I don’t believe a query with Group By will ever allow editing.
 
I don’t believe a query with Group By will ever allow editing.
I used Crosstab which requires Group By. I am not sure how else to go about it.
 
A database file 1.6Gb in size will start to behave badly once you start running queries on the largest tables, as it will be attempting to create temporary objects during that query process that will make it approach the 2Gb file size limit.

I used Crosstab which requires Group By. I am not sure how else to go about it.
You used a group by in your delete query, that won't work.

How many records are in the larger tables?
If you only need a subset of those create a query to restrict the initial recordset first then use that as a Basis for the cross tab.
 
A database file 1.6Gb in size will start to behave badly once you start running queries on the largest tables, as it will be attempting to create temporary objects during that query process that will make it approach the 2Gb file size limit.


You used a group by in your delete query, that won't work.

How many records are in the larger tables?
If you only need a subset of those create a query to restrict the initial recordset first then use that as a Basis for the cross tab.
There are 913,300 records. I want to find all the duplicates and delete them first, then complete the TRANSFORM. If I remove the GROUP BY statement will it work?

DELETE FROM Numeracy_UAC

WHERE ID NOT IN

( SELECT MIN(ID)

FROM Numeracy_UAC;


If not, how could I amend the code to remove the duplicates?
 
#5 try your latest delete query in a test or copy.
 
This query is never going to be updateable. It is a Cartesian Product. which is tbl1 * tbl2 * tbl3 * tbl4 * tbl5 * tbl6 * tbl7. None of the selected tables are related to each other. They are only related to the student table. What you have are SEVEN parallel queries.

Think of what you have in simplistic terms. Students, Courses, Pets. Both Courses and Pets are related to students but not to each other so you end up with:
Suzie, Math, Cat
Suzie, Math, Dog
Suzie, English, Cat
Suzie, English, Dog
Dave, Geography, Dog
Dave, Geography, Fish
Dave, Math, Dog
Dave, Math, Fish

The resultset makes no sense and looks like it has duplicates but it doesn't.

I'm sure you don't want to hear this but the schema is incorrect also. You have all the other tables joining to a data field and you do not have RI enforced. To make a proper relationship, you need to be joining to the primary key of the parent table. If you have a unique index on Student in Numercy_Student, joining student to one of the 7 other tables would be updateable. Also, Autonumbers should NOT be added to tables UNLESS they are used as the primary key. There is an old bug which may or may not be fixed which causes the Autonumber to lose its seed and start generating duplicate IDs under some conditions. And finally, not that it matters in your case, naming all the PKs "ID" just leads to confusion once you begin to use them correctly.
 
Last edited:
I'm not sure that is the case simply from joining to 7 different tables from the main table, albeit I would prefer left joins unless there is a real need to filter the data.
 
It depends a lot on the cardinality between the tables. If the main table-to-smaller-table relationship is 1:1, there's no problem with all those left joins. I do it all the time when joining from Patients over to demographic information of all types
 
Access has a per-table limit of 2 GB (minus file overhead) and a per-file limit of 2 GB (including file overhead). With a DB size of 1.62 GB, you are already at 80% of capacity before you start. When you attempt to build a query on something that big, the intermediate space requirements for queries have to be taken into consideration. If I read the symptoms correctly, you have essentially hit a brick wall that you will not easily knock down. Your code is hanging because Access is asking Windows to allocate more work space in the middle of a query but there is no place left to put it because of Access structural limits. I'm surprised that Access doesn't crash and give you some sort of "virtual memory exceeded" error.

By itself, rewriting the query will not help you. Redesigning the tables will help you, perhaps. Rearranging the tables among multiple back ends might also help - but no promises unless you can clean up that structure.

Some of those tables appear to be in a one/one relationships and it also appears that your tables contain duplicate or extraneous data. For instance, several of your tables contain both ID and Student as fields. If those are auto-number ID fields, you only need a unique key on tables that are going to be parents of child tables. If the table IS a child table and there is no lower-level dependent table, you can remove those ID fields. Pat Hartman's comment (about using ID as a field name when you will re-use that name in several tables) is also important.

You list a field as "Student" but we can't tell much about it. If it is the text of the student's name, then the tables all repeat the student's name when, based on normalization rules, you shouldn't do that. If that [student] field is a numeric identifier (like a student number) then you don't have a student name at all unless it is hidden somewhere in a table we can't see.

If any of those tables that have both ID and Student as fields are unique to a single student (i.e. the [student] field makes them unique, they shouldn't be in separate tables anyway. Every one of those tables that has one and only one record that is unique to the student need to be combined (thus removing the student field entirely from the incoming tables). The tables that CAN have multiple records per single student DO belong separately, but I am highly suspicious that all of the tables shown in the relationship diagram are multi-record.
 
It depends a lot on the cardinality between the tables. If the main table-to-smaller-table relationship is 1:1, there's no problem with all those left joins. I do it all the time when joining from Patients over to demographic information of all types

True, no problem with the JOINs for 1/1 relationships - but other problems occur for 1/1 tables, which indirectly violate normalization rules.
 
From the look of the files, this is a flat file broken out into multiple tables?
What is your end goal?

If this is imported and you are trying to only bring in what you need, I would suggest going about this is a different way. If this is something else, please share what you are trying to accomplish.
 
What is the goal here with this application? What are you attempting to track? Are attempting to track students and what classes they attend? Are you wanting to track their grades? Ca you give us a hint as to what this application is going to be used for?

You most certainly will need a different design, but without knowing what you are attempting to keep track of, we can only make guesses as to what your design should look like.
 
The original Excel file was 277MB - 72 columns 912902 rows - and is riddled with duplicate records. I originally uploaded this file into Access and could not create the needed tables. I broke it into the required separate tables in Excel to load it into MS Access using Student as the Primary/Foreign Key.

The first goal is to remove the duplicates. The second goal is to use Student (this is a University Student ID) + Course Code = Math1131 + Enrolled Date (Year and Term) + Grade (Group, Description and Points) + High School, to profile students who have entered Math1131 for the last 10 years.

I need to aggregate their performance. The idea is to track university student performance, focusing on the ones who failed, vs assumed knowledge using high school math subjects and grades, and which high school they attended, over a 10-year period. We are trying to determine if there is a relationship between the student's high school math background and their level of preparedness for Math1131 - and whether they have the assumed knowledge (poorly prepared vs well prepared). Looking for correlation rather than causation.
 
Last edited:
You don't know which records are actual duplicates and which are not because your data is in spreadsheet format and not normalized into relational database tables. You need to create your design first and then worry about duplicates. Once your data is normalized and relationships created between the tables, actual duplicates will become much more clear when the data is appended to the proper tables.

I have worked on a very preliminary design which is shown below. Since I don't know what all the fields you had in your design mean, I didn't include them, but here is one design thought:
1738025142743.png

It is very preliminary but gives you some idea of how these kinds of applications work in ACCESS.
 
From your description, you should have been able to remove about 60 columns from your excel spreadsheet BEFORE you tried to import it.
 
Form what is being described at #14, the data is only for reporting so the OLTP normalised model is not applicable. This is essentially a data set that should be structured as a data mart - star schema or snowflake style schema are more applicable (for a RDBMS solution - after cleaning out duplicates. However the rules for de-duplication are not apparent.

As this is an analysis of university student enrolment data, there are some highly skilled people that use SASS or SPCC tools to data analytics, that use text files/ queries, in a flat file format. However if you are proceeding via the Access path the schema I expect should be focused on the UNSW_Maths entity as the central entity about which you will identify the students of interest - Enrolled in Math1311 - and determine the associated attributes of those enrolled students for further analysis - eg which schools/school types they came from, what UAC they achieved in their HSC, what grades they achieved as they progressed etc. The structure will then be able to be used to answer flexibly a range of questions associated with students enrolled in specified courses - not just Math1311.

Before expanding on this advice - and speculating about a suitable data mart schema - high-level confirmation of the situation described in the above para is needed in the first instance.
Populating the star / snowflake schema may not overcome the issues with the volume of data - see post #10.
 
Form what is being described at #14, the data is only for reporting so the OLTP normalised model is not applicable. This is essentially a data set that should be structured as a data mart - star schema or snowflake style schema are more applicable (for a RDBMS solution - after cleaning out duplicates. However the rules for de-duplication are not apparent.

As this is an analysis of university student enrolment data, there are some highly skilled people that use SASS or SPCC tools to data analytics, that use text files/ queries, in a flat file format. However if you are proceeding via the Access path the schema I expect should be focused on the UNSW_Maths entity as the central entity about which you will identify the students of interest - Enrolled in Math1311 - and determine the associated attributes of those enrolled students for further analysis - eg which schools/school types they came from, what UAC they achieved in their HSC, what grades they achieved as they progressed etc. The structure will then be able to be used to answer flexibly a range of questions associated with students enrolled in specified courses - not just Math1311.

Before expanding on this advice - and speculating about a suitable data mart schema - high-level confirmation of the situation described in the above para is needed in the first instance.
Populating the star / snowflake schema may not overcome the issues with the volume of data - see post #10.
I was asked to use MS Access by the Math faculty to clean the data and create the query. I don't know SASS or SPCC.

UNSW_Maths table has duplicates - it lists the student number, all course codes (but we are focusing on Math1131 only) and the course description. We are only focusing on this subject as it's a first-year subject that the students are failing repeatedly. And there are duplicates here too.

I suspect once I get this query sorted other faculties, like Engineering, will want to look at a similar query for their math-related subjects.

I think you may have hit on something important with the data mart idea. Although I am new to MS Access I am not sure how to accomplish this.
 
Last edited:
Step 1: Identify and remove duplicates: in what sense are they duplicates? As this is an extract of data / combined data from multiple queries of the uni enrolment system (possibly combined with data from UAC or external sources) then what combination of data will allow you to determine the duplicates? There might be several permutations. And therefore several steps you might follow thru.
Keep your original data set and operate on a copy. Assuming the Excel file is the source and is a single broad table of data you should progress through the import of the data in this cleansing/preparation step using staging tables - which you can repeat from another copy of the original file as needed. (given the volume of data it might mean staging in separate db).

Step 2: When cleaned / duplicates removed, data standardised / then the data is re-arranged into the structure determine to be the schema you need.
This requires creating the dimension tables by which you expect to slice and dice the data: eg the HighSchoolDim table - which appears to be largely what you have without the StudentID.

Step 3: Create the fact table and import data - the same number of records as in your last clean staging table - according the rules - so that foreign keys are assigned to the fact table data for each dimension of interest, and not importing the data that it in the dimension tables. The data in the dimension tables are not duplicated in the fact table.

There may be more subtlety to this if your schema needs to be snowflake schema - eg HighSchoolDim is linked to HighSchoolRegion (if this is considered of interest)

UNSW_Maths table has duplicates - it lists the student number, all course codes (but we are focusing on Math1131 only) and the course description. We are only focusing on this subject as it's a first-year subject that the students are failing repeatedly. And there are duplicates here too.
So is this all courses the student participated in, only their first-year subjects, only those that failed 1131? Duplicates only if the other courses they participated in are not considered a possible dimension/factor to the failure rate?
If successful and it is decided to extend to analysis of other courses then the replication of similar steps - preserved through staging - would be reapplied (although new cleansing issues will appear in new data). However I suspect you will continue to hit you head against the capacity limits of Access if you ever need to make a generic system.
 
Last edited:
for a starter, just rename your table and remove the "Numeracy_" thing from their name.
just name the db, "Numeracy".
next create a copy of this db (as backup in case anything get worse) and keep it on a safe place.
now concentrate on one subject, just Math1131, and the student's corresponding grade.
you remove unnecessary subjects, grades, etc.

also, just use Total query instead of crosstab since you are only filtering on 1 subject at a time.
 

Users who are viewing this thread

Back
Top Bottom