manu
08-22-2001, 06:30 AM
Is there a utility which can select records using "nth select" so an even representation can be selected from a given data set
|
View Full Version : Nth select manu 08-22-2001, 06:30 AM Is there a utility which can select records using "nth select" so an even representation can be selected from a given data set Drew 08-22-2001, 06:34 AM hi manu, you need to use a syntax like "SELECT TOP n myTable.ID..." in your query, HTH Drew [This message has been edited by Drew (edited 08-22-2001).] manu 08-22-2001, 06:37 AM Thanx for your prompt reply Drew I am a complete rookie when it comes to familiarity with expressions etc.. May I take the Liberty of requesting a walk through your idea and help in creating an expression as suggested by you. Thanks for your time and patience my friend Drew 08-22-2001, 06:50 AM Of course. Having looked at your question again i'm not sure my answer was really what you're after. It will return you the top so many records rather than a random selection. If that's what you're after then: Assuming you have your query written which pulls all the data that you want a selection of, go into SQL view ( View | SQL View ) and just add 'TOP n' without the '' after SELECT and before th first field name. eg instead of "SELECT myTable.ID..." you would now have "SELECT TOP n myTable.ID...". If you wanted a random selection then post back, i'm sure i have some code here somewhere to do that. OR are you after a way of selecting a representation of the 'average' record within the dataset? Drew [This message has been edited by Drew (edited 08-22-2001).] manu 08-22-2001, 07:02 AM Thank You Yes, Random selection is what I am after. As an e.g. "I have a dataset of 1000 records and would like to select 200 records. To achieve even representation across the entire dataset, I should pick every 5th record from my dataset (1000/200 = 5) The utility/code that I am looking for basically would make me specify: 1. Total record count (e.g. 1000) 2. Sample size needed (e.g. 200) the code at this point will go figure the "nth value("5" in the above e.g.) subsequently, will return the sample dataset Apologies if I got a bit long winded there. Thanks, and all help is appreciated Manu The_Doc_Man 08-22-2001, 07:23 AM If I may butt in with a thought... Add a Yes/No field to your table. This is not very expensive to do. Make its name [Selected] or [Chosen] or [Picked] or something like these but that doesn't conflict with any of your other names. I'll used [Chosen] because it is a short name. Now write two update queries for your table. The first one should set every [Chosen] (or whatever its name is) to False. The second one should set [Chosen] to True with a criteria field of Rnd() < fraction where fraction is the decimal fractional representation of what percent of the table you want to pick. So if it is 1 out of 5, 1/5 = 0.20 so the criteria would be Rnd() < 0.20 Then, one last query (for all records with [Chosen] = True) gives you a random sample. Don't like the sample? Changed the table and want to re-run it? Just repeat the process. The Rnd function initializes itself from the system clock if you don't run a randomize, so this should produce a different sequence each time. This ought to work. Just make sure when you build the query that it doesn't try to use "Rnd" as a text value. Sometimes the expression builder doesn't quite get it right. But this should work even within a simple query. Drew 08-22-2001, 07:30 AM Okay, i'm off home shortly but will pick up again tomorrow. Where to go next depends on how you want to use the data - do you just want to print a report or do you want to be able to manipulate it further? Catch u later Drew <extra bit> I like Docs post above, but Rnd either needs to be randomized somehow or it will remain the same for the entire query ( is it actually called for every record?). I think it'll need to be run from code to work. Otherwise way better than what i was thinking of. Anyway, hometime for me, bye 4 now http://www.access-programmers.co.uk/ubb/smile.gif [This message has been edited by Drew (edited 08-22-2001).] manu 08-22-2001, 07:40 AM Have a Good Day Drew... The code for nth select should allow choosing all/selected fields for qualified records. So, the 200 records(post nth select)can include all/selected fields from the original dataset. Regards, Manu manu 08-22-2001, 08:06 AM Hi The_Doc_Man Thanks for your input, Tried your suggestion, Having trouble in 2nd update query with criteria:Rnd() < 0.20 Expression Builder defaults the function as Rnd(<number> ) < 0.20 Tried: 1.deleting the <number> Rnd() < 0.20 2.replace with record count Rnd(1000) < 0.20 where 1000 is the total record count of dataset neither of the above updates the table with True Any further suggestions?? Regards, Manu Drew 08-23-2001, 02:38 AM Morning Manu, Working with Docs idea above here's some code that will mark the records as selected Sub changeYN(bytPercSel As Byte) Dim rst As Recordset Dim dbs As Database Dim x As Integer Dim lngTotalRecords As Long Dim lngNumToSet As Long Set dbs = CurrentDb dbs.Execute "UPDATE org_employees SET chosen=False" Set rst = dbs.OpenRecordset("org_employees", dbOpenDynaset) Randomize rst.MoveLast lngTotalRecords = rst.RecordCount lngNumToSet = (bytPercSel * 0.01 * lngTotalRecords) Do Until x >= lngNumToSet With rst .MoveFirst .Move Int(lngTotalRecords * Rnd) If Not .Fields("chosen") Then .Edit .Fields("chosen") = True .Update x = x + 1 End If End With Loop End Sub Its DAO so if you're in 2K you'll either need to change it accoringly or add the references in, you may also want to change the recordset opening to something more sensible rather than grabbing the entire table, any problems post back HTH Drew However - this doesn't select every 5th record, it selects 1/5 of records ( if you pass in 20 as bytPercSel, thinking about it, you'll want to check that bytPercSel is less than 100 before it sets off... ) [This message has been edited by Drew (edited 08-23-2001).] v.khawaja 08-23-2001, 04:39 AM Or Private Sub cmdWhatEver() Dim rst As Recordset Dim dbs As Database Dim intCounter As Integer Set dbs = CurrentDb dbs.Execute "UPDATE tbltablename SET chosen=False" Set rst = dbs.OpenRecordset("tbltablename", dbOpenDynaset) intCounter = 1 rst.MoveFirst While Not rst.EOF If intCounter = 5 Then rst.Edit rst!chosen = True rst.Update intCounter = 1 End If intCounter = intCounter + 1 rst.MoveNext Wend End Sub manu 08-23-2001, 07:45 AM Good Morning Drew, Kindly read my e-mail message. Regards, Manu Drew 08-23-2001, 07:52 AM erm, bad news on that one i'm afraid - i broke my linux box the other day, i'm planning to do a fresh install tonight but may not have time to set everything up. Can u post here? manu 08-23-2001, 08:30 AM Thanks for your detailed response this morning, however, I will need some guidance as to: 1. Where should I insert this code 2. Next steps after code insertion 3. Your posted message talks to DAO...?? 4. What should I expect to see in my table after running this code/utility My apologies for these very basic/silly questions and all your time and patience is much appreciated. Regards, Manu Drew 08-23-2001, 08:58 AM you're welcome, they're not silly questions ( not to me anyhow, the only stupid question is the unasked one ). Okay:- 1/ In a module - just cut and paste. Doesn't matter if it's a new one or not, up to you. 2/ depends where you want to call it from. If you're just doing this db for yourself then you could just put Sub RunTheThing() changeYN 50 End Sub above the already pasted routine. The 50 here refers to what percentage of records will be 'selected' from the whole group, if you want 1 in 5 then 20 will do it etc etc You'll also need the change the references to my field and table names in there ( ie 'org_employees' and 'chosen' ) to whatever yours are called. 4/ If you've added the Chosen field to your table then x% of the records will be marked as true, the remaining 100-x% will be false still. You can then build any queries you want based on that field being true. Every time you run the code it will change all the records to false and randomly reselect some more. 3/You only need to worry about DAO/ADO if you're on Access 2000. I only have 97 so i'm not totaly sure of the changes that need to be made to make it a2k friendly. But if u are on 2k post back, there's plenty of people here that will know. The TLAs refer to Data Access Objects and ActiveX Data Objects - basically microsoft has changed the way it's databases present data to the outside world. I guess ADO is better(?) as it's good to work with in ASP but can't really comment in any deep way, If that doesn't help any please post back, Drew [This message has been edited by Drew (edited 08-23-2001).] manu 08-23-2001, 10:25 AM Here we go.. Code as entered in My Module: ********************************************* Sub Runthething() changeYN 20 Dim rst As Recordset Dim dbs As Database Dim x As Integer Dim lngTotalRecords As Long Dim lngNumToSet As Long Set dbs = CurrentDb dbs.Execute "UPDATE TA2 SET chosen=False" Set rst = dbs.OpenRecordset("TA2", dbOpenDynaset) Randomize rst.MoveLast lngTotalRecords = rst.RecordCount lngNumToSet = (bytPercSel * 0.01 * lngTotalRecords) Do Until x >= lngNumToSet With rst .MoveFirst .Move Int(lngTotalRecords * Rnd) If Not .Fields("chosen") Then .Edit .Fields("chosen") = True .Update x = x + 1 End If End With Loop End Sub ********************************************* Additional Info: Table Name - TA2 Yes/No Field - Chosen Receiving the following Compile Error: "COMPILE ERROR: SUB OR FUNCTION NOT DEFINED" Please advise further steps Manu manu 08-23-2001, 10:26 AM Oops, forgot to mention, I am on Access97 FYI Manu Drew 08-23-2001, 11:47 AM sorry, that wasn't very clear. You should end up with 2 seperate subs. One as Sub changeYN(bytPercSel As Byte) and one as Sub RunTheThing(). Then when you want to run it, put yourself into the Sub RunTheThing() one and either hit F5 or the play button on the top on the Visual Basic toolbar ( if you have it visible ). At the moment it's probably choking on the changeYN 20 line. Your changes to the table name looks fine and as your on 97 too you'll probably be okay once they're split back out into 2 routines again. Do post back if you get any more probs Drew manu 08-23-2001, 12:09 PM Beautiful.... Works like a charm, a comment though: The routine selects records randomly and not necessarily sequentially. Here's what I mean: Given Dataset - 1000 records Sample size needed - 250 A true Nth select routine should go and select every 4th(1000/250)record to create a truly representative sample. The current routine works on %, where in answer to the above situation, it will return the required 250 records but not necessarily every 4th record. Subsequent to running the routine, I went and looked at my table, records were duly selected and marked in the "chosen" field, occassionally though, as many as 4 records were chosen together before a skip to look for the next qualified record. I hope I am not beating a "dead horse" here Thanks for all your effort and patience Manu pcs 08-23-2001, 05:25 PM i'm kind of hesitant to weigh-in on this one and i'm prepared for the abuse from those members who are more statistically-inclined. http://www.access-programmers.co.uk/ubb/smile.gif but, ----you wrote--- A true Nth select routine should go and select every 4th(1000/250)record to create a truly representative sample. ---------------- i think if you want a purely random dataset selecting the nth record is a problem as it imposes an 'order' on the data. think that getting a random % of the total record count (which you now have) is a better sample... hth, al Drew 08-24-2001, 12:28 AM i was thinking random was what you were after and i would have thought it more statistically valid but i'm not a statistician either. v.khawaja's post above will pick you every 5th record out rather than a random selection. The code i wrote could select the first 200 records and no others, that's the way it goes. If you need any help getting v.khawaja 's code to work post again HTH Drew manu 08-24-2001, 05:08 AM A Huge Thank You to all for reaching out and assisting. I now have 2 codes which select either 1. Randomly 2. Sequentially This is awesome, makes my day Special thanks to Drew for staying with me on this... Best Regards, Manu Drew 08-24-2001, 05:25 AM pleasure, it's nice to write some code to do something i've not done before, especialy when it does something useful for someone. And a special thanks from me to The_Doc _Man, without whom we'd now be knee deep in queries and functions, thanks mate Drew [This message has been edited by Drew (edited 08-24-2001).] manu 12-22-2003, 05:41 AM Both codes [outlined in the attached message string] have worked very well for me as long as I was using Access97, however, I recently upgraded to Access 2000 and they don't seem to work anymore. Can someone help identify and correct the problem? All help is much appreciated. Regards, Manu Attached is a screenshot of what I'm getting when using either of the codes in Access 2000 namliam 12-22-2003, 05:45 AM What where how when ? manu 12-22-2003, 05:55 AM Sub Runthething() changeYN 96 End Sub Sub changeYN(bytpercSel As Byte) Dim rst As Recordset Dim dbs As Database Dim x As Integer Dim lngTotalRecords As Long Dim lngNumToSet As Long Set dbs = CurrentDb dbs.Execute "UPDATE CEON_AGE SET chosen=False" Set rst = dbs.OpenRecordset("CEON_AGE", dbOpenDynaset) Randomize rst.MoveLast lngTotalRecords = rst.RecordCount lngNumToSet = (bytpercSel * 0.01 * lngTotalRecords) Do Until x >= lngNumToSet With rst .MoveFirst .Move Int(lngTotalRecords * Rnd) If Not .Fields("chosen") Then .Edit .Fields("chosen") = True .Update x = x + 1 End If End With ************************************************** ** Private Sub cmdWhatEver() Dim rst As Recordset Dim dbs As Database Dim intCounter As Integer Set dbs = CurrentDb dbs.Execute "UPDATE CEON_AGE SET chosen=False" Set rst = dbs.OpenRecordset("CEON_AGE", dbOpenDynaset) intCounter = 1 rst.MoveFirst While Not rst.EOF If intCounter = 20 Then rst.Edit rst!chosen = True rst.Update intCounter = 1 End If intCounter = intCounter + 1 rst.MoveNext Wend End Sub ************************************************** ** Trying to run either of the 2 results in: 1. Compile Error "User-Defined Type not defined 2. Yellow Highlights "Sub changeYN(bytpercSel As Byte)" All help is appreciated.... Manu manu 12-22-2003, 06:01 AM Here's the screenshot Rich 12-22-2003, 06:25 AM Dim dbs As DAO Database manu 12-22-2003, 06:59 AM Attached is screenshot indicating your suggested syntax, however, it gives a compile error.... Manu namliam 12-23-2003, 12:31 AM 1) There is no X= 0 line.... Although this works its "Bad practice" Dim rst as DAO.Recordset Dim dbs as DAO.Database should relieve your troubles How ever make sure DAO is referenced...... (from a module in the menu Tools=>References) Regards namliam 12-23-2003, 12:56 AM Just thought i would give you another thought.... rnd give a number >= 0 and < 1 Thus your LAST record will NEVER EVER be selected!!!! Just a mind you..... Regards manu 12-23-2003, 07:23 AM Thank You Sir, Adding the DAO prefix is fine however, I'm not clear as to where in the code should I insert your suggested modifications rnd give a number >= 0 and < 1 There is no X= 0 line.... Although this works its "Bad practice" [PS: I did not author this code and neither am I a programmer....& I appreciate you bearing with me on this] Never being able to select the last record is something that I can live with. Regards, Manu namliam 12-23-2003, 11:16 PM just add the respective previxed at the respective dim lines in your code.... Regards |