Lookup the value from a previous record (1 Viewer)

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
OK - you know where to come if you do need further support.
If its not directly related to the serialize function you may be better off starting a new thread and if necessary linking back to this one.
Good luck

Hello again Isladogs. Mind if I take you up on that offer? This isn't related to the serialize function but is completely in line with the original post I made.

Here goes... so I've used serialize to create a rank field for my data and I've now been looking at how I use this lookup a previous record in another field. However, I've realise things aren't as simple as I originally thought.

Here's the non-simplified version fo the background and output...

...ELO was a system devised by a famous chess player of the same name to measure the strength of his opponents. There are various versions but in mine a player starts with an ELO score of 1500. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. In a series of games between a high-rated player and a low-rated player, the high-rated player is expected to win more. If the high-rated player wins, then only a few rating points will be taken from the low-rated player. However, if the lower-rated player wins, many rating points will be transferred.

Now there are lots of complicated calculations to underpin this but the basic principle is that a player's ELO rating is determined by a formula based upon both players' ELO scores going into that match. If it helps then I've put together a small Excel sheet for Novak Djokovic - https://1drv.ms/x/s!AjGyJyJaARk2hf9JgTU3EMLIEAnZvg?e=orc1dQ. In cell K111 you can see that Djokovic starts with a score of 1500 and his opponent starts with a score of 1434 (cell S111). These two scores mean that Djokovic had a 59% chance of winning the match (which he did). Combining this probability with something called a 'K' factor (basically a number based on the number of previous matches played) means Djokovic has a new ELO score of 1553 (an increase as he won). This is then carried through to his next match in cell K110 and so on and so on. You can see the way his ELO score progresses from the first game in this dataset to his current score which is equal to Roger Federer in the final game. Note: I have kept the formulas in the Djokovic specific cells but I have to leave out formulas for the other players as this would require all their matches to be in the dataset and then all their opponents matches and all their opponents matches... etc etc.

I'm able to build a query that replicates nearly all of this Excel - the only columns I'm stuck on are K and S which are the starting ELO scores for player 1 and player 2. As per the previous posts I've used the serialise function to build a ranking system that I can use to find a previous ELO score for each player from another field. However, the big problem is that the previous record isn't in another table - it needs to be generated by the query itself as every ELO score is the output of the previous match. I've tried using a sub-query to lookup a field generated by the query but this errors (probably obviously!). I then thought about having two identical queries to reference each other but I don't think this is right as you still need one of them to generate the previous ELO score. Is there a simple SQL solution to this or will I maybe have to drastically think about the data structures I'm using?

I realise this it's a huge ask to read all this but I'm at my wit's end. Thanks in advance!!
 

isladogs

MVP / VIP
Local time
Today, 02:43
Joined
Jan 14, 2017
Messages
18,209
Hi Jossy
I understand the principle involved but, due to other commitments, I really haven't time to study the Excel file and attempt to provide a solution for you at the moment.
However, there are plenty of other forum members who may be willing to take this on.
As many forum members will not open files posted externally, I suggest you zip the Excel file and post it here
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
Hi Jossy
I understand the principle involved but, due to other commitments, I really haven't time to study the Excel file and attempt to provide a solution for you at the moment.
However, there are plenty of other forum members who may be willing to take this on.
As many forum members will not open files posted externally, I suggest you zip the Excel file and post it here

Completely understand and thanks for your help so far. Attached the file to this post.
 

Attachments

  • ELO Excel to Access - Djokovic.xlsx
    41.4 KB · Views: 133

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
not sure if I am in a position to offer any help. But the way a problem is solved in excel is frequently not the way you would solve it in a database.

For a start, data needs to be normalised in a database and Excel is generally as far from normalised as you can get.

Secondly the required output - Excel by it's nature ends up with lots of rows, lots of columns which you then have to parse/filter to find the information your require. Do you really need a query that presents all rows for all players? or are you only interested (at any one time) in the players in an upcoming match?

Thirdly, your explanation misses out the most important bit (and I'm not going to try to work it out from the spreadsheet) which is
These two scores mean that Djokovic had a 59% chance of winning the match (which he did). Combining this probability with something called a 'K' factor (basically a number based on the number of previous matches played)
Why is the P2 CurrentELO hardcoded? Why is the P2 K overall value hard coded?

Is there any relevance of the location - are you saying if player A plays player B the 'projected outcome' would be the same regardless on where they play
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
Thanks for picking up the mantle CJ.

Do you really need a query that presents all rows for all players? or are you only interested (at any one time) in the players in an upcoming match?

There will be two main outputs needed:

1. The ability to see the latest ELO rating for a given player. In this case it wouldn't be necessary to calculate all rows for all players. However, the way the calculation works it will probably be close to all records as you will need the scores of each of their opponents for all their previous matches. In order to get those you will need the scores of all the opponents' opponents and so on and so on.

2. A comparison of the accuracy of the forecast results to the actual results, e.g. did the model predict a winner with 75% accuracy when the player was calculated to have a 75% probability of winning? I will then add in a range of additional metrics and calculations to try and improve this accuracy, e.g. the number of points won in a match.

Why is the P2 CurrentELO hardcoded? Why is the P2 K overall value hard coded?

CurrentELO and K are only hardcoded for Djokovic's opponent. Otherwise you need all their matches in the dataset to calculate cELO and K, and then all their opponents' matches and so on. The file size would have been above the 2Mb limit.

Is there any relevance of the location - are you saying if player A plays player B the 'projected outcome' would be the same regardless on where they play

I assume by this you mean the name of the tournament? No - just a bad habit of always seeing it so adding it to the query! At a later point I will want to create a query to calculate ELO for a specific surface, e.g. grass or clay, but once I figure out the 'overall' method then I'm sure I can adapt it.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
I don't have a lot of time today, but first you need to normalise your data. Suggest the structure would be something like


tblLocations (not really required since it does not bear on the calculation but included for completeness)
LocationPK
Location
Name

tblGames
GamePK
LocationFK
GameDate


tblPlayers
PlayerPK
PlayerName
ELO

tblScores
ScorePK
GameFK
PlayerFK
Winner

you then need to populate tblScores for the earliest score for each player with an earliest gamedate (could be 1/1/1900) - unless all players default to a score of 1500

after that your query for a new game would be something like

Code:
SELECT W.PlayerFK, W.ELO, L.PlayerFK, L.ELO, doyourcalcshere
FROM tblPlayers W INNER JOIN (SELECT PlayerFK, Count(*) FROM tblScores GROUP BY PlayerFK) WL ON W.PlayerFK= WL.PlayerFK, tblPlayers L INNER JOIN (SELECT PlayerFK, Count(*) FROM tblScores GROUP BY PlayerFK) CL ON L.PlayerFK=CL.PlayerFK
WHERE W.P

to build your history you will need to populate tblGames with all the games, then using vba, loop through from earliest to latest to calculate the latest score and update the ELO value in tblPlayers
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
Thanks CJ - really do appreciate your help. The data is normalised - it's a well structured third party product. The Excel is a copy and paste from a query of a few tables - the structure of which is very similar to yours :)

So I need to go the VBA route you reckon and build a table...? Okay dokey - that might stretch my VBA abilities but I shall give it a go!

One quick question - I know calculated fields in tables are usually a no no but in this instance would it be best to create the non-ELO fields as calculated fields?These depend on, or input into the ELO scores, so they'll need to be created at the same time as the VBA loops through each line. Seems sensible to get Access to do the work for these rather than build the VBA?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
I know calculated fields in tables are usually a no no
they are limited in scope - and certainly can't do what you are look for
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
Ok thanks. Will work out the best way to do this.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
please find attached my attempt to help get you going. The data is based on your spreadsheet less a number of duplicate records such as these

16/11/2018 Nitto ATP Finals - London Novak Djokovic Marin Cilic
16/11/2018 Nitto ATP Finals - London Novak Djokovic Marin Cilic
16/11/2018 Nitto ATP Finals - London Novak Djokovic Marin Cilic
14/11/2018 Nitto ATP Finals - London Novak Djokovic Alexander Zverev
14/11/2018 Nitto ATP Finals - London Novak Djokovic Alexander Zverev
14/11/2018 Nitto ATP Finals - London Novak Djokovic Alexander Zverev
12/11/2018 Nitto ATP Finals - London Novak Djokovic John Isner
12/11/2018 Nitto ATP Finals - London Novak Djokovic John Isner
12/11/2018 Nitto ATP Finals - London Novak Djokovic John Isner

it is not complete by far but hopefully will point you in the right direction

you need to write a vba function to loop through the game dates (earliest first) and then an update query to update eloPlayers. Manually, when you open qryCalcNewELO you will be prompted for a date - enter 16/1/18 to replicate the last row of your spreadsheet. You can use the newELO values to base an update query to update ELO fields in the players table. Then run the query again for the next date (18/1/18). and so on. I would estimate it should only take a couple of seconds to run.

Some of your hard coded values do not relate to the formula. for example on the last row you have hardcoded the P2K value as 122.1 whereas the calculation returns 131.3

Best I can do for you - now need to focus on work that pays
 

Attachments

  • ELO.accdb
    948 KB · Views: 122

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
Really appreciate this CJ, I really do. I've been working on the VBA this afternoon and managed to find a useful post from "asc" about half way down this page: https://forum.punterslounge.com/topic/71276-elo-access-function/page/2/

Interestingly his data looks like it is formatted in the same way as my Excel, i.e. both the Winner and the Loser are on the same row. Do you think that's a hindrance in anyway? The main database is also structured like that...

If you're stacked please don't worry about replying to this quickly - I've got loads to work on with your recommendations and the VBA. Thanks again.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
both the Winner and the Loser are on the same row. Do you think that's a hindrance in anyway?
you can do it that way, but it makes for more complicated queries. Don't confuse presentation with table design which is the Excel way. tables are for data, queries/forms/reports for presentation. If you look at qryCalcNewELO you will see the data is presented with winner and loser on the same row
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
Thanks CJ. Will have a look at doing it this way.
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
please find attached my attempt to help get you going. The data is based on your spreadsheet less a number of duplicate records such as these

16/11/2018 Nitto ATP Finals - London Novak Djokovic Marin Cilic
16/11/2018 Nitto ATP Finals - London Novak Djokovic Marin Cilic
16/11/2018 Nitto ATP Finals - London Novak Djokovic Marin Cilic
14/11/2018 Nitto ATP Finals - London Novak Djokovic Alexander Zverev
14/11/2018 Nitto ATP Finals - London Novak Djokovic Alexander Zverev
14/11/2018 Nitto ATP Finals - London Novak Djokovic Alexander Zverev
12/11/2018 Nitto ATP Finals - London Novak Djokovic John Isner
12/11/2018 Nitto ATP Finals - London Novak Djokovic John Isner
12/11/2018 Nitto ATP Finals - London Novak Djokovic John Isner

it is not complete by far but hopefully will point you in the right direction

you need to write a vba function to loop through the game dates (earliest first) and then an update query to update eloPlayers. Manually, when you open qryCalcNewELO you will be prompted for a date - enter 16/1/18 to replicate the last row of your spreadsheet. You can use the newELO values to base an update query to update ELO fields in the players table. Then run the query again for the next date (18/1/18). and so on. I would estimate it should only take a couple of seconds to run.

Some of your hard coded values do not relate to the formula. for example on the last row you have hardcoded the P2K value as 122.1 whereas the calculation returns 131.3

Best I can do for you - now need to focus on work that pays

Hi CJ. Firstly - thank you so much for building this - it must have taken a while. I really do appreciate it. I've also learnt loads from looking at the table and query structures and reading the SQL, e.g. aliases, Nz functions to name a couple.

Have a few questions if you don't mind?

1. Plugging in 16/1/18 works perfectly however plugging in 18/1/18 (or any other subsequent date) still returns the data for the match on 16/1/18. Am I doing something wrong?

As I've not been able to follow through the logic then I apologise for a few of these questions - I might have been able to figure them out...

2. I'm not so used to update queries - are you recommending that I build an update query based on qryNewCalcELO? Or should I run qryNewCalcELO and then an update query off the results?

3. Looking at the SQL for qryNewCalcELO it selects only Djokovic - I assume that I'd remove this in the full version as it will need to calculate ELO scores match by match for both players?

4. Also on the SQL for qryNewCalcELO it uses TOP 1 - wouldn't this mean it will only select the top record for that date? Surely I'll want to pull all match records for that date and then update eloPlayers with the results?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
It's a while since I looked at this, but working from memory

1. Plugging in 16/1/18 works perfectly however plugging in 18/1/18 (or any other subsequent date) still returns the data for the match on 16/1/18. Am I doing something wrong?
assuming you are using my example, no - the reason for the date is to force the query to display the earliest record and show the calc for PC and K factors to prove the calculation works. If you put it 18/1 you should be seeing the values (newELO) generated that need to be saved to move your calculation forward. technically you do not need the date criteria and it can be removed once you start to update the EOL values.

2. I'm not so used to update queries - are you recommending that I build an update query based on qryNewCalcELO? Or should I run qryNewCalcELO and then an update query off the results?
yes and no - I would expect you do this in vba because a) the qryCalcNewELO cannot be made updateable and b) it needs to be run multiple times (for each date) to bring the ELO up to date - so easier to do in vba - you would need to remove the date criteria.#

3. Looking at the SQL for qryNewCalcELO it selects only Djokovic - I assume that I'd remove this in the full version as it will need to calculate ELO scores match by match for both players?
correct - in hindsight that was not required but was originally intended to get to the earliest record.

4. Also on the SQL for qryNewCalcELO it uses TOP 1 - wouldn't this mean it will only select the top record for that date? Surely I'll want to pull all match records for that date and then update eloPlayers with the results?
No, you have to run this on a game by game basis. What if a player plays two games on the same day? Also see below

Once you have the data up to date, the principle would be to use qryFutureProb to make your prediction. Once the game is played and winner/loser known, run the qryCalcNewELO (in VBA) to update the ELO field.

Not clear what your actual process would be - perhaps you run qryCalcNewELO (in vba) after each game, once a day, after the tournament?

And you have data inconsistencies - you have duplicate records and some of the hard coded values do not match the calculation. In addition your date might need to include a time element - if a player plays two games on the same day?

No doubt the design can be improved, for example you might want to store a history of ELO's in which case you need an extra table and make changes to the queries, but that is not apparent from the description of your requirement.
 

Jossy

Registered User.
Local time
Today, 02:43
Joined
Aug 24, 2009
Messages
32
As ever CJ - thanks.

Not clear what your actual process would be - perhaps you run qryCalcNewELO (in vba) after each game, once a day, after the tournament?

Will run 1-2 times per day depending on location of matches, e.g. Europe, US, Asia.

And you have data inconsistencies - you have duplicate records and some of the hard coded values do not match the calculation. In addition your date might need to include a time element - if a player plays two games on the same day?

The dupes were me not joining correctly - fixed this now. The hard coded values were necessary as otherwise I'd have had to have all the non-Djokovic player matches in there. So values wouldn't calculate correctly if I switched to formulas as the K values would be off due to low match counts. The time element was a pain for exactly the reason you mention - there is a round ID which can be used to create a time orderable field so in the end I built an append query to import all new data into a fresh table in sequence of date and round. I then put an autonumber primary key in so make indexing searching etc lots easier.

No doubt the design can be improved, for example you might want to store a history of ELO's in which case you need an extra table and make changes to the queries, but that is not apparent from the description of your requirement.

I've actually adapted your design quite a bit already. The biggest change was the fact that using count in SQL took ages when searching 650k records. I read a lot about recordsets and using .filter here I've managed to reduce the time from several seconds per record to one second :)

On the storing of previous ELO values - it's definitely worthwhile but I've avoided it so far. What would you think - a table with Date, Player ID, Old and New?

Again - thank you so so much for your help with this. I'm pretty chuffed sat watching the database crunch through 325k match records at one a second. Even the time prediction of 88 hours isn't dampening my spirits!
 

CJ_London

Super Moderator
Staff member
Local time
Today, 02:43
Joined
Feb 19, 2013
Messages
16,607
The biggest change was the fact that using count in SQL took ages when searching 650k records
make sure the appropriate fields are indexed

What would you think - a table with Date, Player ID, Old and New?
No - just date, playerID and ELO - latest is then the one with the latest date
 

Users who are viewing this thread

Top Bottom