Analyzing Baseball Data

stratocaster8 · Aug 4, 2009

I know what I need can be done in Access, I just have no clue how to do it. My Access skills are very limited. Any help would be greatly, greatly appreciated.

I have a table with every Yankee game's baseball data since 1999. It's about 15 columns of different types of data. One of the columns has an "O" or a "U" in it. They stand for Over and Under and the column is called 'over/under result'.

What I want to be able to do is take a year, say 2002, and analyze the data starting at the first game to find the instances of three straight "O" occurrences (or however many consecutive I want), and then once the first "U" happens, how many additional "U"s occur within the next three records.

So in layman's terms, I want to be able to find every instance where there is three straight Unders solely, and then be able to count how many Unders appear in the NEXT three games after that and if they appear in the first, second, third, or fourth game after the initial three.

I hope this makes sense, but I want to be able to run reports on the data based on theories I have...."when a team has 5 straight overs, when does the next under appear in the three games after that?"

I'll be happy to clarify any of this if it is confusing.

Thanks!

Galaxiom · Aug 5, 2009

Firstly I would encourage you to change the field name from [over/under result]
Using the slash and spaces in names is best avoided. It also makes the query design easier to write you don't have to enter the [] around the name. For example, you could call it OverUnder. Using capitilisation like this is easily entered.

Remember tables just store data. You can present that data under any heading in a form or report so keep the names simple but meaningful in the tables.

Is there an eqaul result as well? If not consider using a yes/no field if there are only two possible values. As a boolean value it uses less storage than a character. Working with the booleans often simplifies the syntax.
You can write expressions as If [fieldname] which really means If [fieldname] = True

Since you are working with years, if you don't already have a field like [LeagueYear] derive one in a query using Year([GameDate]). (This can also be derived where a season is split across two years but that is another question.)

You should also have a GameID key field (format number) with the sequence matching the odrer of the games.

The basic technique for your query involves joining a table/query to an alias of itself.

Create a new query. Drag the first query (or original table if you didn't need to create LeagueYear) into the designer twice.
The second instance will be an alias called queryname_1. Join them on LeagueYear. This will connect each game to all other games in that Year and ignore the others.

Add the fields you want from the first query including the OverUnder field.
Add the GameID from queryname_1 (GameDate can be used in the adsence of a key but best to have the ID.)
Add the OverUnder field from queryname_1

In the criteria box of queryname_1.GameID type:
> queryname.GameID
This will then only select games held after the current record.
In the Sort Box choose Ascending.

Once this query has associated the games with those immediately following you will be able to ask the questions you need about sequences.

Unfortunately selecting the first x number of games can only be done by editing the SQL view of the query. Do you have any familiarity with SQL? We need to add the SELECT TOP sql command manually get the desired number of following matches.

Try what I have suggested and post back when you need more help.

stratocaster8 · Aug 5, 2009

Wow thank you for the detailed response. Since I am a very novice Access user, I'm going to address the first half of your reply before I even get into the rest. I appreciate your help for as long as you'd like to answer my questions. Thank you in advance.

1.Firstly I would encourage you to change the field name from [over/under result]
Using the slash and spaces in names is best avoided. It also makes the query design easier to write you don't have to enter the [] around the name. For example, you could call it OverUnder. Using capitilisation like this is easily entered.

Response: Done.

2.Is there an eqaul result as well? If not consider using a yes/no field if there are only two possible values. As a boolean value it uses less storage than a character. Working with the booleans often simplifies the syntax.
You can write expressions as If [fieldname] which really means If [fieldname] = True

Response: There are actually three results. "O" for over, "U" for under, and "P" for a push (a tie). I always want a P to be ignored as a no value if that is possible, and just move onto the next record.

Should I keep this as a character value since there are three results?

3.Since you are working with years, if you don't already have a field like [LeagueYear] derive one in a query using Year([GameDate]). (This can also be derived where a season is split across two years but that is another question.)

Response: I have three different fields for dates. I have a month field (values 1-12), a day field (values 1-31), and a year field (values 1999-2007). I have no idea if this is the correct way to go about this. I will be running queries for individual years only and don't need to span multiple years. Any suggestions are welcome.

4.You should also have a GameID key field (format number) with the sequence matching the odrer of the games.

Response: I have a couple questions about this. What my plan was going to be, was to create one table for each of the 31 teams (so 31 tables), with each table containing every game result from 1999-2007 for that specific team. I was going to do it this way because I'm never going to be comparing team to team, rather I'm only going to be evaluating data on a team by team basis.

If you agree with me creating 31 separate tables, should I then create a specific ID number for each game, meaning the first game of 1999 would be record #1, and the last game of 2007 would be record #1457? OR is this something that Access will assign for me? And should there be a different GameID number for ALL of my tables, or is it only table specific?

Or do you think I should combine all 31 teams into one massive table?

Let's start here for now with my questions. I have attached the Excel spreadsheet for the New York Yankees that I am working with so that you can see what I'm working with. I had to take 2006 and 2007 data off of this spreadsheet for size purposes in this forum, but you'll get the idea. Any suggestions are absolutely welcome. View attachment yankees99-05.xls

Galaxiom · Aug 5, 2009

Keep your three letter code for the OverUnder. Though there are advantages to using a number instead.

Normally you would keep a date in a single field formatted as Date. Access is very good at managing everything about dates in this format. Including both the year and the date in the table is against "Normalization" meaning that it is possible to have confilcting information becasue it is effectively entered twice. Once in the date and again in the year. So we create the year on the fly.

Indeed if OverUnder can be calculated from any other information in the database it should not be stored in a table but generated as required.

Database structure may appear odd at first because you must not think of tables as spreadsheets. Tables store the data. Entry and presentation are done in forms and reports. Try to understand this as explained below. Novices rarely use enough or the right tables or use too many the same type as you have.

Data goes into the same table if it has the same structure and fields.
Hundreds of thousands of records in a table are not a problem for Access.
It is much easier to lookup all the information in related tables than check 31 separate tables.

Make a Team table with TeamID and TeamName. Make TeamID the Primary Key of the table. Ideally use a number format (Byte in this situation as you have less then 255 teams) for the TeamID.

You can add other fields to this such as their home ground locations etc.
Make a Locations table with a LocationID, LocationName etc The HomeGround field in the Teams table would refer to the LocationID.

Throughout your other tables always refer to the team by its TeamID key and the Location by its LocationID field. This is the normal construction of a database and you need to think in these terms.

Create a Match table with MatchID, MatchDate, LocationID. Location is called a Foreign Key because it refers to another table.

You might think you would include the fields
TeamA, TeamB, ScoreA, ScoreB, OverUnderA, OverUnderB etc
But there is a better way that makes it easier to query.

Create a table with MatchID, TeamID, Score and OverUnder using the foreign keys from Match and Team tables to enter the data.

Now this single table can be easily queried for team, who they played, Score OverUnder value etc.
Any extra information about the teams or matches is found using the foreign key. Information like home or away is also easily found.

As further examples, later you might add a table of People. Once again you have PersonID and information about them. Then you would make a Player table with two fields. TeamID, PersonID.

You might then add their position to this table. Make a Position table with Position ID and PositionName. Use the PositionID in the Player table.

When the structure is right it is so much easier to query the information. Access loves looking up short numbers. Text strings slow it down a lot.

Galaxiom · Aug 5, 2009

To assign a key automatically where many records are frequently added use the AutoNumber format in the table column.

Since there are many different teams and many matches we will use the match date for the calculations. This would not be susceptible to match records being entered out of order.

While you might not see a reason to compare teams now, good database design always keeps as many options open as possible. Besides, keeping them together means they can be easily displayed in the same report. Otherwise each team would need a customised report to enter or display essentially the same kind of information.

Another reason we use ID fields. The text is written once and whenever that ID is used that one value will be substituted. Change the word once and the whole database is updated. Imagine if you used team names in the records then discovered a spelling mistake.

stratocaster8 · Aug 5, 2009

When you say:
Make a Team table with TeamID and TeamName. Make TeamID the Primary Key of the table. Ideally use a number format (Byte in this situation as you have less then 255 teams) for the TeamID.

You can add other fields to this such as their home ground locations etc.
Make a Locations table with a LocationID, LocationName etc The HomeGround field in the Teams table would refer to the LocationID.

Throughout your other tables always refer to the team by its TeamID key and the Location by its LocationID field. This is the normal construction of a database and you need to think in these terms.

What I have done is make a Team table with column for TeamID (a unique number 1-30 for each team) and a column with the actual Team Name. That's it.

I have also made a second Location table with a LocationID (a unique number 31-60) and a LocationName field. That's it.

My question is: are you saying that I should also add a LocationID field into my Team Table? And if so, am I putting in the same unique LocationId number in there that corresponds to the home field of the Team ID?

Galaxiom · Aug 5, 2009

The Primary Key fields only need to be unique within their own table. The LocationsID can use the same numbers as TeamID.

When these field are referred to in queries they will be called Team.TeamID and Locations.LocationID. Some developers would just call the fields "ID" in their own table. Fields can have the same name in different tables though it can lead to confusion.

The Location field in the Team table is optional and not strictly necessary for the calculations you originally asked about. But yes. You would use the LocationID in the field.

As a foreign key you don't have to call the field LocationID in the Teams table, just use the values from the Primary Key (ID) field in the Location table. You might call it HomePitch in the Teams table. In the Match table you would call it something else.

I really included the Locations table as a further example to emphasise proper database structuring. It is easier to see the pattern when provided with several examples.

It is good that you have implemented it though. You are probably already seeing the potential to extract a lot more than the your original queries.

The home team advantage is well recognised in many sports. Indeed in many cases a draw awards less championship points to the at home team than the away team. This factor could certainly influence the conclusions you are drawing from your analysis.

stratocaster8 · Aug 17, 2009

Question....

How do I assign a Game ID to each contest for every team since 1999? Do the games played have to be listed within the table in chronological order by date, or can I list every game by team by year (chronologically) and assign Game ID number that way?

I'm just unsure how to structure the table since I will only running queries by team and never comparing multiple teams results in one query.

Analyzing Baseball Data

stratocaster8

New member

Galaxiom

Super Moderator

stratocaster8

New member

Galaxiom

Super Moderator

Galaxiom

Super Moderator

stratocaster8

New member

Galaxiom

Super Moderator

stratocaster8

New member

Similar threads

Users who are viewing this thread