JSON Data Types when importing to Access

Deleted member 73419 (Guest)

Hi,

What is the best way to determine the data types within JSON?

Below is a snippet of the type of code I'm dealing with.

It is a very large file overall and I'm looking to insert it into an Access DB so I can easily search and produce reports etc., so the parsing is done using VBA.

The problem comes when I need to decide what data type the columns need to be when creating the table.

There is no schema available and I'm not sure that is even an option with the library I'm using.

At first, I'm thinking that I'll need to iterate over everything and see what data type is best, but that can change as you go: fnt:g for id 1 looks numeric, however when you get to id 2 for fnt:g, it would have to be text.

The crux of it is that you cannot take the data type of the first element you come across, as at some point that data may well change.

I cannot add it all in as text, as that doesn't make for very good queries: you're never quite sure whether to try and convert the text to numbers when doing comparisons.

Is there an easier way to do this?

JSON:
{
    "dte": "3rd August 2022",
    "bnd": [
        {
            "id": 1,
            "a": "S",
            "b": "",
            "c": 0,
            "d": 8300,
            "s": "Not Allocated",
            "v": 1
        },
        {
            "id": 2,
            "a": "P",
            "b": "",
            "c": 8300,
            "d": 9000,
            "s": "Not Allocated",
            "v": 1
        }
    ],
    "fnt": [
        {
            "id": 1,
            "g": "5.53",
            "t": "Lorem ipsum dolor sit amet"
        },
        {
            "id": 1,
            "g": "5.54",
            "t": "Phasellus condimentum, tellus at faucibus feugiat, odio lacus consequat nulla, ac molestie enim orci sit amet mi."
        },
        {
            "id": 2,
            "g": "5.54A",
            "t": "Integer sodales sem non tempus pulvinar"
        },
        {
            "id": 2,
            "g": "5.54B",
            "t": "Vestibulum tempus magna vitae tempor interdum"
        }
    ]
}
 
On the most basic level, if the data is in quotes it's text; otherwise it's a number.
You will need to decide whether date strings should be converted to a datetime datatype

When importing data from Excel, Access looks at the first 8 rows to determine the data type.
I suggest you do the same here.
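For example, a first pass over the parsed data could look like this. A rough sketch only, assuming a parser such as Tim Hall's VBA-JSON (JsonConverter.ParseJson), which returns each JSON array as a Collection of Scripting.Dictionary objects (needs a reference to Microsoft Scripting Runtime); the function name is mine:

Code:
' Sample the first 8 records of a parsed JSON array and guess a
' data type for each key, falling back to text when records disagree.
Public Function GuessColumnTypes(ByVal records As Collection) As Scripting.Dictionary
    Dim colTypes As New Scripting.Dictionary   ' key name -> VbVarType
    Dim rec As Scripting.Dictionary
    Dim k As Variant, i As Long

    For i = 1 To records.Count
        If i > 8 Then Exit For                 ' mimic Access: sample 8 rows
        Set rec = records(i)
        For Each k In rec.Keys
            If Not colTypes.Exists(k) Then
                colTypes(k) = VarType(rec(k))
            ElseIf colTypes(k) <> VarType(rec(k)) Then
                colTypes(k) = vbString         ' conflict -> treat as text
            End If
        Next k
    Next i
    Set GuessColumnTypes = colTypes
End Function

Note that in your fnt:g example, "5.53" and "5.54A" are both quoted in the source, so VarType would report vbString for both anyway.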
 
If you can't be sure what future values would contain, how can you determine beforehand what data type to assign? It seems, to be certain, you'll have to read all values first. However, if values could change, then what data type is appropriate? Just a thought...
 
Is it JSON convention that once you have a datatype then it will always be that datatype?

So 'a' if text the first time, it will always be text every time afterwards.

If that is the case, then I guess I can rely on that and just read the first record to get the data types...
 
I don't think it's JSON convention at all. I could be wrong but maybe JSON convention is actually that every value has to be surrounded in quotes. What you could rely on is the source of the JSON data.

Most of the time, that information comes from a data source that already defined its data types. If you're not familiar with the data source or cannot ask the owners for their spec, then I guess you have no choice but to speculate on the data types to use. Can you tell us where the JSON data is coming from?
 
Is it JSON convention that once you have a datatype then it will always be that datatype?

No - see my answer in post #2

I could be wrong but maybe JSON convention is actually that every value has to be surrounded in quotes.
Again no - see my answer in post #2
 
I agree with Colin and theDBGuy. The only convention I am aware of is that text needs quotes, numbers do not.

I believe each JSON file is an entity unto itself. There is no "template" or Schema for it. It just is what it is.

That means you also have to depend on the creator of the JSON to be consistent and logical. I.e. if an element is a number in the first item, the creator has to generate a number in subsequent items.

Given how JSON is typically generated, though, that should be internally consistent within a file.

Coincidentally, I just went through an extended process over the last week trying to import JSON files. After 8 or 9 months of successfully importing thousands of them, the two newest files started failing. It turned out that the creator of the JSON had started using some characters that the VBA JSON parser I was using could not handle. One was a type of bullet point from an extended character set and the other was internal stylized quotes, the kind generated by Word.

I would be worried more about extended character sets introducing problematic characters than the datatypes of items.
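For what it's worth, a pre-pass along these lines can neutralise the worst offenders before the parser ever sees them. This is only a sketch; which characters you need to substitute will depend on your source:

Code:
' Swap typical Word "smart" characters for plain ASCII before parsing.
' Curly double quotes are mapped to apostrophes so they cannot collide
' with JSON's own string delimiters.
Public Function CleanJsonText(ByVal s As String) As String
    s = Replace(s, ChrW(&H2018), "'")   ' left single quote
    s = Replace(s, ChrW(&H2019), "'")   ' right single quote
    s = Replace(s, ChrW(&H201C), "'")   ' left double quote
    s = Replace(s, ChrW(&H201D), "'")   ' right double quote
    s = Replace(s, ChrW(&H2022), "*")   ' bullet point
    CleanJsonText = s
End Function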
 
I agree with Colin and theDBGuy. The only convention I am aware of is that text needs quotes, numbers do not.
The key names must be in quotes; values can be quoted or not, depending on whether they are strings or numbers.
Code:
"authors": [
          "Teresa Hennig",
          "Ben Clothier",
          "George Hepworth",
          "Dagi (Doug) Yudovich"
        ],
 
Is it JSON convention that once you have a datatype then it will always be that datatype?

So 'a' if text the first time, it will always be text every time afterwards.

If that is the case, then I guess I can rely on that and just read the first record to get the data types...
The convention is: if it works, it works

The data types I've seen in JSON are:
strings, they need quotes
nulls, they don't need quotes, but might come with them
numbers, they don't need quotes, but might come with them
booleans, they don't need quotes, but might come with them
objects, expressed with curly brackets {}, can be empty
arrays, expressed with square brackets [], can be empty

You don't know how these JSON strings will come from every source; when you're dealing with JSON from different sources, you gotta know it comes in all sorts of flavors. You either run mad tests and figure it out on your own or read their docs if they got any (if they do, trust them, but always look at the data and compare).

JSONs can be all over the place, depending on where they're coming from. Some folks couldn't care less about proper formatting; others might not even know their JSONs are malformed. So you might end up with some messed-up JSONs many times, especially when dealing with unofficial stuff.

The light at the end of the tunnel is to keep in mind these JSON strings are there to make something work, so a defined structure should be expected. Still, sh*t happens. Here's a tip: Treat everything as text at first. Once you've got a decent amount of data, then you can start tweaking and putting in some countermeasures based on what you've received.
 
Treat everything as text at first.
This is exactly what the first step should look like. Text as a data type can hold all content. By creating a staging table you would at least have a usable table for the time being.
A text file (txt, csv) also contains only plain text. There, a specification or a Schema.ini defines how the file is to be read, i.e. data type per column, decimal separator, text delimiter, code page to be used, etc.

When importing data into database tables, I would not align the data types with what an external exchange file offers. For the functionality of my database, I determine the data types for the individual fields myself.
When transferring the data from the staging table into the real destination table, the data is then made to fit, converted, split, formatted, whatever is necessary.
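For example, such a transfer can be a single append query with the conversions built in. A sketch only; all table and field names here are invented:

Code:
' Move clean values from the all-text staging table into the typed
' destination table. dbFailOnError rolls the statement back if any
' value cannot be converted.
Dim db As DAO.Database
Set db = CurrentDb
db.Execute _
    "INSERT INTO tblBand (BandID, CodeA, ValFrom, ValTo, Status) " & _
    "SELECT CLng(id), a, CDbl(c), CDbl(d), s FROM tblStagingBand;", _
    dbFailOnError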

From the example: 5.53, 5.54, 5.54A, 5.54B
What is this supposed to be functionally? Just text, or a number (for mathematical calculations), where the additional letter would then have to be moved into an additional field of its own? Does the letter have a separate atomic meaning and does it belong in its own field when interpreting the text? This is something the developer decides.
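If the letter does carry its own meaning, the split could look like this. Again only a sketch; the routine is invented for illustration:

Code:
' Split a value like "5.54A" into its numeric part and a trailing
' letter suffix, each destined for its own field.
Public Sub SplitG(ByVal g As String, ByRef num As Double, ByRef suffix As String)
    Dim i As Long
    i = Len(g)
    Do While i > 0                       ' walk back over trailing letters
        If Mid$(g, i, 1) Like "[0-9]" Then Exit Do
        i = i - 1
    Loop
    num = Val(Left$(g, i))               ' "5.54A" -> 5.54
    suffix = Mid$(g, i + 1)              ' "5.54A" -> "A", "5.53" -> ""
End Sub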

In the case of bad data, i.e. data with unplanned changes, the transfer from the staging table to the destination table is more or less difficult => use a transaction; if there are errors, the process must be improved.

If you receive consistently structured, standardized data from outside, you can forego the staging table and enter it directly into the destination table.

But as I said: I determine the data types of my database tables and other properties (validity rules, indexes) myself, so I don't let an import dictate anything to me.
 
I've spent most of today sifting through the JSON file which appears to be consistent in terms of data types.

So I'm going to go on the premise that quotation marks will denote strings, numbers are numbers and dates are dates (which I'm not too worried about in this context).

The VBA VarType function seems to reliably determine the type, so a Select Case statement should sort this aspect.
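Something like this is what I have in mind - a rough sketch, assuming the parser hands numbers back as Double and using the DAO field type constants:

Code:
' Map VarType of a parsed JSON value to an Access/DAO field type.
Public Function AccessTypeFor(ByVal v As Variant) As Long
    Select Case VarType(v)
        Case vbDouble, vbLong, vbInteger
            AccessTypeFor = dbDouble
        Case vbBoolean
            AccessTypeFor = dbBoolean
        Case vbDate                      ' only if date strings were converted
            AccessTypeFor = dbDate
        Case Else                        ' vbString, vbNull, anything odd
            AccessTypeFor = dbText
    End Select
End Function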

More tinkering tomorrow. Thanks for the advice :)
 
Are you aware of Postman? It is a great tool for testing Web APIs and for validating and evaluating the JSON they return.
 
The Key words must be in Quotes, Values can be quoted or not, depending on whether they are strings or numbers.
Code:
"authors": [
          "Teresa Hennig",
          "Ben Clothier",
          "George Hepworth",
          "Dagi (Doug) Yudovich"
        ],
I should have noted that this is NOT well-formed JSON.

As I understand it, the items inside the array should also be key/value pairs.
 
Hi George
Not necessarily. For example, this is valid JSON

JSON:
[{"id": 1, "height": 480, "width": 640, "tags":["cats", "sunrises", "kittens", "travel", "vacation", "work"]},{"id": 2, "height": 480, "width": 640, "tags":["dogs", "kittens", "work"]},{"id": 3, "height": 480, "width": 640, "tags":["kittens", "travel"]},{"id": 4, "height": 480, "width": 640, "tags":["dogs", "sunrises", "kittens", "travel"]},{"id": 5, "height": 480, "width": 640, "tags":["dogs", "cats", "sunrises", "kittens", "work"]},{"id": 6, "height": 480, "width": 640, "tags":["work"]},{"id": 7, "height": 480, "width": 640, "tags":["dogs", "sunrises"]},{"id": 8, "height": 480, "width": 640, "tags":["dogs", "cats", "sunrises", "kittens", "travel"]},{"id": 9, "height": 480, "width": 640, "tags":["dogs", "sunrises", "travel"]},{"id": 10, "height": 480, "width": 640, "tags":["cats", "kittens", "travel", "vacation", "work"]},{"id": 11, "height": 480, "width": 640, "tags":["cats", "travel"]},{"id": 12, "height": 480, "width": 640, "tags":["dogs", "cats", "travel"]},{"id": 13, "height": 480, "width": 640, "tags":["dogs", "cats", "sunrises", "kittens"]},{"id": 14, "height": 480, "width": 640, "tags":["dogs", "sunrises", "vacation"]},{"id": 15, "height": 480, "width": 640, "tags":["dogs", "travel", "vacation", "work"]},{"id": 16, "height": 480, "width": 640, "tags":["cats", "vacation"]},{"id": 17, "height": 480, "width": 640, "tags":["kittens", "vacation", "work"]},{"id": 18, "height": 480, "width": 640, "tags":["dogs", "cats", "sunrises", "kittens", "vacation", "work"]},{"id": 19, "height": 480, "width": 640, "tags":["dogs", "sunrises", "kittens", "travel", "work"]},{"id": 20, "height": 480, "width": 640, "tags":["sunrises", "kittens"]},{"id": 21, "height": 480, "width": 640, "tags":["dogs", "cats", "sunrises", "travel", "vacation"]},{"id": 22, "height": 480, "width": 640, "tags":["sunrises", "travel", "vacation", "work"]},{"id": 23, "height": 480, "width": 640, "tags":["cats", "kittens", "travel"]},{"id": 24, "height": 480, "width": 640, "tags":["dogs", "cats", "sunrises", "kittens", "travel", "vacation"]},{"id": 25, "height": 480, "width": 640, "tags":["travel", "work"]},{"id": 26, "height": 480, "width": 640, "tags":["sunrises", "kittens", "travel", "vacation"]},{"id": 27, "height": 480, "width": 640, "tags":["dogs", "sunrises", "kittens", "travel", "work"]},{"id": 28, "height": 480, "width": 640, "tags":["sunrises", "kittens", "travel"]},{"id": 29, "height": 480, "width": 640, "tags":["kittens", "vacation", "work"]},{"id": 30, "height": 480, "width": 640, "tags":["cats", "sunrises", "vacation"]}]

When imported, the data in the subarray needs to be a second table. I'm using a subdatasheet for illustration purposes
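The import loop for data like this might look roughly as follows - a sketch, assuming the VBA-JSON library, with invented table names and no error handling:

Code:
' Walk the parsed array: one row per image in the parent table,
' one row per tag in the child table.
Public Sub ImportImages(ByVal json As String)
    Dim images As Collection, img As Scripting.Dictionary
    Dim tag As Variant, db As DAO.Database

    Set db = CurrentDb
    Set images = JsonConverter.ParseJson(json)   ' top level is an array

    For Each img In images
        db.Execute "INSERT INTO tblImage (ImageID, ImgHeight, ImgWidth) " & _
            "VALUES (" & img("id") & ", " & img("height") & ", " & img("width") & ")", dbFailOnError
        For Each tag In img("tags")              ' subarray -> child rows
            db.Execute "INSERT INTO tblImageTag (ImageID, Tag) " & _
                "VALUES (" & img("id") & ", '" & tag & "')", dbFailOnError
        Next tag
    Next img
End Sub

(In real code the tag values should be escaped or inserted via a parameterised recordset rather than concatenated into the SQL.)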

[screenshot: the imported data in Access, with the tags subarray displayed as a subdatasheet]
 
@GPGeorge - I'm not sure that statement is correct.
I am happy to be corrected, of course. Is there a reference where I can find the proper guidelines? I had to figure out a way to get around the missing keys in that specific case I posted above, but I'll stop whining about it if I am wrong on that point.
 
MDN - array

Description

In JavaScript, arrays aren't primitives but are instead Array objects with the following core characteristics:
  • JavaScript arrays are resizable and can contain a mix of different data types. (When those characteristics are undesirable, use typed arrays instead.)
  • JavaScript arrays are not associative arrays and so, array elements cannot be accessed using arbitrary strings as indexes, but must be accessed using nonnegative integers (or their respective string form) as indexes.
  • ...
(emphasis mine)

This is from the JavaScript documentation; JSON is a textual representation of JavaScript objects and arrays.

In PHP you can have both keyed and un-keyed array items.
 
