ChatGPT

Gasman

I think we are safe for a while. :-)

1759485739026.png


And I just asked it myself.

Yes, 9.11 is greater than 9.9. While both numbers start with 9, the decimal places matter, and 9.11 is slightly larger than 9.9. :cool:
 
Len("9.11") > Len("9.9")
 
Your assumption is numeric 😁.
If I were less lazy I might try using the words “value of”, or do some other testing.
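For anyone who wants to see the two readings side by side, here is a minimal sketch. The one-liner above looks like VBA's Len; Python's len is used here as a stand-in, which is an assumption on my part. It compares the same two values once by text length and once as numbers.

```python
a, b = "9.11", "9.9"

# Reading 1: compare the lengths of the text, as in Len("9.11") > Len("9.9")
longer_as_text = len(a) > len(b)          # 4 characters vs 3 characters

# Reading 2: compare the numeric values
greater_as_number = float(a) > float(b)   # 9.11 vs 9.9

print(longer_as_text)     # True
print(greater_as_number)  # False
```

Same inputs, opposite answers, and the only thing that changed is the assumed type.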
 
You are not related to my old team leader, are you?
She came back with 'If you are referring to American dates then 9.11 is greater than 9.9' :(
 
What happens if you ask ChatGPT "Is the number 9.11 bigger than 9.9"?
 
No, 9.11 is not greater than 9.9.


Here’s why:


  • 9.11 is actually less than 9.9, because:
    • 9.11 is the same as 9 and 11 hundredths (9.110)
    • 9.9 is the same as 9 and 9 tenths, or 9.900
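The padding argument in that answer can be checked with a short sketch. It uses Python's standard decimal module; the helper name pad_to_thousandths is made up for illustration and is not part of ChatGPT's reply.

```python
from decimal import Decimal

def pad_to_thousandths(text):
    # Rewrite a decimal string with exactly three places, e.g. "9.9" -> "9.900"
    return Decimal(text).quantize(Decimal("0.001"))

x = pad_to_thousandths("9.11")
y = pad_to_thousandths("9.9")

print(x, y)    # 9.110 9.900
print(x < y)   # True
```

Once both values have the same number of decimal places, 110 thousandths against 900 thousandths leaves no room for doubt.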
 
Clearly, in your original question it assumed a string. Adding ‘the number’ was necessary to get it to do what you wanted.
 
Even without 'the number' it returns the correct answer.



1759583172872.png
 
I was just repeating a test that was posted on Facebook.
Also, if you read its response, it actually states 'numerically' :)
 
This illustrates very well my experience with most AI LLMs. Going in, we assume a certain level of context and tend to think that the LLM will share that context. Sometimes yes, sometimes no. The thing is, LLMs lack contextual experience.

To explain the problem as I see it, I resort to something I learned many, many years ago studying for a degree in language teaching in college.
The concept is called rhetorical competence. It refers to the fact that a speaker learning a new language for the first time has to rely on word-for-word or phrase-for-phrase translation for meanings. However, native speakers also bring their entire experience to the task. So, while the explicit meanings of words or phrases are the same for both, the non-native speaker lacks what we called rhetorical competence, which is derived from that experience, to understand "the meaning behind the literal meaning."

The most famous example of this relies on remembering that before smart phones and similar devices, a lot of people wore wrist watches. Maybe you still have one in a drawer?

Anyway, setting the scene. A person standing at a bus stop. Another person walks up to the bus stop to join the first.

The first person says to the second, "Hey, do you have a wrist watch?"

The literal meaning of that question requires a yes or no answer. "Yes, I do have a watch." or "No, I do not have a watch."

And that is pretty much where LLMs are.

The rhetorically competent speaker of the language, however, puts the question in context: "We're both here to catch a bus. Buses run on schedules. This person doesn't have a watch (otherwise why ask the question?). This person wants to know what time it is because he's waiting for a specific bus."

So, the rhetorically correct answer is NOT, "Yes, I have a watch."

The correct answer is, "It's 10 minutes until the next bus arrives."

Someday LLMs will be able to interpret the rhetorical meaning of our questions, and not just the literal meaning. Until then, we have to be very precise. If you want them to compare strings to strings, tell them that. If you want them to compare numbers to numbers, tell them that.
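The same discipline applies in code, where the interpretation has to be spelled out. Below is a rough sketch, not anything an LLM actually runs: the function name is_greater and the mode names are invented for illustration, and the year used in the date reading is an arbitrary placeholder. It covers the three readings that have come up in this thread.

```python
from datetime import date

def is_greater(a, b, interpret_as):
    """Compare two strings such as "9.11" and "9.9" under an explicit interpretation."""
    if interpret_as == "number":
        return float(a) > float(b)
    if interpret_as == "text_length":
        return len(a) > len(b)
    if interpret_as == "us_date":
        # Treat "9.11" as month 9, day 11; the year 2000 is an arbitrary placeholder.
        def to_date(s):
            month, day = s.split(".")
            return date(2000, int(month), int(day))
        return to_date(a) > to_date(b)
    raise ValueError(f"unknown interpretation: {interpret_as}")

for mode in ("number", "text_length", "us_date"):
    print(mode, is_greater("9.11", "9.9", mode))
# number False
# text_length True
# us_date True
```

The caller cannot leave the interpretation out, which is exactly the precision the prompt has to supply when the question goes to an LLM instead.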
 
I listened to a podcast the other day. On investigation, AI researchers had discovered that the training data was so full of 9/11 media reports that the model was treating 9.11 as a date, and therefore as coming after 9/9. And of course, because AI works by probability prediction, and 9.11 is usually mentioned in reference to the date, that is the reading it picked.
 
@Gasman thanks for sharing this, it is indeed funny - but also a reminder of how the whole power of AI lies with the specificity of the prompt-er.
If you'd been more specific about "greater than", its answer might have been more to everyone's liking, since 9.11 IS 'greater than' 9.9 in a way. It's all about the prompts. And it IS a science and an art to work with it long enough to know when to prompt it in a certain way, when to remind it of its own memory, how to separate chats and Projects so that they leverage its memory in precisely the way you'd prefer, and so on.

The lowest-level performance, IMO, comes from questions out of the blue. The highest-level performance that I see comes when working on related Projects and leveraging things it learns about your projects as you go. In those cases it can be absolutely genius. Its performance in these cases could easily replace a Business Analyst. Thanks to a few minutes invested early on, it now formats my SQL better than even Redgate can in SQL Prompt, fully utilizing the settings for Formatting.

It's ALL about the prompt.

PS. I got the "right" answer, although I was hoping for the "wrong" one in order to demonstrate the quality of answers based on the prompt.
1759879148424.png
 
