Fast way to cut a long string (1 Viewer)

Guus2005

AWF VIP
Local time
Today, 09:50
Joined
Jun 26, 2007
Messages
2,641
i have a long string of about 1500 characters.
i need to cut it in 15 pieces

First from 13 length 12
Second from 25 length 200
Third from 290 length 34
Fourth from 450 length 10
Fifth from 460 length 2
etc.

I know that there is a way to use mid or mid$ ofcourse i don't want to use the mid function.
I have 90K lines to read and the import functionality doesn't do what i want. Skipping a number of lines etc.

So I want to use a pointer. Which should be much faster IMHO. Like in C.
Is there a way to use a pointer in VBA?

Thanks!
 

GPGeorge

Grover Park George
Local time
Today, 00:50
Joined
Nov 25, 2004
Messages
1,877
i have a long string of about 1500 characters.
i need to cut it in 15 pieces

First from 13 length 12
Second from 25 length 200
Third from 290 length 34
Fourth from 450 length 10
Fifth from 460 length 2
etc.

I know that there is a way to use mid or mid$ ofcourse i don't want to use the mid function.
I have 90K lines to read and the import functionality doesn't do what i want. Skipping a number of lines etc.

So I want to use a pointer. Which should be much faster IMHO. Like in C.
Is there a way to use a pointer in VBA?

Thanks!
I seem to recall a Partition() function that might be applicable. Let's see who finds a reference to it first, and if it does apply.
 

GPGeorge

Grover Park George
Local time
Today, 00:50
Joined
Nov 25, 2004
Messages
1,877
I seem to recall a Partition() function that might be applicable. Let's see who finds a reference to it first, and if it does apply.
Partition() returns consistent segments, so it might have to be nested to get the variable lengths you want. A function using Mid() might be more accessible as a way to parse the string into uneven chunks.
 

Minty

AWF VIP
Local time
Today, 08:50
Joined
Jul 26, 2013
Messages
10,371
Can you use a fixed-width import instead, that would Shirely be the quickest method?
Then just skip the records you don't want?
 

ebs17

Well-known member
Local time
Today, 09:50
Joined
Feb 7, 2020
Messages
1,949
I have 90K lines to read and the import functionality doesn't do what i want.
What is the data source exactly, and where are the pieces going? When you say import, I'm thinking of a fixed-length text file that could be made readable (subject to actual structure) via the import specification.
 

sonic8

AWF VIP
Local time
Today, 09:50
Joined
Oct 27, 2015
Messages
998
Is there a way to use a pointer in VBA?
Yes, but...

The String data type in VBA is Unicode and thus deals with all the (potentially) different characters sets and their representation internally. - Handling strings with the built-in string functions is not the most performance approach, but hides away all the potentially complexity.

If you work with strings by using pointers to memory, its all raw bytes and it is up to you handle all complexity yourself. - I very much doubt that this will be worth your while.

Nonetheless, there is an old article in the Visual Basic Journal about a CStrings class for VB6, which can be applied to VBA as well. Unfortunately, I'm unable to find the source code for the CStrings class mentioned in that article. - Maybe you've got better luck.

PS: You might also be interested in my videos on strings with the Win32-API, which cover some basics on the matter.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 02:50
Joined
Feb 28, 2001
Messages
27,194
Is there a way to use a pointer in VBA?

It is theoretically possible to build a pointer into a contiguous array but not in VBA. That text pointer (which is a type of descriptor) isn't an intrinsic data type. You CAN build a pointer and diddle with it using some WinAPI functions, but it is not a native action for VBA.

Another complication is that when dealing with WinAPI functions you don't have strings that are identical to Access strings. They are more like C strings - strings ending with an ASCII NUL character, whereas Access strings allow the NUL character to exist in a string.
 

Gasman

Enthusiastic Amateur
Local time
Today, 08:50
Joined
Sep 21, 2011
Messages
14,319
I know that there is a way to use mid or mid$ ofcourse i don't want to use the mid function.
May we ask why? :(
 

CJ_London

Super Moderator
Staff member
Local time
Today, 08:50
Joined
Feb 19, 2013
Messages
16,619
There is a pointer called StrPtr (Google it, plenty of links) which points at a string in memory - see if that meets your needs, however I doubt it will resolve whatever issues you are having with the mid function.

we could be more helpful if you provided an example string, the output required and the code you tried so we can see what the issue is. Perhaps it is hidden characters? HTML? In which case using the plaintext function might tidy things up
 

jdraw

Super Moderator
Staff member
Local time
Today, 03:50
Joined
Jan 23, 2006
Messages
15,379
You might consider this older response from Ken Sheridan to a similar question
or set up a Type/End Type construct and assign the field values from your long text (but this would require Mid...).

Some sample data would be helpful.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 03:50
Joined
Feb 19, 2002
Messages
43,302
So I want to use a pointer.
Think about it. Every time the Access app opens it gets loaded at a different memory address. That provides the base register value. The p-code resolves the starting address for the string at runtime because it knows where in the string of the loaded program some variable starts. Then you have a second offset value to define the starting position for each part of the string. That provides the address for each pointer.

When you use the Mid() function, you are providing that second offset to come up with the same memory address as the pointer.

Pointers work with a base and displacement concept because programs are always loaded into different addresses in memory depending on what is already running. So, the load address provides the base. Then throughout the code, the displacement is added to the base to find the memory address you need.

Keep in mind that Access runs from p-code. It is not an executable so everything is relative anyway. I think you are wasting your time looking for some kind of "efficiency" by using a pointer but have fun in the rabbit hole.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 15:50
Joined
May 7, 2009
Messages
19,247
So I want to use a pointer. Which should be much faster IMHO. Like in C.
Is there a way to use a pointer in VBA?
it is "somewhat" the same concept of what i will show you.
you will need a "user-defined Type" for your Structure.

the following is based on the initial structure you showed on post#1.

create two udt:

Code:
Private Type typTextLine
    s00Line As String * 12  'you can dispose this 12 characters
    s01Line As String * 12
    s02Line As String * 200
    s03Line As String * 34
    s04Line As String * 10
    s05Line As String * 2
    ' add more lines as you wish
End Type

Private Type typAllText
    sText As String * 270   '270 is the total from typTextLine
End Type

for complete demo see sub Test on Module1 on the attached db.
this techique is very very fast.
 

Attachments

  • dbStructure.accdb
    420 KB · Views: 97

Guus2005

AWF VIP
Local time
Today, 09:50
Joined
Jun 26, 2007
Messages
2,641
I don't want to use the mid function because every time you use it the complete string (1500 char) is needed as input and then a string is returned from the 13th character with length of 12 characters.
Then for the second i need the from 25th over a length of 200 characters.

You can imagine that each time you use the mid function it starts at the beginning. Which is logical. But if i could use a pointer it would already be at the 25th character. Which saves time.

Thanks for all the responses. I'll let you know what i come up with.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 02:50
Joined
Feb 28, 2001
Messages
27,194
You can imagine that each time you use the mid function it starts at the beginning.

I wonder if you are looking at this incorrectly. If you used the typical pointer systems or built a formal data descriptor, it would need exactly the same elements that are used in MID - that would be the string (to provide a base address), a starting offset, and a length. There is no reason to believe that it would take longer to begin a MID-style string extraction from the 1st character or the 1000th character.

The function simply adds the offset to the string's base address to get the starting address of the sub-string and then extracts bytes for the given length. We don't know how the extraction part works, but I do not see a difference in times based on different starting addresses or offsets. Have you seen something that makes you believe that it DOESN'T do what I just described?

OK, Access doesn't publish its code so there is that "black box" effect - but we don't have to assume a poor choice of string manipulation algorithms in a product as mature as Access. By now they have had time to optimize the code to squeeze the maximum amount of speed out of what they do.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 03:50
Joined
Feb 19, 2002
Messages
43,302
I don't want to use the mid function because every time you use it the complete string (1500 char) is needed as input and then a string is returned from the 13th character with length of 12 characters.
Then for the second i need the from 25th over a length of 200 characters.
Do you have some data that supports your premise that strings are not handled efficiently with VBA using Mid()? Or are you trying to impose a structure from a different language on VBA forgetting that language works differently.

I will point out again that VBA is interpreted so the compile is only partial. It gets you to p-code but not to anything executable. The VBA interpreter still needs to "compile" each instruction to make it executable. In a compiled program, the pointer could be resolved at load time. VBA is not compiled. YOU can't change that no matter how much you want to. That means that even if you can define a pseudo pointer, it starts with the address of the string and you must calculate the offset to get to the start of the element you want to extract. Not sure how that is different from how Mid() works. At runtime, it gets you to the start of the element you want to extract.
 

isladogs

MVP / VIP
Local time
Today, 08:50
Joined
Jan 14, 2017
Messages
18,239
Ha!
Agree with both Pat and the Doc on this point-er. I see no reason to reinvert the wheel and would just use Mid or possibly Split depending on how the original string is constructed.
I'll leave this one to @ebs17 if he's interested in testing it 😎
 

MarkK

bit cruncher
Local time
Today, 00:50
Joined
Mar 17, 2004
Messages
8,183
As I understand it, the costliest operation manipulating a VBA string variable is changing its size. Consider the following code, and note how slow, relatively, concatenation is using the VBA ampersand concatenation operator...
Code:
Sub AmpersandVsMidConcat()
    Const MAX As Long = 50000
    Const GRW As Integer = 1000
    Const STR As String = "0123456789"
    
    Dim t1 As String
    Dim t2 As String
    Dim pos As Long
    Dim length As Long
    
    Dim i As Long
    Dim clock As Single
    
    clock = Timer
    For i = 0 To MAX
        t1 = t1 & STR
    Next
    Debug.Print VBA.Format(Timer - clock, "0.0000") & " VBA & Concat"
        
    clock = Timer
    For i = 0 To MAX
        length = Len(STR)
        ' test if a new char will fit on t2, and if not, pad with spaces
        If pos + length > Len(t2) Then t2 = t2 & VBA.Space$(GRW)
        Mid$(t2, pos + 1, length) = STR
        pos = pos + length
    Next
    t2 = Trim(t2)
    Debug.Print VBA.Format(Timer - clock, "0.0000") & " Mid$() Concat"
End Sub
In the first block we expand the size of t1 in ten-character chunks. VBA is inefficient doing this expansion.

In the second block we expand t2 in 1000-character chunks, and then use Mid$() to replace the spaces with the string we are appending, and finally, Trim() the excess spaces at the end.

Conclusions: Mid() is fast. Also, if you are building a long string by concatenation, particularly in small chunks, you can realize a considerable speed advantage by expanding the size of the string in big chunks, and then swap in the characters you are appending using Mid().
 

ebs17

Well-known member
Local time
Today, 09:50
Joined
Feb 7, 2020
Messages
1,949
I have 90K lines to read and the import functionality doesn't do what i want. Skipping a number of lines etc.
90K lines speak for healthy mass data processing, not necessarily for serial individual processing. Excavator instead of sand shovel - if the import needs to be done quickly. An import is more than a pointer wobble.
 

Isaac

Lifelong Learner
Local time
Today, 00:50
Joined
Mar 14, 2017
Messages
8,779
I'm with Minty.

Used Fix-length. It's built in to Access's brain and will be faster than any RBAR code solution, and less muddy, too
 

Users who are viewing this thread

Top Bottom