Fast way to cut a long string

Guus2005

AWF VIP
Local time
Today, 13:41
Joined
Jun 26, 2007
Messages
2,642
i have a long string of about 1500 characters.
i need to cut it in 15 pieces

First from 13 length 12
Second from 25 length 200
Third from 290 length 34
Fourth from 450 length 10
Fifth from 460 length 2
etc.

I know that there is a way to use mid or mid$ ofcourse i don't want to use the mid function.
I have 90K lines to read and the import functionality doesn't do what i want. Skipping a number of lines etc.

So I want to use a pointer. Which should be much faster IMHO. Like in C.
Is there a way to use a pointer in VBA?

Thanks!
 
i have a long string of about 1500 characters.
i need to cut it in 15 pieces

First from 13 length 12
Second from 25 length 200
Third from 290 length 34
Fourth from 450 length 10
Fifth from 460 length 2
etc.

I know that there is a way to use mid or mid$ ofcourse i don't want to use the mid function.
I have 90K lines to read and the import functionality doesn't do what i want. Skipping a number of lines etc.

So I want to use a pointer. Which should be much faster IMHO. Like in C.
Is there a way to use a pointer in VBA?

Thanks!
I seem to recall a Partition() function that might be applicable. Let's see who finds a reference to it first, and if it does apply.
 
I seem to recall a Partition() function that might be applicable. Let's see who finds a reference to it first, and if it does apply.
Partition() returns consistent segments, so it might have to be nested to get the variable lengths you want. A function using Mid() might be more accessible as a way to parse the string into uneven chunks.
 
Can you use a fixed-width import instead, that would Shirely be the quickest method?
Then just skip the records you don't want?
 
I have 90K lines to read and the import functionality doesn't do what i want.
What is the data source exactly, and where are the pieces going? When you say import, I'm thinking of a fixed-length text file that could be made readable (subject to actual structure) via the import specification.
 
Is there a way to use a pointer in VBA?
Yes, but...

The String data type in VBA is Unicode and thus deals with all the (potentially) different characters sets and their representation internally. - Handling strings with the built-in string functions is not the most performance approach, but hides away all the potentially complexity.

If you work with strings by using pointers to memory, its all raw bytes and it is up to you handle all complexity yourself. - I very much doubt that this will be worth your while.

Nonetheless, there is an old article in the Visual Basic Journal about a CStrings class for VB6, which can be applied to VBA as well. Unfortunately, I'm unable to find the source code for the CStrings class mentioned in that article. - Maybe you've got better luck.

PS: You might also be interested in my videos on strings with the Win32-API, which cover some basics on the matter.
 
Is there a way to use a pointer in VBA?

It is theoretically possible to build a pointer into a contiguous array but not in VBA. That text pointer (which is a type of descriptor) isn't an intrinsic data type. You CAN build a pointer and diddle with it using some WinAPI functions, but it is not a native action for VBA.

Another complication is that when dealing with WinAPI functions you don't have strings that are identical to Access strings. They are more like C strings - strings ending with an ASCII NUL character, whereas Access strings allow the NUL character to exist in a string.
 
There is a pointer called StrPtr (Google it, plenty of links) which points at a string in memory - see if that meets your needs, however I doubt it will resolve whatever issues you are having with the mid function.

we could be more helpful if you provided an example string, the output required and the code you tried so we can see what the issue is. Perhaps it is hidden characters? HTML? In which case using the plaintext function might tidy things up
 
You might consider this older response from Ken Sheridan to a similar question
or set up a Type/End Type construct and assign the field values from your long text (but this would require Mid...).

Some sample data would be helpful.
 
So I want to use a pointer. Which should be much faster IMHO. Like in C.
Is there a way to use a pointer in VBA?
it is "somewhat" the same concept of what i will show you.
you will need a "user-defined Type" for your Structure.

the following is based on the initial structure you showed on post#1.

create two udt:

Code:
Private Type typTextLine
    s00Line As String * 12  'you can dispose this 12 characters
    s01Line As String * 12
    s02Line As String * 200
    s03Line As String * 34
    s04Line As String * 10
    s05Line As String * 2
    ' add more lines as you wish
End Type

Private Type typAllText
    sText As String * 270   '270 is the total from typTextLine
End Type

for complete demo see sub Test on Module1 on the attached db.
this techique is very very fast.
 

Attachments

I don't want to use the mid function because every time you use it the complete string (1500 char) is needed as input and then a string is returned from the 13th character with length of 12 characters.
Then for the second i need the from 25th over a length of 200 characters.

You can imagine that each time you use the mid function it starts at the beginning. Which is logical. But if i could use a pointer it would already be at the 25th character. Which saves time.

Thanks for all the responses. I'll let you know what i come up with.
 
You can imagine that each time you use the mid function it starts at the beginning.

I wonder if you are looking at this incorrectly. If you used the typical pointer systems or built a formal data descriptor, it would need exactly the same elements that are used in MID - that would be the string (to provide a base address), a starting offset, and a length. There is no reason to believe that it would take longer to begin a MID-style string extraction from the 1st character or the 1000th character.

The function simply adds the offset to the string's base address to get the starting address of the sub-string and then extracts bytes for the given length. We don't know how the extraction part works, but I do not see a difference in times based on different starting addresses or offsets. Have you seen something that makes you believe that it DOESN'T do what I just described?

OK, Access doesn't publish its code so there is that "black box" effect - but we don't have to assume a poor choice of string manipulation algorithms in a product as mature as Access. By now they have had time to optimize the code to squeeze the maximum amount of speed out of what they do.
 
Ha!
Agree with both Pat and the Doc on this point-er. I see no reason to reinvert the wheel and would just use Mid or possibly Split depending on how the original string is constructed.
I'll leave this one to @ebs17 if he's interested in testing it 😎
 
As I understand it, the costliest operation manipulating a VBA string variable is changing its size. Consider the following code, and note how slow, relatively, concatenation is using the VBA ampersand concatenation operator...
Code:
Sub AmpersandVsMidConcat()
    Const MAX As Long = 50000
    Const GRW As Integer = 1000
    Const STR As String = "0123456789"
    
    Dim t1 As String
    Dim t2 As String
    Dim pos As Long
    Dim length As Long
    
    Dim i As Long
    Dim clock As Single
    
    clock = Timer
    For i = 0 To MAX
        t1 = t1 & STR
    Next
    Debug.Print VBA.Format(Timer - clock, "0.0000") & " VBA & Concat"
        
    clock = Timer
    For i = 0 To MAX
        length = Len(STR)
        ' test if a new char will fit on t2, and if not, pad with spaces
        If pos + length > Len(t2) Then t2 = t2 & VBA.Space$(GRW)
        Mid$(t2, pos + 1, length) = STR
        pos = pos + length
    Next
    t2 = Trim(t2)
    Debug.Print VBA.Format(Timer - clock, "0.0000") & " Mid$() Concat"
End Sub
In the first block we expand the size of t1 in ten-character chunks. VBA is inefficient doing this expansion.

In the second block we expand t2 in 1000-character chunks, and then use Mid$() to replace the spaces with the string we are appending, and finally, Trim() the excess spaces at the end.

Conclusions: Mid() is fast. Also, if you are building a long string by concatenation, particularly in small chunks, you can realize a considerable speed advantage by expanding the size of the string in big chunks, and then swap in the characters you are appending using Mid().
 
I have 90K lines to read and the import functionality doesn't do what i want. Skipping a number of lines etc.
90K lines speak for healthy mass data processing, not necessarily for serial individual processing. Excavator instead of sand shovel - if the import needs to be done quickly. An import is more than a pointer wobble.
 
I'm with Minty.

Used Fix-length. It's built in to Access's brain and will be faster than any RBAR code solution, and less muddy, too
 
The most expensive string function is the one that produces the most strings as variables. Taking a string apart one character at a time and storing each character separately causes the following steps for EACH character: (1) Allocate memory for the string descriptor. (2) Allocate memory for the actual (sub) string to be created. (3) Extract the sub-string into the allocated slot. (4) Update the string-length slot in the descriptor. NOTE that steps 1 and 2 must allocate a new string area each time and cannot re-use previously allocated memory due to the possibility that each new string is potentially longer than the previous one. (Its a long tale of woe. You don't want to hear it.)

But the worst part isn't that memory allocation. It is the part where you are done with the strings and have to release them. The problem is that as variables, they usually are stored in the programming structure called the HEAP and occupy that space for the remainder of the session. If you do this enough you see a message about "HEAP OVERFLOW" or "STACK / HEAP COLLISION" or some other similar message that has changed over the years but still means "you blew out your heap work space." The message you get varies depending on which facility (or even which library) sensed the overlap of the stack and heap address range.
 

Users who are viewing this thread

Back
Top Bottom