Array - testing neighboring elements

CedarTree

Registered User.
Joined
Mar 2, 2018
Messages
404
I have an array with thousands of items (each one is a word extracted from a document). I want to run some logic on the array, but the main goal is to focus only on words that are near a key word. For example:

The quick brown FOX jumped over the lazy dog.

Let's say I want to focus on any word right next to "FOX" and ignore all other words. Currently, I run through the array one word at a time, and then have a sub-loop that tests the words right before and right after.

More precisely...

Code:
For iWord = 1 to 9
    For iWord2 = iWord - 1 to iWord + 1
        If Array(iWord2) = "FOX" Then
            'MARK THESE WORDS AS NOT IGNORABLE
        Else
            'OTHERWISE MARK THEM TO BE IGNORED LATER ON
        End If
    Next iWord2
Next iWord

The problem is that with thousands of elements (and actually testing 5 words before and 5 words after), the total number of iterations is large and the loop currently takes too long.

Any suggestions? Basically, I want to determine which array elements are near the key word ("fox"), because ultimately I want to ignore all the other elements in a later loop.
 

NauticalGent

Ignore List Poster Boy
Joined
Apr 27, 2015
Messages
6,284
Good morning CedarTree,

Although you did a great job explaining your problem, I have NO idea how to speed your process up. If you are using Access to import the text into tables, maybe run a query to search for the key word first: a preliminary sort before the process, if you will.

Sorry I can’t be of any assistance with this...
 

theDBguy

I’m here to help
Staff member
Joined
Oct 29, 2018
Messages
21,358
Hi. Just a thought... If you stored your words as a delimited string instead of an array, you might be able to use a regular expression to find the words neighboring your keyword.
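Here is a minimal sketch of the delimited-string idea. It is written in Python for brevity. The thread's own code is VBA, and the regex engine available there may not support every construct used here, lookbehind in particular. The sketch assumes the words have been joined with single spaces. `neighbors_by_regex` and its `window` parameter are hypothetical names. The keyword must match as a whole token, and the following words are captured inside a lookahead so that a second keyword close to the first is still found.

```python
import re


def neighbors_by_regex(text: str, keyword: str, window: int = 1) -> list[tuple[list[str], list[str]]]:
    """For each whole-word occurrence of `keyword` in a single-space-delimited
    string, return (up to `window` words before it, up to `window` words after it)."""
    pattern = re.compile(
        rf"((?:\S+ ){{0,{window}}})"            # group 1: words before (consumed by the match)
        rf"(?<!\S){re.escape(keyword)}(?!\S)"   # the keyword as a whole space-delimited token
        rf"(?=((?: \S+){{0,{window}}}))"        # group 2: words after (lookahead, not consumed)
    )
    return [(m.group(1).split(), m.group(2).split()) for m in pattern.finditer(text)]


sentence = "The quick brown FOX jumped over the lazy dog."
print(neighbors_by_regex(sentence, "FOX", 2))
```

One limitation of this pattern: the words before a keyword are consumed by the match. When two keywords are close together, the second one's "before" list therefore stops at the first keyword.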
 

Micron

AWF VIP
Joined
Oct 20, 2018
Messages
3,476
Took me about 3 reads to get any sort of handle on this. Definitely an unusual request. The 'process later' part is kind of sketchy, so maybe this idea is of no use:
- Similar to the 'sort' suggestion (maybe; I'm not sure exactly what is meant by sorting), use the InStr function to find the keyword first, so you don't build arrays for strings where the keyword doesn't exist. At least I'm assuming that's what you're doing. That cuts down on arrays: you process fewer of them than before.
- Consider using a collection instead, but still only for each word of a string where the keyword is found. You can find the ordinal position of the keyword in the collection, so you should be able to get the preceding and following words quite quickly.
- Consider the Dictionary object over the collection. I've never used it, so I'm unsure whether it is any better than a collection for your purpose. I think not, but I'm just throwing darts here.
That's all I've got.
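The following rough Python sketch combines two of those ideas. A cheap substring test (the analogue of `InStr`) rejects text that cannot contain the keyword. A dictionary mapping each word to its positions stands in for a VBA `Collection` or `Scripting.Dictionary` and gives the keyword's ordinal positions directly. The function names are hypothetical. Building the index is itself a full pass over the words, so it mainly pays off when several keywords are looked up against the same text.

```python
from collections import defaultdict


def build_position_index(words: list[str]) -> dict[str, list[int]]:
    """One pass over the words: map each distinct word to every position it occupies."""
    index = defaultdict(list)
    for pos, word in enumerate(words):
        index[word].append(pos)
    return index


def neighbors_via_index(text: str, keyword: str, window: int = 5) -> list[list[str]]:
    # Cheap substring pre-check, like InStr: skip text that cannot contain the keyword.
    if keyword not in text:
        return []
    words = text.split()
    index = build_position_index(words)
    result = []
    # .get() avoids creating an entry when the substring hit was not a whole word.
    for pos in index.get(keyword, []):
        before = words[max(0, pos - window):pos]
        after = words[pos + 1:pos + 1 + window]
        result.append(before + after)
    return result
```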
 

vba_php

Forum Troll
Joined
Oct 6, 2019
Messages
2,884
CedarTree wrote:
Basically, I want to determine which array elements are near the key word ("fox"), because ultimately I want to ignore all the other elements in a later loop.
You know, CedarTree, if you explain to us exactly why you are doing this, and possibly what business you're in and what your goal is, it might provide some insight into a better solution for you, whether that is theDBguy's suggestion or another one from me. That could alter your entire process from start to finish and get you what you want.

Care to share this info?
 

NauticalGent

Ignore List Poster Boy
Joined
Apr 27, 2015
Messages
6,284
Good morning (evening for you!) DBG, my first thought was RegEx too, but I couldn’t figure out a way to make it work. The thought of delimiting the text never occurred to me...

It would be interesting to see the difference in performance.
 

plog

Banishment Pending
Joined
May 11, 2011
Messages
11,611
CedarTree wrote:
Currently, I run through the array one word at a time, and then have a sub-loop that tests the words right before and right after.

Code:
For iWord = 1 to 9
    For iWord2 = iWord - 1 to iWord + 1
        If Array(iWord2) = "FOX" Then
            'MARK THESE WORDS AS NOT IGNORABLE
        Else
            'OTHERWISE MARK THEM TO BE IGNORED LATER ON
        End If
    Next iWord2
Next iWord

That's not what that code is doing. Because you are using Array(iWord2), you're testing every element 3 times. It should be Array(iWord).
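Following that correction, here is a sketch of a single-pass version, in Python for illustration. The hypothetical `mark_near_keyword` would translate to one VBA `For` loop whose inner loop runs only when the element equals the keyword. Each word is compared with the keyword once, and the neighbor marking happens only at hits. The work is therefore roughly proportional to the number of words plus (hits × window size), not words × window size. Keeping the keyword itself is an assumption; this sketch keeps it.

```python
def mark_near_keyword(words: list[str], keyword: str, window: int = 5) -> list[bool]:
    """keep[i] is True if words[i] is the keyword or lies within `window`
    positions of an occurrence of it."""
    n = len(words)
    keep = [False] * n
    for i, word in enumerate(words):
        if word == keyword:                      # one comparison per word
            lo = max(0, i - window)              # clamp to the array bounds
            hi = min(n - 1, i + window)
            for j in range(lo, hi + 1):          # inner loop only at a hit
                keep[j] = True
    return keep


words = "The quick brown FOX jumped over the lazy dog.".split()
keep = mark_near_keyword(words, "FOX", window=1)
print([w for w, k in zip(words, keep) if k])
```

For the FOX sentence with `window=1`, the filter on the last line keeps `['brown', 'FOX', 'jumped']`.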
 

gemma-the-husky

Super Moderator
Staff member
Joined
Sep 12, 2006
Messages
15,613
Depending on the number of iterations required, you might be able to come up with a recursive solution.
If you only want to examine one word either side, this is probably not worth doing.
However, if you might want to examine two or more words, depending on what you find, then it will certainly make the process easier.

Something like this. VBA deals with the manipulation of the data; all you have to find is a way of handling it. You get a very complex process in a few lines of code, but if the recursion is too deep you can overflow the stack, and the processing time can grow exponentially.

Code:
Private Sub RecursiveArrayCheck(ByVal x As Long)

   'you need a way to terminate the recursion
   'eg, if you examine the left and right neighbours of the target word, then you don't want to examine
   'the right neighbour of the left neighbour, or the recursion will never end.
   'You may only want to examine successively left or right - not mix left and right

   If stopcondition Then Exit Sub   'stopcondition: your own test, eg a depth limit or the array bounds

   If wordarray(x - 1) … Then
      RecursiveArrayCheck x - 1
   End If
   If wordarray(x + 1) … Then
      RecursiveArrayCheck x + 1
   End If
End Sub

Sub Main()
   'analyse the sentence into an array of words
   'find the position of the word you want, eg FOX
   'start the recursion
   RecursiveArrayCheck TargetWordPosition
End Sub
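Here is a runnable Python rendering of that recursive idea, under some stated assumptions. The `…` test is replaced by a hypothetical `accept` predicate, which accepts every word by default. The stop condition is a depth limit plus the array bounds. Following the comment in the sketch, each recursive call keeps walking in a single direction, so left and right never mix and the recursion depth never exceeds `depth`.

```python
from collections.abc import Callable


def expand(words: list[str], pos: int, step: int, depth_left: int,
           accept: Callable[[str], bool], keep: set[int]) -> None:
    """Recursively walk away from `pos` in ONE direction (step -1 = left, +1 = right)."""
    nxt = pos + step
    # Stop conditions: depth used up, ran off the array, or the neighbor fails the test.
    if depth_left == 0 or not 0 <= nxt < len(words) or not accept(words[nxt]):
        return
    keep.add(nxt)
    # Recurse in the same direction only, so left and right never mix.
    expand(words, nxt, step, depth_left - 1, accept, keep)


def neighbors_recursive(words: list[str], keyword: str, depth: int,
                        accept: Callable[[str], bool] = lambda w: True) -> list[int]:
    """Return the sorted positions of each keyword plus its accepted neighbors."""
    keep = set()
    for pos, word in enumerate(words):
        if word == keyword:
            keep.add(pos)
            expand(words, pos, -1, depth, accept, keep)   # walk left only
            expand(words, pos, +1, depth, accept, keep)   # walk right only
    return sorted(keep)
```

For example, on the FOX sentence, `neighbors_recursive(words, "FOX", 2, accept=lambda w: w != "brown")` stops the leftward walk at "brown" and returns the positions `[3, 4, 5]`.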
 

NauticalGent

Ignore List Poster Boy
Joined
Apr 27, 2015
Messages
6,284
CedarTree...The suspense is killing me. How is it going?
 

The_Doc_Man

Immoderate Moderator
Staff member
Joined
Feb 28, 2001
Messages
26,999
First, this is inherently a slow process because VBA isn't fully compiled: it gets built into p-code (pseudo-code), which is then run through an emulation process. So VBA code is probably running anywhere from 3 to 300 times slower than truly compiled code, depending on the code in question.

Second, doing this through SQL won't work faster because SQL assumes you have a disk-based data file. Adding a disk operation for every word would be worse.

Third, you inherently limit yourself by trying to build an array because Access has to be careful in its use of memory. With all the DLL files that have to be mapped plus Access itself, and with various forms that could occupy dynamic memory, you are potentially straining the virtual memory limits of your process. (Not the system - the process!)

I think this is a case where Access is the wrong tool. You might want to consider some language that actually compiles true code, such as VB or C++, so that you drop the need for pcode emulation. Such languages can still use DLLs but probably would use fewer such modules. That might give you the fastest possible execution.

Then there is the issue that you are looking at word adjacency, some sort of keyword search. Here is my question: Do you know ahead of time what word or words you are interested in? If so, you are building the arrays and then searching them. But why build a whole array that might not even contain what you seek?

My thought is that you can do some sort of "ripple" search. I'm not about to fake the code here, but...

1. Build an array of 11 string holders initialized to zero-length strings.
2. Open the file for input in a text-oriented style.
3. Parse the file for separators so you can identify the individual words.
4. Start looping through the file by loading the first six words of the file into elements 6-11.
5. NOW look at word 6. If it is the keyword, look at the adjacent elements. Do your thing.
6. REGARDLESS of whether you did your thing or not, ripple the elements down one. Word 6 becomes word 5, word 7 becomes word 6, etc. When a word reaches word 1, it gets dropped. The next word you read becomes the new slot 11 word.
Now repeat the loop of steps 5 and 6 until you reach the end condition where the last word of the file is in slot 6 and you have been rippling zero-length strings for the last five iterations.
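The ripple procedure maps naturally onto a fixed-length queue. Below is a Python sketch of it. It is not The_Doc_Man's code, and `ripple_search` and `radius` are hypothetical names. Here `collections.deque(maxlen=...)` does the rippling: appending a word on the right drops the oldest one on the left. Because the function accepts any iterable of words, it can consume words as they are parsed from a file, so the whole document never has to sit in an array.

```python
from collections import deque
from collections.abc import Iterable, Iterator
from itertools import chain


def ripple_search(words: Iterable[str], keyword: str,
                  radius: int = 5) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (words before, words after) for each keyword, using a fixed-size
    window that the words ripple through one at a time."""
    size = 2 * radius + 1                          # 11 slots when radius = 5
    window = deque([""] * size, maxlen=size)       # step 1: all zero-length strings
    # Real words first, then `radius` blanks so the last word still reaches the centre.
    for count, word in enumerate(chain(words, [""] * radius), start=1):
        window.append(word)    # step 6: leftmost slot is dropped, new word enters on the right
        if count <= radius:
            continue           # step 4: still filling; the centre slot holds no real word yet
        if window[radius] == keyword:              # step 5: is the centre word the keyword?
            slots = list(window)
            before = [w for w in slots[:radius] if w]      # skip the blank padding
            after = [w for w in slots[radius + 1:] if w]
            yield before, after
```

With a file object `f`, the generator `(w for line in f for w in line.split())` could be passed as `words`. Splitting on whitespace is a stand-in for a real parser.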

I'm not going to say I have a great parser, and a lot of text parser modules are out there, but I would do it the way I suggested.

There is also the possibility that you could use Word as an app object, though it would also be slow. You CAN do something like a keyword search, after which you would look at the words as elements of a paragraph, where the individual words have indexes. If you find the keyword, you can look at the words at indexes +1 to +5 and -1 to -5 relative to the keyword's index.

The reason I suggested Word is that it will build the collection for you, and that code is truly compiled; if you are searching for a keyword, that is ALSO a compiled routine. Then the only emulation relates to what you do when you find a keyword.


This MIGHT help you
 

Cronk

Registered User.
Joined
Jul 4, 2013
Messages
2,770
In the OP:
"I have an array with thousands of items (each one is a word extracted from a document)."


I suspect that the operation might be significantly faster if the individual words were extracted into an indexed table with an autonumber index and another field to indicate "ignorable".


Queries to filter instances of the key word and then the adjoining records surely must be faster than VBA on an array.
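Here is a sketch of the table-based approach. It uses Python's built-in `sqlite3` module as a stand-in for an Access table. The table and column names are hypothetical, and Access SQL syntax would differ in places. `INTEGER PRIMARY KEY` plays the role of the autonumber. A single `UPDATE` with a correlated `EXISTS` clears the "ignorable" flag on every row within `window` positions of a keyword row. Whether this beats an in-memory array in Access was not measured here.

```python
import sqlite3


def mark_with_sql(words: list[str], keyword: str, window: int = 5) -> list[tuple[int, str, int]]:
    """Load words into an indexed table and flag everything not near the keyword."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL, "
        "ignorable INTEGER NOT NULL DEFAULT 1)"
    )
    con.execute("CREATE INDEX idx_words_word ON words(word)")   # fast keyword lookup
    con.executemany("INSERT INTO words (id, word) VALUES (?, ?)", enumerate(words, start=1))
    # Clear the flag on every row within `window` ids of any keyword row.
    con.execute(
        """
        UPDATE words SET ignorable = 0
        WHERE EXISTS (
            SELECT 1 FROM words AS k
            WHERE k.word = ?
              AND words.id BETWEEN k.id - ? AND k.id + ?
        )
        """,
        (keyword, window, window),
    )
    rows = con.execute("SELECT id, word, ignorable FROM words ORDER BY id").fetchall()
    con.close()
    return rows
```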
 

gemma-the-husky

Super Moderator
Staff member
Joined
Sep 12, 2006
Messages
15,613
@OP

Are you anywhere near sorting this?
Can you give a clearer example of what you are trying to do?
 

jdraw

Super Moderator
Staff member
Joined
Jan 23, 2006
Messages
15,364
CedarTree,
I think this is one of those cases where a sample of your input and expected outputs would clarify the requirement. Step back and, using your "FOX" example, show us what you expect as a result.
I'm with Dave: unsure of what you are trying to do.
 
