Extracting paragraph from text

Ammarhm

Hi
I guess I am bombarding the forum with my questions :-)

Let us say I have a variable (Tx) that holds a large amount of text divided into a number of paragraphs. A paragraph always ends with a return (Enter), i.e. Chr(13).
Let us say one of those paragraphs starts with the word "Start". How can I extract the whole paragraph starting with "Start" from the variable Tx?
 
The following code should do the trick:
Code:
Dim startInt As Integer
Dim endInt As Integer
Dim outputtxStr As String

startInt = Instr(1, Tx, "Start")
endInt = Instr(startInt, Tx, Chr(13))
outputtxStr = Mid(Tx, startInt, endInt)
You can then assign the variable outputtxStr to a control or whatever else you wish to do with it.
 

Thank you for your reply.
I actually came up with the following code after doing some research. I am posting it here as it might be useful to someone.

Public Sub ImportParagraphs()

    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim OutPrg As String
    Dim i As Long

    Set WordApp = New Word.Application
    Set WordDoc = WordApp.Documents.Open("C:\temp\mydoc.doc")

    ' Keep the text of the paragraph that begins with "Start"
    For i = 1 To WordDoc.Paragraphs.Count
        If Left(WordDoc.Paragraphs(i).Range.Text, 5) = "Start" Then
            OutPrg = WordDoc.Paragraphs(i).Range.Text
        End If
    Next i

    ' Close the document without saving before quitting Word
    WordDoc.Close False
    WordApp.Quit
    Set WordDoc = Nothing
    Set WordApp = Nothing

End Sub


Regards
 
John's suggestion will extract from the first instance of the word "start" where the requirement was for paragraphs that begin with "start".

To fix this, maybe change the first line to:
startInt = Instr(1, Tx, Chr(13) & "Start")

However, depending on how the line breaks are encoded, you will have problems if the paragraphs do not end in simply Chr(13), and they probably don't.

In Windows the new line uses two characters in sequence, Chr(13) & Chr(10)
(Also in VBA an enumerated constant vbCrLf)

In Unix the new line is simply Chr(10)
(Also enumerated as vbLf)

You might need to use:
startInt = Instr(1, Tx, vbCrLf & "Start")

or
startInt = Instr(1, Tx, vbLf & "Start")

A further issue would arise if the opening paragraph were the one you were targeting, because it would not be preceded by the new paragraph character. You have to think of all possibilities when it comes to programming.

BTW The first argument of InStr is Optional and defaults to 1 so it can be omitted in that particular instance.
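
The points above (mixed line endings, the opening paragraph, and the length argument of Mid) can be combined into one rough sketch. FindParagraph is a hypothetical helper name, not something from this thread, and it assumes the whole text is already in a string and that a case-sensitive match is wanted:

```vba
' Sketch only: FindParagraph is a hypothetical helper, not code from this thread.
Public Function FindParagraph(ByVal Tx As String, ByVal Prefix As String) As String
    Dim startPos As Long
    Dim endPos As Long

    ' Normalise CR+LF and bare CR to LF so only one terminator has to be handled.
    Tx = Replace(Tx, vbCrLf, vbLf)
    Tx = Replace(Tx, vbCr, vbLf)

    ' Prepend a terminator so the opening paragraph is also preceded by one.
    Tx = vbLf & Tx

    startPos = InStr(1, Tx, vbLf & Prefix, vbBinaryCompare)
    If startPos = 0 Then Exit Function       ' no match: returns ""

    startPos = startPos + 1                  ' skip the leading vbLf
    endPos = InStr(startPos, Tx, vbLf)
    If endPos = 0 Then endPos = Len(Tx) + 1  ' last paragraph has no terminator

    ' The third argument of Mid is a length, not an end position.
    FindParagraph = Mid$(Tx, startPos, endPos - startPos)
End Function
```

Calling FindParagraph(Tx, "Start") would then return the first paragraph that begins with "Start", without its terminator, or an empty string if there is none.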
 
If it is just a txt document, opening Word is cracking a peanut with a sledge hammer.

BTW, put code in a code box rather than using colour.
http://www.access-programmers.co.uk/forums/showthread.php?goto=newpost&t=200247

Sorry, I guess I was not very clear about the task I wanted to do.
The problem is that I am trying to import data from over 1200 different Word documents, so I am doing that through VBA rather than manually, and that is why I need to open those Word documents. I don't know if there is any other way to do it?
Thanks again
 
I decided to ditch my own solution and go with the solution suggested by John Big Booty for several reasons


I think, however, that the code should be modified: the third argument of Mid(Tx, startInt, endInt) should be the number of characters to extract, not the position where the extraction ends.
So my suggestion would be:
Code:
Dim startInt As Integer
Dim endInt As Integer
Dim outputtxStr As String
Dim lgth As Long

startInt = Instr(1, Tx, "Start")
endInt = Instr(startInt, Tx, Chr(13))
lgth = endInt - startInt
outputtxStr = Mid(Tx, startInt, lgth)
And just as Galaximo pointed out, I am running into several problems when using this code because of the differences in line-ending characters.

When I use my own solution with Document.Paragraphs, things work perfectly, but it seriously slows down the database. I already have the whole content of the document imported into a variable, so I need something that searches within this variable rather than going through the .Paragraphs property. Does anyone have any suggestions to make this work?
Regards
 
Make sure you are not opening a new Word application for each document. That is the most time consuming part of the process. Set a module wide or global variable as the Word Application Object and keep it open between documents. It is also probably faster if the application is run Not Visible.
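
As a rough sketch of that idea, assuming a reference to the Microsoft Word Object Library is set, something like the following keeps one hidden instance alive between documents. The names mWordApp, GetWordApp and CloseWordApp are made up for illustration:

```vba
' Sketch only: requires a reference to the Microsoft Word Object Library.
' mWordApp, GetWordApp and CloseWordApp are hypothetical names.
Private mWordApp As Word.Application

Private Function GetWordApp() As Word.Application
    ' Create the instance on first use and hand back the same one afterwards.
    If mWordApp Is Nothing Then
        Set mWordApp = New Word.Application
        mWordApp.Visible = False
    End If
    Set GetWordApp = mWordApp
End Function

Public Sub CloseWordApp()
    ' Call once, after the last document has been processed.
    If Not mWordApp Is Nothing Then
        mWordApp.Quit
        Set mWordApp = Nothing
    End If
End Sub
```

Each document would then be opened with GetWordApp().Documents.Open(...) and closed individually, with CloseWordApp called only at the very end of the import.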

Better not to import the whole text but to process it in Word and get just the right paragraph. You can use Word's Find feature (the Find object of a Range) to locate the start term in the whole document. Then you only need to import the paragraph where it is found.
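
A minimal sketch of that approach, untested against your documents and assuming WordDoc is an already opened Word.Document, might look like this. FindStartParagraph is a hypothetical name. Because Find matches the term anywhere, the loop checks that each hit sits at the very start of its paragraph:

```vba
' Sketch only: FindStartParagraph is a hypothetical helper.
Public Function FindStartParagraph(ByVal WordDoc As Word.Document, _
                                   ByVal Prefix As String) As String
    Dim rng As Word.Range
    Set rng = WordDoc.Content

    With rng.Find
        .ClearFormatting
        .Text = Prefix
        .Forward = True
        .Wrap = wdFindStop      ' stop at the end of the document
        .MatchCase = True
    End With

    Do While rng.Find.Execute
        ' After a successful Execute, rng covers the matched text.
        If rng.Start = rng.Paragraphs(1).Range.Start Then
            FindStartParagraph = rng.Paragraphs(1).Range.Text
            Exit Function
        End If
        rng.Collapse wdCollapseEnd   ' continue searching after this match
    Loop
End Function
```

The returned text still includes the trailing paragraph mark, so it may need trimming before it is stored.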

A bit left field, but if the documents are in the Word 2007 format (.docx), they are actually zip archives composed of many different parts holding the text and formatting separately. Open one with something like 7-zip and you can see the parts. You may be able to extract the text directly.

I haven't tried that exact process myself but I have successfully changed all the links in a Word 2007 document with a find and replace on the links component file of the archive.
 
