niki
text recognition query
Hello,
I created a database using an HTML parser. Because parsers work on the HTML structure rather than on text recognition, I had to extract whole sections of the HTML pages. My problem is that only a small, precise part of each extracted section is useful for my database work. This part is easily identifiable because there is a specific word at the start of that string of text.
For example, the field to be filtered contains this text value:
"Organisation details field"
"Name: Pacific Northwest National LaboratoryAddress: 902 Battelle Blvd PO Box 999Richland, WA 99352UNITED STATESType: Consultancy; Research; Non CommercialNumber of Employees: > 500Details: The Pacific Northwest National Laboratory (http://www.pnl.gov/main/welcome/index.html) is operated by Battelle Memorial Institute for the United States Department of Energy. At Pacific Northwest, we deliver breakthrough science and technology to meet key national needs. We also apply our capabilities to meet selected environmental, energy, health and national security objectives, strengthen the economy, and support the education of future scientists and engineers.Turnover: 478 million euroKeywords: ; Phontonics; Optics; Chalcogenide; Microstructurual Characterization; Properties "
And I want this field (which will be renamed keywords after the filtering operation) to contain only this text value: " ; Phontonics; Optics; Chalcogenide; Microstructurual Characterization; Properties".
I assume that I have to use the wizard and configure it to search for the text value "Keywords" within the "Organisation details field", but for the life of me I can't figure out how to do it. Is there a function or a trick for doing that?
Thanks for your help!
nico
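
To make clear what I am after, here is a rough sketch of the filtering in Python, outside the database. It is only an illustration under my own assumptions: the function names `extract_after_marker` and `split_keywords` are made up, the marker is taken to be spelled exactly "Keywords:", and the part I want is assumed to be everything after its last occurrence. The sample string is a shortened version of the field value above.

```python
def extract_after_marker(text: str, marker: str = "Keywords:") -> str:
    """Return the text after the last occurrence of marker, or "" if absent.

    Assumption: the marker is matched case-sensitively and the wanted
    part runs from the marker to the end of the field.
    """
    idx = text.rfind(marker)
    if idx == -1:
        return ""  # marker not found: nothing to keep
    # Skip the marker itself and trim surrounding whitespace.
    return text[idx + len(marker):].strip()


def split_keywords(tail: str) -> list[str]:
    """Split the extracted tail on ';' and drop empty entries."""
    return [part.strip() for part in tail.split(";") if part.strip()]


# Shortened sample of the "Organisation details field" value.
sample = (
    "Turnover: 478 million euro"
    "Keywords: ; Phontonics; Optics; Chalcogenide; "
    "Microstructurual Characterization; Properties "
)

if __name__ == "__main__":
    tail = extract_after_marker(sample)
    print(tail)
    print(split_keywords(tail))
```

Inside a database that offers the string functions InStr and Mid, I imagine the same idea would be to locate the marker with InStr and take the rest of the field with Mid, but I have not managed to set that up.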
