Reading a file byte by byte - faster

evertVB

I'm using VBA in order to read through a file byte by byte.
Ultimately I want to remove some invalid characters from a text file.
Code:
Option Compare Database
Option Explicit

Sub TestFunction()
    MsgBox "fn_ReadWriteStream started"
    If fn_ReadWriteStream("C:\Project\Utilities\Access en VB\ReadWriteStream\Macro_mcr_all_qry.txt") = True Then
        MsgBox "fn_ReadWriteStream succeeded"
    Else
        MsgBox "fn_ReadWriteStream failed"
    End If
End Sub

Public Function fn_ReadWriteStream(pFileName As String) As Boolean
    Dim fname As String
    Dim fname2 As String
    Dim fnr As Integer
    Dim fnr2 As Integer
    Dim tstring As String * 1
    Dim i As Integer

    fn_ReadWriteStream = False

    fname = pFileName
    fname2 = pFileName & ".clean.txt"

    fnr2 = FreeFile()
    Open fname2 For Binary Lock Read Write As #fnr2
    fnr = FreeFile()
    Open fname For Binary Access Read As #fnr
    Do
        Get #fnr, , tstring
        If EOF(fnr) Then Exit Do

        If Asc(tstring) = 254 Or _
           Asc(tstring) = 255 Or _
           Asc(tstring) = 0 Then
        Else
            Put #fnr2, , tstring
        End If

    Loop
    Close #fnr
    Close #fnr2

    fn_ReadWriteStream = True
End Function
This code works fine, but it is very slow for files larger than 100 MB.
Is there a way to make this work faster? I suspect that one should read blocks of multiple bytes, but then what would the code look like?
 
For a text file with CR/LFs, I would tend to read the file a line at a time, clean each line, and write it back.

I just wonder whether reading Chr(0) in particular might cause an issue though, as it may be interpreted as a line terminator.
 
My file looked like this in hex edit mode:

[Attachment: hex editor view of the file]


I usually start with FileSystemObject.OpenTextFile and ReadLine, but for some reason this just didn't work here.
 
OK.

I think this is all Unicode text, 2 bytes per character.
I am not sure exactly what you are trying to do, but I expect you can't just strip out the Chr(0).

The ends of lines are the 0D, 0A pairs: Chr(13), Chr(10).
 

The first word (hFEFF) of the file is a Byte Order Mark indicating that the file is a UTF-16 Unicode file. Depending on what is in the file, just stripping the zeros may produce some strange results. You might need a translation table (like the one provided in the second link above) to convert the contents to an ASCII file. The thing to remember when reading Unicode text is that the characters are stored as "overloaded bytes", i.e. each character occupies a 16-bit word rather than a single byte. If there are no special characters, use the AscW function to read the individual characters, i.e.
Code:
 Dim n As Long: n = AscW(Mid(inputStr, x, 1))   ' code of the character at position x
 If n > 0 And n < 256 Then Mid(outputStr, y, 1) = Chr(n)
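
For AscW to see whole characters, the UTF-16 data first has to end up in a VBA string. Here is a minimal sketch of one way to do that, assuming the file is little-endian (it starts with the bytes FF FE) and fits in memory. The names ReadUtf16File and SaveAsAnsi are made up for illustration. It relies on the fact that assigning a Byte array to a String copies the bytes as-is, and on StrConv with vbFromUnicode to convert to the system code page.

```vba
Option Explicit

' Sketch only: ReadUtf16File and SaveAsAnsi are hypothetical helper names.
' Assumes a little-endian UTF-16 file that fits in memory.
Public Function ReadUtf16File(path As String) As String
    Dim f As Integer
    Dim bytes() As Byte
    Dim s As String

    f = FreeFile()
    Open path For Binary Access Read As #f
    If LOF(f) = 0 Then
        Close #f
        Exit Function                  ' empty file, return ""
    End If
    ReDim bytes(0 To LOF(f) - 1)
    Get #f, , bytes                    ' one read for the whole file
    Close #f

    s = bytes                          ' Byte array -> String copies the raw UTF-16 bytes

    ' Drop the byte order mark (FF FE on disk) if present.
    If UBound(bytes) >= 1 Then
        If bytes(0) = &HFF And bytes(1) = &HFE Then s = Mid$(s, 2)
    End If
    ReadUtf16File = s
End Function

Public Sub SaveAsAnsi(content As String, path As String)
    Dim f As Integer
    Dim out() As Byte

    If Len(content) = 0 Then Exit Sub
    ' Convert to the system code page; characters that do not exist there
    ' typically come out as "?".
    out = StrConv(content, vbFromUnicode)

    If Len(Dir$(path)) > 0 Then Kill path   ' Binary mode does not truncate an existing file
    f = FreeFile()
    Open path For Binary Access Write As #f
    Put #f, , out                      ' Binary mode writes only the data, no descriptor
    Close #f
End Sub
```

Usage would be something like SaveAsAnsi ReadUtf16File(fname), fname & ".clean.txt". The line ends survive as Chr(13), Chr(10) because they are converted as characters rather than having their zero bytes stripped.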

As for reading in chunks, you can call LOF() once the input file is open and decide on the segment size you want to use, truncating the last segment as needed. (You need to check whether LOF reports the overloaded or the natural length of the file. I don't know.) You can read into strings longer than 255 bytes.
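
To make the chunk idea concrete, here is a rough sketch that keeps the original byte filter (drop 0, 254 and 255) but reads a block per Get instead of a single byte. CleanFileChunked and CHUNK_SIZE are names invented for this example, and 64 KB is an arbitrary block size, not a measured optimum. LOF returns the file size in bytes, so it can drive the loop directly. Keep in mind the caveat above: on a UTF-16 file this byte filter is only safe when the text has no special characters.

```vba
Option Explicit

' Sketch only: CleanFileChunked and CHUNK_SIZE are hypothetical names.
' Copies srcPath to dstPath, dropping bytes 0, 254 and 255.
Public Function CleanFileChunked(srcPath As String, dstPath As String) As Boolean
    Const CHUNK_SIZE As Long = 65536   ' arbitrary block size, tune as needed

    Dim fIn As Integer, fOut As Integer
    Dim remaining As Long, n As Long
    Dim i As Long, kept As Long
    Dim inBuf() As Byte, outBuf() As Byte

    If Len(Dir$(dstPath)) > 0 Then Kill dstPath   ' Binary mode does not truncate

    fIn = FreeFile()
    Open srcPath For Binary Access Read As #fIn
    fOut = FreeFile()
    Open dstPath For Binary Access Write As #fOut

    remaining = LOF(fIn)               ' file size in bytes

    Do While remaining > 0
        ' The last block is usually shorter than CHUNK_SIZE.
        If remaining < CHUNK_SIZE Then n = remaining Else n = CHUNK_SIZE
        ReDim inBuf(0 To n - 1)
        ReDim outBuf(0 To n - 1)
        Get #fIn, , inBuf              ' fills the whole array in one call

        kept = 0
        For i = 0 To n - 1
            Select Case inBuf(i)
                Case 0, 254, 255
                    ' unwanted byte, skip it
                Case Else
                    outBuf(kept) = inBuf(i)
                    kept = kept + 1
            End Select
        Next i

        If kept > 0 Then
            ReDim Preserve outBuf(0 To kept - 1)
            Put #fOut, , outBuf        ' writes only the kept bytes
        End If
        remaining = remaining - n
    Loop

    Close #fIn
    Close #fOut
    CleanFileChunked = True
End Function
```

Byte arrays are used here instead of fixed-length strings so that no string conversion happens on the way in or out. Since LOF returns a Long, this approach is limited to files under 2 GB.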

Best,
Jiri
 
