xsharp.eu • Reading Very Large files

Reading Very Large files

Posted: Thu Jun 16, 2022 11:37 am
by Plummet
Hi All,
When processing a very large file (VLF) in VO, I find that the number of lines output is less than the number of lines input.

I wrote the code below to read a file line by line and return the line count.

FUNCTION Start()
    LOCAL cInputFile AS STRING
    LOCAL nInputLineCount AS DWORD
    cInputFile := K_FILE_PATH
    ? cInputFile
    IF File(cInputFile)
        ? "Will count lines ..."
        WAIT
        nInputLineCount := GetLineCount(cInputFile)
        ? "Lines=", nInputLineCount
    ELSE
        ? cInputFile, "File not found"
    ENDIF
    WAIT
    RETURN NIL

FUNCTION GetLineCount(cFile AS STRING) AS DWORD PASCAL
    // Counts lines by reading up to 1024 bytes per call; assumes the file exists
    LOCAL pFile AS PTR
    LOCAL nCount AS DWORD
    nCount := 0
    pFile := FOpen(cFile, FO_READ)
    IF pFile == F_ERROR
        ? DosErrString(FError())
        RETURN 0 // do not read from an invalid handle
    ENDIF
    DO WHILE ! FEof(pFile)
        FGetS2(pFile, 1024)
        ++nCount
    ENDDO
    FClose(pFile)
    RETURN nCount

DEFINE K_FILE_PATH := "EPD_202203.csv"


The input file is 6,522,309,040 bytes in size and has 17,938,549 lines.
This code run in VO gives 11,816,979 lines!
This code run in X# gives the correct answer: 17,938,549 lines.

Can anyone please explain why VO will not read the entire file? Is it a bug in the VO runtime or a limit in the Win32 API functions?

You can find the actual data here if you want to test. Make sure you download the ZIP format!
https://opendata.nhsbsa.net/dataset/eng ... 4540a962fd

This post is linked to my other post on the Macro compiler. If I can solve one of the problems I can forget the other :)

Don

Reading Very Large files

Posted: Thu Jun 16, 2022 11:57 am
by OhioJoe
Two suggestions:

1. If this code has worked before, then it's probably the input file. Try saving the file with an editor that enforces DOS line terminators: CHR(13) + CHR(10).

2. Use FReadLine() instead of FGet(). The instructions say there's no difference but there might be.

Reading Very Large files

Posted: Thu Jun 16, 2022 12:59 pm
by Plummet
Thanks Joe.
This file has standard line terminators.
The problem only seems to happen with VLFs (more than 11m lines?).
I think I already tried FReadLine ... will check anyway.
You can check this code on any file - just change the K_FILE_PATH value to point to your data.
Don

Reading Very Large files

Posted: Thu Jun 16, 2022 1:37 pm
by robert
Don,

Are there lines in the file with line length > 1024?

Robert

Reading Very Large files

Posted: Thu Jun 16, 2022 5:26 pm
by Plummet
Thanks for your reply Robert.

Line lengths are variable (CSV) but seem to be < 512, although I don't know whether there's a long one hidden somewhere. It's difficult to look through a 6 GB file ...

However the same code run in X# gives the correct answer of 17,938,549 lines.
Don

Reading Very Large files

Posted: Fri Jun 17, 2022 6:52 am
by Chris
Don,

It's very easy to check that with X#. Just use System.IO.File.ReadAllLines() in a small test app and then check the length of each line returned in the array.
Of course you'll need to have enough memory in your system for this simple way to work! And compile in AnyCPU/x64 mode...
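
The quick check described above could look roughly like the sketch below. It is only an illustration, not code from the thread: the file name is taken from the first post, the variable names are hypothetical, and it assumes an x64 build with enough RAM, since .NET strings are UTF-16 and the loaded lines will occupy considerably more memory than the 6 GB on disk.

```xsharp
FUNCTION Start() AS VOID
    LOCAL aLines AS STRING[]
    LOCAL nLongest := 0 AS INT
    // Loads the whole file into memory in one call (crude, test-only approach)
    aLines := System.IO.File.ReadAllLines("EPD_202203.csv")
    FOREACH cLine AS STRING IN aLines
        // Length is in characters, which equals bytes only for single-byte text
        IF cLine:Length > nLongest
            nLongest := cLine:Length
        ENDIF
    NEXT
    ? "Lines=", aLines:Length, "Longest line=", nLongest
    RETURN
```

If the longest line reported is well under 1024, long lines can be ruled out as an explanation.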


Reading Very Large files

Posted: Fri Jun 17, 2022 8:18 am
by Plummet
Thanks a lot for your reply Chris.

Well, I didn't want to read the file into memory as there is a hell of a lot of it!
But I would still like to know why, reading line by line, I was unable to get past 11m lines with VO. Is the blockage in the VO runtime or in the underlying Win32 API functions?

Anyway, it's not really important now as Robert helped fix my macro problem, so I have successfully processed all 17m lines in X# - yes! It took about 30 minutes on a 9-year-old PC.
Thanks all for rapid response -
Don

Reading Very Large files

Posted: Fri Jun 17, 2022 10:23 am
by Chris
Hi Don,

I didn't mean to do this in your real app! :) I only suggested to do it in a small 10 line test app, just to find out if your file contains large lines.
But it's not important anymore, only maybe if you wanted to do it out of curiosity.


Reading Very Large files

Posted: Fri Jun 17, 2022 3:58 pm
by ArneOrtlinghaus
The problem in VO is perhaps related to the size of the file. Addressing a position beyond 4 GB needs a larger offset than a DWORD can hold. I remember that the old and famous PKZIP also had limits on file size.

Arne
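
The figures from the first post are consistent with this idea. The sketch below is a rough estimate only: it assumes line lengths are evenly distributed through the file, and the variable names are hypothetical. The average line works out to about 363.6 bytes, so 11,816,979 lines correspond to roughly 4.297 billion bytes, within about 0.04% of 2^32 = 4,294,967,296. This suggests VO stopped near the 4 GB mark, but it does not show whether the limit sits in the VO runtime or elsewhere.

```xsharp
FUNCTION Start() AS VOID
    // Figures quoted in the first post
    LOCAL rFileSize := 6522309040.0 AS REAL8 // file size in bytes
    LOCAL rTotal    := 17938549.0   AS REAL8 // lines counted by X#
    LOCAL rVoLines  := 11816979.0   AS REAL8 // lines counted by VO
    LOCAL rLimit    := 4294967296.0 AS REAL8 // 2^32, one past the largest DWORD
    LOCAL rAvg AS REAL8
    // Assumption: every line is about as long as the average
    rAvg := rFileSize / rTotal
    ? "Average bytes per line :", rAvg
    ? "Estimated VO stop point:", rAvg * rVoLines
    ? "2^32                   :", rLimit
    RETURN
```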

Reading Very Large files

Posted: Fri Jun 17, 2022 4:26 pm
by Chris
Hi Arne,

In VO, you would not load the whole file in memory, instead you would read line by line. And also in X#, for such huge files in real conditions you would normally do the same thing, but since this was only about a very small and quick test, I suggested doing it this crude way, in just 10 lines of code. Doing it properly and reading line by line and avoiding using a lot of memory would instead need 15 lines of code :)
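
A line-by-line version might look something like the following. This is a minimal sketch rather than the code anyone in the thread actually used: it relies on the .NET System.IO.StreamReader class, the file name comes from the first post, and the variable names are hypothetical. Only one line is held in memory at a time, so it should not need much RAM even for a 6 GB file.

```xsharp
USING System.IO

FUNCTION Start() AS VOID
    LOCAL oReader AS StreamReader
    LOCAL cLine AS STRING
    LOCAL nCount := 0 AS INT64
    LOCAL nLongest := 0 AS INT
    oReader := StreamReader{"EPD_202203.csv"}
    TRY
        // ReadLine() returns NULL once the end of the file is reached
        cLine := oReader:ReadLine()
        DO WHILE cLine != NULL
            nCount += 1
            IF cLine:Length > nLongest
                nLongest := cLine:Length
            ENDIF
            cLine := oReader:ReadLine()
        ENDDO
    FINALLY
        oReader:Dispose() // release the file handle even if reading fails
    END TRY
    ? "Lines=", nCount, "Longest line=", nLongest
    RETURN
```

The line counter is an INT64 here purely as a precaution; 17.9 million lines would also fit in a DWORD.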
