Reading Very Large files

Public support forum for peer-to-peer support related to the Visual Objects and Vulcan.NET products
Plummet
Posts: 21
Joined: Tue Jan 19, 2016 4:18 pm

Post by Plummet »

Hi All,
When processing a very large file (VLF) in VO, I find that the number of lines output is less than the number of lines input.

I wrote the code below to read a file line by line and return the line count.

FUNCTION Start()
    LOCAL cInputFile AS STRING
    LOCAL nInputLineCount AS DWORD
    cInputFile := K_FILE_PATH
    ? cInputFile
    IF File(cInputFile)
        ? "Will count lines ..."
        WAIT
        nInputLineCount := GetLineCount(cInputFile)
        ? "Lines=", nInputLineCount
    ELSE
        ? cInputFile, "File not found"
    ENDIF
    WAIT
    RETURN NIL

FUNCTION GetLineCount(cFile AS STRING) AS DWORD PASCAL
    // Counts lines by reading them one at a time; assumes the file exists.
    LOCAL pFile AS PTR
    LOCAL nCount AS DWORD
    pFile := FOpen(cFile, FO_READ)
    IF pFile == F_ERROR
        ? DosErrString(FError())
        RETURN 0    // don't try to read from an invalid handle
    ENDIF
    DO WHILE ! FEof(pFile)
        FGetS2(pFile, 1024)    // read one line (at most 1024 bytes) and discard it
        ++nCount
    ENDDO
    FClose(pFile)
    RETURN nCount

DEFINE K_FILE_PATH := "EPD_202203.csv"


The input file is 6,522,309,040 bytes and has 17,938,549 lines.
Run in VO, this code reports 11,816,979 lines!
Run in X#, the same code gives the correct answer: 17,938,549 lines.

Can anyone please explain why VO will not read the entire file? Is it a bug in the VO runtime or a limit in the Win32 API functions?

You can find the actual data here if you want to test. Make sure you download the ZIP format!
https://opendata.nhsbsa.net/dataset/eng ... 4540a962fd

This post is linked to my other post on the macro compiler. If I can solve one of the problems, I can forget the other :)

Don
OhioJoe
Posts: 131
Joined: Wed Nov 22, 2017 12:51 pm
Location: United States

Post by OhioJoe »

Two suggestions:

1. If this code has worked before, then it's probably the input file. Try saving the file with an editor that enforces DOS line terminators: CHR(13) + CHR(10).

2. Use FReadLine() instead of FGetS2(). The documentation says there's no difference, but there might be.
Joe Curran
Ohio USA
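A minimal X# sketch for checking Joe's first suggestion without opening a 6 GB file in an editor: it scans the file one byte at a time and counts LF (10) bytes that are not preceded by a CR (13). The function name CountBareLF and the 1 MB buffer size are illustrative choices, not part of VO or the X# runtime. A result of 0 means every line break is a DOS-style CR+LF pair. Reading byte by byte is simple rather than fast, but it keeps memory use tiny.

```xsharp
USING System
USING System.IO

// Hypothetical helper: returns how many LF bytes are NOT preceded by CR.
// 0 means every line break in the file is a CR+LF pair.
FUNCTION CountBareLF(cFile AS STRING) AS INT64
    LOCAL nBare := 0 AS INT64
    LOCAL nPrev := -1 AS INT
    LOCAL nByte AS INT
    // 1 MB read buffer (an arbitrary choice) so ReadByte() rarely touches the disk.
    LOCAL oStream := FileStream{cFile, FileMode.Open, FileAccess.Read, FileShare.Read, 1048576} AS FileStream
    TRY
        nByte := oStream:ReadByte()      // returns -1 at end of file
        DO WHILE nByte >= 0
            IF nByte == 10 .AND. nPrev != 13
                nBare++
            ENDIF
            nPrev := nByte
            nByte := oStream:ReadByte()
        ENDDO
    FINALLY
        oStream:Dispose()                // close the file even if reading throws
    END TRY
    RETURN nBare

FUNCTION Start() AS VOID
    // File name taken from K_FILE_PATH in Don's code.
    Console.WriteLine("Bare LF bytes = {0}", CountBareLF("EPD_202203.csv"))
    RETURN
```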
Plummet
Posts: 21
Joined: Tue Jan 19, 2016 4:18 pm

Post by Plummet »

Thanks Joe.
This file has standard line terminators.
The problem only seems to happen with very large files (more than about 11M lines).
I think I already tried FReadLine(), but I will check anyway.
You can check this code on any file - just change the K_FILE_PATH value to point to your data.
Don
robert
Posts: 4567
Joined: Fri Aug 21, 2015 10:57 am
Location: Netherlands

Post by robert »

Don,

Are there any lines in the file longer than 1024 bytes?

Robert
XSharp Development Team
The Netherlands
robert@xsharp.eu
Plummet
Posts: 21
Joined: Tue Jan 19, 2016 4:18 pm

Post by Plummet »

Thanks for your reply Robert.

Line lengths are variable (it's a CSV file) but seem to be under 512, although I don't know if there's a long one hidden somewhere. It's difficult to look through a 6 GB file ...

However, the same code run in X# gives the correct answer of 17,938,549 lines.
Don
Chris
Posts: 4986
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris »

Don,

It's very easy to check that with X#. Just use System.IO.File.ReadAllLines() in a small test app and then check the length of each line returned in the array.
Of course, you'll need enough memory in your system for this simple approach to work, and you'll need to compile in AnyCPU/x64 mode...

Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
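A minimal X# sketch of the quick test Chris describes, which would also answer Robert's question about long lines. LongestLine is a hypothetical helper name and the file name comes from Don's K_FILE_PATH. Because ReadAllLines() holds every line in memory as .NET (UTF-16) strings, a 6 GB ASCII file needs roughly twice that much RAM, so this is only for a one-off check in a 64-bit process.

```xsharp
USING System

// Hypothetical helper: length (in chars) of the longest string in the array.
FUNCTION LongestLine(aLines AS STRING[]) AS INT
    LOCAL nLongest := 0 AS INT
    FOREACH cLine AS STRING IN aLines
        nLongest := Math.Max(nLongest, cLine:Length)
    NEXT
    RETURN nLongest

FUNCTION Start() AS VOID
    // Loads the whole file at once: simple, but memory-hungry.
    LOCAL aLines := System.IO.File.ReadAllLines("EPD_202203.csv") AS STRING[]
    Console.WriteLine("Lines = {0}, longest = {1}", aLines:Length, LongestLine(aLines))
    RETURN
```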
Plummet
Posts: 21
Joined: Tue Jan 19, 2016 4:18 pm

Post by Plummet »

Thanks a lot for your reply Chris.

Well, I didn't want to read the file into memory, as there is a hell of a lot of it!
But I would like to know why, reading line by line, I was unable to get past 11M lines with VO. Is the blockage in the VO runtime or in the underlying Win32 API functions?

Anyway, it's not really important now, as Robert helped fix my macro problem, so I have successfully processed all 17M lines in X#. Yes! It took about 30 minutes (on a 9-year-old PC).
Thanks all for the rapid response!
Don
Chris
Posts: 4986
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris »

Hi Don,

I didn't mean to do this in your real app! :) I only suggested doing it in a small 10-line test app, just to find out if your file contains very long lines.
But it's not important anymore, except maybe out of curiosity.

Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
ArneOrtlinghaus
Posts: 414
Joined: Tue Nov 10, 2015 7:48 am
Location: Italy

Post by ArneOrtlinghaus »

The problem in VO is perhaps related to the size of the file. File offsets beyond 4 GB need more than the 32 bits a DWORD can hold. I remember that the old and famous PKZIP also had limits on file size.

Arne
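A quick arithmetic check using only the numbers Don posted is consistent with Arne's idea. It assumes lines are of roughly uniform length, which is only an approximation for a CSV file. Dividing the file size by the X# line count gives the average line length; dividing the 32-bit limit 2^32 = 4,294,967,296 bytes by that average estimates how many lines fit in the first 4 GB:

```latex
\bar{\ell} = \frac{6\,522\,309\,040\ \text{bytes}}{17\,938\,549\ \text{lines}} \approx 363.6\ \text{bytes per line}

N_{4\,\text{GB}} \approx \frac{2^{32}}{\bar{\ell}} = \frac{4\,294\,967\,296}{363.6} \approx 1.181 \times 10^{7}\ \text{lines}
```

That estimate is very close to the 11,816,979 lines VO reported, which fits (but does not prove) the theory that VO stops reading once the file position reaches the 4 GB mark.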
Chris
Posts: 4986
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris »

Hi Arne,

In VO, you would not load the whole file into memory; instead you would read it line by line. In X# too, for such huge files in real conditions you would normally do the same thing, but since this was only a very small and quick test, I suggested doing it this crude way, in just 10 lines of code. Doing it properly, reading line by line and avoiding using a lot of memory, would need 15 lines of code instead :)

Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
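For comparison, a minimal X# sketch of the streaming approach Chris mentions: System.IO.StreamReader reads one line at a time, so memory use stays small whatever the file size. CountLines is a hypothetical helper name and the file name is just an example. Since a .NET FileStream tracks its position as a 64-bit value, this approach should not hit a 4 GB boundary, which fits Don's X# result.

```xsharp
USING System
USING System.IO

// Hypothetical helper: counts lines while holding only one line in memory.
FUNCTION CountLines(cFile AS STRING) AS INT64
    LOCAL nCount := 0 AS INT64
    LOCAL oReader := StreamReader{cFile} AS StreamReader
    TRY
        // EndOfStream becomes TRUE once every byte has been consumed.
        DO WHILE ! oReader:EndOfStream
            oReader:ReadLine()       // read and discard one line
            nCount++
        ENDDO
    FINALLY
        oReader:Dispose()            // close the file even if reading throws
    END TRY
    RETURN nCount

FUNCTION Start() AS VOID
    Console.WriteLine("Lines = {0}", CountLines("EPD_202203.csv"))
    RETURN
```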