Xml file UTF-8 BOM

Public forum for peer-to-peer support related to the Visual Objects and Vulcan.NET products
stecosta66
Posts: 46
Joined: Mon Sep 26, 2016 12:59 pm

Post by stecosta66 »

Hi All,
I'm having trouble with some electronic invoices in XML format that are encoded as UTF-8 with a BOM (Byte Order Mark).
I use FOpen() to open the XML file and FReadLine() to read it line by line.

With some of these files I'm getting strange characters at the beginning, and I discovered that they are encoded with a BOM:
"ï»¿<?xml version"

Is there any way to remove the BOM with VO?

Thanks
Sherlock
Posts: 63
Joined: Mon Sep 28, 2015 1:37 pm
Location: Australia mate... fare dikkum

Post by Sherlock »

https://www.w3.org/International/questi ... order-mark

What I do is: if that string [ï»¿] is found, reduce it to [] (an empty string).
I have XML that does not have it, but my editor/hex editor adds it.
My XML code reader could not detect the <?xml version in the file.
You could accept either as valid: "ï»¿<?xml version" or "<?xml version"

In a hex view of the file, the UTF-8 signature shows as the bytes EF BB BF.
Phil McGuinness
robert
Posts: 4518
Joined: Fri Aug 21, 2015 10:57 am
Location: Netherlands

Post by robert »

Stefano,

What you probably should do is:
- skip the BOM when it exists
- when the file has a BOM, use the function Utf82Ansi() to translate the strings that you read from the file from UTF-8 to ANSI. (This function is in the Util module inside the System Library.)
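
The two steps above could be sketched roughly like this in VO. This is only a minimal sketch, assuming the file content has already been read into a string as raw bytes. Utf8TextToAnsi() is a hypothetical helper name, and Utf82Ansi() is assumed to take a STRING and return a STRING; check its actual declaration in the Util module before relying on this.

```
// Hypothetical helper: skips a leading UTF-8 BOM and converts the rest to ANSI.
// Text without a BOM is returned unchanged, following the rule described above.
FUNCTION Utf8TextToAnsi(cText AS STRING) AS STRING PASCAL
	LOCAL cBom AS STRING

	// The UTF-8 BOM is the byte sequence EF BB BF (decimal 239 187 191)
	cBom := Chr(239) + Chr(187) + Chr(191)

	IF Left(cText, 3) == cBom
		// Drop the three BOM bytes, then translate the remainder from UTF-8 to ANSI.
		// Assumption: Utf82Ansi() accepts and returns a STRING.
		cText := Utf82Ansi(SubStr(cText, 4))
	ENDIF

	RETURN cText
```

Note that a file can be UTF-8 encoded without carrying a BOM, so the BOM test alone does not prove that a file is ANSI.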

Robert

Robert
XSharp Development Team
The Netherlands
robert@xsharp.eu
Chris
Posts: 4899
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris »

I would just use .Net methods for file access, since those are a lot more powerful and can automatically handle BOM markers, encodings etc:

Code:

USING System.IO
...
LOCAL oStream AS StreamReader
LOCAL cLine AS STRING
oStream := StreamReader{"c:\test\testutf.txt", TRUE} // TRUE = detect the encoding from the BOM
DO WHILE oStream:Peek() != -1
	cLine := oStream:ReadLine()
	? cLine
END DO
oStream:Close() // release the file handle when done
Or even simpler:

Code:

System.IO.File.ReadAllLines() // returns an array of strings

Edit: Oops, sorry, did not realize this is about VO!
Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
ic2
Posts: 1858
Joined: Sun Feb 28, 2016 11:30 pm
Location: Holland

Post by ic2 »

Hello Stefano,

Are you using VO or X#?

We read (and create) UBL files in VO and it has worked fine so far. We read the XML into a string using the function below, and that could probably help you as well.


Dick

FUNCTION StringReadZeroNoAnsi(cPath AS STRING) AS STRING PASCAL
	//#s KB 24-1-2011
	//#s Alternative for MemoRead that is not SetAnsi dependent
	LOCAL cText AS STRING
	LOCAL ptrHandle AS PTR
	LOCAL dwFileSize AS DWORD
	LOCAL dwError AS DWORD

	cText := ""
	dwError := 0

	// Look up the file first so that its size is known
	IF FFirst(String2Psz(cPath), FC_NORMAL)
		dwFileSize := FSize()
		IF dwFileSize > 0
			// Pre-allocate a buffer large enough for the whole file
			cText := Buffer(dwFileSize)
			ptrHandle := FOpen2(cPath, FO_READ + FO_SHARED)
			dwError := FError()
			IF dwError == 0
				// Read the raw bytes; keep the error code if the read falls short
				IF FRead(ptrHandle, @cText, dwFileSize) <> dwFileSize
					dwError := FError()
				ENDIF
				// Always close the handle, also after a failed read
				FClose(ptrHandle)
			ENDIF
		ENDIF
	ELSE
		dwError := FError()
	ENDIF

	IF dwError <> 0
		// Error handling
	ENDIF

	RETURN cText
wriedmann
Posts: 3755
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Post by wriedmann »

Ciao Stefano,
since the .NET XML functions are much more powerful, I have implemented the reading of the FPA/FPR invoices in X# and I'm using them through a COM module in VO.
That also helps with removing the signature that may be present in the case of a p7m file.
If you are interested, I can give you a part of the code (my complete code includes also the sending and receiving code to the web service of my provider).
Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
stecosta66
Posts: 46
Joined: Mon Sep 26, 2016 12:59 pm

Post by stecosta66 »

Thanks all for the suggestions,
I'll try that

With FReadLine() I'm also getting strings that are 256 bytes long with no CRLF.
I opened the XML file with Notepad++ and the status bar shows Unix (LF) and UTF-8 BOM.
This file was generated by web-based electronic invoicing software, in this case Aruba fatturazione elettronica.

For other XML files it shows Windows (CRLF) and UTF-8. Those files give me no problems.

How can I work around this with VO?
stecosta66
Posts: 46
Joined: Mon Sep 26, 2016 12:59 pm

Post by stecosta66 »

wriedmann wrote: Ciao Stefano,
since the .NET XML functions are much more powerful, I have implemented the reading of the FPA/FPR invoices in X# and I'm using them through a COM module in VO.
That also helps with removing the signature that may be present in the case of a p7m file.
If you are interested, I can give you a part of the code (my complete code includes also the sending and receiving code to the web service of my provider).
Wolfgang
Hi Wolfgang,
thanks for the support.

I would be interested in trying what you have done.
At the moment I'm removing the signature from the .p7m files with the openssl command, called through ShellExecute(), and it is working fine.
Are you using a scraping technique to send/receive files through the web service?

Thanks
Stefano
wriedmann
Posts: 3755
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Post by wriedmann »

Ciao Stefano,
you need to read the entire file and then use MemoLine() to split it into lines, possibly after using StrTran() to replace all LF with CRLF.
But please be aware that received files may come in several different formats, maybe even the entire data without any line break. I have seen a lot of different things by now. Your read function should not depend on any newline.
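
The StrTran() step could look roughly like this. It is only a sketch: NormalizeToCrLf() is a hypothetical helper name, and it assumes the whole file is already in a string. Existing CRLF pairs are collapsed to LF first so that files which already use Windows line endings do not end up with doubled CR characters.

```
// Hypothetical helper: converts LF-only (Unix) line endings to CRLF.
// Text that already uses CRLF comes back unchanged.
FUNCTION NormalizeToCrLf(cText AS STRING) AS STRING PASCAL
	LOCAL cCrLf AS STRING
	LOCAL cLf AS STRING

	cLf := Chr(10)
	cCrLf := Chr(13) + Chr(10)

	// Step 1: collapse existing CRLF pairs to a single LF
	cText := StrTran(cText, cCrLf, cLf)
	// Step 2: expand every LF to CRLF
	cText := StrTran(cText, cLf, cCrLf)

	RETURN cText
```

A file with no line breaks at all passes through unchanged, so whatever processes the result still must not rely on line breaks being present.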
Wolfgang
wriedmann
Posts: 3755
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Post by wriedmann »

Ciao Stefano,
(for others: in Italy all invoices need to be sent in a specific XML format through a system maintained by the Ministry of Finance):
to remove the signature I'm using a simple .NET call.
For sending and receiving the invoices I'm using an API that my provider offers. AFAIK Aruba also has some sort of API, and it is much, much simpler to do that in .NET than in plain VO.
Therefore I have all that functionality in an X# module that is used through COM in my VO applications.
Wolfgang