XMLPARSE(n) 0.9 "XML"

NAME

XMLPARSE - Parser for XML files in Fortran

TABLE OF CONTENTS

    TABLE OF CONTENTS
    SYNOPSIS
    DESCRIPTION
    PROCEDURES
    MOTIVATION
    PARAMETERS AND DERIVED TYPES
    GENERATING A READING ROUTINE
    EXAMPLES
    LIMITATIONS
    RELEASE NOTES
    TO DO
    KEYWORDS

SYNOPSIS

subroutine xml_open( info, filename, mustread )
subroutine xml_close( info )
subroutine xml_options( info, ... )
subroutine xml_get( info, tag, endtag, attribs, no_attribs, data, no_data )
subroutine xml_put( info, tag, attribs, no_attribs, data, no_data, type )
logical function xml_ok( info )
logical function xml_error( info )
logical function xml_data_trunc( info )
integer function xml_find_attrib( attribs, no_attribs, name, value )
subroutine read_xml_file_xxx( filename, lurep, error )
subroutine xml_process( filename, attribs, data, startfunc, datafunc, endfunc, lurep, error )

DESCRIPTION

The XML parser provided by this module has been written entirely in Fortran, making it possible to read and write XML files without the need to use mixed-language programming techniques.

It should be noted that the implementation has a number of limitations (cf. the section Limitations). The module has the following features:

The module has been implemented in standard Fortran 90. The intention is to make it compilable by the F compiler as well, so that it can be used with a wide range of Fortran compilers.

(It should even be possible to convert the parsing routines to an equivalent library in FORTRAN 77, though with the availability of several free Fortran 95 compilers, there seems little need for that.)

PROCEDURES

The module defines the following public routines and functions:

subroutine xml_open( info, filename, mustread )
Open an XML-file and fill the structure info, so that it can be used to refer to the opened file.

To check whether all is well (the file may fail to open for some reason, for instance), use the function xml_error().

Arguments:

info - TYPE(XML_PARSE) structure used to identify the file

filename - CHARACTER(LEN=*) name of the file to be opened

mustread - LOGICAL whether to read the file (.true.) or to write to it (.false.)


subroutine xml_close( info )
Close an opened XML-file. If the file was not opened, this routine has no effect.

info - TYPE(XML_PARSE) structure used to identify the file
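
As a minimal sketch of how xml_open, xml_error and xml_close fit together (the file name "example.xml" is hypothetical, and the sketch assumes that passing .true. for mustread opens the file for reading):

```fortran
program open_close_demo
    use xmlparse
    implicit none

    type(XML_PARSE) :: info

    ! Open the (hypothetical) file "example.xml" for reading
    call xml_open( info, 'example.xml', .true. )

    ! xml_error() reports whether opening the file failed
    if ( xml_error( info ) ) then
        write(*,*) 'Could not open example.xml'
    else
        write(*,*) 'File opened successfully'
    endif

    ! Closing has no effect if the file was never opened
    call xml_close( info )
end program open_close_demo
```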



subroutine xml_options( info, ... )
Set one or more options. These are all defined as optional arguments, so that the name=value convention can be used to select an option and to set its value. The first argument is fixed:

info - TYPE(XML_PARSE) structure used to identify the file

All other arguments are optional and include:



ignore_whitespace - LOGICAL compress the array of strings (remove empty lines and remove leading blanks) for easier processing

no_data_truncation - LOGICAL if data truncation occurs (too many lines of data or too many attributes, so that they can not all be stored in the arrays), this can be marked as an error or not. If the option is set to true, it is considered an error.

report_lun - INTEGER LU-number of a file to which messages can be logged (use XML_STDOUT for output to screen)

report_errors - LOGICAL write error messages to the report

report_details - LOGICAL write detailed messages to the report, useful for debugging


Note that these options are off by default. They should be set after the file has been opened. The reporting options, however, can be set before an XML file has been opened: they hold globally (that is, they are in effect for all reading and writing, independent of the files).
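
The name=value convention could be used along the following lines. This is only a sketch: the file name is hypothetical, and the choice of options is arbitrary.

```fortran
program options_demo
    use xmlparse
    implicit none

    type(XML_PARSE) :: info

    ! Reporting options hold globally, so they may be set before opening a file
    call xml_options( info, report_lun = XML_STDOUT, report_errors = .true. )

    ! Hypothetical input file, opened for reading
    call xml_open( info, 'example.xml', .true. )

    if ( .not. xml_error( info ) ) then
        ! The remaining options are set after the file has been opened
        call xml_options( info, ignore_whitespace  = .true., &
                                no_data_truncation = .true. )
    endif

    call xml_close( info )
end program options_demo
```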



subroutine xml_get( info, tag, endtag, attribs, no_attribs, data, no_data )
Read the current tag in the file up to the next one or the end-of-file. Store the attributes in the given array and do the same for the character data that may be present after the tag.

info - TYPE(XML_PARSE) structure used to identify the file

tag - CHARACTER(LEN=*) string that will hold the tag's name

endtag - LOGICAL indicates whether the current tag has ended or not

attribs - CHARACTER(LEN=*), DIMENSION(:,:) array of strings that will hold the attributes given to the tag

no_attribs - INTEGER number of attributes that were found

data - CHARACTER(LEN=*), DIMENSION(:) array of strings that will hold the character data (one element per line)

no_data - INTEGER number of lines of character data

Note:

If an error occurs or end-of-file is found, then use the functions xml_ok() and xml_error() to find out the conditions.



subroutine xml_put( info, tag, attribs, no_attribs, data, no_data, type )
Write the information for the current tag to the file. This subroutine is the inverse, so to speak, of the subroutine xml_get that parses the XML input.

For a description of the arguments, other than type: see above.

type - CHARACTER(LEN=*) string having one of the following values:



logical function xml_ok( info )
Returns whether the parser is still okay (no read errors or end-of-file).

info - TYPE(XML_PARSE) structure used to identify the file



logical function xml_error( info )
Returns whether the parser has encountered some error (see also the options).

info - TYPE(XML_PARSE) structure used to identify the file



logical function xml_data_trunc( info )
Returns whether the parser has had to truncate the data or the attributes.

info - TYPE(XML_PARSE) structure used to identify the file



integer function xml_find_attrib( attribs, no_attribs, name, value )
Convenience function that searches the list of attributes and returns the index of the sought attribute in the array, or -1 if it is not present. In the latter case the argument value is left unchanged, so you can preset it to a default before the call.

attribs - CHARACTER(LEN=*), DIMENSION(:,:) array of strings that hold the attributes

no_attribs - INTEGER number of attributes that were found

name - CHARACTER(LEN=*) name of the attribute to be found

value - CHARACTER(LEN=*) actual or default value of the attribute upon return
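
The default-value idiom might be sketched as follows. The attribute names and values are made up for illustration, and the sketch assumes that attribs(1,i) holds the name of attribute i and attribs(2,i) its value; that layout is not stated in this text.

```fortran
program find_attrib_demo
    use xmlparse
    implicit none

    character(len=20), dimension(2,2) :: attribs
    character(len=20)                 :: value
    integer                           :: idx

    ! Assumed layout: attribs(1,i) is the name, attribs(2,i) the value
    attribs(1,1) = 'unit'
    attribs(2,1) = 'metres'
    attribs(1,2) = 'source'
    attribs(2,2) = 'survey'

    ! Preset the default, then look for an attribute that is absent
    value = 'unknown'
    idx   = xml_find_attrib( attribs, 2, 'author', value )

    if ( idx == -1 ) then
        write(*,*) 'Attribute "author" not found, using default: ', trim(value)
    else
        write(*,*) 'Attribute "author" found at index ', idx, ': ', trim(value)
    endif
end program find_attrib_demo
```
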



subroutine read_xml_file_xxx( filename, lurep, error )
Subroutine generated via the method described below to read an XML file of a particular structure.

filename - CHARACTER(LEN=*) name of the XML file to read

lurep - INTEGER LU-number to use for reporting errors (use 0 to write to the screen; optional)

error - LOGICAL variable that indicates if an error occurred while reading (optional).



subroutine xml_process( filename, attribs, data, startfunc, datafunc, endfunc, lurep, error )
Subroutine that reads the XML file and calls three user-defined subroutines to take care of the actual processing. This is a routine that implements the so-called SAX approach.

filename - CHARACTER(LEN=*) name of the XML file to read

attribs - CHARACTER(LEN=*), DIMENSION(:,:) work array to store the attributes

data - CHARACTER(LEN=*), DIMENSION(:) work array to store the character data associated with a tag

startfunc - Subroutine that is called to handle the start of a tag:

 
    subroutine startfunc( tag, attribs, error )
       character(len=*)                 :: tag
       character(len=*), dimension(:,:) :: attribs
       logical                          :: error



If the argument error is set to true (because the tag was unexpected or something similar), the reading is interrupted and the routine returns. Only the fact that something was wrong is recorded. You need to use other means to convey more information if that is needed.

datafunc - Subroutine that is called to handle the character data associated with a tag:

 
    subroutine datafunc( tag, data, error )
       character(len=*)               :: tag
       character(len=*), dimension(:) :: data
       logical                        :: error



endfunc - Subroutine that is called to handle the end of a tag:

 
    subroutine endfunc( tag, error )
       character(len=*)               :: tag
       logical                        :: error



lurep - INTEGER LU-number to use for reporting errors (use 0 to write to the screen; optional)

error - LOGICAL variable that indicates if an error occurred while reading (optional).
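
A sketch of how the three user-defined subroutines could be supplied to xml_process. The module name demo_handlers, the file name and the array sizes are hypothetical. The first dimension of 2 for the attribs work array is an assumption (name and value per attribute).

```fortran
module demo_handlers
    implicit none
contains
    ! Called at the start of each tag
    subroutine startfunc( tag, attribs, error )
        character(len=*)                 :: tag
        character(len=*), dimension(:,:) :: attribs
        logical                          :: error

        write(*,*) 'Start of tag: ', trim(tag)
        error = .false.
    end subroutine startfunc

    ! Called with the character data belonging to a tag
    subroutine datafunc( tag, data, error )
        character(len=*)               :: tag
        character(len=*), dimension(:) :: data
        logical                        :: error
        integer                        :: i

        do i = 1, size(data)
            ! Skip lines that contain only blanks
            if ( len_trim(data(i)) > 0 ) then
                write(*,*) trim(tag), ': ', trim(data(i))
            endif
        enddo
        error = .false.
    end subroutine datafunc

    ! Called at the end of each tag
    subroutine endfunc( tag, error )
        character(len=*) :: tag
        logical          :: error

        write(*,*) 'End of tag: ', trim(tag)
        error = .false.
    end subroutine endfunc
end module demo_handlers

program process_demo
    use xmlparse
    use demo_handlers
    implicit none

    ! Work arrays; the sizes are arbitrary choices for this sketch
    character(len=80),  dimension(2,20) :: attribs
    character(len=200), dimension(100)  :: data
    logical                             :: error

    ! lurep = 0 sends error reports to the screen
    call xml_process( 'example.xml', attribs, data, &
                      startfunc, datafunc, endfunc, 0, error )

    if ( error ) write(*,*) 'Processing stopped because of an error'
end program process_demo
```

Setting error to .true. in any of the handlers would interrupt the reading, as described for startfunc above.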

MOTIVATION

The use of XML-files as a means to store data and more importantly to transfer data between very disparate applications and organisations has been growing these last few years. Standard implementations of libraries that deal with all features of XML or a significant part of them are available in many languages, but as far as we know there was no implementation in Fortran.

One could of course use, say, the well-known Expat library by ... and provide a Fortran interface, but this is slightly awkward as it forces one to have a compatible C compiler. More importantly, it introduces platform-dependencies, because the interfacing between Fortran and C depends strongly on the compilers used. It also introduces a way of working that is alien to Fortran programmers: Expat requires the programmer to register callback functions, to be called when some "event" occurs while reading the file (a begin tag is found, character data are found and so on).

The alternative is even more awkward: build a tree of tags and associated data and ask for these data. One of the first things a Fortran programmer will want to do with an XML-file is to get all the information out, so a stream-oriented parsing method is more appropriate.

Of the two predominant types of XML-parsing, SAX (stream-oriented parsing) and DOM (object-oriented parsing), the stream-oriented approach is more suited to the frame of mind of the average Fortran programmer. But instead of registering callbacks, this module uses the method known from, for instance, GNU's getopt() function: parse the data and return to the caller to have it process the information. The caller calls the function again and again, letting getopt() take care of the details.

This is exactly the approach taken by the xmlparse module:

 
    call xml_open(info, ... )

    do while ( xml_ok(info) )
       call xml_get(info, ... ) ! Get the first/next tag
       ... identify the tag (via xml_check_tag for instance)
       ... process the information
    enddo

    call xml_close(info)

    ... proceed with the rest of the program
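
A more complete version of this loop might look as follows. It is a sketch only: the file name and array sizes are hypothetical, and it assumes that attribs(1,i) holds the name of attribute i and attribs(2,i) its value.

```fortran
program read_tags
    use xmlparse
    implicit none

    type(XML_PARSE)                     :: info
    character(len=80)                   :: tag
    logical                             :: endtag
    character(len=80),  dimension(2,20) :: attribs
    character(len=200), dimension(100)  :: data
    integer                             :: no_attribs, no_data, i

    call xml_open( info, 'example.xml', .true. )
    if ( xml_error( info ) ) stop 'Could not open example.xml'

    ! Drop empty lines and leading blanks from the character data
    call xml_options( info, ignore_whitespace = .true. )

    do while ( xml_ok( info ) )
        ! Get the first/next tag
        call xml_get( info, tag, endtag, attribs, no_attribs, data, no_data )
        if ( xml_error( info ) ) then
            write(*,*) 'Error while reading the file'
            exit
        endif

        write(*,*) 'Tag: ', trim(tag), '  (end tag: ', endtag, ')'

        ! Assumed layout: attribs(1,i) is the name, attribs(2,i) the value
        do i = 1, no_attribs
            write(*,*) '   attribute: ', trim(attribs(1,i)), ' = ', trim(attribs(2,i))
        enddo

        do i = 1, no_data
            write(*,*) '   data: ', trim(data(i))
        enddo
    enddo

    call xml_close( info )
end program read_tags
```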

For convenience, the module does supply the routine xml_process that takes three user-defined subroutines to perform the actual processing. The file will be processed in its entirety.

PARAMETERS AND DERIVED TYPES

The module defines several parameters and derived types for use by the programmer:

XML_BUFFER_LENGTH
the length of the internal buffer, representing the maximum length of any individual line in an XML file and the maximum length for a tag including all its attributes.

XML_STDOUT
a parameter to indicate the standard output (or *) as the file to write messages to.

type(XML_PARSE)
the data structure that holds information about the XML file to be read or written. Its contents are partially accessible via functions such as xml_ok() and xml_error(). Note: do not use its contents directly, as these may change in the future.

GENERATING A READING ROUTINE

Reading an XML file and making sure the data are structured the way they are supposed to be generally requires a lot of code. This cannot be avoided: you will want to make sure everything you need is there and anything else is dealt with appropriately.

There is a way out: by automatically generating the reading routine you can reduce the amount of manual coding to a minimum. This has two advantages:

The idea is simple:

In an XML-file you define the data structure and the way this data structure should appear in an input XML file for your program. The process is probably best explained via an example.

Say you want to read addresses (a classic example). Each address consists of the name of the person, the street name, the house number and the city (let us keep it simple). Of course we have multiple addresses, so they are stored in an array. Then via the xmlreader program you can generate a reading routine that deals with this type of information.

The program takes an XML file as input and produces a Fortran 90 module that reads input files and stores the data in the designated variables. It also creates a writing routine to write the data to an XML file.

In our case, we want a derived type to hold the various pieces that form a complete address and we want an array of that type:

 
<typedef name="address_type">
   <component name="person" type="character" length="40" />
   <component name="street" type="character" length="40" />
   <component name="number" type="integer" />
   <component name="city"   type="character" length="40" />
</typedef>
<variable name="address" type="address_type" dimension="1" />

This will produce the following derived type:

 
type address_type
   character(len=40) :: person
   character(len=40) :: street
   integer           :: number
   character(len=40) :: city
end type address_type

and a variable "address":

 
type(address_type), dimension(:), pointer :: address

The reading routine will be able to read such XML files as the following:

 
<address>
   <person>John Doe</person>
   <street>Wherever street</street>
   <number>30</number>
   <city>Erewhon</city>
</address>
<address>
   ...
</address>
...

If in some address the number was forgotten, the reading routine will report this, as by default all variables and components in a derived type must be present.
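
Using the generated code from a program might then look like the sketch below. The module name xml_data_addresses, the routine name read_xml_file_addresses and the file name are hypothetical stand-ins for whatever xmlreader produces (the routine follows the read_xml_file_xxx pattern). The sketch also assumes that the array address is a public variable of the generated module.

```fortran
program read_addresses
    use xml_data_addresses     ! hypothetical name of the generated module
    implicit none

    logical :: error
    integer :: i

    ! Hypothetical generated routine; lurep = 0 reports to the screen
    call read_xml_file_addresses( 'addresses.xml', 0, error )

    if ( error ) then
        write(*,*) 'Errors were found while reading addresses.xml'
    else
        ! "address" is assumed to be made available by the generated module
        do i = 1, size(address)
            write(*,*) trim(address(i)%person), ', ', &
                       trim(address(i)%street), address(i)%number, ', ', &
                       trim(address(i)%city)
        enddo
    endif
end program read_addresses
```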

Here is a more detailed description of the XML files accepted by the xmlreader program:

Future versions may also include options for:

EXAMPLES

The directory "examples" contains some example programs.

LIMITATIONS

Basic limitations:

RELEASE NOTES

This document belongs to version 1.00 of the module.

History:

version 0.1: Proof of concept, August 2003

A very preliminary version meant to show that it is indeed possible to read and write XML files using Fortran only. It was published on the comp.lang.fortran newsgroup and generated enough interest to encourage further development.

version 0.2: First public release, August 2003

After some additional testing with practical XML-files, a number of bugs were found and solved, and several enhancements were made:

version 0.3: Improvements, September 2003

version 0.4: Corrected xml_put(), October 2003

version 0.9: Added new approach, October 2005

version 0.94: Gradually expanding the capabilities, June 2006

version 0.97: Added the following capabilities to the xmlreader program since 0.94, June 2007

version 1.00: Added the following capabilities to the xmlreader program since 0.97, April 2008

The project now also contains a first version of a program to convert an XSD file to a file accepted by the xmlreader program. This is called "xsdconvert".

TO DO

The following items remain on the "to do" list:

KEYWORDS

Fortran, XML, parsing