MeeGo 1.2 Harmattan Developer Documentation Develop for the Nokia N9

XQuery

Introduction

XQuery is a language for traversing XML documents to select and aggregate items of interest and to transform them for output as XML or some other format. XPath is the element selection part of XQuery.

The QtXmlPatterns module supports using XQuery 1.0 and XPath 2.0 in Qt applications, for querying XML data and for querying non-XML data that can be modeled to look like XML. Readers who are not familiar with the XQuery/XPath language can read A Short Path to XQuery for a brief introduction.

Advantages of using QtXmlPatterns and XQuery

The XQuery/XPath language simplifies data searching and transformation tasks by eliminating the need for doing a lot of C++ or Java procedural programming for each new query task. Here is an XQuery that constructs a bibliography of the contents of a library:

 <bibliography>
 {doc("library.xml")/bib/book[publisher="Addison-Wesley" and @year>1991]/<book year="{@year}">{title}</book>}
 </bibliography>

First, the query opens a <bibliography> element in the output. The embedded path expression then loads the XML document describing the contents of the library (library.xml) and begins the search. For each <book> element it finds, where the publisher was Addison-Wesley and the publication year was after 1991, it creates a new <book> element in the output as a child of the open <bibliography> element. Each new <book> element gets the book's title as its contents and the book's publication year as an attribute. Finally, the <bibliography> element is closed.
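To make the steps of the query concrete, the following is a rough procedural counterpart in plain C++. It is only a sketch: the Book struct, the buildBibliography() function and the sample data are hypothetical stand-ins for library.xml, and no XML escaping is performed. Note that {title} in the query copies the whole <title> element into the output, which the sketch mirrors.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one <book> element of library.xml.
struct Book {
    std::string title;
    std::string publisher;
    int year;
};

// Procedural counterpart of the bibliography query: keep the books published
// by Addison-Wesley after 1991 and wrap each one in a new <book> element.
// Simplification: titles are written as-is, without XML escaping.
std::string buildBibliography(const std::vector<Book> &library)
{
    std::ostringstream out;
    out << "<bibliography>";
    for (const Book &book : library) {
        if (book.publisher == "Addison-Wesley" && book.year > 1991) {
            out << "<book year=\"" << book.year << "\">"
                << "<title>" << book.title << "</title>"
                << "</book>";
        }
    }
    out << "</bibliography>";
    return out.str();
}

// Usage check with three invented books; only the first passes both tests.
void checkBuildBibliography()
{
    const std::vector<Book> library = {
        {"First Sample", "Addison-Wesley", 1994},
        {"Second Sample", "Another Publisher", 2000},
        {"Third Sample", "Addison-Wesley", 1990},
    };
    assert(buildBibliography(library) ==
           "<bibliography><book year=\"1994\">"
           "<title>First Sample</title></book></bibliography>");
}
```

Every new selection rule or output shape would mean editing and recompiling code like this, whereas with XQuery only the query text changes.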

The advantages of using QtXmlPatterns and XQuery in your Qt programs are summarized as follows:

  • Ease of development: All the C++ programming required to perform data query tasks can be replaced by a simple XQuery like the example above.
  • Comprehensive functionality: The expression syntax and rich set of functions and operators provided by XQuery are sufficient for performing any data searching, selecting, and sorting tasks.
  • Conformance to standards: Conformance to all applicable XML and XQuery standards ensures that QtXmlPatterns can always process XML documents generated by other conformant applications, and that XML documents created with QtXmlPatterns can be processed by other conformant applications.
  • Maximal flexibility: The QtXmlPatterns module can be used to query XML data and non-XML data that can be modeled to look like XML.

Using the QtXmlPatterns module

There are two ways QtXmlPatterns can be used to evaluate queries. You can run the query engine in your Qt application using the QtXmlPatterns C++ API, or you can run the query engine from the command line using Qt's xmlpatterns command line utility.

Running the query engine from your Qt application

If we save the example XQuery shown above in a text file (e.g. myquery.xq), we can run it from a Qt application using a standard QtXmlPatterns code sequence:

     QFile xq("myquery.xq");
     xq.open(QIODevice::ReadOnly);  // setQuery() reads from an opened device

     QXmlQuery query;
     query.setQuery(&xq, QUrl::fromLocalFile(xq.fileName()));

     // myOutputDevice is a QIODevice* that is already open for writing
     QXmlSerializer serializer(query, myOutputDevice);
     query.evaluateTo(&serializer);

First construct a QFile for the text file containing the XQuery (myquery.xq) and open it for reading. Then create an instance of QXmlQuery and call its setQuery() function to load and parse the XQuery file. Then create an XML serializer to output the query's result set as unformatted XML. Finally, call the evaluateTo() function to evaluate the query and serialize the results as XML.

Note: If you compile Qt yourself, the QtXmlPatterns module will not be built if exceptions are disabled, or if you compile Qt with a compiler that doesn't support member templates, e.g., MSVC 6.

See the QXmlQuery documentation for more information about the QtXmlPatterns C++ API.

Running the query engine from the command line utility

xmlpatterns is a command line utility for running XQueries. It expects the name of a file containing the XQuery text.

 xmlpatterns myQuery.xq

The XQuery in myQuery.xq will be evaluated and its output written to stdout. Pass the -help switch to get the list of input flags and their meanings.

xmlpatterns can be used in scripting. However, the descriptions and messages it outputs were not meant to be parsed and may be changed in future releases of Qt.

The XQuery Data Model

XQuery represents data items as atomic values or nodes. An atomic value is a value in the domain of one of the built-in datatypes defined in Part 2 of the W3C XML Schema. A node is normally an XML element or attribute, but when non-XML data is modeled to look like XML, a node can also represent a non-XML data item.

When you run an XQuery using the C++ API in a Qt application, you will often want to bind program variables to $variables in the XQuery. After the query is evaluated, you will want to interpret the sequence of data items in the result set.

Binding program variables to XQuery variables

When you want to run a parameterized XQuery from your Qt application, you will need to bind variables in your program to $name variables in your XQuery.

Suppose you want to parameterize the bibliography XQuery in the example above. You could define variables for the catalog that contains the library ($file), the publisher name ($publisher), and the year of publication ($year):

 <bibliography>
 {
     doc($file)/bib/book[publisher = $publisher and @year > $year]/<book year="{@year}">{title}</book>
 }
 </bibliography>

Modify the QtXmlPatterns code to use one of the bindVariable() functions to bind a program variable to each XQuery $variable:

     QFile xq("myquery.xq");
     xq.open(QIODevice::ReadOnly);  // setQuery() reads from an opened device
     QString fileName("the filename");
     QString publisherName("the publisher");
     qlonglong year = 1234;

     QXmlQuery query;

     query.bindVariable("file", QVariant(fileName));
     query.bindVariable("publisher", QVariant(publisherName));
     query.bindVariable("year", QVariant(year));

     query.setQuery(&xq, QUrl::fromLocalFile(xq.fileName()));

     QXmlSerializer serializer(query, myOutputDevice);
     query.evaluateTo(&serializer);

Each program variable is passed to QtXmlPatterns as a QVariant of the type of the C++ variable or constant from which it is constructed. Note that QtXmlPatterns assumes that the type of the QVariant in the bindVariable() call is the correct type, so the $variable it is bound to must be used in the XQuery accordingly. The following table shows how QVariant types are mapped to XQuery $variable types:

QVariant type           XQuery $variable type
QVariant::LongLong      xs:integer
QVariant::Int           xs:integer
QVariant::UInt          xs:nonNegativeInteger
QVariant::ULongLong     xs:unsignedLong
QVariant::String        xs:string
QVariant::Double        xs:double
QVariant::Bool          xs:boolean
QVariant::Double        xs:decimal
QVariant::ByteArray     xs:base64Binary
QVariant::StringList    xs:string*
QVariant::Url           xs:string
QVariant::Date          xs:date
QVariant::DateTime      xs:dateTime
QVariant::Time          xs:time (see Binding To QVariant::Time below)
QVariantList            (see Binding To QVariantList below)

A type not shown in the table is not supported and will cause undefined XQuery behavior or a $variable binding error, depending on the context in the XQuery where the variable is used.

Binding To QVariant::Time

Because the instance of QTime used in QVariant::Time does not include a zone offset, an instance of QVariant::Time should not be bound to an XQuery variable of type xs:time, unless the QTime is UTC. When binding a non-UTC QTime to an XQuery variable, it should first be passed as a string, or converted to a QDateTime with an arbitrary date, and then bound to an XQuery variable of type xs:dateTime.

Binding To QVariantList

A QVariantList can be bound to an XQuery $variable. All the QVariants in the list must be of the same atomic type, and the $variable the variant list is bound to must be of that same atomic type. If the QVariants in the list are not all of the same atomic type, the XQuery behavior is undefined.

Interpreting XQuery results

When the results of an XQuery are returned in a sequence of result items, atomic values in the sequence are treated as instances of QVariant. Suppose that instead of serializing the results of the XQuery as XML, we process the results programmatically. Modify the standard QtXmlPatterns code sequence to call the overload of QXmlQuery::evaluateTo() that populates a sequence of result items with the XQuery results:

     QFile xq("myquery.xq");
     xq.open(QIODevice::ReadOnly);  // setQuery() reads from an opened device
     QString fileName("the filename");
     QString publisherName("the publisher");
     qlonglong year = 1234;

     QXmlQuery query;

     query.bindVariable("file", QVariant(fileName));
     query.bindVariable("publisher", QVariant(publisherName));
     query.bindVariable("year", QVariant(year));

     query.setQuery(&xq, QUrl::fromLocalFile(xq.fileName()));

     QXmlResultItems result;
     query.evaluateTo(&result);
     QXmlItem item(result.next());
     while (!item.isNull()) {
         if (item.isAtomicValue()) {
             QVariant v = item.toAtomicValue();
             switch (v.type()) {
                 case QVariant::LongLong:
                     // xs:integer
                     break;
                 case QVariant::String:
                     // xs:string
                     break;
                 default:
                     // error
                     break;
             }
         }
         else if (item.isNode()) {
             QXmlNodeModelIndex i = item.toNodeModelIndex();
             // process node
         }
         item = result.next();
     }

Iterate through the result items and test each QXmlItem to see if it is an atomic value or a node. If it is an atomic value, convert it to a QVariant with toAtomicValue() and switch on its variant type to handle all the atomic values your XQuery might return. The following table shows the QVariant type to expect for each atomic value type (or QXmlName):

XQuery result item type   QVariant type returned
xs:QName                  QXmlName (see Handling QXmlNames below)
xs:integer                QVariant::LongLong
xs:string                 QVariant::String
xs:string*                QVariant::StringList
xs:double                 QVariant::Double
xs:float                  QVariant::Double
xs:boolean                QVariant::Bool
xs:decimal                QVariant::Double
xs:hexBinary              QVariant::ByteArray
xs:base64Binary           QVariant::ByteArray
xs:gYear                  QVariant::DateTime
xs:gYearMonth             QVariant::DateTime
xs:gMonthDay              QVariant::DateTime
xs:gDay                   QVariant::DateTime
xs:gMonth                 QVariant::DateTime
xs:anyURI                 QVariant::Url
xs:untypedAtomic          QVariant::String
xs:ENTITY                 QVariant::String
xs:date                   QVariant::DateTime
xs:dateTime               QVariant::DateTime
xs:time                   (see No mapping for xs:time below)

Handling QXmlNames

If your XQuery can return atomic value items of type xs:QName, they will appear in your QXmlResultItems as instances of QXmlName. Since the QVariant class does not support the QXmlName class directly, extracting them from QXmlResultItems requires a bit of sleight of hand using the Qt metatype system. We must modify our example to use a couple of template functions, a friend of QMetaType (qMetaTypeId<T>()) and a friend of QVariant (qVariantValue<T>()):

     QFile xq("myquery.xq");
     xq.open(QIODevice::ReadOnly);  // setQuery() reads from an opened device

     QXmlQuery query;
     query.setQuery(&xq, QUrl::fromLocalFile(xq.fileName()));

     QXmlResultItems result;
     query.evaluateTo(&result);
     QXmlItem item(result.next());
     while (!item.isNull()) {
         if (item.isAtomicValue()) {
             QVariant v = item.toAtomicValue();
             switch (v.type()) {
                 case QVariant::LongLong:
                     // xs:integer
                     break;
                 case QVariant::String:
                     // xs:string
                     break;
                 default:
                     if (v.userType() == qMetaTypeId<QXmlName>()) {
                         QXmlName n = qVariantValue<QXmlName>(v);
                         // process QXmlName n...
                     }
                     else {
                         // error
                     }
                     break;
             }
         }
         else if (item.isNode()) {
             QXmlNodeModelIndex i = item.toNodeModelIndex();
             // process node
         }
         item = result.next();
     }

To access the strings in a QXmlName returned by an XQuery evaluation, the QXmlName must be accessed with the name pool from the instance of QXmlQuery that was used for the evaluation.

No mapping for xs:time

An instance of xs:time can't be represented correctly as an instance of QVariant::Time, unless the xs:time is a UTC time. This is because xs:time has a zone offset (0 for UTC) in addition to the time value, which the QTime in QVariant::Time does not have. This means that if an XQuery tries to return an atomic value of type xs:time, an invalid QVariant will be returned. A query can return an atomic value of type xs:time by either converting it to an xs:dateTime with an arbitrary date, or to an xs:string.

Using XQuery with Non-XML Data

Although the XQuery language was designed for querying XML, with QtXmlPatterns one can use XQuery for querying any data that can be modeled to look like XML. Non-XML data is modeled to look like XML by loading it into a custom subclass of QAbstractXmlNodeModel, where it is then presented to the QtXmlPatterns XQuery engine via the same API the XQuery engine uses for querying XML.

When QtXmlPatterns loads and queries XML files and produces XML output, it can always load the XML data into its default XML node model, where it can be traversed efficiently. The XQuery below traverses the product orders found in the XML file myOrders.xml to find all the skin care product orders and output them ordered by shipping date.

 <result>
     <para>The following skin care products have shipped, ordered by shipping date (newest first):</para>
     {
         for $i in doc("myOrders.xml")/orders/order[@product = "Acme Skin Care"]
         order by xs:date($i/@shippingDate) descending
         return $i
     }
 </result>

QtXmlPatterns can be used out of the box to perform this query, provided myOrders.xml actually contains well-formed XML. It can be loaded directly into the default XML node model and traversed. But suppose we want QtXmlPatterns to perform queries on the hierarchical structure of the local file system. The default XML node model in QtXmlPatterns is not suitable for navigating the file system, because there is no XML file to load that contains a description of it. Such an XML file, if it existed, might look something like this:

 <?xml version="1.0" encoding="UTF-8"?>
 <directory name="home">

     <file name="myNote.txt" mimetype="text/plain" size="8" extension="txt" uri="file:///home/frans/myNote.txt">
         <content asBase64Binary="TXkgTm90ZSE=" asStringFromUTF-8="My Note!"/>
     </file>

     <directory name="src">
         ...
     </directory>

     ...

 </directory>

The File System Example presents the file system to the query engine in exactly this form.

There is no such file to load into the default XML node model, but one can write a subclass of QAbstractXmlNodeModel to represent the file system. This custom XML node model, once populated with all the directory and file descriptors obtained directly from the system, presents the complete file system hierarchy to the query engine via the same API used by the default XML node model to present the contents of an XML file. In other words, once the custom XML node model is populated, it presents the file system to the query engine as if a description of it had been loaded into the default XML node model from an XML file like the one shown above.
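In the sample description above, the <content> element carries the same bytes twice: once as Base64 (asBase64Binary) and once as text (asStringFromUTF-8). The following plain C++ sketch decodes the Base64 value to show that the two attributes and the size attribute agree. The decodeBase64() helper is hypothetical and written only for illustration; a Qt application would more likely call QByteArray::fromBase64().

```cpp
#include <cassert>
#include <string>

// Decodes standard Base64 text. Decoding stops at the first '=' padding
// character, and characters outside the Base64 alphabet are skipped.
std::string decodeBase64(const std::string &encoded)
{
    const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buffer = 0;  // holds the most recently read 6-bit groups
    int bits = 0;             // number of not yet emitted bits in buffer
    for (char c : encoded) {
        if (c == '=')
            break;
        const std::size_t value = alphabet.find(c);
        if (value == std::string::npos)
            continue;
        buffer = (buffer << 6) | static_cast<unsigned int>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
        }
    }
    return out;
}

// The attribute values below are copied from the sample <file> element.
void checkSampleContent()
{
    const std::string decoded = decodeBase64("TXkgTm90ZSE=");
    assert(decoded == "My Note!");  // matches asStringFromUTF-8
    assert(decoded.size() == 8);    // matches size="8"
}
```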

Now we can write an XQuery to find all the XML files and parse them to find the ones that don't contain well-formed XML.

 <html>
     <body>
         {
             $myRoot//file[@mimetype = 'text/xml' or @mimetype = 'application/xml']
             /
             (if(doc-available(@uri))
              then ()
              else <p>Failed to parse file {@uri}.</p>)
         }
     </body>
 </html>

Without QtXmlPatterns, there is no simple way to solve this kind of problem. You might do it by writing a C++ program to traverse the file system, sniff out all the XML files, and submit each one to an XML parser to test that it contains well-formed XML. The C++ code required to write that program will probably be more complex than the C++ code required to subclass QAbstractXmlNodeModel, but even if the two are comparable, your custom C++ program can be used only for that one task, while your custom XML node model can be used by any XQuery that must navigate the file system.

The general approach to using XQuery to perform queries on non-XML data has been a three-step process. In the first step, the data is loaded into a non-XML data model. In the second step, the non-XML data model is serialized as XML and output to XML (text) files. In the final step, an XML tool loads the XML files into a second, XML data model, where the XQueries can be performed. The development cost of implementing this process is often high, and the three-step system that results is inefficient because the two data models must be built and maintained separately.

With QtXmlPatterns, subclassing QAbstractXmlNodeModel eliminates the transformation required to convert the non-XML data model to the XML data model, because there is only ever one data model required. The non-XML data model presents the non-XML data to the query engine via the XML data model API. Also, since the query engine uses the API to access the QAbstractXmlNodeModel, the data model subclass can construct the elements, attributes and other data on demand, responding to the query's specific requests. This can greatly improve efficiency, because it means the entire model might not have to be built. For example, in the file system model above, it is not necessary to build an XML representation of the whole file system. Instead, nodes are created on demand, and the nodes a query actually visits are likely to be only a small subset of the file system.
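The idea of building nodes on demand can be sketched in plain C++, independently of Qt. This is not the QAbstractXmlNodeModel API: the LazyNode class, its callback and the in-memory directory listing are all hypothetical, and the counter exists only to show how many nodes were actually built.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A tree node whose children are produced only when first requested.
class LazyNode {
public:
    using ChildLister =
        std::function<std::vector<std::string>(const std::string &path)>;

    LazyNode(std::string path, ChildLister lister, int *builtCounter)
        : m_path(std::move(path)), m_lister(std::move(lister)),
          m_built(builtCounter), m_loaded(false)
    {
        ++*m_built;  // record that one more node now exists
    }

    const std::string &path() const { return m_path; }

    // Builds the child nodes on first access and caches them afterwards.
    const std::vector<std::unique_ptr<LazyNode>> &children()
    {
        if (!m_loaded) {
            for (const std::string &name : m_lister(m_path)) {
                m_children.push_back(std::unique_ptr<LazyNode>(
                    new LazyNode(m_path + "/" + name, m_lister, m_built)));
            }
            m_loaded = true;
        }
        return m_children;
    }

private:
    std::string m_path;
    ChildLister m_lister;
    int *m_built;
    bool m_loaded;
    std::vector<std::unique_ptr<LazyNode>> m_children;
};

void demonstrateLazyLoading()
{
    // Invented directory listing standing in for real file system calls.
    const std::map<std::string, std::vector<std::string>> listing = {
        {"/home", {"frans", "guest"}},
        {"/home/frans", {"myNote.txt", "src"}},
        {"/home/guest", {"a.txt", "b.txt", "c.txt"}},
    };
    int built = 0;
    LazyNode root("/home",
                  [&listing](const std::string &path) -> std::vector<std::string> {
                      auto it = listing.find(path);
                      return it == listing.end() ? std::vector<std::string>()
                                                 : it->second;
                  },
                  &built);

    assert(built == 1);                     // only the root exists so far
    LazyNode &frans = *root.children()[0];  // builds "frans" and "guest"
    assert(built == 3);
    assert(frans.children().size() == 2);   // builds "myNote.txt" and "src"
    assert(built == 5);                     // guest's three files were never built
}
```

A query that only descends into one directory pays for that directory alone; a custom node model can apply the same caching idea behind its navigation functions.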

Examples of other places where XQuery could be used in QtXmlPatterns to query non-XML data:

  • The internal representation for word processor documents
  • The set of dependencies for a software build system
  • The hierarchy (or graph) that links a set of HTML documents from a web crawler
  • The images and meta-data in an image collection
  • The set of D-Bus interfaces available in a system
  • A QObject hierarchy, as seen in the QObject XML Model example.

See the QAbstractXmlNodeModel documentation for information about how to implement custom XML node models.

More on using QtXmlPatterns with non-XML Data

Subclassing QAbstractXmlNodeModel to let the query engine access non-XML data by the same API it uses for XML is the feature that enables QtXmlPatterns to query non-XML data with XQuery. It allows XQuery to be used as a mapping layer between different non-XML node models or between a non-XML node model and the built-in XML node model. Once the subclass(es) of QAbstractXmlNodeModel have been written, XQuery can be used to select a set of elements from one node model, transform the selected elements, and then write them out, either as XML using QXmlQuery::evaluateTo() and QXmlSerializer, or as some other format using a subclass of QAbstractXmlReceiver.

Consider a word processor application that must import and export data in several different formats. Rather than writing a lot of C++ code to convert each input format to an intermediate form, and more C++ code to convert the intermediate form back to each output format, one can implement a solution based on QtXmlPatterns that uses simple XQueries to transform each XML or non-XML format (e.g. MathFormula.xml) to the intermediate form (e.g. a DocumentRepresentation node model class), and more simple XQueries to transform the intermediate form back to each XML or non-XML format.

An input format such as CSV is not XML, so a subclass of QAbstractXmlNodeModel is used to present the CSV data to the XQuery engine as if it were XML. Also required are the subclasses of QAbstractXmlReceiver that send the selected elements into the DocumentRepresentation node model, and the subclasses of QAbstractXmlNodeModel that ultimately write the output files in each format.

Security Considerations

Code Injection

XQuery is vulnerable to code injection attacks in the same way as the SQL language. If an XQuery is constructed by concatenating strings, and the strings come from user input, the constructed XQuery could be malevolent. The best way to prevent code injection attacks is to not construct XQueries from user-written strings, but only accept user data input using QVariant and variable bindings. See QXmlQuery::bindVariable().

The articles Avoid the dangers of XPath injection, by Robi Sen and Blind XPath Injection, by Amit Klein, discuss the XQuery code injection problem in more detail.
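The difference between the two approaches can be seen without running a query at all. The sketch below uses plain std::string; the function names and the hostile input are invented for illustration, and the actual binding step (QXmlQuery::bindVariable()) appears only in a comment.

```cpp
#include <cassert>
#include <string>

// Unsafe: splices user input straight into the XQuery source text.
std::string buildQueryByConcatenation(const std::string &publisher)
{
    return "doc(\"library.xml\")/bib/book[publisher = \"" + publisher + "\"]";
}

// Safer shape: the query text is constant and refers to $publisher, which
// the application supplies separately, e.g. with
// QXmlQuery::bindVariable("publisher", QVariant(userInput)).
std::string buildQueryWithVariable()
{
    return "doc(\"library.xml\")/bib/book[publisher = $publisher]";
}

void demonstrateInjection()
{
    // Invented hostile input that closes the string literal early.
    const std::string hostile = "x\" or \"1\" = \"1";

    // The predicate now contains an extra 'or' clause that is always true,
    // so every book matches regardless of its publisher.
    assert(buildQueryByConcatenation(hostile) ==
           "doc(\"library.xml\")/bib/book[publisher = \"x\" or \"1\" = \"1\"]");

    // With a bound variable the query text never depends on the input.
    assert(buildQueryWithVariable().find(hostile) == std::string::npos);
}
```

Because a bound value reaches the engine as data rather than as query text, it cannot change the structure of the query.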

Denial of Service Attacks

Applications using QtXmlPatterns are subject to the same resource limits as any other software, and in general a query cannot be checked in advance for excessive resource use. This means QtXmlPatterns does not prevent rogue queries from consuming too many resources. For example, a query could take too much time to execute or try to transfer too much data. A query could also do too much recursion, which could crash the system. XQueries can do these things accidentally, but they can also be done as deliberate denial of service attacks.

Features and Conformance

XQuery 1.0

QtXmlPatterns aims at being a conformant XQuery processor. It adheres to Minimal Conformance and supports the Serialization Feature and the Full Axis Feature. QtXmlPatterns currently passes 97% of the tests in the XML Query Test Suite. Areas where conformance may be questionable and where behavior may be changed in future releases include:

  • Some corner cases involving namespaces and element constructors are incorrect.
  • XPath is a subset of XQuery and the implementation of QtXmlPatterns uses XPath 2.0 with XQuery 1.0.

The specification discusses conformance further: XQuery 1.0: An XML Query Language. W3C's XQuery testing effort, the XML Query Test Suite, may be of interest as well.

Currently fn:collection() does not access any data set, and there is no API for providing data through the collection. As a result, evaluating fn:collection() returns the empty sequence. We intend to provide functionality for this in a future release of Qt.

Only queries encoded in UTF-8 are supported.

XSLT 2.0

Partial support for XSLT was introduced in Qt 4.5. Future releases of QtXmlPatterns will aim to support these XSLT features:

  • Basic XSLT 2.0 processor
  • Serialization feature
  • Backwards Compatibility feature

For details, see XSL Transformations (XSLT) Version 2.0, 21 Conformance.

Note: In this release, XSLT support is considered experimental.

Unsupported or partially supported XSLT features are documented in the following table. The implementation of XSLT in Qt 4.5 can be seen as XSLT 1.0 but with the data model of XPath 2.0 and XSLT 2.0, and using the functionality of XPath 2.0 and its accompanying function library. When QtXmlPatterns encounters an unsupported or partially supported feature, it will either report a syntax error or silently continue, unless otherwise noted in the table.

The implementation currently passes 42% of W3C's XSLT test suite, which focuses on features introduced in XSLT 2.0.

XSL Feature               Support Status
xsl:key and fn:key()      not supported
xsl:include               not supported
xsl:import                not supported
xsl:copy                  The copy-namespaces and inherit-namespaces attributes have no effect. For copied comments, attributes and processing instructions, the copy has the same node identity as the original.
xsl:copy-of               The copy-namespaces attribute has no effect.
fn:format-number()        not supported
xsl:message               not supported
xsl:use-when              not supported
Tunnel Parameters         not supported
xsl:attribute-set         not supported
xsl:decimal-format        not supported
xsl:fallback              not supported
xsl:apply-imports         not supported
xsl:character-map         not supported
xsl:number                not supported
xsl:namespace-alias       not supported
xsl:output                not supported
xsl:output-character      not supported
xsl:preserve-space        not supported
xsl:result-document       not supported
Patterns                  Complex patterns or patterns with predicates have issues.
2.0 Compatibility Mode    Stylesheets are interpreted as XSLT 2.0 stylesheets, even if the version attribute in the XSLT source is 1.0. In other words, the version attribute is ignored.
Grouping                  fn:current-group(), fn:grouping-key() and xsl:for-each-group.
Regexp elements           xsl:analyze-string, xsl:matching-substring, xsl:non-matching-substring, and fn:regex-group()
Date & Time formatting    fn:format-dateTime(), fn:format-date() and fn:format-time().
XPath Conformance         Since XPath is a subset of XSLT, its issues are in effect too.

The QtXmlPatterns implementation of the XPath Data Model does not include entities (due to QXmlStreamReader not reporting them). This means that functions unparsed-entity-uri() and unparsed-entity-public-id() always return negatively.

XPath 2.0

Since XPath 2.0 is a subset of XQuery 1.0, XPath 2.0 is supported. Areas where conformance may be questionable and, consequently, where behavior may be changed in future releases include:

  • Regular expression support is currently not conformant but follows Qt's QRegExp standard syntax.
  • Operators for xs:time, xs:date, and xs:dateTime are incomplete.
  • Formatting of very large or very small xs:double, xs:float, and xs:decimal values may be incorrect.

xml:id

Processing of XML files supports xml:id. This allows elements that have an attribute named xml:id to be looked up efficiently with the fn:id() function. See xml:id Version 1.0 for details.

XML Schema 1.0

There are two ways QtXmlPatterns can be used to validate schemas: You can use the C++ API in your Qt application using the classes QXmlSchema and QXmlSchemaValidator, or you can use the command line utility named xmlpatternsvalidator (located in the "bin" directory of your Qt build).

The QtXmlPatterns implementation of XML Schema validation supports most of version 1.0 of the schema specification. Known problems of the implementation and areas where conformance may be questionable are:

  • Large minOccurs or maxOccurs values, or deeply nested ones, require a huge amount of memory, which might cause the system to freeze. Such a schema should be rewritten to use unbounded as the value instead of large numbers. This restriction will hopefully be fixed in a later release.
  • Comparison of really small or large floating point values might lead to wrong results in some cases. However such numbers should not be relevant for day-to-day usage.
  • Regular expression support is currently not conformant but follows Qt's QRegExp standard syntax.
  • Identity constraint checks can not use the values of default or fixed attribute definitions.

Resource Loading

When QtXmlPatterns loads an XML resource, e.g., using the fn:doc() function, the following schemes are supported:

Scheme Name   Description
file          Local files.
data          The bytes are encoded in the URI itself, e.g., data:application/xml,%3Ce%2F%3E is <e/>.
ftp           Resources retrieved via FTP.
http          Resources retrieved via HTTP.
https         Resources retrieved via HTTPS. This will succeed if no SSL errors are encountered.
qrc           Qt Resource files. Expressing it as an empty scheme, :/..., is not supported.
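The data scheme example in the table can be checked by hand: %3C, %2F and %3E are the percent-encoded forms of <, / and >. The following plain C++ sketch performs that decoding. The helper names are hypothetical, only the non-Base64 form of data URIs is handled, and this is not how QtXmlPatterns itself is implemented.

```cpp
#include <cassert>
#include <string>

// Value of a single hexadecimal digit, or -1 if c is not one.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes; malformed escapes are copied through unchanged.
std::string percentDecode(const std::string &text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '%' && i + 2 < text.size()) {
            const int hi = hexValue(text[i + 1]);
            const int lo = hexValue(text[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += text[i];
    }
    return out;
}

// Returns the percent-decoded text after the first comma of a data: URI,
// or an empty string if the input is not a data: URI or has no comma.
std::string dataUriPayload(const std::string &uri)
{
    const std::string scheme = "data:";
    if (uri.compare(0, scheme.size(), scheme) != 0)
        return std::string();
    const std::size_t comma = uri.find(',');
    if (comma == std::string::npos)
        return std::string();
    return percentDecode(uri.substr(comma + 1));
}

void checkDataUriExample()
{
    // The URI is the one given in the table above.
    assert(dataUriPayload("data:application/xml,%3Ce%2F%3E") == "<e/>");
}
```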

XML

XML 1.0 and XML Namespaces 1.0 are supported, as opposed to the 1.1 versions. When a string is passed to a query as a QString, the characters must be XML 1.0 characters. Otherwise, the behavior is undefined. This is not checked.

URIs are first passed to QAbstractUriResolver. Check QXmlQuery::setUriResolver() for possible rewrites.