A library building live models on top of QtSparql.
To be able to use QtSparqlTrackerLive successfully, the user must know about:
QtSparqlTrackerLive works as follows:
The original query determines the initial data that is inserted into the model. It must also meet certain requirements that enable TrackerLiveQuery to merge data from the update queries into the model.
In SPARQL, each resource is identified by its URI. Internally, in the Tracker database, each resource has an integer id called tracker:id. The GraphUpdated signal contains these tracker:ids as data, which is why the live queries operate on tracker:ids rather than URIs. URIs can also be present in the original query, but they are not used by TrackerLiveQuery.
Example 1 (e-mails and subjects): We have a model where column 0 contains tracker:ids of e-mails and column 1 contains the message subject of the corresponding e-mail. Each e-mail can have only one subject. E-mails without a subject are not included in the model. The model is ordered by the subjects of the e-mails.
The original query is:
SELECT tracker:id(?e) ?s { ?e a nmo:Email ; nmo:messageSubject ?s . } ORDER BY ?s
Identity columns define the identity for a row in the model.
Example 1 (e-mails and subjects): Column 0, which contains the tracker:ids of the e-mails, is the only identity column in the model. If the subject of an e-mail changes, the e-mail is still the same (its identity has not changed). But two e-mails with different tracker:ids are different, even if they have the same subject.
The model will never contain several rows with the same identity.
The set of identity columns is used to determine whether a row in the update query result corresponds to an existing row (in which case the existing row should be updated) or not (in which case the new row should be added).
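The identity check described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the Row alias and the sameIdentity and findByIdentity helpers are hypothetical names, not part of the QtSparqlTrackerLive API, and every cell is stored as a string for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type: one string per column. Not the library's own type.
using Row = std::vector<std::string>;

// True if both rows hold equal values in every identity column.
bool sameIdentity(const Row& a, const Row& b,
                  const std::vector<std::size_t>& identityColumns)
{
    for (std::size_t column : identityColumns) {
        assert(column < a.size() && column < b.size());
        if (a[column] != b[column])
            return false;
    }
    return true;
}

// Returns the index of the model row that has the same identity as
// `candidate`, or -1 if the candidate would be a new row.
int findByIdentity(const std::vector<Row>& model, const Row& candidate,
                   const std::vector<std::size_t>& identityColumns)
{
    for (std::size_t i = 0; i < model.size(); ++i) {
        if (sameIdentity(model[i], candidate, identityColumns))
            return static_cast<int>(i);
    }
    return -1;
}
```

A result of -1 corresponds to the "add the new row" case; any other result corresponds to the "update the existing row" case.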
The set of collation columns determines the ordering between the rows in the model. The columns are used:
The user can define the list of collation columns. For each column, the user can define the type of the sorting (e.g., integer sorting, string sorting) and whether the sorting is ascending or descending.
Example 1 (e-mails and subjects): The collation column is column 1 which contains the subjects of the e-mails. The type of the sorting is ascending string sorting.
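A row comparison driven by a list of collation columns could look roughly like the sketch below. The SortType enum, the CollationColumn struct and the rowLess function are hypothetical stand-ins written for this illustration (the real class is TrackerLiveQuery::CollationColumn, whose internals are not shown in this document), and strings are compared byte-wise here, which may differ from the collation the library actually applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type: one string per column.
using Row = std::vector<std::string>;

enum class SortType { Integer, String };

// Hypothetical description of one collation column.
struct CollationColumn {
    std::size_t column;  // index of the column to sort by
    SortType type;       // how the cell values are compared
    bool ascending;      // false means descending order
};

// Three-way comparison of two cells: negative, zero or positive.
static int compareCells(const std::string& a, const std::string& b, SortType type)
{
    if (type == SortType::Integer) {
        const long long x = std::stoll(a);
        const long long y = std::stoll(b);
        return (x < y) ? -1 : (x > y) ? 1 : 0;
    }
    const int c = a.compare(b);  // byte-wise comparison (an assumption)
    return (c < 0) ? -1 : (c > 0) ? 1 : 0;
}

// True if row `a` must be placed before row `b` in the model.
bool rowLess(const Row& a, const Row& b, const std::vector<CollationColumn>& columns)
{
    for (const CollationColumn& c : columns) {
        assert(c.column < a.size() && c.column < b.size());
        int cmp = compareCells(a[c.column], b[c.column], c.type);
        if (!c.ascending)
            cmp = -cmp;
        if (cmp != 0)
            return cmp < 0;
    }
    return false;  // equal on all collation columns
}
```

For Example 1, the list would contain a single entry: column 1, string sorting, ascending.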
The user can specify that the TrackerLiveQuery should react to the changes in a specified ontology class and a subset of its properties (or all properties).
Note: To listen to the changes in a specific ontology class, the class needs to have the property tracker:notify set to true.
Example 1 (e-mails and subjects): The TrackerLiveQuery needs to monitor the changes in the nmo:Email class. The interesting properties are nmo:messageSubject and rdf:type. rdf:type is the property used for specifying that a resource belongs to an ontology class. To identify added and removed e-mails, we need to monitor this property. The changes in other properties can be ignored, since they don't affect the content of the model.
When the data in Tracker changes, TrackerLiveQuery executes an update query. Normally the update query is the original query with additional filters. The filters limit the query to resources which have actually changed.
The update query needs to contain the same columns as the original query. The update query must follow the same sorting as the original query.
Example 1 (e-mails and subjects): Let's assume we have the following results for the original query:
tracker:id(?e) | messageSubject
------------------------------------------
103            | "hello"
100            | "remember me?"
101            | "special offer for you"
102            | "welcome to my party"
The GraphUpdated signal contains information in the form of triples (subject, predicate, object). Let's assume the data is as follows:
The GraphUpdated signal doesn't contain the literals (e.g., the messageSubject of an e-mail). It only contains information that the messageSubject has been deleted / inserted.
The update query to run is:
SELECT tracker:id(?e) ?s
{
?e a nmo:Email ;
nmo:messageSubject ?s .
FILTER (tracker:id(?e) in (100, 101, 104))
}
order by ?s
where (100, 101, 104) is the list of tracker:ids in the GraphUpdated signal (in the "subject" part of the triples).
The update queries are not limited to being this simple. With complex original queries, more complex update queries are needed.
The update query template (given by the user) is an arbitrary string which contains a placeholder for the "FILTER". The placeholder is replaced with a FILTER statement which is an "OR" of one or more filter snippets. Each filter snippet is an arbitrary string which contains a placeholder for the list of tracker:ids.
The general format of the update query is "... %FILTER ..." (meaning: free text with exactly one "%FILTER" placeholder inside it). %FILTER is expanded to "FILTER ( snippet1 || snippet2 || ...)". Each snippet is of the form "... %LIST ..." (meaning: free text with exactly one "%LIST" placeholder inside it). %LIST is replaced with a list of tracker:ids.
The final FILTER statement will be of the form "FILTER (tracker:id(?a) in (1, 2, 3) || tracker:id(?b) in (4, 5, 6))".
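The placeholder expansion described above can be sketched with plain string operations. The FilterSnippet struct and the expandUpdateQuery function are hypothetical helpers written for this illustration; they are not the library's implementation and only assume the "%FILTER" and "%LIST" placeholders described in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type, not part of the QtSparqlTrackerLive API.
struct FilterSnippet {
    std::string text;      // e.g. "tracker:id(?e) in %LIST"
    std::vector<int> ids;  // tracker:ids gathered from the signal
};

// Replaces the first occurrence of `placeholder` in `text` with `value`.
static std::string replaceFirst(std::string text, const std::string& placeholder,
                                const std::string& value)
{
    const std::string::size_type pos = text.find(placeholder);
    if (pos != std::string::npos)
        text.replace(pos, placeholder.size(), value);
    return text;
}

// Formats the ids as "(1, 2, 3)".
static std::string formatIdList(const std::vector<int>& ids)
{
    std::string out = "(";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0)
            out += ", ";
        out += std::to_string(ids[i]);
    }
    return out + ")";
}

// Expands "%FILTER" in the template into "FILTER (snippet1 || snippet2 ...)",
// after replacing "%LIST" in each snippet with its id list.
std::string expandUpdateQuery(const std::string& queryTemplate,
                              const std::vector<FilterSnippet>& snippets)
{
    assert(!snippets.empty());  // the text requires one or more snippets
    std::string filter = "FILTER (";
    for (std::size_t i = 0; i < snippets.size(); ++i) {
        if (i > 0)
            filter += " || ";
        filter += replaceFirst(snippets[i].text, "%LIST", formatIdList(snippets[i].ids));
    }
    filter += ")";
    return replaceFirst(queryTemplate, "%FILTER", filter);
}
```

With the template of Example 1 and a single snippet "tracker:id(?e) in %LIST" carrying the ids 100, 101 and 104, the sketch yields the filter used in the update query shown earlier.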
See Correct update queries for different types of models for examples of more complicated update queries.
To merge the update query results correctly into the model, the TrackerLiveQuery needs information about Identity columns, Collation columns and Affected rows.
To be able to merge the results back to the model (especially, to remove rows which are no longer supposed to be included in the model), the TrackerLiveQuery needs information about which existing rows in the model are affected by the update query results.
The affected rows are expressed as rules of the following form: "column <number> contains one of the ids in this list", where the list of ids is constructed from the GraphUpdated signal. The list can contain data either from the subjects of the triples (usual) or the objects of the triples (rare).
For specifying the affected rows, the user must give the:
Example 1 (e-mails and subjects): The affected rows are those which have one of the changed subjects on the column 0. The signal contained the following subjects: (100, 101, 104). The affected rows are marked in the model below:
tracker:id(?e) | messageSubject
------------------------------------------
103            | "hello"
100            | "remember me?"            << this row is affected
101            | "special offer for you"   << this row is affected
102            | "welcome to my party"
A row is usually removed from the model when a resource is removed from Tracker. But a row can also be removed because the resource no longer meets the criteria for the resources in the model. (E.g., "images with the favourite tags", "e-mails whose subject begins with A".)
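As an illustration of a single affected-rows rule ("column N contains one of the ids in this list"), the following sketch returns the indices of the matching rows. The Row alias and the affectedRows function are hypothetical and exist only for this example; the cells are assumed to be stored as strings, with the id column holding decimal integers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical row type: one string per column.
using Row = std::vector<std::string>;

// Returns the indices of the rows whose value in `column` is one of the
// changed tracker:ids.
std::vector<std::size_t> affectedRows(const std::vector<Row>& model,
                                      std::size_t column,
                                      const std::set<long long>& changedIds)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < model.size(); ++i) {
        assert(column < model[i].size());
        const long long id = std::stoll(model[i][column]);
        if (changedIds.count(id) > 0)
            result.push_back(i);
    }
    return result;
}
```

Applied to the model of Example 1 with column 0 and the ids (100, 101, 104), the sketch selects the second and third rows. The id 104 matches nothing, because that e-mail is not yet in the model.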
When the update query returns, the results of it are merged into the original model. Merging means three different things:
Example 1 (e-mails and subjects): The update query results are:
tracker:id(?e) | messageSubject
------------------------------------------
100            | "changed subject"
104            | "new e-mail"
The data after merging these results is:
tracker:id(?e) | messageSubject
------------------------------------------
100            | "changed subject"       << this row also got relocated
103            | "hello"
104            | "new e-mail"            << this new row is inserted in the correct place
102            | "welcome to my party"
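The merge for Example 1 can be sketched as follows. This is a simplified illustration, not the library's algorithm: the EmailRow struct and the mergeUpdate function are hypothetical, the identity column is hard-coded to the tracker:id, and the collation is a byte-wise ascending comparison of the subjects. It assumes that merging consists of removing affected rows that the update query no longer returns, relocating rows whose identity already exists, and inserting new rows at their sorted position.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical row type for Example 1.
struct EmailRow {
    int id;               // column 0: tracker:id, the identity column
    std::string subject;  // column 1: the collation column
};

// Merges the update query results into a model sorted by subject.
// `changedIds` are the ids from the signal; they define the affected rows.
void mergeUpdate(std::vector<EmailRow>& model,
                 const std::vector<EmailRow>& updateResults,
                 const std::set<int>& changedIds)
{
    const auto bySubject = [](const EmailRow& a, const EmailRow& b) {
        return a.subject < b.subject;
    };
    assert(std::is_sorted(model.begin(), model.end(), bySubject));

    std::set<int> returnedIds;
    for (const EmailRow& row : updateResults)
        returnedIds.insert(row.id);

    // 1) Remove affected rows that the update query no longer returns.
    model.erase(std::remove_if(model.begin(), model.end(),
                               [&](const EmailRow& row) {
                                   return changedIds.count(row.id) > 0
                                       && returnedIds.count(row.id) == 0;
                               }),
                model.end());

    for (const EmailRow& updated : updateResults) {
        // 2) If a row with the same identity exists, take it out so that the
        //    changed row can be relocated.
        const auto old = std::find_if(model.begin(), model.end(),
                                      [&](const EmailRow& row) {
                                          return row.id == updated.id;
                                      });
        if (old != model.end())
            model.erase(old);

        // 3) Insert the new or changed row at its sorted position.
        model.insert(std::upper_bound(model.begin(), model.end(), updated, bySubject),
                     updated);
    }
}
```

Run on the data of Example 1, the sketch removes the row with id 101, relocates the row with id 100 and inserts the row with id 104, which gives the merged model shown above.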
Goals:
The previous section demonstrated the query type "one resource and its properties". The following sections explain the usage of TrackerLiveQuery for more complicated models and its limitations.
See Example 1 of Live concepts.
The following example requires listening to changes in 2 ontology classes, but the update query for reacting to the changes is the same.
The original query retrieves the tracker:id and nie:url of each photo and video. Ordering is not specified, so the resulting model will be in arbitrary order, and new rows will be added to arbitrary places when new data is added to the model.
Original query:
SELECT tracker:id(?u) nie:url(?u) { { ?u a nmm:Photo . } UNION { ?u a nmm:Video . } }
Update query:
SELECT tracker:id(?u) nie:url(?u)
{
{
?u a nmm:Photo .
}
UNION
{
?u a nmm:Video .
}
FILTER(tracker:id(?u) in (...))
}
Identity columns: 0
Collation columns: none (sorting of the resources is arbitrary)
React to the following changes:
nmm:Photo, properties: rdf:type, nie:url
nmm:Video, properties: rdf:type, nie:url
The list of ids is gathered from the subjects in the GraphUpdated signal.
Affected rows are those whose value on column 0 is one of the ids.
Original query:
SELECT tracker:id(?c) ?ct tracker:id(?m) ?s { ?c a mfo:FeedChannel ; nie:title ?ct . ?m a mfo:FeedMessage ; nmo:messageSubject ?s ; nie:isLogicalPartOf ?c . }
Identity columns: 0, 2
Collation columns: none (sorting is arbitrary)
React to the following changes:
mfo:FeedChannel, properties: rdf:type, nie:title
mfo:FeedMessage, properties: rdf:type, nmo:messageSubject
Update query:
SELECT tracker:id(?c) ?ct tracker:id(?m) ?s { ?c a mfo:FeedChannel ; nie:title ?ct . ?m a mfo:FeedMessage ; nmo:messageSubject ?s ; nie:isLogicalPartOf ?c . FILTER (tracker:id(?c) in list1 || tracker:id(?m) in list2) }
list1 is gathered from the subjects of the GraphUpdated signal related to mfo:FeedChannel, and list2 is gathered from the subjects of the GraphUpdated signal related to mfo:FeedMessage.
Affected rows: Rows that have a value in list1 on column 0 or a value in list2 on column 2.
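When several rules are in play, a row is affected as soon as any one rule matches. A minimal sketch of that OR-combination, assuming a hypothetical AffectedRule struct and isAffected function (neither is part of the library) and cells stored as strings:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical row type: one string per column.
using Row = std::vector<std::string>;

// Hypothetical rule: "column `column` contains one of `ids`".
struct AffectedRule {
    std::size_t column;
    std::set<long long> ids;
};

// True if at least one rule matches the row.
bool isAffected(const Row& row, const std::vector<AffectedRule>& rules)
{
    for (const AffectedRule& rule : rules) {
        assert(rule.column < row.size());
        if (rule.ids.count(std::stoll(row[rule.column])) > 0)
            return true;
    }
    return false;
}
```

For the channel and message model, the rules would be { column 0, list1 } and { column 2, list2 }. Note that a single changed channel id can affect many rows, one for each message of that channel.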
The trick is to minimize the number of queries if both resources "change at the same time" (e.g., both a primary resource and a related secondary resource are added at the same time). That's why we have only one update query where we filter both channels and messages.
When a new mfo:FeedMessage and a new mfo:FeedChannel are added at the same time, the following inserted triples are received:
The following update query is run:
SELECT tracker:id(?c) ?ct tracker:id(?m) ?s { ?c a mfo:FeedChannel ; nie:title ?ct . ?m a mfo:FeedMessage ; nmo:messageSubject ?s ; nie:isLogicalPartOf ?c . FILTER (tracker:id(?c) in (200) || tracker:id(?m) in (201)) }
Affected rows is an empty set, since the resources have just been added and are not part of the previous model.
The update query returns only one row:
tracker:id(?c) | ?ct           | tracker:id(?m) | ?s
--------------------------------------------------------------------
200            | "new channel" | 201            | "new message"
The row is then correctly added to the existing model.
The original query retrieves the tracker:ids of artists, and for each artist, the number of songs, and the length of the shortest song of that artist.
Original query:
SELECT tracker:id(?a) count(?mp) min(?l) { ?a a nmm:Artist ; nmm:artistName ?an . OPTIONAL { ?mp a nmm:MusicPiece ; nmm:performer ?a ; nmm:length ?l . } } GROUP BY ?a ORDER BY ?an
The correct update query for this case is delightfully non-trivial:
SELECT tracker:id(?a) ?an count(?mp) min(?l)
{
    ?a a nmm:Artist ;
       nmm:artistName ?an .
    OPTIONAL {
        ?mp a nmm:MusicPiece ;
            nmm:performer ?a ;
            nmm:length ?l .
    }
    FILTER (EXISTS {
                ?mp_filter a nmm:MusicPiece ;
                           nmm:performer ?a .
                FILTER(?mp_filter in list1)
            }                  << limits the query to the artists who have a specific song
            || ?a in list2)    << limits the query to specific artists
}
GROUP BY ?a ORDER BY ?an
To keep the aggregate computation correct, the update query must not change the solution set for the part that is shared with the original query. This update query fulfills the requirement: it doesn't modify the main pattern of the original query:
?a a nmm:Artist ;
   nmm:artistName ?an .
OPTIONAL {
    ?mp a nmm:MusicPiece ;
        nmm:performer ?a ;
        nmm:length ?l .
}
but only adds a FILTER to it. This means that the set of solutions (all the possible combinations of resources to which ?a, ?an, ?mp and ?l can be bound) remains the same.
If we remove the OPTIONAL block from the previous example, we require that all artists included in the model must have at least one song.
When a song is deleted, the following deleted triples are received:
Specifically, the triple
When a song is deleted, we can no longer determine whose song it was. TrackerLiveQuery does not store data about the artist-song relationship, and since the song has been deleted from Tracker, we cannot query more information about it.
As a result, we cannot determine the affected rows for the update we will run. The update result will be empty, since the deleted song no longer exists. If some artist should be deleted from the model, the TrackerLiveQuery cannot know that, and fails to delete the artist.
This problem will disappear if Tracker is changed to include all the properties of the removed resources in the GraphUpdated signal.
The problem can also be solved by adding a TrackerFullUpdater which re-runs the original query when this change happens.
The following example demonstrates this limitation.
Example: The model contains artists and their number of songs, but only artists whose minimum length song is longer than 1 minute are included in the model.
The change: the length of a song changes. Some artist might need to be deleted from the model since he no longer has a song longer than 1 minute.
We can re-run the original query limited to the artists who have the changed song. But we cannot determine the affected rows: the model stores no data about the relationship between the artists and the songs. Thus, we don't know which artists to delete from the model if they don't appear in the update query results.
Unlike the problem described in Aggregates: limitation 1, this problem will not disappear even if Tracker includes more data in the signal.
The problem can be solved by adding a TrackerFullUpdater which re-runs the original query when this change happens.
The following classes are needed:
The following example demonstrates the use of TrackerLiveQuery and TrackerPartialUpdater.
#include "TrackerLiveQuery"

#include <QSparqlConnection>
#include <QApplication>
#include <QTableView>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    QString origQuery("select tracker:id(?e) ?s "
                      "{ ?e a nmo:Email ; "
                      "nmo:messageSubject ?s . "
                      "} "
                      "order by ?s");

    QString updateQuery("select tracker:id(?e) ?s "
                        "{ ?e a nmo:Email ; "
                        "nmo:messageSubject ?s . "
                        "%FILTER "
                        "} "
                        "order by ?s");

    QSparqlConnection conn("QTRACKER_DIRECT");

    // Parameters: original query, number of columns, identity columns, and the
    // QSparqlConnection to run the queries with.
    // Identity columns are the set of columns which identify a row in the
    // model.
    TrackerLiveQuery liveQuery(origQuery, 2, QList<int>() << 0, conn);

    // Collation columns define the columns which are used in sorting, and the
    // sorting type.
    liveQuery.setCollationColumns(QList<TrackerLiveQuery::CollationColumn>()
        << TrackerLiveQuery::CollationColumn(1, QVariant::String));

    // Parameters: the update query with a "%FILTER" placeholder; the update
    // query needs to have the same sorting criteria as the original query.
    TrackerPartialUpdater up(updateQuery);

    // Parameters:
    // 1) the class to watch
    // 2) the interesting properties (empty list == all)
    // 3) filter snippet for constructing the filter statement which will
    //    replace "%FILTER" in the update query; the snippet contains the
    //    "%LIST" placeholder
    // 4) TrackerPartialUpdater::Subject means that %LIST will be replaced
    //    with a list of tracker:ids extracted from the "subject" part of the
    //    triples (subject, predicate, object) contained in the Tracker
    //    GraphUpdated signal.
    // 5) The rows in the model which are affected by the update query will
    //    contain one of those ids on column 0.
    up.watchClass("nmo:Message", QStringList(),
                  "tracker:id(?e) in %LIST",
                  TrackerPartialUpdater::Subject, 0);

    liveQuery.addUpdater(up);
    liveQuery.start();

    QAbstractItemModel* model = liveQuery.model();
    model->setHeaderData(0, Qt::Horizontal, "Email");
    model->setHeaderData(1, Qt::Horizontal, "Subject");

    QTableView* view = new QTableView();
    view->setModel(model);
    view->show();

    return app.exec();
}
The following commands can be used for modifying data in Tracker while the example is running.
tracker-sparql -qu "insert {<email0> a nmo:Email ; nmo:messageSubject \"foo\" .}"
tracker-sparql -qu "delete {<email0> a rdfs:Resource . }"
In the example, TrackerFullUpdater is not needed, since all changes can be handled without re-querying the full data.