Using XML and XPath in PostgreSQL database
2009-11-28Everybody knows, that PostgreSQL is one of the best open-source RDBMS. The keyword here is „relational” database. For most purposes relational structure is okay, but sometimes we have to store highly complicated hierarchical data. Mapping such hierarchical structure to relational tables could be a real pain. A simple solution is to use a hierarchical markup language to present our data – XML. Luckily PostgreSQL provides a data type for storing XML documents. I was very optimistic about it, but it didn’t meet all my expectations.
So for the beginning, let’s create our test database, which will be storing info about people arriving at the conference (attendees and speakers). Please notice that the names used bellow are fictional 🙂
CREATE TABLE xmltest ( id SERIAL, data xml NOT NULL );
Now we can populate this table with some example records:
INSERT INTO xmltest (data, id) VALUES (' <attendee> <bio> <name>John Doe</name> <birthYear>1986</birthYear> </bio> <languages> <lang level="5">php</lang> <lang level="4">python</lang> <lang level="2">java</lang> </languages> </attendee>', 1); INSERT INTO xmltest (data, id) VALUES ('
<attendee> <bio> <name>Tom Smith</name> <birthYear>1978</birthYear> </bio> <languages> <lang level="5">python</lang> <lang level="3">java</lang> <lang level="1">ruby</lang> </languages> </attendee>', 2); INSERT INTO xmltest (data, id) VALUES (' <attendee> <bio> <name>Mike Williams</name> <birthYear>1982</birthYear> </bio> <languages> <lang level="5">java</lang> </languages> </attendee>', 3); INSERT INTO xmltest (data, id) VALUES (' <attendee> <bio> <name>Jason Taylor</name> <birthYear>1984</birthYear> </bio> <languages> <lang level="5">java</lang> <lang level="4">php</lang> </languages> </attendee>', 4); INSERT INTO xmltest (data, id) VALUES (' <attendee> <bio> <name>Kris Brown</name> <birthYear>1989</birthYear> </bio> <languages> <lang level="2">python</lang> <lang level="2">ruby</lang> </languages> </attendee>', 5); INSERT INTO xmltest (data, id) VALUES (' <speaker> <bio> <name>Mark Evans</name> <birthYear>1975</birthYear> </bio> <languages> <lang level="5">python</lang> </languages> </speaker>', 6); INSERT INTO xmltest (data, id) VALUES (' <speaker> <bio> <name>Steve Wilson</name> <birthYear>1972</birthYear> </bio> <languages> <lang level="5">php</lang> </languages> </speaker>', 7);
As we can see, we have similar records for speakers and attendees, each of them has a bio, containing a name and a birth year. Also, the list of programming languages (with advance level) is available.
PostgreSQL offers a bunch of functions to create XML content, you can find them here: http://developer.postgresql.org/pgdocs/postgres/functions-xml.htmlThose functions aren’t really useful in my opinion, XML should be produced by the application and only stored in DB, except situation, in which you are using XML from stored procedures.
For me, the main functionality, which should be provided by Postgres is filtering XML content via XPath. Of course, there is an XPath function which returns XML type, but there’s no operators assigned to XML field. In my opinion, the explanation of this fact may be, that XPath function is for retrieving data from XML and return it in SELECT clause. Even if it is true, it will be great to use XPath function in WHERE clause too. There is such a possibility in a limited way – you can cast XML type to text[] (text array) and compare it with a string. Take a look at those queries:
select data from xmltest where cast (xpath('//bio/name/text()', data) as text[]) = '{John Doe}';
This one retrieves John Doe record. But XPath gives us the possibility to filter results, so we can move conditions to it:
select data from xmltest where cast (xpath('//bio[name="John Doe"]', data) as text[]) != '{}';
In this example, we compare the result of XPath function with an empty array representation (‚{}’). Below there is a more complex example of query which uses XPath to filtering and selecting:
select data, cast(xpath('//bio/name/text()', data) as text[]) AS name, cast(xpath('//languages/lang[@level>=4]/text()', data) as text[]) AS langs from xmltest where cast(xpath('//languages/lang[@level>=4]', data) as text[]) != '{}' AND cast(xpath('//bio[birthYear>1980]', data) as text[]) != '{}';
This one retrieves whole XML record and extracted name and languages array from participants who are born after the 1980 year and are programming in languages in which they are at 4 levels of advance.
Above examples shows that PostgreSQL handles XML data in a pretty nice way, but some operators for those fields would be welcome 🙂