
developerWorks
Build an LSID Resolution Service using the Java language
Contents:
Getting started
Installing the LSID package
Hello World, your first LSID authority
Using the Swiss-Prot data set
Importing the data into a MySQL database
Metadata
Making the authority publicly available
Conclusion
Resources
About the authors
A Java-based Life Sciences Identifier authority consolidates biological data resources

Level: Advanced

Stefan Atev, Programmer, IBM
Ben Szekely (bhszekel@us.ibm.com), Software Engineer, IBM

27 May 2003
Updated 03 March 2004

We take you through a step-by-step approach to building a Java®-based Life Sciences Identifier (LSID) authority from scratch. We demonstrate how to build this on a minimal data set and on data downloaded from the protein sequence database Swiss-Prot, all on the Linux platform.

The amount of biological data being created today is mind-boggling. As a biologist or bioinformaticist, you probably know of places around the network that provide very useful resources for your task at hand -- but remembering the different ways to access this information is often a productivity drain. Maybe you write a few Perl scripts or know someone who will provide you with some code for this or a procedure for that. At this point, you may be thinking that coming up with a common way of naming and finding this data is the only way you will be able to remain a biologist and not a programmer. Of course, the value of having a common way to identify data extends beyond bioinformatics, but for this article we will stay within the life sciences.

The Life Sciences Identifier (LSID) is an I3C Uniform Resource Name (URN) specification in progress. You can read more about the specification at the I3C (see Resources for a link). Conceptually, LSID is a straightforward approach to naming and identifying data resources stored in multiple, distributed data stores in a manner that overcomes the limitations of the naming schemes that are in use today.

An LSID resolver is a software system that implements an agreed-upon LSID resolution protocol to allow higher-level software to locate and access the data uniquely named by any LSID URN. The "server" side of this resolver solution is called an LSID authority. The client stacks and an example client, the LSID LaunchPad, are provided by the LSID Resolution Protocol Project.

In this article, you'll see how to create your own LSID authority using the LSID resolver stack for the Java language.

Getting started
This article assumes that you have the necessary administrative privileges on the system that will house the authority (most likely you will need root access for some of the steps).

All the steps in this article were tested on Red Hat Linux 7.1 and Red Hat Linux 8. Java JDK versions 1.3.1 and 1.4.0 were tested. Jakarta Tomcat 4.1.18 was used. The sample code works with the IBM® WebSphere® software platform as well.

Prerequisites
First, you need access to a system capable of running the Jakarta Tomcat 4 Web server, Java2 (JDK 1.3.1 and up recommended), as well as a database engine such as MySQL 3.23.x.

Required Java packages
The Java LSID client/server stacks need several Java packages to be installed first:

Copy the .jar files to your Jakarta Tomcat shared/lib directory, or alternatively, make sure they are available to your Java runtime engine through the system class path.

If you opt to set up the sample authority using the Swiss-Prot data set for your own testing purposes, you will also need the file mysql-connector-java-x.x.x-bin.jar from the MySQL Connector/J distribution available from MySQL AB (see Resources for a link). You do not need the latest version of the JDBC drivers; the LGPL-licensed version 2.0.14 will do. The sample authority server uses this driver to access the MySQL database that holds the Swiss-Prot data, so it also needs to go into your Jakarta Tomcat shared/lib directory (or be in the system class path).

Installing the LSID package
Once you have downloaded the prerequisites, get the latest version of the LSID Java client/server stack (1.0.1 at the time of this writing). Obtain both the binary LSID server distribution and the binary LSID client distribution, then copy the files lsid-client.jar and lsid-server.jar into your Jakarta Tomcat shared/lib directory.

The Java LSID server package provides a set of servlets and a simplified interface for quickly creating LSID authorities, as well as fully featured LSID resolution services.

Hello World, your first LSID authority
Before we go any further, let's implement an authority that only knows about one LSID: urn:lsid:ibm.com:hello:world. The parts of this particular LSID are:

  • ibm.com -- the domain of the issuing authority
  • hello -- the namespace of the LSID
  • world -- the object id of the LSID
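To make the three parts concrete, the hypothetical LsidParts class below splits an LSID string into them. It is only a sketch and is not the com.ibm.lsid.LSID class used in the listings. It assumes the urn:lsid:authority:namespace:object layout shown here, accepts one optional trailing revision part (like the :version2 example used later for the byid table), and lowercases the whole string on the assumption that lower case is the canonical form.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the LSID package) that splits an LSID into its parts. */
final class LsidParts {
    final String authority;
    final String namespace;
    final String object;
    final String revision; // null when the LSID has no revision part

    private LsidParts(String authority, String namespace, String object, String revision) {
        this.authority = authority;
        this.namespace = namespace;
        this.object = object;
        this.revision = revision;
    }

    /** Parses urn:lsid:authority:namespace:object[:revision] into lower-case parts. */
    static LsidParts parse(String lsid) {
        // Assumption: lower case is the canonical form, so normalize before splitting.
        String[] f = lsid.trim().toLowerCase(Locale.ROOT).split(":", -1);
        if (f.length < 5 || f.length > 6 || !f[0].equals("urn") || !f[1].equals("lsid")) {
            throw new IllegalArgumentException("Not an LSID: " + lsid);
        }
        for (String part : f) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty LSID component in: " + lsid);
            }
        }
        return new LsidParts(f[2], f[3], f[4], f.length == 6 ? f[5] : null);
    }

    public static void main(String[] args) {
        LsidParts p = parse("urn:lsid:ibm.com:hello:world");
        // prints "ibm.com | hello | world"
        System.out.println(p.authority + " | " + p.namespace + " | " + p.object);
    }
}
```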

The easiest way to implement the authority is to extend the com.ibm.lsid.server.impl.SimpleAuthority class, which is used by the standard authority servlet implemented by com.ibm.lsid.server.servlet.AuthorityServlet. The methods we need to provide or override are:

  • initService
  • getDataLocations
  • getMetaDataLocations

The authority will not provide data or metadata services, but will simply describe the locations where data about urn:lsid:ibm.com:hello:world can be retrieved.

You can get the code by downloading lsid-java-samples.tar.gz and extracting HelloWorldAuthority.java or the WAR file helloworld.war.

Listing 1. Hello, code

01  package lsidsamples;
02
03  import com.ibm.lsid.LSID;
04  import com.ibm.lsid.MalformedLSIDException;
05  import com.ibm.lsid.ExpiringResponse;
06  import com.ibm.lsid.wsdl.LSIDDataPort;
07  import com.ibm.lsid.wsdl.LSIDMetadataPort;
08  import com.ibm.lsid.server.LSIDServiceConfig;
09  import com.ibm.lsid.server.LSIDServerException;
10  import com.ibm.lsid.server.impl.SimpleAuthority;
11  import com.ibm.lsid.wsdl.HTTPLocation;
12  import com.ibm.lsid.wsdl.FTPLocation;
13
14  public class HelloWorldAuthority extends SimpleAuthority {
15
16    public void initService(LSIDServiceConfig config) throws LSIDServerException {
17    }
18
19    public LSIDMetadataPort[] getMetaDataLocations(LSID lsid, String url) {
20      return new LSIDMetadataPort[0];
21    }
22
23    public LSIDDataPort[] getDataLocations(LSID lsid, String url) {
24      return new LSIDDataPort[] {
25        new HTTPLocation(
26          "www.ibm.com", 80, "/lsid/hello_world"
27        ),
28        new FTPLocation(
29          "ftp.ibm.com", "/lsid/hello_world.txt"
30        )
31      };
32    }
33  }

Dissecting "Hello World"
Line 01 specifies that our authority's implementation will be a part of the lsidsamples package. Lines 03 - 12 import the classes and interfaces we need to implement the authority. We will use the class com.ibm.lsid.server.impl.SimpleAuthority as the base for our lsidsamples.HelloWorldAuthority implementation (line 14).

Lines 16 - 17 implement the initService method, which will be called upon authority startup. Since we do not need to save any configuration options (accessible through LSIDServiceConfig), we can choose to do nothing.

The function getMetaDataLocations (lines 19 - 21) takes an LSID object as a parameter and returns an array of locations where metadata services about that LSID are available. Since we implement no metadata service in this example, the method returns an array with length 0 (returning null would have indicated an error).

The function getDataLocations is very similar to getMetaDataLocations, but this time we return an array providing two possible locations for the data: the hypothetical URLs http://www.ibm.com:80/lsid/hello_world and ftp://ftp.ibm.com/lsid/hello_world.txt.

Configuring the authority
To configure the authority, we must provide a deployment descriptor that gives the servlet a mapping from LSID to service implementation. The XML in Listing 2 defines a mapping called hello-world that applies to all LSIDs with authority ibm.com and namespace hello. The services section of the XML binds this mapping to our authority implementation.

You can find the deployment descriptor in the file webapps/helloworld/services/hello-world.xml:

Listing 2. Service configuration

<?xml version="1.0" encoding="UTF-8"?>
<deployment-descriptor xmlns="http://www.ibm.com/LSID/Standard/rsdl">
  <maps>
    <map name="hello-world">
      <pattern auth="ibm.com" ns="hello" />
    </map>
  </maps>
  <services>
    <service name="aLSID">
      <components>
        <auth map="hello-world" type="class">lsidsamples.HelloWorldAuthority</auth>
      </components>
    </service>
  </services>
</deployment-descriptor>
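Conceptually, the map reads as a routing rule: any LSID whose authority is ibm.com and whose namespace is hello is handled by lsidsamples.HelloWorldAuthority. The hypothetical ServiceMap class below sketches that matching step in Java. It is not how AuthorityServlet is actually implemented; the first-match-wins order, the case-insensitive comparison, and treating an absent attribute as a wildcard are all our assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of routing an LSID to the class bound by a map pattern. */
final class ServiceMap {
    private static final class Entry {
        final String auth;      // null is treated as "any authority" (our assumption)
        final String ns;        // null is treated as "any namespace" (our assumption)
        final String className; // the value of the auth element bound to this map

        Entry(String auth, String ns, String className) {
            this.auth = auth;
            this.ns = ns;
            this.className = className;
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    /** Registers the equivalent of a pattern element bound to className. */
    void add(String auth, String ns, String className) {
        entries.add(new Entry(lower(auth), lower(ns), className));
    }

    /** Returns the class bound to the first matching pattern, or null if none matches. */
    String resolve(String authority, String namespace) {
        String a = lower(authority);
        String n = lower(namespace);
        for (Entry e : entries) {
            boolean authOk = e.auth == null || e.auth.equals(a);
            boolean nsOk = e.ns == null || e.ns.equals(n);
            if (authOk && nsOk) {
                return e.className;
            }
        }
        return null;
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ServiceMap map = new ServiceMap();
        map.add("ibm.com", "hello", "lsidsamples.HelloWorldAuthority");
        System.out.println(map.resolve("ibm.com", "hello"));   // lsidsamples.HelloWorldAuthority
        System.out.println(map.resolve("ibm.com", "goodbye")); // null
    }
}
```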

Running and testing the authority
To test the authority, you must first deploy it. Copy the file helloworld.war into your Jakarta Tomcat webapps directory. The file will be extracted into webapps/helloworld upon Tomcat startup, and your Hello World authority will be available at http://localhost:8080/helloworld/.

You can find the description of the authority servlets in the file webapps/helloworld/WEB-INF/web.xml:

Listing 3. Servlet configuration

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems,
Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app id="WebApp">
  <display-name>Hello World LSID Authority</display-name>
  <servlet>
    <servlet-name>AuthorityService</servlet-name>
    <display-name>Hello World Authority Servlet</display-name>
    <servlet-class>com.ibm.lsid.server.servlet.AuthorityServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>AuthorityService</servlet-name>
    <url-pattern>/</url-pattern>
  </servlet-mapping>
</web-app>

We did not have to write the Java servlet ourselves, since the standard com.ibm.lsid.server.servlet.AuthorityServlet does all we need. All authority services must implement the com.ibm.lsid.server.LSIDAuthorityService interface. Our sample authority, lsidsamples.HelloWorldAuthority, implements this interface by virtue of extending com.ibm.lsid.server.impl.SimpleAuthority. When AuthorityServlet is loaded, it will instantiate HelloWorldAuthority and will subsequently use the getMetaDataLocations and getDataLocations calls to get the information necessary to build the WSDL response for the standard LSID authority method getAvailableServices.

To test the authority, use the TestClient.java sample client program. The compiled class file for it is in the extracted samples directory, in the file test-client.jar. Enter the following command:


java TestClient urn:lsid:ibm.com:hello:world \
   http://localhost:8080/helloworld/

You will need the .jar files from your Tomcat shared/lib directory in the class path, together with samples/test-client.jar. The first parameter to TestClient is the LSID to test with, and the second temporarily maps the authority service for ibm.com to http://localhost:8080/helloworld/, where the Hello World authority is running. The expected output is:


Data is available at:
(ftp)  ftp://ftp.ibm.com/lsid/hello_world.txt
(http) http://www.ibm.com:80/lsid/hello_world

Using the Swiss-Prot data set
Swiss-Prot records contain an ID and an AC field, corresponding to a human-readable identifier of the record and its accession number. The sample Swiss-Prot authority that we'll implement will understand LSIDs of these various forms:

Table 1. Supported LSIDs
  • urn:lsid:example.org:swiss-id:hv20_mouse -- An abstract LSID, with no data of its own, that represents the Swiss-Prot record with ID HV20_MOUSE. It is related to the concrete representations of this record in various formats.
  • urn:lsid:example.org:swiss-id:hv20_mouse-sprot -- A concrete LSID naming the Swiss-Prot record with ID HV20_MOUSE in Swiss-Prot format.
  • urn:lsid:example.org:swiss-id:hv20_mouse-fasta -- A concrete LSID naming the Swiss-Prot record with ID HV20_MOUSE in FASTA format. The conversion to FASTA is done on the fly.
  • urn:lsid:example.org:swiss-ac:p01879 -- An abstract LSID, with no data of its own, that represents the Swiss-Prot record with primary or secondary accession number P01879. It is related to the concrete representations of this record in various formats.
  • urn:lsid:example.org:swiss-ac:p01879-sprot -- A concrete LSID naming the Swiss-Prot record with accession number P01879 in Swiss-Prot format.
  • urn:lsid:example.org:swiss-ac:p01879-fasta -- A concrete LSID naming the Swiss-Prot record with accession number P01879 in FASTA format. The conversion to FASTA is done on the fly.

Obtaining the data set
You can download the Swiss-Prot database as a compressed file from expasy.org via FTP (see Resources for a link). The download is about 63 MB. Save the sprot40.dat.gz file in a convenient location, such as ~/lsid, and then decompress it with gunzip:


cd ~/lsid
gunzip -d sprot40.dat.gz

Once you have done this, you should have a file called sprot40.dat in your lsid directory. The file format of the database is described in the Swiss-Prot user manual (again, please see Resources for a link).
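To build the lookup tables, extract.pl has to find the ID line, the AC lines, and the // terminator of each record in sprot40.dat. The hypothetical Java class below sketches that scan; it is not a port of extract.pl. It assumes the layout described in the Swiss-Prot user manual: a two-letter line code, data starting in column 6, and semicolon-terminated accession numbers on AC lines, the first of which is the primary accession number.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not a port of extract.pl): pulls ID and AC fields out of records. */
final class SprotRecordReader {
    static final class Record {
        String id;
        final List<String> accessions = new ArrayList<String>(); // first entry is the primary AC
        final StringBuilder text = new StringBuilder();          // raw record, including "//"
    }

    /** Reads the next record, or returns null when the input is exhausted. */
    static Record next(BufferedReader in) throws IOException {
        Record rec = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (rec == null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines between records
                }
                rec = new Record();
            }
            rec.text.append(line).append('\n');
            if (line.startsWith("ID   ")) {
                // The entry name is the first token after the two-letter line code.
                rec.id = line.substring(5).trim().split("\\s+")[0];
            } else if (line.startsWith("AC   ")) {
                // An AC line lists one or more accession numbers, each followed by ';'.
                for (String ac : line.substring(5).split(";")) {
                    if (!ac.trim().isEmpty()) {
                        rec.accessions.add(ac.trim());
                    }
                }
            } else if (line.startsWith("//")) {
                return rec; // "//" terminates a record
            }
        }
        return rec; // null at end of input, or an unterminated final record
    }

    /** Convenience wrapper for in-memory text; a StringReader does not actually fail. */
    static List<Record> readAll(String text) {
        List<Record> records = new ArrayList<Record>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            Record r;
            while ((r = next(in)) != null) {
                records.add(r);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // An abbreviated, illustrative record; real entries carry many more line types.
        String sample = "ID   HV20_MOUSE     STANDARD;\nAC   P01789;\n//\n";
        Record r = readAll(sample).get(0);
        System.out.println(r.id + " " + r.accessions); // HV20_MOUSE [P01789]
    }
}
```

Each Record carries what the three tables need: id for byid, the accession list for byac, and the raw text for acdata.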

Importing the data into a MySQL database
Our first task is to create a MySQL user account for the LSID authority and to create the necessary data tables. If the MySQL daemon is not running, start it (/etc/init.d/mysqld start as root on Red Hat Linux 7 and 8), then start the MySQL client as the MySQL root user by typing mysql -u root -p. Enter the root MySQL user's password, and then enter the following:

Listing 4. Creating a user account

create database sprot4;
grant all on sprot4.*
  to lsiduser@localhost identified by 'none';
grant all on sprot4.*
  to lsiduser@localhost.localdomain identified by 'none';
grant all on sprot4.*
  to lsiduser@'%' identified by 'none';
use sprot4;
create table byid (
  id varchar(40) unique,
  version varchar(40),
  rootac varchar(40) unique,
  index(version)
);
create table byac (
  ac varchar(40) unique,
  rootac varchar(40),
  index(rootac)
);
create table acdata (
  rootac varchar(40) unique,
  data blob
);

If you want to save yourself some typing, get the mysql.batch1 file from the samples directory and run the command mysql -f -u root -p < mysql.batch1. Either way, a user account "lsiduser" with password "none" is created with access to the database sprot4. The three tables we create are byid, byac, and acdata:

Table 2. Table byid
  • id -- The unique object ID for LSIDs in the swiss-id namespace, up to 40 characters long. For example, id contains the string HV20_MOUSE for the LSID urn:lsid:example.org:SWISS-ID:HV20_MOUSE.
  • version -- An optional (nullable) version string of up to 40 characters. For the LSID urn:lsid:example.org:SWISS-ID:HV20_MOUSE:version2, this field contains the value version2. Our example does not use this field.
  • rootac -- The primary Swiss-Prot accession number for the corresponding LSID. For the LSID urn:lsid:example.org:SWISS-ID:HV20_MOUSE, this field contains the value P01789. This field is the key used to look up the data for this LSID.

Table 3. Table byac
  • ac -- A Swiss-Prot accession number (up to 40 characters). Secondary accession numbers (such as P01234) can also appear here; each value corresponds to an object ID in the swiss-ac namespace.
  • rootac -- The primary Swiss-Prot accession number for the corresponding accession number. For the LSID urn:lsid:example.org:SWISS-AC:P01234, this field contains the value P08751. This field is the key used to look up the data for this LSID.

Table 4. Table acdata
  • rootac -- A primary Swiss-Prot accession number that identifies a Swiss-Prot record.
  • data -- A BLOB holding the Swiss-Prot record for rootac in Swiss-Prot format. This is the actual data returned for the LSID in question.
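Taken together, the tables form a two-step lookup: an object ID from either namespace is mapped to a primary accession number (rootac), which then keys the stored record in acdata. The in-memory class below mirrors that path with hash maps so the logic can be tried without a database. The class and method names are hypothetical, and lowercasing every key is our stand-in for a case-insensitive string comparison.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory stand-in for the byid, byac, and acdata tables. */
final class SprotTables {
    private final Map<String, String> byId = new HashMap<String, String>();   // id -> rootac
    private final Map<String, String> byAc = new HashMap<String, String>();   // ac -> rootac
    private final Map<String, String> acData = new HashMap<String, String>(); // rootac -> record

    /** Stores one record; accessions[0] must be the primary accession number. */
    void addRecord(String id, String[] accessions, String record) {
        String rootAc = key(accessions[0]);
        byId.put(key(id), rootAc);
        for (String ac : accessions) {
            byAc.put(key(ac), rootAc); // the primary accession maps to itself
        }
        acData.put(rootAc, record);
    }

    /** Step 1: map an object ID in swiss-id or swiss-ac to its primary accession. */
    String rootAc(String namespace, String objectId) {
        Map<String, String> table;
        if ("swiss-id".equalsIgnoreCase(namespace)) {
            table = byId;
        } else if ("swiss-ac".equalsIgnoreCase(namespace)) {
            table = byAc;
        } else {
            return null; // a namespace this authority does not serve
        }
        return table.get(key(objectId));
    }

    /** Step 2: fetch the stored record by primary accession, or null if unknown. */
    String data(String namespace, String objectId) {
        String root = rootAc(namespace, objectId);
        return root == null ? null : acData.get(root);
    }

    private static String key(String s) {
        return s.toLowerCase(Locale.ROOT); // stand-in for a case-insensitive comparison
    }

    public static void main(String[] args) {
        SprotTables db = new SprotTables();
        db.addRecord("HV20_MOUSE", new String[] {"P01789"}, "ID   HV20_MOUSE ...");
        System.out.println(db.rootAc("swiss-id", "hv20_mouse")); // p01789
        System.out.println(db.data("swiss-ac", "P01789"));      // ID   HV20_MOUSE ...
    }
}
```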

Loading the data into MySQL
Before we import the data into the newly created tables, we must extract it from the flat file sprot40.dat. You can use the Perl script extract.pl from the samples directory downloaded earlier to do that. The commands are:


cd ~/lsid
perl extract.pl sprot40.dat byid.txt byac.txt acdata.txt

This step takes some time, because the data set is fairly large, and you need enough disk space to hold the extracted files throughout the import process. Once extract.pl has finished, start the MySQL client as user "lsiduser" by typing mysql -u lsiduser -pnone and enter the following commands:


use sprot4;
load data local
  infile 'byid.txt' into table byid;
load data local
  infile 'byac.txt' into table byac;
load data local
  infile 'acdata.txt' into table acdata;

This process will also take some time. Once you are done, you can delete the files byid.txt, byac.txt, and acdata.txt. If you want to save yourself the typing, get mysql.batch2 from the samples directory and run mysql -f -u lsiduser -pnone < mysql.batch2.
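Because acdata.txt carries whole multi-line records in a single column, each value has to survive LOAD DATA's default input format: tab-separated fields, newline-terminated lines, and backslash as the escape character, with \N read back as NULL. We have not checked how extract.pl handles this, so the hypothetical LoadDataWriter below is only a sketch of what writing such rows could look like in Java. It treats record text as a Java String, which assumes the records are plain text.

```java
/** Hypothetical helper: formats rows for LOAD DATA INFILE's default tab-separated format. */
final class LoadDataWriter {
    /** Escapes one field so that LOAD DATA's default backslash escaping restores it. */
    static String escape(String value) {
        if (value == null) {
            return "\\N"; // read back as SQL NULL
        }
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\t': out.append("\\t"); break;  // would otherwise end the field
                case '\n': out.append("\\n"); break;  // would otherwise end the row
                case '\r': out.append("\\r"); break;
                case '\0': out.append("\\0"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Joins raw (unescaped) values into one tab-separated, newline-terminated row. */
    static String row(String... values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append('\t');
            }
            line.append(escape(values[i]));
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // A byid row with a NULL version, then an acdata row holding a multi-line record.
        System.out.print(row("HV20_MOUSE", null, "P01789"));
        System.out.print(row("P01789", "ID   HV20_MOUSE ...\nAC   P01789;\n//\n"));
    }
}
```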

The Java code
We can now take a look at a less trivial LSID authority. We will provide an authority service for resolving LSIDs, a data service for both Swiss-Prot and FASTA formatted data records, as well as a metadata service with support for a limited amount of metadata. Before we visit the core authority code, let's take a cursory look at the support routines in SampleLSIDDataLookup.java, located in the samples archive.

Listing 5. Looking up data with Java

package com.ibm.lsid.samples;

import java.io.InputStream;
import java.io.IOException;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import com.ibm.lsid.LSID;
import com.ibm.lsid.MalformedLSIDException;

import com.ibm.lsid.server.LSIDServerException;

public class SampleLSIDDataLookup {

  ...

  public SampleLSIDDataLookup() throws LSIDServerException {
    ...
  }

  public int lsidType(LSID lsid) throws LSIDServerException {
    ...
  }

  public InputStream lsidData(LSID lsid) throws LSIDServerException {
    ...
  }
}

The implementations of lsidType and lsidData are not important for this discussion: lsidType returns the type of an LSID (UNKNOWN, ABSTRACT, or CONCRETE), and lsidData returns the data associated with an LSID as an InputStream object. Both throw an LSIDServerException if they detect an error.
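Judging from Table 1, lsidType plausibly strips a -sprot or -fasta suffix, checks whether the remaining object ID exists, and reports CONCRETE or ABSTRACT accordingly. The sketch below illustrates that reasoning only; the real implementation is elided in Listing 5, and the class name, the numeric constant values, and the Predicate standing in for the database query are our assumptions. It also ignores the namespace, which a real lookup must consider to choose between byid and byac.

```java
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical sketch of the classification lsidType performs; not the sample's code. */
final class LsidTypeSketch {
    // Names follow Listing 6; the numeric values are our own assumption.
    static final int UNKNOWN = 0;
    static final int ABSTRACT = 1;
    static final int CONCRETE = 2;

    /**
     * @param objectId the object part of an LSID, such as "hv20_mouse" or "hv20_mouse-fasta"
     * @param exists   reports whether a base object ID is present in the database
     */
    static int classify(String objectId, Predicate<String> exists) {
        String id = objectId.toLowerCase(Locale.ROOT);
        boolean concrete = id.endsWith("-sprot") || id.endsWith("-fasta");
        // Both format suffixes are six characters long.
        String base = concrete ? id.substring(0, id.length() - 6) : id;
        if (!exists.test(base)) {
            return UNKNOWN;
        }
        return concrete ? CONCRETE : ABSTRACT;
    }

    public static void main(String[] args) {
        Predicate<String> db = s -> s.equals("hv20_mouse"); // toy "database" of one entry
        System.out.println(classify("hv20_mouse", db));       // 1 (ABSTRACT)
        System.out.println(classify("hv20_mouse-fasta", db)); // 2 (CONCRETE)
        System.out.println(classify("hv20_mouse-pdf", db));   // 0 (UNKNOWN)
    }
}
```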

The core authority functionality is implemented by the class SampleLSIDAuthorityMain:

Listing 6. The SampleLSIDAuthorityMain class

package com.ibm.lsid.samples;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

import com.ibm.lsid.LSID;
import com.ibm.lsid.LSIDException;
import com.ibm.lsid.ExpiringResponse;
import com.ibm.lsid.wsdl.LSIDDataPort;
import com.ibm.lsid.wsdl.LSIDMetadataPort;
import com.ibm.lsid.wsdl.LSIDWSDLWrapper;

import com.ibm.lsid.server.LSIDServerException;
import com.ibm.lsid.server.LSIDServiceConfig;

import com.ibm.lsid.wsdl.HTTPLocation;
import com.ibm.lsid.wsdl.SOAPLocation;
import com.ibm.lsid.server.impl.SimpleAuthority;

public class SampleLSIDAuthorityMain extends SimpleAuthority {

	private SampleLSIDDataLookup lookup = null;

	public void initService(LSIDServiceConfig cf) throws LSIDServerException {
		lookup = new SampleLSIDDataLookup();
	}

	public LSIDMetadataPort[] getMetaDataLocations(LSID lsid, String url) {
		if (lookup == null)
			return null;

		int lsType;
		try {
			lsType = lookup.lsidType(lsid);
		}
		catch (LSIDServerException ex) {
			ex.printStackTrace();
			lsType = SampleLSIDDataLookup.UNKNOWN;
		}
		if (lsType == SampleLSIDDataLookup.UNKNOWN)
			return null;

		HostDescriptor hd = new HostDescriptor(url);
		return new LSIDMetadataPort[] {
			new SOAPLocation(
				hd.baseURL + "metadata"
			)
		};
	}

	public LSIDDataPort[] getDataLocations(LSID lsid, String url) {
		if (lookup == null)
			return null;

		int lsType;
		try {
			lsType = lookup.lsidType(lsid);
		}
		catch (LSIDServerException ex) {
			ex.printStackTrace();
			lsType = SampleLSIDDataLookup.UNKNOWN;
		}
		if (lsType == SampleLSIDDataLookup.UNKNOWN)
			return null;
		if (lsType == SampleLSIDDataLookup.ABSTRACT)
			return new LSIDDataPort[0];

		HostDescriptor hd = new HostDescriptor(url);
		return new LSIDDataPort[] {
			new SOAPLocation(
				hd.baseURL + "data"
			),
			new HTTPLocation(
				hd.host, hd.port,
				hd.pathPrefix + "/authority/data"
			)
		};
	}

	private static final Pattern HOST_PTN = Pattern.compile(
		"https?://([^/:]+)(?::(\\d+))?(.*)/authority(.*)"
	);

	/* Q&D implementation */
	private class HostDescriptor {
		public String host;
		public int port;
		public String pathPrefix;
		public String baseURL;

		public HostDescriptor(String url) {
			host = "localhost";
			port = -1;
			pathPrefix = "";
			if (url != null && url.length() > 0) {
				Matcher m = HOST_PTN.matcher(url);
				if (m.lookingAt()) {
					host = m.group(1);
					// the optional port group is null when the URL has no port
					if (m.group(2) != null)
						port = Integer.parseInt(m.group(2));
					pathPrefix = m.group(3);
				}
			}
			if (port > 0)
				baseURL = "http://" + host + ":" + port +
					pathPrefix + "/authority/";
			else
				baseURL = "http://" + host + pathPrefix + "/authority/";
		}
	}
}

All we do in the initService function is prepare a SampleLSIDDataLookup object, which we will use to verify the existence of LSIDs we are asked to resolve.

The first crucial method is getMetaDataLocations. It will be called by the authority servlet when SOAP requests for the getAvailableServices service method are handled. After verifying the existence of the given LSID, we return an array containing a single location: the endpoint of our metadata service. The following line needs some elaboration:


new SOAPLocation(
  hd.baseURL + "metadata"
)

The SOAPLocation class is a concrete implementation of the LSIDMetadataPort interface, specialized for SOAP endpoints. Its constructor argument is the fully qualified URL of the metadata service being exposed, which we build from the baseURL field computed by the private helper class HostDescriptor.

The getDataLocations method is very similar in appearance. Instead of specifying locations for metadata, it specifies locations where the data associated with an LSID can be obtained. Both SOAPLocation and HTTPLocation are concrete implementations of the LSIDDataPort interface. SOAPLocation takes only a fully qualified URL as its argument, while HTTPLocation expects a host name, a port number, and a path to the data.
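To check what HostDescriptor produces for a given request URL, the standalone restatement below applies the same regular expression and returns the base URL that the metadata and data endpoints are appended to. The class name is hypothetical. The logic follows the listing, written so that a null url and a missing port are handled safely: Matcher.group returns null for an optional group that did not take part in the match, so the port group is checked before it is parsed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical standalone restatement of HostDescriptor's URL parsing. */
final class HostDescriptorDemo {
    private static final Pattern HOST_PTN = Pattern.compile(
        "https?://([^/:]+)(?::(\\d+))?(.*)/authority(.*)");

    /** Returns the base URL to which "metadata" or "data" is appended. */
    static String baseUrl(String url) {
        String host = "localhost";
        int port = -1;
        String pathPrefix = "";
        if (url != null && url.length() > 0) {
            Matcher m = HOST_PTN.matcher(url);
            if (m.lookingAt()) {
                host = m.group(1);
                if (m.group(2) != null) { // the port group is optional
                    port = Integer.parseInt(m.group(2));
                }
                pathPrefix = m.group(3);
            }
        }
        String hostPort = port > 0 ? host + ":" + port : host;
        return "http://" + hostPort + pathPrefix + "/authority/";
    }

    public static void main(String[] args) {
        String base = baseUrl("http://localhost:8080/swiss-prot/authority/");
        System.out.println(base + "metadata"); // http://localhost:8080/swiss-prot/authority/metadata
    }
}
```

Note that a URL without an /authority path segment falls back to http://localhost/authority/, which is one reason the listing calls itself a quick-and-dirty implementation.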

The next piece of our LSID resolution server is the data service. We implement the LSIDDataService interface and pass it as a parameter to the DataServlet servlet class provided by the LSID package.

Listing 7. The data service

package com.ibm.lsid.samples;

import java.io.InputStream;

import com.ibm.lsid.LSID;

import com.ibm.lsid.server.LSIDDataService;
import com.ibm.lsid.server.LSIDServerException;
import com.ibm.lsid.server.LSIDServiceConfig;
import com.ibm.lsid.server.LSIDRequestContext;

public class SampleLSIDAuthorityData implements LSIDDataService {

  private SampleLSIDDataLookup lookup = null;

  public InputStream getData(LSID lsid) throws LSIDServerException {
    if (lookup == null)
      throw new LSIDServerException(500, "Cannot query database");
    return lookup.lsidData(lsid);
  }

  public InputStream getDataByRange(LSIDRequestContext ctx, int start, int length)
    throws LSIDServerException {
    throw new LSIDServerException
            (LSIDServerException.METHOD_NOT_IMPLEMENTED,
            "getDataByRange not implemented");
    }

  public void initService(LSIDServiceConfig cf) throws LSIDServerException {
    lookup = new SampleLSIDDataLookup();
  }
}

The data service implementation is trivial, since the heavy lifting is done in the supporting class SampleLSIDDataLookup. This is a good place to point out that FASTA-formatted data is generated on the fly from the Swiss-Prot records by the small utility class SwissToFastaConverter, available in the samples package. The method getDataByRange was introduced in the latest version of the LSID specification. Most implementations will probably choose not to implement this method, leaving chunking to the underlying transport protocol.
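The samples' SwissToFastaConverter is not shown here. As a rough idea of what such a conversion involves, the hypothetical sketch below builds a FASTA header from the ID, the first (primary) accession number, and the DE description lines, then collects the residues that follow the SQ line up to the // terminator. The header layout is modeled on the expected output shown later in this article; the line width is a parameter because we do not know which width the sample uses.

```java
/** Hypothetical Swiss-Prot to FASTA conversion; not the samples' SwissToFastaConverter. */
final class FastaSketch {
    /** Converts one Swiss-Prot record to FASTA, wrapping the sequence at width residues. */
    static String toFasta(String sprotRecord, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        String id = null;
        String primaryAc = null;
        StringBuilder description = new StringBuilder();
        StringBuilder sequence = new StringBuilder();
        boolean inSequence = false;
        for (String line : sprotRecord.split("\n")) {
            if (line.startsWith("//")) {
                break; // end of record
            } else if (inSequence) {
                // Residue lines are indented and grouped in blocks of ten; drop the spaces.
                sequence.append(line.replaceAll("\\s+", ""));
            } else if (line.startsWith("ID   ")) {
                id = line.substring(5).trim().split("\\s+")[0];
            } else if (line.startsWith("AC   ") && primaryAc == null) {
                primaryAc = line.substring(5).split(";")[0].trim(); // first AC is primary
            } else if (line.startsWith("DE   ")) {
                if (description.length() > 0) {
                    description.append(' ');
                }
                description.append(line.substring(5).trim());
            } else if (line.startsWith("SQ   ")) {
                inSequence = true; // residues start on the following line
            }
        }
        StringBuilder fasta = new StringBuilder(">").append(id);
        if (primaryAc != null) {
            fasta.append(" (").append(primaryAc).append(')');
        }
        if (description.length() > 0) {
            fasta.append(' ').append(description);
        }
        fasta.append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            fasta.append(sequence, i, Math.min(i + width, sequence.length())).append('\n');
        }
        return fasta.toString();
    }

    public static void main(String[] args) {
        // A made-up, abbreviated record, just to exercise the conversion.
        String record = "ID   TEST_HUMAN     STANDARD;\n"
            + "AC   P12345; Q99999;\n"
            + "DE   Test protein.\n"
            + "SQ   SEQUENCE   12 AA;\n"
            + "     MKTAYIAKQR QL\n"
            + "//\n";
        System.out.print(toFasta(record, 5));
    }
}
```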

To complete the package, we must provide a metadata service: a class implementing LSIDMetadataService, with its getMetadata and initService methods. Nothing unusual happens at initialization time, and getMetadata simply generates a correct RDF description of each recognized LSID. The second argument to getMetadata is an array of metadata formats that the client understands. In this example, we ignore these formats; however, we do return the proper metadata format, application/xml+rdf, in the MetadataResponse.

Listing 8. Getting metadata

package com.ibm.lsid.samples;

import java.io.InputStream;
import java.io.ByteArrayInputStream;

import com.ibm.lsid.LSID;
import com.ibm.lsid.MetadataResponse;
import com.ibm.lsid.MalformedLSIDException;

import com.ibm.lsid.server.LSIDMetadataService;
import com.ibm.lsid.server.LSIDServerException;
import com.ibm.lsid.server.LSIDServiceConfig;
import com.ibm.lsid.server.LSIDRequestContext;

public class SampleLSIDAuthorityMetadata implements LSIDMetadataService {

	private SampleLSIDDataLookup lookup = null;

	public void initService(LSIDServiceConfig cf) throws LSIDServerException {
		lookup = new SampleLSIDDataLookup();
	}

	private static final String RDF_NS=
		"http://www.w3.org/1999/02/22-rdf-syntax-ns#";
	private static final String DC_NS=
		"http://purl.org/dc/elements/1.1/";
	private static final String I3CP_NS=
		"urn:lsid:i3c.org:predicates:";
	private static final String I3C_CONTENT=
		"urn:lsid:i3c.org:types:content";
	private static final String I3C_SPROT=
		"urn:lsid:i3c.org:formats:sprot";
	private static final String I3C_FASTA=
		"urn:lsid:i3c.org:formats:fasta";

	private void appendTripleResource(
		StringBuffer src,
		String subj, String pred, String obj
	) {
		src.append("<rdf:Description rdf:about=\"");
		src.append(subj);
		src.append("\"><");
		src.append(pred);
		src.append(" rdf:resource=\"");
		src.append(obj);
		src.append("\"/></rdf:Description>");
	}

	public MetadataResponse getMetadata(LSIDRequestContext ctx, String[] formats) 
	   throws LSIDServerException {
           // should check formats[] for RDF format, but will assume client can accept RDF
		LSID lsid = ctx.getLsid();
		int lsType;
		try {
			lsType = lookup.lsidType(lsid);
		}
		catch (LSIDServerException ex) {
			ex.printStackTrace();
			lsType = SampleLSIDDataLookup.UNKNOWN;
		}
		if (lsType == SampleLSIDDataLookup.UNKNOWN)
			throw new LSIDServerException(201, "Unknown LSID");

		StringBuffer result= new StringBuffer();
		result.append("<?xml version=\"1.0\"?><rdf:RDF");
		result.append(" xmlns:rdf=\"");
		result.append(RDF_NS);
		result.append("\" xmlns:dc=\"");
		result.append(DC_NS);
		result.append("\" xmlns:i3cp=\"");
		result.append(I3CP_NS);
		result.append("\">");

		String baseLSID= lsid.toString();
		if (baseLSID.endsWith("-fasta") || baseLSID.endsWith("-sprot"))
			baseLSID = baseLSID.substring(0, baseLSID.length() - 6);

		if (lsType == SampleLSIDDataLookup.ABSTRACT) {
			appendTripleResource(
				result,
				baseLSID, "i3cp:storedas", baseLSID + "-fasta"
			);
			appendTripleResource(
				result,
				baseLSID, "i3cp:storedas", baseLSID + "-sprot"
			);
			appendTripleResource(
				result,
				baseLSID + "-fasta", "rdf:type", I3C_CONTENT
			);
			appendTripleResource(
				result,
				baseLSID + "-sprot", "rdf:type", I3C_CONTENT
			);
			appendTripleResource(
				result,
				baseLSID + "-fasta", "dc:format", I3C_FASTA
			);
			appendTripleResource(
				result,
				baseLSID + "-sprot", "dc:format", I3C_SPROT
			);
		}
		else {
			String format= I3C_SPROT;
			if (lsid.getObject().endsWith("-fasta")) {
				format= I3C_FASTA;
			}
			appendTripleResource(
				result,
				baseLSID, "i3cp:storedas", lsid.toString()
			);
			appendTripleResource(
				result,
				lsid.toString(), "rdf:type", I3C_CONTENT
			);
			appendTripleResource(
				result,
				lsid.toString(), "dc:format", format
			);
		}
		result.append("</rdf:RDF>");

		return new MetadataResponse(
			new ByteArrayInputStream(
				result.toString().getBytes()
			),
			null,
			MetadataResponse.RDF_FORMAT
		);
	}
}

Running and testing the authority
To test the Swiss-Prot authority, copy the file swiss-prot.war to your Jakarta Tomcat webapps directory. Upon Tomcat startup, this file will expand to webapps/swiss-prot, and the authority service will be available at http://localhost:8080/swiss-prot/authority/. You can test the authority the same way you tested the Hello World LSID authority, by running TestClient:


java TestClient urn:lsid:ibm.com:swiss-id:hv20_mouse \
http://localhost:8080/swiss-prot/authority/

The expected output is:


Meta is available at:
(soap) http://localhost:8080/swiss-prot/authority/metadata
-- META DATA --
<?xml version="1.0"?><rdf:RDF xmlns:rdf="http://www.w3.org/199....
----

If you decide to test with an LSID that has associated data, you can try the command:


java TestClient urn:lsid:ibm.com:swiss-id:hv20_mouse-fasta \
http://localhost:8080/swiss-prot/authority/

The expected output in this case is:


Data is available at:
(soap) http://localhost:8080/swiss-prot/authority/data
(http) http://localhost:8080/swiss-prot/authority/data?urn:lsid:ibm.com:swiss-id:hv20_mouse-fasta
Meta is available at:
(soap) http://localhost:8080/swiss-prot/authority/metadata
-- DATA --
>HV20_MOUSE (P01789) Ig heavy chain V region M603.
EVKLVESGGGLVQPGGSLRLSCATSGFTFSDFYMEWVRQPPGKRLEWIAASRNKGNKYTTEYSASVKGRFIVSRDTSQ
SILYLQMNALRAEDTAIYYCARNYYGSTWYFDVWGAGTTVTVSS

----
-- METADATA --
<?xml version="1.0"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/......
----

Metadata
Metadata about an LSID can be described using RDF (see the RDF Primer, listed in Resources). The minimum useful information about an LSID is whether it has data associated with it and what format that data is in, or, for an LSID with no data, where to go to get a specific rendition of the concept in a particular format.

About RDF
RDF documents are made up of simple statements, each of which consists of three parts: the subject, the predicate, and the object (value). "Spot chases rabbits" is the English rendering of an RDF triple in which "Spot" is the subject, "chases" is the predicate, and "rabbits" is the object. RDF is simply a formal way of encoding such information. Metadata about a particular LSID consists of a collection of RDF statements. Predicates (also known as properties) are themselves resources, so they can also appear as the subjects of other statements; the RDF Schema Specification (see Resources) specifies how to describe the relationships between predicates using RDF statements. The subject and predicate are always named by a URI, and since an LSID is a URN, which is a kind of URI, LSIDs can be used as either subjects or predicates. The objects in RDF are either URIs (in which case you can use LSIDs) or so-called "literal" values that may or may not be typed.

An example
Suppose that the LSID urn:lsid:pets.org:cats:Tom names a cat. As such, this LSID represents the abstract concept of Tom the cat, not any particular and unchanging collection of bits describing him. However, there are things related to Tom, such as a picture of him as a kitten, that have concrete digital representations. Say that urn:lsid:pets.org:cats:Tom-photos:Nov-22-1998 represents a particular photo of Tom. We can "attach" this photo to urn:lsid:pets.org:cats:Tom using the RDF property urn:lsid:i3c.org:predicates:storedAs, like this:


<rdf:Description rdf:about="urn:lsid:pets.org:cats:tom">
    <i3c:storedas rdf:resource="urn:lsid:pets.org:cats:tom-photos:nov-22-1998"/>
</rdf:Description>

Note that URIs in RDF are treated as case sensitive, while LSIDs are case insensitive. To avoid any potential for error, you should always represent LSIDs in RDF using their canonical form: lower case. The peculiar-looking XML tag i3c:storedas is the name of the property. Assuming that the namespace prefix i3c stands for urn:lsid:i3c.org:predicates:, the fully-qualified property name is urn:lsid:i3c.org:predicates:storedas (the concatenation of the prefix and the tag name).
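Both rules in the paragraph above (always write the lower-case canonical form, and expand a prefixed tag by concatenation) are easy to apply in code. The following is a minimal sketch with hypothetical helper names `canonical` and `expand`. It uses `Locale.ROOT` so that lower-casing does not depend on the default locale of the machine.

```java
import java.util.Locale;
import java.util.Map;

public class LsidNames {
    // The conventional namespace prefixes used in this article's RDF examples.
    static final Map<String, String> PREFIXES = Map.of(
        "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "dc",  "http://purl.org/dc/elements/1.1/",
        "i3c", "urn:lsid:i3c.org:predicates:");

    // Return the canonical (lower-case) form of an LSID before it is written into RDF.
    static String canonical(String lsid) {
        String lower = lsid.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("urn:lsid:")) {
            throw new IllegalArgumentException("Not an LSID: " + lsid);
        }
        return lower;
    }

    // Expand a prefixed tag such as "i3c:storedas" by concatenating
    // the namespace URI and the local part of the tag.
    static String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No prefix: " + qname);
        }
        String ns = PREFIXES.get(qname.substring(0, colon));
        if (ns == null) {
            throw new IllegalArgumentException("Unknown prefix: " + qname);
        }
        return ns + qname.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(canonical("urn:lsid:pets.org:cats:Tom"));
        System.out.println(expand("i3c:storedas"));
    }
}
```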

Since Tom's photo has data associated with it, we must describe that fact in our metadata. The class urn:lsid:i3c.org:types:content encompasses all things that have data associated with them, so Tom's photo belongs to that class. We describe this fact with an RDF statement:


<rdf:Description rdf:about="urn:lsid:pets.org:cats:tom-photos:nov-22-1998">
    <rdf:type rdf:resource="urn:lsid:i3c.org:types:content"/>
</rdf:Description>

The rdf:type property is used to denote class membership. The namespace prefix rdf is conventionally used to represent the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, defined in the RDF specification.

Lastly, Tom's photo is stored in some format, such as JPEG. We can use the format element of the Dublin Core vocabulary (see Resources) to record that fact. The LSID urn:lsid:i3c.org:formats:jpg represents the concept of the JPEG data format. The RDF statement that expresses this is:


<rdf:Description rdf:about="urn:lsid:pets.org:cats:tom-photos:nov-22-1998">
    <dc:format rdf:resource="urn:lsid:i3c.org:formats:jpg"/>
</rdf:Description>

The dc:format property is used to describe the format of a resource. The namespace prefix dc is conventionally used to represent the namespace http://purl.org/dc/elements/1.1/.
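The three statements above are fragments. A metadata response combines them into one RDF/XML document that declares the `rdf`, `dc`, and `i3c` prefixes. The sketch below assembles such a document with plain string building, using the namespace URIs given in the text. The class and method names are hypothetical. A real authority would more likely use an RDF library and escape attribute values, which this sketch does not do.

```java
public class TomMetadata {
    static final String TOM = "urn:lsid:pets.org:cats:tom";
    static final String PHOTO = "urn:lsid:pets.org:cats:tom-photos:nov-22-1998";

    // Assemble the example's three statements into a single RDF/XML document.
    static String build() {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n");
        sb.append("         xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n");
        sb.append("         xmlns:i3c=\"urn:lsid:i3c.org:predicates:\">\n");
        // Tom is stored as the photo.
        sb.append(describe(TOM, "i3c:storedas", PHOTO));
        // The photo has data, so it belongs to the content class.
        sb.append(describe(PHOTO, "rdf:type", "urn:lsid:i3c.org:types:content"));
        // The photo's data is in JPEG format.
        sb.append(describe(PHOTO, "dc:format", "urn:lsid:i3c.org:formats:jpg"));
        sb.append("</rdf:RDF>\n");
        return sb.toString();
    }

    // One rdf:Description element whose single property points at a resource.
    static String describe(String subject, String property, String object) {
        return "  <rdf:Description rdf:about=\"" + subject + "\">\n"
             + "    <" + property + " rdf:resource=\"" + object + "\"/>\n"
             + "  </rdf:Description>\n";
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```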

It is common to display RDF documents as an interlinked graph. For our example, the schematic looks like this:

Figure 1. Graphical representation of relations
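Because the metadata forms a graph, a client can answer the minimum questions from the start of this section by following edges. It can find out which renditions Tom is stored as, whether each rendition has data, and what format that data is in. The following is a minimal in-memory sketch. The class `MetadataGraph` and its methods are hypothetical. Its adjacency-map layout is one simple choice among many.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetadataGraph {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String DC_FORMAT = "http://purl.org/dc/elements/1.1/format";
    static final String STORED_AS = "urn:lsid:i3c.org:predicates:storedas";
    static final String CONTENT = "urn:lsid:i3c.org:types:content";

    // Adjacency map: subject -> (predicate -> objects). Each edge is one RDF statement.
    final Map<String, Map<String, List<String>>> edges = new HashMap<>();

    void add(String s, String p, String o) {
        edges.computeIfAbsent(s, k -> new HashMap<>())
             .computeIfAbsent(p, k -> new ArrayList<>())
             .add(o);
    }

    // All objects reachable from subject s along predicate p (empty if none).
    List<String> objects(String s, String p) {
        return edges.getOrDefault(s, Map.of()).getOrDefault(p, List.of());
    }

    // An LSID has data if its metadata places it in the content class.
    boolean hasData(String lsid) {
        return objects(lsid, RDF_TYPE).contains(CONTENT);
    }

    public static void main(String[] args) {
        String tom = "urn:lsid:pets.org:cats:tom";
        String photo = "urn:lsid:pets.org:cats:tom-photos:nov-22-1998";
        MetadataGraph g = new MetadataGraph();
        g.add(tom, STORED_AS, photo);
        g.add(photo, RDF_TYPE, CONTENT);
        g.add(photo, DC_FORMAT, "urn:lsid:i3c.org:formats:jpg");

        // Tom himself has no data, so follow storedas edges to concrete renditions.
        for (String rendition : g.objects(tom, STORED_AS)) {
            if (g.hasData(rendition)) {
                System.out.println(rendition + " " + g.objects(rendition, DC_FORMAT));
            }
        }
    }
}
```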

A note about metadata's persistence
Unlike the data associated with an LSID, metadata can expire. This makes the metadata about an LSID the natural place to store transient information about the object in question. For example, a link to Tom the cat's home page could appear in the metadata about Tom and be changed at any time.

Adding value to your metadata
Whenever possible, you should use standard property names to describe a particular collection of resources. Using standard vocabularies greatly enhances the potential for interoperability between providers and consumers of metadata. However, pre-existing properties sometimes do not adequately describe a given relationship. In that case, you can define a new property that serves your purpose. If you also provide an RDF Schema or Web Ontology Language (OWL) description of the property, for example by declaring it a sub-property of a standard property, then clients that understand RDF Schema can still draw useful conclusions from your metadata, even if they have no specific knowledge of the predicates you use.

The location independence of LSIDs, together with the guarantee that the data they name never changes, makes them ideal candidates for database cross-references. If you know about a meaningful relation between two LSIDs, you should describe it in the metadata, regardless of who issued the LSIDs. Any client that can resolve LSIDs can follow such a cross-reference, whichever authority issued the target.
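The two ideas above work together. A custom cross-reference property that is declared an `rdfs:subPropertyOf` a standard property still means something to a client that has never heard of it. The sketch below applies the RDF Schema sub-property rule: from (p subPropertyOf q) and (s p o), infer (s q o). It follows only one level of declarations, while a full RDFS reasoner would also compute the transitive closure. The property `hasstructure` and both LSIDs are hypothetical. `rdfs:seeAlso` is the standard RDF Schema property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SubPropertyInference {
    record Triple(String s, String p, String o) {}

    static final String SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso";

    // superProperty maps a custom property to the property it is declared a sub-property of.
    // Returns the original statements plus one inferred statement per match.
    static List<Triple> infer(List<Triple> data, Map<String, String> superProperty) {
        List<Triple> result = new ArrayList<>(data);
        for (Triple t : data) {
            String q = superProperty.get(t.p());
            if (q != null) {
                result.add(new Triple(t.s(), q, t.o()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical cross-reference between LSIDs issued by two different authorities.
        String hasStructure = "urn:lsid:company.net:predicates:hasstructure";
        List<Triple> data = List.of(new Triple(
            "urn:lsid:company.net:proteins:p12345", hasStructure, "urn:lsid:pdb.org:pdb:1abc"));
        // Schema knowledge: hasstructure is a sub-property of rdfs:seeAlso.
        infer(data, Map.of(hasStructure, SEE_ALSO)).forEach(System.out::println);
    }
}
```

A client that knows only `rdfs:seeAlso` now sees that the two LSIDs are related, which is the kind of interoperability the paragraph above describes.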

Making the authority publicly available
To make an authority publicly available and conformant to the LSID Resolution Proposal, you need to provide a way for people to resolve LSIDs handled by your authority without knowing the exact location of your service beforehand. That is, clients should not need to be configured by hand with the address of your authority, or do anything similar.

The first step is to set up a DNS service (SRV) record for your authority. Taking the pdb.org authority as an example, anyone should be able to determine the host name and port number where its LSID service resides by entering the following command:


host -t srv _lsid._tcp.pdb.org

You are asking DNS for the lsid service record for pdb.org with TCP as the network protocol. The response should look like this:


_lsid._tcp.pdb.org SRV 1 0 8080 lsidauthority.pdb.org.

This tells us that the service for the pdb.org authority is running on the host lsidauthority.pdb.org and is waiting for connections on TCP port 8080. (The 1 and 0 before the port number are the record's priority and weight.) This information alone is not sufficient to determine the endpoint of the pdb.org authority service. That is why the LSID Resolution Proposal mandates that the service be available at the URL path /authority/. In the case of pdb.org, the fully qualified URL of the authority service is therefore http://lsidauthority.pdb.org:8080/authority/.
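The client-side steps just described can be sketched in Java:

1. Take the authority out of the LSID.
2. Query `_lsid._tcp.<authority>` for SRV records.
3. Pick a record.
4. Append the mandated `/authority/` path.

The live lookup uses the JDK's JNDI DNS provider (`com.sun.jndi.dns.DnsContextFactory`), which returns each SRV value as "priority weight port target". When several records are present, the sketch prefers the lowest priority value, as RFC 2782 specifies, but it ignores weights. The default LSID `urn:lsid:pdb.org:pdb:1abc` and all method names are hypothetical. The pdb.org record shown above may no longer exist, so expect the "no record" or "lookup failed" branch when you try it.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Locale;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class AuthorityLocator {
    // DNS name to query for an LSID of the form urn:lsid:authority:namespace:object[:revision].
    static String srvQueryName(String lsid) {
        String[] parts = lsid.toLowerCase(Locale.ROOT).split(":");
        if (parts.length < 5 || parts.length > 6 || !parts[0].equals("urn")
                || !parts[1].equals("lsid") || parts[2].isEmpty()) {
            throw new IllegalArgumentException("Malformed LSID: " + lsid);
        }
        return "_lsid._tcp." + parts[2];
    }

    // Turn SRV data ("priority weight port target", possibly preceded by other
    // tokens, as in the host output) into the authority endpoint URL.
    static String endpointFromSrv(String srvLine) {
        String[] t = srvLine.trim().split("\\s+");
        if (t.length < 4) {
            throw new IllegalArgumentException("Bad SRV data: " + srvLine);
        }
        int port = Integer.parseInt(t[t.length - 2]);
        String host = t[t.length - 1];
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);  // drop the root dot
        }
        return "http://" + host + ":" + port + "/authority/";
    }

    // Prefer the record with the lowest priority value; weights are ignored here.
    static String pickRecord(List<String> records) {
        String best = null;
        int bestPriority = Integer.MAX_VALUE;
        for (String r : records) {
            String[] t = r.trim().split("\\s+");
            int priority = Integer.parseInt(t[t.length - 4]);
            if (priority < bestPriority) {
                bestPriority = priority;
                best = r;
            }
        }
        return best;
    }

    // Live SRV lookup through JNDI (requires network access).
    static List<String> lookupSrv(String name) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        List<String> out = new ArrayList<>();
        try {
            Attribute srv = ctx.getAttributes(name, new String[] {"SRV"}).get("SRV");
            if (srv != null) {
                NamingEnumeration<?> values = srv.getAll();
                while (values.hasMore()) {
                    out.add(values.next().toString());
                }
            }
        } finally {
            ctx.close();
        }
        return out;
    }

    public static void main(String[] args) {
        String lsid = args.length > 0 ? args[0] : "urn:lsid:pdb.org:pdb:1abc";
        String name = srvQueryName(lsid);
        try {
            List<String> records = lookupSrv(name);
            if (records.isEmpty()) {
                System.out.println("No SRV record for " + name);
            } else {
                System.out.println(endpointFromSrv(pickRecord(records)));
            }
        } catch (NamingException e) {
            System.out.println("DNS lookup failed for " + name + ": " + e.getMessage());
        }
    }
}
```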

Setting up DNS
All that you -- or your system administrator -- must do is to add a service record for the machine that will run the authority. Suppose the machine is authority.company.net and that it will serve as the authority named company.net. Further suppose that the service will be on port 8080. The record that must be added should go into the master zone file for company.net's DNS server (perhaps a file named /var/named/company.net.zone on company.net):


_lsid._tcp      IN      SRV     1       0       8080    authority.company.net.

If the authority name is supposed to be authority.company.net rather than company.net, the record in company.net's zone file should look like this:


_lsid._tcp.authority    IN      SRV     1       0       8080    authority.company.net.
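The only difference between the two records is the owner name. In a zone file, an owner name without a trailing dot is relative to the zone origin. This means the record for the company.net authority is just `_lsid._tcp`, while the record for authority.company.net needs the extra `authority` label. The hypothetical helper below derives the line for either case. It reuses the priority 1 and weight 0 from the examples above.

```java
public class SrvZoneRecord {
    // Build the SRV line for the zone file of `zone`, with the owner name written
    // relative to the zone origin.
    static String record(String zone, String authority, String host, int port) {
        String owner;
        if (authority.equals(zone)) {
            owner = "_lsid._tcp";
        } else if (authority.endsWith("." + zone)) {
            String label = authority.substring(0, authority.length() - zone.length() - 1);
            owner = "_lsid._tcp." + label;
        } else {
            throw new IllegalArgumentException(authority + " is not inside zone " + zone);
        }
        // The trailing dot makes the target host name fully qualified.
        String target = host.endsWith(".") ? host : host + ".";
        return owner + "\tIN\tSRV\t1\t0\t" + port + "\t" + target;
    }

    public static void main(String[] args) {
        System.out.println(record("company.net", "company.net", "authority.company.net", 8080));
        System.out.println(record("company.net", "authority.company.net", "authority.company.net", 8080));
    }
}
```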

Conclusion
We hope that the step-by-step approach of this article -- and the extensive samples directory included as the zip file in Resources -- will get you up and running quickly. In the same spirit of saving you time and energy, we also include a copy of a memo you might e-mail to your DNS administrator to request that an SRV record be implemented:

To: <put DNS administrator name here>
cc: <put your department manager name here>
Subject: DNS record to allow resolution to my LSID Authority

Please could you add the following SRV record to the appropriate zone file:

_lsid._tcp IN SRV 1 0 <put port here> <put Authority host name here>.

If you are running BIND v4 or above, or an equivalent Domain Name Service,
then your system will support SRV records (RFC 2782).

The reason for this SRV record is to allow clients running the LSID protocol
(explained at http://www.i3c.org/wgr/ta/resources/lsid/docs/)
to locate an LSID Authority server that I am running on port
<put port number here> on <put Authority host name here>.

For more information about the LSID protocol, resolution model, and
Authority server, refer to http://www-124.ibm.com/developerworks/oss/lsid/.

Please let me know when the record becomes active.

Thank you.

Kind regards,

Resources

About the authors
Stefan Atev is an Extreme Blue 2001 alumnus who has returned to IBM over several summers. He has worked on the SashXB Web services client and is presently involved with the Life Sciences Identifier project, implementing Life Sciences data stores and applying semantic Web technologies to improve the usability of existing Life Sciences databases. You can reach Stefan at satev@us.ibm.com.


Ben Szekely is an Extreme Blue 2000 alumnus who has returned to IBM full time. Ben is the lead developer for LSID Java software and has been instrumental in developing the specification that is under review at the Object Management Group. You can reach Ben at bhszekel@us.ibm.com.


