uk.ac.ebi.ep.mm.app
Class UniprotSaxParser

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by uk.ac.ebi.ep.mm.app.UniprotSaxParser
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler, MmParser

public class UniprotSaxParser
extends org.xml.sax.helpers.DefaultHandler
implements MmParser

UniProt XML parser that considers only primary accessions, entry names (IDs), organisms, EC numbers and PDB codes, and indexes/stores them in a mega-map.
Only enzymes, i.e. entries with an EC number assigned, are considered.
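
The filtering described above can be illustrated with a minimal, self-contained sketch. It is not the implementation of this class: the element and attribute names (entry, accession, dbReference with type and id) are assumptions about the UniProt XML layout, and the class name EnzymeEntrySketch and its stored list are hypothetical stand-ins for the mega-map writer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: keeps only entries that carry at least one EC number. */
public class EnzymeEntrySketch extends DefaultHandler {
    private final StringBuilder currentChars = new StringBuilder();
    private final List<String> accessions = new ArrayList<>();
    private final List<String> ecs = new ArrayList<>();
    private final List<String> pdbCodes = new ArrayList<>();
    /** Stand-in for the mega-map: one summary line per stored enzyme entry. */
    final List<String> stored = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        currentChars.setLength(0); // start collecting text for the new element
        if ("entry".equals(qName)) {
            // A new entry begins: forget the data of the previous one.
            accessions.clear();
            ecs.clear();
            pdbCodes.clear();
        } else if ("dbReference".equals(qName)) {
            // Assumed layout: <dbReference type="EC|PDB" id="..."/>
            String type = attributes.getValue("type");
            if ("EC".equals(type)) {
                ecs.add(attributes.getValue("id"));
            } else if ("PDB".equals(type)) {
                pdbCodes.add(attributes.getValue("id"));
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text of one element in several chunks, so append.
        currentChars.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("accession".equals(qName)) {
            accessions.add(currentChars.toString().trim());
        } else if ("entry".equals(qName) && !ecs.isEmpty() && !accessions.isEmpty()) {
            // Only enzymes (entries with an EC number) are stored.
            stored.add(accessions.get(0) + " " + ecs + " " + pdbCodes);
        }
    }

    /** Parses the given XML text and returns the stored summaries. */
    public static List<String> parse(String xml) throws Exception {
        EnzymeEntrySketch handler = new EnzymeEntrySketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.stored;
    }
}
```

In this sketch the first accession of an entry is treated as the primary one; whether UniprotSaxParser does the same is not stated in this documentation.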

Author:
rafa

Field Summary
protected  List<String> accessions
           
protected  StringBuilder currentChars
          The text value of the current element being parsed.
protected  List<String> ecs
           
protected  List<String> entryNames
           
protected  boolean isAccession
           
protected  boolean isDbRef
           
protected  boolean isEntry
           
protected  boolean isEntryName
           
protected  boolean isOrgComName
           
protected  boolean isOrgSciName
           
protected  boolean isProperty
           
protected  boolean isProtRecName
           
protected  String orgComName
           
protected  String orgSciName
           
protected  List<String> pdbCodes
           
protected  String protRecName
           
 
Constructor Summary
UniprotSaxParser()
           
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 void endDocument()
           
 void endElement(String uri, String localName, String qName)
          Stores the data of interest in the index.
protected  String getCurrentXpath()
           
static void main(String... args)
          Parses a UniProt XML file and indexes/stores the UniProt accessions, IDs and organisms into a mega-map.
 void parse(String uniprotXml)
          Parses a UniProt XML file and indexes/stores the UniProt accessions, IDs and organisms into a Lucene index.
This method is not thread-safe.
 void setWriter(MegaMapper mmWriter)
          Sets a writer to make the mega-map persistent.
 void startDocument()
           
 void startElement(String uri, String localName, String qName, Attributes attributes)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

currentChars

protected StringBuilder currentChars
The text value of the current element being parsed.


isEntry

protected boolean isEntry

isAccession

protected boolean isAccession

isEntryName

protected boolean isEntryName

isOrgSciName

protected boolean isOrgSciName

isOrgComName

protected boolean isOrgComName

isDbRef

protected boolean isDbRef

isProperty

protected boolean isProperty

isProtRecName

protected boolean isProtRecName

accessions

protected List<String> accessions

entryNames

protected List<String> entryNames

orgSciName

protected String orgSciName

orgComName

protected String orgComName

ecs

protected List<String> ecs

pdbCodes

protected List<String> pdbCodes

protRecName

protected String protRecName

Constructor Detail

UniprotSaxParser

public UniprotSaxParser()

Method Detail

main

public static void main(String... args)
                 throws Exception
Parses a UniProt XML file and indexes/stores the UniProt accessions, IDs and organisms into a mega-map.

Parameters:
args - see CliOptionsParser.getCommandLine(String...)
Throws:
Exception - in case of error while parsing.

setWriter

public void setWriter(MegaMapper mmWriter)
Description copied from interface: MmParser
Sets a writer to make the mega-map persistent.

Specified by:
setWriter in interface MmParser

parse

public void parse(String uniprotXml)
           throws Exception
Parses a UniProt XML file and indexes/stores the UniProt accessions, IDs and organisms into a Lucene index.
This method is not thread-safe.

Specified by:
parse in interface MmParser
Parameters:
uniprotXml - the XML file to parse
Throws:
FileNotFoundException - if the UniProt XML file is not found or not readable.
SAXException - if no default XMLReader can be found or instantiated, or if an exception occurs during parsing.
IOException - if the Lucene index cannot be opened/created, or if the parser reports an I/O error.
Exception
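
The exceptions listed above match the usual way a file is wired to a SAX handler. The following is a rough sketch under stated assumptions, not this method's code: ParseWiringSketch, CountingHandler and countElements are hypothetical names, and the XMLReader is obtained through SAXParserFactory, which may differ from what UniprotSaxParser actually uses. A fresh handler is created per call because a handler holds per-document state, which is one common reason for a parse method not being thread-safe.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of wiring an XML file to a SAX content handler. */
public class ParseWiringSketch {

    /** Hypothetical handler that only counts start tags. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements++;
        }
    }

    /**
     * Parses the file at xmlPath and returns the number of elements seen.
     * FileInputStream throws FileNotFoundException (an IOException) if the
     * file is missing or unreadable; the reader throws SAXException on
     * malformed XML.
     */
    public static int countElements(String xmlPath)
            throws IOException, SAXException, ParserConfigurationException {
        CountingHandler handler = new CountingHandler(); // one handler per parse
        try (InputStream in = new FileInputStream(xmlPath)) {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(in));
        }
        return handler.elements;
    }
}
```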

startDocument

public void startDocument()
                   throws SAXException
Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

endDocument

public void endDocument()
                 throws SAXException
Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes attributes)
                  throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
Specified by:
characters in interface ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

endElement

public void endElement(String uri,
                       String localName,
                       String qName)
                throws SAXException
Stores the data of interest in the index.

Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

getCurrentXpath

protected String getCurrentXpath()
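
This method is undocumented here. One plausible way for a SAX handler to expose the path of the element currently being parsed is to keep a stack of open element names. The sketch below shows that general technique only; the class XpathTrackingSketch, its openElements stack and its seenPaths list are hypothetical, and the path format returned by the real method is not known from this page.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: tracks the path of the element being parsed. */
public class XpathTrackingSketch extends DefaultHandler {
    /** Open elements, innermost first. */
    private final Deque<String> openElements = new ArrayDeque<>();
    /** Paths recorded at each start tag, in document order. */
    final List<String> seenPaths = new ArrayList<>();

    /** Builds a path such as /uniprot/entry/accession from the stack. */
    protected String getCurrentXpath() {
        StringBuilder xpath = new StringBuilder();
        // Iterate from the outermost element to the innermost one.
        Iterator<String> it = openElements.descendingIterator();
        while (it.hasNext()) {
            xpath.append('/').append(it.next());
        }
        return xpath.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        openElements.push(qName);
        seenPaths.add(getCurrentXpath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    /** Parses the given XML text and returns the paths of all elements. */
    public static List<String> paths(String xml) throws Exception {
        XpathTrackingSketch handler = new XpathTrackingSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seenPaths;
    }
}
```

A handler built this way could set flags such as isAccession or isOrgSciName by comparing the current path against fixed strings, although this page does not say how those flags are actually maintained.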


Copyright © 2012 EMBL-EBI. All Rights Reserved.