Package com.ribs.pdf

Class PDFPageParser


  • public class PDFPageParser
    extends java.lang.Object
    This class parses the page marking operators for a specific page number (it gets the contents for that page from an RMPDFParser). It uses the various factory objects for graphic object creation and a MarkupHandler to do the actual drawing.

    Currently unsupported:
    - Hyperlinks (annotations), which are ignored
    - Type 1 & Type 3 fonts
    - Transparency blend modes other than /Normal
    ...

    • Constructor Summary

      Constructors 
      Constructor Description
      PDFPageParser​(PDFFile aPdfFile, int aPageIndex)
      Creates a new page parser for a given PDF file and a page index.
    • Method Summary

      Modifier and Type Method Description
      void drawImage​(java.awt.Image im)
      Establishes an image transform and tells the markup engine to draw the image.
      void establishClip​(java.awt.geom.GeneralPath newclip, boolean intersect)
      Called when the clipping path changes.
      void executeForm​(PDFForm f)  
      void executePatternStream​(PDFTilingPattern pat)  
      static int getPDFHexString​(byte[] pageBytes, int start, int end, com.ribs.pdf.Range r)
      Replaces ASCII hex in pageBytes with the actual bytes.
      static byte[] getPDFHexString​(java.lang.String s)  
      int getTokens​(byte[] pageBytes, int offset, int end, java.util.List tokens)
      The lexer.
      void parse()
      Main entry point. Runs the lexer on the PDF content and passes the list of tokens to the parser.
      void parse​(java.util.List tokenList, byte[] pageBytes)
      The meat and potatoes of the pdf parser.
      int parseInlineImage​(int tIndex, byte[] pageBytes)
      Converts the tokens & data inside a BI/EI block into an image and draws it.
      boolean parseTextOperator​(byte oper, int tindex, int numops, PDFGState gs, byte[] pageBytes)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • PDFPageParser

        public PDFPageParser​(PDFFile aPdfFile,
                             int aPageIndex)
        Creates a new page parser for a given PDF file and a page index.
    • Method Detail

      • getTokens

        public int getTokens​(byte[] pageBytes,
                             int offset,
                             int end,
                             java.util.List tokens)
        The lexer. Fills the tokens list from the page contents. Returns the index of the character after the last successfully consumed character.
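
        The contract described above (fill a caller-supplied list, return the index just past the last consumed character) can be illustrated with a much-reduced tokenizer. This is only a sketch, not the library's lexer: the class name LexerSketch, the helper methods, and the choice of String tokens are assumptions made for illustration, and literal strings, hex strings and inline image data are deliberately not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical, simplified tokenizer; not the com.ribs.pdf implementation.
class LexerSketch {
    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhite(int c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // PDF delimiter characters.
    static boolean isDelim(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    /**
     * Appends tokens found in bytes[offset, end) to the list and returns the
     * index of the character after the last one consumed.
     */
    public static int getTokens(byte[] bytes, int offset, int end, List<String> tokens) {
        int i = offset;
        while (i < end) {
            int c = bytes[i] & 0xFF;
            if (isWhite(c)) {
                i++;
                continue;
            }
            if (c == '%') {
                // A comment runs to the end of the line and produces no token.
                while (i < end && bytes[i] != '\n' && bytes[i] != '\r') i++;
                continue;
            }
            int start = i++;
            // Names ("/F1") and regular tokens extend up to the next white
            // space or delimiter; any other delimiter is a one-character token.
            if (c == '/' || !isDelim(c)) {
                while (i < end && !isWhite(bytes[i] & 0xFF) && !isDelim(bytes[i] & 0xFF)) i++;
            }
            tokens.add(new String(bytes, start, i - start, StandardCharsets.ISO_8859_1));
        }
        return i;
    }
}
```

        Because this simplified version never rejects input, it always returns end; a fuller lexer could stop early on malformed data and return the position it reached.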
      • getPDFHexString

        public static byte[] getPDFHexString​(java.lang.String s)
      • getPDFHexString

        public static int getPDFHexString​(byte[] pageBytes,
                                          int start,
                                          int end,
                                          com.ribs.pdf.Range r)
        Replaces ASCII hex in pageBytes with the actual bytes. start points to the first character after the '<'; end is the upper limit to seek through. r is filled with the actual range of the converted bytes. The return value is the index of the last character swallowed. See comment for getPDFString... about destructive behavior.
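
        As an illustration of the in-place ("destructive") conversion described above, the following sketch decodes the hex digits after a '<' back into the same buffer. It is not the library's code: the class name HexStringSketch and the int[] return value (used here in place of com.ribs.pdf.Range, whose API is not shown in this document) are assumptions. The handling of white space and of an odd number of digits follows the general PDF rules for hexadecimal strings.

```java
// Hypothetical sketch of in-place ASCII hex decoding; not the com.ribs.pdf code.
class HexStringSketch {
    /**
     * Decodes hex digits in buf[start, end), stopping at '>'. Decoded bytes
     * overwrite buf beginning at start, so the original text is destroyed.
     * Returns { index of the closing '>' (or end if none), decoded length }.
     */
    public static int[] decodeInPlace(byte[] buf, int start, int end) {
        int out = start;    // next write position; never passes the read position
        int pending = -1;   // high nibble waiting for its partner, or -1
        int i = start;
        for (; i < end; i++) {
            int c = buf[i] & 0xFF;
            if (c == '>') break;
            int v = Character.digit(c, 16);
            if (v < 0) continue;              // skip white space and other non-hex bytes
            if (pending < 0) {
                pending = v;
            } else {
                buf[out++] = (byte) ((pending << 4) | v);
                pending = -1;
            }
        }
        if (pending >= 0) {
            // Odd number of digits: the missing final digit is taken to be 0.
            buf[out++] = (byte) (pending << 4);
        }
        return new int[] { i, out - start };
    }
}
```

        Every output byte consumes two input characters, so the write position can never overtake the read position; that is what makes overwriting the source buffer safe.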
      • parse

        public void parse()
        Main entry point. Runs the lexer on the PDF content and passes the list of tokens to the parser. By separating out a routine that operates on the list of tokens, we can implement forms and patterns by recursively calling the parse routine with a subset of the token list.
      • parse

        public void parse​(java.util.List tokenList,
                          byte[] pageBytes)
        The meat and potatoes of the pdf parser. Translates the token list into a series of calls to either a Factory class, which creates a Java2D object (like GeneralPath, Font, Image, GlyphVector, etc.), or the markup handler, which does the actual drawing.
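
        The two parse methods describe a token-driven interpreter that recurses for forms and patterns. A minimal sketch of that shape follows. Everything in it is assumed for illustration: the class name ParseSketch, the use of String tokens, the forms map standing in for form lookup, and the drawn list standing in for the markup handler. Only the m, l, S and Do operators are handled.

```java
import java.awt.Shape;
import java.awt.geom.GeneralPath;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of operator dispatch with recursion; not the com.ribs.pdf parser.
class ParseSketch {
    static double num(List<String> operands, int i) {
        return Double.parseDouble(operands.get(i));
    }

    /**
     * Walks the token list. Operands accumulate until an operator arrives;
     * the operator then either builds a Java2D object or "draws" it by
     * appending to the drawn list (the stand-in for a markup handler).
     */
    public static void parse(List<String> tokens, Map<String, List<String>> forms, List<Shape> drawn) {
        List<String> operands = new ArrayList<>();
        GeneralPath path = new GeneralPath();
        for (String t : tokens) {
            switch (t) {
                case "m":   // begin a subpath
                    path.moveTo(num(operands, 0), num(operands, 1));
                    break;
                case "l":   // append a line segment
                    path.lineTo(num(operands, 0), num(operands, 1));
                    break;
                case "S":   // stroke: hand the finished path to the handler
                    drawn.add(path);
                    path = new GeneralPath();
                    break;
                case "Do": { // form: recurse on the form's own token list
                    List<String> form = forms.get(operands.get(0));
                    if (form != null) parse(form, forms, drawn);
                    break;
                }
                default:    // anything else is treated as an operand
                    operands.add(t);
                    continue;
            }
            operands.clear();
        }
    }
}
```

        The sketch has no guard against a form that refers to itself, and it keeps no graphics state; both would be needed in anything beyond a demonstration.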
      • executeForm

        public void executeForm​(PDFForm f)
      • executePatternStream

        public void executePatternStream​(PDFTilingPattern pat)
      • parseTextOperator

        public boolean parseTextOperator​(byte oper,
                                         int tindex,
                                         int numops,
                                         PDFGState gs,
                                         byte[] pageBytes)
      • parseInlineImage

        public int parseInlineImage​(int tIndex,
                                    byte[] pageBytes)
        Converts the tokens & data inside a BI/EI block into an image and draws it. Returns the index of the last token consumed.
      • drawImage

        public void drawImage​(java.awt.Image im)
        Establishes an image transform and tells the markup engine to draw the image.
      • establishClip

        public void establishClip​(java.awt.geom.GeneralPath newclip,
                                  boolean intersect)
        Called when the clipping path changes. The clip in the gstate is defined to be in page space. Whenever the clip is changed, we calculate the new clip, which can be intersected with the old clip, and save it in the gstate.

        NB. This routine modifies the path that's passed in to it.
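
        One way the clip bookkeeping described above could look is sketched here with java.awt.geom.Area. This is an assumption-laden illustration, not the library's implementation: the GState class is a hypothetical stand-in for PDFGState, and the guess that the path is modified by transforming it into page space is mine, since the documentation only says that the passed-in path is modified.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Area;
import java.awt.geom.GeneralPath;

// Hypothetical sketch of clip tracking; not the com.ribs.pdf implementation.
class ClipSketch {
    /** Minimal stand-in for a graphics state. */
    static class GState {
        AffineTransform ctm = new AffineTransform(); // user space to page space
        Shape clip;                                  // page space; null means unclipped
    }

    /**
     * Moves newClip into page space (mutating the caller's path), optionally
     * intersects it with the current clip, and stores the result in gs.
     */
    public static void establishClip(GState gs, GeneralPath newClip, boolean intersect) {
        newClip.transform(gs.ctm);            // the caller's path is changed here
        if (intersect && gs.clip != null) {
            Area combined = new Area(gs.clip);
            combined.intersect(new Area(newClip));
            gs.clip = combined;
        } else {
            gs.clip = newClip;
        }
    }
}
```

        A caller that needs the original path afterwards would have to pass a copy, for example new GeneralPath(original).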