Class OfficeReader
This class reads and collects global information about an OOo document. This includes styles, forms, information about indexes and references etc.
-
Constructor Summary
Constructor, Description
OfficeReader(OfficeDocument oooDoc, boolean bAllParagraphsAreSoft): Constructor; read a document
-
Method Summary
Modifier and Type, Method, Description
void addFigureSequenceName(String sName): Add a sequence name for figure captions.
void addTableSequenceName(String sName): Add a sequence name for table captions.
boolean bookmarkInHeading(String sName): Is this bookmark contained in a heading?
boolean bookmarkInList(String sName): Is this bookmark contained in a list?
fixRelativeLink(String sLink): In OpenDocument package format ../ means "leave the package".
int getBookmarkHeadingLevel(String sName): Get the level of the heading associated with this bookmark.
int getBookmarkListLevel(String sName): Get the list level associated with a bookmark in a list.
getBookmarkListStyle(String sName): Get the list style name associated with a bookmark in a list.
getCellStyle(String sName)
static int getCharacterCount(Node node): Counts the number of characters (text nodes) in this element excluding footnotes etc.
getColumnStyle(String sName)
getContent: Get the content element.
getDrawingPageStyle(String sName)
getEmbeddedObject(String sName): Get an embedded object in this office document.
getFirstImage: Get the very first image in this document, if any.
getFirstMasterPage: Returns the first master page used in the document.
getFontDeclaration(String sName): Get a specific font declaration.
getFontDeclarations: Get the collection of all font declarations.
getForms(): Get the forms belonging to this document.
getFrameStyle(String sName)
getHeadingStyle(int nLevel): Returns the paragraph style associated with headings of a specific level.
getListStyle(String sName)
getMajorityLanguage: Return the ISO language used in most paragraph styles (in a well-structured document this will be the default language). TODO: Base on content rather than style.
getMasterPage(String sName)
static char getNextChar(Node node): Return the next character in logical order.
getPageLayout(String sName)
static Element getParagraph(Element node): Get the paragraph or heading containing a node.
getParStyle(String sName)
getPresentationStyle(String sName)
getRowStyle(String sName)
getSectionStyle(String sName)
getSequenceFromRef(String sRefName): Get the sequence name associated with a reference name.
getSequenceName(Element par): Get the sequence name associated with a paragraph.
getTableReader(Element node): Read a table from a table:table node.
getTableStyle(String sName)
static String getTextContent(Node node)
getTextStyle(String sName)
getTocReader(Element onode): Returns a reader for a specific toc.
boolean hasBookmarkRefTo(String sName): Is there a reference to this bookmark?
boolean hasEndnoteRefTo(String sId): Is there a reference to this endnote?
boolean hasFootnoteRefTo(String sId): Is there a reference to this footnote id?
boolean hasLinkTo: Is there a link to this sequence anchor name?
boolean hasNoteRefTo(String sId): Is there a reference to this note id?
boolean hasReferenceRefTo(String sName): Is there a reference to this reference mark?
boolean hasSequenceRefTo(String sId): Is there a reference to this sequence field?
static boolean isDrawElement(Node node): Checks whether a node is an element in the draw namespace.
boolean isFigureSequenceName(String sName): Does this sequence name belong to a list of figures (lof)?
boolean isIndexSourceStyle(String sStyleName): Is this style used in some toc as an index source style?
boolean isInPackage(String sUrl): Checks whether this URL is internal to the package.
static boolean isNoteElement(Node node): Checks whether a node is an element representing a note (footnote/endnote).
static boolean isNoTextPar(Node node): Checks whether the only text content of this node is whitespace.
boolean isOpenDocument(): Is this an OASIS OpenDocument or an OOo 1.0 document?
boolean isPackageFormat(): Checks whether or not this document is in package format.
boolean isPresentation(): Is this a presentation document?
static boolean isSingleParagraph(Node node): Checks whether this node contains at most one element, and that this is a paragraph.
boolean isSpreadsheet(): Is this a spreadsheet document?
static boolean isTableElement(Node node): Checks whether a node is an element in the table namespace.
boolean isTableSequenceName(String sName): Does this sequence name belong to a list of tables (lot)?
boolean isText(): Is this a text document?
static boolean isTextElement(Node node): Checks whether a node is an element in the text namespace.
static boolean isWhitespace(String s): Checks whether this text is whitespace.
static boolean isWhitespaceContent(Node node): Checks whether the only text content of this node is whitespace.
boolean referenceMarkInHeading(String sName): Is this reference mark contained in a heading?
-
Constructor Details
-
OfficeReader
Constructor; read a document
-
-
Method Details
-
isTextElement
Checks whether a node is an element in the text namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a text element
-
isTableElement
Checks whether a node is an element in the table namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a table element
-
isDrawElement
Checks whether a node is an element in the draw namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a draw element
-
isNoteElement
Checks whether a node is an element representing a note (footnote/endnote).
- Parameters:
node - the node to check
- Returns:
- true if this is a note element
-
getParagraph
Get the paragraph or heading containing a node.
- Parameters:
node - the node in question
- Returns:
- the paragraph or heading
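
The lookup described for getParagraph can be illustrated with a plain DOM ancestor walk. This is only a sketch, not the library's implementation: the class name ParagraphLookupSketch and the methods findParagraph and newDocument are hypothetical, and it assumes that paragraphs and headings are the elements named text:p and text:h, that the start node itself counts, and that null is returned when no such ancestor exists.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not part of the documented OfficeReader API.
final class ParagraphLookupSketch {

    /** Walks from node towards the root and returns the first text:p or text:h, or null. */
    static Element findParagraph(Node node) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                String sName = n.getNodeName();
                // Assumption: elements are identified by their qualified names
                if ("text:p".equals(sName) || "text:h".equals(sName)) {
                    return (Element) n;
                }
            }
        }
        return null;
    }

    /** Helper for trying the sketch: creates an empty DOM document. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling findParagraph on a text node nested inside a text:span returns the enclosing text:p element under these assumptions.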
-
isSingleParagraph
Checks whether this node contains at most one element, and that this is a paragraph.
- Parameters:
node - the node to check
- Returns:
- true if the node contains a single paragraph or nothing
-
isNoTextPar
Checks whether the only text content of this node is whitespace. Other (draw) content is allowed.
- Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
- Returns:
- true if the node contains whitespace only
-
isWhitespaceContent
Checks whether the only text content of this node is whitespace.
- Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
- Returns:
- true if the node contains whitespace only
-
isWhitespace
Checks whether this text is whitespace.
- Parameters:
s - the String to check
- Returns:
- true if the String contains whitespace only
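
A whitespace-only test of this kind can be written in a few lines. The following is a sketch under assumptions, not the library's code: the class name WhitespaceSketch is hypothetical, "whitespace" is taken to mean Character.isWhitespace, and the empty string is assumed to count as whitespace only.

```java
// Hypothetical sketch; not part of the documented OfficeReader API.
final class WhitespaceSketch {

    /** Returns true if s is empty or every character in s is whitespace. */
    static boolean isWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            // Assumption: Character.isWhitespace defines what counts as whitespace
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```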
-
getCharacterCount
Counts the number of characters (text nodes) in this element excluding footnotes etc.
- Parameters:
node - the node to count in
- Returns:
- the number of characters
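
Counting characters while skipping notes amounts to a recursive walk over the DOM tree. The sketch below illustrates the idea only. The names CharacterCountSketch, countCharacters and newDocument are hypothetical, and it assumes that the only excluded content is the text:note element; the "etc." in the description suggests the real method may skip more.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical sketch; not part of the documented OfficeReader API.
final class CharacterCountSketch {

    /** Sums the lengths of all text nodes below node, ignoring text:note subtrees. */
    static int countCharacters(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().length();
        }
        // Assumption: notes are the only content excluded from the count
        if (node.getNodeType() == Node.ELEMENT_NODE && "text:note".equals(node.getNodeName())) {
            return 0;
        }
        int nCount = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            nCount += countCharacters(child);
        }
        return nCount;
    }

    /** Helper for trying the sketch: creates an empty DOM document. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```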
-
getTextContent
-
getNextChar
Return the next character in logical order
-
isPackageFormat
public boolean isPackageFormat()
Checks whether or not this document is in package format
- Returns:
- true if it's in package format
-
isInPackage
Checks whether this URL is internal to the package
- Parameters:
sUrl - the URL to check
- Returns:
- true if the URL is internal to the package
-
fixRelativeLink
In OpenDocument package format ../ means "leave the package". Consequently this prefix must be removed to obtain a valid link.
- Parameters:
sLink
- Returns:
- the corrected link
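
The prefix removal described for fixRelativeLink might look roughly like this. It is a sketch under assumptions rather than the library's code: the class name RelativeLinkSketch is hypothetical, only a single leading ../ is stripped, and links without that prefix are returned unchanged. Whether the real method also checks that the document is in package format is not stated here.

```java
// Hypothetical sketch; not part of the documented OfficeReader API.
final class RelativeLinkSketch {

    /** Removes one leading "../" (which means "leave the package") from a link. */
    static String fixRelativeLink(String sLink) {
        // Assumption: only the first "../" is significant
        return sLink.startsWith("../") ? sLink.substring(3) : sLink;
    }
}
```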
-
getEmbeddedObject
Get an embedded object in this office document
-
getFontDeclarations
Get the collection of all font declarations.
- Returns:
- the OfficeStyleFamily of font declarations
-
getFontDeclaration
Get a specific font declaration
- Parameters:
sName - the name of the font declaration
- Returns:
- a FontDeclaration representing the font
-
getTextStyles
-
getTextStyle
-
getParStyles
-
getParStyle
-
getDefaultParStyle
-
getSectionStyles
-
getSectionStyle
-
getTableStyles
-
getTableStyle
-
getColumnStyles
-
getColumnStyle
-
getRowStyles
-
getRowStyle
-
getCellStyles
-
getCellStyle
-
getDefaultCellStyle
-
getFrameStyles
-
getFrameStyle
-
getDefaultFrameStyle
-
getPresentationStyles
-
getPresentationStyle
-
getDefaultPresentationStyle
-
getDrawingPageStyles
-
getDrawingPageStyle
-
getDefaultDrawingPageStyle
-
getListStyles
-
getListStyle
-
getPageLayouts
-
getPageLayout
-
getMasterPages
-
getMasterPage
-
getOutlineStyle
-
getFootnotesConfiguration
-
getEndnotesConfiguration
-
getHeadingStyle
Returns the paragraph style associated with headings of a specific level. Returns null if no such style is known.
In principle, different styles can be used for each heading; in practice the same (soft) style is used for all headings of a specific level.
- Parameters:
nLevel - the level of the heading
- Returns:
- a StyleWithProperties object representing the style
-
getFirstMasterPage
Returns the first master page used in the document. If no master page is used explicitly, the first master page found in the styles is returned. Returns null if no master page exists.
- Returns:
- a MasterPage object representing the master page
-
getMajorityLanguage
Return the ISO language used in most paragraph styles (in a well-structured document this will be the default language).
TODO: Base on content rather than style
- Returns:
- the ISO language
-
getTocReader
Returns a reader for a specific toc
- Parameters:
onode - the text:table-of-content node
- Returns:
- the reader, or null
-
isIndexSourceStyle
Is this style used in some toc as an index source style?
- Parameters:
sStyleName - the name of the style
- Returns:
- true if this is an index source style
-
isFigureSequenceName
Does this sequence name belong to a list of figures (lof)?
- Parameters:
sName - the name of the sequence
- Returns:
- true if it belongs to an index
-
isTableSequenceName
Does this sequence name belong to a list of tables (lot)?
- Parameters:
sName - the name of the sequence
- Returns:
- true if it belongs to an index
-
addTableSequenceName
Add a sequence name for table captions.
OpenDocument has a very weak notion of table captions: a caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of tables. If there's no list of tables, captions cannot be identified. Thus this method lets the user add a sequence name to identify the table captions.
- Parameters:
sName - the name to add
-
addFigureSequenceName
Add a sequence name for figure captions.
OpenDocument has a very weak notion of figure captions: a caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of figures. If there's no list of figures, captions cannot be identified. Thus this method lets the user add a sequence name to identify the figure captions.
- Parameters:
sName - the name to add
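
To illustrate why a registered sequence name is needed, the sketch below models caption detection as described above: a paragraph counts as a figure caption if it contains a text:sequence element whose text:name attribute is a known figure sequence name. This is an assumption-laden illustration, not the library's implementation. The class CaptionSequenceSketch and its methods isFigureCaption and newDocument are hypothetical; only the names addFigureSequenceName and isFigureSequenceName mirror the documented methods.

```java
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not part of the documented OfficeReader API.
final class CaptionSequenceSketch {
    private final Set<String> figureSequenceNames = new HashSet<>();

    /** Registers a sequence name that identifies figure captions. */
    void addFigureSequenceName(String sName) {
        figureSequenceNames.add(sName);
    }

    /** Is this sequence name known to belong to figures? */
    boolean isFigureSequenceName(String sName) {
        return figureSequenceNames.contains(sName);
    }

    /** Assumed rule: a caption is a paragraph with a text:sequence of a known name. */
    boolean isFigureCaption(Element par) {
        NodeList sequences = par.getElementsByTagName("text:sequence");
        for (int i = 0; i < sequences.getLength(); i++) {
            Element sequence = (Element) sequences.item(i);
            if (isFigureSequenceName(sequence.getAttribute("text:name"))) {
                return true;
            }
        }
        return false;
    }

    /** Helper for trying the sketch: creates an empty DOM document. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without a list of figures to supply the name, the same paragraph is not recognized until the name has been added by hand, which is the situation these two methods address.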
-
getSequenceName
Get the sequence name associated with a paragraph
- Parameters:
par - the paragraph to look up
- Returns:
- the sequence name or null
-
getSequenceFromRef
Get the sequence name associated with a reference name
- Parameters:
sRefName - the reference name to use
- Returns:
- the sequence name or null
-
hasNoteRefTo
Is there a reference to this note id?
- Parameters:
sId - the id of the note
- Returns:
- true if there is a reference
-
hasFootnoteRefTo
Is there a reference to this footnote id?
- Parameters:
sId - the id of the footnote
- Returns:
- true if there is a reference
-
hasEndnoteRefTo
Is there a reference to this endnote?
- Parameters:
sId - the id of the endnote
- Returns:
- true if there is a reference
-
referenceMarkInHeading
Is this reference mark contained in a heading?
- Parameters:
sName - the name of the reference mark
- Returns:
- true if so
-
hasReferenceRefTo
Is there a reference to this reference mark?
- Parameters:
sName - the name of the reference mark
- Returns:
- true if there is a reference
-
bookmarkInHeading
Is this bookmark contained in a heading?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if so
-
getBookmarkHeadingLevel
Get the level of the heading associated with this bookmark
- Parameters:
sName - the name of the bookmark
- Returns:
- the level or 0 if the bookmark does not exist
-
bookmarkInList
Is this bookmark contained in a list?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if so
-
getBookmarkListStyle
Get the list style name associated with a bookmark in a list
- Parameters:
sName - the name of the bookmark
- Returns:
- the list style name or null if the bookmark does not exist or the list does not have a style name
-
getBookmarkListLevel
Get the list level associated with a bookmark in a list
- Parameters:
sName - the name of the bookmark
- Returns:
- the level or 0 if the bookmark does not exist
-
hasBookmarkRefTo
Is there a reference to this bookmark?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if there is a reference
-
hasSequenceRefTo
Is there a reference to this sequence field?
- Parameters:
sId - the id of the sequence field
- Returns:
- true if there is a reference
-
hasLinkTo
Is there a link to this sequence anchor name?
- Parameters:
sName - the name of the anchor
- Returns:
- true if there is a link
-
isOpenDocument
public boolean isOpenDocument()
Is this an OASIS OpenDocument or an OOo 1.0 document?
- Returns:
- true if it's an OASIS OpenDocument
-
isText
public boolean isText()
Is this a text document?
- Returns:
- true if it's a text document
-
isSpreadsheet
public boolean isSpreadsheet()
Is this a spreadsheet document?
- Returns:
- true if it's a spreadsheet document
-
isPresentation
public boolean isPresentation()
Is this a presentation document?
- Returns:
- true if it's a presentation document
-
getContent
Get the content element.
In the old file format this means the office:body element.
In the OpenDocument format this means an office:text, office:spreadsheet or office:presentation element.
- Returns:
- the content Element
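
The format-dependent choice of content element can be sketched as follows. This is an illustration under assumptions, not the library's code: the class ContentLookupSketch and its methods findContent and newDocument are hypothetical, the caller is assumed to already hold the office:body element, and in the OpenDocument case the content element is assumed to be a direct child of office:body.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not part of the documented OfficeReader API.
final class ContentLookupSketch {
    private static final List<String> CONTENT_NAMES =
            Arrays.asList("office:text", "office:spreadsheet", "office:presentation");

    /** Old format: the body itself. OpenDocument: the first matching child of the body, or null. */
    static Element findContent(Element body, boolean bOpenDocument) {
        if (!bOpenDocument) {
            return body;
        }
        // Assumption: the content element is a direct child of office:body
        for (Node child = body.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && CONTENT_NAMES.contains(child.getNodeName())) {
                return (Element) child;
            }
        }
        return null;
    }

    /** Helper for trying the sketch: creates an empty DOM document. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```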
-
getForms
Get the forms belonging to this document.
- Returns:
- a FormsReader representing the forms
-
getTableReader
Read a table from a table:table node
- Parameters:
node - the table:table Element node
- Returns:
- a TableReader object representing the table
-
getFirstImage
Get the very first image in this document, if any
- Returns:
- the first image, or null if no image exists
-