Package org.apache.poi.hwpf.converter
Class WordToTextConverter
java.lang.Object
org.apache.poi.hwpf.converter.AbstractWordConverter
org.apache.poi.hwpf.converter.WordToTextConverter
-
Field Summary
Fields inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
UNICODECHAR_NO_BREAK_SPACE, UNICODECHAR_NONBREAKING_HYPHEN, UNICODECHAR_ZERO_WIDTH_SPACE
-
Constructor Summary
Constructors
Constructor and Description
WordToTextConverter()
    Creates a new instance of WordToTextConverter.
WordToTextConverter(TextDocumentFacade textDocumentFacade)
WordToTextConverter(Document document)
    Creates a new instance of WordToTextConverter.
-
Method Summary
Modifier and Type, Method, and Description
protected void afterProcess()
    Special actions that need to be called after processing is complete, like updating stylesheets or building the document notes list.
Document getDocument()
String getText()
static String getText(File docFile)
static String getText(HWPFDocumentCore wordDocument)
static String getText(DirectoryNode root)
boolean isOutputSummaryInformation()
static void main(String[] args)
    Java main() interface to interact with WordToTextConverter.
protected void outputCharacters(Element block, CharacterRun characterRun, String text)
protected void processBookmarks(HWPFDocumentCore wordDocument, Element currentBlock, Range range, int currentTableLevel, List<Bookmark> rangeBookmarks)
    Wrap range into bookmark(s) and process it.
protected void processDocumentInformation(SummaryInformation summaryInformation)
void processDocumentPart(HWPFDocumentCore wordDocument, Range range)
protected void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, String path, Element block)
protected void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range endnoteTextRange)
protected void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range footnoteTextRange)
protected void processHyperlink(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String hyperlink)
protected void processImage(Element currentBlock, boolean inlined, Picture picture)
protected void processImage(Element currentBlock, boolean inlined, Picture picture, String url)
protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture)
protected void processLineBreak(Element block, CharacterRun characterRun)
protected boolean processOle2(HWPFDocument wordDocument, Element block, Entry entry)
protected void processPageBreak(HWPFDocumentCore wordDocument, Element flow)
protected void processPageref(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String pageref)
protected void processParagraph(HWPFDocumentCore wordDocument, Element parentElement, int currentTableLevel, Paragraph paragraph, String bulletText)
protected void processSection(HWPFDocumentCore wordDocument, Section section, int s)
protected void processTable(HWPFDocumentCore wordDocument, Element flow, Table table)
void setOutputSummaryInformation(boolean outputDocumentInformation)
Methods inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
getCharacterRunTriplet, getFontReplacer, getNumberColumnsSpanned, getNumberRowsSpanned, getPicturesManager, processCharacters, processDeadField, processDocument, processDrawnObject, processDropDownList, processField, processNoteAnchor, processParagraphes, processSingleSection, processSymbol, setFontReplacer, setPicturesManager, tryDeadField
-
Constructor Details
-
WordToTextConverter
public WordToTextConverter() throws ParserConfigurationException
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
- Throws:
ParserConfigurationException - if an internal DocumentBuilder cannot be created
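The "several HWPFDocuments into a single text document" use can be sketched roughly as follows. This is a minimal sketch, not the library's own sample: the class MultiDocToTextSketch and its helpers convertAll and newConverter are hypothetical names, and it assumes the inputs are Word 97 or later files that HWPFDocument can open. It relies on processDocument, which is inherited from AbstractWordConverter.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToTextConverter;

// Hypothetical helper class, for illustration only.
class MultiDocToTextSketch {

    // Creates one converter and sets the summary-information flag on it.
    static WordToTextConverter newConverter(boolean withSummary) throws Exception {
        WordToTextConverter converter = new WordToTextConverter();
        converter.setOutputSummaryInformation(withSummary);
        return converter;
    }

    // Feeds every input file into the same converter, then reads the
    // accumulated plain text once at the end.
    static String convertAll(List<Path> inputs, boolean withSummary) throws Exception {
        WordToTextConverter converter = newConverter(withSummary);
        for (Path input : inputs) {
            try (InputStream in = Files.newInputStream(input);
                 HWPFDocument document = new HWPFDocument(in)) {
                converter.processDocument(document);
            }
        }
        return converter.getText();
    }
}
```

Reusing one converter instance is what lets the text of all inputs end up in the same output; the static getText methods handle one document per call.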
-
WordToTextConverter
public WordToTextConverter(Document document)
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
- Parameters:
document - XML DOM Document used as storage for text pieces
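To make the document parameter more concrete, the following sketch uses only JDK classes to create an empty org.w3c.dom.Document and to show, on a separate scratch document, how a DOM tree can hold text pieces that are later read back as plain text. It does not call POI. The class name DomStorageSketch and the element name "pieces" are hypothetical and are not the layout the converter uses. It is an assumption here that the converter populates the document itself, so the one handed to the constructor should start out empty.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class DomStorageSketch {

    // Creates an empty DOM document, the kind of object the
    // WordToTextConverter(Document) constructor accepts.
    static Document newEmptyDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Stores each piece as a text node under a hypothetical root element
    // of a scratch document and returns the concatenated text content.
    static String storeAndRead(String... pieces) throws ParserConfigurationException {
        Document scratch = newEmptyDocument();
        Element root = scratch.createElement("pieces");
        scratch.appendChild(root);
        for (String piece : pieces) {
            root.appendChild(scratch.createTextNode(piece));
        }
        return root.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Prints "Hello, world"
        System.out.println(storeAndRead("Hello, ", "world"));
    }
}
```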
-
WordToTextConverter
public WordToTextConverter(TextDocumentFacade textDocumentFacade)
-
-
Method Details
-
getText
- Throws:
Exception
-
getText
- Throws:
Exception
-
getText
- Throws:
Exception
-
main
Java main() interface to interact with WordToTextConverter.
Usage: WordToTextConverter infile outfile
Where infile is an input .doc file (Word 95-2007) which will be rendered as plain text into outfile.
- Throws:
Exception
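The same infile-to-outfile conversion can be done from code. The sketch below is one possible way, not the implementation of main(): DocToTextSketch and convert are hypothetical names, UTF-8 is an assumed output encoding, and the input is assumed to be a Word 97 or later file, since HWPFDocument is not expected to open older formats. It uses the static getText(HWPFDocumentCore wordDocument) method listed in the summary.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToTextConverter;

// Hypothetical helper class, for illustration only.
class DocToTextSketch {

    // Reads a .doc file, renders it as plain text, and writes the result
    // to outfile (UTF-8 is an assumption of this sketch).
    static void convert(Path infile, Path outfile) throws Exception {
        try (InputStream in = Files.newInputStream(infile);
             HWPFDocument document = new HWPFDocument(in)) {
            String text = WordToTextConverter.getText(document);
            Files.write(outfile, text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: DocToTextSketch infile outfile");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```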
-
afterProcess
protected void afterProcess()
Description copied from class: AbstractWordConverter
Special actions that need to be called after processing is complete, like updating stylesheets or building the document notes list. Usually they are called once, but it's okay to call them several times.
- Overrides:
afterProcess in class AbstractWordConverter
-
getDocument
- Specified by:
getDocument in class AbstractWordConverter
-
getText
- Throws:
Exception
-
isOutputSummaryInformation
public boolean isOutputSummaryInformation()
-
outputCharacters
- Specified by:
outputCharacters in class AbstractWordConverter
-
processBookmarks
protected void processBookmarks(HWPFDocumentCore wordDocument, Element currentBlock, Range range, int currentTableLevel, List<Bookmark> rangeBookmarks)
Description copied from class: AbstractWordConverter
Wrap range into bookmark(s) and process it. All bookmarks have starts equal to range start and ends equal to range end. Usually it's only one bookmark.
- Specified by:
processBookmarks in class AbstractWordConverter
-
processDocumentInformation
- Specified by:
processDocumentInformation in class AbstractWordConverter
-
processDocumentPart
- Overrides:
processDocumentPart in class AbstractWordConverter
-
processDrawnObject
protected void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, String path, Element block)
- Specified by:
processDrawnObject in class AbstractWordConverter
-
processEndnoteAutonumbered
protected void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range endnoteTextRange)
- Specified by:
processEndnoteAutonumbered in class AbstractWordConverter
-
processFootnoteAutonumbered
protected void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range footnoteTextRange)
- Specified by:
processFootnoteAutonumbered in class AbstractWordConverter
-
processHyperlink
protected void processHyperlink(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String hyperlink)
- Specified by:
processHyperlink in class AbstractWordConverter
-
processImage
- Overrides:
processImage in class AbstractWordConverter
-
processImage
- Specified by:
processImage in class AbstractWordConverter
-
processImageWithoutPicturesManager
protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture)
- Specified by:
processImageWithoutPicturesManager in class AbstractWordConverter
-
processLineBreak
- Specified by:
processLineBreak in class AbstractWordConverter
-
processOle2
protected boolean processOle2(HWPFDocument wordDocument, Element block, Entry entry) throws Exception
- Overrides:
processOle2 in class AbstractWordConverter
- Throws:
Exception
-
processPageBreak
- Specified by:
processPageBreak in class AbstractWordConverter
-
processPageref
protected void processPageref(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String pageref)
- Specified by:
processPageref in class AbstractWordConverter
-
processParagraph
protected void processParagraph(HWPFDocumentCore wordDocument, Element parentElement, int currentTableLevel, Paragraph paragraph, String bulletText)
- Specified by:
processParagraph in class AbstractWordConverter
-
processSection
- Specified by:
processSection in class AbstractWordConverter
-
processTable
- Specified by:
processTable in class AbstractWordConverter
-
setOutputSummaryInformation
public void setOutputSummaryInformation(boolean outputDocumentInformation)
-