Package org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
java.lang.Object
org.apache.poi.extractor.POITextExtractor
org.apache.poi.ooxml.extractor.POIXMLTextExtractor
org.apache.poi.xwpf.extractor.XWPFWordExtractor
- All Implemented Interfaces:
Closeable, AutoCloseable
Helper class to extract text from an OOXML Word file
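As a rough illustration of typical use, the sketch below builds a one-paragraph document in memory, wraps it in an XWPFWordExtractor and reads the text back with getText(). The class name ExtractTextExample and the sample sentence are made up for this example, and it assumes Apache POI (poi-ooxml) is on the classpath.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.IOException;

// Hypothetical example class, not part of POI.
public class ExtractTextExample {

    // Creates a small document in memory and returns its extracted text.
    static String extract() throws IOException {
        XWPFDocument document = new XWPFDocument();
        XWPFParagraph paragraph = document.createParagraph();
        paragraph.createRun().setText("Hello from POI");

        // The extractor is Closeable, so try-with-resources releases
        // the underlying package when extraction is done.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(document)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract());
    }
}
```

How paragraphs are separated in the returned string is implementation specific, so callers should probably avoid depending on exact whitespace.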
-
Field Summary
Fields
SUPPORTED_TYPES
Constructor Summary
Constructors
XWPFWordExtractor(OPCPackage container)
XWPFWordExtractor(XWPFDocument document)
Method Summary
void appendBodyElementText(StringBuilder text, IBodyElement e)
void appendParagraphText(StringBuilder text, XWPFParagraph paragraph)
String getText(): Retrieves all the text from the document.
static void main(String[] args)
void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns): Should we concatenate phonetic runs in extraction.
void setFetchHyperlinks(boolean fetch): Should we also fetch the hyperlinks, when fetching the text content? Default is to only output the hyperlink label, and not the contents.
Methods inherited from class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
checkMaxTextSize, close, getCoreProperties, getCustomProperties, getDocument, getExtendedProperties, getMetadataTextExtractor, getPackage
Methods inherited from class org.apache.poi.extractor.POITextExtractor
setFilesystem
-
Field Details
-
SUPPORTED_TYPES
-
-
Constructor Details
-
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container) throws org.apache.xmlbeans.XmlException, OpenXML4JException, IOException
Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
IOException
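The following sketch shows one way the OPCPackage constructor might be used, including the three checked exceptions it declares. To stay self-contained it first writes a tiny .docx into a byte array; the class PackageExtractExample and its helper methods are hypothetical names chosen for this example.

```java
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.XmlException;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical example class, not part of POI.
public class PackageExtractExample {

    // Builds a tiny .docx in memory so the example needs no file on disk.
    static byte[] sampleDocx() throws IOException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            document.createParagraph().createRun()
                    .setText("Text read through an OPCPackage");
            document.write(out);
            return out.toByteArray();
        }
    }

    // Opens the bytes as an OPC package and extracts the text from it.
    static String extract(byte[] docx)
            throws XmlException, OpenXML4JException, IOException {
        OPCPackage container = OPCPackage.open(new ByteArrayInputStream(docx));
        // Closing the extractor also releases the package it was built from.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(container)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(sampleDocx()));
    }
}
```

For a file on disk, OPCPackage.open(String) or OPCPackage.open(File) could be used in place of the stream variant.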
-
XWPFWordExtractor
public XWPFWordExtractor(XWPFDocument document)
-
-
Method Details
-
main
public static void main(String[] args) throws Exception
Throws:
Exception
-
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
Should we also fetch the hyperlinks when fetching the text content? The default is to output only the hyperlink label, not the contents.
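A minimal sketch of the difference this flag makes, assuming a POI version that provides XWPFParagraph.createHyperlinkRun(String). The document is written to a byte array and read back so that the hyperlink relationship is loaded as it would be from a real file. The class HyperlinkExtractExample, the label and the URL are invented for illustration, and the exact formatting of the appended link is left unspecified here.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFHyperlinkRun;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical example class, not part of POI.
public class HyperlinkExtractExample {

    static final String URL = "https://example.org/";

    // Builds a .docx in memory containing a single hyperlink run.
    static byte[] sampleDocx() throws IOException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFParagraph paragraph = document.createParagraph();
            XWPFHyperlinkRun link = paragraph.createHyperlinkRun(URL);
            link.setText("Example site");
            document.write(out);
            return out.toByteArray();
        }
    }

    // Re-reads the document and extracts text with the flag on or off.
    static String extract(byte[] docx, boolean fetchHyperlinks) throws IOException {
        XWPFDocument document = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(document)) {
            extractor.setFetchHyperlinks(fetchHyperlinks);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = sampleDocx();
        System.out.println(extract(docx, false)); // label only
        System.out.println(extract(docx, true));  // label plus link target
    }
}
```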
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Should we concatenate phonetic runs in extraction? Default is true.
Parameters:
concatenatePhoneticRuns - whether phonetic runs are concatenated during extraction
-
getText
Description copied from class: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs etc. are separated in the text is implementation specific; see the javadocs for a specific project for details.
Specified by:
getText in class POITextExtractor
Returns:
All the text from the document
-
appendBodyElementText
-
appendParagraphText
public void appendParagraphText(StringBuilder text, XWPFParagraph paragraph)
-