How to Extract Images from Word Document using Java

This quick tutorial explains how to extract images from a Word document using Java and provides a sample code snippet that demonstrates a working Java image extractor for Word files. We will use a document extraction library, GroupDocs.Parser for Java, and complete the feature with a few simple API calls. Beyond this library, the guide does not require any additional third-party tool and can be followed on any operating system, such as Windows, macOS, or Linux.

Steps to Extract Images from Word Document using Java

  1. Set up GroupDocs.Parser for Java from the Maven repository in your Java project to extract images from the Word document
  2. Import the essential classes for extracting images from a Word file
  3. Create an instance of the Parser class to load the input Word document
  4. Invoke the getImages method of the Parser class to get a collection of image objects
  5. Finally, iterate through the collection of image objects to get the size, type, and contents of each image

The preceding steps help you quickly implement the functionality to extract all images from a Word document in Java. After setting up the required library from the Maven repository and importing the necessary classes, you load the input Word file with the Parser class. You then get a collection of image objects by calling the getImages method of the Parser class and iterate over that collection to display the image data.

Code to Extract Images from Word Document using Java

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;

public class ExtractImagesFromWordDocumentUsingJava {
    // Main function to extract images from Word documents in Java
    public static void main(String[] args) {
        // Create an instance of the Parser class; try-with-resources closes it automatically
        try (Parser parser = new Parser("sample.docx")) {
            // Extract images
            Iterable<PageImageArea> images = parser.getImages();
            // Check whether image extraction is supported for this document
            if (images == null) {
                System.out.println("Images extraction isn't supported");
                return;
            }
            // Iterate over images
            for (PageImageArea image : images) {
                // Print the page index, rectangle, and image type
                System.out.println(String.format("Page: %d, R: %s, Type: %s", image.getPage().getIndex(), image.getRectangle(), image.getFileType()));
            }
        }
    }
}
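
The sample above prints only metadata for each image. If you also want to store the image contents, the following is a minimal sketch of writing an image stream to disk using only the Java standard library. The class ImageStreamSaver and its methods fileNameFor and save are hypothetical helpers introduced here for illustration and are not part of GroupDocs.Parser. The sketch assumes that each image object can hand you its content as an InputStream (to our understanding, PageImageArea exposes this through a getImageStream method, but please verify that against the API reference for your library version). The demo uses stand-in bytes instead of a real image.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper (not part of GroupDocs.Parser): writes image streams to disk
class ImageStreamSaver {

    // Builds a file name such as "image-3.png" from a running index and an extension
    static String fileNameFor(int index, String extension) {
        String ext = extension.startsWith(".") ? extension.substring(1) : extension;
        return "image-" + index + "." + ext.toLowerCase(Locale.ROOT);
    }

    // Copies the stream into targetDir and returns the path of the written file
    static Path save(InputStream content, Path targetDir, int index, String extension) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFor(index, extension));
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in a real run the stream would come from the extracted image object (assumption)
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'};
        Path dir = Files.createTempDirectory("extracted-images");
        try (InputStream in = new ByteArrayInputStream(fakeImage)) {
            Path written = save(in, dir, 0, ".png");
            System.out.println("Wrote " + Files.size(written) + " bytes to " + written.getFileName());
        }
    }
}
```

Inside the image loop of the sample, you would call the save helper once per image with a running counter, taking the extension from the image's file type. Using a counter in the file name keeps images from overwriting each other when a document contains several of the same type.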

We have developed an application that shows how to get images from a Word file using Java. The sample needs only a few lines of code and a couple of API calls. You can enhance this example as per your requirements and also use it to extract images from other document formats such as PDF, HTML, XLSX, PPTX, EPUB, and many more.
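
One possible enhancement is a sanity check on the extraction result for DOCX input. A .docx file is a ZIP package, and Word conventionally stores embedded images under the word/media/ folder inside it. The sketch below lists those entries with the standard java.util.zip classes. This is a different technique from the GroupDocs.Parser call used in this article, shown only as a cross-check. The class DocxMediaLister and its methods are hypothetical names chosen for illustration, and the demo package it builds is a stand-in ZIP file, not a valid Word document.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: lists media entries stored inside a .docx package
class DocxMediaLister {

    // Returns the sorted names of all entries found under word/media/
    static List<String> listMedia(Path docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    names.add(entry.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Builds a tiny stand-in package for the demo (not a valid Word document)
    static Path writeDemoPackage() throws IOException {
        Path file = Files.createTempFile("demo", ".docx");
        String[] entryNames = {"word/document.xml", "word/media/image1.png", "word/media/image2.jpeg"};
        try (OutputStream out = Files.newOutputStream(file);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(new byte[] {0});
                zip.closeEntry();
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Use the path given on the command line, or fall back to the demo package
        Path docx = args.length > 0 ? Paths.get(args[0]) : writeDemoPackage();
        List<String> media = listMedia(docx);
        for (String name : media) {
            System.out.println(name);
        }
        System.out.println("Media entries found: " + media.size());
    }
}
```

The number of media entries will not always match the number of image objects a parser returns, for example when the same picture is placed on several pages or when an image is linked rather than embedded, so treat a mismatch as a prompt to investigate rather than as an error.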

We have discussed the detailed process of getting images from a Word document in Java and provided sample code for it. Recently, we also published an article on extracting images from PDF using Java; have a look at the how to Extract Images from PDF in Java guide for more information.
