KishoR NaiK: C#.net - Extract image from PDF file.

Tuesday, January 11, 2011

C#.net - Extract image from PDF file.

In this article I will show you how to extract images from a PDF file.

Step 1
First, download "iTextSharp.dll" from the following link:
http://sourceforge.net/projects/itextsharp/

Step 2
Create a Console Application and name the solution ConExtractImagefromPDF.

Step 3
Add two assembly references to the project from Solution Explorer:

1. iTextSharp.dll

2. System.Drawing.dll

Step 4

Write a static method that extracts the images from the PDF file. It looks like this:

        /// <summary>
        ///  Extracts the images embedded in a PDF file and returns them as Image objects
        /// </summary>
        /// <param name="PDFSourcePath">Full path of the source PDF file</param>
        /// <returns>List of extracted images (empty if none could be decoded)</returns>
        private static List<System.Drawing.Image> ExtractImages(String PDFSourcePath)
        {
            List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();

            iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
            iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
            iTextSharp.text.pdf.PdfObject PDFObj = null;
            iTextSharp.text.pdf.PdfStream PDFStremObj = null;

            try
            {
                RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
                PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);

                // Walk every entry of the cross-reference table and pick out the image streams
                for (int i = 0; i < PDFReaderObj.XrefSize; i++)
                {
                    PDFObj = PDFReaderObj.GetPdfObject(i);

                    if ((PDFObj != null) && PDFObj.IsStream())
                    {
                        PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                        iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                        if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                        {
                            try
                            {
                                iTextSharp.text.pdf.parser.PdfImageObject PdfImageObj =
                                    new iTextSharp.text.pdf.parser.PdfImageObject((iTextSharp.text.pdf.PRStream)PDFStremObj);

                                System.Drawing.Image ImgPDF = PdfImageObj.GetDrawingImage();

                                ImgList.Add(ImgPDF);
                            }
                            catch (Exception)
                            {
                                // Skip images that cannot be decoded into a System.Drawing.Image
                            }
                        }
                    }
                }
            }
            finally
            {
                // Close the reader even when extraction fails; errors propagate to the caller unchanged
                if (PDFReaderObj != null)
                {
                    PDFReaderObj.Close();
                }
            }
            return ImgList;
        }

Step 5
Write a static method that stores the extracted images in a folder. It looks like this:

        /// <summary>
        ///  Extracts the images from the PDF file and writes each one to the ImageStore folder as a JPEG file
        /// </summary>
        private static void WriteImageFile()
        {
            System.Console.WriteLine("Extracting images from the PDF file, please wait....");

            // Get the list of images
            List<System.Drawing.Image> ListImage = ExtractImages(@"C:\Users\Kishor\Desktop\TuterPDF\ASP.net\ASP.NET 3.5 Unleashed.pdf");

            for (int i = 0; i < ListImage.Count; i++)
            {
                try
                {
                    // Write the image file
                    ListImage[i].Save(AppDomain.CurrentDomain.BaseDirectory + "ImageStore\\Image" + i + ".jpeg", System.Drawing.Imaging.ImageFormat.Jpeg);
                    System.Console.WriteLine("Image" + i + ".jpeg written successfully");
                }
                catch (Exception ex)
                {
                    // Report the failure instead of silently ignoring it
                    System.Console.WriteLine("Image" + i + ".jpeg could not be written: " + ex.Message);
                }
            }
        }
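The method above saves into an ImageStore folder under the application's base directory, and Image.Save typically fails with a generic GDI+ error when the target folder does not exist. A minimal sketch of building the output path so that the folder is created first is shown here; the class name OutputPathSketch and the helper BuildImagePath are hypothetical names introduced for illustration and are not part of the article's code.

```csharp
using System;
using System.IO;

static class OutputPathSketch
{
    // Hypothetical helper: returns "<baseDirectory>\ImageStore\Image<index>.jpeg"
    // and creates the ImageStore folder when it is missing.
    public static string BuildImagePath(string baseDirectory, int index)
    {
        string folder = Path.Combine(baseDirectory, "ImageStore");

        // CreateDirectory does nothing when the folder already exists
        Directory.CreateDirectory(folder);

        return Path.Combine(folder, "Image" + index + ".jpeg");
    }

    static void Main()
    {
        // Same base directory the article's code uses
        string path = BuildImagePath(AppDomain.CurrentDomain.BaseDirectory, 0);
        Console.WriteLine(path);
    }
}
```

Under these assumptions, the Save call could receive BuildImagePath(AppDomain.CurrentDomain.BaseDirectory, i) in place of the concatenated string.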

Step 6
Call the above method from the Main method. It looks like this:

        static void Main(string[] args)
        {
            try
            {
                WriteImageFile(); // write image file
            }
            catch (Exception ex)
            {
                System.Console.WriteLine(ex.Message);  
            }
        }
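The PDF path is hard-coded inside WriteImageFile. One possible variation, sketched here under the assumption that the path is passed as the first command-line argument, validates the argument before any extraction starts. PdfPathSketch and ResolvePdfPath are hypothetical names, not part of the article's code.

```csharp
using System;
using System.IO;

static class PdfPathSketch
{
    // Hypothetical helper: returns the PDF path given as the first argument,
    // or null when the argument is missing, is not a .pdf, or the file does not exist.
    public static string ResolvePdfPath(string[] args)
    {
        if (args == null || args.Length == 0)
        {
            return null;
        }

        string path = args[0];
        bool isPdf = string.Equals(Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase);

        return (isPdf && File.Exists(path)) ? path : null;
    }

    static void Main(string[] args)
    {
        string pdfPath = ResolvePdfPath(args);

        // Print the resolved path, or a usage hint when no valid path was supplied
        Console.WriteLine(pdfPath ?? "Usage: ConExtractImagefromPDF <path-to-pdf>");
    }
}
```

With a helper like this, WriteImageFile would need a string parameter for the path instead of the literal it currently passes to ExtractImages.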

Full Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConExtractImagefromPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                WriteImageFile(); // write image file
            }
            catch (Exception ex)
            {
                System.Console.WriteLine(ex.Message);  
            }
        }

        #region Methods

        /// <summary>
        ///  Extracts the images embedded in a PDF file and returns them as Image objects
        /// </summary>
        /// <param name="PDFSourcePath">Full path of the source PDF file</param>
        /// <returns>List of extracted images (empty if none could be decoded)</returns>
        private static List<System.Drawing.Image> ExtractImages(String PDFSourcePath)
        {
            List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();

            iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
            iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
            iTextSharp.text.pdf.PdfObject PDFObj = null;
            iTextSharp.text.pdf.PdfStream PDFStremObj = null;

            try
            {
                RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
                PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);

                // Walk every entry of the cross-reference table and pick out the image streams
                for (int i = 0; i < PDFReaderObj.XrefSize; i++)
                {
                    PDFObj = PDFReaderObj.GetPdfObject(i);

                    if ((PDFObj != null) && PDFObj.IsStream())
                    {
                        PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                        iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                        if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                        {
                            try
                            {
                                iTextSharp.text.pdf.parser.PdfImageObject PdfImageObj =
                                    new iTextSharp.text.pdf.parser.PdfImageObject((iTextSharp.text.pdf.PRStream)PDFStremObj);

                                System.Drawing.Image ImgPDF = PdfImageObj.GetDrawingImage();

                                ImgList.Add(ImgPDF);
                            }
                            catch (Exception)
                            {
                                // Skip images that cannot be decoded into a System.Drawing.Image
                            }
                        }
                    }
                }
            }
            finally
            {
                // Close the reader even when extraction fails; errors propagate to the caller unchanged
                if (PDFReaderObj != null)
                {
                    PDFReaderObj.Close();
                }
            }
            return ImgList;
        }


        /// <summary>
        ///  Extracts the images from the PDF file and writes each one to the ImageStore folder as a JPEG file
        /// </summary>
        private static void WriteImageFile()
        {
            System.Console.WriteLine("Extracting images from the PDF file, please wait....");

            // Get the list of images
            List<System.Drawing.Image> ListImage = ExtractImages(@"C:\Users\Kishor\Desktop\TuterPDF\ASP.net\ASP.NET 3.5 Unleashed.pdf");

            for (int i = 0; i < ListImage.Count; i++)
            {
                try
                {
                    // Write the image file
                    ListImage[i].Save(AppDomain.CurrentDomain.BaseDirectory + "ImageStore\\Image" + i + ".jpeg", System.Drawing.Imaging.ImageFormat.Jpeg);
                    System.Console.WriteLine("Image" + i + ".jpeg written successfully");
                }
                catch (Exception ex)
                {
                    // Report the failure instead of silently ignoring it
                    System.Console.WriteLine("Image" + i + ".jpeg could not be written: " + ex.Message);
                }
            }
        }
        #endregion
    }
}

Download
Download Source Code

71 comments:

Anonymous, January 17, 2011 at 11:43 PM
Nice article. Thanks for sharing the code.
You helped me, man.

Kishor Naik, January 17, 2011 at 11:45 PM
Most welcome, brother.

peterbunny, February 17, 2011 at 2:02 AM
I was really looking for this; I tested it and it works beautifully.
But instead of extracting the embedded files, I want to convert each page of a PDF to an image. Can anyone help?

Kishor Naik, February 22, 2011 at 2:23 AM
Peterbunny, use the following link.
I hope this article will help you.

http://bytescout.com/products/developer/pdfextractorsdk/how-to-extract-images-from-pdf-page-by-page-in-c-%2523

Hitarth, May 29, 2011 at 6:51 PM
Kishor, brother, number one!

Anonymous, September 7, 2011 at 3:50 AM
Hi, great code, but I have some PDFs from which I can't extract images because the image objects are not JPEG or bitmap. Can you help me?

Kishor Naik, September 7, 2011 at 8:48 PM
Can you tell me the image type (extension)?

Anonymous, November 16, 2011 at 8:28 PM
Hi Kishor,

I am facing the same problem. I have a PDF containing an image with the filter type "CCITTFaxDecode", so I am not able to extract the image from the PDF. Could you please help me with this?

Thank you
Mugdha

Anonymous, November 30, 2011 at 7:45 AM
Same problem for me...

Anonymous, December 9, 2011 at 7:51 AM
Thank you for sharing the code. I can't extract the images; I'm using .tif.

Anonymous, January 27, 2012 at 11:23 PM
Fantastic!! I got the code up and running in ten minutes and it worked perfectly. Saved me hours of work. I did not use the class as static in a console app but simply pulled the two main functions to read and write and plopped them in my existing class and ran it. Worked first time. Thank you Kishor.

Anonymous, February 14, 2012 at 2:58 AM
Hi, I am getting the error "parameter is invalid".
Please help.

Anonymous, March 7, 2012 at 8:04 PM
Thanks for sharing :)

Anonymous, April 3, 2012 at 11:13 AM
Thanks, that is what I need. The .NET Framework has no support for CCITT images, however; still, having the stream with the useful data is the main first step.

Anonymous, April 4, 2012 at 10:16 PM
I am getting the error
"A generic error occurred in GDI+."
Please help me.

Anonymous, May 3, 2012 at 1:26 AM
Hi Kishor Naik,

Can you help me, please? I am developing a system which reads text from PDF images. I tried to contact you, but I cannot find any contact details. Please help me, I am stuck here. I have tried a lot of ways. My email address is ganeshrasi.lk@gmail.com.
Please reply to me.

Unknown, June 27, 2012 at 2:16 PM
Hi Kishore,

Do you have code for extracting text from a PDF into a text file using any package other than the iTextSharp DLL? iTextSharp reads the text left to right, which spoils the textual information. For example, I am not getting the address in one string; text from the right side gets embedded into the address on the left side. I want it to maintain the location information so that I can get the data from the tables. It would be great if you could help me. Thanks in advance.
If you have the code, please send it to my email ID simyg17@gmail.com

Thanks,
Simy

Anonymous, September 21, 2012 at 12:41 AM
The code is not working. It shows the error 'Parameter is not valid'. Please help.

Anonymous, October 2, 2012 at 9:05 AM
I too get a "Parameter is not valid" exception when trying to run this code.

The invalid parameter is 'MS' in the line

"Image ImgPDF = Image.FromStream(MS);"

Any advice on what the cause is would be greatly appreciated.

Anonymous, October 14, 2012 at 9:34 AM
Hello Kishore,
I am able to run the program, but the image count shows zero. Does that mean the images from the PDF are not being extracted, or am I missing something?
P.S. My PDF file comes from scanned pages.

Provas, November 18, 2012 at 8:06 PM
Thanks Kishor, it's really nice work. But all images are saved as JPEG. Is it possible to save them with their original extension, so that the quality of the images stays the same as the source images?

Thanks again.

jobinelv, December 3, 2012 at 11:14 PM
I have a requirement where I need to get the dimensions and top-left coordinate of an image. Is that possible with iText?

Anonymous, June 26, 2013 at 2:56 AM
Sir, do you know how to determine whether the extracted image is grayscale or true color? Need help, please. Thank you!

Anonymous, July 18, 2013 at 9:15 AM
Nice article; however, often you need to do the opposite: get a PDF converted to an image. It's not possible to convert a PDF to an image with iText, and I used Apitron PDF Rasterizer for .NET for this task.

Anonymous, August 7, 2013 at 3:31 AM
Great, thanks!
Alistair in England

Unknown, January 16, 2014 at 10:26 PM
I have one issue: how do I get an image of each PDF page, one by one?

Anonymous, February 9, 2014 at 7:59 PM
How can I extract image details from an image and store them in a database?

DomFilk, October 8, 2015 at 1:11 AM
I'm not a developer; I always use this free online service to extract images from PDFs online.

Anonymous, October 17, 2015 at 11:24 AM
Hi Kishor, can you please help me extract an image from specific coordinates of a PDF file using C# and store the image in a database? I need this as soon as possible. Please send me the code at this email ID: pioneer.shanky@gmail.com

Cell Techs, November 14, 2015 at 12:13 AM
Hello everyone.
Can anyone help me save the images with their original names?

Thank you

Anonymous, December 24, 2015 at 8:18 AM
Hi Kishor, can you please tell me how to find the coordinates of an extracted image?

Unknown, May 11, 2016 at 9:33 PM
Here is a link for extracting text from PDF in C#/.NET. Hope this gives you a start; it is on the RasterEdge page http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-text/

Mantu Swain, June 16, 2018 at 9:36 PM
Its VB.NET version: if you convert it, then please reply.

Unknown, July 13, 2018 at 10:44 PM
I am looking for PDF-to-image conversion in ASP.NET and C#
with good-quality output.
I tried one DLL, but the image comes out blurred.

Anonymous, August 23, 2018 at 6:15 AM
Hi Kishor,

Could you please re-upload the solution, as it is not available at the download link, or can you please send your code to my mail: sajuu.cs@gmail.com

Thanks!

Admin, October 11, 2018 at 2:44 PM
ZetPDF is also a useful link to generate PDF files in C#.

Jim Green, December 24, 2018 at 12:40 AM
I also have a nice SDK for extracting images from PDF for your reference. The code is easy to understand.

