16.0.5
📅 2021-07-22
Improvements
High Quality OCR for Latin alphabet
Support for High Quality OCR is extended to most languages that use the Latin alphabet.
This change is transparent to you and improves OCR accuracy, especially on challenging documents such as newspapers, magazines, poor-quality scans or photos.
This applies to the following languages:
Mandatory files
The list of files to be redistributed with your application has been updated accordingly for the languages mentioned above. Go to Files required for your application for more details.
Improved PDF image loading
The quality of the image generated by the iDRS when rasterizing a PDF page is improved, resulting in a better visual appearance and improved OCR.
However, this improvement requires extra processing to detect whether the original content is black and white.
For this reason, a new enum, CImageLoadOptionsPdf.eBlackAndWhiteDetectionMode, is introduced.
Possible values are:
- BLACK_AND_WHITE_DETECTION_DISABLED (C++) or eBlackAndWhiteDetectionMode.Disabled (.NET)
  Black and white content is not detected; the page is always loaded as greyscale. This is the fastest mode.
- BLACK_AND_WHITE_DETECTION_FAST (C++) or eBlackAndWhiteDetectionMode.Fast (.NET)
  The PDF page is inspected and loaded as black and white only if it contains such images. In all other cases, the page is loaded as greyscale.
- BLACK_AND_WHITE_DETECTION_ACCURATE (C++) or eBlackAndWhiteDetectionMode.Accurate (.NET)
  The PDF page raster is analyzed in detail to detect black and white content. As this may involve two rasterizations (with and without smoothing), this mode is the slowest but the most accurate at detecting black and white pages. This is the default mode.
The black and white detection mode can be accessed or modified via the method CImageLoadOptionsPdf::Get/SetBlackAndWhiteDetectionMode() (C++) or the property CImageLoadOptionsPdf.BlackAndWhiteDetectionMode (.NET).
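As a rough illustration of how an integrator might trade detection accuracy for loading speed, here is a minimal C++ sketch. It does not use the real iDRS headers: the `sketch` namespace, the enum type name `BlackAndWhiteDetectionMode`, the stand-in class body, and the helper `PreferLoadingSpeed` are all assumptions made for this example. Only the enum value names, the Get/Set method names, and the Accurate default come from the notes above; the actual SDK signatures may differ.

```cpp
#include <cassert>

// Hypothetical stand-ins that mirror the names used in these release notes.
// They are NOT the real iDRS declarations; namespaces, the enum type name,
// and exact signatures in the SDK may differ.
namespace sketch {

enum BlackAndWhiteDetectionMode {
    BLACK_AND_WHITE_DETECTION_DISABLED,  // always greyscale, fastest
    BLACK_AND_WHITE_DETECTION_FAST,      // black and white only if the page contains such images
    BLACK_AND_WHITE_DETECTION_ACCURATE   // detailed raster analysis, slowest
};

class CImageLoadOptionsPdf {
public:
    BlackAndWhiteDetectionMode GetBlackAndWhiteDetectionMode() const { return mode_; }
    void SetBlackAndWhiteDetectionMode(BlackAndWhiteDetectionMode mode) { mode_ = mode; }

private:
    // The release notes state that Accurate is the default mode.
    BlackAndWhiteDetectionMode mode_ = BLACK_AND_WHITE_DETECTION_ACCURATE;
};

// Hypothetical helper: pick one of the two cheaper modes when loading speed
// matters more than precise black and white detection.
inline void PreferLoadingSpeed(CImageLoadOptionsPdf& options, bool needBlackAndWhite) {
    options.SetBlackAndWhiteDetectionMode(
        needBlackAndWhite ? BLACK_AND_WHITE_DETECTION_FAST
                          : BLACK_AND_WHITE_DETECTION_DISABLED);
}

}  // namespace sketch
```

With these stand-ins, a default-constructed options object reports the Accurate mode, and calling `PreferLoadingSpeed(options, true)` switches it to Fast. In a real integration the same pattern would presumably apply to the SDK's own options object before loading the PDF page, but that wiring is not described in these notes.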
Deprecated/removed features
Output formats WordML and XPS
The output formats WordML and XPS are removed from the iDRS API, so such documents can no longer be created.
The XML Format of Microsoft Office Word 2003, or WordML, has been replaced since 2007 by the new Office Open XML formats (DOCX, XLSX, PPTX).
Microsoft XML Paper Specification, or XPS, is also deprecated because it has low business value compared to its immediate competitor, PDF.
Fixed bugs
| ID | Description |
|---|---|
| IDRSRD-5666 | The iDRS PDF loading may erase some parts of the text on the rasterized image |
| IDRSRD-5747 | The iDRS takes a long time to analyze a specific image |
| IDRSRD-5925 | The iDRS can create invalid PDF files when integrators specify custom fonts with PostScript names containing spaces |
| IDRSRD-5927 | The iDRS can recognize diacritics without base characters, leading to PDF creation failure |
| IDRSRD-5955 | The High Quality OCR engine does not find all characters on a specific image |
| IDRSRD-5958 | The iDRS fails to create an output PDF when the OCR engine recognizes Arial Unicode symbols |
| IDRSRD-5970 | The iDRS should allow creating an image with dimensions larger than the OCR limitations |
| IDRSRD-5971 | The page analysis takes too much time processing a specific image |
| IDRSRD-5977 | The iDRS is not able to load a specific PDF |
| IDRSRD-5980 | The iDRS license installer does not check for the correct Visual Studio redistributable |
| IDRSRD-5981 | The DOCX created with Editable display does not indicate the expected document language when no text is selected |
| IDRSRD-5983 | Implementations of IFontProviderCallback provided by integrators via the .NET API are not called by the iDRS |
| IDRSRD-5984 | The iDRS does not set the BaseLine property in CPageTextLine when loading content from a PDF file |
| IDRSRD-5985 | The iDRS may leak memory when the idrsbarcodeext engine encounters a timeout |
| IDRSRD-5986 | The iDRS cannot load a specific PNG image |
| IDRSRD-5987 | The iDRS does not include information about the PDF extension in the output PDF files |
| IDRSRD-5989 | The iDRS is generating a non-compliant PDF/A-1b document |
| IDRSRD-5991 | When the iDRS updates an existing PDF with several signatures, all signatures have the same title, which is incorrect |
| IDRSRD-5992 | The iDRS does not properly load the text layer of a specific PDF document |
| IDRSRD-5993 | The iDRS can request font data with incorrect bold and italic properties when generating a PDF document |
| IDRSRD-6004 | The PDF loading with page content throws an exception when a PDF object has coordinates outside the bounds of the page |
| IDRSRD-6007 | The PDF loading with page content throws an exception when a text element is outside the bounds of the page |
| IDRSRD-6009 | The iDRS is setting the DropCapFont property for a paragraph when loading page content |
| IDRSRD-6017 | The iDRS cannot use the CPageResultsParser on a CPage without a source image |