The Vision API can detect and transcribe text from PDF, TIFF, and GIF files stored in Google Cloud Storage. Online text detection annotates up to 5 frames (GIF) or pages (PDF or TIFF) of your choosing for each file in a batch of files (application/pdf, image/tiff, and image/gif).

Document text detection from PDF, TIFF, and GIF files is requested using the annotate function, which performs an online request and returns a JSON response immediately.

Limitations

At most 5 pages will be annotated per file. You can specify which 5 pages to annotate.
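
The page limit can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Vision API: the helper name `check_pages` is hypothetical, and rejecting page 0 is an assumption based on pages being counted from 1 (with negative numbers counting from the end, as in the samples further down).

```python
from typing import List

MAX_PAGES_PER_FILE = 5  # limit stated in the Limitations section


def check_pages(pages: List[int]) -> List[int]:
    """Return a copy of `pages` if the selection fits the 5-page limit."""
    if len(pages) > MAX_PAGES_PER_FILE:
        raise ValueError(
            "At most {} pages can be annotated per file, got {}".format(
                MAX_PAGES_PER_FILE, len(pages)
            )
        )
    # Assumption: page numbers start at 1 and negative values count from
    # the end of the file (-1 is the last page), so 0 is treated as invalid.
    if any(page == 0 for page in pages):
        raise ValueError("Page numbers start at 1; 0 is not a valid page")
    return list(pages)


# First, second, and last page of a file.
selected = check_pages([1, 2, -1])
```

Validating locally only avoids a round trip; the service remains the authority on which page selections it accepts.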

Authentication

API keys are not supported for annotate requests. See Using a service account for instructions on authenticating with a service account.

Currently supported feature types

All feature types
`FACE_DETECTION`	Run face detection.
`LANDMARK_DETECTION`	Run landmark detection.
`LOGO_DETECTION`	Run logo detection.
`LABEL_DETECTION`	Run label detection.
`TEXT_DETECTION`	Run text detection / optical character recognition (OCR). Text detection is optimized for areas of text within a larger image; if the image is a document, use `DOCUMENT_TEXT_DETECTION` instead.
`DOCUMENT_TEXT_DETECTION`	Run dense text document OCR. Takes precedence when both `DOCUMENT_TEXT_DETECTION` and `TEXT_DETECTION` are present.
`SAFE_SEARCH_DETECTION`	Run Safe Search to detect potentially unsafe or undesirable content.
`IMAGE_PROPERTIES`	Compute a set of image properties, such as the image's dominant colors.
`CROP_HINTS`	Run crop hints.
`WEB_DETECTION`	Run web detection.
`OBJECT_LOCALIZATION`	Run localizer for object detection.
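
As an illustration of how these names are used, the sketch below builds the `features` array of a request body from the feature type names in the table. The helper name `build_features` is hypothetical; only the `type` field and the feature names come from this page.

```python
from typing import Dict, Iterable, List

# Feature type names copied from the table above.
FEATURE_TYPES = {
    "FACE_DETECTION",
    "LANDMARK_DETECTION",
    "LOGO_DETECTION",
    "LABEL_DETECTION",
    "TEXT_DETECTION",
    "DOCUMENT_TEXT_DETECTION",
    "SAFE_SEARCH_DETECTION",
    "IMAGE_PROPERTIES",
    "CROP_HINTS",
    "WEB_DETECTION",
    "OBJECT_LOCALIZATION",
}


def build_features(names: Iterable[str]) -> List[Dict[str, str]]:
    """Turn feature type names into the list used as the `features` field."""
    features = []
    for name in names:
        if name not in FEATURE_TYPES:
            raise ValueError("Unknown feature type: {}".format(name))
        features.append({"type": name})
    return features


features = build_features(["DOCUMENT_TEXT_DETECTION", "LABEL_DETECTION"])
```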

Sample code

You can either send an annotation request with a locally stored file, or use a file that is stored on Google Cloud Storage.

Using a locally stored file

Use the following code samples to get any feature annotation for a locally stored file.

Command-line

To perform online PDF/TIFF/GIF document text detection for a small batch of files, make a POST request and provide the appropriate request body:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
"https://vision.googleapis.com/v1/files:annotate" -d "{
  'requests': [
    {
      'inputConfig': {
        'content': 'JVBERi0xLjUNCiW1tbW1...base64-encoded-file...ydHhyZWYNCjk5NzM2OQ0KJSVFT0Y=',
        'mimeType': 'application/pdf'
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ],
      'pages': [
        2
      ]
    }
  ]
}"

Where:

inputConfig replaces the image field used in other Vision API requests. It contains two child fields:
- content - The file content (PDF, TIFF, or GIF), represented as a stream of bytes. In a JSON request body such as the one above, the bytes are base64-encoded.
- mimeType - One of the following: "application/pdf", "image/tiff" or "image/gif".
The pages field specifies which pages of the file to run text detection on.
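
The request body described above can also be assembled programmatically. The following sketch uses only the Python standard library and mirrors the field names of the curl example; the function name `build_inline_request` and the placeholder bytes are hypothetical, and the sketch builds the body without sending it.

```python
import base64
import json
from typing import Any, Dict, List


def build_inline_request(
    file_bytes: bytes,
    mime_type: str,
    pages: List[int],
    feature_type: str = "DOCUMENT_TEXT_DETECTION",
) -> Dict[str, Any]:
    """Build a files:annotate request body with the file content inlined."""
    # JSON cannot carry raw bytes, so the file content is base64-encoded.
    content = base64.b64encode(file_bytes).decode("ascii")
    return {
        "requests": [
            {
                "inputConfig": {"content": content, "mimeType": mime_type},
                "features": [{"type": feature_type}],
                "pages": pages,
            }
        ]
    }


# Placeholder bytes stand in for a real PDF read with open(path, "rb").
body = build_inline_request(b"%PDF-1.5 placeholder", "application/pdf", [2])
print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and passed to curl with `-d @request.json` in place of the inline body.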

Response

A successful annotate request immediately returns a JSON response. The returned JSON is similar to that of an image's document text detection request, with bounding boxes for blocks broken down by paragraphs, words, and individual symbols, as well as the full text detected. The response also contains a context field showing the location of the file that was specified and the result's page number in the file.
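
The response nests one list of per-page results inside each per-file result. The following sketch walks a parsed JSON response and collects the detected text with its page number; the function name `extract_page_texts` is hypothetical, and the sample dictionary is a hand-written stand-in shaped like the response on this page, not real API output.

```python
from typing import Any, Dict, List, Optional, Tuple


def extract_page_texts(result: Dict[str, Any]) -> List[Tuple[Optional[int], str]]:
    """Return (page number, full text) pairs from a files:annotate response."""
    page_texts = []
    # Outer "responses": one entry per file in the batch.
    for file_response in result.get("responses", []):
        # Inner "responses": one entry per annotated page of that file.
        for page_response in file_response.get("responses", []):
            text = page_response.get("fullTextAnnotation", {}).get("text", "")
            page_number = page_response.get("context", {}).get("pageNumber")
            page_texts.append((page_number, text))
    return page_texts


# Hand-written stand-in for a parsed response.
sample = {
    "responses": [
        {
            "responses": [
                {
                    "fullTextAnnotation": {"text": "OIL, GAS AND MINERAL LEASE\n"},
                    "context": {"pageNumber": 2},
                }
            ]
        }
    ]
}
for page_number, text in extract_page_texts(sample):
    print("Page {}: {!r}".format(page_number, text))
```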

Full response

{
  "responses": [
    {
      "responses": [
        {
          "fullTextAnnotation": {
            "pages": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en",
                      "confidence": 0.99
                    },
                    {
                      "languageCode": "pl",
                      "confidence": 0.01
                    }
                  ]
                },
                "width": 1342,
                "height": 2234,
                "blocks": [
                  {
                    "boundingBox": {
                      "vertices": [
                      ...
                      ]
                    },
                    "paragraphs": [
                      {
                        "boundingBox": {
                          "vertices": [
                          ...
                          ]
                        },
                        "words": [
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "boundingBox": {
                              "vertices": [
                              ...
                              ]
                            },
                            "symbols": [
                              {
                                "property": {
                                  "detectedLanguages": [
                                    {
                                      "languageCode": "en"
                                    }
                                  ],
                                  "detectedBreak": {
                                    "type": "SPACE"
                                  }
                                },
                                "boundingBox": {
                                  "vertices": [
                                ...
                                  ]
                                },
                                "text": "#",
                                "confidence": 0.07
                              }
                            ],
                            "confidence": 0.07
                          },
                          ...
                    ],
                    "blockType": "TEXT",
                    "confidence": 0.88
                  },
                  ...
            ...
            "text": "# THE STATE OF TEXAS\n0\nOIL, GAS AND MINERAL LEASE\n
            COUNTY OF MAVERICK\nTHIS AGREEMENT made this 14 day of_June\n1954,
            between Norvel J. Chittim and his wife, Lieschen G. Chittim;\nMary
            Anne Chittim Parker, joined herein pro forma by her husband,\nJoseph
            Bright Parker; Dorothea Chittim Oppenheimer, joined herein\nji pro
            forma by her husband, Fred J. Oppenheimer; Tuleta Chittim\nWright,
            joined herein pro forma by her husband, Gilbert G. Wright,\nJr.;
            Gilbert G. Wright, III; Delă Wright White, joined herein pro\nforma
            by her husband, John H. White; Anne Wright Basse, joined\nherein
            pro forma by her husband, E. A. Basse, Jr.; Norvel J.\nChittim,
            Independent Executor and Trustee for Estate of Marstella\nChittim,
            Deceased; Mary Louise Roswell, joined herein pro forma by\nher
            husband, Charles M. 'Roswell; and James M. Chittim and his wife\n
            Thelma Neal Chittim; as LESSORS, and W. L. Scheig of San Antonio,\n
            Texas, as LESSEE,\n10\nW ITNESS ETH:\nLessors, in consideration of
            $10.00, cash in hand paid, i\nof the royalties herein provided,
            and of the agreement of Lessee\nherein contained, hereby grant,
            lease and let exclusively unto\nLessee the tracts of land
            hereinafter described for the purpose of\ntesting for mineral
            indications, and in such tests use the Seismo-\ngraph, Torsion
            Balance, Core Drill, or any other tools, machinery,\nequipment
            or explosive necessary and proper; and also prospecting,\ndrilling
            and mining for and producing oil, gas and other minerals i\n
            (except metallic minerals), laying pipe lines, building tanks,\n
            power stations, telephone lines and other structures thereon to\n
            produce, save, take care of, treat, transport and own said pro-\n
            ducts and housing its employees (Lessee to conduct its geophysical\n
            work in such manner as not to damage the buildings, water tanks\n
            or wells of Lessors, or the livestock of Lessors or Lessors' ten-\n
            ants, ) said lands being situated in Maverick, Zavalla and Dimmit\n
            Counties, Texas, to-wit:\n3 -1.\n"
          },
          "context": {
            "pageNumber": 2
          }
        }
      ]
    }
  ]
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

/*
 * Please include the following imports to run this sample.
 *
 * import com.google.cloud.vision.v1.AnnotateFileRequest;
 * import com.google.cloud.vision.v1.AnnotateImageResponse;
 * import com.google.cloud.vision.v1.BatchAnnotateFilesRequest;
 * import com.google.cloud.vision.v1.BatchAnnotateFilesResponse;
 * import com.google.cloud.vision.v1.Block;
 * import com.google.cloud.vision.v1.Feature;
 * import com.google.cloud.vision.v1.ImageAnnotatorClient;
 * import com.google.cloud.vision.v1.InputConfig;
 * import com.google.cloud.vision.v1.Page;
 * import com.google.cloud.vision.v1.Paragraph;
 * import com.google.cloud.vision.v1.Symbol;
 * import com.google.cloud.vision.v1.Word;
 * import com.google.protobuf.ByteString;
 * import java.nio.file.Files;
 * import java.nio.file.Path;
 * import java.nio.file.Paths;
 * import java.util.Arrays;
 * import java.util.List;
 */

/**
 * Perform batch file annotation
 *
 * @param filePath Path to local pdf file, e.g. /path/document.pdf
 */
public static void sampleBatchAnnotateFiles(String filePath) {
  try (ImageAnnotatorClient imageAnnotatorClient = ImageAnnotatorClient.create()) {
    // filePath = "resources/kafka.pdf";

    // Supported mime_type: application/pdf, image/tiff, image/gif
    String mimeType = "application/pdf";
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);
    ByteString content = ByteString.copyFrom(data);
    InputConfig inputConfig =
        InputConfig.newBuilder().setMimeType(mimeType).setContent(content).build();
    Feature.Type type = Feature.Type.DOCUMENT_TEXT_DETECTION;
    Feature featuresElement = Feature.newBuilder().setType(type).build();
    List<Feature> features = Arrays.asList(featuresElement);

    // The service can process up to 5 pages per document file. Here we specify the first,
    // second, and last page of the document to be processed.
    int pagesElement = 1;
    int pagesElement2 = 2;
    int pagesElement3 = -1;
    List<Integer> pages = Arrays.asList(pagesElement, pagesElement2, pagesElement3);
    AnnotateFileRequest requestsElement =
        AnnotateFileRequest.newBuilder()
            .setInputConfig(inputConfig)
            .addAllFeatures(features)
            .addAllPages(pages)
            .build();
    List<AnnotateFileRequest> requests = Arrays.asList(requestsElement);
    BatchAnnotateFilesRequest request =
        BatchAnnotateFilesRequest.newBuilder().addAllRequests(requests).build();
    BatchAnnotateFilesResponse response = imageAnnotatorClient.batchAnnotateFiles(request);
    for (AnnotateImageResponse imageResponse :
        response.getResponsesList().get(0).getResponsesList()) {
      System.out.printf("Full text: %s\n", imageResponse.getFullTextAnnotation().getText());
      for (Page page : imageResponse.getFullTextAnnotation().getPagesList()) {
        for (Block block : page.getBlocksList()) {
          System.out.printf("\nBlock confidence: %s\n", block.getConfidence());
          for (Paragraph par : block.getParagraphsList()) {
            System.out.printf("\tParagraph confidence: %s\n", par.getConfidence());
            for (Word word : par.getWordsList()) {
              System.out.printf("\t\tWord confidence: %s\n", word.getConfidence());
              for (Symbol symbol : word.getSymbolsList()) {
                System.out.printf(
                    "\t\t\tSymbol: %s, (confidence: %s)\n",
                    symbol.getText(), symbol.getConfidence());
              }
            }
          }
        }
      }
    }
  } catch (Exception exception) {
    System.err.println("Failed to create the client due to: " + exception);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Node.js API reference documentation.

const vision = require('@google-cloud/vision').v1;

const fs = require('fs');
/**
 * Perform batch file annotation
 *
 * @param filePath {string} Path to local pdf file, e.g. /path/document.pdf
 */
function sampleBatchAnnotateFiles(filePath) {
  const client = new vision.ImageAnnotatorClient();
  // const filePath = 'resources/kafka.pdf';

  // Supported mime_type: application/pdf, image/tiff, image/gif
  const mimeType = 'application/pdf';
  const content = fs.readFileSync(filePath).toString('base64');
  const inputConfig = {
    mimeType: mimeType,
    content: content,
  };
  const type = 'DOCUMENT_TEXT_DETECTION';
  const featuresElement = {
    type: type,
  };
  const features = [featuresElement];

  // The service can process up to 5 pages per document file. Here we specify the first, second, and
  // last page of the document to be processed.
  const pagesElement = 1;
  const pagesElement2 = 2;
  const pagesElement3 = -1;
  const pages = [pagesElement, pagesElement2, pagesElement3];
  const requestsElement = {
    inputConfig: inputConfig,
    features: features,
    pages: pages,
  };
  const requests = [requestsElement];
  client.batchAnnotateFiles({requests: requests})
    .then(responses => {
      const response = responses[0];
      for (const imageResponse of response.responses[0].responses) {
        console.log(`Full text: ${imageResponse.fullTextAnnotation.text}`);
        for (const page of imageResponse.fullTextAnnotation.pages) {
          for (const block of page.blocks) {
            console.log(`\nBlock confidence: ${block.confidence}`);
            for (const par of block.paragraphs) {
              console.log(`\tParagraph confidence: ${par.confidence}`);
              for (const word of par.words) {
                console.log(`\t\tWord confidence: ${word.confidence}`);
                for (const symbol of word.symbols) {
                  console.log(`\t\t\tSymbol: ${symbol.text}, (confidence: ${symbol.confidence})`);
                }
              }
            }
          }
        }
      }
    })
    .catch(err => {
      console.error(err);
    });
}

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API PHP API reference documentation.

require __DIR__.'/../../vendor/autoload.php';

use Google\Cloud\Vision\V1\ImageAnnotatorClient;
use Google\Cloud\Vision\V1\AnnotateFileRequest;
use Google\Cloud\Vision\V1\Feature;
use Google\Cloud\Vision\V1\Feature\Type;
use Google\Cloud\Vision\V1\InputConfig;

/**
 * Perform batch file annotation.
 *
 * @param string $filePath Path to local pdf file, e.g. /path/document.pdf
 */
function sampleBatchAnnotateFiles($filePath)
{

    $imageAnnotatorClient = new ImageAnnotatorClient();

    // $filePath = 'resources/kafka.pdf';

    // Supported mime_type: application/pdf, image/tiff, image/gif
    $mimeType = 'application/pdf';
    $content = file_get_contents($filePath);
    $inputConfig = new InputConfig();
    $inputConfig->setMimeType($mimeType);
    $inputConfig->setContent($content);
    $type = Type::DOCUMENT_TEXT_DETECTION;
    $featuresElement = new Feature();
    $featuresElement->setType($type);
    $features = [$featuresElement];

    // The service can process up to 5 pages per document file. Here we specify the first, second, and
    // last page of the document to be processed.
    $pagesElement = 1;
    $pagesElement2 = 2;
    $pagesElement3 = -1;
    $pages = [$pagesElement, $pagesElement2, $pagesElement3];
    $requestsElement = new AnnotateFileRequest();
    $requestsElement->setInputConfig($inputConfig);
    $requestsElement->setFeatures($features);
    $requestsElement->setPages($pages);
    $requests = [$requestsElement];

    try {
        $response = $imageAnnotatorClient->batchAnnotateFiles($requests);
        foreach ($response->getResponses()[0]->getResponses() as $imageResponse) {
            printf('Full text: %s'.PHP_EOL, $imageResponse->getFullTextAnnotation()->getText());
            foreach ($imageResponse->getFullTextAnnotation()->getPages() as $page) {
                foreach ($page->getBlocks() as $block) {
                    printf("\nBlock confidence: %s".PHP_EOL, $block->getConfidence());
                    foreach ($block->getParagraphs() as $par) {
                        printf("\tParagraph confidence: %s".PHP_EOL, $par->getConfidence());
                        foreach ($par->getWords() as $word) {
                            printf("\t\tWord confidence: %s".PHP_EOL, $word->getConfidence());
                            foreach ($word->getSymbols() as $symbol) {
                                printf("\t\t\tSymbol: %s, (confidence: %s)".PHP_EOL, $symbol->getText(), $symbol->getConfidence());
                            }
                        }
                    }
                }
            }
        }
    } finally {
        $imageAnnotatorClient->close();
    }

}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Python API reference documentation.

from google.cloud import vision_v1
from google.cloud.vision_v1 import enums
import io
import six

def sample_batch_annotate_files(file_path):
  """
    Perform batch file annotation

    Args:
      file_path Path to local pdf file, e.g. /path/document.pdf
    """

  client = vision_v1.ImageAnnotatorClient()

  # file_path = 'resources/kafka.pdf'

  if isinstance(file_path, six.binary_type):
    file_path = file_path.decode('utf-8')

  # Supported mime_type: application/pdf, image/tiff, image/gif
  mime_type = 'application/pdf'
  with io.open(file_path, 'rb') as f:
    content = f.read()
  input_config = {'mime_type': mime_type, 'content': content}
  type_ = enums.Feature.Type.DOCUMENT_TEXT_DETECTION
  features_element = {'type': type_}
  features = [features_element]

  # The service can process up to 5 pages per document file. Here we specify the
  # first, second, and last page of the document to be processed.
  pages_element = 1
  pages_element_2 = 2
  pages_element_3 = -1
  pages = [pages_element, pages_element_2, pages_element_3]
  requests_element = {'input_config': input_config, 'features': features, 'pages': pages}
  requests = [requests_element]

  response = client.batch_annotate_files(requests)
  for image_response in response.responses[0].responses:
    print('Full text: {}'.format(image_response.full_text_annotation.text))
    for page in image_response.full_text_annotation.pages:
      for block in page.blocks:
        print('\nBlock confidence: {}'.format(block.confidence))
        for par in block.paragraphs:
          print('\tParagraph confidence: {}'.format(par.confidence))
          for word in par.words:
            print('\t\tWord confidence: {}'.format(word.confidence))
            for symbol in word.symbols:
              print('\t\t\tSymbol: {}, (confidence: {})'.format(symbol.text, symbol.confidence))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Ruby API reference documentation.

# Perform batch file annotation
#
# @param file_path {String} Path to local pdf file, e.g. /path/document.pdf
def sample_batch_annotate_files(file_path)
  # Instantiate a client
  image_annotator_client = Google::Cloud::Vision::ImageAnnotator.new version: :v1

  # file_path = "resources/kafka.pdf"

  # Supported mime_type: application/pdf, image/tiff, image/gif
  mime_type = "application/pdf"
  content = File.binread file_path
  input_config = { mime_type: mime_type, content: content }
  type = :DOCUMENT_TEXT_DETECTION
  features_element = { type: type }
  features = [features_element]

  # The service can process up to 5 pages per document file. Here we specify the first, second, and
  # last page of the document to be processed.
  pages_element = 1
  pages_element_2 = 2
  pages_element_3 = -1
  pages = [pages_element, pages_element_2, pages_element_3]
  requests_element = {
    input_config: input_config,
    features: features,
    pages: pages
  }
  requests = [requests_element]

  response = image_annotator_client.batch_annotate_files(requests)
  response.responses[0].responses.each do |image_response|
    puts "Full text: #{image_response.full_text_annotation.text}"
    image_response.full_text_annotation.pages.each do |page|
      page.blocks.each do |block|
        puts "\nBlock confidence: #{block.confidence}"
        block.paragraphs.each do |par|
          puts "\tParagraph confidence: #{par.confidence}"
          par.words.each do |word|
            puts "\t\tWord confidence: #{word.confidence}"
            word.symbols.each do |symbol|
              puts "\t\t\tSymbol: #{symbol.text}, (confidence: #{symbol.confidence})"
            end
          end
        end
      end
    end
  end

end

Using a file on Google Cloud Storage

Use the following code samples to get any feature annotation for a file on Google Cloud Storage.

Command-line

To perform online PDF/TIFF/GIF document text detection for a small batch of files, make a POST request and provide the appropriate request body:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
"https://vision.googleapis.com/v1/files:annotate" -d "{
  'requests': [
    {
      'inputConfig': {
        'gcsSource': {
          'uri': 'gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf'
        },
        'mimeType': 'application/pdf'
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ],
      'pages': [
        2
      ]
    }
  ]
}"

Where:

inputConfig replaces the image field used in other Vision API requests. It contains two child fields:
- gcsSource.uri - The Google Cloud Storage URI of the PDF, TIFF, or GIF file (accessible to the user or service account making the request).
- mimeType - One of the following: "application/pdf", "image/tiff" or "image/gif".
The pages field specifies which pages of the file to run text detection on.
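
For a file in Google Cloud Storage, the request body differs only in its `inputConfig`. The following sketch builds that body with the Python standard library; the function name `build_gcs_request` is hypothetical, and the `gs://` prefix check is a local sanity check rather than documented API behavior.

```python
import json
from typing import Any, Dict, List


def build_gcs_request(
    gcs_uri: str,
    mime_type: str,
    pages: List[int],
    feature_type: str = "DOCUMENT_TEXT_DETECTION",
) -> Dict[str, Any]:
    """Build a files:annotate request body for a file in Cloud Storage."""
    # Local sanity check only: Cloud Storage URIs have the form gs://bucket/file.
    if not gcs_uri.startswith("gs://"):
        raise ValueError("Expected a gs:// URI, got: {}".format(gcs_uri))
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": gcs_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": feature_type}],
                "pages": pages,
            }
        ]
    }


gcs_body = build_gcs_request(
    "gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf",
    "application/pdf",
    [2],
)
print(json.dumps(gcs_body, indent=2))
```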

Response

Full response

{
  "responses": [
    {
      "responses": [
        {
          "fullTextAnnotation": {
            "pages": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en",
                      "confidence": 0.99
                    },
                    {
                      "languageCode": "pl",
                      "confidence": 0.01
                    }
                  ]
                },
                "width": 1342,
                "height": 2234,
                "blocks": [
                  {
                    "boundingBox": {
                      "vertices": [
                      ...
                      ]
                    },
                    "paragraphs": [
                      {
                        "boundingBox": {
                          "vertices": [
                          ...
                          ]
                        },
                        "words": [
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "boundingBox": {
                              "vertices": [
                              ...
                              ]
                            },
                            "symbols": [
                              {
                                "property": {
                                  "detectedLanguages": [
                                    {
                                      "languageCode": "en"
                                    }
                                  ],
                                  "detectedBreak": {
                                    "type": "SPACE"
                                  }
                                },
                                "boundingBox": {
                                  "vertices": [
                                ...
                                  ]
                                },
                                "text": "#",
                                "confidence": 0.07
                              }
                            ],
                            "confidence": 0.07
                          },
                          ...
                    ],
                    "blockType": "TEXT",
                    "confidence": 0.88
                  },
                  ...
            ...
            "text": "# THE STATE OF TEXAS\n0\nOIL, GAS AND MINERAL LEASE\n
            COUNTY OF MAVERICK\nTHIS AGREEMENT made this 14 day of_June\n1954,
            between Norvel J. Chittim and his wife, Lieschen G. Chittim;\nMary
            Anne Chittim Parker, joined herein pro forma by her husband,\nJoseph
            Bright Parker; Dorothea Chittim Oppenheimer, joined herein\nji pro
            forma by her husband, Fred J. Oppenheimer; Tuleta Chittim\nWright,
            joined herein pro forma by her husband, Gilbert G. Wright,\nJr.;
            Gilbert G. Wright, III; Delă Wright White, joined herein pro\nforma
            by her husband, John H. White; Anne Wright Basse, joined\nherein
            pro forma by her husband, E. A. Basse, Jr.; Norvel J.\nChittim,
            Independent Executor and Trustee for Estate of Marstella\nChittim,
            Deceased; Mary Louise Roswell, joined herein pro forma by\nher
            husband, Charles M. 'Roswell; and James M. Chittim and his wife\n
            Thelma Neal Chittim; as LESSORS, and W. L. Scheig of San Antonio,\n
            Texas, as LESSEE,\n10\nW ITNESS ETH:\nLessors, in consideration of
            $10.00, cash in hand paid, i\nof the royalties herein provided,
            and of the agreement of Lessee\nherein contained, hereby grant,
            lease and let exclusively unto\nLessee the tracts of land
            hereinafter described for the purpose of\ntesting for mineral
            indications, and in such tests use the Seismo-\ngraph, Torsion
            Balance, Core Drill, or any other tools, machinery,\nequipment
            or explosive necessary and proper; and also prospecting,\ndrilling
            and mining for and producing oil, gas and other minerals i\n
            (except metallic minerals), laying pipe lines, building tanks,\n
            power stations, telephone lines and other structures thereon to\n
            produce, save, take care of, treat, transport and own said pro-\n
            ducts and housing its employees (Lessee to conduct its geophysical\n
            work in such manner as not to damage the buildings, water tanks\n
            or wells of Lessors, or the livestock of Lessors or Lessors' ten-\n
            ants, ) said lands being situated in Maverick, Zavalla and Dimmit\n
            Counties, Texas, to-wit:\n3 -1.\n"
          },
          "context": {
            "pageNumber": 2
          }
        }
      ]
    }
  ]
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

/*
 * Please include the following imports to run this sample.
 *
 * import com.google.cloud.vision.v1.AnnotateFileRequest;
 * import com.google.cloud.vision.v1.AnnotateImageResponse;
 * import com.google.cloud.vision.v1.BatchAnnotateFilesRequest;
 * import com.google.cloud.vision.v1.BatchAnnotateFilesResponse;
 * import com.google.cloud.vision.v1.Block;
 * import com.google.cloud.vision.v1.Feature;
 * import com.google.cloud.vision.v1.GcsSource;
 * import com.google.cloud.vision.v1.ImageAnnotatorClient;
 * import com.google.cloud.vision.v1.InputConfig;
 * import com.google.cloud.vision.v1.Page;
 * import com.google.cloud.vision.v1.Paragraph;
 * import com.google.cloud.vision.v1.Symbol;
 * import com.google.cloud.vision.v1.Word;
 * import java.util.Arrays;
 * import java.util.List;
 */

/**
 * Perform batch file annotation
 *
 * @param storageUri Cloud Storage URI to source image in the format gs://[bucket]/[file]
 */
public static void sampleBatchAnnotateFiles(String storageUri) {
  try (ImageAnnotatorClient imageAnnotatorClient = ImageAnnotatorClient.create()) {
    // storageUri = "gs://cloud-samples-data/vision/document_understanding/kafka.pdf";
    GcsSource gcsSource = GcsSource.newBuilder().setUri(storageUri).build();
    InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
    Feature.Type type = Feature.Type.DOCUMENT_TEXT_DETECTION;
    Feature featuresElement = Feature.newBuilder().setType(type).build();
    List<Feature> features = Arrays.asList(featuresElement);

    // The service can process up to 5 pages per document file.
    // Here we specify the first, second, and last page of the document to be processed.
    int pagesElement = 1;
    int pagesElement2 = 2;
    int pagesElement3 = -1;
    List<Integer> pages = Arrays.asList(pagesElement, pagesElement2, pagesElement3);
    AnnotateFileRequest requestsElement =
        AnnotateFileRequest.newBuilder()
            .setInputConfig(inputConfig)
            .addAllFeatures(features)
            .addAllPages(pages)
            .build();
    List<AnnotateFileRequest> requests = Arrays.asList(requestsElement);
    BatchAnnotateFilesRequest request =
        BatchAnnotateFilesRequest.newBuilder().addAllRequests(requests).build();
    BatchAnnotateFilesResponse response = imageAnnotatorClient.batchAnnotateFiles(request);
    for (AnnotateImageResponse imageResponse :
        response.getResponsesList().get(0).getResponsesList()) {
      System.out.printf("Full text: %s\n", imageResponse.getFullTextAnnotation().getText());
      for (Page page : imageResponse.getFullTextAnnotation().getPagesList()) {
        for (Block block : page.getBlocksList()) {
          System.out.printf("\nBlock confidence: %s\n", block.getConfidence());
          for (Paragraph par : block.getParagraphsList()) {
            System.out.printf("\tParagraph confidence: %s\n", par.getConfidence());
            for (Word word : par.getWordsList()) {
              System.out.printf("\t\tWord confidence: %s\n", word.getConfidence());
              for (Symbol symbol : word.getSymbolsList()) {
                System.out.printf(
                    "\t\t\tSymbol: %s, (confidence: %s)\n",
                    symbol.getText(), symbol.getConfidence());
              }
            }
          }
        }
      }
    }
  } catch (Exception exception) {
    System.err.println("Failed to create the client due to: " + exception);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Node.js API reference documentation.

const vision = require('@google-cloud/vision').v1;

/**
 * Perform batch file annotation
 *
 * @param {string} storageUri Cloud Storage URI of the source file, in the format gs://[bucket]/[file]
 */
function sampleBatchAnnotateFiles(storageUri) {
  const client = new vision.ImageAnnotatorClient();
  // const storageUri = 'gs://cloud-samples-data/vision/document_understanding/kafka.pdf';
  const gcsSource = {
    uri: storageUri,
  };
  const inputConfig = {
    gcsSource: gcsSource,
  };
  const type = 'DOCUMENT_TEXT_DETECTION';
  const featuresElement = {
    type: type,
  };
  const features = [featuresElement];

  // The service can process up to 5 pages per document file.
  // Here we specify the first, second, and last page of the document to be processed.
  const pagesElement = 1;
  const pagesElement2 = 2;
  const pagesElement3 = -1;
  const pages = [pagesElement, pagesElement2, pagesElement3];
  const requestsElement = {
    inputConfig: inputConfig,
    features: features,
    pages: pages,
  };
  const requests = [requestsElement];
  client.batchAnnotateFiles({requests: requests})
    .then(responses => {
      const response = responses[0];
      for (const imageResponse of response.responses[0].responses) {
        console.log(`Full text: ${imageResponse.fullTextAnnotation.text}`);
        for (const page of imageResponse.fullTextAnnotation.pages) {
          for (const block of page.blocks) {
            console.log(`\nBlock confidence: ${block.confidence}`);
            for (const par of block.paragraphs) {
              console.log(`\tParagraph confidence: ${par.confidence}`);
              for (const word of par.words) {
                console.log(`\t\tWord confidence: ${word.confidence}`);
                for (const symbol of word.symbols) {
                  console.log(`\t\t\tSymbol: ${symbol.text}, (confidence: ${symbol.confidence})`);
                }
              }
            }
          }
        }
      }
    })
    .catch(err => {
      console.error(err);
    });
}

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API PHP API reference documentation.

require __DIR__.'/../../vendor/autoload.php';

use Google\Cloud\Vision\V1\ImageAnnotatorClient;
use Google\Cloud\Vision\V1\AnnotateFileRequest;
use Google\Cloud\Vision\V1\Feature;
use Google\Cloud\Vision\V1\Feature\Type;
use Google\Cloud\Vision\V1\GcsSource;
use Google\Cloud\Vision\V1\InputConfig;

/**
 * Perform batch file annotation.
 *
 * @param string $storageUri Cloud Storage URI of the source file, in the format gs://[bucket]/[file]
 */
function sampleBatchAnnotateFiles($storageUri)
{

    $imageAnnotatorClient = new ImageAnnotatorClient();

    // $storageUri = 'gs://cloud-samples-data/vision/document_understanding/kafka.pdf';
    $gcsSource = new GcsSource();
    $gcsSource->setUri($storageUri);
    $inputConfig = new InputConfig();
    $inputConfig->setGcsSource($gcsSource);
    $type = Type::DOCUMENT_TEXT_DETECTION;
    $featuresElement = new Feature();
    $featuresElement->setType($type);
    $features = [$featuresElement];

    // The service can process up to 5 pages per document file.
    // Here we specify the first, second, and last page of the document to be processed.
    $pagesElement = 1;
    $pagesElement2 = 2;
    $pagesElement3 = -1;
    $pages = [$pagesElement, $pagesElement2, $pagesElement3];
    $requestsElement = new AnnotateFileRequest();
    $requestsElement->setInputConfig($inputConfig);
    $requestsElement->setFeatures($features);
    $requestsElement->setPages($pages);
    $requests = [$requestsElement];

    try {
        $response = $imageAnnotatorClient->batchAnnotateFiles($requests);
        foreach ($response->getResponses()[0]->getResponses() as $imageResponse) {
            printf('Full text: %s'.PHP_EOL, $imageResponse->getFullTextAnnotation()->getText());
            foreach ($imageResponse->getFullTextAnnotation()->getPages() as $page) {
                foreach ($page->getBlocks() as $block) {
                    printf("\nBlock confidence: %s".PHP_EOL, $block->getConfidence());
                    foreach ($block->getParagraphs() as $par) {
                        printf("\tParagraph confidence: %s".PHP_EOL, $par->getConfidence());
                        foreach ($par->getWords() as $word) {
                            printf("\t\tWord confidence: %s".PHP_EOL, $word->getConfidence());
                            foreach ($word->getSymbols() as $symbol) {
                                printf("\t\t\tSymbol: %s, (confidence: %s)".PHP_EOL, $symbol->getText(), $symbol->getConfidence());
                            }
                        }
                    }
                }
            }
        }
    } finally {
        $imageAnnotatorClient->close();
    }

}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Python API reference documentation.

from google.cloud import vision_v1
from google.cloud.vision_v1 import enums
import six

def sample_batch_annotate_files(storage_uri):
  """
    Perform batch file annotation

    Args:
      storage_uri: Cloud Storage URI of the source file, in the format gs://[bucket]/[file]
    """

  client = vision_v1.ImageAnnotatorClient()

  # storage_uri = 'gs://cloud-samples-data/vision/document_understanding/kafka.pdf'

  if isinstance(storage_uri, six.binary_type):
    storage_uri = storage_uri.decode('utf-8')
  gcs_source = {'uri': storage_uri}
  input_config = {'gcs_source': gcs_source}
  type_ = enums.Feature.Type.DOCUMENT_TEXT_DETECTION
  features_element = {'type': type_}
  features = [features_element]

  # The service can process up to 5 pages per document file.
  # Here we specify the first, second, and last page of the document to be
  # processed.
  pages_element = 1
  pages_element_2 = 2
  pages_element_3 = -1
  pages = [pages_element, pages_element_2, pages_element_3]
  requests_element = {'input_config': input_config, 'features': features, 'pages': pages}
  requests = [requests_element]

  response = client.batch_annotate_files(requests)
  for image_response in response.responses[0].responses:
    print('Full text: {}'.format(image_response.full_text_annotation.text))
    for page in image_response.full_text_annotation.pages:
      for block in page.blocks:
        print('\nBlock confidence: {}'.format(block.confidence))
        for par in block.paragraphs:
          print('\tParagraph confidence: {}'.format(par.confidence))
          for word in par.words:
            print('\t\tWord confidence: {}'.format(word.confidence))
            for symbol in word.symbols:
              print('\t\t\tSymbol: {}, (confidence: {})'.format(symbol.text, symbol.confidence))
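
The samples select pages with the list 1, 2, -1, where a negative number counts from the end of the file (-1 is the last page). As a minimal sketch of that convention, the helper below checks a selection against the 5-page limit and maps it to 1-based page numbers. The names `resolve_pages` and `page_count` are hypothetical and not part of the client library, and the range checks are a local sanity check only; they do not describe how the service itself handles invalid page numbers.

```python
MAX_PAGES_PER_REQUEST = 5  # limit stated in the Limitations section


def resolve_pages(pages, page_count):
    """Validate a page selection and map it to 1-based page numbers.

    `page_count` is a hypothetical argument: the total number of pages in
    the file, which the caller is assumed to know in advance.
    """
    if len(pages) > MAX_PAGES_PER_REQUEST:
        raise ValueError('at most 5 pages can be annotated per file')
    resolved = []
    for page in pages:
        # Local check only: page numbers start at 1 (or -1 from the end).
        if page == 0 or abs(page) > page_count:
            raise ValueError('page {} is out of range'.format(page))
        # Negative numbers count from the end: -1 is the last page.
        resolved.append(page if page > 0 else page_count + page + 1)
    return resolved


# First, second, and last page of a 10-page file.
print(resolve_pages([1, 2, -1], page_count=10))  # prints [1, 2, 10]
```

Running a check like this before building the request makes an oversized or out-of-range selection fail locally instead of costing a round trip to the API.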

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Ruby API reference documentation.

require "google/cloud/vision"

# Perform batch file annotation
#
# @param storage_uri {String} Cloud Storage URI of the source file, in the format gs://[bucket]/[file]
def sample_batch_annotate_files(storage_uri)
  # Instantiate a client
  image_annotator_client = Google::Cloud::Vision::ImageAnnotator.new version: :v1

  # storage_uri = "gs://cloud-samples-data/vision/document_understanding/kafka.pdf"
  gcs_source = { uri: storage_uri }
  input_config = { gcs_source: gcs_source }
  type = :DOCUMENT_TEXT_DETECTION
  features_element = { type: type }
  features = [features_element]

  # The service can process up to 5 pages per document file.
  # Here we specify the first, second, and last page of the document to be processed.
  pages_element = 1
  pages_element_2 = 2
  pages_element_3 = -1
  pages = [pages_element, pages_element_2, pages_element_3]
  requests_element = {
    input_config: input_config,
    features: features,
    pages: pages
  }
  requests = [requests_element]

  response = image_annotator_client.batch_annotate_files(requests)
  response.responses[0].responses.each do |image_response|
    puts "Full text: #{image_response.full_text_annotation.text}"
    image_response.full_text_annotation.pages.each do |page|
      page.blocks.each do |block|
        puts "\nBlock confidence: #{block.confidence}"
        block.paragraphs.each do |par|
          puts "\tParagraph confidence: #{par.confidence}"
          par.words.each do |word|
            puts "\t\tWord confidence: #{word.confidence}"
            word.symbols.each do |symbol|
              puts "\t\t\tSymbol: #{symbol.text}, (confidence: #{symbol.confidence})"
            end
          end
        end
      end
    end
  end

end
