The Vision API can detect and transcribe text from PDF, TIFF, and GIF
files stored in Google Cloud Storage. This includes online text detection and
annotation of up to 5 frames (GIF) or pages (PDF or TIFF) of your choosing for
each file in a batch of files (application/pdf, image/tiff, and image/gif).
Document text detection from PDF and TIFF is requested using the
annotate function, which performs an online request and
provides you an immediate JSON response.
Limitations
At most 5 pages will be annotated per file. You can specify which 5 pages to annotate.
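The limit above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Vision API client libraries; the helper name `check_pages` and the constant are hypothetical, and the only rule it enforces is the 5-page limit stated here.

```python
MAX_PAGES_PER_REQUEST = 5  # limit stated in the Limitations section


def check_pages(pages):
    """Return the requested pages as a list, or raise if more than 5 are given.

    Hypothetical helper: it only enforces the page-count limit and does not
    validate the page numbers themselves.
    """
    pages = list(pages)
    if len(pages) > MAX_PAGES_PER_REQUEST:
        raise ValueError(
            f"{len(pages)} pages requested; at most "
            f"{MAX_PAGES_PER_REQUEST} can be annotated per file"
        )
    return pages


if __name__ == "__main__":
    print(check_pages([1, 2, 3]))
```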
Authentication
API keys are not supported for annotate requests. See Using a service account
for instructions on authenticating with a service account.
Currently supported feature types
| Feature type | Description |
|---|---|
| FACE_DETECTION | Run face detection. |
| LANDMARK_DETECTION | Run landmark detection. |
| LOGO_DETECTION | Run logo detection. |
| LABEL_DETECTION | Run label detection. |
| TEXT_DETECTION | Run text detection / optical character recognition (OCR). Text detection is optimized for areas of text within a larger image; if the image is a document, use DOCUMENT_TEXT_DETECTION instead. |
| DOCUMENT_TEXT_DETECTION | Run dense text document OCR. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present. |
| SAFE_SEARCH_DETECTION | Run Safe Search to detect potentially unsafe or undesirable content. |
| IMAGE_PROPERTIES | Compute a set of image properties, such as the image's dominant colors. |
| CROP_HINTS | Run crop hints. |
| WEB_DETECTION | Run web detection. |
| OBJECT_LOCALIZATION | Run localizer for object detection. |
Sample code
You can either send an annotation request with a locally stored file, or use a file that is stored on Google Cloud Storage.
Using a locally stored file
Use the following code samples to get any feature annotation for a locally stored file.
Command-line
To perform online PDF/TIFF/GIF document text detection for a small batch of files, make a POST request and provide the appropriate request body:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  "https://vision.googleapis.com/v1/files:annotate" -d "{
  'requests': [
    {
      'inputConfig': {
        'content': 'JVBERi0xLjUNCiW1tbW1...base64-encoded-file...ydHhyZWYNCjk5NzM2OQ0KJSVFT0Y=',
        'mimeType': 'application/pdf'
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ],
      'pages': [
        2
      ]
    }
  ]
}"
Where:
inputConfig replaces the image field used in other Vision API requests. It contains two child fields:
content - The file content (PDF, TIFF, or GIF), represented as a stream of bytes (base64-encoded when sent as JSON, as in the request above).
mimeType - One of the following: "application/pdf", "image/tiff", or "image/gif".
The pages field specifies the pages of the file on which to perform text detection.
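The request body described above can also be assembled programmatically. The sketch below uses only the Python standard library and mirrors the field names from the curl example (inputConfig, content, mimeType, features, pages). The function name `build_inline_request` and the placeholder bytes are hypothetical, and this is not the official client library sample.

```python
import base64
import json

# MIME types listed in the field description above.
SUPPORTED_MIME_TYPES = {"application/pdf", "image/tiff", "image/gif"}


def build_inline_request(file_bytes, mime_type, pages,
                         feature_type="DOCUMENT_TEXT_DETECTION"):
    """Build a files:annotate request body for a locally stored file.

    Hypothetical helper: the raw bytes are base64-encoded so they can be
    carried in the JSON 'content' field.
    """
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mimeType: {mime_type}")
    content = base64.b64encode(file_bytes).decode("ascii")
    return {
        "requests": [
            {
                "inputConfig": {"content": content, "mimeType": mime_type},
                "features": [{"type": feature_type}],
                "pages": list(pages),
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real file read with open(path, "rb").
    body = build_inline_request(b"%PDF-1.5 placeholder", "application/pdf", [2])
    print(json.dumps(body, indent=2))
```

The resulting dictionary, serialized with `json.dumps`, could be passed as the `-d` payload of the curl command.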
Response
A successful annotate request immediately returns a JSON response. The
returned JSON response is similar to that of an image's
document text detection request, with bounding boxes
for blocks broken down by paragraphs, words, and individual symbols, as well
as the full text detected. The response also contains a context field showing
the location of the PDF or TIFF that was specified and
the result's page number in the file.
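As a rough illustration of reading such a response, the sketch below pulls the page number and full text out of a parsed JSON response. The nesting it assumes (an outer `responses` list per file, an inner `responses` list per page, `fullTextAnnotation.text`, and `context.pageNumber`) is an assumption about the response shape and should be checked against the API reference. The sample dictionary is hand-written for illustration, not real API output, and `extract_page_texts` is a hypothetical helper.

```python
def extract_page_texts(response_json):
    """Return (page_number, full_text) pairs from a parsed annotate response.

    Assumed shape: {"responses": [{"responses": [{"fullTextAnnotation":
    {"text": ...}, "context": {"pageNumber": ...}}]}]}.
    """
    results = []
    for file_response in response_json.get("responses", []):
        for page_response in file_response.get("responses", []):
            page_number = page_response.get("context", {}).get("pageNumber")
            text = page_response.get("fullTextAnnotation", {}).get("text", "")
            results.append((page_number, text))
    return results


if __name__ == "__main__":
    # Hand-written stand-in for a response; values are illustrative only.
    sample = {
        "responses": [
            {
                "responses": [
                    {
                        "fullTextAnnotation": {"text": "Example text"},
                        "context": {"pageNumber": 2},
                    }
                ]
            }
        ]
    }
    for page, text in extract_page_texts(sample):
        print(page, text)
```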
Java
Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Node.js API reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Python API reference documentation.
Ruby
Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Ruby API reference documentation.
Using a file on Google Cloud Storage
Use the following code samples to get any feature annotation for a file on Google Cloud Storage.
Command-line
To perform online PDF/TIFF/GIF document text detection for a small batch of files, make a POST request and provide the appropriate request body:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  "https://vision.googleapis.com/v1/files:annotate" -d "{
  'requests': [
    {
      'inputConfig': {
        'gcsSource': {
          'uri': 'gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf'
        },
        'mimeType': 'application/pdf'
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ],
      'pages': [
        2
      ]
    }
  ]
}"
Where:
inputConfig replaces the image field used in other Vision API requests. It contains two child fields:
gcsSource.uri - The Google Cloud Storage URI of the PDF, TIFF, or GIF file (accessible to the user or service account making the request).
mimeType - One of the following: "application/pdf", "image/tiff", or "image/gif".
The pages field specifies the pages of the file on which to perform text detection.
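For readers who prefer to issue the same POST from Python without a client library, the following sketch prepares the request with the standard library's `urllib.request`. It only builds the request object and sends nothing. The function name `build_gcs_annotate_request` is hypothetical, and the access token is assumed to be obtained separately, for example with the gcloud command shown in the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
ANNOTATE_URL = "https://vision.googleapis.com/v1/files:annotate"


def build_gcs_annotate_request(gcs_uri, mime_type, pages, access_token,
                               feature_type="DOCUMENT_TEXT_DETECTION"):
    """Prepare (but do not send) a POST request for a file in Cloud Storage.

    Hypothetical helper: access_token is assumed to be a valid OAuth 2.0
    bearer token supplied by the caller.
    """
    body = {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": gcs_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": feature_type}],
                "pages": list(pages),
            }
        ]
    }
    return urllib.request.Request(
        ANNOTATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_gcs_annotate_request(
        "gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf",
        "application/pdf",
        [2],
        access_token="YOUR_ACCESS_TOKEN",  # placeholder, not a real token
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is left out here so the sketch runs without credentials or network access.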
Response
A successful annotate request immediately returns a JSON response. The
returned JSON response is similar to that of an image's
document text detection request, with bounding boxes
for blocks broken down by paragraphs, words, and individual symbols, as well
as the full text detected. The response also contains a context field showing
the location of the PDF or TIFF that was specified and
the result's page number in the file.
Java
Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Node.js API reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Python API reference documentation.
Ruby
Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Ruby API reference documentation.


