Date Published: 03-31-2023
Date of Last Update: 03-31-2023
The PDC Metadata Application Programming Interface (API) is an online resource that provides access to PDC metadata records. This service is implemented as a collection of URL endpoints that can be invoked using HTTP, in order to receive a response. In general, this service is analogous to a query system where API requests (i.e. queries) are made and corresponding responses (i.e. results) are returned. In particular, any given response is a representation of zero or more structured metadata records. This document will describe how to use this metadata API, in order to support activities such as ad hoc queries, automated metadata harvesting, programmatic data mining, and data visualization development.
Currently, this API provides both XML and JSON responses. The XML is compliant with ISO 19115 standards and the contents are equivalent to the XML that can be extracted from the Geospatial Search Tool. More details on the JSON format are provided below.
PDC JSON responses provide schema.org compliant JSON-LD. As a result, the contents are structured and nested to represent a Dataset, where a Dataset element corresponds to a PDC metadata record. If a response contains one or more metadata records, then each individual Dataset element is contained within a wrapping itemListElement, and the set of itemListElements is contained within an itemList (i.e. an array of itemListElements). It is therefore necessary to understand the structure of the metadata within the API response in order to extract content of interest effectively (see Consuming API Responses).
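To make the nesting concrete, the sketch below parses a small hand-written JSON-LD fragment with the jsonlite package and pulls one field out of each record. The fragment is illustrative only: the itemListElement key follows the description above, but the inner item wrapper and the name and identifier fields are assumptions based on general schema.org conventions, and should be checked against a real response.

```r
library(jsonlite)

# Hand-written sample, NOT a real PDC response. The "item", "name" and
# "identifier" keys are assumed from general schema.org conventions.
sample_json <- '{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {"@type": "ListItem",
     "item": {"@type": "Dataset", "identifier": "1", "name": "Example record A"}},
    {"@type": "ListItem",
     "item": {"@type": "Dataset", "identifier": "2", "name": "Example record B"}}
  ]
}'

# simplifyVector = FALSE keeps the nested list structure intact.
parsed <- fromJSON(sample_json, simplifyVector = FALSE)

# Each list element wraps one Dataset; extract its (assumed) name field.
dataset_names <- vapply(
  parsed$itemListElement,
  function(element) element$item$name,
  character(1)
)
print(dataset_names)
```

If the real responses use different field names, only the accessor inside vapply needs to change.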
There are 2 ways to use curl on your machine:
Option 1: Use curl from Command Line/Terminal
C:\users\anyuser> curl --version
curl 7.55.1 (Windows) libcurl/7.55.1 WinSSL
Release-Date: 2017-11-14, security patched: 2019-11-05
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL
Windows
curl --version
Mac OS
brew -v
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install curl
Linux
sudo apt-get install curl
Option 2: Execute curl commands directly from your browser.
Several websites, such as Reqbin, let you send a curl request to an API and then display the JSON response. You don't need to have curl installed on your machine, and you don't need to use a terminal/console/command line. This is a good option if you only need to make a few API requests.
This API provides access to publicly available PDC metadata records. There are a variety of methods for acquiring records that correspond to the different API endpoints.
Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.
https://polardata.ca/api/metadata?page={page}
Parameters | Description |
---|---|
{page} | A page number used to retrieve a corresponding batch of records |
A JSON request to the API for page 5 would look like this:
curl -X GET "https://polardata.ca/api/metadata?page=5" -H "accept: */*"
An XML request to the API for page 5 would look like this:
curl -X GET "https://polardata.ca/api/metadata/xml?page=5" -H "accept: */*"
Responses
Code | Description |
---|---|
200 | OK - The requested page or id was found |
400 | Bad Request - The request was malformed or invalid |
404 | Not Found - The requested page or id was not found |
429 | Too Many Requests - This API is rate limited |
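The URL pattern for this endpoint can be assembled programmatically. The following is a minimal sketch in base R; build_page_url is a hypothetical helper name, and the default of page 0 is an assumption taken from the defaults listed for the other paged endpoints.

```r
# Hypothetical helper: builds a request URL for the "all records" endpoint.
# output = "json" uses /api/metadata, output = "xml" uses /api/metadata/xml.
build_page_url <- function(page = 0, output = c("json", "xml")) {
  output <- match.arg(output)
  # Page numbers must be a single whole, non-negative number.
  if (!is.numeric(page) || length(page) != 1 || is.na(page) ||
      page < 0 || page != floor(page)) {
    stop("page must be a single non-negative whole number")
  }
  base <- "https://polardata.ca/api/metadata"
  if (output == "xml") {
    base <- paste0(base, "/xml")
  }
  paste0(base, "?page=", as.integer(page))
}

print(build_page_url(5))
print(build_page_url(5, output = "xml"))
```

Rejecting negative or fractional page numbers before the request is sent avoids spending a call on a request that cannot succeed.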
https://polardata.ca/api/metadata/id/{id}
Parameters | Description |
---|---|
{id} | An assigned CCIN Reference Number. Must be an already existing CCIN Reference Number. |
A JSON request to the API for a record with CCIN Reference Number 13264 would look like this:
curl -X GET "https://polardata.ca/api/metadata/id/13264" -H "accept: */*"
An XML request to the API for a record with CCIN Reference Number 13264 would look like this:
curl -X GET "https://polardata.ca/api/metadata/xml/id/13264" -H "accept: */*"
Responses
Code | Description |
---|---|
200 | OK - The requested page or id was found |
400 | Bad Request - The request was malformed or invalid |
404 | Not Found - The requested page or id was not found |
429 | Too Many Requests - This API is rate limited |
The program name search is not case sensitive. Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.
https://polardata.ca/api/metadata/program/{name}?page={page}
Parameters | Description |
---|---|
{page} | A page number used to retrieve a corresponding batch of records. Default value : 0 |
{name} | Text that is equal to, or part of, an existing program name, using %20 as the space character |
A JSON request to the API for records from 'Amundsen Science' with page '0' would look like this:
curl -X GET "https://polardata.ca/api/metadata/program/amundsen%20science?page=0" -H "accept: */*"
An XML request to the API for records from 'Amundsen Science' with page '0' would look like this:
curl -X GET "https://polardata.ca/api/metadata/xml/program/amundsen%20science?page=0" -H "accept: */*"
Responses
Code | Description |
---|---|
200 | OK - The requested page or id was found |
400 | Bad Request - The request was malformed or invalid |
404 | Not Found - The requested page or id was not found |
429 | Too Many Requests - This API is rate limited |
The date must be a valid date represented as a string in YYYY-MM-DD format. Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.
https://polardata.ca/api/metadata/since/{date}?page={page}
Parameters | Description |
---|---|
{page} | A page number used to retrieve a corresponding batch of records. Default value : 0 |
{date} | A cutoff date from which to start the search, up to the present date. This endpoint is intended to keep collections updated by providing delta harvest capability. Therefore, the date cannot be more than one year before the current date. |
A JSON request to the API for records since 2022-12-31 would look like this:
curl -X GET "https://polardata.ca/api/metadata/since/2022-12-31?page=0" -H "accept: */*"
An XML request to the API for records since 2022-12-31 would look like this:
curl -X GET "https://polardata.ca/api/metadata/xml/since/2022-12-31?page=0" -H "accept: */*"
Responses
Code | Description |
---|---|
200 | OK - The requested page or id was found |
400 | Bad Request - The request was malformed or invalid |
404 | Not Found - The requested page or id was not found |
429 | Too Many Requests - This API is rate limited |
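Because this endpoint restricts both the format and the range of the date, it can help to check the value locally before spending a request on it. The sketch below uses base R only. build_since_url is a hypothetical helper, and treating the one-year boundary as inclusive and rejecting future dates are assumptions, not documented behaviour.

```r
# Hypothetical helper: validates a cutoff date and builds the "since" URL.
# `today` is a parameter so the one-year window can be checked reproducibly.
build_since_url <- function(date, page = 0, today = Sys.Date()) {
  # Require the literal YYYY-MM-DD shape before parsing.
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", date)) {
    stop("date must be written as YYYY-MM-DD")
  }
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("date is not a real calendar date")
  }
  # Earliest permitted cutoff: one year before `today` (assumed inclusive).
  earliest <- seq(today, by = "-1 year", length.out = 2)[2]
  if (parsed < earliest || parsed > today) {
    stop("date must fall within the past year")
  }
  paste0("https://polardata.ca/api/metadata/since/", date,
         "?page=", as.integer(page))
}

print(build_since_url("2022-12-31", today = as.Date("2023-03-31")))
```

In normal use the today argument is left at its default, so the window always tracks the current date.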
The title search is not case sensitive. Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.
https://polardata.ca/api/metadata/title/{title}?page={page}
Parameters | Description |
---|---|
{page} | A page number used to retrieve a corresponding batch of records. Default value : 0 |
{title} | Text that is equal to, or part of, an existing record title, using %20 as the space character |
A JSON request to the API for records with title "ice cap" and page '0' would look like this:
curl -X GET "https://polardata.ca/api/metadata/title/ice%20cap?page=0" -H "accept: */*"
An XML request to the API for records with title "ice cap" and page '0' would look like this:
curl -X GET "https://polardata.ca/api/metadata/xml/title/ice%20cap?page=0" -H "accept: */*"
Responses
Code | Description |
---|---|
200 | OK - The requested page or id was found |
400 | Bad Request - The request was malformed or invalid |
404 | Not Found - The requested page or id was not found |
429 | Too Many Requests - This API is rate limited |
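All endpoints share the same four response codes, so a client can decide what to do next from the status alone. The following is one possible mapping, sketched in base R. The next_action helper and its action labels are hypothetical, and the suggested reactions are an interpretation of the table rather than documented API behaviour.

```r
# Hypothetical helper: maps an HTTP status code to a suggested next step.
next_action <- function(status) {
  switch(as.character(status),
    "200" = "parse",   # records returned: parse them, then request the next page
    "400" = "fix",     # malformed request: correct the URL before retrying
    "404" = "stop",    # no such page or id: end the paging loop
    "429" = "wait",    # rate limited: pause until the request allowance resets
    "unknown"          # anything else: inspect the response manually
  )
}

print(next_action(200))
print(next_action(429))
```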
This API allows a maximum of 500 requests per day. Excess requests will result in an HTTP 429 - Too Many Requests response status code, indicating that the user has sent too many requests in the given amount of time. As a reference, requesting the entire PDC metadata collection requires only about 150 API calls.
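The arithmetic behind the request allowance can be checked before a harvest starts. This is a rough sketch in base R: the function names are hypothetical, the record count in the example is made up for illustration, and the extra call assumes a harvest loop that only stops once it receives a non-200 response.

```r
# Records per response, as stated for the paged endpoints.
batch_size <- 20
# Daily request allowance stated above.
daily_limit <- 500

# Calls needed to page through `n_records`, plus one final call that
# returns a non-200 status and ends the loop (an assumption about the client).
calls_needed <- function(n_records) {
  ceiling(n_records / batch_size) + 1
}

# TRUE if harvesting `n_records` fits in what is left of today's allowance.
fits_in_budget <- function(n_records, calls_already_made = 0) {
  calls_needed(n_records) + calls_already_made <= daily_limit
}

# Illustrative figure only; the real collection size is not stated here.
print(calls_needed(3000))
print(fits_in_budget(3000, calls_already_made = 400))
```

A check like this is mainly useful when several harvests, or repeated test runs, share the same day's allowance.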
This section shows how to programmatically invoke API responses and capture the contents for further programmatic manipulation. The following examples use the R programming language which can be used for many data related activities, such as inferential statistics, plot generation, and interactive data visualizations. Therefore, all code snippets will show how to convert API responses from JSON into an R data frame.
Because the metadata API only returns a fixed number of records per call, we need to use conditional looping to sequentially harvest and aggregate the responses. The general strategy involves incrementing the requested page number until no further records are returned, as indicated by the response code. More advanced implementations are encouraged to use R's tryCatch feature, which can be very helpful for detecting and skipping malformed URLs, as well as reporting other issues as console output (see below).
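# Packages providing GET/status_code/content, fromJSON, and bind_rows
library(httr)
library(jsonlite)
library(dplyr)

metadata <- data.frame()
apiURL <- "https://polardata.ca/api/metadata?page="
page <- 0
status <- 200
# Request successive pages until a non-200 status signals the end
while (status == 200) {
  apiCall <- paste0(apiURL, page)
  response <- GET(apiCall)
  status <- status_code(response)
  print(paste0(apiCall, ": ", status))
  if (status == 200) {
    # Parse the JSON body and append this batch to the running data frame
    json_text <- content(response, "text", encoding = "UTF-8")
    json_data <- fromJSON(json_text, flatten = TRUE)
    metadata <- bind_rows(metadata, json_data$itemListElement)
    page <- page + 1
  }
}
View(metadata)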
We can approximate a topic search by using the Search by Title endpoint. In this example, we set up a variable called topic, so that only its value needs to change for a new search. To search a list of topics instead of a single value, nested looping could be used, where the outer loop cycles through the list. Note the use of the tryCatch block in this implementation.
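# Packages providing GET/status_code/content, fromJSON, bind_rows, and str_replace_all
library(httr)
library(jsonlite)
library(dplyr)
library(stringr)

metadata <- data.frame()
apiURL <- "https://polardata.ca/api/metadata/title/"
topic <- "ice cap"
page <- 0
status <- 200
tryCatch({
  while (status == 200) {
    # Drop slashes and encode spaces before building the request URL
    encodedTopic <- str_replace_all(str_replace_all(topic, "/", ""), " ", "%20")
    apiCall <- paste0(apiURL, encodedTopic, "?page=", page)
    response <- GET(apiCall)
    status <- status_code(response)
    print(paste0(apiCall, ": ", status))
    if (status == 200) {
      json_text <- content(response, "text", encoding = "UTF-8")
      json_data <- fromJSON(json_text, flatten = TRUE)
      metadata <- bind_rows(metadata, json_data$itemListElement)
      page <- page + 1
    }
  }
}, error = function(e) {
  # Report the problem on the console instead of aborting the script
  print(paste0("Error: ", conditionMessage(e)))
})
View(metadata)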
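The nested looping mentioned above, for a list of topics, could be structured along the following lines. This is a sketch under stated assumptions, not a tested harvester: harvest_topics and fake_fetch are hypothetical names, fetch_page is assumed to return a list with a status code and a data frame of records, and the stand-in fetcher makes no network requests so that the control flow can be run offline.

```r
# Hypothetical sketch: harvest several topics with a nested loop.
# `fetch_page(topic, page)` is an assumed function returning a list with
# `status` (HTTP status code) and `records` (a data frame for that page).
harvest_topics <- function(topics, fetch_page, max_pages = 100) {
  results <- list()
  for (topic in topics) {                  # outer loop: one pass per topic
    page <- 0
    while (page < max_pages) {             # inner loop: page through results
      response <- fetch_page(topic, page)
      if (response$status != 200) break    # stop at the first non-200 reply
      records <- response$records
      records$topic <- rep(topic, nrow(records))  # remember the search term
      results[[length(results) + 1]] <- records
      page <- page + 1
    }
  }
  if (length(results) == 0) return(data.frame())
  do.call(rbind, results)
}

# Stand-in fetcher for demonstration only: two pages per topic, then a 404.
fake_fetch <- function(topic, page) {
  if (page < 2) {
    list(status = 200,
         records = data.frame(title = paste(topic, "record", page),
                              stringsAsFactors = FALSE))
  } else {
    list(status = 404, records = NULL)
  }
}

demo <- harvest_topics(c("ice cap", "sea ice"), fake_fetch)
print(demo)
```

In practice, fetch_page would wrap the GET, status_code, and fromJSON calls from the example above. The max_pages guard is an extra safety limit so that an unexpected run of 200 responses cannot exhaust the daily request allowance.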
Characters such as spaces and slashes, as well as other special characters, can cause issues with API requests when they are not properly encoded within the request. For this API, all spaces should be replaced with %20 (e.g. ice%20cap, instead of ice cap) and all slashes should be omitted (e.g. POLARCHARS, instead of POLAR/CHARS).
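The two rules above can be wrapped in a small base R function so that every search term is cleaned the same way. encode_search_term is a hypothetical helper name, and it handles only spaces and slashes; other special characters are left untouched.

```r
# Hypothetical helper applying the two encoding rules described above.
encode_search_term <- function(term) {
  term <- gsub("/", "", term, fixed = TRUE)   # omit slashes entirely
  gsub(" ", "%20", term, fixed = TRUE)        # encode spaces as %20
}

print(encode_search_term("ice cap"))
print(encode_search_term("POLAR/CHARS"))
```

For terms containing other special characters, base R's utils::URLencode(term, reserved = TRUE) may be worth trying, although how this API handles the resulting encodings has not been verified here.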
Ensure that arguments, like page numbers, make sense and remain within their expected scope, to avoid 404 responses. For example, requesting a negative page number violates the expected range of values. Similarly, requesting a page number greater than the maximum value will also cause a 404 response. However, in this case, since the maximum value is unknown by the caller, it might be necessary to attempt an out-of-bounds call, as shown in the example above. In such cases, response testing should immediately terminate incremental loops, in order to truncate invalid request attempts.
If an API request is correctly structured but still fails, this suggests an issue with the endpoint itself. In most cases, the cause is that the API service, or a dependency it needs, is unavailable. However, it is also possible that the API has been updated to a newer version in which the endpoint is no longer available. Such revisions are generally avoided to prevent broken requests, but they might eventually become unavoidable. This can be confirmed by checking the API version information and the general documentation. Another potential issue is that the API could have migrated to another server.
The PDC Metadata API provides a service for acquiring metadata records using a variety of endpoints. The API can be used manually, or it can be invoked in an automated fashion. The API is intended to support activities such as ad hoc queries, automated metadata harvesting, programmatic data mining, and data visualization development. Ultimately, our goal is to provide easy access to project resources for all interested parties, in the spirit of the FAIR Data Principles.