Title: | Interface to the iDigBio Data API |
---|---|
Description: | An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions, such as the metadata endpoints, return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio. |
Authors: | Francois Michonneau [aut, cph] (Original Author), Matthew Collins [aut] (Original Author), Scott Chamberlain [ctb], Kevin Love [ctb], Hem Nalini Morzaria-Luna [ctb], Michelle L. Gaynor [ctb, aut], Jesse Bennett [cre] (Maintainer) |
Maintainer: | Jesse Bennett <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.1 |
Built: | 2025-01-25 05:42:05 UTC |
Source: | https://github.com/idigbio/ridigbio |
Given the fields to be returned, adds an exclusion for the raw data array when warranted and handles the "all" keyword. It does so without setting both fields and fields_exclude, because the API returns wrong results if both are passed. Both can still be set if the user deliberately passes them. Not exported.
build_field_lists(fields, type)
fields |
character vector of fields user wants returned |
type |
type of records to get fields for |
a list with a fields key holding the data.frame fields and a query key holding parameters to be merged into the query sent to the API
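The field-list rules can be pictured with a small base-R sketch. It is a hypothetical illustration, not the package's implementation: the helper name `sketch_field_lists` and its `indexed_fields` argument are assumptions (the real function looks up the indexed fields for `type` itself), and treating only `FALSE`/`"all"` as the case that warrants excluding the data array is also an assumption. What it does show is the documented rule that `fields` and `fields_exclude` are never both set.

```r
# Hypothetical sketch of the field-list rules; the helper name and the
# indexed_fields argument are assumptions made for illustration.
sketch_field_lists <- function(fields, indexed_fields) {
  ret <- list(fields = NULL, query = list())
  # No selection (FALSE) and the "all" keyword are handled alike:
  # return every indexed field and exclude the bulky raw "data" array.
  if (isFALSE(fields) || identical(fields, "all")) {
    ret$fields <- indexed_fields
    ret$query$fields_exclude <- "data"
  } else {
    # An explicit selection is passed through unchanged; fields_exclude
    # is left unset so the two parameters never appear together.
    ret$fields <- fields
    ret$query$fields <- fields
  }
  ret
}

indexed <- c("uuid", "genus", "scientificname", "geopoint")
all_fields <- sketch_field_lists("all", indexed)
str(all_fields$query)   # only fields_exclude is set
some_fields <- sketch_field_lists(c("uuid", "data.dwc:verbatimLatitude"), indexed)
str(some_fields$query)  # only fields is set
```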
Function to build attribution dataframe from a query to the iDigBio API
idig_build_attrib(dat)
dat |
dataframe generated by idig_search method |
This function differs from the attribution metadata that is attached to the dataframe returned by the idig_search_* methods. It summarizes the record sets used by records in the dataframe, not the record sets that have records that match the query sent to iDigBio. This is useful if only part of the records for a query are downloaded, for example with the limit and offset parameters.
Exported.
a data frame
Kevin Love
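The summarizing step amounts to a per-record-set tally of the downloaded rows. The sketch below is a hypothetical base-R illustration that assumes the data frame has a `recordset` column (one of the default fields of idig_search_records); the helper name is made up, and the real function also retrieves attribution metadata for each record set from the API, which is not shown here.

```r
# Hypothetical sketch: count how many downloaded records come from each
# record set. Assumes a 'recordset' column; the API lookup of record set
# attribution metadata is deliberately omitted.
tally_recordsets <- function(dat) {
  counts <- table(as.character(dat$recordset))
  data.frame(
    recordset = names(counts),
    record_count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Stand-in for a partially downloaded result (identifiers are made up)
dat <- data.frame(
  uuid = c("a", "b", "c"),
  recordset = c("rs-1", "rs-2", "rs-1")
)
tally_recordsets(dat)
```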
Checks for HTTP error codes and JSON errors.
idig_check(req)
req |
the returned request |
Part 1 of the error checking process. This part handles HTTP error codes and then calls part 2 which handles JSON errors in the responses. Not exported.
nothing. Stops if HTTP code is >= 400
Francois Michonneau
Checks for error messages that can be returned by the API in JSON.
idig_check_error(req)
req |
the returned request |
Part 2 of the error checking process. Checks the JSON response for error messages and stops if any are found. Not exported.
nothing. Stops if request contains an error.
Francois Michonneau
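The two-part check can be sketched without any HTTP library. In this hypothetical illustration the response is modelled as a plain status code plus an already-parsed JSON body (a list); the helper names and the assumption that the API reports problems in an `error` element are illustrative only.

```r
# Hypothetical sketch of the two-part error check. The response is a
# status code plus a parsed JSON body; the "error" element name is an
# assumption.
check_json_error <- function(body) {
  # [[ ]] avoids the partial name matching that $ performs on lists
  if (!is.null(body[["error"]])) {
    stop("API error: ", body[["error"]], call. = FALSE)
  }
  invisible(NULL)
}

check_http <- function(status, body) {
  # Part 1: any HTTP status of 400 or above is fatal
  if (status >= 400) {
    stop("HTTP error ", status, call. = FALSE)
  }
  # Part 2: a successful status can still carry an error message in JSON
  check_json_error(body)
  invisible(NULL)
}

check_http(200, list(itemCount = 5))  # passes silently
err <- tryCatch(check_http(200, list(error = "bad query")),
                error = conditionMessage)
print(err)
```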
Count media records matching a query.
idig_count_media(rq = FALSE, mq = FALSE, ...)
rq |
iDigBio record query in nested list format |
mq |
iDigBio media query in nested list format |
... |
additional parameters |
Quickly return a count of the media records matching the query(s) provided.
count of media records matching the query(s)
Matthew Collins
Count specimen records matching a query.
idig_count_records(rq = FALSE, ...)
rq |
iDigBio record query in nested list format |
... |
additional parameters |
Quickly return a count of the specimen records matching the query(s) provided.
count of specimen records matching the query(s)
Matthew Collins
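A short usage sketch for the two count functions follows. The query values are examples only (the `data.ac:accessURI` media field is taken from the idig_search_media example in this manual), and the network calls are wrapped in a guard so the block also runs where ridigbio is not installed or no interactive session is available.

```r
# Record and media queries in the nested-list format used throughout
rq <- list(genus = "acer")
mq <- list("data.ac:accessURI" = list(type = "exists"))

# Only contact iDigBio in an interactive session with the package installed
if (interactive() && requireNamespace("ridigbio", quietly = TRUE)) {
  n_records <- ridigbio::idig_count_records(rq = rq)
  n_media <- ridigbio::idig_count_media(rq = rq, mq = mq)
  message("records: ", n_records, ", media: ", n_media)
}
```

Because counting is fast, checking the count first is a cheap way to decide whether a full download is worthwhile.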
Internal function for GET requests.
idig_GET(path, ...)
path |
endpoint |
... |
additional arguments to be passed to httr::GET |
Generates a GET request and performs the checks on what is returned. Not exported.
the request (as a list)
Francois Michonneau
List of fields in iDigBio.
idig_meta_fields(type = "records", subset = FALSE, ...)
type |
string type of fields to return, defaults to "records" |
subset |
set of fields to return, "indexed", "raw", or unset for all |
... |
additional parameters |
Return a list of media or specimen fields that are contained in iDigBio.
list of fields of the requested type
Matthew Collins
Parses output of successful query to return a list.
idig_parse(req)
req |
the returned request |
Not exported.
a list
Francois Michonneau
Internal function for POST requests.
idig_POST(path, body, ...)
path |
endpoint |
body |
a list of parameters for the endpoint |
... |
additional arguments to be passed to httr::POST |
Generates a POST request and performs the checks on what is returned. Not exported.
the request (as a list)
Francois Michonneau
Base function to query the iDigBio API
idig_search( type = "records", mq = FALSE, rq = FALSE, fields = FALSE, max_items = 1e+05, limit = 0, offset = 0, sort = FALSE, ... )
type |
string type of records to query, defaults to "records" |
mq |
iDigBio media query in nested list format |
rq |
iDigBio record query in nested list format |
fields |
vector of fields that will be contained in the data.frame |
max_items |
maximum number of results allowed to be retrieved (fail-safe). Currently ignored; see issue #33. |
limit |
maximum number of results returned |
offset |
number of results to skip before returning results |
sort |
vector of fields to use for sorting, UUID is always appended to make paging safe |
... |
additional parameters |
This function is wrapped for media and specimen record searches. Please consider using idig_search_media or idig_search_records instead, as they supply nice defaults to this function depending on the type of records desired. Fuller documentation of parameters is in the idig_search_records function's help.
Exported to facilitate wrapping this package in other packages.
a data frame
Francois Michonneau
```r
## Not run:
# Ten media records related to genus Acer specimens
idig_search(type="media", rq=list(genus="acer"), limit=10)
## End(Not run)
```
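The note on the sort parameter (UUID is always appended to make paging safe) can be illustrated with a small helper. This is a hypothetical sketch; how the package actually encodes sort fields in the request is not shown. The reasoning: records that tie on the user's sort fields have no fixed order, so successive limit/offset pages could overlap or skip records, whereas a unique final key such as uuid makes the order total and repeatable.

```r
# Hypothetical helper: append "uuid" as a final tiebreaker so records with
# identical values in the user's sort fields always come back in the same
# order, keeping limit/offset pages from overlapping or skipping records.
sort_with_uuid <- function(sort = FALSE) {
  if (isFALSE(sort)) {
    return("uuid")
  }
  if ("uuid" %in% sort) sort else c(sort, "uuid")
}

sort_with_uuid()                      # returns "uuid"
sort_with_uuid(c("genus", "family"))  # returns "genus" "family" "uuid"
```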
Function to query the iDigBio API for media records
idig_search_media( mq = FALSE, rq = FALSE, fields = FALSE, max_items = 1e+05, limit = 0, offset = 0, sort = FALSE, ... )
mq |
iDigBio media query in nested list format |
rq |
iDigBio record query in nested list format |
fields |
vector of fields that will be contained in the data.frame; by default all indexed fields are returned (equivalent to "all") |
max_items |
maximum number of results allowed to be retrieved (fail-safe) |
limit |
maximum number of results returned |
offset |
number of results to skip before returning results |
sort |
vector of fields to use for sorting, UUID is always appended to make paging safe |
... |
additional parameters |
Also see idig_search_records for full examples of all the parameters related to searching iDigBio.
Wraps idig_search to provide defaults specific to searching media records. Using this function instead of idig_search directly is recommended. Both record queries and media queries are allowed (the rq and mq parameters), and the media records returned will match the requirements of both.
This function defaults to returning all indexed media record fields.
A data frame with fields requested or the following default fields:
datemodified: Date last modified, which is assigned by iDigBio.
dqs: Data quality score assigned by iDigBio.
etag: Tag assigned by iDigBio.
flags: Data quality flag assigned by iDigBio.
hasSpecimen: TRUE or FALSE, indicates if there is an associated record for this media.
mediatype: Media object type.
recordids: List of UUIDs for associated records.
records: UUID for the associated record.
recordset: Record set ID assigned by iDigBio.
uuid: Unique identifier assigned by iDigBio.
version: Media record version assigned by iDigBio.
xpixels: As defined by EXIF, x dimension in pixels.
ypixels: As defined by EXIF, y dimension in pixels.
Matthew Collins
```r
## Not run:
# Searching for media using a query on related specimen information - first
# 10 media records with image URIs related to a specimen in the genus Acer:
df <- idig_search_media(rq=list(genus="acer"),
                        mq=list("data.ac:accessURI"=list("type"="exists")),
                        fields=c("uuid","data.ac:accessURI"), limit=10)
## End(Not run)
```
Function to query the iDigBio API for specimen records
idig_search_records( rq, fields = FALSE, max_items = 1e+05, limit = 0, offset = 0, sort = FALSE, ... )
rq |
iDigBio record query in nested list format |
fields |
vector of fields that will be contained in the data.frame, limited set returned by default, use "all" to get all indexed fields |
max_items |
maximum number of results allowed to be retrieved (fail-safe) |
limit |
maximum number of results returned |
offset |
number of results to skip before returning results |
sort |
vector of fields to use for sorting, UUID is always appended to make paging safe |
... |
additional parameters |
Wraps idig_search to provide defaults specific to searching specimen records. Using this function instead of idig_search directly is recommended.
Queries need to be specified as a nested list structure that will serialize to an iDigBio query object's JSON as expected by the iDigBio API: https://github.com/iDigBio/idigbio-search-api/wiki/Query-Format
As an example, the first sample query looks like this in JSON in the API documentation:
{ "scientificname": { "type": "exists" }, "family": "asteraceae" }
To rewrite this in R for use as the rq parameter to idig_search_records or idig_search_media, it would look like this:
rq <- list("scientificname"=list("type"="exists"), "family"="asteraceae" )
An example of a more complex JSON query with nested structures:
{ "geopoint": { "type": "geo_bounding_box", "top_left": { "lat": 19.23, "lon": -130 }, "bottom_right": { "lat": -45.1119, "lon": 179.99999 } } }
To rewrite this in R for use as the rq parameter, use nested calls to the list() function:
rq <- list(geopoint=list( type="geo_bounding_box", top_left=list(lat=19.23, lon=-130), bottom_right=list(lat=-45.1119, lon= 179.99999) ) )
See the Examples section below for more samples of simpler and more complex queries. Please refer to the API documentation for the full functionality available in queries.
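To check that a nested list mirrors the intended JSON, it can help to serialize it and compare. The minimal serializer below is a hypothetical base-R sketch (the name `to_json` is made up) that handles only named lists with scalar leaves and does not escape special characters; a real client would normally use a JSON library such as jsonlite for this step.

```r
# Hypothetical minimal serializer for query lists: named lists become JSON
# objects, character scalars become strings, numbers are written as-is.
# Special characters are NOT escaped; this is for illustration only.
to_json <- function(x) {
  if (is.list(x)) {
    keys <- names(x)
    parts <- vapply(seq_along(x), function(i) {
      paste0('"', keys[i], '":', to_json(x[[i]]))
    }, character(1))
    return(paste0("{", paste(parts, collapse = ","), "}"))
  }
  if (is.character(x)) return(paste0('"', x, '"'))
  if (is.logical(x)) return(tolower(as.character(x)))
  as.character(x)
}

rq_simple <- list("scientificname" = list("type" = "exists"),
                  "family" = "asteraceae")
rq_geo <- list(geopoint = list(
  type = "geo_bounding_box",
  top_left = list(lat = 19.23, lon = -130),
  bottom_right = list(lat = -45.1119, lon = 179.99999)
))
cat(to_json(rq_simple), "\n")
cat(to_json(rq_geo), "\n")
```

Both outputs match the JSON queries shown above, apart from whitespace.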
All matching results are returned up to the max_items cap (default 100,000). If more results are wanted, a higher max_items can be passed as an option. The API returns records 5,000 at a time over HTTP, so downloading large result sets is slow: expect result sets over 50,000 records to take tens of minutes. The idig_count_records and idig_count_media functions are fast and can be used to find out how many records a query will return before downloading them.
The iDigBio API returns at most 5,000 records per request, but this function automatically pages through the results and returns them all. Limit and offset are available if manual paging of results is needed, though the max_items cap still applies. The item count comes from the results header, not from the number of records actually in the limit/offset window.
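The automatic paging can be sketched as planning successive (offset, limit) windows. In this hypothetical illustration the page size of 5,000 comes from the text above, while the helper name `plan_pages` and the rule of stopping at the smaller of the reported item count and the max_items cap are assumptions.

```r
# Hypothetical sketch: plan the (offset, limit) windows needed to fetch
# 'total' matching records, page_size at a time, capped at 'max_items'.
plan_pages <- function(total, max_items = 1e5, page_size = 5000, offset = 0) {
  last <- min(total, offset + max_items)
  offsets <- numeric(0)
  limits <- numeric(0)
  while (offset < last) {
    size <- min(page_size, last - offset)
    offsets <- c(offsets, offset)
    limits <- c(limits, size)
    offset <- offset + size
  }
  data.frame(offset = offsets, limit = limits)
}

plan_pages(12000)  # three windows: 0/5000, 5000/5000, 10000/2000
```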
Return is a data.frame containing the requested fields (or the default fields). The columns in the data frame are untyped and no factors are pre-built. Attribution and other metadata are attached to the data.frame as attributes (accessible with attributes(df)).
A data frame with fields requested or the following default fields:
uuid: Unique identifier assigned by iDigBio.
family: May be reassigned by iDigBio.
genus: May be reassigned by iDigBio.
scientificname: May be reassigned by iDigBio.
country: May be modified by iDigBio.
geopoint: Assigned by iDigBio.
datecollected: May be reassigned by iDigBio.
collector: Assigned by iDigBio.
recordset: Assigned by iDigBio.
Matthew Collins
```r
## Not run:
# Simple example of retrieving records in a genus:
idig_search_records(rq=list(genus="acer"), limit=10)

# This complex query shows that booleans passed to the API are represented
# as strings in R, fields used in the query don't have to be returned, and
# the syntax for accessing raw data fields:
idig_search_records(rq=list("hasImage"="true", genus="acer"),
                    fields=c("uuid", "data.dwc:verbatimLatitude"), limit=100)

# Searching inside a raw data field for a string. Note that raw data fields
# are searched as full text, while indexed fields are searched with exact matches:
idig_search_records(rq=list("data.dwc:dynamicProperties"="parasite"),
                    fields=c("uuid", "data.dwc:dynamicProperties"), limit=100)

# Retrieving a data.frame for use with MaxEnt. Notice geopoint is expanded
# to two columns in the data.frame: geopoint.lat and geopoint.lon:
df <- idig_search_records(rq=list(genus="acer", geopoint=list(type="exists")),
                          fields=c("uuid", "geopoint"), limit=10)
write.csv(df[c("uuid", "geopoint.lon", "geopoint.lat")],
          file="acer_occurrences.csv", row.names=FALSE)
## End(Not run)
```
Top media records summaries.
idig_top_media(rq = FALSE, mq = FALSE, top_fields = FALSE, count = 0, ...)
rq |
iDigBio record query in nested list format |
mq |
iDigBio media query in nested list format |
top_fields |
vector of field names to summarize by |
count |
maximum number of results to return, capped at 1000 |
... |
additional parameters |
Summarize the count of media records in iDigBio according to unique values in the fields passed. This operates similarly to a SELECT field_name, COUNT(*) ... GROUP BY field_name query in SQL. When multiple fields are passed, the summaries are nested, e.g. top_fields=c("country", "genus") would result in counting the top 10 genera in each of the top 10 countries, for a total of up to 100 counts.
nested list of field values with counts of media records
Matthew Collins
Top specimen records summaries.
idig_top_records(rq = FALSE, top_fields = FALSE, count = 0, ...)
rq |
iDigBio record query in nested list format |
top_fields |
vector of field names to summarize by |
count |
maximum number of results to return, capped at 1000 |
... |
additional parameters |
Summarize the count of specimen records in iDigBio according to unique values in the fields passed. This operates similarly to a SELECT field_name, COUNT(*) ... GROUP BY field_name query in SQL. When multiple fields are passed, the summaries are nested, e.g. top_fields=c("country", "genus") would result in counting the top 10 genera in each of the top 10 countries, for a total of up to 100 counts.
nested list of field values with counts of specimen records
Matthew Collins
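The nested summary can be mimicked on a local data frame to see its shape. This base-R sketch is a hypothetical offline analogue, not the API's algorithm or response format (the helper name `top_counts` and the `count` element are made up): for each of the top n values of the first field, it counts the top n values of the next field.

```r
# Hypothetical offline analogue of a nested "top values" summary.
# Returns a nested list: value -> list(count = k, <next field> = ...).
top_counts <- function(df, fields, n = 10) {
  tab <- table(as.character(df[[fields[1]]]))
  counts <- as.integer(tab)
  names(counts) <- names(tab)
  counts <- head(sort(counts, decreasing = TRUE), n)
  result <- list()
  for (value in names(counts)) {
    entry <- list(count = counts[[value]])
    if (length(fields) > 1) {
      # Restrict to rows with this value, then summarize the next field
      rows <- df[as.character(df[[fields[1]]]) == value, , drop = FALSE]
      entry[[fields[2]]] <- top_counts(rows, fields[-1], n)
    }
    result[[value]] <- entry
  }
  result
}

specimens <- data.frame(
  country = c("us", "us", "us", "mx", "mx", "ca"),
  genus = c("acer", "acer", "quercus", "pinus", "pinus", "acer")
)
res <- top_counts(specimens, c("country", "genus"), n = 2)
str(res)
```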
Return base URL for the API calls.
idig_url(dev = FALSE)
dev |
Should the beta version of the API be used? |
Defaults to the production URL (dev = FALSE); set dev = TRUE to use the beta URL. Not exported.
string for the URL
Francois Michonneau
Stub function for validating parameters.
idig_validate(inputs)
inputs |
list of inputs to validate |
Takes a list of inputs named by validation rule, e.g. number:[2, 3], and returns a vector of strings with any validation errors. If the vector is 0 length, everything is valid. Not exported.
boolean
Matthew Collins
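The documented contract (inputs named by rule, errors returned as strings, an empty vector meaning valid) could be filled in along the following lines. This is a hypothetical sketch, not the stub's behaviour: the helper name is made up, only a `number` rule is implemented, and its meaning (every value must be a non-missing number) is an assumption.

```r
# Hypothetical sketch of the validation contract: inputs are named by rule,
# and the result is a character vector of errors (length 0 means valid).
sketch_validate <- function(inputs) {
  errors <- character(0)
  for (i in seq_along(inputs)) {
    rule <- names(inputs)[i]
    value <- inputs[[i]]
    if (rule == "number") {
      if (!is.numeric(value) || anyNA(value)) {
        errors <- c(errors, "number: all values must be non-missing numbers")
      }
    } else {
      errors <- c(errors, paste0("unknown validation rule: ", rule))
    }
  }
  errors
}

sketch_validate(list(number = c(2, 3)))      # character(0), i.e. valid
sketch_validate(list(number = c("2", "x")))  # one error message
```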
Return the version number to use for the API calls.
idig_version(version = "v2")
version |
optional argument giving the version of the API to use |
The current default is "v2". Not exported.
string for the version to use
Francois Michonneau
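How the base URL, API version and endpoint path combine can be sketched as plain string assembly. The host names below (search.idigbio.org and beta-search.idigbio.org) and the endpoint paths are assumptions about the public iDigBio search API, and the helpers are hypothetical stand-ins for idig_url and idig_version, not the package's own code.

```r
# Hypothetical helpers mirroring idig_url()/idig_version(); the host names
# are assumptions about the public iDigBio search API.
sketch_url <- function(dev = FALSE) {
  if (dev) "https://beta-search.idigbio.org" else "https://search.idigbio.org"
}

sketch_endpoint <- function(path, version = "v2", dev = FALSE) {
  # e.g. path = "search/records" gives <base>/v2/search/records
  paste(sketch_url(dev), version, path, sep = "/")
}

sketch_endpoint("search/records")
sketch_endpoint("view/records", dev = TRUE)
```

With these defaults the production host is used unless dev = TRUE, in line with the dev = FALSE default in the idig_url signature.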
View individual media records.
idig_view_media(uuid, ...)
uuid |
uuid of media record |
... |
additional parameters |
View all information about a specific media record.
nested list of data
Matthew Collins
View individual specimen records.
idig_view_records(uuid, ...)
uuid |
uuid of specimen record |
... |
additional parameters |
View all information about a specific specimen record.
nested list of data
Matthew Collins
Stub function for passing import checks
ignore_unused_imports()
Retrieve data from the iDigBio specimen data repository.
ridigbio provides an interface to the iDigBio data API described here: https://www.idigbio.org/wiki/index.php/IDigBio_API. With this package you can retrieve specimen and media records from the iDigBio data repository. The iDigBio portal https://portal.idigbio.org/ uses the same API so you should be able to retrieve the same information as shown in the portal.
iDigBio contains nearly 30 million data records on museum specimens held at United States institutions. It also holds nearly 5 million images of these specimens.
The main function is idig_search_records, and reviewing its documentation first with ?idig_search_records is recommended.
This package does not yet provide an interface to the mapping or the download APIs.
To cite the ridigbio package in your work, please use the following format:
Michonneau F, Collins M, Chamberlain SA (2016). ridigbio: An interface to iDigBio's search API that allows downloading specimen records. R package version 0.3.8. https://github.com/iDigBio/ridigbio
Francois Michonneau [email protected]
Matthew Collins [email protected]