Identification of Data Flags

Code here written by Erica Krimmel.

General Overview

In this use case for the iDigBio API, we look at how to search for specimen records that carry a specific data quality flag. See iDigBio’s documentation on data quality flags for more information.

In this demo we will cover how to:

  1. Write a query to search for specimens using idig_search_records
  2. Explore data quality flags

Load Packages

# Load core libraries; install these packages if you have not already
library(ridigbio)
library(tidyverse)

# Load library for making nice HTML output
library(kableExtra)

Write a query to search for specimen records

First, let’s find all the specimen records for the data quality flag we are interested in. Do this using the idig_search_records function from the ridigbio package. You can learn more about this function from the iDigBio API documentation and ridigbio documentation.

In this example, we start by searching for specimens from one institution (institution code “fmnh”) that are flagged with “rev_geocode_flip”. This flag means that iDigBio has swapped the values of the latitude and longitude fields in order to place the coordinate point in the country stated by the record. For example, suppose iDigBio ingests a record with the coordinates “-87.646166, 41.89542” (latitude, longitude) that says it was collected in the United States, but whose verbatim coordinates actually plot to Antarctica. If the latitude and longitude are flipped, the coordinates plot to the United States, so iDigBio assumes that this is what the data provider meant.

# Edit the fields (e.g. `flags` or `institutioncode`) and values (e.g. 
# "rev_geocode_flip" or "fmnh") in `list()` to adjust your query and the fields
# (e.g. `uuid`) in `fields` to adjust the columns returned in your results
records <- idig_search_records(rq = list(flags = "rev_geocode_flip",
                                         institutioncode = "fmnh"),
                               fields = c("uuid",
                                          "institutioncode",
                                          "collectioncode",
                                          "country",
                                          "data.dwc:country",
                                          "stateprovince",
                                          "county",
                                          "locality",
                                          "geopoint",
                                          "data.dwc:decimalLongitude",
                                          "data.dwc:decimalLatitude"),
                               limit = 100000) %>%
  # Rename fields to more easily reflect their provenance (either from the
  # data provider directly or modified by the data aggregator)
  rename(provider_lon = `data.dwc:decimalLongitude`,
         provider_lat = `data.dwc:decimalLatitude`,
         provider_country = `data.dwc:country`,
         aggregator_lon = `geopoint.lon`,
         aggregator_lat = `geopoint.lat`,
         aggregator_country = country,
         aggregator_stateprovince = stateprovince,
         aggregator_county = county,
         aggregator_locality = locality) %>% 
  # Reorder columns for easier viewing
  select(uuid, institutioncode, collectioncode, provider_lat, aggregator_lat,
         provider_lon, aggregator_lon, provider_country, aggregator_country,
         aggregator_stateprovince, aggregator_county, aggregator_locality)

Here is what the query results look like:

uuid institutioncode collectioncode provider_lat aggregator_lat provider_lon aggregator_lon provider_country aggregator_country aggregator_stateprovince aggregator_county aggregator_locality
04dba613-bb9a-4281-8dba-eb4bf59cd777 fmnh mammals -88.107013 41.86614 41.86614 -88.107013 United States of America united states illinois dupage wheaton
05679624-d82c-4488-bd4b-ab13f40abb0b fmnh mammals 75 38.00000 38 75.000000 China china xinjiang uygur kashi pref taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
0bdf0231-dae7-4de5-a43b-c756e96cb74e fmnh mammals -87.818397 42.03420 42.034196 -87.818397 United States of America united states illinois cook co. pheasent and harlem
0de28396-f117-4a0f-bca7-0d08cc58dc5a fmnh mammals -88.140531 41.79461 41.79461 -88.140531 United States of America united states illinois dupage co. naperville, 1520 maple knoll ct.
109555a6-3fcf-43ec-ae83-450ea6e85e5e fmnh fishes -80.85 -6.45000 -6.45 -80.850000 Peru peru NA NA lobos de tierra bay
1252e5dc-1fe6-4d78-a775-ba4c0ae5af67 fmnh mammals -88.067012 41.87753 41.877529 -88.067012 United States of America united states illinois dupage co. roosevelt & park
1561e1ce-23b9-43ab-a59c-2b299037f5b2 fmnh mammals -87.973949 41.75198 41.751975 -87.973949 United States of America united states illinois dupage co. darien
1ac87a63-1df9-48c4-984b-85f52d8d1f95 fmnh mammals -88.050341 41.74697 41.746975 -88.050341 United States of America united states illinois dupage woodridge
1f734bc6-130c-48d2-b47f-26cacfa5c722 fmnh mammals 31.3999996 24.86667 24.8666706 31.400000 Egypt egypt matruh NA salum, sidi omar
22717ba0-9ec5-4e1c-88fc-26452b9cdb22 fmnh mammals 75 38.00000 38 75.000000 China china xinjiang uygur kashi pref taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
27236f31-f92b-4f42-8c76-be4f38599fc7 fmnh mammals -88.050341 41.74697 41.746975 -88.050341 United States of America united states illinois dupage woodridge
2c475317-1113-4dca-a32c-9f7673026a98 fmnh mammals 31.3999996 24.86667 24.8666706 31.400000 Egypt egypt matruh NA salum, sidi omar
30fe1434-1e75-45e2-97d4-84520e0d1f90 fmnh mammals -88.107013 41.86614 41.86614 -88.107013 United States of America united states illinois dupage wheaton
33551039-8928-43fe-be46-26a2fb0f0150 fmnh invertebrate zoology -73 -41.67000 -41.67 -73.000000 Chile chile NA NA chaica, senode reloncavi, llongothue
37c644b4-b1d8-4ab4-8a82-306502700307 fmnh mammals -89.97818 42.08053 42.080535 -89.978180 United States of America united states illinois carroll co. 1 mile south of mount carroll
40f45ef7-1fc3-430e-9b84-d198ef87124a fmnh invertebrate zoology -70.012086 43.74296 43.742961 -70.012086 United States of America united states maine cumberland south harpswell
419092db-710b-4823-9ada-cef1dc27d413 fmnh mammals -87.67913 41.96874 41.968745 -87.679130 United States of America united states illinois cook damen and lawrence
422e0874-3c59-4e97-8838-ab0faed00b16 fmnh mammals -87.968099 42.27394 42.273935 -87.968099 United States of America united states illinois lake co. 911 creastfield ave.
46e7dca6-bd0f-4710-a2ae-066e47a96e59 fmnh invertebrate zoology -73 -41.66670 -41.6667 -73.000000 Chile chile NA NA llangothie, senode, relocnavi, chaica
4b6340c2-8d61-4f06-8539-0c174cd03f3b fmnh mammals 75 38.00000 38 75.000000 China china xinjiang uygur kashi pref taxkorgan tajik aut co, near little kara kul, ‘subashi’ pass
4c5a8228-8b47-4c9b-b7b7-8a4748061691 fmnh mammals -88.058783 41.79092 41.790922 -88.058783 United States of America united states illinois dupage co. lisle, 5321 westview, 60532
4f4ecf74-48cd-4d44-bbec-117ce36cc805 fmnh mammals -89.869212 42.25056 42.250559 -89.869212 United States of America united states illinois stephenson co. near pearl city-loran/nw
4f56899f-6bfd-482a-89e8-d47f31ca6b73 fmnh mammals -88.107013 41.86614 41.86614 -88.107013 United States of America united states illinois dupage wheaton
5c836443-dfbc-4298-a0b9-499f587117b9 fmnh mammals -88.056212 41.88147 41.881469 -88.056212 United States of America united states illinois dupage co. glen ellyn, 735 cresent blvd.
6385b5e2-4219-4154-a4b1-aee2e297f0ee fmnh mammals -88.050341 41.74697 41.746975 -88.050341 United States of America united states illinois dupage woodridge
6b14ca0a-5a3c-4078-9629-385a0fbb0768 fmnh mammals -88.060564 41.84333 41.843331 -88.060564 United States of America united states illinois dupage co. glen ellyn, willowbrook nature trail
6ee726f3-18a8-402f-bdd1-5b8da939dfba fmnh mammals -88.107013 41.86614 41.86614 -88.107013 United States of America united states illinois dupage wheaton
7204a2b2-512b-4431-a95b-c0ed166a0633 fmnh mammals 29.75 24.83333 24.833334 29.750000 Egypt egypt matruh NA siwa oasis, el malfa swamp
7437a46e-f784-4b2b-ba13-b98254b5255b fmnh mammals 29.75 24.83333 24.833334 29.750000 Egypt egypt matruh NA el malfa, siwa, 110 km w
763c9b19-74a7-43a3-9a5b-4684fef8a585 fmnh mammals -88.174751 41.76673 41.766727 -88.174751 United States of America united states illinois dupage co. naperville, river and aurora
79ff24fc-5a16-4adc-8270-a4576176666c fmnh mammals -88.058356 41.87121 41.871205 -88.058356 United States of America united states illinois dupage co. glen ellyn, montclaire and turner
7e51a7c1-de80-4e19-82e8-231cbd440fb7 fmnh mammals -88.107013 41.86614 41.86614 -88.107013 United States of America united states illinois dupage wheaton
7e8abc27-d38a-47a6-937b-978822afc72f fmnh mammals -87.73599 41.79169 41.79169 -87.735990 United States of America united states illinois cook co. chicago, 5555 s. kolmar ave
7f5970d1-b69b-4255-a780-aed284ba1ac8 fmnh mammals -89.4903273 45.59772 45.5977178 -89.490327 United States of America united states wisconsin NA oneida, sec 29, town 36 n, range 8e
7fdb9011-93c3-4d11-a6c2-96e3a4764d19 fmnh mammals 31.3999996 24.86667 24.8666706 31.400000 Egypt egypt matruh NA salum, sidi omar
823e6998-3bc1-43b1-ab51-3f83b945219d fmnh mammals -87.670626 42.02282 42.022825 -87.670626 United States of America united states illinois cook 1550 w. juneway terrace, 60626
836d1a77-3eed-4785-8f3c-7f2bfb33d8ed fmnh mammals -88.087113 41.86226 41.862257 -88.087113 United States of America united states illinois dupage co. blanchard and illinois
920b9297-a114-4474-aa66-ddaaf6e5ca36 fmnh mammals 29.75 24.83333 24.833334 29.750000 Egypt egypt matruh NA siwa oasis, el malfa swamp
92535d43-dcaf-42b9-8e0d-6236a746847d fmnh mammals 75 38.00000 38 75.000000 China china xinjiang uygur kashi pref taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
9757887b-ebef-485e-9acc-cd3ed0aa88e4 fmnh mammals -88.060564 41.84333 41.843331 -88.060564 United States of America united states illinois dupage co. glen ellyn, willowbrook nature trail
97d7edc5-17e3-44b1-8fa8-b2ccb11a9ab2 fmnh mammals -87.92895 41.83281 41.832808 -87.928950 United States of America united states illinois dupage co. kimberly and charlatan
9e1b6b23-7b91-4f95-9a95-8c7ef0a232c8 fmnh mammals -87.963927 44.52909 44.529095 -87.963927 United States of America united states wisconsin brown co. 1660 e. shore dr. 54302
9e925438-7d3f-4c68-8e5b-3406ba816543 fmnh mammals -88.011741 41.84293 41.842926 -88.011741 United States of America united states illinois dupage co. lombard, 190 oakton dr.
9f9f8d71-8ab6-4e1e-abf3-2b2219b93918 fmnh mammals 75 38.00000 38 75.000000 China china xinjiang uygur kashi pref taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
a251909a-4f19-4e09-a7ca-ace1ef25bc71 fmnh mammals 75 38.00000 38 75.000000 China china xinjiang uygur kashi pref taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
a251bd28-a8dc-4c12-b8e9-b5be8825d83e fmnh mammals -88.107013 41.86614 41.86614 -88.107013 United States of America united states illinois dupage wheaton
a25bcfa6-f0da-4c5d-81db-feb235afcb21 fmnh mammals -88.007844 41.88003 41.88003 -88.007844 United States of America united states illinois dupage co. lombard
a7ce2a66-865a-496b-ba4c-563c507886e0 fmnh mammals -88.261218 41.74877 41.748768 -88.261218 United States of America united states illinois kane co. 326 meadowview lane, 60502
b37a9d8d-6ba3-43d5-92ce-e6ccd69ae5a7 fmnh mammals -88.007844 41.88003 41.88003 -88.007844 United States of America united states illinois dupage co. lombard
b4495ecc-3e58-45a2-8b39-fddd4a575f85 fmnh mammals 5.3535261 52.44456 52.444561 5.353526 Netherlands netherlands flevoland prov NA oostvaardersplassen
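
The rows above suggest that each provider latitude reappears as the aggregator longitude, and vice versa. Below is a minimal sketch of how one might confirm this. It assumes the provider (`data.*`) values come back as character strings, as the unrounded values in the table suggest. The name `sample_records` is hypothetical: it is a small hand-built tibble with three rows copied from the table, standing in for the live query result. The tolerance of 1e-4 is also an assumption, chosen only to absorb rounding in the displayed values.

```r
library(dplyr)
library(tibble)

# Three rows copied from the table above; provider values are kept as text
sample_records <- tribble(
  ~uuid,                                  ~provider_lat, ~aggregator_lat, ~provider_lon, ~aggregator_lon,
  "04dba613-bb9a-4281-8dba-eb4bf59cd777", "-88.107013",  41.86614,        "41.86614",    -88.107013,
  "1f734bc6-130c-48d2-b47f-26cacfa5c722", "31.3999996",  24.86667,        "24.8666706",  31.400000,
  "b4495ecc-3e58-45a2-8b39-fddd4a575f85", "5.3535261",   52.44456,        "52.444561",   5.353526
)

# Assumed tolerance, loose enough to absorb rounding in the table
tolerance <- 1e-4

flip_check <- sample_records %>%
  mutate(provider_lat = as.numeric(provider_lat),
         provider_lon = as.numeric(provider_lon),
         # TRUE when the aggregator coordinates are the provider's, swapped
         is_flipped = abs(provider_lat - aggregator_lon) < tolerance &
           abs(provider_lon - aggregator_lat) < tolerance)

# Tally how many rows match the swapped pattern
flip_check %>% count(is_flipped)
```

With real results, the same `mutate()` step could be applied to `records` directly. Any row where `is_flipped` is `FALSE` would be worth a closer look.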

If a data provider wants to fix these records in a local collection management system, it may be useful to have them as a CSV file rather than only as an R object. Here is how to save the results as a CSV:

# Save `records` as a CSV for reintegration into a local collection management
# system
write_csv(records, "records.csv")

It is important for you as a data provider or data user to review the results of the data quality flags and confirm that iDigBio’s interpretation matches your expectations. For example, coordinates representing marine localities and localities in or near Antarctica are prone to misinterpretation.
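
One possible way to prioritise that review is to look at where each point plotted before the flip. The sketch below is illustrative only. `review_sample` is a hypothetical hand-built tibble with three rows copied from the table above, and the 60°S cutoff for “in or near Antarctica” is an assumption, not an iDigBio rule. The sketch also checks whether both coordinate orders are numerically valid. When they are, the flip rests entirely on the stated country rather than on an out-of-range value.

```r
library(dplyr)
library(tibble)

# Rows copied from the table above; provider values are kept as text
review_sample <- tribble(
  ~uuid,                                  ~provider_lat, ~provider_lon, ~provider_country,
  "05679624-d82c-4488-bd4b-ab13f40abb0b", "75",          "38",          "China",
  "04dba613-bb9a-4281-8dba-eb4bf59cd777", "-88.107013",  "41.86614",    "United States of America",
  "109555a6-3fcf-43ec-ae83-450ea6e85e5e", "-80.85",      "-6.45",       "Peru"
)

# Assumed latitude threshold (degrees) for "in or near Antarctica"
antarctic_cutoff <- -60

review_list <- review_sample %>%
  mutate(provider_lat = as.numeric(provider_lat),
         provider_lon = as.numeric(provider_lon),
         # Both values within [-90, 90] means either order is a valid coordinate
         both_orders_valid = abs(provider_lat) <= 90 & abs(provider_lon) <= 90,
         # The provider's original point falls in the far south
         plots_near_antarctica = provider_lat <= antarctic_cutoff) %>%
  # Put the far-south records first
  arrange(desc(plots_near_antarctica))

review_list
```

In this sample, the Peru record is a marine locality whose original coordinates fall south of the assumed cutoff, which is the kind of case the paragraph above suggests checking by hand.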