iDigBio LD
The goal of this project is to prototype an automated workflow that enriches all existing iDigBio specimen records with, or links them to, external taxonomies, ontologies, or vocabularies.
The result makes it possible to answer requests such as:
- Give me all iDigBio records that correspond to a fungal taxon with MycoBank number 123.
- Give me all names that did not match (because of a misspelling or typo) against MycoBank or any of the Global Names data sources.
- Give me all iDigBio records that contain outdated names.
- Give me all records that might describe a species interaction.

| Component | Owner | Status |
|---|---|---|
| Archive Processor | Jorrit | |
| Name Normalizer | Dima | |
| GUID Generator | Dima | |
| Global Names Resolver | Dima | |
| MycoBank Resolver | Scott | |
| iDigBio LD | ? | |
| iDigBio LD Web App | John | |
Candidate technologies: Python, Node.js, Apache Spark, or similar.
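Each component could expose a record-processing function along these lines (`lookup` stands in for the component's own resolver):

```python
from typing import Callable


def process(specimen_record: dict, lookup: Callable[[str], str]) -> dict:
    # Copy the record so the input is left unmodified
    enriched_record = dict(specimen_record)
    # Resolve the Darwin Core scientific name, if present, to an external identifier
    some_name = specimen_record.get("dwc:scientificName")
    if some_name is not None:
        enriched_record["external:id"] = lookup(some_name)
    return enriched_record
```

We generate a UUID v5 from each scientific name string and use it as the identifier of that string.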
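A minimal Python sketch of that UUID v5 step. It assumes the namespace is the UUID v5 of the DNS name `globalnames.org`, which is what Global Names tools are understood to use; confirm this against the GlobalNames identifiers in the table before relying on it. `GN_NAMESPACE` and `name_string_uuid` are illustrative names, not part of any published API.

```python
import uuid

# Assumption: namespace = UUID v5 of the DNS name "globalnames.org".
# Verify against known GNI identifiers before relying on it.
GN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")


def name_string_uuid(name: str) -> uuid.UUID:
    """Return a deterministic UUID v5 for an exact scientific name string."""
    # No normalization is applied: the UUID identifies the string exactly as
    # given, so "Puma concolor" and "Puma  concolor" get different identifiers.
    return uuid.uuid5(GN_NAMESPACE, name)


print(name_string_uuid("Puma concolor"))
```

Because UUID v5 is a SHA-1 hash of the namespace plus the name, the same string always yields the same identifier, so any component can compute it independently without calling a lookup service.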
Example identifiers and the URLs they resolve to:

| Source | Identifier | URL |
|---|---|---|
| GlobalNames | 16f235a0-e4a3-529c-9b83-bd15fe722110 | http://gni.globalnames.org/name_strings/16f235a0-e4a3-529c-9b83-bd15fe722110 |
| GlobalNames | 813583ad-c364-5c15-b01a-43eaa1446fee | http://gni.globalnames.org/name_strings/813583ad-c364-5c15-b01a-43eaa1446fee |
| GBIF | 215 | http://www.gbif.org/species/215 |
| GBIF Image | GBIF:215 | http://api.globalbioticinteractions.org/images/GBIF:215 |
| iDigBio | 00f8efa0-75ee-45c1-a88d-8a853705c6dd | http://beta-search.idigbio.org/v2/view/records/00f8efa0-75ee-45c1-a88d-8a853705c6dd |
| GenBank | 9a7d8ad8-60ec-48a0-9b36-a9cc0cf0b223 | http://www.ncbi.nlm.nih.gov/nuccore/?term=AY803322+OR+AV50248+OR+HM583371 |
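The URL column above follows one fixed pattern per source, so links can be built directly from identifiers. The sketch below copies those patterns into a hypothetical `URL_TEMPLATES` map with a hypothetical `resolve_url` helper; GenBank is left out because its link is an NCBI search over extracted accession numbers rather than a per-identifier page.

```python
# URL patterns copied from the identifier table; "{id}" marks the identifier slot.
URL_TEMPLATES = {
    "GlobalNames": "http://gni.globalnames.org/name_strings/{id}",
    "GBIF": "http://www.gbif.org/species/{id}",
    "GBIF Image": "http://api.globalbioticinteractions.org/images/{id}",
    "iDigBio": "http://beta-search.idigbio.org/v2/view/records/{id}",
}


def resolve_url(source: str, identifier: str) -> str:
    """Return the link for an identifier from one of the listed sources."""
    try:
        template = URL_TEMPLATES[source]
    except KeyError:
        raise ValueError(f"no URL template for source {source!r}") from None
    return template.format(id=identifier)


print(resolve_url("GBIF", "215"))
```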
The following regular expression extracts GenBank accession numbers:

```
[a-zA-Z]{1,2}\-?_?\d{5,6}
```

| iDigBio UUID | dwc:associatedSequences field | Extracted Accessions | NCBI Search Link |
|---|---|---|---|
| 4c5f122d-4686-4514-bf94-c38ecb4e98ab | GenBank FJ266907 (cytb) GenBank FJ267193 (ND4) | FJ266907\|FJ267193 | http://www.ncbi.nlm.nih.gov/nuccore/?term=FJ266907+OR+FJ267193 |
| 4d4b08ca-a552-481b-b3c4-7819548880eb | http://www.ncbi.nlm.nih.gov/nuccore/AF285919 ; http://www.ncbi.nlm.nih.gov/nuccore/AF285941 | AF285919\|AF285941 | http://www.ncbi.nlm.nih.gov/nuccore/?term=AF285919+OR+AF285941 |
| 0b089c97-e451-4f0d-a8ea-940582096f38 | , , , , , , , | null | null |
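A minimal Python sketch of the extraction step, reproducing the table rows above with the regular expression as given. The names `extract_accessions` and `ncbi_search_url` are hypothetical, and returning `None` for records without accessions mirrors the `null` cells in the table.

```python
import re
from typing import List, Optional

# The accession pattern quoted above: 1-2 letters, optional "-" or "_", then 5-6 digits.
ACCESSION_RE = re.compile(r"[a-zA-Z]{1,2}\-?_?\d{5,6}")
NCBI_SEARCH = "http://www.ncbi.nlm.nih.gov/nuccore/?term="


def extract_accessions(associated_sequences: Optional[str]) -> List[str]:
    """Return every accession-like token in a dwc:associatedSequences value."""
    return ACCESSION_RE.findall(associated_sequences or "")


def ncbi_search_url(accessions: List[str]) -> Optional[str]:
    """Build an NCBI nuccore OR-query, or None when nothing was extracted."""
    if not accessions:
        return None
    return NCBI_SEARCH + "+OR+".join(accessions)


accs = extract_accessions("GenBank FJ266907 (cytb) GenBank FJ267193 (ND4)")
print("|".join(accs), ncbi_search_url(accs))
```

The pattern is permissive: any one or two letters followed by five or six digits will match, and a longer digit run is cut to its first six digits rather than rejected, so extracted values are worth spot-checking.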


