Conversation

@lionel42

lionel42 commented Nov 6, 2025

Hello,

We are the Laboratory for Air Pollution at Empa, and we would like to contribute our spectra to MassBank.

I wanted to test the format locally but ran into issues with the validation software; see MassBank/MassBank-web#414 and MassBank/MassBank-web#413.

This is just a draft for now. We have hundreds of spectra to upload, but we first wanted to ask about the format and the metadata.

I created names and identifiers for our lab: EAP for Empa Air Pollution.

Happy to receive any feedback ;)

@lionel42
Author

lionel42 commented Nov 6, 2025

I have opened an issue to ask for help.

I will be away for one week (holidays), so I will continue working on this afterwards.

@lionel42
Author

lionel42 commented Nov 6, 2025

One point that I find weird is that the validator does not seem to like the Accession strings.

@schymane
Member

schymane commented Nov 6, 2025

> One point that I find weird is that the validator does not seem to like the Accession strings.

You can find the details about how to construct the Accession IDs here:
https://github.com/MassBank/MassBank-web/blob/main/Documentation/MassBankRecordFormat.md#2.1.1

It appears that you've put the name in the Accession, whereas we expect a number, e.g.: ACCESSION: MSBNK-AAFC-AC000101
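
As a rough local pre-check before running the validator, the accession shape can be tested with a regular expression. This is only a minimal sketch: the pattern is an assumption inferred from the example above (prefix `MSBNK`, then a contributor ID, then a record ID), the character classes and length limits may not match the specification exactly, and the helper name `looks_like_accession` is hypothetical. Section 2.1.1 of the linked record format remains authoritative.

```python
import re

# Assumed shape: MSBNK-<contributor id>-<record id>.
# The character classes and length limits are an assumption, not taken
# from the validator source; check section 2.1.1 of the record format.
ACCESSION_RE = re.compile(r"MSBNK-[A-Za-z0-9_]{1,32}-[A-Z0-9_]{1,64}")


def looks_like_accession(line: str) -> bool:
    """Return True if an 'ACCESSION: ...' line matches the assumed pattern."""
    key, _, value = line.partition(":")
    return (
        key.strip() == "ACCESSION"
        and ACCESSION_RE.fullmatch(value.strip()) is not None
    )


# The example from the comment above passes the assumed pattern.
print(looks_like_accession("ACCESSION: MSBNK-AAFC-AC000101"))
# A made-up accession with a lowercase name as record ID does not.
print(looks_like_accession("ACCESSION: MSBNK-EAP-chloroform"))
```

A check like this only catches the shape of the ID; the MassBank validator may apply further rules.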

@schymane
Member

schymane commented Nov 6, 2025

> I have opened an issue to ask for help.
>
> I will be away for one week (holidays), so I will continue working on this afterwards.

Please note that we have detailed record specifications to help explain what is needed in the various record entries:
https://github.com/MassBank/MassBank-web/blob/main/Documentation/MassBankRecordFormat.md#table-1--massbank-record-format-summary
...and then lots of details and examples in subsequent subsections.

It seems from the validation output that at least one other compulsory field is missing: AC$INSTRUMENT

The IPB Halle team are at BioHackEU25 this week, so they are a bit distracted, but will look into this once they are back.

@lionel42
Author

@schymane Thanks for the answers; I managed to fix the format of our files.

Before I add the whole library, is it possible to confirm/register our laboratory and the prefix? Do you need any additional information from our side?

@meier-rene
Collaborator

Hi Lionel,
do you consider this contribution complete? At the moment there are just two small issues left: one space too many, and an empty peak annotation table that needs to go. If so, I can finish these minor things and merge your contribution. We also maintain a table of our contributors: https://github.com/MassBank/MassBank-data/blob/dev/List_of_Contributors_Prefixes_and_Projects.md. Please tell me what you would like to see there; otherwise I will guess something for you.
Best, Rene

@lionel42
Author

> Hi Lionel, do you consider this contribution complete? At the moment there are just two small issues left: one space too many, and an empty peak annotation table that needs to go. If so, I can finish these minor things and merge your contribution. We also maintain a table of our contributors: https://github.com/MassBank/MassBank-data/blob/dev/List_of_Contributors_Prefixes_and_Projects.md. Please tell me what you would like to see there; otherwise I will guess something for you. Best, Rene

Hi Rene,

Thanks for reaching out.

We still need more time, as we want to go through all files manually to do a quality check.
Also, we build the files automatically, so I will try to fix the two issues in our code.

About the table of contributors, we discussed it and suggest the following:

  • Database: Empa_Air_Pollution
  • Research Group / Research Project: Empa - Laboratory for Air Pollution / Environmental Technology
  • Country: Switzerland
  • Prefix of ID: EAP
  • Project Tag: HALOHUNTER

I had initially also changed the file in this PR. Should I do it this way, or would you prefer to update it in a separate PR?

We will notify you when ready to merge ;)

AC$CHROMATOGRAPHY: KOVATS_RTI 818
PK$SPLASH: splash10-000t-9000000000-90ef1466a5c67cf33c97
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
49.98421 1 49.99178 151.43 H3CCl+ 0.76

Delete or review. This is unlikely to be H3CCl+ given the structure, and if it were, the isotope signal is missing.
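
For context on the 151.43 ppm figure: the error(ppm) column in these rows appears consistent with (exact_mass - m/z) / exact_mass × 10^6. That sign convention is inferred from the listed values, not confirmed by the contributors, and the helper name `ppm_error` is hypothetical. A small sketch for re-checking a row:

```python
def ppm_error(measured_mz: float, exact_mass: float) -> float:
    """Relative mass error in ppm, using the sign convention inferred above."""
    return (exact_mass - measured_mz) / exact_mass * 1e6


# Row under review: 49.98421 1 49.99178 151.43 H3CCl+ 0.76
err = ppm_error(49.98421, 49.99178)
print(f"{err:.2f} ppm")
```

The result lands close to the listed value; small differences are expected because the masses in the record are rounded to five decimals.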

69.94142 1 69.93716 -60.95 Cl2+ 1.00
71.93848 1 71.93421 -59.40 Cl[37Cl]+ 0.77
81.94018 1 81.93716 -36.89 CCl2+ 1.00
83.94540 1 83.93421 -133.34 CCl[37Cl]+ 0.60

This is weird: 83.94540 m/z appears twice in a row with two different assignments? Also, the NIST spectrum has a strong signal at 83 m/z (HCCl2), but here it is absent?

Why is more than one formula generally assigned to a given mass? I see up to 3 formulas assigned per mass for this compound (some other compounds have up to 4 assignments). Do we want that? It seems weird to me.

81.94018 1 81.93716 -36.89 CCl2+ 1.00
83.94540 1 83.93421 -133.34 CCl[37Cl]+ 0.60
83.94540 2 83.95281 88.23 H2CCl2+ 0.86
85.94626 1 85.94986 41.85 H2CCl[37Cl]+ 1.00

NIST has a strong signal at 85 m/z (maybe H3CCl2), but it is absent here? Oh no, I see now that the unassigned peaks are listed separately below. Still, it seems silly to me that two of the most abundant peaks, 83 and 85 m/z, are not assigned and not listed here.

93.93877 1 93.93716 -17.17 C2Cl2+ 0.57
94.94653 1 94.94498 -16.31 HC2Cl2+ 1.00
95.95457 1 95.94834 -64.96 HC[13C]Cl2+ 0.01
95.95457 2 95.95281 -18.37 H2C2Cl2+ 1.00

How are assignments number 1 and number 2 decided? It looks like intensity_fraction shows how much of the peak is assignable to the formula; wouldn't it make more sense to give number 1 to the assignment with the higher intensity_fraction?
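
The reordering suggested here could look like the following sketch. It assumes the six whitespace-separated columns named in the PK$ANNOTATION header and renumbers formula_count within each m/z by descending intensity_fraction. The function name `rerank` is hypothetical, and this is not the contributors' actual pipeline.

```python
from collections import defaultdict


def rerank(rows: list[str]) -> list[str]:
    """Renumber formula_count within each m/z by descending intensity_fraction."""
    groups = defaultdict(list)  # m/z string -> candidate rows, first-seen order
    for row in rows:
        mz, _count, exact, err, formula, frac = row.split()
        groups[mz].append((float(frac), exact, err, formula, frac))

    out = []
    for mz, candidates in groups.items():
        # sorted() is stable, so ties keep their original relative order
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        for rank, (_f, exact, err, formula, frac) in enumerate(ranked, start=1):
            out.append(f"{mz} {rank} {exact} {err} {formula} {frac}")
    return out


# The two rows discussed in this comment
rows = [
    "95.95457 1 95.94834 -64.96 HC[13C]Cl2+ 0.01",
    "95.95457 2 95.95281 -18.37 H2C2Cl2+ 1.00",
]
print("\n".join(rerank(rows)))
```

With these two rows, H2C2Cl2+ (intensity_fraction 1.00) becomes assignment 1 and HC[13C]Cl2+ becomes assignment 2. Whether intensity_fraction is the right ranking criterion is for the contributors to decide.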

AC$CHROMATOGRAPHY: KOVATS_RTI 566
PK$SPLASH: splash10-002o-9000000000-17c33adb4eb05f58d77f
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
23.98798 1 0.00000 0.00 - 0.00

What's happening here? Something seems wrong: the exact mass and error are 0 and no formula is given.
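
Rows like this one look like placeholders written when no formula was found. Since the files are built automatically, a filter along these lines could drop such rows before PK$ANNOTATION is written. This is a sketch under the assumption that an unassigned peak is marked by `-` in tentative_formula together with an exact_mass of 0; `is_placeholder` is a hypothetical name.

```python
def is_placeholder(row: str) -> bool:
    """True for annotation rows that carry no formula assignment."""
    fields = row.split()
    if len(fields) != 6:
        return False  # not a six-column annotation row; leave it alone
    _mz, _count, exact_mass, _err, formula, _frac = fields
    return formula == "-" and float(exact_mass) == 0.0


# Two rows taken from different records in this review, for illustration only
rows = [
    "23.98798 1 0.00000 0.00 - 0.00",
    "42.99847 1 42.99785 -14.31 C2F+ 0.98",
]
kept = [r for r in rows if not is_placeholder(r)]
print(kept)
```

Only the C2F+ row survives the filter. If every row of a record is a placeholder, the annotation table would end up empty and should probably be omitted entirely.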

AC$CHROMATOGRAPHY: KOVATS_RTI 536
PK$SPLASH: splash10-002o-9000000000-47ee5923217847ad72ce
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
23.98803 1 0.00000 0.00 - 0.00

Something is wrong here as well: all 0s.

AC$CHROMATOGRAPHY: KOVATS_RTI 415
PK$SPLASH: splash10-03fr-9000000000-a2695f0ec68bc37ebdd1
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
23.98785 1 0.00000 0.00 - 0.00

Same all-0s problem.

AC$CHROMATOGRAPHY: KOVATS_RTI 396
PK$SPLASH: splash10-0udi-3900000000-5d7701f39c27b4d50277
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
42.99847 1 42.99785 -14.31 C2F+ 0.98

So many peaks and so few assignments. What's going on here?

AC$CHROMATOGRAPHY: KOVATS_RTI 200
PK$SPLASH: splash10-004i-9000000000-bf642cfd2c96c56eb2e0
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
26.01452 1 0.00000 0.00 - 1.00

Something is wrong here: many 0s.
