Skip to content

Conversation

@CarolineDenis
Copy link
Contributor

@CarolineDenis CarolineDenis commented Jun 16, 2025

Fixes #2931
Fixes #4832
Fixes #6210

Adds an initial set up page for configuring your database much like the Spwizard for Specify 6.

image

TODO:

  • Title colors are white-on-white.
  • Make certain fields required (at least for user)
  • Error on backend doesn't stop you from progressing onto the next step.
  • Defaults for fields (So certain tree ranks can be true by default)
  • Progress indicator?
  • Backend should tell frontend which step to go to next. Needed to conditionally skip geography tree setup pages.
  • Setup can be finished before schema is finished being created. Need to have some sort of waiting page, or speed improvement? Thinking I can change the setup so you fill out the forms first, then at the very end the database is created.
  • Use new default files Feat (setup tool): Create default files for the new setup tool #7444
  • Create trees
  • Add documentation links
  • Only use localized strings
  • Add better error handling. User should be notified and sent back to the forms if anything goes wrong.
  • Check if worker is set up before continuing.
  • specifyuser_spprincipal table needs to exist before the setup process works on a new sp7 database. It also looks like the admin user isn't actually a proper admin. (fixed)

Future TODO:

  • Password confirmation input doesn't work if you input the password into it first, then fill out the normal password field.
  • Notification polling seems to be causing errors on the backend when lingering on the setup page. Will investigate.
  • Need schema for API
  • Save form to localStorage. Also change the Save & Continue text maybe.
  • Normalize all keys (recursively) on backend api requests
  • Institution should have a value for all treedef fields once all trees are created. There should be api.finalize_institution or something similar.
  • There should be a functioning progress bar during the setup.
  • Speed up schema config localization setup. This is the biggest bottleneck in the setup process and it can be sped up significantly.
  • Fix two geography tree forms showing up on the overview sidebar.
  • Add documentation links and descriptions to all resources
  • Move over default tree creation functions to tree folder

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR

Testing instructions

  • Use a blank DB to test this PR.
  • Try to create dbs with different disciplines. At least a paleo/geo discipline and a non-paleo/geo discipline.
  • Fill out the setup forms
    • Make sure the forms look good in light mode and dark mode (It uses your system settings.)
    • You can only progress to the next form if you filled out all required fields.
    • Make sure your choices are shown in the Overview sidebar.
    • Make sure you can submit at the end.
  • Make sure you can log into the database.
  • NOTE: The dialog about the DB schema is expected behavior in this PR
  • Make sure the schema config defaults were applied correctly.
  • Make sure default picklists were created correctly.
  • Make sure prep types were created correctly.
  • Make sure all tree viewer pages load.
  • Make sure the admin user's agent was created correctly. It should always have a last name.
  • Local testing:
    • Use an empty DB again (restore it or use a different one).
    • Stop the specify worker and try to start the setup. You should get a clear error on the frontend.
    • Make sure API endpoints work. Look at the stetup_tool section in http://localhost/documentation/api/operations/all/ for more information.

@grantfitzsimmons
Copy link
Member

Any specifyuser can be set as an institution admin by inserting the following, replacing the specifyuser_id with the ID of the newly created user:

INSERT INTO spuserpolicy (id, resource, action, collection_id, specifyuser_id) VALUES (1, '%', '%', null, 1);

@CarolineDenis
Copy link
Contributor Author

CarolineDenis commented Jul 9, 2025

NOTES:

@CarolineDenis
Copy link
Contributor Author

Small UI notes:
Could we center the subsections?
Screenshot 2025-12-03 at 10 48 27 AM
Screenshot 2025-12-03 at 10 47 47 AM

There are some issues with the bottom of the page
Screenshot 2025-12-03 at 10 46 32 AM

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing instructions

  • Use a blank DB to test this PR.

New DB was created!

  • Fill out the setup forms
    • Make sure the forms look good in light mode and dark mode (It uses your system settings.)
    • You can only progress to the next form if you filled out all required fields.
    • Make sure your choices are shown in the Overview sidebar.
    • Make sure you can submit at the end.

Some database restrictions (e.g. max field length) are not accounted for, say for instance when you enter more text than is able to fit in a field (e.g. institution.Name)
Image

  • Make sure you can log into the database.

There is still a reported schema mismatch (it expects 6.8.03, returning 7)
Image

  • Make sure the schema config defaults were applied correctly.
  • Make sure default picklists were created correctly.
  • Make sure prep types were created correctly.
  • Make sure all tree viewer pages load.

When creating a 'Herbarium' collection, my Taxon tree looks like this:

Image Image

See Geography as well:

Image

Same story for Storage...

  • Local testing:
    • Use an empty DB again (restore it or use a different one).
    • Stop the specify worker and try to start the setup. You should get a clear error on the frontend.
Image

It just pauses on loading Institution?

- [X] Make sure API endpoints work. Look at the stetup_tool section in http://localhost/documentation/api/operations/all/ for more information. You should be able to manually set up a blank database by creating these resources in order.
    - [X] `setup_tool/institution/create`
    - [X] `setup_tool/division/create`
    - [X] `setup_tool/discipline/create`
    - [X] `setup_tool/collection/create`
    - [X] `setup_tool/specifyuser/create`

@github-project-automation github-project-automation bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Dec 5, 2025
@alesan99
Copy link
Contributor

alesan99 commented Dec 12, 2025

@grantfitzsimmons thanks for the review!

Some database restrictions (e.g. max field length) are not accounted for, say for instance when you enter more text than is able to fit in a field (e.g. institution.Name)

Fixed for institution name and password 👍
It was off by one (256 rather than varchar(255)), so the rest should be good.

  • Make sure all tree viewer pages load.

Once the tree PR is merged I can create a PR to properly create the trees.
I can make an issue detailing what I should add.

EDIT: #7593 <- to be addressed once the tree creation PR is merged

It just pauses on loading Institution?

Hm I'm getting an error (it's not descriptive, though). I might need to look at your set up to see if the check isn't working or if the error isn't being reported.

image

EDIT: Hopefully addressed by my new changes

Fix key not being lowercase
Triggered by a895d0c on branch refs/heads/issue-2931-1
Triggered by 92dcf63 on branch refs/heads/issue-2931-1
Triggered by 72763a3 on branch refs/heads/issue-2931-1
'insert into specifyuser_spprincipal(specifyuserid, spprincipalid) values (%s, %s)',
[user.id, gp.id]
)
# TODO: UNCOMMENT THIS. Commented specifically for testing PR https://github.com/specify/specify7/pull/6671
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

WHERE Name = 'Administrator'
)
""", [self.id])
# TODO: UNCOMMENT THIS. Commented specifically for testing PR https://github.com/specify/specify7/pull/6671
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

Copy link
Contributor

@melton-jason melton-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work so far! 🚀

I've only reviewed the functionality and backend code, I've yet to finish a review on the frontend.

A lot of the "required" changes I've noticed should be fairly easy to undertake and were relatively small, or related to database structure from #6389.
My larger comments were mainly related to code maintainability, compatibility, and security.

Let me know if you have any questions with these comments!

Comment on lines 34 to 35
if not spmodels.Institution.objects.exists():
return view(request, *args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, as long as there is not an Institution, all requests are unauthenticated and allowed.

Practically, this would mean an attacker could trivially do something like:

  • Create their own Institution, Division, Discipline, Collection, and SpecifyUser

And theoretically:

  • Create any other record in the database without going through the setup tool
  • Including their own SpecifyUser with Institution Admin permissions

For the current setup process to be secure with no other changes, an administrator would need to perform the setup tool locally and not expose the server to the internet until the setup process is complete.
This may be infeasible for some setups where the Specify instance is only accessible remotely; configuring networking to allow a specific machine to connect (and removing that rule after setup is complete) might be unreasonably complex and place too much of the burden on the user to setup Specify.

Here are two more secure approaches which might be sufficient:

  • Have a password or similar setting that can be set before the container is started that an administrator can use to "login" to the Setup page
  • Similar to feat: add ALLOW_SUPPORT_LOGIN to Dockerfile #7399, generate a token at container startup which gets emitted (in the logs?) that the user can use to create an authenticated session

It would be very tricky to have authentication entirely handled by the browser. Though a user would only need to authenticate once for a secure session (unless the session expires, e.g., via closing the browser).

With Django's authentication systems, we can authenticate a user without having a corresponding SpecifyUser.

This would involve setting up a SetupTool Authentication Backend (and maybe a middleware, that can be disabled once the setup tool process is complete) that would handle whichever method we choose to verify the identity of the admin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the changes to login_maybe_required 👍
cf48ece

This leaves only the new setup endpoints exposed, still not the best. Leaving this open for discussion to see what everyone thinks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional)

The complexity of update_table_schema_config_with_defaults and update_table_field_schema_config_with_defaults have increased quite a bit in this branch.
Especially with calls to update_table_schema_config_with_defaults, this added complexity can make it somewhat difficult to parse and understand what could be happening, so I think it might be time to reconsider refactoring these helper functions.

Below is something like a Control Flow Graph (CFG) depicting the work done compared to main (left) and this branch (right) for creating the description of a table when calling update_table_schema_config_with_defaults.

flowchart TD
    A("description: str | None")
    A -- == None --> B(Default Table Name)
    A -- != None --> C
    B --> C(Container Desc)

    D("defaults: dict | None")
    D -- == None --> E(Empty Dict)
    D -- != None --> G("defaults.get(tablename)")
    G --> F
    E --> F(Table Defaults)
    F --has desc--> H("default.get(desc)")
    F -- missing desc --> I(Default Table Name)
    H --> J(Container Desc)
    I --> J
Loading

Similarly, below is a CFG for the process of creating the field labels when calling update_table_schema_config_with_defaults in main (left) and this branch (right).

Nodes in green depict the work performed in update_table_schema_config_with_defaults, and nodes in orange depict the work performed in update_table_field_schema_config_with_defaults.

flowchart TD
    classDef fieldLvl stroke:#ffa500
    classDef tblLvl stroke:#0f0

    A(Get table from datamodel)
    A --> B(Collect fields from table)
    B -- for each field --> C(Convert name to Default Field Label)
    C --> CC(Create field label)

    class A,B tblLvl;
    class C,CC fieldLvl;

    D("default:s dict | None")
    E("overrides: dict | None")
    D --> AA(Get table from datamodel)
    E --> AA
    AA -- table in defaults --> G(field_overrides += defaults)
    G -- table in overrides --> H(Get fields from overrides)
    H -- for field in overrides --> I(field_overrides += field)
    I --> BB(Collect fields from table)
    AA -- table not in defaults --> BB
    G -- table not in overrides --> BB
    BB -- for each field --> J(Label = Convert name to Default Field Label)
    J -- field in field_overrides --> K("Label = field_override")
    K --> L(Create field label)
    J -- field not in field_overrides --> L

    class D,E,AA,G,H,I,BB tblLvl;
    class J,K,L fieldLvl;
Loading

It looks to me like our current functions are solely responsible for a lot of work.
I think adopting the Single responsibility principle with the Chain of responsibility pattern might be beneficial here.

Dividing the required functionalities into the smallest unit of work, I think there are four main responsibilities with our current functions:

  • Creating the SpLocaleContainer (table in schemaconfig)
  • Creating the table label and description (the SpLocaleItemStr for each)
  • Creating the SpLocaleContainerItem (field in schemaconfig)
  • Creating the field label and description (the SpLocaleItemStr for each)

Each of these can be converted to a function as the "basic building blocks" and used as needed: either in helper functions for common use cases or directly by the caller if more fine-grained behavior is desired.

For example, the following might be how I would refactor our update_table_schema_config_with_defaults

Suggested Implementation in this dropdown

The overall design/CFG:

flowchart TD
	classDef main stroke:#FF00FF
    classDef table stroke:#f00
    classDef tableStr stroke:#FFA500
    classDef field stroke:#228B22
    classDef fieldStr stroke:#4169E1

    A("Get table from datamodel") --> B("Create SpLocaleContainer")
    B --> C("Create table label and descriptions")
    C --> D("Gather fields from table")
    D -- for each field --> E("Create SpLocaleContainerItem")
    E --> F("Create field label and description")
	
	class A,D main;
	class B table
    class C tableStr;
    class E field;
    class F fieldStr;
Loading

A basic implementation might look like the following:

def create_default_table_schema_config(
        table_name: str,
        discipline_id: int,
        apps
):
    table = datamodel.get_table(table_name)
    if table is None: 
        logger.warning(
            f"Table does not exist in latest state of the datamodel, skipping Schema Config entry for: {table_name}"
        )
        return
    
    # If we need a more complicated datatype- like TableSchemaConfig- we can define/create it here
    # and pass it to the basic functions
	
	# creates and returns the SplocaleContainer
    container = create_table_config(apps, table, discipline_id)
    create_table_strings(apps, container, table)

    # It'd be very easy to define another function that's just a wrapper for creating the Container or Item 
    # and corresponding strings
    # e.g., container = create_table_locale(apps, table, discipline_id)
	# And for an Item and it's strings:
    #       item = create_field_locale(apps, field, container)

    for field in table.all_fields:
		# creates and returns the SpLocaleContainerItem
        item = create_field_config(apps, field, container)
        create_field_strings(apps, item, field)

The above implementation would be only for creating with the "defaults" .
If a caller has a non-default table or field description, we can easily define another helper which accepts a TableSchemaConfig object or similar and handles creating the object(s) (we could then even simplify the above implementation even further by using said helper).

Here's how a more general implementation mirroring the current implementation might look like:

def create_table_schema_config(
        config: TableSchemaConfig,
        discipline_id: int,
        apps
):
    table = datamodel.get_table(config.table_name)
    if table is None:
        logger.warning(
            f"Table does not exist in latest state of the datamodel, skipping Schema Config entry for: {config.table_name}"
        )
        return
    
    container = create_table_config(apps, config, discipline_id)
    create_table_strings(apps, container, config)

    for field in table.all_fields:
        field_config = config.fields.get(field.name, None)
        item = create_field_config(apps, field, container, field_config)
        create_field_strings(apps, item, field, field_config)

# This can be used like the following:

# Everything not defined here is assumed to be the "default" value
my_config = TableSchemaConfig(
    table_name="SomeTable",
    label="Custom Name",
    fields={
        "text1": FieldSchemaConfig(
            description="Use this as needed!"
        )
    }
)
create_table_schema_config(my_config, discipline_id, apps)

Essentially, the goal is to keep implementations simple with a single responsibility. With the basic building blocks and helpers for common use cases, the caller has a clean interface regardless of their required complexity.


I could also absolutely see this taken with an OOP approach (blame it on my Java background 😅).
The following is a conceptual example of an OOP implementation using something similar to the Builder pattern:

my_schema = TableSchemaConfig(label="My table", description="Hello")

# nearly all methods of TableSchemaConfig are focused on manipulating the
# underlying data structure, and explicitly doesn't touch the database
my_schema.add_field("text1", FieldSchemaConfig(label="Method"))

# this call is what would actually create the container, the items, and the strings
my_schema.create()

Instead of single responsibility and building up from the smallest building blocks, the intuition and focus behind this approach would be separation of data and concrete instantiation. We have a flexible, complicated data structure (the schema config tables) and a clear process/methodology of creating it. Essentially, we let the caller build out how it wants the schema to look like before committing to its creation.

This is all just how I would refactor these functions to make them more usable and maintainable.

There's no "right" way of doing this, so if you have other ideas about how to simplify or refactor these functions, I'd love to hear them and absolutely go for it 🚀

Apply schema config localization defaults for this discipline.
"""
# Get default schema localization
defaults = load_json_from_file(Path(__file__).parent.parent.parent.parent / 'config' / 'common' / 'schema_localization_en.json')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional)

Could we simplify these defaults? I imagine the vast majority of these items already fall within the heuristic used by update_table_schema_config_with_defaults to determine the defaults.
It might be easier from a maintainability and performance standpoint if the defaults from the file contained only the elements which should explicitly not follow these defaults.

Provided is a link to the original PR for convenience: #7444

You can use the following query to get the schema config information for all fields in a Specify database:

SELECT its.Language,
    spc.name AS 'Table Name',
    itm.name AS 'Field Name',
    its.Text AS 'Table Label',
    itd.Text AS 'Table Desc',
    itms.Text AS 'Field Label',
    itmd.Text AS 'Field Desc',
    itm.type AS 'Field Type',
    CASE
        WHEN itm.IsRequired = 1 THEN 'True'
        WHEN itm.IsRequired = 0 THEN 'False'
        ELSE itm.IsRequired
    END AS 'Field Is Required',
    CASE
        WHEN itm.IsHidden = 1 THEN 'True'
        WHEN itm.IsHidden = 0 THEN 'False'
        ELSE itm.IsHidden
    END AS 'Field Is Hidden',
    itm.Format AS 'Field Format',
    itm.PickListName AS 'Field PickList',
    itm.WebLinkName AS 'Field WebLink'
FROM splocalecontainer AS spc
    LEFT OUTER JOIN splocaleitemstr AS its ON its.SpLocaleContainerNameID = spc.splocalecontainerid
    LEFT OUTER JOIN splocaleitemstr itd ON itd.SpLocaleContainerDescID = spc.splocalecontainerid
    LEFT OUTER JOIN splocalecontaineritem AS itm ON itm.SpLocaleContainerID = spc.splocalecontainerid
    LEFT OUTER JOIN splocaleitemstr itms ON itms.SpLocaleContainerItemNameID = itm.SpLocaleContainerItemID
    LEFT OUTER JOIN splocaleitemstr itmd ON itmd.SpLocaleContainerItemDescID = itm.SpLocaleContainerItemID
WHERE spc.SchemaType = 0
ORDER BY spc.DisciplineID,
    its.Language,
    spc.name,
    itm.name;

(Or alternatively use the /context/schema_localization.json endpoint).

And if you want to export the Schema from SQL as a CSV (this will output as schema_output.csv, but you can change the file path after the OUTFILE keyword):
(Note that NULL values are interpreted as the literal \N).

SELECT its.Language,
    spc.name AS 'Table Name',
    itm.name AS 'Field Name',
    its.Text AS 'Table Label',
    itd.Text AS 'Table Desc',
    itms.Text AS 'Field Label',
    itmd.Text AS 'Field Desc',
    itm.type AS 'Field Type',
    CASE
        WHEN itm.IsRequired = 1 THEN 'True'
        WHEN itm.IsRequired = 0 THEN 'False'
        ELSE itm.IsRequired
    END AS 'Field Is Required',
    CASE
        WHEN itm.IsHidden = 1 THEN 'True'
        WHEN itm.IsHidden = 0 THEN 'False'
        ELSE itm.IsHidden
    END AS 'Field Is Hidden',
    itm.Format AS 'Field Format',
    itm.PickListName AS 'Field PickList',
    itm.WebLinkName AS 'Field WebLink'
FROM splocalecontainer AS spc
    LEFT OUTER JOIN splocaleitemstr AS its ON its.SpLocaleContainerNameID = spc.splocalecontainerid
    LEFT OUTER JOIN splocaleitemstr itd ON itd.SpLocaleContainerDescID = spc.splocalecontainerid
    LEFT OUTER JOIN splocalecontaineritem AS itm ON itm.SpLocaleContainerID = spc.splocalecontainerid
    LEFT OUTER JOIN splocaleitemstr itms ON itms.SpLocaleContainerItemNameID = itm.SpLocaleContainerItemID
    LEFT OUTER JOIN splocaleitemstr itmd ON itmd.SpLocaleContainerItemDescID = itm.SpLocaleContainerItemID
WHERE spc.SchemaType = 0
ORDER BY spc.DisciplineID,
    its.Language,
    spc.name,
    itm.name INTO OUTFILE 'schema_output.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this is ready to be merged we'll have to do some setup in Weblate to make these strings accessible to translators. You largely only have to do this when creating a new localization directory - which corresponds to a new Component on Weblate.

There's no instructions for adding a new component for in our https://github.com/specify/specify7/blob/main/specifyweb/frontend/js_src/lib/localization/README.md currently, so I'll modify the file to include that documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not creating any default App Resources in the database at the moment.

Image

This is probably fine for stuff like View Sets, TypeSearches, UIFormatters, etc., which exist as defaults on disk, but there is no default Remote or Global Preferences.

Do you know if these are these going to be addressed in something like #7501, or is generally expected that the preference files exist on data creation?
If these changes are going go to be included in a release before being addressed, I think we at least need to create the Remote Preferences so Specify has access to settings like the Date Format (ui.formatting.scrdateformat), Tree Count thresholds (TreeEditor.Rank.Threshold.<TREE>, related #6844), and allow users to configure things like automatically creating resources and other Remote Preferences

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Dev Attention Needed

6 participants