Create a new setup page #6671

CarolineDenis · 2025-06-16T19:58:48Z

Fixes #2931
Fixes #4832
Fixes #6210

Adds an initial set up page for configuring your database much like the Spwizard for Specify 6.

TODO:

Future TODO:

Checklist

Self-review the PR after opening it to make sure the changes look good and
self-explanatory (or properly documented)
Add relevant issue to release milestone
Add pr to documentation list
Add automated tests
Add a reverse migration if a migration is present in the PR

Testing instructions

Fixes #2931

grantfitzsimmons · 2025-06-27T19:15:13Z

Any specifyuser can be set as an institution admin by inserting the following, replacing the specifyuser_id with the ID of the newly created user:

INSERT INTO spuserpolicy (id, resource, action, collection_id, specifyuser_id) VALUES (1, '%', '%', null, 1);

CarolineDenis · 2025-07-09T18:04:41Z

NOTES:

Add schema config
==> Add schema config splocalcontainer when discipline is created
Issue:
In the discipline creation backend def we need (once the disc is created):
Create a splocalcontainer for each table in models.py (fix issue mentioned above)
Use a default schema config json file to populate
Compare the default with the specific one for the discipline and update only what's needed. (See what we have in https://github.com/specify/specify6/tree/master/config/botany for each disc)
(Need to add a default schema config json in sp7 + specific ones for each discipline)
Include discipline type picklist Create a configuration tool #7312 (comment)
Create and Populate trees: some done with: https://github.com/specify/specify7/pull/6429/files
Need to do geography, storage, ... [Guided Setup] - Preload tree data at instantiation #4976
Need to add geology: [Guided Setup] - Add Geology discipline to guided setup #6279
Databases that are not used with Specify 6 are missing Remote and Global prefs

Triggered by 31211cc on branch refs/heads/issue-2931-1

CarolineDenis · 2025-12-03T15:59:25Z

Small UI notes:
Could we center the subsections?

There are some issues with the bottom of the page

grantfitzsimmons

Testing instructions

Use a blank DB to test this PR.

New DB was created!

Fill out the setup forms

Make sure the forms look good in light mode and dark mode (It uses your system settings.)

You can only progress to the next form if you filled out all required fields.

Make sure your choices are shown in the Overview sidebar.

Make sure you can submit at the end.

Some database restrictions (e.g. max field length) are not accounted for, say for instance when you enter more text than is able to fit in a field (e.g. institution.Name)

Make sure you can log into the database.

There is still a reported schema mismatch (it expects 6.8.03, returning 7)

Make sure the schema config defaults were applied correctly.

Make sure default picklists were created correctly.

Make sure prep types were created correctly.

Make sure all tree viewer pages load.

When creating a 'Herbarium' collection, my Taxon tree looks like this:

See Geography as well:

Same story for Storage...

Local testing:

Use an empty DB again (restore it or use a different one).

Stop the specify worker and try to start the setup. You should get a clear error on the frontend.

It just pauses on loading Institution?

- [X] Make sure API endpoints work. Look at the stetup_tool section in http://localhost/documentation/api/operations/all/ for more information. You should be able to manually set up a blank database by creating these resources in order.
    - [X] `setup_tool/institution/create`
    - [X] `setup_tool/division/create`
    - [X] `setup_tool/discipline/create`
    - [X] `setup_tool/collection/create`
    - [X] `setup_tool/specifyuser/create`

alesan99 · 2025-12-12T20:37:12Z

@grantfitzsimmons thanks for the review!

Some database restrictions (e.g. max field length) are not accounted for, say for instance when you enter more text than is able to fit in a field (e.g. institution.Name)

Fixed for institution name and password 👍
It was off by one (256 rather than varchar(255)), so the rest should be good.

Make sure all tree viewer pages load.

Once the tree PR is merged I can create a PR to properly create the trees.
I can make an issue detailing what I should add.

EDIT: #7593 <- to be addressed once the tree creation PR is merged

It just pauses on loading Institution?

Hm I'm getting an error (it's not descriptive, though). I might need to look at your set up to see if the check isn't working or if the error isn't being reported.

EDIT: Hopefully addressed by my new changes

Fix key not being lowercase

Triggered by a895d0c on branch refs/heads/issue-2931-1

Triggered by 92dcf63 on branch refs/heads/issue-2931-1

Triggered by 72763a3 on branch refs/heads/issue-2931-1

alesan99 · 2025-12-22T19:36:39Z

specifyweb/backend/businessrules/rules/user_rules.py

-            'insert into specifyuser_spprincipal(specifyuserid, spprincipalid) values (%s, %s)',
-            [user.id, gp.id]
-        )
+    # TODO: UNCOMMENT THIS. Commented specifically for testing PR https://github.com/specify/specify7/pull/6671


alesan99 · 2025-12-22T19:44:13Z

specifyweb/specify/models_utils/model_extras.py

-          WHERE Name = 'Administrator'
-        )
-        """, [self.id])
+        # TODO: UNCOMMENT THIS. Commented specifically for testing PR https://github.com/specify/specify7/pull/6671


specifyweb/backend/setup_tool/api.py

melton-jason

Nice work so far! 🚀

I've only reviewed the functionality and backend code, I've yet to finish a review on the frontend.

A lot of the "required" changes I've noticed should be fairly easy to undertake and were relatively small, or related to database structure from #6389.
My larger comments were mainly related to code maintainability, compatibility, and security.

Let me know if you have any questions with these comments!

melton-jason · 2025-12-22T15:54:06Z

specifyweb/specify/views.py

+            if not spmodels.Institution.objects.exists():
+                return view(request, *args, **kwargs)


Currently, as long as there is not an Institution, all requests are unauthenticated and allowed.

Practically, this would mean an attacker could trivially do something like:

Create their own Institution, Division, Discipline, Collection, and SpecifyUser

And theoretically:

Create any other record in the database without going through the setup tool

Including their own SpecifyUser with Institution Admin permissions

For the current setup process to be secure with no other changes, an administrator would need to perform the setup tool locally and not expose the server to the internet until the setup process is complete.
This may be infeasible for some setups where the Specify instance is only accessible remotely; configuring networking to allow a specific machine to connect (and removing that rule after setup is complete) might be unreasonably complex and place too much of the burden on the user to setup Specify.

Here are two more secure approaches which might be sufficient:

Have a password or similar setting that can be set before the container is started that an administrator can use to "login" to the Setup page

Similar to feat: add ALLOW_SUPPORT_LOGIN to Dockerfile #7399, generate a token at container startup which gets emitted (in the logs?) that the user can use to create an authenticated session

It would be very tricky to have authentication entirely handled by the browser. Though a user would only need to authenticate once for a secure session (unless the session expires, e.g., via closing the browser).

With Django's authentication systems, we can authenticate a user without having a corresponding SpecifyUser.

https://docs.djangoproject.com/en/6.0/topics/auth/default/#how-to-log-a-user-in

https://docs.djangoproject.com/en/6.0/topics/auth/customizing/

This would involve setting up a SetupTool Authentication Backend (and maybe a middleware, that can be disabled once the setup tool process is complete) that would handle whichever method we choose to verify the identity of the admin.

Reverted the changes to login_maybe_required 👍
cf48ece

This leaves only the new setup endpoints exposed, still not the best. Leaving this open for discussion to see what everyone thinks

melton-jason · 2025-12-24T14:50:19Z

specifyweb/specify/migration_utils/update_schema_config.py

(optional)

The complexity of update_table_schema_config_with_defaults and update_table_field_schema_config_with_defaults have increased quite a bit in this branch.
Especially with calls to update_table_schema_config_with_defaults, this added complexity can make it somewhat difficult to parse and understand what could be happening, so I think it might be time to reconsider refactoring these helper functions.

Below is something like a Control Flow Graph (CFG) depicting the work done compared to main (left) and this branch (right) for creating the description of a table when calling update_table_schema_config_with_defaults.

flowchart TD A("description: str | None") A -- == None --> B(Default Table Name) A -- != None --> C B --> C(Container Desc) D("defaults: dict | None") D -- == None --> E(Empty Dict) D -- != None --> G("defaults.get(tablename)") G --> F E --> F(Table Defaults) F --has desc--> H("default.get(desc)") F -- missing desc --> I(Default Table Name) H --> J(Container Desc) I --> J

Loading

Similarly, below is a CFG for the process of creating the field labels when calling update_table_schema_config_with_defaults in main (left) and this branch (right).

Nodes in green depict the work performed in update_table_schema_config_with_defaults, and nodes in orange depict the work performed in update_table_field_schema_config_with_defaults.

flowchart TD classDef fieldLvl stroke:#ffa500 classDef tblLvl stroke:#0f0 A(Get table from datamodel) A --> B(Collect fields from table) B -- for each field --> C(Convert name to Default Field Label) C --> CC(Create field label) class A,B tblLvl; class C,CC fieldLvl; D("default:s dict | None") E("overrides: dict | None") D --> AA(Get table from datamodel) E --> AA AA -- table in defaults --> G(field_overrides += defaults) G -- table in overrides --> H(Get fields from overrides) H -- for field in overrides --> I(field_overrides += field) I --> BB(Collect fields from table) AA -- table not in defaults --> BB G -- table not in overrides --> BB BB -- for each field --> J(Label = Convert name to Default Field Label) J -- field in field_overrides --> K("Label = field_override") K --> L(Create field label) J -- field not in field_overrides --> L class D,E,AA,G,H,I,BB tblLvl; class J,K,L fieldLvl;

Loading

It looks to me like our current functions are solely responsible for a lot of work.
I think adopting the Single responsibility principle with the Chain of responsibility pattern might be beneficial here.

Dividing the required functionalities into the smallest unit of work, I think there are four main responsibilities with our current functions:

Creating the SpLocaleContainer (table in schemaconfig)

Creating the table label and description (the SpLocaleItemStr for each)

Creating the SpLocaleContainerItem (field in schemaconfig)

Creating the field label and description (the SpLocaleItemStr for each)

Each of these can be converted to a function as the "basic building blocks" and used as needed: either in helper functions for common use cases or directly by the caller if more fine-grained behavior is desired.

For example, the following might be how I would refactor our update_table_schema_config_with_defaults

Suggested Implementation in this dropdown

The overall design/CFG:

flowchart TD classDef main stroke:#FF00FF classDef table stroke:#f00 classDef tableStr stroke:#FFA500 classDef field stroke:#228B22 classDef fieldStr stroke:#4169E1 A("Get table from datamodel") --> B("Create SpLocaleContainer") B --> C("Create table label and descriptions") C --> D("Gather fields from table") D -- for each field --> E("Create SpLocaleContainerItem") E --> F("Create field label and description") class A,D main; class B table class C tableStr; class E field; class F fieldStr;

Loading

A basic implementation might look like the following:

def create_default_table_schema_config( table_name: str, discipline_id: int, apps ): table = datamodel.get_table(table_name) if table is None: logger.warning( f"Table does not exist in latest state of the datamodel, skipping Schema Config entry for: {table_name}" ) return # If we need a more complicated datatype- like TableSchemaConfig- we can define/create it here # and pass it to the basic functions # creates and returns the SplocaleContainer container = create_table_config(apps, table, discipline_id) create_table_strings(apps, container, table) # It'd be very easy to define another function that's just a wrapper for creating the Container or Item # and corresponding strings # e.g., container = create_table_locale(apps, table, discipline_id) # And for an Item and it's strings: # item = create_field_locale(apps, field, container) for field in table.all_fields: # creates and returns the SpLocaleContainerItem item = create_field_config(apps, field, container) create_field_strings(apps, item, field)

The above implementation would be only for creating with the "defaults" .
If a caller has a non-default table or field description, we can easily define another helper which accepts a TableSchemaConfig object or similar and handles creating the object(s) (we could then even simplify the above implementation even further by using said helper).

Here's how a more general implementation mirroring the current implementation might look like:

def create_table_schema_config( config: TableSchemaConfig, discipline_id: int, apps ): table = datamodel.get_table(config.table_name) if table is None: logger.warning( f"Table does not exist in latest state of the datamodel, skipping Schema Config entry for: {config.table_name}" ) return container = create_table_config(apps, config, discipline_id) create_table_strings(apps, container, config) for field in table.all_fields: field_config = config.fields.get(field.name, None) item = create_field_config(apps, field, container, field_config) create_field_strings(apps, item, field, field_config) # This can be used like the following: # Everything not defined here is assumed to be the "default" value my_config = TableSchemaConfig( table_name="SomeTable", label="Custom Name", fields={ "text1": FieldSchemaConfig( description="Use this as needed!" ) } ) create_table_schema_config(my_config, discipline_id, apps)

Essentially, the goal is to keep implementations simple with a single responsibility. With the basic building blocks and helpers for common use cases, the caller has a clean interface regardless of their required complexity.

I could also absolutely see this taken with an OOP approach (blame it on my Java background 😅).
The following is a conceptual example of an OOP implementation using something similar to the Builder pattern:

my_schema = TableSchemaConfig(label="My table", description="Hello") # nearly all methods of TableSchemaConfig are focused on manipulating the # underlying data structure, and explicitly doesn't touch the database my_schema.add_field("text1", FieldSchemaConfig(label="Method")) # this call is what would actually create the container, the items, and the strings my_schema.create()

Instead of single responsibility and building up from the smallest building blocks, the intuition and focus behind this approach would be separation of data and concrete instantiation. We have a flexible, complicated data structure (the schema config tables) and a clear process/methodology of creating it. Essentially, we let the caller build out how it wants the schema to look like before committing to its creation.

This is all just how I would refactor these functions to make them more usable and maintainable.

There's no "right" way of doing this, so if you have other ideas about how to simplify or refactor these functions, I'd love to hear them and absolutely go for it 🚀

melton-jason · 2025-12-26T18:40:54Z

specifyweb/backend/setup_tool/schema_defaults.py

+    Apply schema config localization defaults for this discipline.
+    """
+    # Get default schema localization
+    defaults = load_json_from_file(Path(__file__).parent.parent.parent.parent / 'config' / 'common' / 'schema_localization_en.json')


(optional)

Could we simplify these defaults? I imagine the vast majority of these items already fall within the heuristic used by update_table_schema_config_with_defaults to determine the defaults.
It might be easier from a maintainability and performance standpoint if the defaults from the file contained only the elements which should explicitly not follow these defaults.

Provided is a link to the original PR for convenience: #7444

You can use the following query to get the schema config information for all fields in a Specify database:

SELECT its.Language, spc.name AS 'Table Name', itm.name AS 'Field Name', its.Text AS 'Table Label', itd.Text AS 'Table Desc', itms.Text AS 'Field Label', itmd.Text AS 'Field Desc', itm.type AS 'Field Type', CASE WHEN itm.IsRequired = 1 THEN 'True' WHEN itm.IsRequired = 0 THEN 'False' ELSE itm.IsRequired END AS 'Field Is Required', CASE WHEN itm.IsHidden = 1 THEN 'True' WHEN itm.IsHidden = 0 THEN 'False' ELSE itm.IsHidden END AS 'Field Is Hidden', itm.Format AS 'Field Format', itm.PickListName AS 'Field PickList', itm.WebLinkName AS 'Field WebLink' FROM splocalecontainer AS spc LEFT OUTER JOIN splocaleitemstr AS its ON its.SpLocaleContainerNameID = spc.splocalecontainerid LEFT OUTER JOIN splocaleitemstr itd ON itd.SpLocaleContainerDescID = spc.splocalecontainerid LEFT OUTER JOIN splocalecontaineritem AS itm ON itm.SpLocaleContainerID = spc.splocalecontainerid LEFT OUTER JOIN splocaleitemstr itms ON itms.SpLocaleContainerItemNameID = itm.SpLocaleContainerItemID LEFT OUTER JOIN splocaleitemstr itmd ON itmd.SpLocaleContainerItemDescID = itm.SpLocaleContainerItemID WHERE spc.SchemaType = 0 ORDER BY spc.DisciplineID, its.Language, spc.name, itm.name;

(Or alternatively use the /context/schema_localization.json endpoint).

And if you want to export the Schema from SQL as a CSV (this will output as schema_output.csv, but you can change the file path after the OUTFILE keyword):
(Note that NULL values are interpreted as the literal \N).

SELECT its.Language, spc.name AS 'Table Name', itm.name AS 'Field Name', its.Text AS 'Table Label', itd.Text AS 'Table Desc', itms.Text AS 'Field Label', itmd.Text AS 'Field Desc', itm.type AS 'Field Type', CASE WHEN itm.IsRequired = 1 THEN 'True' WHEN itm.IsRequired = 0 THEN 'False' ELSE itm.IsRequired END AS 'Field Is Required', CASE WHEN itm.IsHidden = 1 THEN 'True' WHEN itm.IsHidden = 0 THEN 'False' ELSE itm.IsHidden END AS 'Field Is Hidden', itm.Format AS 'Field Format', itm.PickListName AS 'Field PickList', itm.WebLinkName AS 'Field WebLink' FROM splocalecontainer AS spc LEFT OUTER JOIN splocaleitemstr AS its ON its.SpLocaleContainerNameID = spc.splocalecontainerid LEFT OUTER JOIN splocaleitemstr itd ON itd.SpLocaleContainerDescID = spc.splocalecontainerid LEFT OUTER JOIN splocalecontaineritem AS itm ON itm.SpLocaleContainerID = spc.splocalecontainerid LEFT OUTER JOIN splocaleitemstr itms ON itms.SpLocaleContainerItemNameID = itm.SpLocaleContainerItemID LEFT OUTER JOIN splocaleitemstr itmd ON itmd.SpLocaleContainerItemDescID = itm.SpLocaleContainerItemID WHERE spc.SchemaType = 0 ORDER BY spc.DisciplineID, its.Language, spc.name, itm.name INTO OUTFILE 'schema_output.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

specifyweb/backend/setup_tool/schema_defaults.py

specifyweb/specify/migration_utils/update_schema_config.py

specifyweb/backend/setup_tool/api.py

specifyweb/backend/setup_tool/urls.py

melton-jason · 2025-12-29T18:22:23Z

specifyweb/frontend/js_src/lib/localization/setupTool.ts

When this is ready to be merged we'll have to do some setup in Weblate to make these strings accessible to translators. You largely only have to do this when creating a new localization directory - which corresponds to a new Component on Weblate.

There's no instructions for adding a new component for in our https://github.com/specify/specify7/blob/main/specifyweb/frontend/js_src/lib/localization/README.md currently, so I'll modify the file to include that documentation.

melton-jason · 2025-12-29T20:08:08Z

specifyweb/backend/setup_tool/setup_tasks.py

We're not creating any default App Resources in the database at the moment.

This is probably fine for stuff like View Sets, TypeSearches, UIFormatters, etc., which exist as defaults on disk, but there is no default Remote or Global Preferences.

Do you know if these are these going to be addressed in something like #7501, or is generally expected that the preference files exist on data creation?
If these changes are going go to be included in a release before being addressed, I think we at least need to create the Remote Preferences so Specify has access to settings like the Date Format (ui.formatting.scrdateformat), Tree Count thresholds (TreeEditor.Rank.Threshold.<TREE>, related #6844), and allow users to configure things like automatically creating resources and other Remote Preferences

Remove unused urls

Refactor worker status code

Triggered by 64879ec on branch refs/heads/issue-2931-1

Remove print statement

Create a new setup page

c1ac2af

Fixes #2931

github-project-automation bot moved this to 📋Back Log in General Tester Board Jun 16, 2025

github-project-automation bot added this to General Tester Board Jun 16, 2025

CarolineDenis added 8 commits June 17, 2025 10:00

Create dependent rel resources for discipline

f3505bf

Handle collection creation

05764cf

Create agent for specifyuser

b900d01

Hash password

d093a30

Change UI

8e19f0d

Update naming

5db92cb

Cleanup

1458cf3

Create permission

6fb54b9

Give full permission to new user

2c964f6

CarolineDenis added 9 commits July 9, 2025 14:13

Add picklists

decdacd

Update list

1b26e97

Update type

f8a8cef

Merge remote-tracking branch 'origin/main' into issue-2931-1

87b4b65

Improve check for new db at login

d2f0623

Merge remote-tracking branch 'origin/main' into issue-2931-1

c03ad23

Fix test

a9dfb2f

Add type

49905f1

Add spversion with institution

31211cc

CarolineDenis force-pushed the issue-2931-1 branch from 63728c3 to 31211cc Compare August 11, 2025 18:39

CarolineDenis added 6 commits August 11, 2025 18:43

Lint code with ESLint and Prettier

ec228a6

Triggered by 31211cc on branch refs/heads/issue-2931-1

Merge remote-tracking branch 'origin/main' into issue-2931-1

7a5719a

Fix import

e039a65

Create separate backend app for setup tool

3a36eb0

Update imports

49eab87

Import creation function

b6b6c3b

Merge branch 'main' into issue-2931-1

f9e9b5e

grantfitzsimmons requested changes Dec 5, 2025

View reviewed changes

github-project-automation bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Dec 5, 2025

Fix max field lengths for institution name and specifyuser password

25bd6af

alesan99 added 7 commits December 16, 2025 12:49

Reformat

a895d0c

Fix key not being lowercase

Lint code with ESLint and Prettier

e28f44c

Triggered by a895d0c on branch refs/heads/issue-2931-1

Merge branch 'main' into issue-2931-1

92dcf63

Lint code with ESLint and Prettier

8a811fb

Triggered by 92dcf63 on branch refs/heads/issue-2931-1

Export field renderer

18e2790

Improve specify worker error

72763a3

Lint code with ESLint and Prettier

fe64aea

Triggered by 72763a3 on branch refs/heads/issue-2931-1

alesan99 requested a review from grantfitzsimmons December 17, 2025 20:40

Grant admin permissions on all collections

6cb080e

alesan99 reviewed Dec 22, 2025

View reviewed changes

alesan99 added 4 commits December 23, 2025 09:50

Redirect api/create reqs for setup resources

cd49067

Remove try except block

2d9e0d6

WIP create default COTs

f7cefd0

Assign primary trees in discipline

28d4edd

melton-jason requested changes Dec 29, 2025

View reviewed changes

alesan99 added 8 commits December 30, 2025 11:42

Fix schema config captions

4387eff

Add default uniqueness rules

23dff74

Remove unused urls

Simplify picklist json reading

31d2c88

Refactor worker status code

Revert changes to login_maybe_required (Not secure & no longer needed)

cf48ece

Add First and Last Name fields to user form

64879ec

Lint code with ESLint and Prettier

4abdbfd

Triggered by 64879ec on branch refs/heads/issue-2931-1

Batch all default picklist creation

c144308

Remove print statement

Reverse full name direction for chronostrat

5bc71e5

		if not spmodels.Institution.objects.exists():
		return view(request, args, *kwargs)

Create a new setup page #6671

Are you sure you want to change the base?

Create a new setup page #6671

Conversation

CarolineDenis commented Jun 16, 2025 • edited by alesan99 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Testing instructions

Uh oh!

grantfitzsimmons commented Jun 27, 2025

Uh oh!

CarolineDenis commented Jul 9, 2025 • edited by alesan99 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CarolineDenis commented Dec 3, 2025

Uh oh!

grantfitzsimmons left a comment

Choose a reason for hiding this comment

Testing instructions

Uh oh!

alesan99 commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

melton-jason left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

CarolineDenis commented Jun 16, 2025 •

edited by alesan99

Loading

CarolineDenis commented Jul 9, 2025 •

edited by alesan99

Loading

alesan99 commented Dec 12, 2025 •

edited

Loading