diff --git a/antora.yml b/antora.yml
index 3d37c5f9..2fd1cb9e 100644
--- a/antora.yml
+++ b/antora.yml
@@ -1,7 +1,7 @@
name: tigergraph-server
title: TigerGraph DB
version: '3.11'
-display_version: "3.11 Pre"
+display_version: "3.11"
start_page: intro:index.adoc
nav:
diff --git a/modules/API/pages/authentication.adoc b/modules/API/pages/authentication.adoc
index a5141b84..00cd7f5a 100644
--- a/modules/API/pages/authentication.adoc
+++ b/modules/API/pages/authentication.adoc
@@ -8,7 +8,7 @@ Each server uses different methods of authentication.
[IMPORTANT]
====
As of 3.10.0, the use of plaintext tokens in authentication is deprecated.
-Use xref:tigergraph-server:user-access:jwt-token.adoc[] instead.
+Use xref:user-access:jwt-token.adoc[] instead.
====

== REST{pp} Server Requests
@@ -42,7 +42,7 @@ curl -X GET -H "Authorization: Bearer 01234567abcdefgh01234567abcdefgh" "http://

=== 3rd party JWT token

-Since 3.10.0, TigerGraph now supports the use of 3rd party JWT token. See xref:tigergraph-server:user-access:jwt-token.adoc[] for more details.
+Since 3.10.0, TigerGraph supports the use of third-party JWT tokens. See xref:user-access:jwt-token.adoc[] for more details.

== GSQL Server Requests
diff --git a/modules/API/pages/built-in-endpoints.adoc b/modules/API/pages/built-in-endpoints.adoc
index c81ffbd4..41680d51 100644
--- a/modules/API/pages/built-in-endpoints.adoc
+++ b/modules/API/pages/built-in-endpoints.adoc
@@ -225,7 +225,7 @@ You can use this endpoint to read from TS3. You can filter for the data points y
* metric: `what`
* location: `where`

-Visualization of such metrics are available in Admin Portal - Dashboard - xref:gui:admin-portal:dashboard.adoc[Cluster Monitoring].
+Visualization of such metrics is available in Admin Portal - Dashboard - xref:{page-component-version}@gui:admin-portal:dashboard.adoc[Cluster Monitoring].

On a TigerGraph cluster, this endpoint is only present on the `m1` node.

@@ -949,10 +949,10 @@ Response::
`GET /rebuildnow/\{graph_name}` +
`POST /rebuildnow/\{graph_name}`

-xref:tigergraph-server:reference:list-of-privileges.adoc[*Required privilege*]: Graph-level READ DATA
+*Required privilege*: xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[Graph-level READ DATA]

-When new data is being loaded into the graph (such as new vertices or edges), data is first stored in memory before it is saved to disk permanently.
-TigerGraph runs a rebuild of the Graph Processing Engine (GPE) to commit the data in memory to disk every 30 seconds, but you can also call this endpoint to trigger a rebuild immediately.
+When new data is being loaded into the graph (such as new vertices or edges), the data is initially stored in memory before being saved permanently to disk.
+TigerGraph runs a rebuild of the Graph Processing Engine (GPE) to commit the data in memory to disk. You can also call this endpoint to trigger a rebuild immediately if necessary.

==== Parameters:
@@ -1010,7 +1010,7 @@ cat finished.summary.txt

`GET /deleted_vertex_check`

-xref:tigergraph-server:reference:list-of-privileges.adoc[*Required privilege*]: Graph-level READ DATA
+*Required privilege*: xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[Graph-level READ DATA]

In certain rare cases, TigerGraph's Graph Processing Engine (GPE) and Graph Storage Engine (GSE) might be out of sync on vertex deletion information. When this happens, some vertices might exist on one of the components, but not the other.
Even though these errors are exceedingly rare, TigerGraph provides an endpoint that allows you to check the deleted vertices on GSE and GPE to see if they are out of sync.
@@ -1146,7 +1146,7 @@ curl -X GET 'http://localhost:9000/deleted_vertex_check?threadnum=10&verbose=0&v
[IMPORTANT]
====
As of 3.10.0, the use of plaintext tokens in authentication is deprecated.
-Use xref:tigergraph-server:user-access:jwt-token.adoc[] instead.
+Use xref:user-access:jwt-token.adoc[] instead.
====

The endpoints in this subsection allow users to create, refresh and delete authentication tokens for requests made to the REST{pp} server.
@@ -1175,7 +1175,8 @@ The endpoint expects a JSON request body in the following format:
{
  "secret": , <1>
  "graph": , <2>
- "lifetime": <3>
+ "lifetime": , <3>
+ "allowExisting": <4>
}
----
<1> User's secret to generate the token.
@@ -1183,20 +1184,7 @@ Required if the request body does not supply `graph`.
<2> Name of the graph that the token is valid for.
Required if the request body does not supply `secret`.
<3> Period of time for which the token is valid measured in seconds. The default value is about 2.6 million (about a month).
-
-==== Parameters:
-[width="100%",cols="15%,10%,75%a",options="header",]
-|===
-|Name |Required |Description
-
-|`allowExisting` [v3.9.3+]
-|No
-|Boolean (default False). When True:
-(1) If an existing token has at least one day remaining before its expiration, refresh and return that token.
-(2) If there is no such existing token and the requester is a cross-region replica, then respond with a No Token Found error.
-(3) Otherwise, return a new token.
-
-|===
+<4> Optional. When true, checks for an existing valid token with at least one day remaining before expiration and, if found, refreshes and returns that token. If no such token exists and the requester is a cross-region replica, it returns a "No Token Found" error. Otherwise, it generates a new token. The default value is `false`.

==== Sample requests:
The responses are slightly different between requests made with secrets and username-password pair.
@@ -1209,7 +1197,7 @@ With secret::
[source.wrap,bash]
----
curl -X POST http://localhost:9000/requesttoken \
- -d '{"secret":"jiokmfqqfu2f95qs6ug85o89rpkneib3", "graph":"MyGraph", "lifetime":"100000"}'
+ -d '{"secret":"jiokmfqqfu2f95qs6ug85o89rpkneib3", "graph":"MyGraph", "lifetime":"100000", "allowExisting":"true"}'
----
--
Response::
@@ -1424,7 +1412,7 @@ This request requires the privilege `WRITE_USER`:

[.wrap,console]
----
-curl -X DELETE "https://localhost:14240/expiredtoken"
+curl --user example_username:example_password -X DELETE https://localhost:14240/gsqlserver/gsql/expiredtoken
----

The following request deletes all expired tokens that belong to users `u1` and `u2` as well as all tokens created with secrets `s1` and `s2`.
@@ -1466,9 +1454,9 @@ None

`+POST /ddl/{graph_name}+`

-xref:tigergraph-server:reference:list-of-privileges.adoc[*Required privilege*]: Graph-level EXECUTE_LOADINGJOB
+*Required privilege*: xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[Graph-level EXECUTE_LOADINGJOB]

-This endpoint is for loading data into a graph. It submits data as an HTTP request payload, to be loaded into the graph by the DDL Loader. The data payload can be formatted as generic CSV or JSON. For more details, please see xref:gsql-ref:basics:system-and-language-basics.adoc[GSQL Language Reference Part 1 - Defining Graphs and Loading Data].
+This endpoint is for loading data into a graph. It submits data as an HTTP request payload, to be loaded into the graph by the DDL Loader.
The data payload can be formatted as generic CSV or JSON. For more details, please see xref:{page-component-version}@gsql-ref:basics:system-and-language-basics.adoc[GSQL Language Reference Part 1 - Defining Graphs and Loading Data]. If the loading job references multiple files, multiple HTTP requests are needed to complete the loading job since you can only provide data for one filename variable at a time. The loading job will skip the `LOAD` statements referencing filename variables that the request didn't provide data for. @@ -1903,13 +1891,13 @@ See xref:upsert-rest.adoc[]. *Server*: GSQL-Server -xref:tigergraph-server:reference:list-of-privileges.adoc[*Required privilege*]: Global-level CLEAR_GRAPHSTORE +*Required privilege*: xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[Global-level CLEAR_GRAPHSTORE] This endpoint is available in v3.9.2+. This endpoint permanently deletes all the data out of the graph store (database), for all graphs. It does not delete the database schema, nor does it delete queries or loading jobs. -It is equivalent to the GSQL command xref:gsql-ref:ddl-and-loading:running-a-loading-job.adoc#_clear_graph_store[CLEAR GRAPH STORE]. +It is equivalent to the GSQL command xref:{page-component-version}@gsql-ref:ddl-and-loading:running-a-loading-job.adoc#_clear_graph_store[CLEAR GRAPH STORE]. [WARNING] ==== @@ -2499,7 +2487,7 @@ GET /graph/{graph_name}/edges/{source_vertex_type}/{source_vertex_id}/{edge_type ---- This endpoint returns the edge of a specified type between a source vertex and a target vertex. -If the edge type isn't defined with a xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator], the source, target and edge type uniquely identify an edge. +If the edge type isn't defined with a xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator], the source, target and edge type uniquely identify an edge. If the edge type is defined with a discriminator, this endpoint returns all edges of the edge type between the source and target vertices. This endpoint requires the xref:user-access:access-control-model.adoc#_data_crud_privileges[`READ_DATA` privilege] on the types or attributes being queried. @@ -2570,7 +2558,7 @@ GET /graph/{graph_name}/edges/{source_vertex_type}/{source_vertex_id}/{edge_type ---- This endpoint allows you to retrieve an edge by its source, target, edge type, and -xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator]. +xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator]. [NOTE] ==== @@ -2627,7 +2615,7 @@ DELETE /graph/{graph_name}/edges/{source_vertex_type}/{source_vertex_id}/{edge_t ---- Deletes an edge by its source vertex type and ID, target vertex type and ID, as well as edge type. -If the edge type isn't defined with a xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator], the source, target and edge type uniquely identify an edge. +If the edge type isn't defined with a xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator], the source, target and edge type uniquely identify an edge. If the edge type is defined with a discriminator, this endpoint deletes all edges of the edge type between the source and target vertices. 
If you want to delete a specific edge by its discriminator, see <<_delete_an_edge_by_source_target_edge_type_and_discriminator>>. @@ -2718,7 +2706,7 @@ DELETE /graph/{graph_name}/edges/{source_vertex_type}/{source_vertex_id}/{edge_t ---- This endpoint allows you to delete an edge by its source, target, edge type, and -xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator]. +xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminator]. [NOTE] ==== @@ -3172,7 +3160,7 @@ At the read committed level, it is guaranteed that any data read is committed at ==== Query parameter Passing When using a `POST` request to run an installed query, the query parameters are passed in through the request body and xref:API:index.adoc#_formatting_data_in_json[encoded in JSON format]. -The formatting rules for the JSON payload are the same as xref:gsql-ref:querying:query-operations.adoc#_parameter_json_object[using JSON to pass in parameters in the `RUN QUERY` command]. +The formatting rules for the JSON payload are the same as xref:{page-component-version}@gsql-ref:querying:query-operations.adoc#_parameter_json_object[using JSON to pass in parameters in the `RUN QUERY` command]. [width="99%",cols="28%,36%,36%",options="header",] |=== @@ -3307,7 +3295,7 @@ curl -X POST -H "GSQL-THREAD-LIMIT: 4" -d '{"p":{"id":"Tom","type":"person"}}' " [NOTE] ==== -Installed queries can run in xref:gsql-ref:querying:query-operations.adoc#_detached_mode_async_option[Detached Mode]. +Installed queries can run in xref:{page-component-version}@gsql-ref:querying:query-operations.adoc#_detached_mode_async_option[Detached Mode]. To do this, use the ``GSQL-ASYNC``header and set its value to `true`. The xref:built-in-endpoints.adoc#_check_query_status_detached_mode[results] and link:built-in-endpoints.adoc#_check_query_status_detached_mode[status] of the queries run in Detached Mode can be retrieved with a query ID, which is returned immediately when queries are executed in Detached Mode. @@ -3528,7 +3516,7 @@ The `expirationTime` attribute is also not available. `+GET /abortquery/{graph_name}+` -xref:tigergraph-server:reference:list-of-privileges.adoc[*Required privilege*]: Graph-level DELETE_DATA +*Required privilege*: xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[Graph-level DELETE_DATA] This endpoint safely aborts a selected query by ID or all queries of an endpoint by endpoint URL of a graph. @@ -3595,7 +3583,7 @@ Response:: `GET /query_status` -This endpoint allows you to check the status of a query run in xref:gsql-ref:querying:query-operations.adoc#_detached_mode_async_option[detached mode]. +This endpoint allows you to check the status of a query run in xref:{page-component-version}@gsql-ref:querying:query-operations.adoc#_detached_mode_async_option[detached mode]. If you are running a TigerGraph cluster, this endpoint only allows you to check the status of a query running on the node to which the request is sent, not all nodes on the cluster. 
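
+For example, a status check for a detached-mode query might look like the following sketch. The request ID shown is a placeholder for the ID returned when the query was submitted, and the `graph_name` and `requestid` parameter names are assumptions for illustration:
+
+[source.wrap,bash]
+----
+curl -X GET "http://localhost:9000/query_status?graph_name=MyGraph&requestid=12345.RESTPP_1_1.1599672861283.N"
+----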
@@ -3885,7 +3873,7 @@ Response::

`GET /data_consistency_check`

-xref:tigergraph-server:reference:list-of-privileges.adoc[*Required privilege*]: Global-level DELETE_DATA
+*Required privilege*: xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[Global-level DELETE_DATA]

In order to provide peace of mind for TigerGraph operation teams managing HA clusters, a new tool was introduced in version 3.6.3 (via a service endpoint) to check high level consistency of data on both HA and Distributed Clusters.
This tool can be easily incorporated into the regular operational process to provide up-to-date summary info on the integrity of your data on all servers.
diff --git a/modules/API/pages/upsert-rest.adoc b/modules/API/pages/upsert-rest.adoc
index bd6bf1ca..4c7f12a1 100644
--- a/modules/API/pages/upsert-rest.adoc
+++ b/modules/API/pages/upsert-rest.adoc
@@ -124,9 +124,10 @@ Edges, on the other hand, are first grouped by source vertex type, then vertex I
=== Examples

+.Upsert Example Data 1: Two User vertices
The first example below shows two `User` vertices having an attribute called `age`:

-.Upsert Example Data 1: Two User vertices
[source,json]
----
{
@@ -147,12 +148,13 @@ The first example below shows two `User` vertices having an attribute called `ag
}
----

-The second example starts with one `User` vertex.
-Since `id6` contains no attributes, it will remain the same it if already exists.
-If it doesn't yet exist, the request will create a vertex with ID `id6` with default attribute values.
-Then two edges are created: a `Liked` edge from `id1` to `id6`, and then a `Liked_By` edge from `id6` to `id1`.
+.Upsert Example Data 2: Adding Vertices and Edges
+This example starts with one `User` vertex (`id6`). Since `id6` contains no attributes, it will remain unchanged if it already exists. If it doesn't yet exist, the request will create a vertex with ID `id6` with default attribute values. Two edges are created:
+
+* A `Liked` edge from `id1` to `id6`.
+* A `Liked_By` edge from `id6` to `id1`.

-.Upsert Example Data 2:add_id6.json
[source,json]
----
{
@@ -314,26 +316,156 @@ curl -X POST "localhost:9000/graph/Person_Movie" -d '
'
----

+== Upserting regular edges
+
+Regular edges do not have discriminators and must be uniquely defined by their source and target vertex IDs.
+To upsert a regular edge, use the following JSON format:
+
+[source,json]
+----
+{
+  "edges": {
+    "<source_vertex_type>": {
+      "<source_vertex_id>": {
+        "<edge_type>": {
+          "<target_vertex_type>": {
+            "<target_vertex_id>": {
+              "<attribute>": {
+                "value": <attribute_value>
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+----
+
+=== Examples
+
+Upserting a `Liked` edge from a `User` vertex (`id1`) to another `User` vertex (`id6`):
+
+[source,json]
+----
+{
+  "edges": {
+    "User": {
+      "id1": {
+        "Liked": {
+          "User": {
+            "id6": {
+              "weight": {
+                "value": 5.0
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+----
+
== Upserting edges with discriminators

-Some edge types are defined with xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminators], which allow multiple instances of the same edge type between two vertices.
+Some edge types are defined with xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc#_discriminator[discriminators], which allow multiple instances of the same edge type between two vertices.

-To upsert an edge that was defined with a discriminator, insert them as a regular edge.
-However, the following rules apply:
+=== Rules for upserting edges with discriminators

-* You cannot leave off discriminator attributes when inserting an edge whose type was defined with discriminator attributes.
-* If you are updating an existing edge, you cannot update the attributes that are defined as part of the edge type discriminator.
+1. *Discriminator attributes are required:*
+* You must include all attributes defined in the discriminator when inserting an edge.

-For example, if you have the following edge type definition:
+2. *Discriminator attributes cannot be updated:*
+* Discriminator attributes are immutable and cannot be changed once the edge is created.
+
+3. *Support for multi-edges with JSON arrays:*
+* Support for upserting multiple edges with discriminators between the same source and target vertices using a JSON array format is planned for a future version (see the note below).
+
+=== Example: Edge type definition

[source.wrap,console]
----
-CREATE DIRECTED EDGE Study_At(From Person, To University, DISCRIMINATOR(class_year INT, class_month INT), major STRING))
+CREATE DIRECTED EDGE Liked (
+    FROM User,
+    TO User,
+    DISCRIMINATOR(actionId STRING),
+    weight FLOAT
+);
+----
+
+==== JSON Format for Edges with Discriminators
+
+To upsert edges with discriminators, use the following JSON format:
+
+[source,json]
+----
+{
+  "edges": {
+    "User": {
+      "id1": {
+        "Liked": {
+          "User": {
+            "id6": {
+              "actionId": {
+                "value": "uuid-1"
+              },
+              "weight": {
+                "value": 5.0
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+----
+
+1. *"actionId" is the discriminator:*
+* The `actionId` uniquely identifies each edge instance of type `Liked` between the same source (id1) and target (id6) vertices.
+* Discriminators are required when upserting edges of this type and must be included in the JSON payload.
+
+2. *Other attributes, like "weight":*
+* Attributes not part of the discriminator (e.g., `weight`) can be updated when upserting.
+
+3. *Nested Structure:*
+* The JSON groups the edge by source vertex type (User) and source vertex ID (id1).
+* Inside, the edge type (Liked) connects the source to the target vertex type (User) and target vertex ID (id6).
+* The `value` field holds the attribute value for each edge attribute.
+
+[NOTE]
+====
+* This format is the only supported way to upsert edges with discriminators in the current version.
+* Discriminator attributes are required for creating edges and cannot be updated after the edge is created.
+* A future version will introduce support for upserting multiple edges with discriminators using JSON arrays.
+====
+
+== General rules for JSON formatting
+
+1. *Escaped quotes:*
+* If the payload is enclosed in single quotes (`'`), as in the example below, internal double quotes do not need to be escaped.
+* If the payload is enclosed in double quotes (`"`), internal double quotes must be escaped as `\"`.
+* Example:

+[source,json]
+----
+'{"edges":{"User":{"id1":{"Liked":{"User":{"id6":{"weight":{"value":5.0}}}}}}}}'
----
-When inserting an edge of type `Study_AT`, you cannot omit the `class_year` attribute or the `class_month` attribute.
-You cannot update these two attributes either.
+2. *Lists, sets, and maps:*
+* For attributes that store lists or sets, supply the elements as a JSON array in the `value` field.
+* For map attributes, pair the keys with their values using the `keyList`/`valueList` format:
+[source,json]
+----
+{
+  "measurements": {
+    "value": {
+      "keyList": ["chest", "waist", "hip"],
+      "valueList": [35, 30, 35]
+    }
+  }
+}
+----

== Valid data types
diff --git a/modules/additional-resources/nav.adoc b/modules/additional-resources/nav.adoc
index b611a993..1153cf40 100644
--- a/modules/additional-resources/nav.adoc
+++ b/modules/additional-resources/nav.adoc
@@ -2,18 +2,17 @@
** xref:best-practice-guides/best-practices-overview.adoc[]
*** xref:best-practice-guides/best-prac-scaling-clusters.adoc[Scaling Guide]
** Troubleshooting and FAQs
-*** link:https://kb.tigergraph.com/[Knowledge base and FAQs]
*** xref:troubleshooting:troubleshooting-guide.adoc[]
*** xref:troubleshooting:system-administration-faqs.adoc[]
-*** xref:troubleshooting:audit-log.adoc[]
*** xref:troubleshooting:log-files.adoc[]
-**** xref:troubleshooting:service-log-tracking.adoc[]
+**** xref:troubleshooting:audit-log.adoc[]
+**** xref:troubleshooting:gcollect.adoc[]
**** xref:troubleshooting:elk-filebeat.adoc[]
** References
*** xref:reference:configuration-parameters.adoc[]
*** xref:reference:return-codes.adoc[]
*** xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[]
-*** xref:reference:list-of-privileges.adoc[]
+*** xref:reference:list-of-privileges-legacy.adoc[]
*** xref:reference:ports.adoc[]
*** xref:reference:glossary.adoc[]
*** xref:reference:patents-and-third-party-software.adoc[]
diff --git a/modules/additional-resources/pages/legacy-tg-versions.adoc b/modules/additional-resources/pages/legacy-tg-versions.adoc
index e8e78688..ccd58473 100644
--- a/modules/additional-resources/pages/legacy-tg-versions.adoc
+++ b/modules/additional-resources/pages/legacy-tg-versions.adoc
@@ -5,19 +5,14 @@ This page lists all LTS (Long-Term Support) and previous versions of TigerGraph

== LTS Versions

-* xref:3.9@tigergraph-server:intro:index.adoc[3.9]
-* xref:3.6@tigergraph-server:intro:index.adoc[3.6]
+* xref:3.11@tigergraph-server:intro:index.adoc[3.11]
+* xref:3.10@tigergraph-server:intro:index.adoc[3.10]

== Other Versions

-Documentation will be on our legacy documentation website.
-
-* 3.8
-* 3.7
-* 3.5
-* 3.4
-* 3.3
-* 3.2
+Documentation for older versions is no longer being maintained.
+For historical purposes, snapshot PDF files of certain older versions (2.4 to 3.2) are available at
+xref:master@home:legacy:index.adoc[].
//// * xref:3.8@tigergraph-server:intro:index.adoc[3.8] diff --git a/modules/advanced-topics/nav.adoc b/modules/advanced-topics/nav.adoc index 8c6805f6..47195dbf 100644 --- a/modules/advanced-topics/nav.adoc +++ b/modules/advanced-topics/nav.adoc @@ -16,7 +16,8 @@ *** Authentication **** xref:user-access:enabling-user-authentication.adoc[] **** xref:user-access:user-credentials.adoc[] -**** xref:user-access:sso.adoc[] +**** xref:user-access:sso-with-saml.adoc[] +**** xref:user-access:sso-with-oidc.adoc[] **** xref:user-access:ldap.adoc[] **** xref:user-access:jwt-token.adoc[] *** Authorization @@ -38,14 +39,13 @@ *** xref:backup-and-restore:cross-cluster-backup.adoc[] *** xref:backup-and-restore:differential-backups.adoc[] *** xref:backup-and-restore:online-backup.adoc[] -*** xref:backup-and-restore:gbar-legacy.adoc[] //Cluster and HA Management ** xref:cluster-and-ha-management:index.adoc[Cluster and HA Management] *** Cluster Resizing **** xref:cluster-and-ha-management:expand-a-cluster.adoc[] **** xref:cluster-and-ha-management:shrink-a-cluster.adoc[] **** xref:cluster-and-ha-management:repartition-a-cluster.adoc[] -**** xref:cluster-and-ha-management:how_to-replace-a-node-in-a-cluster.adoc[Cluster Replace] +**** xref:cluster-and-ha-management:replace-a-node.adoc[] //CRR *** xref:cluster-and-ha-management:crr-index.adoc[Cross-Region Replication (CRR)] **** xref:cluster-and-ha-management:set-up-crr.adoc[Set up CRR] diff --git a/modules/backup-and-restore/nav.adoc b/modules/backup-and-restore/nav.adoc index a47c4cb9..02bdad38 100644 --- a/modules/backup-and-restore/nav.adoc +++ b/modules/backup-and-restore/nav.adoc @@ -7,5 +7,4 @@ ** xref:backup-and-restore:differential-backups.adoc[] ** xref:backup-and-restore:point-in-time-restore.adoc[] ** xref:backup-and-restore:online-backup.adoc[] -** xref:gbar-legacy.adoc[] diff --git a/modules/backup-and-restore/pages/backup-cluster.adoc b/modules/backup-and-restore/pages/backup-cluster.adoc index d963faef..c9aee8a8 100644 --- a/modules/backup-and-restore/pages/backup-cluster.adoc +++ b/modules/backup-and-restore/pages/backup-cluster.adoc @@ -1,5 +1,6 @@ = Back up a Database Cluster -:description: +:description: How to back up a TigerGraph database +:page-aliases: backup-and-restore.adoc :sectnums: This page walks you through the steps to back up your database cluster. @@ -8,7 +9,7 @@ The process also applies to single-server instances. Backing up a cluster is an online operation. Your database remains available during the backup. -Consult xref:tigergraph-server:system-management:management-commands.adoc[the gadmin Command Glossary] to view the help files for the `gadmin backup` commands. +Consult xref:system-management:management-commands.adoc[the gadmin Command Glossary] to view the help files for the `gadmin backup` commands. == Prerequisites * You have access to the TigerGraph Linux user account on your cluster. diff --git a/modules/backup-and-restore/pages/configurations.adoc b/modules/backup-and-restore/pages/configurations.adoc index 463691c4..b1006ad3 100644 --- a/modules/backup-and-restore/pages/configurations.adoc +++ b/modules/backup-and-restore/pages/configurations.adoc @@ -1,6 +1,21 @@ = Backup and Restore Configurations +:description: Configuration parameters for backup and restore, with specific details for cloud storage +[#_configure_backup_and_restore] This page describes the configuration options available for backup and restore on TigerGraph and how to set them. 
+You can use `gadmin config set <parameter_name> <new_value>` to change the value of any parameter.
+
+[TIP]
+`gadmin config entry backup` will walk you through all the available backup configuration parameters interactively. Enter a value for the ones you want to change; skip past the ones you don't want to set.
+
+After configuring the parameters, run `gadmin config apply -y` to apply the new parameter values.
+
+To save to cloud storage, the container name and access credentials need to be configured.
+After the xref:#_configuration_parameters[table of configuration parameters], you will find instructions for configuring backup:
+
+* xref:#_backup_to_aws_s3[to AWS S3]
+* xref:#_backup_to_abs_azure_blob_storage[to Azure Blob Storage]
+* xref:#_backup_to_gcs_google_cloud_storage[to Google Cloud Storage]

== Prerequisites
* You have access to the TigerGraph Linux user account on your cluster.
@@ -10,7 +25,10 @@ All commands must be run from the TigerGraph Linux user.

The following is a list of configurations available for backup and restore.

-[NOTE]: For `System.Backup.Local.Enable`, `System.Backup.S3.Enable`, `System.Backup.ABS.Enable`, and `System.Backup.GCS.Enable`, only one can be enabled at a time.
+[NOTE]
+====
+For `System.Backup.Local.Enable`, `System.Backup.S3.Enable`, `System.Backup.ABS.Enable`, and `System.Backup.GCS.Enable`, only one can be enabled at a time.
+====

|===
|Configuration parameter |Description |Default
@@ -23,45 +41,45 @@ Required if backup is to be stored locally. | `null`

|System.Backup.S3.Enable |Enables or disables backup to AWS S3 or S3-compatible services such as Ceph S3.|`false`

-|System.Backup.S3.AWSAccessKeyID |AWS Access Key ID for authentication (deprecated, use System.Backup.S3.AccessKeyID instead, which has a higher priority.) | `null`
+|System.Backup.S3.AWSAccessKeyID |AWS Access Key ID for authentication (deprecated, use `System.Backup.S3.AccessKeyID` instead, which has a higher priority.) | `null`

-|System.Backup.S3.AWSSecretAccessKey |AWS Secret Access Key for authentication (deprecated, use System.Backup.S3.SecretAccessKey instead, which has a higher priority.)
+|System.Backup.S3.AWSSecretAccessKey |AWS Secret Access Key for authentication (deprecated, use `System.Backup.S3.SecretAccessKey` instead, which has a higher priority.)

-[NOTE]: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_secret`.
+*NOTE*: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_secret`.

|`+null+`

-|System.Backup.S3.AccessKeyID |Access key ID for authentication for AWS S3 or S3-compatible services.| `nan`
+|System.Backup.S3.AccessKeyID |Access key ID for authentication for AWS S3 or S3-compatible services.|

|System.Backup.S3.SecretAccessKey | Secret Access Key for authentication for AWS S3 or S3-compatible services.

-[NOTE]: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_secret`.|`nan`
+*NOTE*: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_secret`.|

-|System.Backup.S3.BucketName |Bucket for AWS S3 or S3-compatible services.|`nan`
+|System.Backup.S3.BucketName |Bucket for AWS S3 or S3-compatible services.|

-|System.Backup.S3.Endpoint|A URL used to interact with the S3 (AWS S3/S3-compatible) service. It can be used for AWS S3 VPC Endpoint, customer endpoint, or S3-compatible service.
For regular AWS S3, leave it empty.|`nan`
+|System.Backup.S3.Endpoint|A URL used to interact with the S3 (AWS S3/S3-compatible) service. It can be used for AWS S3 VPC Endpoint, customer endpoint, or S3-compatible service. For regular AWS S3, leave it empty.|

|System.Backup.S3.RoleARN |The AWS role for accessing s3 buckets. S3 Role ARN takes priority over access keys. For more information, see link:https://docs.aws.amazon.com/IAM/latest/APIReference/API_Role.html[AWS role ARN documentation].

-[NOTE]: This is only for AWS S3, and TigerGraph assumes the credentials for using `sts:AssumeRole` have been set up. You can verify the credentials are ready by running link:https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples[aws sts assume-role]. One way to set up credentials is to configure access key id, secret access key and region with AWS CLI `aws configure`.
+*NOTE*: This is only for AWS S3, and TigerGraph assumes the credentials for using `sts:AssumeRole` have been set up. You can verify the credentials are ready by running link:https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples[aws sts assume-role]. One way to set up credentials is to configure access key id, secret access key and region with AWS CLI `aws configure`.

|`+nan+`

|System.Backup.ABS.Enable |Enables or disables backup to ABS (Azure Blob Storage).|`false`

-|System.Backup.ABS.ContainerName |Azure storage account container|`nan`
-|System.Backup.ABS.AccountName |Azure storage account name for authentication.| `nan`
+|System.Backup.ABS.ContainerName |Azure storage account container|
+|System.Backup.ABS.AccountName |Azure storage account name for authentication.|

|System.Backup.ABS.AccountKey |Account key for the Azure storage account.

-[NOTE]: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_key`.
-| `nan`
+*NOTE*: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_key`.
+|

-|System.Backup.ABS.Endpoint|Optional Blob service endpoint; if not given, the default endpoint https://.blob.core.windows.net will be used.|`nan`
+|System.Backup.ABS.Endpoint|Optional Blob service endpoint; if not given, the default endpoint `https://<AccountName>.blob.core.windows.net` will be used.|

|System.Backup.GCS.Enable |Enables or disables backup to Google Cloud Storage (GCS).|`false`

-|System.Backup.GCS.BucketName |GCS bucket.|`nan`
-|System.Backup.GCS.AccessKeyID |Access Key for a GCS HMAC Key.| `nan`
+|System.Backup.GCS.BucketName |GCS bucket.|
+|System.Backup.GCS.AccessKeyID |Access Key for a GCS HMAC Key.|

|System.Backup.GCS.Secret |Secret for a GCS HMAC Key.

-[NOTE]: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_secret`.
-| `nan`
+*NOTE*: If setting this in interactive mode, store the key in a file and provide the path to the file, e.g., `@/tmp/test_secret`.
+|

|System.Backup.GCS.Endpoint|GCS Storage URI.|`https://storage.googleapis.com`

|System.Backup.TimeoutSec |Timeout limit for the backup operation in seconds |`+18000+`
@@ -79,28 +97,19 @@ We recommending keeping the default value `10`.
| "DefaultCompression"
|===

-== Configure backup and restore
-
-Running `gadmin config entry backup` allows you to enter the value for each parameter individually.
-
-Alternatively, you can use `gadmin config set ` to change the value of any parameter.
- -After configuring the parameters, run `gadmin config apply` to apply the new parameter values. - -== Configure System.Backup.S3.Endpoint +== Backup to AWS S3 Typically, there's no need to configure the `System.Backup.S3.Endpoint` parameter on a TigerGraph Server. -This is because the system auto-detects the regional endpoint for AWS S3 backups. +This is because the system auto-detects the regional endpoint for AWS S3 backups. .Users should configure this parameter *only* for special cases, such as: -* To backup to a AWS S3 vpc endpoint, typically set it to "https://s3.amazonaws.com/" or any available URI/VPC endpoints. -* To backup to an S3-compatible service, set it to its corresponding service URI. +* When using S3 in FIPS mode. +* When connecting to a private or localized cloud environment. +* When integrating with an S3-compatible service that requires a specific endpoint. -Except for the above specific situations, leave it empty. +Except for the above specific situations, leave it empty. For more information please see link:https://docs.aws.amazon.com/general/latest/gr/s3.html#s3_region[AWS Service Endpoints]. - -== Backup to AWS S3 -To configure backup files to an AWS S3 Bucket for an on-premises TigerGraph Server cluster, complete the following steps: +To configure backup files to an AWS S3 Bucket for an on-premises TigerGraph Server cluster, users need to complete the following steps: . Create an S3 bucket in AWS . Create an AWS IAM user @@ -133,6 +142,24 @@ To configure backup files to an AWS S3 Bucket for an on-premises TigerGraph Serv TigerGraph clusters use long-lived credentials to authenticate to AWS as the IAM user, allowing TigerGraph access to put backup files into the S3 bucket. These credentials are also used to read and copy files during a Restore process. +. Add the role policy to allow the role to assume itself. +If the custom role is named `EC2S3AccessRole`, run the following AWS CLI command to set the required policy: + ++ +[console,] +---- +aws iam put-role-policy --role-name EC2S3AccessRole --policy-name AssumeEC2S3AccessRole --policy-document '{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": "sts:AssumeRole", + "Resource": "arn:aws:iam::'$AWS_ACCOUNT':role/EC2S3AccessRole" + } + ] +}' +---- + . Configure each of the following parameters on the linux command line: + .Enable storing backup data in S3 @@ -165,13 +192,34 @@ Alternatively, instead of using `AccessKeyID` and `SecretAccessKey`, you may use ---- gadmin config set "System.Backup.S3.RoleARN" "arn:aws:iam::account:role/role-name-with-path" ---- + +. Set the AWS Region for EC2 instances: + -.Apply the new parameter values -[console,] +To configure the AWS region for the AWS SDK on an EC2 instance running TigerGraph, set the region as an environment variable: ++ +[source,console] +---- +gadmin config set Executor.BasicConfig.Env "AWS_REGION=us-west-2;$(gadmin config get Executor.BasicConfig.Env)" +---- ++ +Replace `us-west-2` with your actual AWS region. + +. Apply the new parameter values ++ +[source,console] ---- gadmin config apply -y ---- +. Restart the service ++ +[source,console] +---- +gadmin restart exe -y +---- + +Now you are ready to xref:backup-cluster.adoc[perform a backup]. 
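
+Before running your first backup, you can confirm that the values were applied. This is a minimal verification sketch using `gadmin config get` with the parameter names from this section:
+
+[source,console]
+----
+# Confirm that S3 backup is enabled and points at the intended bucket.
+gadmin config get System.Backup.S3.Enable
+gadmin config get System.Backup.S3.BucketName
+----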
+ == Backup to ABS (Azure Blob Storage) Similar to backing up to AWS S3, once the Azure Blob Storage Container is created and configured properly (refer to https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction[Introduction to Azure Blob Storage]), then configure it to be your backup storage via the following steps. @@ -214,6 +262,8 @@ gadmin config set "System.Backup.ABS.AccountKey" "" gadmin config apply -y ---- +Now you are ready to xref:backup-cluster.adoc[perform a backup]. + == Backup to GCS (Google Cloud Storage) Similar to backing up to ABS, prepare the proper https://cloud.google.com/storage/docs/authentication/hmackeys[HMAC keys]. Then configure it to be your backup storage via the following steps. @@ -248,3 +298,5 @@ gadmin config set "System.Backup.GCS.Secret" "" ---- gadmin config apply -y ---- + +Now you are ready to xref:backup-cluster.adoc[perform a backup]. diff --git a/modules/backup-and-restore/pages/database-import-export.adoc b/modules/backup-and-restore/pages/database-import-export.adoc index c3485148..98e1cb47 100644 --- a/modules/backup-and-restore/pages/database-import-export.adoc +++ b/modules/backup-and-restore/pages/database-import-export.adoc @@ -1,17 +1,17 @@ = Database Import/Export All Graphs :description: This page details the instructions and requirements of importing and exporting a graph in TigerGraph. -//:page-aliases: tigergraph-server:import-export:database-import-export.adoc +//:page-aliases: import-export:database-import-export.adoc The GSQL `EXPORT GRAPH ALL` and `IMPORT GRAPH ALL` commands perform a logical backup and restore. A database `EXPORT` contains the database's data, and optionally some types of metadata. This data can be subsequently imported in order to recreate the same database, in the original or in a different TigerGraph platform instance. -To see how to Export/Import selected individual graphs in a database refer to xref:tigergraph-server:backup-and-restore:single-graph-import-export.adoc[]. +To see how to Export/Import selected individual graphs in a database refer to xref:single-graph-import-export.adoc[]. [IMPORTANT] ==== -Import/export is a complement to xref:tigergraph-server:backup-and-restore:index.adoc[], but not a substitute. See the xref:#_known_issues[] for additional considerations. +Import/export is a complement to xref:backup-and-restore:index.adoc[], but not a substitute. See the xref:#_known_issues[] for additional considerations. ==== == Import Prerequisite @@ -21,7 +21,7 @@ To import an exported database, ensure that the export files are from a database [WARNING] ==== * Running either `EXPORT GRAPH ALL` or `IMPORT GRAPH ALL` resets the TigerGraph superuser's password back to its default value. After running either command, change the superuser's password to make it secure again. -* User-defined loading jobs containing xref:gsql-ref:ddl-and-loading:creating-a-loading-job.adoc#_delete_statement[`DELETE` statements] are not exported correctly. +* User-defined loading jobs containing xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc#_delete_statement[`DELETE` statements] are not exported correctly. * If a graph contains vertex or edge types with a composite key, the graph data is exported in a nonstandard format that cannot be re-imported. * GSQL EXPORT GRAPH may fail and cause a GPE to crash when UDT type has a fixed STRING size. 
====
@@ -248,11 +248,11 @@ Importing a new solution cannot be undone to restore the previous state, regardl
Therefore, create a complete backup beforehand in case you need to restore the database: xref:backup-cluster.adoc[]

For security purposes, TigerGraph has two `gadmin` commands, `GSQL.UDF.Policy.Enable` and `GSQL.UDF.Policy.HeaderAllowlist` to prevent malicious code execution during import.
-Please refer to the section on xref:gsql-ref:querying:func/query-user-defined-functions.adoc#udf-security[UDF Security] to ensure that UDFs comply with the security specifications. This will help you import the solution successfully.
+Please refer to the section on xref:{page-component-version}@gsql-ref:querying:func/query-user-defined-functions.adoc#udf-security[UDF Security] to ensure that UDFs comply with the security specifications. This will help you import the solution successfully.
====

=== Required privileges
-`WRITE_SCHEMA`, `WRITE_QUERY`, `WRITE_LOADINGJOB`, `EXECUTE_LOADINGJOB`, `DROP ALL`, `WRITE_USERS`
+`WRITE_SCHEMA`, `WRITE_QUERY`, `WRITE_LOADINGJOB`, `EXECUTE_LOADINGJOB`, `DROP ALL`, `WRITE_USER`, `WRITE_ROLE`, `READ_DATA`, `CREATE_DATA`

=== Synopsis

@@ -369,7 +369,10 @@ Some files, including the data files, are exported to each server, while some fi

The following are the steps to import an export file to a cluster.

-You may only import to a cluster that has the same number and configuration of servers as the data from which the export originated.
+If you're exporting from a single-node cluster, the `ExportedGraph.zip` file can be imported directly, so you don't need the source and target clusters to have the same number of servers.
+If you're exporting from a smaller cluster to a larger one, you must transfer the dataset files along with the zip file to the appropriate nodes in the target cluster.
+
+For importing from a larger cluster to a smaller one, you'll need to combine the dataset files, which can be easily done using GraphStudio. If you're using the `gsql` CLI, remember to combine the exported data from all nodes and transfer it to the new cluster before importing.

==== 1. Transfer files to new cluster
@@ -382,4 +385,4 @@ The export file on every node must share the same absolute path.

Run the `IMPORT GRAPH ALL` command from the server that corresponds to the server where `EXPORT GRAPH ALL` was run.

-For example, if you exported from the m2 node in a cluster, you also need to run the `IMPORT GRAPH ALL` command from the m2 node of the cluster you are importing the export files into.
\ No newline at end of file
+For example, if you exported from the m2 node in a cluster, you also need to run the `IMPORT GRAPH ALL` command from the m2 node of the cluster you are importing the export files into.
diff --git a/modules/backup-and-restore/pages/differential-backups.adoc b/modules/backup-and-restore/pages/differential-backups.adoc
index cd2a1217..0f259fc0 100644
--- a/modules/backup-and-restore/pages/differential-backups.adoc
+++ b/modules/backup-and-restore/pages/differential-backups.adoc
@@ -37,7 +37,7 @@ gadmin backup create --incremental

[NOTE]
====
-For more details on `gadmin backup create` see xref:tigergraph-server:backup-and-restore:backup-cluster.adoc#_procedure[Back up a Database Cluster Procedure].
+For more details on `gadmin backup create` see xref:backup-cluster.adoc#_procedure[Back up a Database Cluster Procedure].
====

To restore a differential backup, users need to find the correct `` from the backup list.
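
+For example, a differential restore might look like the following sketch, where the backup tag is a placeholder for a value returned by `gadmin backup list`:
+
+[source,console]
+----
+# List available backups, then restore by the chosen tag.
+gadmin backup list
+gadmin backup restore daily-2024-01-01T120000
+----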
diff --git a/modules/backup-and-restore/pages/gbar-legacy.adoc b/modules/backup-and-restore/pages/gbar-legacy.adoc deleted file mode 100644 index 7aebe964..00000000 --- a/modules/backup-and-restore/pages/gbar-legacy.adoc +++ /dev/null @@ -1,286 +0,0 @@ -= Legacy Backup and Restore -:page-aliases: backup-and-restore.adoc - -[IMPORTANT] -==== -Since 3.10.0 the command `gbar` is removed and is no longer available. -However, if you are using a version of TigerGraph before 3.10.0 you can still use `gbar` to create a backup of the primary cluster. - -==== - -Graph Backup And Restore (GBAR), is an integrated tool for backing up and restoring the data and data dictionary (schema, loading jobs, and queries) of a TigerGraph instance or cluster. - -The backup feature packs TigerGraph data and configuration information into a directory on the local disk or a remote AWS S3 bucket. -Multiple backup files can be archived. -Later, you can use the restore feature to roll back the system to any backup point. -This tool can also be integrated easily with Linux cron to perform periodic backup jobs. - -[NOTE] -==== -The current version of GBAR is intended for restoring the same machine that was backed up. For help with cloning a database (i.e., backing up machine A and restoring the database to machine B), please https://tigergraph.zendesk.com/hc/en-us/[open a support ticket]. -==== - -== Synopsis - - -[source,text] ----- -Usage: gbar backup [options] -t - gbar restore [options] - gbar list [backup_tag] [-j] - gbar remove|rm - gbar cleanup - gbar expand [-a] - New nodes must be written in : pairs separated by comma - Example: - m1:192.168.1.2,m2:192.168.1.3,m3:192.168.1.4 - - Options: - -h, --help Show this help message and exit - -v Run with debug info dumped - -vv Run with verbose debug info dumped - -y Run without prompt - -j Print gbar list as JSON - -t BACKUP_TAG Tag for backup file, required on backup - -a, --advanced Enable advanced mode for node expansion ----- - - - -The `-y` option forces GBAR to skip interactive prompt questions by selecting the default answer. -There is currently one interactive question: - -* At the start of the restore process, GBAR always asks if it is okay to stop and reset the TigerGraph services. -The default answer is yes. - -== Configure GBAR - -Before using the backup or the restore feature, GBAR must be configured. - -. Run `gadmin config entry system.backup`. -At each prompt, enter the appropriate values for each config parameter. -+ -[source,console] ----- -$ gadmin config entry system.backup - -System.Backup.TimeoutSec [ 18000 ]: The backup timeout in seconds -New: 18000 - -System.Backup.CompressProcessNumber [ 0 ]: The number of concurrent process for compression during backup -New: 0 - -System.Backup.Local.Enable [ true ]: Backup data to local path -New: true - -System.Backup.Local.Path [ /tmp/backup ]: The path to store the backup files -New: /data/backup - -System.Backup.S3.Enable [ false ]: Backup data to S3 path -New: false - -System.Backup.S3.AWSAccessKeyID [ ]: The path to store the backup files -New: - -System.Backup.S3.AWSSecretAccessKey [ ]: The path to store the backup files -New: - -System.Backup.S3.BucketName [ ]: The path to store the backup files -New: ----- - -. After entering the configuration values, run the following command to apply the new configurations -+ -[source,console] ----- -gadmin config apply -y ----- - -[NOTE] -==== -* You can specify the number of parallel processes for backup and restore. 
-* You must provide the username and password using `GSQL_USERNAME` and `GSQL_PASSWORD` environment variables. - - $ GSQL_USERNAME=tigergraph GSQL_PASSWORD=tigergraph gbar backup -t daily -==== - -== Perform a backup - -Before performing a backup, ensure that there is enough free disk space for the backup files. - -To perform a backup, run the following command as the TigerGraph Linux user: - -[source,console] ----- -gbar backup -t ----- - -Depending on your configuration settings, your backup archive is output to your local backup path and/or your AWS S3 bucket. -If you are running a cluster, there will be a backup archive on every node in the same path. - -A backup archive is stored as several files in a folder, rather than as a single file. -The backup tag acts like a filename prefix for the archive filename. -The full name of the backup archive will be `-`, which is a subfolder of the backup repository. - -* If `System.Backup.Local.Enable` is set to `true`, the folder is a local folder on every node in a cluster, to avoid massive data moving across nodes in a cluster. -* If `System.Backup.S3.Enable` is set to `true`, every node will upload data located on the node to the s3 repository. Therefore, every node in a cluster needs access to Amazon S3. - -GBAR Backup performs a live backup, meaning that normal operations may continue while the backup is in progress. -When GBAR backup starts, GBAR checks if there are running loading jobs. -If there are, it pauses loading for 1 minute to generate a snapshot and then continue the backup process. -You can specify the loading pausing interval by the environment variable `PAUSE_LOADING`. - -GBAR then sends a request to the admin server, which then requests the GPE and GSE to create snapshots of their data. -Per the request, the GPE and GSE store their data under GBAR's own working directory. -GBAR also directly contacts the Dictionary and obtains a dump of its system configuration information. -In addition, GBAR gathers the TigerGraph system version and customized information including user-defined functions, token functions, schema layouts and user-uploaded icons. -Then, GBAR compresses each of these data and configuration information files in tgz format and stores them in the `-` subfolder on each node. -As the last step, GBAR copies that file to local storage or AWS S3, according to the Config settings, and removes all temporary files generated during backup. - -The current version of GBAR Backup takes snapshots quickly to make it very likely that all the components (GPE, GSE, and Dictionary) are in a consistent state, but it does not fully guarantee consistency. - -[WARNING] -==== -Backup does not save input message queues for REST{pp} or Kafka. -Make sure all messages are consumed before performing a backup. -==== - -== List Backup Files - -[source,console] ----- -gbar list ----- - -This command lists all generated backup files in the storage place configured by the user. -For each file, it shows the file's full tag, its size in human-readable format, and its creation time. - -== Restore from a backup archive - -Before restoring a backup, ensure that the backup you are restoring from is in the *same exact version* as your current version of TigerGraph. -Also make sure you have enough free disk space to accommodate both the old graph store and the graph store to be restored. 
- -To restore a backup, run the following command: - -[source,console] ----- -gbar restore ----- - -If GBAR can verify that the backup archive exists and that the backup's system version is compatible with the current system version, GBAR shuts down the TigerGraph servers temporarily as it restores the backup. -After completing the restore, GBAR restarts the TigerGraph servers. -If you are running a cluster, and you have copied the backup files to each individual node in the cluster, running `gbar restore` on any node restores the entire cluster. - -Restore is an offline operation, requiring the data services to be temporarily shut down. -The user must specify the full archive name ( `-` ) to be restored. - -=== Restore process - -When GBAR restore begins, it first searches for a backup archive exactly matching the archive name supplied in the command line. -Then it decompresses the backup files to a working directory. -Next, GBAR compares the TigerGraph system version in the backup archive with the current system's version, to make sure that the backup archive is compatible with that current system. -It will then shut down the TigerGraph servers (GSE, RESTPP, etc.) temporarily. - -GBAR then makes a copy of the current graph data, as a precaution. -Next, GBAR copies the backup graph data into the GPE and GSE and notifies the Dictionary to load the configuration data. -GBAR also notifies the GST to load backup user data and copy the backup user-defined token/functions to the right location. -When these actions are all done, GBAR restarts the TigerGraph servers. - - -== Remove a backup - -To remove a backup, run the `gbar remove` command: - -[source,console] ----- -$ gbar remove - ----- - -The command removes a backup from the backup storage path. -To retrieve the full tag of a backup with the timestamp, use the `gbar list` command. - -Please note that the backup tag entered when you create a backup automatically includes a timestamp that is shown in the results of `gbar list`. -The `gbar remove` command must use the full tag, including the timestamp. - -== Clean up temporary files - -Run `gbar cleanup` to delete the temporary files created during backup or restore operations: - -[source,console] ----- -$ gbar cleanup ----- - -== GBAR Detailed Example - -The following example describes a real example, to show the actual commands, the expected output, and the amount of time and disk space used, for a given set of graph data. For this example, an Amazon EC2 instance was used, with the following specifications: - -Single instance with 32 vCPU + 244GB memory + 2TB HDD. - -Naturally, backup and restore time will vary depending on the hardware used. - -=== GBAR Backup Operational Details - -To run a daily backup, we tell GBAR to backup with the tag name _daily_. - -[source,console] ----- -$ gbar backup -t daily -[23:21:46] Retrieve TigerGraph system configuration -[23:21:51] Start workgroup -[23:21:59] Snapshot GPE/GSE data -[23:33:50] Snapshot DICT data -[23:33:50] Calc checksum -[23:37:19] Compress backup data -[23:46:43] Pack backup data -[23:53:18] Put archive daily-20180607232159 to repo-local -[23:53:19] Terminate workgroup -Backup to daily-20180607232159 finished in 31m33s. ----- - -The total backup process took about 31 minutes, and the generated archive is about 49 GB. Dumping the GPE + GSE data to disk took 12 minutes. Compressing the files took another 20 minutes. 
- -=== GBAR Restore Operational Details - -To restore from a backup archive, a full archive name needs to be provided, such as _daily-20180607232159_. By default, restore will ask the user to approve to continue. If you want to pre-approve these actions, use the "-y" option. GBAR will make the default choice for you. - -[source,console] ----- -$ gbar restore daily-20180607232159 -[23:57:06] Retrieve TigerGraph system configuration -GBAR restore needs to reset TigerGraph system. -Do you want to continue?(y/N):y -[23:57:13] Start workgroup -[23:57:22] Pull archive daily-20180607232159, round #1 -[23:57:57] Pull archive daily-20180607232159, round #2 -[00:01:00] Pull archive daily-20180607232159, round #3 -[00:01:00] Unpack cluster data -[00:06:39] Decompress backup data -[00:17:32] Verify checksum -[00:18:30] gadmin stop gpe gse -[00:18:36] Snapshot DICT data -[00:18:36] Restore cluster data -[00:18:36] Restore DICT data -[00:18:36] gadmin reset -[00:19:16] gadmin start -[00:19:41] reinstall GSQL queries -[00:19:42] recompiling loading jobs -[00:20:01] Terminate workgroup -Restore from daily-20180607232159 finished in 22m55s. -Old gstore data saved under /home/tigergraph/tigergraph/gstore with suffix -20180608001836, you need to remove them manually. ----- - -For our test, GBAR restore took about 23 minutes. Most of the time (20 minutes) was spent decompressing the backup archive. - -Note that after the restore is done, GBAR informs you were the pre-restore graph data has been saved. After you have verified that the restore was successful, you may want to delete the old graph data files to free up disk space. - -=== Performance Summary of Example - -|=== -| GStore size | Backup file size | Backup time | Restore time - -| 219GB -| 49GB -| 31 mins -| 23 mins -|=== diff --git a/modules/backup-and-restore/pages/index.adoc b/modules/backup-and-restore/pages/index.adoc index dfcca409..7d97df3a 100644 --- a/modules/backup-and-restore/pages/index.adoc +++ b/modules/backup-and-restore/pages/index.adoc @@ -1,5 +1,5 @@ = Backup and Restore -:description: GBAR - Graph Backup and Restore +:description: How to configure and perform Backup and Restore operations, to save a copy of the database data, queries, and other information. :pp: {plus}{plus} TigerGraph offers the ability to perform backups and restore a backup. @@ -9,9 +9,6 @@ You can store your backup files locally or remotely in an Amazon S3 bucket. * xref:backup-cluster.adoc[] * xref:restore-backup-same.adoc[] * xref:cross-cluster-backup.adoc[] -* xref:tigergraph-server:backup-and-restore:online-backup.adoc[] -* xref:tigergraph-server:backup-and-restore:differential-backups.adoc[] -* xref:tigergraph-server:backup-and-restore:point-in-time-restore.adoc[] -* xref:gbar-legacy.adoc[] - - +* xref:online-backup.adoc[] +* xref:differential-backups.adoc[] +* xref:point-in-time-restore.adoc[] diff --git a/modules/backup-and-restore/pages/online-backup.adoc b/modules/backup-and-restore/pages/online-backup.adoc index c91dba58..cbc6a1ff 100644 --- a/modules/backup-and-restore/pages/online-backup.adoc +++ b/modules/backup-and-restore/pages/online-backup.adoc @@ -1,7 +1,7 @@ = Online Backup The introduction of Online Backup in version 3.10.0 is designed to operate seamlessly. -This feature maintains the same xref:tigergraph-server:backup-and-restore:configurations.adoc[] as before. +This feature maintains the same xref:backup-and-restore:configurations.adoc[] as before. 
[IMPORTANT] ==== @@ -23,8 +23,8 @@ The primary advantage of online backup lies in the uninterrupted data operations Consider a scenario where your TigerGraph database is actively serving user requests and, simultaneously, a backup needs to be performed. In the past, this could lead to blocking write operations, causing timeouts and preventing users from upserting data. -With online backup in place, users can seamlessly execute post requests via the xref:tigergraph-server:API:index.adoc[REST API] during the backup process. -This means that even as critical operations continue, the database is concurrently backed up, ensuring continuous data availability and preventing the inability to xref:tigergraph-server:API:upsert-rest.adoc[upsert data to the database]. +With online backup in place, users can seamlessly execute post requests via the xref:API:index.adoc[REST API] during the backup process. +This means that even as critical operations continue, the database is concurrently backed up, ensuring continuous data availability and preventing the inability to xref:API:upsert-rest.adoc[upsert data to the database]. ==== Summary of key benefits: @@ -46,9 +46,10 @@ This improvement not only safeguards ongoing operations but also streamlines the == Limitations and Considerations -Currently, the Online Backup feature has these known limitations: +Currently, the Online Backup feature has one limitation: -* If users start running a query, the full backup cannot succeed until that query finishes. -* Running queries also block the rebuild process. +* If a query starts during a *full* backup, the backup will not finish until that query finishes. + +However, *differential* backup does not have this limitation. Users can perform differential backups even if a query is running. diff --git a/modules/backup-and-restore/pages/point-in-time-restore.adoc b/modules/backup-and-restore/pages/point-in-time-restore.adoc index a59e1c78..9fe5d9ff 100644 --- a/modules/backup-and-restore/pages/point-in-time-restore.adoc +++ b/modules/backup-and-restore/pages/point-in-time-restore.adoc @@ -1,7 +1,7 @@ = Point-in-Time Restore Point-in-Time Restore, introduced in version 3.11, enables users to restore the database to a previous time point using previously made backups, even though no backup was conducted at exactly that time point. -This feature relies on the user having made one or more xref:tigergraph-server:backup-and-restore:differential-backups.adoc[differential backups]. +This feature relies on the user having made one or more xref:differential-backups.adoc[differential backups]. == Usage diff --git a/modules/backup-and-restore/pages/single-graph-import-export.adoc b/modules/backup-and-restore/pages/single-graph-import-export.adoc index 300eff49..ccaac324 100644 --- a/modules/backup-and-restore/pages/single-graph-import-export.adoc +++ b/modules/backup-and-restore/pages/single-graph-import-export.adoc @@ -8,7 +8,7 @@ To see how to Export/Inport all graphs in a database at once refer to xref:datab [IMPORTANT] ==== -Import/export is a complement to xref:tigergraph-server:backup-and-restore:index.adoc[], but not a substitute. +Import/export is a complement to xref:backup-and-restore:index.adoc[], but not a substitute. 
==== == Import Prerequisite @@ -245,7 +245,7 @@ Importing a new solution cannot be undone to restore the previous state, regardl Therefore, create a complete backup beforehand in case you need to restore the database: xref:backup-cluster.adoc[] For security purposes, TigerGraph has two `gadmin` commands, `GSQL.UDF.Policy.Enable` and `GSQL.UDF.Policy.HeaderAllowlist` to prevent malicious code execution during import. -Please refer to the section on xref:gsql-ref:querying:func/query-user-defined-functions.adoc#udf-security[UDF Security] to ensure that UDFs comply with the security specifications. This will help you import the solution successfully. +Please refer to the section on xref:{page-component-version}@gsql-ref:querying:func/query-user-defined-functions.adoc#udf-security[UDF Security] to ensure that UDFs comply with the security specifications. This will help you import the solution successfully. ==== === Required privileges diff --git a/modules/cluster-and-ha-management/images/RESTPP-aws.png b/modules/cluster-and-ha-management/images/RESTPP-aws.png new file mode 100644 index 00000000..45d8e538 Binary files /dev/null and b/modules/cluster-and-ha-management/images/RESTPP-aws.png differ diff --git a/modules/cluster-and-ha-management/images/Test1.png b/modules/cluster-and-ha-management/images/Test1.png new file mode 100644 index 00000000..ddbfc407 Binary files /dev/null and b/modules/cluster-and-ha-management/images/Test1.png differ diff --git a/modules/cluster-and-ha-management/images/Test2.png b/modules/cluster-and-ha-management/images/Test2.png new file mode 100644 index 00000000..dd4a3419 Binary files /dev/null and b/modules/cluster-and-ha-management/images/Test2.png differ diff --git a/modules/cluster-and-ha-management/images/Test3.png b/modules/cluster-and-ha-management/images/Test3.png new file mode 100644 index 00000000..3582ab16 Binary files /dev/null and b/modules/cluster-and-ha-management/images/Test3.png differ diff --git a/modules/cluster-and-ha-management/images/Test4.png b/modules/cluster-and-ha-management/images/Test4.png new file mode 100644 index 00000000..ab499e79 Binary files /dev/null and b/modules/cluster-and-ha-management/images/Test4.png differ diff --git a/modules/cluster-and-ha-management/images/azure_backend_settings.png b/modules/cluster-and-ha-management/images/azure_backend_settings.png new file mode 100644 index 00000000..9033b2ec Binary files /dev/null and b/modules/cluster-and-ha-management/images/azure_backend_settings.png differ diff --git a/modules/cluster-and-ha-management/images/azure_gsql.png b/modules/cluster-and-ha-management/images/azure_gsql.png new file mode 100644 index 00000000..d167f131 Binary files /dev/null and b/modules/cluster-and-ha-management/images/azure_gsql.png differ diff --git a/modules/cluster-and-ha-management/images/azure_gui.png b/modules/cluster-and-ha-management/images/azure_gui.png new file mode 100644 index 00000000..0aa91656 Binary files /dev/null and b/modules/cluster-and-ha-management/images/azure_gui.png differ diff --git a/modules/cluster-and-ha-management/images/azure_restpp.png b/modules/cluster-and-ha-management/images/azure_restpp.png new file mode 100644 index 00000000..c2814b25 Binary files /dev/null and b/modules/cluster-and-ha-management/images/azure_restpp.png differ diff --git a/modules/cluster-and-ha-management/images/azure_rules.png b/modules/cluster-and-ha-management/images/azure_rules.png new file mode 100644 index 00000000..e1ab666a Binary files /dev/null and 
b/modules/cluster-and-ha-management/images/azure_rules.png differ diff --git a/modules/cluster-and-ha-management/images/backend_regions.png b/modules/cluster-and-ha-management/images/backend_regions.png new file mode 100644 index 00000000..da2bcd7c Binary files /dev/null and b/modules/cluster-and-ha-management/images/backend_regions.png differ diff --git a/modules/cluster-and-ha-management/images/gcp_default_backend.png b/modules/cluster-and-ha-management/images/gcp_default_backend.png new file mode 100644 index 00000000..02f18795 Binary files /dev/null and b/modules/cluster-and-ha-management/images/gcp_default_backend.png differ diff --git a/modules/cluster-and-ha-management/images/gcp_edit_backend.png b/modules/cluster-and-ha-management/images/gcp_edit_backend.png new file mode 100644 index 00000000..c777f872 Binary files /dev/null and b/modules/cluster-and-ha-management/images/gcp_edit_backend.png differ diff --git a/modules/cluster-and-ha-management/images/gcp_gsql.png b/modules/cluster-and-ha-management/images/gcp_gsql.png new file mode 100644 index 00000000..0ee001a6 Binary files /dev/null and b/modules/cluster-and-ha-management/images/gcp_gsql.png differ diff --git a/modules/cluster-and-ha-management/images/gcp_gui.png b/modules/cluster-and-ha-management/images/gcp_gui.png new file mode 100644 index 00000000..629ea04f Binary files /dev/null and b/modules/cluster-and-ha-management/images/gcp_gui.png differ diff --git a/modules/cluster-and-ha-management/images/gcp_restpp.png b/modules/cluster-and-ha-management/images/gcp_restpp.png new file mode 100644 index 00000000..f14eb57d Binary files /dev/null and b/modules/cluster-and-ha-management/images/gcp_restpp.png differ diff --git a/modules/cluster-and-ha-management/images/gsql-aws.png b/modules/cluster-and-ha-management/images/gsql-aws.png new file mode 100644 index 00000000..fee2ccda Binary files /dev/null and b/modules/cluster-and-ha-management/images/gsql-aws.png differ diff --git a/modules/cluster-and-ha-management/images/gui-aws.png b/modules/cluster-and-ha-management/images/gui-aws.png new file mode 100644 index 00000000..21600373 Binary files /dev/null and b/modules/cluster-and-ha-management/images/gui-aws.png differ diff --git a/modules/cluster-and-ha-management/images/host_and_path_rules.png b/modules/cluster-and-ha-management/images/host_and_path_rules.png new file mode 100644 index 00000000..697bb54d Binary files /dev/null and b/modules/cluster-and-ha-management/images/host_and_path_rules.png differ diff --git a/modules/cluster-and-ha-management/images/listener-aws.png b/modules/cluster-and-ha-management/images/listener-aws.png new file mode 100644 index 00000000..ad7deadb Binary files /dev/null and b/modules/cluster-and-ha-management/images/listener-aws.png differ diff --git a/modules/cluster-and-ha-management/images/listener-gsql-aws.png b/modules/cluster-and-ha-management/images/listener-gsql-aws.png new file mode 100644 index 00000000..b6bb52c1 Binary files /dev/null and b/modules/cluster-and-ha-management/images/listener-gsql-aws.png differ diff --git a/modules/cluster-and-ha-management/images/listener-gui-aws.png b/modules/cluster-and-ha-management/images/listener-gui-aws.png new file mode 100644 index 00000000..095d28d0 Binary files /dev/null and b/modules/cluster-and-ha-management/images/listener-gui-aws.png differ diff --git a/modules/cluster-and-ha-management/images/listener_restpp-aws.png b/modules/cluster-and-ha-management/images/listener_restpp-aws.png new file mode 100644 index 00000000..a99fe36f Binary files 
/dev/null and b/modules/cluster-and-ha-management/images/listener_restpp-aws.png differ diff --git a/modules/cluster-and-ha-management/images/listener_rules_aws.png b/modules/cluster-and-ha-management/images/listener_rules_aws.png new file mode 100644 index 00000000..00fe3a45 Binary files /dev/null and b/modules/cluster-and-ha-management/images/listener_rules_aws.png differ diff --git a/modules/cluster-and-ha-management/images/taget-instances-aws.png b/modules/cluster-and-ha-management/images/taget-instances-aws.png new file mode 100644 index 00000000..dc700a0e Binary files /dev/null and b/modules/cluster-and-ha-management/images/taget-instances-aws.png differ diff --git a/modules/cluster-and-ha-management/images/target-group-aws.png b/modules/cluster-and-ha-management/images/target-group-aws.png new file mode 100644 index 00000000..27d9dcb3 Binary files /dev/null and b/modules/cluster-and-ha-management/images/target-group-aws.png differ diff --git a/modules/cluster-and-ha-management/nav.adoc b/modules/cluster-and-ha-management/nav.adoc index 18b25c3b..f776db0f 100644 --- a/modules/cluster-and-ha-management/nav.adoc +++ b/modules/cluster-and-ha-management/nav.adoc @@ -1,16 +1,17 @@ * Cluster and HA Management ** Cluster Resizing -*** xref:cluster-and-ha-management:expand-a-cluster.adoc[] -*** xref:cluster-and-ha-management:shrink-a-cluster.adoc[] -*** xref:cluster-and-ha-management:repartition-a-cluster.adoc[] -*** xref:how_to-replace-a-node-in-a-cluster.adoc[Cluster Replace] -// -** xref:cluster-and-ha-management:crr-index.adoc[] -*** xref:cluster-and-ha-management:set-up-crr.adoc[] -*** xref:cluster-and-ha-management:fail-over.adoc[] -*** xref:cluster-and-ha-management:troubleshooting.adoc[] -*** xref:cluster-and-ha-management:crr-faq.adoc[] -// +*** xref:expand-a-cluster.adoc[] +*** xref:shrink-a-cluster.adoc[] +*** xref:repartition-a-cluster.adoc[] +*** xref:remove-failed-node.adoc[] +*** xref:replace-a-node.adoc[] +// CRR +** xref:crr-index.adoc[] +*** xref:set-up-crr.adoc[] +*** xref:fail-over.adoc[] +*** xref:troubleshooting.adoc[Troubleshooting CRR] +*** xref:crr-faq.adoc[] +// HA ** xref:ha-overview.adoc[High Availability] *** xref:ha-cluster.adoc[] *** xref:ha-for-gsql-server.adoc[] diff --git a/modules/cluster-and-ha-management/pages/.elastic-cluster.adoc b/modules/cluster-and-ha-management/pages/.elastic-cluster.adoc index f8fcf60a..8c8ca90d 100644 --- a/modules/cluster-and-ha-management/pages/.elastic-cluster.adoc +++ b/modules/cluster-and-ha-management/pages/.elastic-cluster.adoc @@ -1,5 +1,5 @@ = Compute-on-demand Elastic Cluster -:page-aliases: tigergraph-server:ha:elastic-cluster.adoc +:page-aliases: ha:elastic-cluster.adoc :description: Overview of TigerGraph's compute-on-demand elastic cluster. TigerGraph allows you to spin up an Elastic Read-only (ER) cluster from your primary cluster to handle compute-intensive online analytical processing (OLAP) queries on demand. diff --git a/modules/cluster-and-ha-management/pages/.set-up-elastic-cluster.adoc b/modules/cluster-and-ha-management/pages/.set-up-elastic-cluster.adoc index 1c940c02..abe95edc 100644 --- a/modules/cluster-and-ha-management/pages/.set-up-elastic-cluster.adoc +++ b/modules/cluster-and-ha-management/pages/.set-up-elastic-cluster.adoc @@ -1,5 +1,5 @@ = Set Up Elastic Cluster for On-Prem Instance -:page-aliases: tigergraph-server:ha:set-up-elastic-cluster.adoc +:page-aliases: ha:set-up-elastic-cluster.adoc :description: Instructions on how to set up an elastic cluster for an on-prem TigerGraph instance. 
This page walks you through setting up an elastic cluster for an on-prem TigerGraph instance.
diff --git a/modules/cluster-and-ha-management/pages/cluster-commands.adoc b/modules/cluster-and-ha-management/pages/cluster-commands.adoc
index ffad39e6..a606eb35 100644
--- a/modules/cluster-and-ha-management/pages/cluster-commands.adoc
+++ b/modules/cluster-and-ha-management/pages/cluster-commands.adoc
@@ -1,5 +1,5 @@
= Cluster Commands
-//:page-aliases: tigergraph-server:ha:cluster-commands.adoc
+//:page-aliases: ha:cluster-commands.adoc

This page documents a list of advanced Linux commands that simplify platform operations that are often performed during debugging, especially on high availability (HA) clusters.
Only the TigerGraph platform owner (the Linux user created during installation) has access to the commands on this page.
diff --git a/modules/cluster-and-ha-management/pages/crr-faq.adoc b/modules/cluster-and-ha-management/pages/crr-faq.adoc
index 0626bb32..042813ce 100644
--- a/modules/cluster-and-ha-management/pages/crr-faq.adoc
+++ b/modules/cluster-and-ha-management/pages/crr-faq.adoc
@@ -1,5 +1,5 @@
= Cross-Region Replication FAQ
-//:page-aliases: tigergraph-server:crr:faq.adoc, tigergraph-server:crr:crr-faq.adoc
+//:page-aliases: crr:faq.adoc, crr:crr-faq.adoc

== How can I verify if data is replicated between primary and Disaster Recovery (DR) clusters?

@@ -24,13 +24,13 @@ Here is a list of all commands and operations that will stop CRR:
* `gsql --reset`, which clears all data, schemas, and users, even resetting the password of the default `tigergraph` user.
* `gsql import graph`
* `gsql export graph`
-* `gbar restore`
+* `gadmin backup restore`

== Why is GSQL failing to replay a replica with an "UNAUTHORIZED" error?

It's most likely that primary and DR have different passwords for the same TigerGraph user.

-This can happen when you enable CRR without restoring the GBAR backup in the DR cluster (since you did not have any data), but the DR was installed with a different password than the primary.
+This can happen when you enable CRR without restoring a backup of the primary on the DR cluster (since you did not have any data), and the DR was installed with a different password than the primary.
Make sure the DR and primary clusters have the same TigerGraph password before enabling CRR.

== What happens if DR is down, unavailable or under scheduled maintenance (e.g. VM Motion)?
diff --git a/modules/cluster-and-ha-management/pages/crr-index.adoc b/modules/cluster-and-ha-management/pages/crr-index.adoc
index 6fa6ddba..61771524 100644
--- a/modules/cluster-and-ha-management/pages/crr-index.adoc
+++ b/modules/cluster-and-ha-management/pages/crr-index.adoc
@@ -1,5 +1,5 @@
= Cross-Region Replication
-//:page-aliases: tigergraph-server:crr:cross-region-replication.adoc, tigergraph-server:crr:index.adoc
+//:page-aliases: crr:cross-region-replication.adoc, crr:index.adoc
:description: Overview of cross-region replication for TigerGraph servers.

TigerGraph's Cross-Region Replication (CRR) feature allows users to keep two or more TigerGraph clusters in different data centers or regions in sync.
@@ -41,7 +41,7 @@ Data loaded through a loading job is still replicated from the primary to the DR

The following commands/actions will *stop* syncing to the DR cluster:

-* `gbar restore` while the cross-region replication is enabled.
+* `gadmin backup restore` while the cross-region replication is enabled.
* `gsql --reset` command
* The following GSQL commands:
** `EXPORT` and `IMPORT` commands
@@ -68,7 +68,7 @@ CRR logic is divided into two layers:

[NOTE]
====
-A "replica" in Kafka topic context is the committed operation of a write transaction ( `CREATE VETEX Person`) that happened successfully on the primary.
+A "replica" in Kafka topic context is the committed operation of a write transaction (`CREATE VERTEX Person`) that happened successfully on the primary.
Every replica has a unique identifier that is consistent between primary and DR.

Kafka MirrorMaker is a stand-alone tool (out-of-the-box from TigerGraph) for copying data between two Kafka clusters, in this specific case between the primary Kafka and the DR Kafka.
diff --git a/modules/cluster-and-ha-management/pages/expand-a-cluster.adoc b/modules/cluster-and-ha-management/pages/expand-a-cluster.adoc
index e02a5002..27291193 100644
--- a/modules/cluster-and-ha-management/pages/expand-a-cluster.adoc
+++ b/modules/cluster-and-ha-management/pages/expand-a-cluster.adoc
@@ -1,5 +1,5 @@
= Cluster Expansion
-//:page-aliases: tigergraph-server:cluster-resizing:expand-a-cluster.adoc
+//:page-aliases: cluster-resizing:expand-a-cluster.adoc
:sectnums:

Expanding a cluster adds more nodes to the cluster.
@@ -81,19 +81,35 @@ We suggest naming the new nodes following the convention of `m`, such as

==== Supply a staging location

-Extra disk space is required during cluster expansion. If more space is not available on the same disk, you can supply a staging location on a different disk to hold temporary data:
+Extra disk space is required during cluster expansion. If more space is not available on the existing disk, you can supply a staging location on a different disk to hold temporary data:

[source,console]
----
-$ gadmin cluster expand m3:192.168.1.3,m4:192.168.1.4 --stagingPath /tmp/
+$ gadmin cluster expand m3:192.168.1.3,m4:192.168.1.4 --staging-path /tmp/
----

-If you choose to supply a staging location, make sure that the TigerGraph Linux user has write permission to the path you provide. The overall amount of space required for expansion on each node is `(1 + ceiling(oldPartition/newPartition) ) * dataRootSize`.
-`oldPartition` and `newPartition` stand for the partitioning factors of the cluster before and after expansion, respectively; `dataRootSize` stands for the size of the data root folder on the node.
+When you specify `--staging-path`, the path must:

-For example, assume you are expanding from a 6-node cluster with a replication factor of 2 and a partitioning factor of 3, to a 10-node cluster with a replication factor of 2 and a partitioning factor of 5, and the size of the data root folder on a node is 50GB.
-You would need more than `(1 + ceiling(3/5)) * 50) = 100 GB` of free space on the staging path.
+* Exist on *every node* in the cluster (both existing nodes and newly added nodes).
+* Be accessible and writable by the TigerGraph Linux user on all nodes.
+TigerGraph uses the staging path on each node during expansion, so the operation will fail if the path is missing or not writable on any node. One way to prepare the path on every node is sketched below.
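For instance, a short shell loop can pre-create and check the staging path everywhere; a minimal sketch, assuming passwordless SSH as the TigerGraph Linux user and hypothetical hostnames `m1` through `m4`:

[source,console]
----
$ for h in m1 m2 m3 m4; do
    ssh "$h" 'mkdir -p /tmp/staging && test -w /tmp/staging && echo "$(hostname): staging path OK"'
  done
----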
+
+The overall amount of space required on each node is:
+
+`(1 + ceiling(oldPartition / newPartition)) * dataRootSize`
+
+where:
+
+* `oldPartition` = partitioning factor before expansion
+* `newPartition` = partitioning factor after expansion
+* `dataRootSize` = size of the data root folder on that node
+
+For example, expanding from a 6-node cluster with replication factor 2 and partitioning factor 3 to a 10-node cluster with replication factor 2 and partitioning factor 5, with a 50 GB data root folder, requires more than:
+
+`(1 + ceiling(3 / 5)) * 50 = 100 GB`
+
+of free space on the staging path of *each node*.

=== Verify success and delete temporary files
diff --git a/modules/cluster-and-ha-management/pages/fail-over.adoc b/modules/cluster-and-ha-management/pages/fail-over.adoc
index c0e8d2e9..f0e381db 100644
--- a/modules/cluster-and-ha-management/pages/fail-over.adoc
+++ b/modules/cluster-and-ha-management/pages/fail-over.adoc
@@ -1,5 +1,5 @@
= Fail over to the DR cluster
-//:page-aliases: tigergraph-server:crr:fail-over.adoc
+//:page-aliases: crr:fail-over.adoc

In the event of a catastrophic failure that impacts the full cluster due to a data center or region failure, the user can initiate failover to the Disaster Recovery (DR) cluster. This is a manual process.
@@ -8,8 +8,9 @@ Run the following commands to make configuration changes on the DR cluster to up

[source,console]
----
-gadmin config set System.CrossRegionReplication.Enabled false
-gadmin config apply -y
+gadmin crr stop -y
+gadmin config set System.CrossRegionReplication.Enabled false
+gadmin config apply -y
 gadmin restart -y
----

@@ -25,24 +26,19 @@ To set up a new DR cluster over the upgraded primary cluster:

[source,console]
----
-# Enable Kafka Mirrormaker
-$ gadmin config set System.CrossRegionReplication.Enabled true
-
-# Kafka mirrormaker primary cluster's IPs, separator by ','
-$ gadmin config set System.CrossRegionReplication.PrimaryKafkaIPs PRIMARY_IP1,PRIMARY_IP2,PRIMARY_IP3
-
-# Kafka mirrormaker primary cluster's KafkaPort
-$ gadmin config set System.CrossRegionReplication.PrimaryKafkaPort 30002
-
-# The prefix of GPE/GUI/GSQL Kafka Topic, by default is empty.
-$ gadmin config set System.CrossRegionReplication.TopicPrefix Primary.Primary
-
-# Apply the config changes, init Kafka, and restart
-$ gadmin config apply -y
-$ gadmin init kafka -y
-$ gadmin restart all -y
+# Kafka MirrorMaker primary cluster's IPs, separated by ','
+$ gadmin config set System.CrossRegionReplication.PrimaryKafkaIPs PRIMARY_IP1,PRIMARY_IP2,PRIMARY_IP3
+
+# Kafka MirrorMaker primary cluster's KafkaPort
+$ gadmin config set System.CrossRegionReplication.PrimaryKafkaPort 30002
+
+# The prefix of the GPE/GUI/GSQL Kafka topic; by default it is empty.
+$ gadmin config set System.CrossRegionReplication.TopicPrefix Primary.Primary
+
+# Enable CRR with the primary's backup created in step 1
+$ gadmin backup restore --dr
----

There is no limit on the number of times a cluster can fail over to another cluster. When designating a new DR cluster, make sure that you set the `System.CrossRegionReplication.TopicPrefix` parameter correctly by adding an additional `.Primary`.
-For example, if your original cluster fails over once, and the current cluster's `TopicPrefix` is `Primary`, then the new DR cluster needs to have its `TopicPrefix` be `Primary.Primary`. If it needs to fail over again, the new DR cluster needs to have its `TopicPrefix` be set to `Primary.Primary.Primary`.
\ No newline at end of file
+For example, if your original cluster fails over once, and the current cluster's `TopicPrefix` is `Primary`, then the new DR cluster needs to have its `TopicPrefix` be `Primary.Primary`. If it needs to fail over again, the new DR cluster needs to have its `TopicPrefix` be set to `Primary.Primary.Primary`.
diff --git a/modules/cluster-and-ha-management/pages/ha-cluster.adoc b/modules/cluster-and-ha-management/pages/ha-cluster.adoc
index 4470b7bf..12d934c6 100644
--- a/modules/cluster-and-ha-management/pages/ha-cluster.adoc
+++ b/modules/cluster-and-ha-management/pages/ha-cluster.adoc
@@ -3,7 +3,7 @@
:stem: latexmath
:partition: partition
:bucket: bucket
-//:page-aliases: tigergraph-server:ha:index.adoc, tigergraph-server:ha:ha-cluster.adoc
+//:page-aliases: ha:index.adoc, ha:ha-cluster.adoc

TigerGraph HA service provides load balancing when all components are operational, as well as automatic failover in the event of a service disruption.
diff --git a/modules/cluster-and-ha-management/pages/ha-for-application-server.adoc b/modules/cluster-and-ha-management/pages/ha-for-application-server.adoc
index dd019c3a..ea6807c0 100644
--- a/modules/cluster-and-ha-management/pages/ha-for-application-server.adoc
+++ b/modules/cluster-and-ha-management/pages/ha-for-application-server.adoc
@@ -1,5 +1,5 @@
= High Availability Support for Application Server
-//:page-aliases: tigergraph-server:ha:ha-for-application-server.adoc
+//:page-aliases: ha:ha-for-application-server.adoc
:description: Overview of high availability support for the application server.

TigerGraph supports native HA functionality for its application server, which serves the APIs for TigerGraph's GUI - GraphStudio and Admin Portal.
@@ -15,6 +15,7 @@ When you deploy TigerGraph in a cluster with multiple replicas, it is ideal to s

This page discusses what to do when a server fails when you haven't set up load balancing, and the steps needed to set up load balancing for the application server.

+
== When a server fails

When a server fails, users can proceed to the next available server within the cluster to resume operations.
@@ -32,76 +33,169 @@ When you deploy TigerGraph in a cluster with multiple replicas, it is ideal to s

=== Set up load balancing with Nginx

-One possible choice for setting up load balancing is through the use of Nginx.
+TigerGraph includes Nginx in its package by default, and the pre-built Nginx template routes traffic among the available RESTPP, GSQL, and GUI instances. This means that load balancing is built into your deployment: when a request reaches Nginx, it is load balanced across the available RESTPP, GSQL, and GUI instances.
+
+However, Nginx is only a proxy at the instance level; to expose the cluster to the outside world, an external load balancer is usually employed. This guide describes how to configure load balancing in AWS, Azure, and GCP.
+
+=== Prerequisite
+A potential issue with this setup is that Nginx may not always skip an unhealthy instance of RESTPP, GSQL, or GUI, causing requests to be routed to an unhealthy instance and to return errors to the load balancer. Because of this routing behavior and the health check configuration, an instance may alternate between healthy and unhealthy states in the cloud console, depending on how many requests return successfully. To avoid this, verify that the TigerGraph built-in Nginx template is configured correctly.
+
+To retrieve the template, you can run this command:
+[source,text]
+----
+gadmin config get Nginx.ConfigTemplate > nginx.template
+----

-Here is an example Nginx configuration for the upstream and server directives:
+Open the file `nginx.template` and make sure that `proxy_next_upstream error timeout http_502 http_504` exists in each of the required blocks:

[source,text]
----
- upstream flask_pool {
- ip_hash;
- zone flask_pool 64k;
- server 172.31.86.19:14240;
- server 172.31.88.70:14240;
- server 172.31.94.90:14240;
-
- keepalive 32;
- }
-
- server {
- listen 8000;
- server_name localhost;
-
- location / {
- root html;
- index index.html index.htm;
- proxy_pass http://flask_pool;
- proxy_read_timeout 3600;
- proxy_set_header Connection "";
- proxy_http_version 1.1;
- chunked_transfer_encoding off;
- proxy_buffering off;
- proxy_cache off;
+ # 1. Forward to upstream backend in round-robin manner.
+ # 2. The URL prefixed with "/internal/".
+ # 3. Using the same scheme as the incoming request.
+ location @gui-server {
+ rewrite ^/(.*) /internal/gui/$1 break;
+ proxy_read_timeout 604800s;
+ proxy_set_header Host $host;
+ proxy_set_header X-Real-IP $remote_addr;
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+ proxy_http_version 1.1;
+ proxy_set_header Connection "";
+ proxy_buffering off;
+ proxy_pass __UPSTREAM_GSQLGUI_SCHEME__://all_gui_server;
+
+ # This line is required to skip unhealthy GUI instance.
+ proxy_next_upstream error timeout http_502 http_504;
 }
- error_page 500 502 503 504 /50x.html;
- location = /50x.html {
- root html;
+
+ ...
+
+ # 1. Forward to upstream backend in round-robin manner.
+ # 2. The URL prefixed with "/internal/".
+ # 3. Using the same scheme as the incoming request.
+ location /gsqlserver/ {
+ proxy_read_timeout 604800s;
+ proxy_set_header X-Real-IP $remote_addr;
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+ proxy_set_header X-User-Agent $http_user_agent;
+ proxy_http_version 1.1;
+ proxy_set_header Connection "";
+ proxy_buffering off;
+ proxy_pass __UPSTREAM_GSQLGUI_SCHEME__://all_gsql_server/internal/gsqlserver/;
+
+ # This line is required to skip unhealthy GSQL instance.
+ proxy_next_upstream error timeout http_502 http_504;
+ }
+
+ location ~ ^/restpp/(.*) {
+ # use rewrite since proxy_pass doesn't support URI part in regular expression location
+ rewrite ^/restpp/(.*) /$1 break;
+ proxy_read_timeout 604800s;
+ proxy_set_header X-Real-IP $remote_addr;
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+ proxy_set_header X-User-Agent $http_user_agent;
+ proxy_http_version 1.1;
+ proxy_set_header Connection "";
+ proxy_set_header Host $http_host;
+ proxy_buffering off;
+ proxy_pass __UPSTREAM_GSQLGUI_SCHEME__://all_fastcgi_backend;
+
+ # This line is required to skip unhealthy RESTPP instance.
+ proxy_next_upstream error timeout http_502 http_504;
 }
- }
----
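To confirm the directive is present without opening the file, a minimal sketch using standard shell tools:

[source,console]
----
$ gadmin config get Nginx.ConfigTemplate | grep -c "proxy_next_upstream error timeout http_502 http_504"
----

The count should match the number of `location` blocks shown above; if it prints `0`, the directive is missing.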
-Since TigerGraph requires session persistence, the load balancing methods will be limited to _ip_hash_ or _hash_, unless you have access to Nginx Plus, which then means any load balancing method may be used with session persistence setup: https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/#sticky
+For versions 4.1.3 and below, RESTPP uses `fastcgi`, so you must use
+`fastcgi_next_upstream` instead:

-An active health check can be set on the following endpoint if using Nginx Plus:
+[source,text]
+----
+ location ~ ^/restpp/(.*) {
+ fastcgi_pass fastcgi_backend;
+ fastcgi_keep_conn on;
+ fastcgi_param REQUEST_METHOD $request_method;
+ ...
+ # This line is required to skip unhealthy RESTPP instance.
+ # note: http_502 and http_504 are not for fastcgi, but for proxy_pass
+ fastcgi_next_upstream error timeout;
+ }
+----

-`/api/ping`
-Otherwise, only a passive health check is available. See Nginx documentation for more information: https://docs.nginx.com/nginx/admin-guide/load-balancer/http-health-check/
+If the mentioned lines do not exist, add them to the corresponding blocks and apply the configuration:
+
+[source,text]
+----
+gadmin config set Nginx.ConfigTemplate @/path/to/nginx.template
+gadmin config apply -y
+gadmin restart nginx -y
+----
+
+== Common Load Balancer Settings
+
+In 3.11.1, the following settings apply to all three services (RESTPP, GSQL, and GUI) across AWS, Azure, and GCP.
+
+[cols="1,1,1,1", options="header"]
+|===
+| Service | Protocol | Port | Health Check Path

+| RESTPP | HTTP | 14240 | /restpp/echo
+| GSQL | HTTP | 14240 | /gsqlserver/gsql/version
+| GUI | HTTP | 14240 | /api/ping
+|===
+
+For the basic setup, apply the following configuration to all three services (RESTPP, GSQL, GUI) across AWS, Azure, and GCP:
+
+* Protocol: HTTP (default TigerGraph deployment)
+* Port: 14240 (default TigerGraph deployment)
+* Configure health check paths individually for each service as shown in the table above.
+

=== Set up AWS Elastic Load Balancer

If your applications are provisioned on AWS, another choice for load balancing is through the use of an link:https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html[Application Load Balancer].

-To create an application load balancer, follow AWS's guide to link:https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-application-load-balancer.html[create an application load balancer]. The following configurations apply as you follow the guide:
+To create an application load balancer, follow AWS's guide to link:https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-application-load-balancer.html[create an application load balancer].
+
+.Basic target group setup
+
+image::target-group-aws.png[]
+
+=== Health check
+Each service requires a different health check configuration. Use the following paths:
+
-==== Configure a security group
+.GUI health check

-When creating or using an existing security group in Step 3, make sure it allows requests from the load balancer to port 14240 of the instances in the target group.
+
image::gui-aws.png["GUI health check configuration"]

.GSQL health check
image::gsql-aws.png["GSQL health check configuration"]

.RESTPP health check
image::RESTPP-aws.png["RESTPP health check configuration"]

.This setup selects the same instances for all the target groups
image::target-group-aws.png[]

=== Listener
Create a listener that routes to the GUI target group, as most requests will go there. You can also choose the GSQL or RESTPP target group. Whichever target group you choose acts as the fallback option if no other rules match.

image::listener-aws.png[]

Create a rule for each target group. You can skip the rule for the target group that is your default routing action, since any request that matches no other rule falls back to it.

.GUI listener

image::listener-gui-aws.png[]

.GSQL listener
image::listener-gsql-aws.png[]

.RESTPP listener
image::listener_restpp-aws.png[]

.Final Result
image::listener_rules_aws.png[]

=== Set up Azure Application Gateway

@@ -119,15 +213,30 @@ Some different TigerGraph specific settings are required during Application Gate

After the Application Gateway is complete, we need to create a custom health probe to check the health/status of our application servers. Follow the steps outlined here: link:https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-create-probe-portal[Create a custom probe using the portal - Azure Application Gateway]

-When filling out the health probe information, the fields below should have the following values:
+.Basic backend setup
+image::azure_backend_settings.png[]
+
+=== Health check
+Each service requires a different health check configuration. Use the following paths:
+
+.GUI health check

-*Pick port from backend HTTP settings:* yes
+image::azure_gui.png[]
+.GSQL health check
+image::azure_gsql.png[]
+.RESTPP health check
+image::azure_restpp.png[]

-*Path:* `/api/ping`
+=== Rules
+
+Select the GUI backend as the default target, since most requests are routed there. You can also set GSQL or RESTPP as the default. This backend automatically acts as the fallback if no other rules match.
+
+.Final Result
+
+image::azure_rules.png[]

-*HTTP Settings:* The HTTP settings associated with the backend pool create during the Application Gateway setup
-After successfully creating the Application Gateway, you should now be able to access GraphStudio from the frontend IP associated with the Application Gateway.
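Regardless of cloud provider, you can spot-check the three health endpoints from the Common Load Balancer Settings table directly against a node before wiring them into a health probe; a minimal sketch, using a hypothetical node `m1`:

[source,console]
----
$ curl -s http://m1:14240/api/ping                  # GUI
$ curl -s http://m1:14240/gsqlserver/gsql/version   # GSQL
$ curl -s http://m1:14240/restpp/echo               # RESTPP
----

Each request should return an HTTP 200 response from a healthy instance.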
=== Set up GCP External HTTP(S) Load Balancer

@@ -135,15 +244,51 @@ If your instances are provisioned on Google Cloud Platform (GCP), you can set up

You can follow Google's provided steps in their documentation for setup here: https://cloud.google.com/iap/docs/load-balancer-howto[Setting up an external HTTPS load balancer | Identity-Aware Proxy]

-When https://cloud.google.com/iap/docs/load-balancer-howto#mig[creating the instance group]:
-* Click "`Specify port name mapping`", and use `14240` for the port
+image::gcp_default_backend.png[]
+
+.Basic backend service setup
+image::gcp_edit_backend.png[]
+
+=== Health check
+Each service requires a different health check configuration. Use the following paths:
+
+.GUI health check
+image::gcp_gui.png[]
+
+.GSQL health check
+image::gcp_gsql.png[]
+
+.RESTPP health check
+image::gcp_restpp.png[]
+
+.This setup selects the same instances for all the target groups
+image::backend_regions.png[]
+
+=== Host and Path Rules
+Create a default rule that routes to the GUI backend service, as it handles most requests. You can also configure GSQL or RESTPP as the default. This backend service automatically acts as the fallback if no other rules match.
+
+.Final Result
+image::host_and_path_rules.png[]
+
+== Test
+
+This section demonstrates testing on AWS, but the same approach applies to Azure and GCP.
+
+To verify the setup, shut down one of the backend services (RESTPP, GSQL, or GUI) and observe the target group behavior.
+
+For example, when the GUI service is shut down, its target group becomes non-functional, and any calls to `/api/*` will fail.
+
-When https://cloud.google.com/load-balancing/docs/health-checks[setting up the health check]:
+image::Test1.png[]
+However, GSQL and RESTPP continue to function normally.

-* For the port, use `14240`.
-* For the path, use `/api/ping`.
+image::Test2.png[]
+image::Test3.png[]

-Lastly, we need to set up session affinity for our load balancer. This is outlined in GCP documentation here: https://cloud.google.com/load-balancing/docs/https#session_affinity[External HTTP(S) Load Balancing overview | Google Cloud]
+If at least one instance of the GUI is running, the target group remains healthy because of the Nginx configuration.

-After successfully creating the load balancer, you should now be able to access GraphStudio from the frontend IP associated with the load balancer.
+image::Test4.png[]
diff --git a/modules/cluster-and-ha-management/pages/ha-for-gsql-server.adoc b/modules/cluster-and-ha-management/pages/ha-for-gsql-server.adoc
index 25f8db92..8c0eb0c6 100644
--- a/modules/cluster-and-ha-management/pages/ha-for-gsql-server.adoc
+++ b/modules/cluster-and-ha-management/pages/ha-for-gsql-server.adoc
@@ -1,5 +1,5 @@
= High Availability Support for GSQL Server
-//:page-aliases: tigergraph-server:ha:ha-for-gsql-server.adoc
+//:page-aliases: ha:ha-for-gsql-server.adoc
:description: High availability overview for the GSQL server.

TigerGraph has built-in HA for all the internal critical components.
diff --git a/modules/cluster-and-ha-management/pages/ha-overview.adoc b/modules/cluster-and-ha-management/pages/ha-overview.adoc
index bda38b26..febf1f28 100644
--- a/modules/cluster-and-ha-management/pages/ha-overview.adoc
+++ b/modules/cluster-and-ha-management/pages/ha-overview.adoc
@@ -8,7 +8,7 @@ For example, an application's predefined queries will continue to run and file-b

Incorporating HA can yield a number of other system benefits, such as:

* The query workload is distributed across all replicas.
-** This includes replicas used for long-running queries or with the xref:tigergraph-server:API:built-in-endpoints.adoc#_headers[GSQL-REPLICA] header.
+** This includes replicas used for long-running queries or with the xref:API:built-in-endpoints.adoc#_headers[GSQL-REPLICA] header.
* Data loading operations are distributed across all nodes.
* Individual nodes can fail without impacting query workloads.
** If a query does fail during a node failure, the system adjusts to accommodate the failed node (typically up to 30 seconds). It is highly recommended to adopt client-side retry logic as a workaround.
@@ -22,7 +22,7 @@ NOTE: The re-pavement of a node is an offline process that takes a node offline

TigerGraph HA provides continuous operation of some but not all services.
Please note the following exceptions and consider your options for taking additional steps to maintain continuous operation or to restore service as quickly as possible.

-.If an HA system is operating with a failed node, unless the node is recovered, xref:tigergraph-server:cluster-and-ha-management:how_to-replace-a-node-in-a-cluster.adoc[replaced], or the system is reconfigured to xref:tigergraph-server:cluster-and-ha-management:remove-failed-node.adoc[exclude that node], the following services are limited or unavailable:
+.If an HA system is operating with a failed node, unless the node is recovered, xref:replace-a-node.adoc[replaced], or the system is reconfigured to xref:remove-failed-node.adoc[exclude that node], the following services are limited or unavailable:

* A data partition slated for connector-based loading, such as S3 files or Kafka, *cannot* be loaded.

@@ -34,7 +34,7 @@ NOTE: However, new interpreted and any existing queries can still be executed.

* Database export operation is *not available* and will be rejected.

-NOTE: As a workaround, if the failed node cannot be recovered (e.g. hardware issue), full operation can be restored temporarily by the xref:tigergraph-server:cluster-and-ha-management:remove-failed-node.adoc[removal of the failing nodes].
+NOTE: As a workaround, if the failed node cannot be recovered (e.g. hardware issue), full operation can be restored temporarily by the xref:remove-failed-node.adoc[removal of the failing nodes].
For example, a 5 x 2 cluster with one node removed would become a 4x2 + 1, where 1 is the data partition that is not being replicated.

=== 3.9.2 and Below

In addition to the considerations above, in versions 3.9.2 and below, users will not be able to run a GSQL query when a single node is down in a High Availability cluster.

-In this case, as with the other consideration cases above, the failed node needs to be xref:tigergraph-server:cluster-and-ha-management:remove-failed-node.adoc[removed] from the cluster via:
+In this case, as with the other considerations above, the failed node needs to be xref:remove-failed-node.adoc[removed] from the cluster via:

[source, console]
----
@@ -54,29 +54,29 @@ gadmin cluster remove :

This issue is no longer present in versions 3.9.3 and 3.10.0.
====

-== xref:tigergraph-server:cluster-and-ha-management:ha-cluster.adoc[High Availability Cluster Configuration]
+== xref:ha-cluster.adoc[High Availability Cluster Configuration]

Here you will find detailed information about terminology, system requirements, and how to configure an HA cluster.
-== xref:tigergraph-server:cluster-and-ha-management:ha-for-gsql-server.adoc[High Availability Support for GSQL Server]
+== xref:ha-for-gsql-server.adoc[High Availability Support for GSQL Server]

Learn how TigerGraph incorporates built-in HA for all the internal critical components.

-== xref:tigergraph-server:cluster-and-ha-management:ha-for-application-server.adoc[High Availability Support for Application Server]
+== xref:ha-for-application-server.adoc[High Availability Support for Application Server]

Here you will find detailed information about how TigerGraph supports native HA functionality for its application server, which serves the APIs for TigerGraph's GUI - GraphStudio and Admin Portal.

-== xref:tigergraph-server:cluster-and-ha-management:cluster-commands.adoc[Cluster Commands]
+== xref:cluster-commands.adoc[Cluster Commands]

Here platform owners can learn advanced Linux commands that simplify platform operations and can be performed during debugging on HA clusters.

-== xref:tigergraph-server:cluster-and-ha-management:remove-failed-node.adoc[Removal of Failed Nodes]
+== xref:remove-failed-node.adoc[Removal of Failed Nodes]

Here you will find detailed instructions for the removal of a failed node.

== File and Kafka loaders HA with Auto-Restart

-Loading jobs for xref:tigergraph-server:data-loading:load-from-kafka.adoc[Kafka] and xref:tigergraph-server:data-loading:load-local-files.adoc[Local Files] will automatically restart the loader job process if it unexpectedly exits.
+Loading jobs for xref:data-loading:load-from-kafka.adoc[Kafka] and xref:data-loading:load-local-files.adoc[Local Files] will automatically restart the loader job process if it unexpectedly exits.
The restarted process will then continue to import data from the loading job.

This functionality is enabled by default, but users can disable this feature by setting `LoaderRetryMax=0` through `gadmin config set RESTPP.BasicConfig.Env`.
diff --git a/modules/cluster-and-ha-management/pages/index.adoc b/modules/cluster-and-ha-management/pages/index.adoc
index 44331194..0a9fe7c7 100644
--- a/modules/cluster-and-ha-management/pages/index.adoc
+++ b/modules/cluster-and-ha-management/pages/index.adoc
@@ -9,7 +9,7 @@ Learn how to manage a system's clusters.

* xref:cluster-and-ha-management:expand-a-cluster.adoc[]
* xref:cluster-and-ha-management:shrink-a-cluster.adoc[]
* xref:cluster-and-ha-management:repartition-a-cluster.adoc[]
-* xref:how_to-replace-a-node-in-a-cluster.adoc[Cluster Replacing]
+* xref:replace-a-node.adoc[]

[TIP]
====
diff --git a/modules/cluster-and-ha-management/pages/remove-failed-node.adoc b/modules/cluster-and-ha-management/pages/remove-failed-node.adoc
index 049b7351..cd652e89 100644
--- a/modules/cluster-and-ha-management/pages/remove-failed-node.adoc
+++ b/modules/cluster-and-ha-management/pages/remove-failed-node.adoc
@@ -1,9 +1,9 @@
= Removal of Failed Nodes
-//:page-aliases: tigergraph-server:ha:remove-failed-node.adoc
+//:page-aliases: ha:remove-failed-node.adoc
:description: This page describes the procedure to remove a failed node.

-If a node fails in a highly available (HA) cluster, as in a hardware failure (replication factor > 1), you can remove the failed node from the cluster while keeping all your data intact.
-After removal, use the xref:expand-a-cluster.adoc[cluster expansion] feature to restore your cluster to its original size.
+If a node in a high availability (HA) cluster has an unrecoverable failure, such as a hardware failure, you can remove the failed node from the cluster while keeping all your data intact.
+After removal, use the xref:expand-a-cluster.adoc[cluster expansion] feature to restore the cluster to a fully functioning state.

Restoring is important, as node removal does not redistribute the data in your cluster.

You should only consider removing a node when the node has failed.
diff --git a/modules/cluster-and-ha-management/pages/repartition-a-cluster.adoc b/modules/cluster-and-ha-management/pages/repartition-a-cluster.adoc
index 3d49dd34..c347e084 100644
--- a/modules/cluster-and-ha-management/pages/repartition-a-cluster.adoc
+++ b/modules/cluster-and-ha-management/pages/repartition-a-cluster.adoc
@@ -1,5 +1,5 @@
= Cluster Repartition
-//:page-aliases: tigergraph-server:cluster-resizing:repartition-a-cluster.adoc"
+//:page-aliases: cluster-resizing:repartition-a-cluster.adoc

:sectnums:

diff --git a/modules/cluster-and-ha-management/pages/how_to-replace-a-node-in-a-cluster.adoc b/modules/cluster-and-ha-management/pages/replace-a-node.adoc
similarity index 85%
rename from modules/cluster-and-ha-management/pages/how_to-replace-a-node-in-a-cluster.adoc
rename to modules/cluster-and-ha-management/pages/replace-a-node.adoc
index f47e1ffe..bd216fa4 100644
--- a/modules/cluster-and-ha-management/pages/how_to-replace-a-node-in-a-cluster.adoc
+++ b/modules/cluster-and-ha-management/pages/replace-a-node.adoc
@@ -1,13 +1,14 @@
-= How to Replace a Node in a Cluster
+= Replacing a Cluster Node
:description: This page describes the procedure to replace a node in a non-ha cluster.
+:page-aliases: how_to-replace-a-node-in-a-cluster.adoc

//welcome and introduction
-This guide outlines the procedure for replacing a node in a cluster regardless of whether it is an High Availability(HA) cluster. If your system uses xref:ha-overview.adoc[High Availability] and ** you do not have a replacement node to use, ** refer to the documentation on removing a failed node in xref:tigergraph-server:cluster-and-ha-management:remove-failed-node.adoc[Removal of Failed Nodes].
+This guide outlines the procedure for replacing a node in a cluster regardless of whether it is a High Availability (HA) cluster. If your system uses xref:ha-overview.adoc[High Availability] and *you do not have a replacement node to use*, refer to the documentation on removing a failed node in xref:remove-failed-node.adoc[Removal of Failed Nodes].

== Prerequisites

* TigerGraph is installed with HostName; refer to xref:installation:bare-metal-install.adoc[HostName Installation]. Additionally, TigerGraph should be installed on a device that can be unmounted from the machine to be replaced and mounted to the new machine.
-* This procedure applies to TigerGraph versions 3.10.1 and later. For versions prior to TigerGraph 3.10.1, please refer to link:https://docs.tigergraph.com/tigergraph-server/3.9/cluster-and-ha-management/how_to-replace-a-node-in-a-cluster[node replacement in preceding versions].
+* This procedure applies to TigerGraph versions 3.10.1 and later.
* The procedure requires pointing the hostname to a different IP address by changing the DNS record. On AWS, this is done with link:https://docs.aws.amazon.com/route53/[Route 53] (see the sketch after this list). For other cloud service providers, it is doable as long as they offer similar DNS web services.
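For illustration only, re-pointing a hostname on AWS can be scripted with the AWS CLI; a minimal sketch, where the hosted zone ID, record name, and IP address are hypothetical placeholders:

[source,console]
----
$ aws route53 change-resource-record-sets \
    --hosted-zone-id Z0123456789ABC \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "m2.example.internal",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "10.0.0.9"}]
        }
      }]
    }'
----

Lowering the record's TTL ahead of the replacement helps the change propagate quickly.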
== Procedure
diff --git a/modules/cluster-and-ha-management/pages/set-up-crr.adoc b/modules/cluster-and-ha-management/pages/set-up-crr.adoc
index 1d69946f..209e51b6 100644
--- a/modules/cluster-and-ha-management/pages/set-up-crr.adoc
+++ b/modules/cluster-and-ha-management/pages/set-up-crr.adoc
@@ -1,5 +1,5 @@
= Set Up Cross-Region Replication
-//:page-aliases: tigergraph-server:crr:set-up-crr.adoc
+//:page-aliases: crr:set-up-crr.adoc
:description: Instructions on how to set up the DR cluster for cross-region replication.
:sectnums:

@@ -9,11 +9,14 @@ Changes on the primary cluster are copied over to the DR cluster.

When necessary, you can fail over to a DR cluster, making it the new primary cluster.

-== Before you begin
+[#before_you_begin]
+== Prerequisites

-* Install TigerGraph 3.10.0 or higher on both the primary cluster and the DR cluster *in the same version*.
-* Make sure that your DR cluster has the same number of partitions as the primary cluster.
-* Make sure the username and password of the TigerGraph database user created on the DR cluster during installation matches one of the users on the primary cluster who have the `superuser` role.
+* The DR cluster and the primary cluster are running the *same version of TigerGraph (3.10 or higher)*.
+* The DR cluster and the primary cluster have the *same partitioning factor*.
+* You have *access to the TigerGraph Linux user account on the DR cluster*.
+** The username and password of the TigerGraph database user created on the DR cluster during installation *match one of the superusers on the primary cluster*.
+* Initializing the DR cluster requires running a restore operation from a recent backup of the primary cluster; therefore, *xref:backup-and-restore:configurations.adoc[backup and restore must be configured] on the DR cluster*.
* If you choose to enable CRR and your DR cluster is in a different Virtual Private Cloud (VPC) than your primary cluster, make sure that TigerGraph is installed on your cluster with public IPs:
** If you xref:installation:bare-metal-install.adoc#_interactive_installation[install interactively], make sure that you supply the public IP of all nodes.
** If you xref:installation:bare-metal-install.adoc#_non_interactive_installation[install non-interactively], make sure in the `NodeList` field of `install_conf.json` that you are providing the public IPs for all nodes.
@@ -32,12 +35,12 @@ The following setup is needed in order to enable Cross Region Replication.

Retrieve the latest backup, for instance `pr_latest_backup`, from the primary cluster.
How recent the backup is will determine how much the DR cluster lags behind and how long it will take to catch up with the primary cluster.
If there is no suitable backup, use `gadmin backup create ` to create one, as sketched below.
-For more details refer to xref:tigergraph-server:backup-and-restore:backup-cluster.adoc[].
+For more details, refer to xref:backup-and-restore:backup-cluster.adoc[].

-For how to restore a cluster from a backup that was created from another database cluster (cross cluster), refer to xref:tigergraph-server:backup-and-restore:cross-cluster-backup.adoc[].
+For how to restore a cluster from a backup that was created from another database cluster (cross cluster), refer to xref:backup-and-restore:cross-cluster-backup.adoc[].

For setting up a new DR cluster after failover, refer to
-xref:tigergraph-server:cluster-and-ha-management:fail-over.adoc#_set_up_a_new_dr_cluster_after_failover[].
+xref:fail-over.adoc#_set_up_a_new_dr_cluster_after_failover[].
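For instance, creating and then verifying a fresh backup on the primary; a minimal sketch, reusing the hypothetical tag `pr_latest_backup` from above:

[source,console]
----
$ gadmin backup create pr_latest_backup
$ gadmin backup list
----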
=== Enable CRR on the DR cluster

@@ -97,7 +100,7 @@ However, note that these operations should be performed when there is no traffic

gadmin backup restore --dr
----

-Specifically, when switching between PR and DR, before the switch, it is necessary to xref:tigergraph-server:crr:troubleshooting.adoc#_check_data_consistency_between_primary_and_dr[check data consistency between primary and DR]
+Specifically, when switching between PR and DR, it is necessary to xref:troubleshooting.adoc#_check_data_consistency_between_primary_and_dr[check data consistency between primary and DR] before the switch.

[NOTE]
====
@@ -189,23 +192,33 @@ tasks.max=4

Do not change the values of `name` or `topics`, as this will cause CRR to work abnormally.
====

-== Updating a CRR system
+== Upgrading a CRR system

From time to time, you may want to upgrade the TigerGraph software on a CRR system.
To perform this correctly, follow this sequence of steps.

-1. Stop CRR on your DR cluster.
+. Stop CRR on your DR cluster.
+
-[source.wrap,console]
+[source,console]
----
-$ gadmin crr stop -y
+gadmin crr stop -y
----
+
+. Disable CRR on your DR cluster.
+
+[source,console]
+----
+gadmin config set System.CrossRegionReplication.Enabled false
+gadmin config apply -y
+gadmin restart all -y
+----
+
+. xref:installation:upgrade.adoc[Upgrade] both the primary cluster and the DR cluster.

-2. xref:tigergraph-server:installation:upgrade.adoc[Upgrade] both the primary cluster and DR cluster.
-3. Start CRR on the DR cluster(From TigerGraph 3.10.0, no additional restart is required to start CRR).
+. Start CRR on the DR cluster.
+
-[source.wrap,console]
+[source,console]
----
-$ gadmin crr start
+gadmin crr start
----
+
diff --git a/modules/cluster-and-ha-management/pages/shrink-a-cluster.adoc b/modules/cluster-and-ha-management/pages/shrink-a-cluster.adoc
index 119b6b4c..91aec65f 100644
--- a/modules/cluster-and-ha-management/pages/shrink-a-cluster.adoc
+++ b/modules/cluster-and-ha-management/pages/shrink-a-cluster.adoc
@@ -1,5 +1,5 @@
= Cluster Shrinking
-//:page-aliases: tigergraph-server:cluster-resizing:shrink-a-cluster.adoc
+//:page-aliases: cluster-resizing:shrink-a-cluster.adoc
:sectnums:

Shrinking a cluster removes nodes from the cluster. The data stored on those nodes will be redistributed to the remaining nodes.
@@ -62,7 +62,7 @@ Extra disk space is required during cluster shrinking. If more space is not avai

[source,bash]
----
-$ gadmin cluster shrink m3:192.168.1.3,m4:192.168.1.4 --stagingPath /tmp/
+$ gadmin cluster shrink m3:192.168.1.3,m4:192.168.1.4 --staging-path /tmp/
----

If you choose to supply a staging location, make sure that the TigerGraph Linux user has write permission to the path you provide. The overall amount of space required for cluster shrinking on each node is `(1 + ceiling(oldPartition/newPartition) ) * dataRootSize`. `oldPartition` and `newPartition` stand for the partitioning factors of the cluster before and after shrinking, respectively; `dataRootSize` stands for the size of the data root folder on the node.
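To make the shrinking formula concrete, a worked example under assumed numbers, mirroring the expansion example earlier: shrinking from a partitioning factor of 5 to 3 with a 50 GB data root folder on a node would require more than `(1 + ceiling(5/3)) * 50 = 150 GB` of free space on that node's staging path.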
diff --git a/modules/cluster-and-ha-management/pages/troubleshooting.adoc b/modules/cluster-and-ha-management/pages/troubleshooting.adoc index e0edb5af..8d9499d0 100644 --- a/modules/cluster-and-ha-management/pages/troubleshooting.adoc +++ b/modules/cluster-and-ha-management/pages/troubleshooting.adoc @@ -1,5 +1,5 @@ = Troubleshooting for Cross-Region Replication -//:page-aliases: tigergraph-server:crr:troubleshooting.adoc +//:page-aliases: crr:troubleshooting.adoc :sectnums: diff --git a/modules/data-loading/examples/config-avro b/modules/data-loading/examples/config-avro index 36f7a403..2050789a 100644 --- a/modules/data-loading/examples/config-avro +++ b/modules/data-loading/examples/config-avro @@ -1,8 +1,8 @@ connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector source.cluster.alias=hello target.cluster.alias=world -source.cluster.bootstrap.servers=source.kafka.server:9092 -target.cluster.bootstrap.servers=localhost:30002 +source.cluster.bootstrap.servers= +target.cluster.bootstrap.servers= source->target.enabled=true topics=avro-without-registry-topic replication.factor=1 @@ -18,41 +18,10 @@ emit.heartbeats.interval.seconds=5 world.scheduled.rebalance.max.delay.ms=35000 key.converter=org.apache.kafka.connect.converters.ByteArrayConverter header.converter=org.apache.kafka.connect.converters.ByteArrayConverter -value.converter=com.tigergraph.kafka.connect.converters.TigerGraphAvroConverterWithoutSchemaRegistry - -producer.security.protocol=SASL_SSL -producer.sasl.mechanism=GSSAPI -producer.sasl.kerberos.service.name=kafka -producer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/path/to/kafka-producer.keytab\" principal=\"kafka-producer@TIGERGRAPH.COM\"; -producer.ssl.endpoint.identification.algorithm= -producer.ssl.keystore.location=/path/to/client.keystore.jks -producer.ssl.keystore.password=****** -producer.ssl.key.password=****** -producer.ssl.truststore.location=/path/to/client.truststore.jks -producer.ssl.truststore.password=****** - -consumer.security.protocol=SASL_SSL -consumer.sasl.mechanism=GSSAPI -consumer.sasl.kerberos.service.name=kafka -consumer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/path/to/kafka-consumer.keytab\" principal=\"kafka-consumer@TIGERGRAPH.COM\"; -consumer.ssl.endpoint.identification.algorithm= -consumer.ssl.keystore.location=/path/to/client.keystore.jks -consumer.ssl.keystore.password=****** -consumer.ssl.key.password=****** -consumer.ssl.truststore.location=/path/to/client.truststore.jks -consumer.ssl.truststore.password=****** - -source.admin.security.protocol=SASL_SSL -source.admin.sasl.mechanism=GSSAPI -source.admin.sasl.kerberos.service.name=kafka -source.admin.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/path/to/kafka-admin.keytab\" principal=\"kafka-admin@TIGERGRAPH.COM\"; -source.admin.ssl.endpoint.identification.algorithm= -source.admin.ssl.keystore.location=/path/to/client.keystore.jks -source.admin.ssl.keystore.password=****** -source.admin.ssl.key.password=****** -source.admin.ssl.truststore.location=/path/to/client.truststore.jks -source.admin.ssl.truststore.password=****** +transforms=TigerGraphAvroTransform +transforms.TigerGraphAvroTransform.type=com.tigergraph.kafka.connect.transformations.TigergraphAvroWithoutSchemaRegistryTransformation +transforms.TigerGraphAvroTransform.errors.tolerance=none [connector_1] 
name=avro-test-without-registry -tasks.max=10 +tasks.max=10 \ No newline at end of file diff --git a/modules/data-loading/images/loading-arch_3.11-rev2.png b/modules/data-loading/images/loading-arch_3.11-rev2.png new file mode 100644 index 00000000..39cdf380 Binary files /dev/null and b/modules/data-loading/images/loading-arch_3.11-rev2.png differ diff --git a/modules/data-loading/nav.adoc b/modules/data-loading/nav.adoc index 35617aaa..196e13e3 100644 --- a/modules/data-loading/nav.adoc +++ b/modules/data-loading/nav.adoc @@ -2,22 +2,12 @@ ** xref:data-loading-overview.adoc[Overview] ** xref:data-loading:externalizing-kafka-configs.adoc[Externalize Kafka Configs] ** xref:load-local-files.adoc[] -//** xref:data-streaming-connector/index.adoc[Data Streaming Connector] ** xref:load-from-cloud.adoc[Load from Cloud Storage] ** xref:load-from-warehouse.adoc[Load from Data Warehouse] ** xref:load-from-kafka.adoc[Load from External Kafka] *** xref:data-loading:avro-validation-with-kafka.adoc[Avro Data Validation through KafkaConnect] -*** xref:data-loading:kafka-ssl-security-guide.adoc[] ** xref:load-from-spark-dataframe.adoc[] *** xref:spark-connection-via-jdbc-driver.adoc[] ** xref:manage-data-source.adoc[Manage Data Sources] ** xref:data-loading-v2.adoc[Data Loading V2] -//** xref:kafka-loader/index.adoc[] -//*** xref:kafka-loader/load-data.txt[] -//*** xref:kafka-loader/manage-data-source.adoc[] -//*** xref:kafka-loader/manage-loading-jobs.adoc[] -//*** xref:kafka-loader/kafka-ssl-sasl.adoc[] ** xref:data-streaming-connector/kafka.adoc[Stream from External Kafka (Deprecated)] - - - diff --git a/modules/data-loading/pages/avro-validation-with-kafka.adoc b/modules/data-loading/pages/avro-validation-with-kafka.adoc index 3a9853bd..46fcdec0 100644 --- a/modules/data-loading/pages/avro-validation-with-kafka.adoc +++ b/modules/data-loading/pages/avro-validation-with-kafka.adoc @@ -48,27 +48,27 @@ The Avro message will contain a `scheme ID`, the Transformation (or previous con == How to Enable Avro Data Validation -To enable Avro data validation, configure Kafka Connector with `ErrorTolerance` and the new transformation. +To enable Avro data validation, configure the Kafka data source with `ErrorTolerance` and the transformation settings. -Below are the additional settings to be added to the connector config: +Below is an example configuration to add to the data source: -[source, gsql] +[source,json] ---- -connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector -... -key.converter=org.apache.kafka.connect.converters.ByteArrayConverter -header.converter=org.apache.kafka.connect.storage.StringConverter -value.converter=org.apache.kafka.connect.converters.ByteArrayConverter -transforms=TigerGraphAvroTransform -transforms.TigerGraphAvroTransform.type=com.tigergraph.kafka.connect.transformations.TigergraphAvroWithSchemaRegistryTransformation -transforms.TigerGraphAvroTransform.schema.registry.url= -transforms.TigerGraphAvroTransform.errors.tolerance=all +{ + "type": "kafka", + ... 
+ "transforms": "TigerGraphAvroTransform", + "transforms.TigerGraphAvroTransform.type": "com.tigergraph.kafka.connect.transformations.TigergraphAvroWithSchemaRegistryTransformation", + "transforms.TigerGraphAvroTransform.schema.registry.url": "{schema-registry-url}", + "transforms.TigerGraphAvroTransform.schema.registry.basic.auth.credentials.source": "USER_INFO", + "transforms.TigerGraphAvroTransform.schema.registry.basic.auth.user.info": "{username}:{password}", + "transforms.TigerGraphAvroTransform.errors.tolerance": "all" +} ---- [NOTE] ==== -* In the above settings, `MirrorSourceConnector` is the connector. -* The transformation type is set to `TigergraphAvroWithSchemaRegistryTransformation,` but alternatively, it could be set to `TigergraphAvroWithoutSchemaRegistryTransformation.` +To enable Avro transformation with data validation, remove any configurations that start with `value.converter`. ==== === Old Schema vs. New Schema: @@ -119,4 +119,4 @@ curl -s http://$(gmyip):$(gadmin config get KafkaStreamLL.Port)/log-aggregation/ If data malformation exists, the result will display errors aiding users in identifying problematic files or records, including: * `errorCode: 55` -* Other error messages containing error stacks. \ No newline at end of file +* Other error messages containing error stacks. diff --git a/modules/data-loading/pages/data-loading-overview.adoc b/modules/data-loading/pages/data-loading-overview.adoc index 12b277c7..32a4fcd0 100644 --- a/modules/data-loading/pages/data-loading-overview.adoc +++ b/modules/data-loading/pages/data-loading-overview.adoc @@ -3,11 +3,16 @@ :description: Overview of available loading methods and supported features. :page-aliases: data-loading:kafka-loader/index.adoc -//data-loading:data-streaming-connector/index.adoc, \ -//data-loading:kafka-loader:index.adoc, \ -//data-loading:data-streaming-connector:index.adoc -Once you have xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[defined a graph schema], you can load data into the graph. This section focuses on how to configure TigerGraph for the different data sources, as well as different data formats and transport schemes. +Once you have xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[defined a graph schema], you can load data into the graph. This section focuses on how to configure TigerGraph for the different data sources, as well as different data formats and transport schemes. + +== Loading System Architecture + +This diagram shows the supported data sources, which connector to use, and which TigerGraph component manages the data loading. + +.TigerGraph Data Loading Options +image::data-loading:loading-arch_3.11-rev2.png[Architectural diagram showing supported data sources, which connector to use, and which TigerGraph component manages the data loading] +// source file: https://graphsql.atlassian.net/wiki/..../Data+Loading+Architecture+with+New+Spark+Connector == Data Sources @@ -26,7 +31,7 @@ You can use this approach for the following data sources: + See the pages for the specific method that fits your data source. -* *Spark*: The TigerGraph xref:tigergraph-server:data-loading:load-from-spark-dataframe.adoc[Spark Connector] is used with Apache Spark to read data from a Spark DataFrame (or Data Lake) and write to TigerGraph. +* *Spark*: The TigerGraph xref:load-from-spark-dataframe.adoc[Spark Connector] is used with Apache Spark to read data from a Spark DataFrame (or Data Lake) and write to TigerGraph. 
Users can leverage it to connect TigerGraph to the Spark ecosystem and load data from any Spark data sources. ** *Spark/JDBC (Deprecated)*: To load data from other big data platforms, such as Hadoop, the typical method is to use Spark's built-in feature to write a DataFrame to a JDBC target, together with TigerGraph's `POST /ddl` REST endpoint. @@ -38,7 +43,7 @@ TigerGraph uses the same workflow for both local file and Kafka Connect loading: . *Specify a graph*. Data is always loaded into exactly one graph (though that graph could have global vertices and edges which are shared with other graphs). For example: + -[source,php] +[source,gsql] USE GRAPH ldbc_snb . If you are using Kafka Connect, *define a `DATA_SOURCE` object*. @@ -47,25 +52,16 @@ xref:load-from-cloud.adoc[cloud storage], xref:load-from-warehouse.adoc#_bigquery[BigQuery], xref:load-from-warehouse.adoc#_snowflake[Snowflake], xref:load-from-warehouse.adoc#_postgresql[PostgreSQL], -xref:tigergraph-server:data-loading:load-from-kafka.adoc#_configure_the_kafka_source[Kafka] -or xref:data-streaming-connector/kafka.adoc[]. +or xref:load-from-kafka.adoc#_configure_the_kafka_source[Kafka]. . *Create a xref:#_loading_jobs[loading job]*. . *Run your loading job*. -== Loading System Architecture - -This diagram shows the supported data sources, which connector to use, and which TigerGraph component manages the data loading. - -.TigerGraph Data Loading Options -image::data-loading:loading_arch_3.11.png[Architectural diagram showing supported data sources, which connector to use, and which TigerGraph component manages the data loading] -// source file: https://graphsql.atlassian.net/wiki/..../Data+Loading+Architecture+with+New+Spark+Connector - == Loading Jobs A loading job tells the database how to construct vertices and edges from data sources. -[source,php] +[source,gsql] .CREATE LOADING JOB syntax ---- CREATE LOADING JOB FOR GRAPH { @@ -85,29 +81,4 @@ These can refer to actual files or be placeholder names. The actual data sources . LOAD statements specify how to take the data fields from files to construct vertices or edges. -NOTE: Refer to the xref:gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[Creating a Loading Job] documentation for full details - -//// -OLD CONTENT -== Set up a data source for a data streaming loading job - -GSQL uses a user-provided configuration file to automatically set up a streaming data connection and a loading job for data in these external cloud data hosts: - -* Google Cloud Storage (GCS) -* AWS S3 -* Azure Blob Storage (ABS) -* Google BigQuery - -Go to the xref:data-streaming-connector/index.adoc[] main page for instructions on setting up the loading job. - -NOTE: The data streaming will stage temporary data files on the database server's disk. -You should have free disk space of at least 2 times the size of your total (uncompressed) input data. - -== Manual connector setup -For data stored in an external Kafka cluster, you need to perform a few more steps to set up data streaming. -Using `gadmin` server commands, you first create a connector to interpret the data source, then define the data source, create the loading job, and run it. - -See the xref:data-streaming-connector/kafka.adoc[Kafka cluster streaming] page for more information. - -This method relies on the xref:kafka-loader/index.adoc[TigerGraph Kafka Loader].
-//// \ No newline at end of file +NOTE: Refer to the xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[Creating a Loading Job] documentation for full details diff --git a/modules/data-loading/pages/externalizing-kafka-configs.adoc b/modules/data-loading/pages/externalizing-kafka-configs.adoc index a374ae11..6db32338 100644 --- a/modules/data-loading/pages/externalizing-kafka-configs.adoc +++ b/modules/data-loading/pages/externalizing-kafka-configs.adoc @@ -1,6 +1,6 @@ = Externalizing Kafka Configurations -Users can utilize external sources, including files, vault, and environment variables, to provide configurations for Kafka connectors when xref:tigergraph-server:data-loading:load-from-kafka.adoc[Loading data from Kafka]. +Users can utilize external sources, including files, vault, and environment variables, to provide configurations for Kafka connectors when xref:load-from-kafka.adoc[Loading data from Kafka]. Sensitive information such as credentials and security setups are kept secure. For more information on Kafka security see https://docs.confluent.io/platform/current/connect/security.html#externalize-secrets[Kafka Connect Security Basics]. diff --git a/modules/data-loading/pages/index.adoc b/modules/data-loading/pages/index.adoc index 179b4e37..b34516e2 100644 --- a/modules/data-loading/pages/index.adoc +++ b/modules/data-loading/pages/index.adoc @@ -21,11 +21,11 @@ Instructions for loading files stored in third-party cloud storage == xref:load-from-warehouse.adoc[Load Data from a Data Warehouse] Instructions for loading query results from a data warehouse -(xref:load-from-warehouse.adoc#_bigquery[BigQuery], xref:load-from-warehouse.adoc#_snowflake[Snowflake], and xref:tigergraph-server:data-loading:load-from-warehouse.adoc#_postgresql[PostgreSql]). +(xref:load-from-warehouse.adoc#_bigquery[BigQuery], xref:load-from-warehouse.adoc#_snowflake[Snowflake], and xref:load-from-warehouse.adoc#_postgresql[PostgreSql]). == xref:load-from-kafka.adoc[Load Data from an External Kafka Cluster, in v3.9.3+] Instructions for loading records from Kafka topics including CSV, JSON and Avro formats. -With additional instructions on xref:avro-validation-with-kafka.adoc[Avro Data Validation through KafkaConnect] and how to xref:tigergraph-server:data-loading:kafka-ssl-security-guide.adoc[Set up SSL on Kafka] or xref:tigergraph-server:data-loading:externalizing-kafka-configs.adoc[Externalize Kafka Configs]. +With additional instructions on xref:avro-validation-with-kafka.adoc[Avro Data Validation through KafkaConnect] and how to xref:externalizing-kafka-configs.adoc[Externalize Kafka Configs]. * For versions earlier than 3.9.3, see xref:data-streaming-connector/kafka.adoc[Stream from an External Kafka Cluster (deprecated)]. @@ -35,3 +35,7 @@ Instructions for TigerGraph's dedicated connector used to read data from a Spark * xref:spark-connection-via-jdbc-driver.adoc[] is now deprecated and will no longer be supported. +== Management and Configuration +=== xref:manage-data-source.adoc[] +=== xref:externalizing-kafka-configs.adoc[] +=== xref:loading-concurrency.adoc[] diff --git a/modules/data-loading/pages/kafka-ssl-security-guide.adoc b/modules/data-loading/pages/kafka-ssl-security-guide.adoc deleted file mode 100644 index eefe3e34..00000000 --- a/modules/data-loading/pages/kafka-ssl-security-guide.adoc +++ /dev/null @@ -1,277 +0,0 @@ -= Kafka SSL Security Guide - -== Introduction -Connections to Kafka brokers can be secured by SSL. 
-All the connections from Kafka clients to Kafka brokers can be secured, including these scenarios: - -* Loading data via `fileLoader` to Kafka before it is loaded into TigerGraph. -* Loading data via KafkaLoader including streaming job to Kafka. -* MirrorMaker2 (MM2) to load data from external Kafka to (internal) Kafka Brokers. -* Cross-Region Replication (CRR), which is a special case of MM2. -* KafkaStrm-LL connect to Kafka Brokers. -* Connection from TigerGraph engine to Kafka broker is also secured. - - - -== How to Enable SSL for Kafka - -Firstly, users need to request or generate certificates before enabling SSL for Kafka. -Refer to xref:tigergraph-server:security:encrypting-connections.adoc#_option_2_create_a_self_signed_certificate[self-signed certificates] for instructions on how to generate self-signed certificate. - -=== X509 Certificate Formats Supported - -The only format of certificate supported is `PEM`. -Other formats like `DER` should be firstly converted to `PEM` to enable Kafka SSL. - -See xref:tigergraph-server:security:index.adoc[] for more information on Security in TigerGraph. - -=== Only `PKCS_12 Store` Type is Supported - -X509 certificates are stored in key/trust stores. - -.Only store type `PKCS_12(P12)`: -[console] ----- -Path to key store: /configs/kafka/conf/credential/key_store.p12 -Path to key store: /configs/kafka/conf/credentials/trust_store.p12 ----- - -[NOTE] -==== -JKS is current NOT yet supported. -==== - -== Kafka SSL Settings - -A few configuration settings are introduced to enable and manager Kafka security (SSL): - -[cols="3", separator=¦ ] -|=== -¦ Setting ¦ Description ¦ Default Value - -¦ Kafka.Security.ClientConf.ProtocolForAllClients -a¦ If specified, all clients must use the specified protocol. - -Legal values include: - -* -* ssl -* plaintext - -If it's not specified, clients can choose a preferred protocol. -¦ `` - -¦ Kafka.Security.ClientConf.InterBrokerProtocol -a¦ It is the protocol for inter-broker communication. - -The value can be: - -* -* ssl -* plaintext - -It can be overridden by `Kafka.Security.ClientConf.ProtocolForAllClients`. -¦ `` - -¦ Kafka.Security.ClientConf.InfraProtocol -a¦ It is the protocol for infra-kafka communication. - -The value can be: - -* -* plaintext -* ssl - -It can be overridden by `Kafka.Security.ClientConf.ProtocolForAllClients`. -¦ `` - -¦ Kafka.Security.ClientConf.EngineProtocol -a¦ It is the protocol for engine-kafka communication. - -The value can be: - -* -* plaintext -* ssl - -It can be overridden by `Kafka.Security.ClientConf.ProtocolForAllClients`. -¦ `` - -¦ Kafka.Security.SSL.Enable -¦ Enable Kafka TLS encryption. - -Can either be `true` or `false` -¦ `false ` - -¦ Kafka.Security.SSL.Port -¦ Kafka SSL listening port. -¦ `30001` - -¦ Kafka.Security.SSL.Certificate -¦ Kafka broker certificate in PEM format. - -Usage: `@file/path/to/certificate` -¦ `"@/home/graphsql/tmp/certificates/xyz.pem"` - - -¦ Kafka.Security.SSL.PrivateKey -¦ Kafka broker private key in PEM format. - -Usage: "@file/path/to/certificate" -¦ `"@/home/graphsql/tmp/certificates/xyz.pem"` - -¦ Kafka.Security.SSL.Passphrase -¦ Passphrase for SSL private key, trust store and key store. 
-Should not be empty when SSL is enabled for Kafka -¦ `` -|=== - -=== Client Settings Precedence - -* SSL.Enable < ClientConf.InterBrokerProtocol < ClientConf.ProtocolForAllClients -* SSL.Enable < ClientConf.InfraProtocol < ClientConf.ProtocolForAllClients -* SSL.Enable < ClientConf.EngineProtocol < ClientConf.ProtocolForAllClients - -== Instructions - -=== Prerequisites -As mentioned above, users need to generate certificates in PEM format. - -Basically two certificates (or a certificate chain) need to be generated: - -* Public certificate(chain), which includes: -** `Root-CA-Cert` -** (Optional) `intermediate-CA-Cert` -** `Leaf-Cert` (Machine Public Certificate) -* Private Key of machine (leaf private key). - -=== Basic Instructions on Enabling SSL for Kafka - -Please run the `gadmin` commands below to enable SSL for Kafka: - -[console] ----- -gadmin config set Kafka.Security.SSL.Passphrase - -gadmin config set Kafka.Security.SSL.Enable true - -#NOTE: this chain includes: leaf public cert ← (optional) intermediate-CA-cert ← CA-Root cert -gadmin config set Kafka.Security.SSL.Certificate <@path_to_public_certificate_chain> - -gadmin config set Kafka.Security.SSL.PrivateKey <@path_to_private_key> - -gadmin config apply -y -gadmin restart all -y ----- - -=== Instructions on Enabling SSL for MirrorMaker2 -Settings below need be added to connector configuration: - -[NOTE] -==== -See xref:tigergraph-server:data-loading:data-streaming-connector/kafka.adoc#_basic_configurations[Basic Configurations] for more infomration on connection configurations. -==== - -* `source.cluster.bootstrap.servers=` -* `target.cluster.bootstrap.servers=` -* `source.cluster.security.protocol=SSL` -* `target.cluster.security.protocol=SSL` - -.A full connector configuration example, with schema registry: -[console] ----- -connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector - -source.cluster.alias=Primary - -target.cluster.alias=Secondary - -source.cluster.bootstrap.servers=195.0.0.1:30001 - -target.cluster.bootstrap.servers=127.0.0.1:30001 - -source.cluster.security.protocol=SSL - -source->target.enabled=true - -topics=${topic_avro_with_registry} - -replication.factor=1 - -sync.topic.acls.enabled=false - -checkpoints.topic.replication.factor=1 - -heartbeats.topic.replication.factor=1 - -offset-syncs.topic.replication.factor=1 - -offset.storage.replication.factor=1 - -status.storage.replication.factor=1 - -config.storage.replication.factor=1 - -emit.heartbeats.interval.seconds=5 - -secondary.scheduled.rebalance.max.delay.ms=35000 - -key.converter=org.apache.kafka.connect.converters.ByteArrayConverter - -header.converter=org.apache.kafka.connect.converters.ByteArrayConverter - -value.converter=com.tigergraph.kafka.connect.converters.TigerGraphAvroConverter - -value.converter.schema.registry.url=http://127.0.0.1:8081 - -[connector_mm] - -name=connector_name_with_schema_registry - -tasks.max=10 ----- - -=== Instructions on Enabling SSL for Cross-Region Replication - -[console] ----- -gadmin config set System.CrossRegionReplication.PrimaryKafkaIPs - -#Default port number is: 30001 -gadmin config set System.CrossRegionReplication.PrimaryKafkaPort - -gadmin init kafka -y - -gadmin backup restore --dr -y ----- - -==== Optional Instructions -Users can use still enable/disable some or all the clients connected to Kafka brokers using these configuration settings: - -[console] ----- -Kafka.Security.ClientConf.InterBrokerProtocol -Kafka.Security.ClientConf.InfraProtocol -Kafka.Security.ClientConf.EngineProtocol 
----- - -Precedence of these settings are described in the xref:_client_settings_precedence[] section. - -== How to Renew Certificates -Open sourced public tool KeyTool can be used to manage the key/trust store with store type `PCKS_12(P12)`. - -Usually, `CA Root` certs have much longer expiry than `leaf certs`. -They are not going to be expired in a few years or even 10+ years, but users can still renew it if they want to. - -Here is the instructions users can follow to renew certificates: - -. (Optional) Insert a new `CA Root public cert` using KeyTool to the truststore under the path mentioned above. -This needs to be done on all the nodes before next steps; -.. Insert a new private/public key pair of leaf (machine) certificates into the `keystore.p12`. -.. (Optional) Users can still delete the old certificate from the `keystore.p12`. -.. Restart services including: -... Kafka -... KafkaStrm-LL -... KafkaConnect -... GPE -... GSE \ No newline at end of file diff --git a/modules/data-loading/pages/load-from-kafka.adoc b/modules/data-loading/pages/load-from-kafka.adoc index c57a2d6f..73978003 100644 --- a/modules/data-loading/pages/load-from-kafka.adoc +++ b/modules/data-loading/pages/load-from-kafka.adoc @@ -33,8 +33,8 @@ include::partial$kafka/kafka-specify-mapping-details.adoc[] === Avro Data Validation -In certain scenarios, users could load data in Avro format to TigerGraph DB, via an external Kafka connector, such as MirrorMakerConnector and experience malformed data errors during this process. -See our documentation on xref:tigergraph-server:data-loading:avro-validation-with-kafka.adoc[] for help. +Users can load data in Avro format to TigerGraph DB via an external Kafka connector, such as MirrorMakerConnector, and may encounter malformed data errors during this process. +See our documentation on xref:avro-validation-with-kafka.adoc[] for help. include::partial$load-part4-run-job.adoc[] @@ -45,8 +45,8 @@ include::partial$load-part5-monitor-and-manage.adoc[] == Kafka Loader Auto-Restart -See xref:tigergraph-server:cluster-and-ha-management:ha-overview.adoc#_support_file_and_kafka_loader_by_auto_restart[High Availability (HA) Overview]. +See xref:cluster-and-ha-management:ha-overview.adoc#_file_and_kafka_loaders_ha_with_auto_restart[High Availability (HA) Overview]. 
include::partial$load-part6-known-issues.adoc[] -// Custom known issues for \ No newline at end of file +// Custom known issues for diff --git a/modules/data-loading/pages/load-from-spark-dataframe.adoc b/modules/data-loading/pages/load-from-spark-dataframe.adoc index ef83a25d..72581129 100644 --- a/modules/data-loading/pages/load-from-spark-dataframe.adoc +++ b/modules/data-loading/pages/load-from-spark-dataframe.adoc @@ -10,14 +10,14 @@ Users can leverage it to connect TigerGraph to the Spark ecosystem and load data * Distributed file system: HDFS, S3, GCS and ABS * Streaming source: Kafka * Data warehouse: -xref:tigergraph-server:data-loading:load-from-warehouse.adoc#_bigquery[BigQuery], -xref:tigergraph-server:data-loading:load-from-warehouse.adoc#_snowflake[Snowflake], -xref:tigergraph-server:data-loading:load-from-warehouse.adoc#_postgresql[PostgreSql], +xref:load-from-warehouse.adoc#_bigquery[BigQuery], +xref:load-from-warehouse.adoc#_snowflake[Snowflake], +xref:load-from-warehouse.adoc#_postgresql[PostgreSql], and Redshift * Open table format: -xref:tigergraph-server:data-loading:load-from-spark-dataframe.adoc#_load_data_from_delta_lake[Delta Lake], -xref:tigergraph-server:data-loading:load-from-spark-dataframe.adoc#_load_data_from_iceberg[Iceberg] -and xref:tigergraph-server:data-loading:load-from-spark-dataframe.adoc#_load_data_from_hudi[Hudi] +xref:load-from-spark-dataframe.adoc#_load_data_from_delta_lake[Delta Lake], +xref:load-from-spark-dataframe.adoc#_load_data_from_iceberg[Iceberg] +and xref:load-from-spark-dataframe.adoc#_load_data_from_hudi[Hudi] include::partial$spark/jdbc-deprecation.adoc[] @@ -49,7 +49,7 @@ CREATE LOADING JOB load_Comment FOR GRAPH demo_graph { == Overview The first step is to read data into a Spark dataframe. -Then, using a TigerGraph loading job whieh maps data fields from the dataframe into graph elements, the connector pulls data from Spark into TigerGraph. +Then, using a TigerGraph loading job which maps data fields from the dataframe into graph elements, the connector pulls data from Spark into TigerGraph. .Spark dataframe: [console] ---- @@ -357,7 +357,7 @@ df.writeStream .awaitTermination() ---- -For more details on Iceberg see https://iceberg.apache.org/docs/1.3.1/getting-started/[Iceberg Apache: Getting Started] +For more details on Iceberg see https://iceberg.apache.org/docs/latest/spark-getting-started/[Iceberg Apache: Getting Started] === Load Data from Hudi ==== Batch Write @@ -507,4 +507,5 @@ There are 3 levels of stats: ¦ `invalidAttribute` ¦ Number of token lists where at least one of the attribute tokens is invalid. ¦ `incorrectFixedBinaryLength` ¦ Number of token lists where at least one of the tokens corresponding to a UDT type attribute is invalid. ¦ `invalidVertexType` ¦ Number of token lists where at least one of the tokens corresponding to an edge type's source/target vertex type is invalid. -|=== \ No newline at end of file +¦ `tokenBankException` ¦ Counts how many times a user-defined token function has thrown an exception during data loading.
+|=== diff --git a/modules/data-loading/pages/load-local-files.adoc b/modules/data-loading/pages/load-local-files.adoc index cec333fd..d9d6ef11 100644 --- a/modules/data-loading/pages/load-local-files.adoc +++ b/modules/data-loading/pages/load-local-files.adoc @@ -32,7 +32,7 @@ include::partial$load-part5-monitor-and-manage.adoc[] == Files Loader Auto-Restart -See xref:tigergraph-server:cluster-and-ha-management:ha-overview.adoc#_support_file_and_kafka_loader_by_auto_restart[High Availability (HA) Overview]. +See xref:cluster-and-ha-management:ha-overview.adoc#_file_and_kafka_loaders_ha_with_auto_restart[High Availability (HA) Overview]. include::partial$load-part6-known-issues.adoc[] diff --git a/modules/data-loading/pages/loading-concurrency.adoc b/modules/data-loading/pages/loading-concurrency.adoc index 3d886dc4..6bd68d72 100644 --- a/modules/data-loading/pages/loading-concurrency.adoc +++ b/modules/data-loading/pages/loading-concurrency.adoc @@ -1,6 +1,7 @@ -== Loading Job Concurrency += Loading Job Concurrency +:description: How to configure loading job concurrency and resource allocation -=== Number of concurrent loading jobs +== Number of concurrent loading jobs By default, only one loading job may run at a time. Additional job requests are held in a wait queue. @@ -22,7 +23,7 @@ $ gadmin config set KafkaLoader.ReplicaNumber $ gadmin config apply -y && gadmin restart gse gpe restpp -y ---- -=== Allowed resources for loading +== Allowed resources for loading You can also configure how many resources are permitted to work on loading jobs. diff --git a/modules/data-loading/pages/read-to-spark-dataframe.adoc b/modules/data-loading/pages/read-to-spark-dataframe.adoc index 01472b80..c634a47a 100644 --- a/modules/data-loading/pages/read-to-spark-dataframe.adoc +++ b/modules/data-loading/pages/read-to-spark-dataframe.adoc @@ -353,7 +353,7 @@ df.show() ¦ `query.params` ¦ The query parameters in JSON format. -Please refer to xref:tigergraph-server:API:index.adoc#_formatting_data_in_json[Formatting data in JSON]. +Please refer to xref:API:index.adoc#_formatting_data_in_json[Formatting data in JSON]. |=== Example: run the query `topK` with k=10 diff --git a/modules/data-loading/pages/spark-connection-via-jdbc-driver.adoc b/modules/data-loading/pages/spark-connection-via-jdbc-driver.adoc index c5221bd0..c29dc2c2 100644 --- a/modules/data-loading/pages/spark-connection-via-jdbc-driver.adoc +++ b/modules/data-loading/pages/spark-connection-via-jdbc-driver.adoc @@ -2,7 +2,7 @@ [NOTE] ==== -Please see xref:tigergraph-server:data-loading:load-from-spark-dataframe.adoc[] for information on the Spark connector. +Please see xref:load-from-spark-dataframe.adoc[] for information on the Spark connector. ==== Apache Spark is a popular big data distributed processing system which is frequently used in data management ETL process and Machine Learning applications. @@ -102,7 +102,7 @@ The loading job above, `load_Social` loads the 1st, 2nd, and 3rd columns of sour //http://host:port/restpp/ddl/Social?tag=load_Social&filename=file1 //--data -See the pages xref:gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[], xref:gsql-ref:ddl-and-loading:running-a-loading-job.adoc[] and xref:tigergraph-server:API:built-in-endpoints.adoc#_loading_jobs[Loading Jobs as a REST Endpoint] for more information about loading jobs in TigerGraph. 
+See the pages xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[], xref:{page-component-version}@gsql-ref:ddl-and-loading:running-a-loading-job.adoc[] and xref:API:built-in-endpoints.adoc#_loading_jobs[Loading Jobs as a REST Endpoint] for more information about loading jobs in TigerGraph. == Advanced Usages with Spark @@ -187,9 +187,9 @@ To bypass the disk IO limitation, it is better to put the raw data file on a dif | `url` | (none) |The JDBC URL to connect to: `jdbc:tg:http(s)://ip:port`, this port is the one used by GraphStudio.| Yes | `graph` | (none)| The graph name.| Yes | `version` | 3.9.0 |The TigerGraph version. |Yes -| `username` | tigergraph | TigerGraph username. | If xref:tigergraph-server:user-access:enabling-user-authentication.adoc[REST++ authentication] is enabled, a username/password or token is required. -| `password` | tigergraph | TigerGraph password. | If xref:tigergraph-server:user-access:enabling-user-authentication.adoc[REST++ authentication] is enabled, a username/password or token is required. -| `token` | (none) | A token used to authenticate RESTPP requests. Request a token| If xref:tigergraph-server:user-access:enabling-user-authentication.adoc[REST++ authentication] is enabled, a username/password or token is required. +| `username` | tigergraph | TigerGraph username. | If xref:user-access:enabling-user-authentication.adoc[REST++ authentication] is enabled, a username/password or token is required. +| `password` | tigergraph | TigerGraph password. | If xref:user-access:enabling-user-authentication.adoc[REST++ authentication] is enabled, a username/password or token is required. +| `token` | (none) | A token used to authenticate RESTPP requests. Request a token| If xref:user-access:enabling-user-authentication.adoc[REST++ authentication] is enabled, a username/password or token is required. | `jobid` (TG version >= 3.9.0) | (none) | A unique ID for tracing aggregated loading statistics. | No | `max_num_error` (TG version >= 3.9.0) | (none) | The threshold of the error objects count within the `jobid`. The loading job will be aborted when reaching the limit. `jobid` is required. | No | `max_percent_error` (TG version >= 3.9.0) | (none) |The threshold of the error objects percentage within the `jobid`. The loading job will be aborted when reaching the limit. `jobid` is required. | No diff --git a/modules/data-loading/partials/cloud/cloud-data-source-details.adoc b/modules/data-loading/partials/cloud/cloud-data-source-details.adoc index 4f79456c..8146f024 100644 --- a/modules/data-loading/partials/cloud/cloud-data-source-details.adoc +++ b/modules/data-loading/partials/cloud/cloud-data-source-details.adoc @@ -8,21 +8,129 @@ Also, three data object formats are supported: CSV, JSON, and Parquet. === AWS S3 -AWS uses the standard IAM credential provider and uses your access key for authentication. +The connector offers two methods for authentication: + +* The standard IAM credential provider using your access key +* AWS Identity and Access Management (IAM) role associated with the EC2 instance Access keys can be used for an individual user or for an IAM role. -See Using IAM Roles for Amazon EC2 for more information. +For the access key method, include the following fields in the data source connection: + +[source,json] +{ + "type": "s3", + "access.key": "", + "secret.key": "" +} + +For the IAM role method, you must attach an IAM role to all the EC2 instances which constitute your TigerGraph database cluster. 
+Include the following fields in the data source connection: + +[source,json] +{ + "type": "s3", + "file.reader.settings.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.InstanceProfileCredentialsProvider", + "access.key": "none", + "secret.key": "none" +} + +NOTE: Beginning with v3.10.1, you may omit the `access.key` and `secret.key` fields if you are using the IAM role method. + +For more information, see Amazon documentation on using IAM roles for Amazon EC2. + +[#temporary-credentials] +*Temporary Credentials* + +In addition, AWS temporary credentials are supported. +A temporary credential can be generated from the AWS CLI. For example: +[source,bash] +aws sts assume-role --role-arn arn:aws:iam:::role/ --role-session-name "" + +For more details on temporary credentials, please refer to AWS documentation on https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html[temporary security credentials in IAM]. + +To use the temporary credential, add the following configuration to the data source definition: +[source,json] +{ + "type": "s3", + "file.reader.settings.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider", + "access.key": "The in credential", + "secret.key": "The in credential", + "file.reader.settings.fs.s3a.session.token": "The in credential" +} +.Example temporary credential from AWS [source,json] { -"type": "s3", -"access.key": "", -"secret.key": "" + "Credentials": { + "AccessKeyId": "ASIA...X4M3A", + "SecretAccessKey": "/wSVO...WUZYv0", + "SessionToken": "IQoJb3J...O1k=", + "Expiration": "2024-10-10T23:00:00+00:00" + }, + "AssumedRoleUser": { + "AssumedRoleId": "A...1", + "Arn": "arn:aws:sts:::assumed-role//" + } } +==== Using MinIO (S3-Compatible Storage) + +TigerGraph also supports loading data from https://min.io[MinIO], which is fully S3-compatible. +When using MinIO instead of AWS S3, additional configuration is required. + +*Specify the MinIO Endpoint* + +Configure the S3A endpoint to point to your MinIO service: + +[source,json] +---- +"file.reader.settings.fs.s3a.endpoint": "" +---- + +The endpoint should resolve to your MinIO server and include the correct port if it is not using the default. + +*Enable Path-Style Access* + +MinIO typically requires path-style access instead of virtual-hosted-style access. +Enable this option explicitly: + +[source,json] +---- +"file.reader.settings.fs.s3a.path.style.access": "true" +---- + +*Import MinIO TLS Certificates (Non-Public or Custom CA)* + +If your MinIO deployment uses non-public TLS certificates, such as: + +* Self-signed certificates +* Certificates issued by an internal or private Certificate Authority (CA) +* Certificates not included in the default Java trust store + +you must import the MinIO root or CA certificate into TigerGraph’s Java trust store on *all* TigerGraph nodes. + +Run the following command on each node: + +[source,bash] +---- +JAVA_HOME=$(gadmin config get System.AppRoot)/.syspre/usr/lib/jvm/java-openjdk +$JAVA_HOME/bin/keytool -import -trustcacerts -alias certalias \ + -noprompt \ + -file /path/to/minio-root-cert \ + -keystore $JAVA_HOME/lib/security/cacerts \ + -storepass changeit +---- + +[NOTE] +==== +* `changeit` is the default password for the Java `cacerts` keystore. +* Replace `/path/to/minio-root-cert` with the actual path to your MinIO root or CA certificate. +* After importing the certificate, restart the TigerGraph `KAFKACONN` service for the change to take effect.
+==== + === Azure Blob Storage -We support two types of authentication: +The connector supports two types of authentication: *Shared key authentication*: @@ -31,22 +139,22 @@ TigerGraph can automatically extract the account name from the file URI, so ther [source,json] { -"type" : "abs", -"account.key" : "" + "type" : "abs", + "account.key" : "" } image::data-loading:azure-storage-account.png[Azure Access Keys tab] -**S**ervice principal authentication*: +*Service principal authentication*: To use service principal authentication, you must first register your TigerGraph instance as an application and grant it access to your storage account. [source,json] { -"type" : "abs", -"client.id" : "", -"client.secret" : "", -"tenant.id" : "" + "type" : "abs", + "client.id" : "", + "client.secret" : "", + "tenant.id" : "" } === Google Cloud Storage @@ -55,11 +163,11 @@ For GCS, the TigerGraph data source configuration object is based on the _GCS se [source,json] { -"type": "gcs", -"project_id": "", -"private_key_id": "", -"private_key": "", -"client_email": "" + "type": "gcs", + "project_id": "", + "private_key_id": "", + "private_key": "", + "client_email": "" } -You can follow Google Cloud's instructions for creating a service account key, and then replace the `"type"` value with `"gcs"`. \ No newline at end of file +You can follow Google Cloud's instructions for creating a service account key, and then replace the `"type"` value with `"gcs"`. diff --git a/modules/data-loading/partials/create-loading-job-kafka.adoc b/modules/data-loading/partials/create-loading-job-kafka.adoc index 3b19cd8d..42db6630 100644 --- a/modules/data-loading/partials/create-loading-job-kafka.adoc +++ b/modules/data-loading/partials/create-loading-job-kafka.adoc @@ -31,6 +31,6 @@ If you want to load from the beginning of a topic, the `start_offset` value shou <4> Replace `` with the partition number if you want to configure. . Create a loading job and map data to graph. See xref:data-loading:load-from-kafka.adoc#_create_a_loading_job[Load from External Kafka] for how to map data from a Kafka data source to the graph. -See xref:gsql-ref:ddl-and-loading:creating-a-loading-job.adoc#_loading_json_data[Loading JSON data] on how to create a loading job for JSON data. +See xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc#_loading_json_data[Loading JSON data] on how to create a loading job for JSON data. WARNING: Known bug: to use the `-1` value for offset, delete the `start_offset` key instead of setting it to `-1`. \ No newline at end of file diff --git a/modules/data-loading/partials/kafka/kafka-data-source-details.adoc b/modules/data-loading/partials/kafka/kafka-data-source-details.adoc index f4a51d7c..985fecbb 100644 --- a/modules/data-loading/partials/kafka/kafka-data-source-details.adoc +++ b/modules/data-loading/partials/kafka/kafka-data-source-details.adoc @@ -4,7 +4,7 @@ The TigerGraph connector to external Kafka sources makes use of https://cwiki.ap [NOTE] ==== -Users can additionally utilize external sources, including files, Vault, and environment variables, to provide configurations for Kafka connectors. See xref:tigergraph-server:data-loading:externalizing-kafka-configs.adoc[]. +In addition, users can utilize external sources to provide configurations for Kafka connectors. These sources include files, Vault, and environment variables. See xref:data-loading:externalizing-kafka-configs.adoc[]. 
==== To configure the data source object, the minimum requirement is the address of the external source Kafka cluster: @@ -13,67 +13,247 @@ To configure the data source object, the minimum requirement is the address of t .Data source configuration for external Kafka ---- { -"type": "mirrormaker", -"source.cluster.bootstrap.servers": "" + "type": "mirrormaker", + "source.cluster.bootstrap.servers": "" } ---- -If the source cluster is configured for SSL or SASL protocols, you need to provide the following SSL/SASL credentials in order to communicate with the source cluster. - -* If the source cluster uses SASL, you need to upload the keytab of each Kerberos principal to every node of your TigerGraph cluster at the same absolute path. -* If the source cluster uses SSL, see our documentation xref:tigergraph-server:data-loading:kafka-ssl-security-guide.adoc[] -* If the source cluster uses SASL *and* SSL, you need to upload the keytab of each Kerberos principal, as well as the key store and truststore to every node of your TigerGraph cluster. -Each file must be at the same absolute path on all nodes. - -The following configurations are required for admin, producer and consumer. To supply the configuration for the corresponding component, replace `` with `source.admin`, `producer`, or `consumer`. -For example, to specify `GSSAPI` as the SASL mechanism for consumer, include `"consumer.sasl.mecahnism": "GSSAPI"` in the data source configuration. +==== Configuration Settings [%header,cols="1,2"] |=== | Field | Description -| .security.protocol +| `.security.protocol` | Protocol used to communicate with brokers. Valid values are: `PLAINTEXT`, `SSL`, `SASL_PLAINTEXT`, `SASL_SSL`. The default is `PLAINTEXT`. -| .sasl.mechanism +| `.sasl.mechanism` | SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism. +Third party providers require configuring `.sasl.kerberos.service.name` and `.sasl.client.callback.handler.class` +as well as placing the third party jar under `$(gadmin config get System.Approot)/kafka/libs/`. -| .sasl.kerberos.service.name +| `.sasl.kerberos.service.name` | The Kerberos principal name used by your Kafka brokers. This could be defined in either JAAS configuration or Kafka’s configuration. -| .sasl.jaas.config +| `.sasl.jaas.config` | JAAS login context parameters for SASL connections in the format used by JAAS configuration files. See https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html[JAAS Login Configuration File] for details. -| .ssl.endpoint.identification.algorithm +| `.sasl.client.callback.handler.class` +| Name of the Java callback handler class required for a third party `.sasl.mechanism`. The jar containing it must be placed under `$(gadmin config get System.Approot)/kafka/libs/`. + +| `.ssl.endpoint.identification.algorithm` | The endpoint identification algorithm used to validate server hostname in the server certificate. Default is `https`. If the value is set to an empty string, this will disable server host name verification. -| .ssl.keystore.location +| `.ssl.keystore.location` | The location of the key store file. -| .ssl.keystore.password +| `.ssl.keystore.password` | The password of the key store file. -| .ssl.key.password +| `.ssl.key.password` | The password of the private key in the key store file or the PEM key specified in `ssl.keystore.key`. -| .ssl.truststore.location +| `.ssl.truststore.location` | The location of the trust store file.
-| .ssl.truststore.password +| `.ssl.truststore.password` | The password for the trust store file. |=== +===== Component Prefix +Replace the `` with the appropriate identifier: + +* `[admin | producer | consumer]` +* `[source | target].cluster` +* `[source | target].cluster.[admin | producer | consumer]` + +===== Security Protocols + +If the source cluster is configured for SSL or SASL protocols, you need to provide the following SSL/SASL credentials in order to communicate with the source cluster. + +* If the source cluster uses SASL, you need to upload the keytab of each Kerberos principal to every node of your TigerGraph cluster at the same absolute path. +* If the source cluster uses SASL *and* SSL, you need to upload the keytab of each Kerberos principal, as well as the key store and truststore to every node of your TigerGraph cluster. Each file must be at the same absolute path on all nodes. + +The following configurations are required for admin, producer and consumer. Kafka allows SSL settings to be overridden, applying security settings with the following precedence: + +* `generic.ssl.setting` < `source/target.cluster.ssl.setting` < `admin/producer/consumer.ssl.setting`. + +If both the source and target clusters share the same SSL settings, users can set generic settings that cover both clusters and all the roles (admin/producer/consumer). + +For example, users can set `ssl.keystore.location="/path/to/key/store"` instead of: + +* `source.cluster.ssl.keystore.location="/path/to/key/store"` +* `admin.ssl.keystore.location="/path/to/key/store"` +* `source.cluster.admin.ssl.keystore.location="/path/to/key/store"`. + +If the source and target clusters have different SSL settings, it is possible to set cluster-wide SSL configs. + +For example, users can set `target.cluster.ssl.truststore.password="/password/for/trust/store"` instead of: + +* `target.cluster.producer.ssl.truststore.password="/password/for/trust/store"`. + +NOTE: SSL is now well supported by TigerGraph; we recommend setting up regular SSL rather than SASL + PLAINTEXT/SSL.
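To make the shortcut concrete, here is a minimal sketch of a data source configuration that relies on the generic `ssl.*` keys, assuming both clusters and all roles share one keystore and truststore (the broker address, paths, and passwords below are placeholders):

[source,json]
----
{
  "type": "mirrormaker",
  "source.cluster.bootstrap.servers": "<broker_address>:<port>",
  "source.cluster.security.protocol": "SSL",

  "ssl.keystore.location": "/path/to/client.keystore.jks",
  "ssl.keystore.password": "******",
  "ssl.key.password": "******",
  "ssl.truststore.location": "/path/to/client.truststore.jks",
  "ssl.truststore.password": "******"
}
----

Per the precedence rules above, the generic `ssl.*` keys would apply to the admin, producer, and consumer roles on both clusters unless a more specific prefixed key overrides them.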
+ +==== Supported Configuration Examples +===== PLAINTEXT +[source,json,linenums] +---- +{ + "type": "mirrormaker", + "source.cluster.bootstrap.servers": ":" +} +---- + +===== SSL +Need to configure: + +* `.security.protocol` +* `.ssl.` + +[source,json,linenums] +---- +{ + "type": "mirrormaker", + "source.cluster.bootstrap.servers": ":", + + "consumer.security.protocol": "SSL", + "consumer.ssl.endpoint.identification.algorithm": "none", + "consumer.ssl.keystore.location": "/path/to/client.keystore.jks", + "consumer.ssl.keystore.password": "******", + "consumer.ssl.key.password": "******", + "consumer.ssl.truststore.location": "/path/to/client.truststore.jks", + "consumer.ssl.truststore.password": "******", + + "source.admin.security.protocol": "SSL", + "source.admin.ssl.endpoint.identification.algorithm": "none", + "source.admin.ssl.keystore.location": "/path/to/client.keystore.jks", + "source.admin.ssl.keystore.password": "******", + "source.admin.ssl.key.password": "******", + "source.admin.ssl.truststore.location": "/path/to/client.truststore.jks", + "source.admin.ssl.truststore.password": "******", + + "producer.security.protocol": "SSL", + "producer.ssl.endpoint.identification.algorithm": "none", + "producer.ssl.keystore.location": "/path/to/client.keystore.jks", + "producer.ssl.keystore.password": "******", + "producer.ssl.key.password": "******", + "producer.ssl.truststore.location": "/path/to/client.truststore.jks", + "producer.ssl.truststore.password": "******" +} +---- + +===== SASL_PLAINTEXT +Need to configure: + +* `.security.protocol` +* `.sasl.` + +[source,json,linenums] +---- +{ + "type": "mirrormaker", + "source.cluster.bootstrap.servers": ":", + + "consumer.security.protocol": "SASL_PLAINTEXT", + "consumer.sasl.mechanism": "", + "consumer.sasl.jaas.config": "", + + "source.admin.security.protocol": "SASL_PLAINTEXT", + "source.admin.sasl.mechanism": "", + "source.admin.sasl.jaas.config": "", + + "producer.security.protocol": "SASL_PLAINTEXT", + "producer.sasl.mechanism": "", + "producer.sasl.jaas.config": "" +} +---- + +===== SASL_SSL +Need to configure: + +* `.security.protocol` +* `.sasl.` +* `.ssl.` + +[source,json,linenums] +---- +{ + "type": "mirrormaker", + "source.cluster.bootstrap.servers": ":", + + "consumer.security.protocol": "SASL_SSL", + "consumer.sasl.mechanism": "", + "consumer.sasl.jaas.config": "", + "consumer.ssl.endpoint.identification.algorithm": "none", + "consumer.ssl.keystore.location": "/path/to/client.keystore.jks", + "consumer.ssl.keystore.password": "******", + "consumer.ssl.key.password": "******", + "consumer.ssl.truststore.location": "/path/to/client.truststore.jks", + "consumer.ssl.truststore.password": "******", + + "source.admin.security.protocol": "SASL_SSL", + "source.admin.sasl.mechanism": "", + "source.admin.sasl.jaas.config": "", + "source.admin.ssl.endpoint.identification.algorithm": "none", + "source.admin.ssl.keystore.location": "/path/to/client.keystore.jks", + "source.admin.ssl.keystore.password": "******", + "source.admin.ssl.key.password": "******", + "source.admin.ssl.truststore.location": "/path/to/client.truststore.jks", + "source.admin.ssl.truststore.password": "******", + + "producer.security.protocol": "SASL_SSL", + "producer.sasl.mechanism": "", + "producer.sasl.jaas.config": "", + "producer.ssl.endpoint.identification.algorithm": "none", + "producer.ssl.keystore.location": "/path/to/client.keystore.jks", + "producer.ssl.keystore.password": "******", + "producer.ssl.key.password": "******", + 
"producer.ssl.truststore.location": "/path/to/client.truststore.jks", + "producer.ssl.truststore.password": "******" +} +---- + +===== Third Party SASL Mechanism +For both `SASL` and `SASL_SSL` when a third party mechanism it is necessary to: + +* Include the `.sasl.jaas.config` in addition to the `.sasl.client.callback.handler.class` in the configuration +* Place the third party jar under `$(gadmin config get System.Approot)/kafka/libs/` + +[source,json,linenum] +.Example SASL Configuration with third party mechanism +---- +{ + "type": "mirrormaker", + "source.cluster.bootstrap.servers": ":", + + "consumer.security.protocol": "SASL_PLAINTEXT", + "consumer.sasl.mechanism": "", + "consumer.sasl.jaas.config": "", + "consumer.sasl.client.callback.handler.class": "", + + "source.admin.security.protocol": "SASL_PLAINTEXT", + "source.admin.sasl.mechanism": "", + "source.admin.sasl.jaas.config": "", + "source.admin.sasl.client.callback.handler.class": "", + + "producer.security.protocol": "SASL_PLAINTEXT", + "producer.sasl.mechanism": "", + "producer.sasl.jaas.config": "", + "producer.sasl.client.callback.handler.class": "" +} +---- + +==== Schema Registry Service If there is a https://docs.confluent.io/platform/current/schema-registry/index.html[schema registry service] containing the record schema of the source topic, please add it to the data source configuration: [source,json] "value.converter.schema.registry.url": "schema_registry_url" [NOTE] -Currently, only Avro schema is supported. - +Currently, only Avro schema is supported. \ No newline at end of file diff --git a/modules/data-loading/partials/kafka/kafka-example-loading-job.adoc b/modules/data-loading/partials/kafka/kafka-example-loading-job.adoc index 5cbf6a8c..48696f47 100644 --- a/modules/data-loading/partials/kafka/kafka-example-loading-job.adoc +++ b/modules/data-loading/partials/kafka/kafka-example-loading-job.adoc @@ -2,8 +2,8 @@ The following is an example loading job from and external Kafka cluster. -[source,php,linenums] -.Example loading job for BigQuery +[source,gsql,linenums] +.Example loading job from external Kafka ---- USE GRAPH ldbc_snb CREATE DATA_SOURCE s1 = "ds_config.json" FOR GRAPH ldbc_snb diff --git a/modules/data-loading/partials/load-part1-intro-and-schema.adoc b/modules/data-loading/partials/load-part1-intro-and-schema.adoc index 070ae426..a7aaf494 100644 --- a/modules/data-loading/partials/load-part1-intro-and-schema.adoc +++ b/modules/data-loading/partials/load-part1-intro-and-schema.adoc @@ -1,10 +1,9 @@ -After you have xref:gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[defined a graph schema], you can create a loading job, specify your data sources, and run the job to load data. +After you have xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[defined a graph schema], you can create a loading job, specify your data sources, and run the job to load data. -The steps are similar whether you are loading from local files, from cloud storage, or any of the other supported sources. +The steps for loading from local files, cloud storage, or any other supported sources are similar. We will call out whether a particular step is common for all loading or specific to a data source or loading mode. 
== Example Schema -This example uses part of the LDBC_SNB schema: [source,gsql] .Example schema taken from LDBC_SNB diff --git a/modules/data-loading/partials/load-part2-create-data-source.adoc b/modules/data-loading/partials/load-part2-create-data-source.adoc index 5a4ec42a..d08c7712 100644 --- a/modules/data-loading/partials/load-part2-create-data-source.adoc +++ b/modules/data-loading/partials/load-part2-create-data-source.adoc @@ -1,28 +1,40 @@ + == Create Data Source Object -A data source code provides a standard interface for all supported data source types, so that loading jobs can be written without regard for the data source. +A data source object provides a standard interface for all supported data source types, so that loading jobs can be written without regard for the data source. When you create the object, you specify its details (type, access credentials, etc.) in the form of a JSON object. The JSON object can either be read in from a file or provided inline. +[NOTE] +==== Inline mode is required when creating data sources for TigerGraph Cloud instances. - +==== In the following example, we create a data source named `s1`, and read its configuration information from a file called `ds_config.json`. [source,php] USE GRAPH ldbc_snb CREATE DATA_SOURCE s1 = "ds_config.json" FOR GRAPH ldbc_snb +[NOTE] +==== Older versions of TigerGraph required a keyword after `DATA_SOURCE` such as `STREAM` or `KAFKA`. +==== -[source,php] +[source,gsql] .Inline JSON data format when creating a data source CREATE DATA_SOURCE s1 = "{ type: , key: }" FOR GRAPH ldbc_snb -String literals can be enclosed with a double quote `"`, triple double quotes `"""`, or triple single quotes `'''`. -Double quotes `"` in the JSON can be omitted if the key name does not contain a colon `:` or comma `,`. +=== String Literals +String literals can be represented according to the following options: + +* Enclosed with double quote `"`. +* Enclosed with triple double quotes `"""`. +* Enclosed with triple single quotes `'''`. + +If a key name does not contain a colon `:` or a comma `,`, the double quotes `"` around it can be omitted. [source,php] .Alternate quote syntax for inline JSON data @@ -31,4 +43,4 @@ CREATE DATA_SOURCE s1 = """{ "key": "" }""" FOR GRAPH ldbc_snb -Key names accept a separator of either a period `.` or underscore `_`, so for example, `key_name` and `key.name` are both valid key names. \ No newline at end of file +Either a period `.` or an underscore `_` can be used as a separator in key names. Example: `first.second` or `first_second`.
\ No newline at end of file +Refer to xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[] in the GSQL Language Reference for descriptions of all the options for loading jobs. \ No newline at end of file diff --git a/modules/data-loading/partials/load-part5-monitor-and-manage.adoc b/modules/data-loading/partials/load-part5-monitor-and-manage.adoc index 459e504f..2ac538a5 100644 --- a/modules/data-loading/partials/load-part5-monitor-and-manage.adoc +++ b/modules/data-loading/partials/load-part5-monitor-and-manage.adoc @@ -29,7 +29,7 @@ When inspecting all current jobs with `SHOW LOADING STATUS ALL`, the jobs in the You can use `SHOW LOADING STATUS job_id` to check the historical information of finished jobs. If the report for this job contains error data, you can use `SHOW LOADING ERROR job_id` to see the original data that caused the error. -See xref:gsql-ref:ddl-and-loading:managing-loading-job.adoc[Managing and Inspecting a Loading Job] for more details. +See xref:{page-component-version}@gsql-ref:ddl-and-loading:managing-loading-job.adoc[Managing and Inspecting a Loading Job] for more details. == Manage loading job concurrency diff --git a/modules/data-loading/partials/snowflake/snowflake-example-loading-job.txt b/modules/data-loading/partials/snowflake/snowflake-example-loading-job.txt index 67a4a4d2..840683e0 100644 --- a/modules/data-loading/partials/snowflake/snowflake-example-loading-job.txt +++ b/modules/data-loading/partials/snowflake/snowflake-example-loading-job.txt @@ -3,7 +3,7 @@ The following is an example loading job from Snowflake. Users may not need to use `temp_table` if they do not need token functions. -For more details see xref:gsql-ref:ddl-and-loading:functions/token/flatten_json_array.adoc[]. +For more details see xref:{page-component-version}@gsql-ref:ddl-and-loading:functions/token/flatten_json_array.adoc[]. [NOTE] ==== diff --git a/modules/data-loading/partials/spark/jdbc-deprecation.adoc b/modules/data-loading/partials/spark/jdbc-deprecation.adoc index ff814c02..1cd5fefb 100644 --- a/modules/data-loading/partials/spark/jdbc-deprecation.adoc +++ b/modules/data-loading/partials/spark/jdbc-deprecation.adoc @@ -1,4 +1,4 @@ [IMPORTANT] ==== -The legacy xref:tigergraph-server:data-loading:spark-connection-via-jdbc-driver.adoc[Spark Connection Via JDBC Driver] is deprecated. Please migrate to this new connector. +The legacy xref:data-loading:spark-connection-via-jdbc-driver.adoc[Spark Connection Via JDBC Driver] is deprecated. Please migrate to this new connector. ==== \ No newline at end of file diff --git a/modules/data-loading/partials/spark/pre-def-options.adoc b/modules/data-loading/partials/spark/pre-def-options.adoc index 455ef7c6..4c59d45e 100644 --- a/modules/data-loading/partials/spark/pre-def-options.adoc +++ b/modules/data-loading/partials/spark/pre-def-options.adoc @@ -19,6 +19,11 @@ val tgOptions = Map( ) ---- +[NOTE] +==== +If you’re running long queries through the Spark connector and a load balancer (for example, Azure Load Balancer) is between Spark and TigerGraph, increase the load balancer’s idle timeout to prevent the connection from timing out. 
+==== + [TIP] .Authentication methods ==== diff --git a/modules/data-loading/partials/spark/prerequisites.adoc b/modules/data-loading/partials/spark/prerequisites.adoc index 301dbefc..56883bcb 100644 --- a/modules/data-loading/partials/spark/prerequisites.adoc +++ b/modules/data-loading/partials/spark/prerequisites.adoc @@ -1,9 +1,9 @@ -=== Compatibility + * TigerGraph 3.6.0 or higher. Job-level loading statistics are available in v3.10+. * Spark 3.2 or higher with Scala 2.12 and Scala 2.13. * JAVA 8 or higher. -=== Download the JARs +=== Download the TigerGraph Spark Connector This connector can be downloaded from the Maven central repository: https://central.sonatype.com/artifact/com.tigergraph/tigergraph-spark-connector/overview[Maven Central]. The connector is available in three formats: diff --git a/modules/data-loading/partials/warehouse/warehouse-specify-mapping-details.adoc b/modules/data-loading/partials/warehouse/warehouse-specify-mapping-details.adoc index ee7ea44a..d27abec2 100644 --- a/modules/data-loading/partials/warehouse/warehouse-specify-mapping-details.adoc +++ b/modules/data-loading/partials/warehouse/warehouse-specify-mapping-details.adoc @@ -33,4 +33,4 @@ SELECT col.field1, col.field2, col.field3 FROM table SELECT ARRAY_TO_STRING(col_arr,separator) FROM table . In the LOAD statement, use the GSQL -xref:gsql-ref:ddl-and-loading:functions/token/split.adoc[SPLIT function]. \ No newline at end of file +xref:{page-component-version}@gsql-ref:ddl-and-loading:functions/token/split.adoc[SPLIT function]. \ No newline at end of file diff --git a/modules/getting-started/nav.adoc b/modules/getting-started/nav.adoc index fcb1c9f3..65485f05 100644 --- a/modules/getting-started/nav.adoc +++ b/modules/getting-started/nav.adoc @@ -2,6 +2,7 @@ * xref:index.adoc[Get Started] ** Installation *** xref:docker.adoc[On Docker] +*** xref:kubernetes.adoc[On Kubernetes] *** xref:cloud-images/index.adoc[On Cloud Marketplace] **** xref:cloud-images/aws.adoc[AWS] **** xref:cloud-images/azure.adoc[Azure] diff --git a/modules/getting-started/pages/cloud-images/aws.adoc b/modules/getting-started/pages/cloud-images/aws.adoc index 53013cde..db73593d 100644 --- a/modules/getting-started/pages/cloud-images/aws.adoc +++ b/modules/getting-started/pages/cloud-images/aws.adoc @@ -29,7 +29,7 @@ image::configuration-page (1).png[Configuration Page] [NOTE] The instance type needs to have at least 4 CPUs and 16GB RAM for TigerGraph to work properly. + -The security group must allow inbound TCP traffic to port 14240 if you want to access GraphStudio (TigerGraph's visualization platform). For more about GraphStudio, see the xref:gui:graphstudio:overview.adoc[GraphStudio UI Guide]. +The security group must allow inbound TCP traffic to port 14240 if you want to access GraphStudio (TigerGraph's visualization platform). For more about GraphStudio, see the xref:{page-component-version}@gui:graphstudio:overview.adoc[GraphStudio UI Guide]. + The security group must allow inbound TCP traffic to port 9000 if you want to send RESTful requests to TigerGraph from outside the instance (this includes configuring the GSQL client on a remote machine). For more about the REST API, see the xref:API:index.adoc[TigerGraph RESTful API User Guide]. 
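+
+For a quick connectivity check once the instance is running, you can send a request to each port from your machine (`<instance_ip>` below is a placeholder for your instance's public IP):
+
+[source.wrap,console]
+----
+# RESTPP diagnostic endpoint on port 9000; a response confirms the port is open.
+curl http://<instance_ip>:9000/echo
+# GraphStudio on port 14240; any HTTP response indicates the port is reachable.
+curl -I http://<instance_ip>:14240
+----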
+ diff --git a/modules/getting-started/pages/cloud-images/azure.adoc b/modules/getting-started/pages/cloud-images/azure.adoc index 308fefe7..6b24e9fc 100644 --- a/modules/getting-started/pages/cloud-images/azure.adoc +++ b/modules/getting-started/pages/cloud-images/azure.adoc @@ -18,7 +18,7 @@ image::basic-settings-page (1).png[Azure Basic Settings Page] [NOTE] The instance type needs to have at least 4 CPUs and 16GB RAM for TigerGraph to work properly. + -The "NIC network security group" must allow inbound TCP traffic to port 14240 if you want to access GraphStudio (TigerGraph's visualization platform). For more about GraphStudio, see the xref:gui:graphstudio:overview.adoc[GraphStudio UI Guide]. +The "NIC network security group" must allow inbound TCP traffic to port 14240 if you want to access GraphStudio (TigerGraph's visualization platform). For more about GraphStudio, see the xref:{page-component-version}@gui:graphstudio:overview.adoc[GraphStudio UI Guide]. + The "NIC network security group" must allow inbound TCP traffic to port 9000 if you want to send RESTful requests to TigerGraph from outside the instance (this includes configuring the GSQL client on a remote machine). For more about the REST API, see the xref:API:index.adoc[TigerGraph RESTful API User Guide]. + diff --git a/modules/getting-started/pages/cloud-images/gcp.adoc b/modules/getting-started/pages/cloud-images/gcp.adoc index b935f66f..beed143c 100644 --- a/modules/getting-started/pages/cloud-images/gcp.adoc +++ b/modules/getting-started/pages/cloud-images/gcp.adoc @@ -12,7 +12,7 @@ This tutorial will show you how to start TigerGraph from an image on Google Clou When ready, click btn:[Deploy]. * The instance type needs to have at least 4 CPUs and 16GB RAM for TigerGraph to work properly. * You must allow internet traffic for TCP port 14240 if you want to access GraphStudio. -For more about GraphStudio, see the xref:gui:graphstudio:overview.adoc[GraphStudio UI Guide]. +For more about GraphStudio, see the xref:{page-component-version}@gui:graphstudio:overview.adoc[GraphStudio UI Guide]. * You must allow internet traffic for TCP port 9000 if you want to use TigerGraph's REST API (this includes configuring the GSQL client on a remote machine). + diff --git a/modules/getting-started/pages/database-definition.adoc b/modules/getting-started/pages/database-definition.adoc index 95bc5153..3d2c74b3 100644 --- a/modules/getting-started/pages/database-definition.adoc +++ b/modules/getting-started/pages/database-definition.adoc @@ -1,5 +1,5 @@ = Database Definition -//:page-aliases: tigergraph-server:data-definition:README.adoc +//:page-aliases: data-definition:README.adoc Before you can load data into a graph and write queries on TigerGraph, you must first define a graph schema. A graph schema in TigerGraph is made up of different vertex types and edge types. @@ -12,7 +12,7 @@ A graph schema is a "dictionary" that defines the types of entities, vertices an Each vertex or edge type has a name and a set of attributes (properties) associated with it. For example, a `Book` vertex could have title, author, publication year, genre, and language attributes. -To learn about the GSQL commands used to define a schema, see xref:3.9@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[GSQL language reference: Defining a Graph Schema.] +To learn about the GSQL commands used to define a schema, see xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[GSQL language reference: Defining a Graph Schema.] 
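+
+As a brief illustration, a schema for the `Book` example above might be defined as follows (the type names and attributes here are illustrative, not part of a shipped schema):
+
+[source,gsql]
+----
+// Define vertex types, each with a primary ID and a few attributes.
+CREATE VERTEX Book (PRIMARY_ID id UINT, title STRING, pub_year INT, genre STRING)
+CREATE VERTEX Author (PRIMARY_ID id UINT, name STRING)
+// Define an undirected edge type connecting the two vertex types.
+CREATE UNDIRECTED EDGE wrote (FROM Author, TO Book)
+// Group the types into a graph.
+CREATE GRAPH BookGraph (Book, Author, wrote)
+----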
== Modify a schema After you define a graph schema, you can still modify it. This includes but is not limited to: @@ -26,7 +26,7 @@ Data already stored in the graph and which is not logically part of the change w For example, if you had 100 Book vertices and then added an attribute to the Book schema, you would still have 100 Books, with default values for the new attribute. If you dropped a Book attribute, you still would have all your books, but the one attribute would be gone. -To learn about the GSQL commands used in modifying a schema, see xref:3.9@gsql-ref:ddl-and-loading:modifying-a-graph-schema.adoc[GSQL language reference: Modifying a Graph Schema.] +To learn about the GSQL commands used in modifying a schema, see xref:{page-component-version}@gsql-ref:ddl-and-loading:modifying-a-graph-schema.adoc[GSQL language reference: Modifying a Graph Schema.] [#_reset_all] == Reset all diff --git a/modules/getting-started/pages/docker.adoc b/modules/getting-started/pages/docker.adoc index 98e95080..f098fcec 100644 --- a/modules/getting-started/pages/docker.adoc +++ b/modules/getting-started/pages/docker.adoc @@ -41,8 +41,13 @@ TigerGraph on Docker is not supported for ARM processors, including M1 Macs. *** Ubuntu: https://docs.docker.com/install/linux/docker-ce/ubuntu/ ** To install Docker for Windows OS, follow this video: https://www.youtube.com/watch?v=ymlWt1MqURY . Configure Docker Desktop with sufficient resources: - ** Recommended: 4 cores and 16GB memory - ** Minimum: 2 cores and 10GB memory ++ +Please see our xref:installation:hw-and-sw-requirements.adoc#_hardware_recommendations[hardware recommendations]. +The minimum sizes are for the TigerGraph software itself and a small amount of data. +If your RAM is not large enough to hold all your data, plus some extra RAM for computation, you will experience slower performance, possibly significantly slower. + + ** Recommended: 8 cores and 24GB memory, or about 80% of what your computer allows. + ** Minimum: 4 cores and 8 GB memory ** Click the Docker Desktop icon, click *Preferences* >> *Resources*, drag the CPU and Memory sliders to the desired configuration, save and restart Docker Desktop == Prepare a shared folder for your container @@ -67,7 +72,7 @@ Run the following command to pull the TigerGraph Docker image, bind ports, map a [source.wrap,console] ---- -$ docker run -d \ <1> +sudo docker run -d \ <1> -p 14022:22 \ <2> -p 9000:9000 \ <2> -p 14240:14240 \ <2> @@ -75,8 +80,8 @@ $ docker run -d \ <1> --ulimit nofile=1000000:1000000 \ <4> -v ~/data:/home/tigergraph/mydata \ <5> -v tg-data:/home/tigergraph \ <6> - -t \ <7> - tigergraph/tigergraph:latest <8> + -t tigergraph/tigergraph:latest <7> + ---- <1> `-d`: make the container run in the background. <2> `-p`: map Docker port 22 to the host OS port 14022, 9000 to host OS 9000, and 14240 to host OS 14240. @@ -89,21 +94,34 @@ For example, `c:\data` If the volume doesn't exist, Docker creates it automatically. This allows you to retain the data from your container. The next time you start up a new container with the same volume, all your changes are preserved. -<7> `-t`: allocate a pseudo terminal. -<8> `tigergraph/tigergraph:latest`: download the latest Docker image from the TigerGraph Docker registry URL tigergraph/tigergraph. +<7> `tigergraph/tigergraph:latest`: download the latest Docker image from the TigerGraph Docker registry URL tigergraph/tigergraph. Replace "latest" with a specific version number if a dedicated version of TigerGraph is to be used. 
For example, if you want to get the 3.0.5 version, the URL should be: `tigergraph/tigergraph:3.0.5`. -If you use Windows and have write permission issues with the above command, try the following command instead (this command does not map the shared folder on your host machine to your container) : +In Linux shells, a *backslash (\)* lets you split a long command across multiple lines. Our examples use it for readability. If you do the same, make sure the backslash is the very last character on the line with no space after it, or the command will fail with an *invalid reference format* error. +If you see a permission denied error like: + +[source.wrap,console] +---- +docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Head "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied +---- + +Run the command with `sudo` (as shown above), or configure your system so your user can run Docker without `sudo`. + +If you are on Windows and encounter file permission issues, you can run the command without mounting the shared folder: [source.wrap,console] ---- -$ docker run -d -p 14022:22 -p 9000:9000 -p 14240:14240 --name tigergraph --ulimit nofile=1000000:1000000 -t tigergraph/tigergraph:latest +sudo docker run -d -p 14022:22 -p 9000:9000 -p 14240:14240 \ + --name tigergraph --ulimit nofile=1000000:1000000 \ + -t tigergraph/tigergraph:latest ---- [NOTE] -If you intend to have multiple instances of TigerGraph in the same Docker environment, whether they are the same version or not, we advise using unique <3> container names and <6> volume names for each one, to avoid accidental reuse or interference between volumes. +==== +If you intend to have multiple instances of TigerGraph in the same Docker environment, whether they are the same version or not, we advise using unique <3> container names and <6> volume names for each one, to avoid conflicts and data overwrite. +==== == Connect to your container (via SSH or `docker exec`) @@ -137,14 +155,14 @@ WARNING: Your TigerGraph image is preconfigured with a Linux user called `tigerg . Change the password of the Linux user `tigergraph`. -. xref:tigergraph-server:user-access:user-management.adoc#_change_a_users_password[Change the password] of the database user called `tigergraph`: +. xref:user-access:user-management.adoc#_change_a_users_password[Change the password] of the database user called `tigergraph`: + [source,console] ---- $ gsql ALTER PASSWORD tigergraph ---- -. For additional TigerGraph security settings, see xref:tigergraph-server:security:index.adoc[]. +. For additional TigerGraph security settings, see xref:security:index.adoc[]. Please follow best practices for securing and hardening the docker container especially when installing in a shared environment. @@ -157,7 +175,7 @@ Please follow best practices for securing and hardening the docker container esp $ gadmin start all ---- -. Run the `gsql` command as shown below to start the GSQL shell. If you are new to TigerGraph, you can run the xref:gsql-ref:tutorials:gsql-101/index.adoc[GSQL 101] tutorial now. +. Run the `gsql` command as shown below to start the GSQL shell. If you are new to TigerGraph, you can run the xref:{page-component-version}@gsql-ref:tutorials:gsql-101/index.adoc[GSQL 101] tutorial now. 
+
[source,console]
----
diff --git a/modules/getting-started/pages/index.adoc b/modules/getting-started/pages/index.adoc
index 1d0d9b88..533c0260 100644
--- a/modules/getting-started/pages/index.adoc
+++ b/modules/getting-started/pages/index.adoc
@@ -2,20 +2,22 @@
 //:page-aliases: getting-started:readme.adoc, getting-started:README.adoc
 This Get Started section covers the various options for users who are installing the TigerGraph database themselves.
-If you are using TigerGraph Cloud, please refer to xref:tigergraph-server:intro:comparison-of-editions.adoc[] for a comparison chart of TigerGraph Server vs. TigerGraph Cloud.
+If you are using TigerGraph Cloud, please refer to xref:intro:comparison-of-editions.adoc[] for a comparison chart of TigerGraph Server vs. TigerGraph Cloud.
-== xref:tigergraph-server:getting-started:docker.adoc[]
+== xref:getting-started:docker.adoc[Get Started with Docker]
 If you are a Mac or Windows user, we recommend you use Docker to start up TigerGraph on your computer.
 NOTE: The Docker image version of TigerGraph is for personal or R&D use and not for production use.
+== xref:getting-started:kubernetes.adoc[Get Started on Kubernetes]
-== xref:tigergraph-server:getting-started:linux.adoc[Get Started on Linux]
+Deploy a TigerGraph cluster on GKE, EKS, AKS, or OpenShift using the TigerGraph Operator.
-If you have a Linux machine that meets our xref:installation:hw-and-sw-requirements.adoc[], you can install TigerGraph on your machine directly.
+== xref:getting-started:linux.adoc[Get Started on Linux]
+If you have a Linux machine that meets our xref:installation:hw-and-sw-requirements.adoc[], you can install TigerGraph on your machine directly.
-== xref:tigergraph-server:getting-started:cloud-images/index.adoc[Get Started from a Cloud Marketplace]
+== xref:getting-started:cloud-images/index.adoc[Get Started from a Cloud Marketplace]
 You can also start up TigerGraph instances from Cloud Images on AWS, Microsoft Azure, or Google Cloud Platform.
diff --git a/modules/getting-started/pages/kubernetes.adoc b/modules/getting-started/pages/kubernetes.adoc
new file mode 100644
index 00000000..2e84b9ff
--- /dev/null
+++ b/modules/getting-started/pages/kubernetes.adoc
@@ -0,0 +1,16 @@
+= Get Started on Kubernetes
+
+To get started, go to our https://github.com/tigergraph/ecosys/blob/master/k8s/docs/02-get-started/get_started.md[Get Started] GitHub page, where you will install the Operator and provision your first cluster.
+
+Use the Operator on any cloud platform with Kubernetes support.
+
+TigerGraph has verified the full functionality of the operator on the Kubernetes services of the following platforms:
+
+* https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-gke.md[Google Kubernetes Engine (GKE)]
+* https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-openshift.md[Red Hat OpenShift]
+* https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-eks.md[AWS Elastic Kubernetes Service (EKS)]
+* https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-aks.md[Azure Kubernetes Service (AKS)]
+
+Additionally, the TigerGraph Kubernetes Operator can be installed and deployed without internet access.
+ +* https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/deploy-without-internet.md[Install and Deploy without internet access] diff --git a/modules/getting-started/pages/linux.adoc b/modules/getting-started/pages/linux.adoc index c1775a3c..cf5e4a24 100644 --- a/modules/getting-started/pages/linux.adoc +++ b/modules/getting-started/pages/linux.adoc @@ -1,7 +1,7 @@ = Install TigerGraph on Linux You can install TigerGraph on a Linux machine that meets the xref:installation:hw-and-sw-requirements.adoc[Hardware and Software Requirements]. -For a step-by-step guide on installing TigerGraph on your Linux machine, please visit xref:tigergraph-server:installation:bare-metal-install.adoc[]. +For a step-by-step guide on installing TigerGraph on your Linux machine, please visit xref:installation:bare-metal-install.adoc[]. == Quickstart guide for New Users @@ -25,4 +25,4 @@ sudo ./install.sh sudo ./install.sh -n ---- - .. For additional options, see xref:tigergraph-server:installation:bare-metal-install.adoc[]. + .. For additional options, see xref:installation:bare-metal-install.adoc[]. diff --git a/modules/gsql-shell/pages/index.adoc b/modules/gsql-shell/pages/index.adoc index d99fc1cf..cbb0038d 100644 --- a/modules/gsql-shell/pages/index.adoc +++ b/modules/gsql-shell/pages/index.adoc @@ -5,7 +5,7 @@ The GSQL shell is a fully functional Java environment for interacting with the TigerGraph database. It is one of the primary ways to interact with the TigerGraph database and is included in a standard TigerGraph installation. -To learn more about the GSQL language, follow our tutorial series starting with xref:gsql-ref:tutorials:gsql-101/index.adoc[]. +To learn more about the GSQL language, follow our tutorial series starting with xref:{page-component-version}@gsql-ref:tutorials:gsql-101/index.adoc[]. == Launch the shell As the TigerGraph Linux user, type `gsql` into the bash terminal to start a GSQL shell session: diff --git a/modules/gsql-shell/pages/using-a-remote-gsql-client.adoc b/modules/gsql-shell/pages/using-a-remote-gsql-client.adoc index d2772cc4..675e1152 100644 --- a/modules/gsql-shell/pages/using-a-remote-gsql-client.adoc +++ b/modules/gsql-shell/pages/using-a-remote-gsql-client.adoc @@ -102,7 +102,7 @@ echo | openssl s_client -connect :443 | sed -ne '/-BEGIN CERTIFICATE-/ === Generate a secret -Use Admin Portal to generate a secret for your user account: xref:gui:admin-portal:management/user-management.adoc[] +Use Admin Portal to generate a secret for your user account: xref:{page-component-version}@gui:admin-portal:management/user-management.adoc[] === Connect to the cloud instance diff --git a/modules/installation/nav.adoc b/modules/installation/nav.adoc index bdd3046d..09d5a117 100644 --- a/modules/installation/nav.adoc +++ b/modules/installation/nav.adoc @@ -4,6 +4,7 @@ ** xref:bare-metal-install.adoc[] ** xref:post-install-check.adoc[] ** xref:change-port.adoc[] +** xref:change-ip-or-hostname.adoc[] ** xref:upgrade.adoc[Upgrade] ** xref:uninstallation.adoc[Uninstallation] ** xref:license.adoc[] diff --git a/modules/installation/pages/change-ip-or-hostname.adoc b/modules/installation/pages/change-ip-or-hostname.adoc new file mode 100644 index 00000000..4c8ec7ae --- /dev/null +++ b/modules/installation/pages/change-ip-or-hostname.adoc @@ -0,0 +1,34 @@ += Change IP or Hostname +:description: How to change the IP or hostname of one or more nodes in a cluster. 
+
+If you need to change the IP address or hostname of one or more nodes after TigerGraph is up and running,
+do so by running the following commands as the TigerGraph Linux user.
+
+1. Pause your database operations.
+
+2. Shut down services, as shown below:
+
+[source,bash]
+----
+gadmin init etcd
+gadmin stop all
+----
+
+The command `gadmin init etcd` initializes the etcd component of TigerGraph, while `gadmin stop all` stops all running TigerGraph services.
+
+3. Make your IP changes (outside of TigerGraph).
+
+4. Apply the new IP/hostnames, using the two commands below.
+
+The first command applies the updated TigerGraph host list from `~/.tg.cfg` to define the cluster nodes for deployment.
+
+The second command initializes the TigerGraph cluster without stopping running services (`--skip-stop`) and also initializes etcd,
+the distributed key-value store for configuration and coordination (`--init-etcd`).
+
+After you have changed the IP/hostname(s), run:
+
+[source,bash]
+----
+gadmin config entry System.HostList --file ~/.tg.cfg # Edit IP/hostname here
+gadmin init cluster --skip-stop --init-etcd
+----
diff --git a/modules/installation/pages/hw-and-sw-requirements.adoc b/modules/installation/pages/hw-and-sw-requirements.adoc
index 04ade295..e25f28ea 100644
--- a/modules/installation/pages/hw-and-sw-requirements.adoc
+++ b/modules/installation/pages/hw-and-sw-requirements.adoc
@@ -26,7 +26,7 @@ The software has been tested on the operating systems listed below:
 | RedHat (RHEL) 7.0 to 8.9
 | ✓
-| RedHat (RHEL) 9
+| RedHat (RHEL) 9.0 to 9.2/9.5
 | ✓
@@ -141,6 +141,9 @@ Other browser-based products, such as TigerGraph Insights, have their own browse
 Choosing the right hardware to host your TigerGraph system is crucial for the right balance of cost and performance.
 This page provides some general guidelines for hardware selection based on simple hypothetical assumptions, but your actual hardware requirements will vary based on your data size, workload, and performance requirements.
+The minimum sizes are for the TigerGraph software itself and a small amount of data.
+If your RAM is not large enough to hold all your data, plus some extra RAM for computation, you will experience slower performance, possibly significantly slower.
+
 The sizing recommendations below apply to each server node. If you have more than several hundred gigabytes of data, you should consider deploying a cluster of multiple nodes, to distribute your data.
 NOTE: Consult a TigerGraph solution architects for an estimate of memory and storage needs.
@@ -263,6 +266,17 @@ Here is an example of how to specify different mount points in the `install_conf
 ```
 TIP: To economize, you can opt to use a magnetic hard disk (HDD) for the logs; the remaining ones must be SSD.
+[WARNING]
+====
+The disks that host AppRoot and DataRoot must be mounted with the *exec* option. Otherwise, installed queries cannot run. Users can check the mount options by running the _mount_ command in a shell, e.g.
+
+```
+$ mount
+/dev/sda2 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
+```
+The *noexec* option is shown when the disk is mounted without the *exec* option. A remount example is shown below.
+====
+
 Another important point when choosing the Disk type is the disk IOPS capacity.
 We *strongly* suggest opting for ≥3000 IOPS.
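+
+For example, to check a data disk for the *noexec* flag and remount it with *exec* (the mount point `/data` is a placeholder; make the change permanent in `/etc/fstab`):
+
+[source,console]
+----
+# Inspect the mount options of the disk that hosts AppRoot/DataRoot.
+mount | grep /data
+# Remount with exec enabled; this lasts until the next reboot.
+sudo mount -o remount,exec /data
+----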
diff --git a/modules/installation/pages/upgrade.adoc b/modules/installation/pages/upgrade.adoc
index ef21ebca..3be6f299 100644
--- a/modules/installation/pages/upgrade.adoc
+++ b/modules/installation/pages/upgrade.adoc
@@ -9,11 +9,16 @@ TigerGraph to TigerGraph {page-component-version}.
 [IMPORTANT]
 ====
-Always check the xref:release-notes:index.adoc[] for all the versions between your current version and your target version, for deprecated features, known issues, or other behavioral changes.
+Always check the xref:release-notes:index.adoc[] for all the versions between your current version and your target version, for deprecated features, known issues, new reserved words, or other behavioral changes.
 You may need to make migration changes to your TigerGraph application either before or after the upgrade.
+For example, you may need to rename schema elements and refactor queries if a new reserved word was introduced.
 If you have any questions or uncertainty, please contact TigerGraph Support.
 ====
+=== Review Upgrade-Related Issues
+See the list of xref:#_known_issues_and_workarounds[].
+In some cases, there are limitations on migrating data from one version to a newer version.
+
 ==== User-defined function (UDF) compatibility
 TigerGraph version 3.9 introduced changes in the way user-defined functions are accepted by the system.
@@ -52,11 +57,11 @@ System upgrade does not support version rollback at this time.
 === Prepare a pre-3.2 database
-. First perform a backup xref:backup-and-restore:index.adoc[using GBAR (Graph Backup and Restore)] on your existing installation.
-. *Restore* your installation with GBAR to rebuild it.
+. First perform a xref:backup-and-restore:index.adoc[backup] of your existing installation.
+. *Restore* your installation with `gadmin backup restore` to rebuild it.
 This step is necessary because TigerGraph 3.2 made major changes to the Graph Storage Engine and a restore is needed to remove certain files that would make the upgrade operation fail.
 . Ensure that the restore is completely finished and there are no pending graph modifications (schema change, insert, update, or delete) before starting the upgrade.
-You can do this by calling the xref:tigergraph-server:API:built-in-endpoints.adoc#_rebuild_graph_engine[/rebuildnow] endpoint and waiting until there are no more PullDelta messages being printed in the logs.
+You can do this by calling the xref:API:built-in-endpoints.adoc#_rebuild_graph_engine[/rebuildnow] endpoint and waiting until there are no more PullDelta messages being printed in the logs.
 === Ensure that the database is inactive
@@ -66,27 +71,54 @@ Ensure that the database will be inactive throughout the upgrade process.
 . Stop any new database requests.
 . Ensure that all previous operations such as queries, loading jobs, schema changes, and data updates and deletions are completely finished.
 Check the appropriate logs.
-Call the xref:tigergraph-server:API:built-in-endpoints.adoc#_rebuild_graph_engine[/rebuildnow] endpoint to force the data store to consume all pending updates and then wait until there are no more PullDelta messages being printed in the logs.
+Call the xref:API:built-in-endpoints.adoc#_rebuild_graph_engine[/rebuildnow] endpoint to force the data store to consume all pending updates and then wait until there are no more PullDelta messages being printed in the logs.
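+
+For example, a rebuild can be triggered as follows (replace `MyGraph` with your graph name):
+
+[source.wrap,console]
+----
+# Force the GPE to commit all pending updates to disk.
+curl -X GET "http://localhost:9000/rebuildnow/MyGraph"
+----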
== Upgrading a CRR system
+
+[NOTE]
+====
+After upgrading to a newer version, you must perform a backup and restore on the newer version before enabling CRR, even if no data has changed. You cannot enable CRR using a backup from an earlier version.
+====
+
 To upgrade the TigerGraph software on a CRR system, follow this sequence of steps.
+. Stop CRR on your DR cluster.
++
+[source,console]
+----
+gadmin crr stop -y
+----
+
 . Disable CRR on your DR cluster.
 +
-[source.wrap,console]
+[source,console]
 ----
-$ gadmin config set System.CrossRegionReplication.Enabled false
-$ gadmin config apply -y
-$ gadmin restart all -y
+gadmin config set System.CrossRegionReplication.Enabled false
+gadmin config apply -y
+gadmin restart all -y
 ----
-. Upgrade both the primary cluster and DR cluster, according to the instructions on this page.
+
+. Upgrade both the primary and DR clusters, according to the instructions on this page.
+
+. Start CRR on the new DR cluster.
++
+[source,console]
+----
+gadmin crr start
+----
-. Enable CRR on the DR cluster.
 [[upgrading-from-v3x]]
 == Upgrading from v3.x
+Always upgrade from the most recent maintenance version in any minor release.
+If you are upgrading from a version that does not have the most recent patches applied to its minor version, upgrade to the most recent patches first before upgrading to another minor or major version.
+For example, if you are upgrading from 3.5.0 to 3.11.x, upgrade to 3.5.3 first using the installation script and then upgrade to 3.11.x.
+
+Additionally, versions upgrading from 3.0 or 3.1 must first go through 3.2.4.
+For example, if you are running version 3.1.3, you must first upgrade to 3.1.6, then 3.2.4, then to 3.11.x.
+
+
 [IMPORTANT]
 ====
 When switching to a new version of TigerGraph it will stop the current services which will make the cluster temporarily unavailable.
@@ -107,15 +139,13 @@ $ ./install.sh -U
 ....
 [NOTE]
+.Known Issue - File Permissions
 ====
-Please upgrade from the most recent maintenance version in any minor release.
-If you are upgrading from a version that does not have the most recent patches applied to its minor version, upgrade to the most recent patches first before upgrading to another minor or major version.
-For example, if you are upgrading from 3.5.0 to 3.10.x, upgrade to 3.5.3 first using the installation script and then upgrade to 3.10.x.
-
-Additionally, versions upgrading from 3.0 or 3.1 must first go through 3.2.4.
-For example, if you are running version 3.1.3, you must first upgrade to 3.1.6, then 3.2.4, then to 3.10.x.
+The `pre_upgrade_check.sh` script might encounter a permission-denied error for the destination directory.
+If this occurs, manually grant permissions on the `/var/tmp` folder.
 ====
+
 Once binaries and config files are installed on local machine and also distributed to all the other machines, a message will be prompted to user:
 [console]
@@ -203,7 +233,7 @@ link:https://tigergraph.zendesk.com/hc/en-us/articles/8173584319892-2-6-x-to-3-x
 === Upgrading from 3.x
-. xref:backup-and-restore:backup-and-restore.adoc[Back up] your TigerGraph instance using GBAR.
+. xref:backup-and-restore:backup-and-restore.adoc[Back up] your TigerGraph instance using `gadmin backup`.
 . Start a new instance from the latest cloud marketplace listing.
 . Use the backup files you generated earlier to xref:backup-and-restore:backup-and-restore.adoc[restore] the new instance.
@@ -224,7 +254,7 @@ In previous releases, `to_string()` was included in the default `ExprFunctions`
 Users need to rename or remove UDFs that are called `to_string()`.
Now, that it is added as a built-in function users are no longer needing to include it in the `ExprFunctions` file. -For more reference on how to prepare for an upgrade please refer back to the section: xref:tigergraph-server:installation:upgrade.adoc#_before_you_begin[Before You Begin]. +For more reference on how to prepare for an upgrade please refer back to the section: xref:installation:upgrade.adoc#_before_you_begin[Before You Begin]. === `tg_` is Now a Reserved Keyword @@ -236,7 +266,7 @@ Users can either rename, remove, or comment out any functions in their `ExprFunc Additionally, users should avoid prefixing future functions with this reserved prefix. This is to avoid naming collisions with queries. -For more reference on how to prepare for an upgrade please refer back to the section: xref:tigergraph-server:installation:upgrade.adoc#_before_you_begin[Before You Begin]. +For more reference on how to prepare for an upgrade please refer back to the section: xref:installation:upgrade.adoc#_before_you_begin[Before You Begin]. === UDF File Policy @@ -282,7 +312,7 @@ Sometimes users can encounter a issue with this policy after an upgrade. ==== What to do: Before (recommended) or after an upgrade users should change or use different constant paths for queries and loading jobs that do *not* violate policy. -For more on file policy see xref:tigergraph-server:security:file-output-policy.adoc[] and/or xref:tigergraph-server:security:gsql-file-input-policy.adoc[]. +For more on file policy see xref:security:file-output-policy.adoc[] and/or xref:security:gsql-file-input-policy.adoc[]. === Other issues diff --git a/modules/intro/pages/index.adoc b/modules/intro/pages/index.adoc index f60876af..71fdecff 100644 --- a/modules/intro/pages/index.adoc +++ b/modules/intro/pages/index.adoc @@ -5,7 +5,7 @@ //Introduction and Welcome The TigerGraph(R) database runs on standard, commodity-grade Linux servers and is designed in C++ to fit into your existing environment with minimum fuss. -NOTE: To view the documentation of a particular TigerGraph version, please see xref:tigergraph-server:additional-resources:legacy-tg-versions.adoc[]. +NOTE: To view the documentation of a particular TigerGraph version, please see the xref:additional-resources:legacy-tg-versions.adoc[Version Directory]. == Get to Know Your TigerGraph DB [.home-card,cols="2",grid=none,frame=none, separator=¦ ] @@ -14,11 +14,11 @@ NOTE: To view the documentation of a particular TigerGraph version, please see x image:getstarted-homecard.png[alt=getstarted,width=74,height=74] *Get Started* -Step-by-step guides to help you get up and running. +Step-by-step guides to get you up and running. -xref:tigergraph-server:installation:hw-and-sw-requirements.adoc[System Requirements] | -xref:tigergraph-server:getting-started:index.adoc[Get Started] | -xref:tigergraph-server:gsql-shell:index.adoc[The GSQL Shell] +xref:installation:hw-and-sw-requirements.adoc[System Requirements] | +xref:getting-started:index.adoc[Get Started] | +xref:gsql-shell:index.adoc[The GSQL Shell] ¦ image:installation-homecard.png[alt=installation,width=74,height=74] @@ -26,29 +26,29 @@ image:installation-homecard.png[alt=installation,width=74,height=74] Learn what you need to install TigerGraph. 
-xref:tigergraph-server:installation:bare-metal-install.adoc[Bare Metal] | -xref:tigergraph-server:getting-started:docker.adoc[On Docker] | -xref:tigergraph-server:getting-started:cloud-images/index.adoc[On Cloud Marketplace] +xref:installation:bare-metal-install.adoc[Bare Metal] | +xref:getting-started:docker.adoc[On Docker] | +xref:getting-started:cloud-images/index.adoc[On Cloud Marketplace] ¦ image:designdatbase-homecard.png[alt=designdatbase,width=74,height=74] -*Create Database* +*Create a Database* Learn how to design a database, create loading jobs and write queries. -xref:tigergraph-server:getting-started:database-definition.adoc[Database Definition] | +xref:getting-started:database-definition.adoc[Database Definition] | xref:multigraph-overview.adoc[Multi-graph] | -xref:gsql-ref:intro:index.adoc[GSQL Language Reference] +xref:{page-component-version}@gsql-ref:intro:index.adoc[GSQL Language Reference] ¦ image:DataLoading-Homecard.png[alt=dataloading,width=74,height=74] -*Load Data* +*Load and Export Data* -Learn how to load and export data into a TigerGraph system. +Learn how to load data into and export data from TigerGraph. -xref:tigergraph-server:data-loading:index.adoc[Data Loading] | -xref:tigergraph-server:backup-and-restore:database-import-export.adoc[Database Import/Export All] | -xref:tigergraph-server:backup-and-restore:single-graph-import-export.adoc[] +xref:data-loading:index.adoc[Data Loading] | +xref:backup-and-restore:database-import-export.adoc[Database Import/Export All] | +xref:backup-and-restore:single-graph-import-export.adoc[] |=== @@ -59,29 +59,29 @@ xref:tigergraph-server:backup-and-restore:single-graph-import-export.adoc[] image:systemmanagment-homecard.png[alt=useraccess,width=74,height=74] *Operations Mgmt* -Understand `gadmin` the tool for managing TigerGraph servers and how to setup system backups. +Learn to manage TigerGraph servers and how to set up system backups. -xref:tigergraph-server:system-management:management-with-gadmin.adoc[System Management] | -xref:tigergraph-server:backup-and-restore:index.adoc[Backup and Restore ] +xref:system-management:management-with-gadmin.adoc[System Management] | +xref:backup-and-restore:index.adoc[Backup and Restore ] ¦ image:security-homecard.png[alt=security,width=74,height=74] *User Access Mgmt* -Learn about TigerGraph's role-based access control (RBAC) model and other security access features. +Learn to use TigerGraph's role-based access control (RBAC) and other security features. -xref:tigergraph-server:user-access:index.adoc[User Access Management] | -xref:tigergraph-server:security:index.adoc[Security ] +xref:user-access:index.adoc[User Access Management] | +xref:security:index.adoc[Security ] ¦ image:systemconig-homecard.png[alt=systemconig,width=74,height=74] *System Config* -Learn how to manage clusters and setup high availability (HA). +Learn to configure clusters and set up high availability (HA). -xref:tigergraph-server:cluster-and-ha-management:index.adoc[Overview] | -xref:tigergraph-server:cluster-and-ha-management:crr-index.adoc[Cross-Region Replication] | -xref:tigergraph-server:cluster-and-ha-management:ha-overview.adoc[High Availability ] +xref:cluster-and-ha-management:index.adoc[Overview] | +xref:cluster-and-ha-management:crr-index.adoc[Cross-Region Replication] | +xref:cluster-and-ha-management:ha-overview.adoc[High Availability ] ¦ image:ArchtectureOverview-homecard.png[alt=ArchtectureOverview,width=74,height=74] @@ -91,35 +91,34 @@ Go deeper and learn what's behind the platform. 
 xref:internal-architecture.adoc[Internal Architecture] |
 xref:transaction-and-acid.adoc[Transaction and ACID] |
-xref:tigergraph-server:intro:continuous-availability-overview.adoc[]
+xref:intro:continuous-availability-overview.adoc[]
 ¦
 image:TG_Icon_Library-08.png[alt=ArchtectureOverview,width=74,height=74]
 *Kubernetes*
-Learn how deploy TigerGraph single servers and clusters using Kubernetes.
+Automate the deployment and management of TigerGraph clusters using Kubernetes.
-xref:tigergraph-server:kubernetes:index.adoc[Kubernetes] |
-xref:tigergraph-server:kubernetes:k8s-operator/index.adoc[]
+xref:kubernetes:index.adoc[Kubernetes] |
+xref:kubernetes:k8s-operator/index.adoc[]
 ¦
 image:documentation-homecard.png[alt=ArchtectureOverview,width=74,height=74]
 *Additional Resources*
-Explore additional resources to find our troubleshooting guide and other references.
+Explore API directories, troubleshooting guides, and more.
 xref:additional-resources:best-practice-guides/best-practices-overview.adoc[] |
-xref:tigergraph-server:troubleshooting:troubleshooting-guide.adoc[Troubleshooting Guide] |
-xref:tigergraph-server:reference:glossary.adoc[Glossary] |
-xref:tigergraph-server:reference:ports.adoc[List of Ports] |
-xref:tigergraph-server:reference:configuration-parameters.adoc[Configuration Parameters]
+xref:troubleshooting:troubleshooting-guide.adoc[Troubleshooting Guide] |
+xref:reference:glossary.adoc[Glossary] |
+xref:reference:ports.adoc[List of Ports] |
+xref:reference:configuration-parameters.adoc[Configuration Parameters]
 |===
 == Release Notes
 To keep up-to-date on key new features of the most recent LTS versions of TigerGraph, please see:
-* xref:tigergraph-server:release-notes:index.adoc[Release Notes - TigerGraph 3.10]
-* xref:3.9@tigergraph-server:release-notes:index.adoc[Release Notes - TigerGraph 3.9]
-* xref:3.6@tigergraph-server:release-notes:index.adoc[Release Notes - TigerGraph 3.6]
+* xref:3.11@tigergraph-server:release-notes:index.adoc[Release Notes - TigerGraph 3.11]
+* xref:3.10@tigergraph-server:release-notes:index.adoc[Release Notes - TigerGraph 3.10]
diff --git a/modules/kubernetes/pages/k8s-operator/connect-to-cluster.txt b/modules/kubernetes/pages/k8s-operator/connect-to-cluster.txt
index 25619606..235604b6 100644
--- a/modules/kubernetes/pages/k8s-operator/connect-to-cluster.txt
+++ b/modules/kubernetes/pages/k8s-operator/connect-to-cluster.txt
@@ -28,7 +28,7 @@ In the output, under the external-IP column, you can find the external IP of the
 Visit port 14240 the IP to access GraphStudio.
 == Access REST endpoints
-To access TigerGraph's xref:tigergraph-server:API:index.adoc[REST endpoints], make sure port 9000 on your cluster is accessible.
+To access TigerGraph's xref:API:index.adoc[REST endpoints], make sure port 9000 on your cluster is accessible.
 Run the following command to retrieve the IP address of the REST external service.
[.wrap,console] diff --git a/modules/kubernetes/pages/k8s-operator/index.adoc b/modules/kubernetes/pages/k8s-operator/index.adoc index 75a59702..6319be1f 100644 --- a/modules/kubernetes/pages/k8s-operator/index.adoc +++ b/modules/kubernetes/pages/k8s-operator/index.adoc @@ -15,7 +15,7 @@ NOTE: We have provided a public https://github.com/tigergraph/ecosys/tree/master //* xref:k8s-operator/cluster-operations.txt#_check_cluster_version_and_status[Checking cluster status] //* xref:k8s-operator/cluster-operations.txt#_shrink_expand_cluster[Shrink or expand a cluster] //* xref:k8s-operator/backup-and-restore.txt[] -//* Clusters can contain xref:tigergraph-server:kubernetes:k8s-operator/custom-containers.txt[custom containers and volumes] [3.9.2+] +//* Clusters can contain xref:kubernetes:k8s-operator/custom-containers.txt[custom containers and volumes] [3.9.2+] == Getting started @@ -32,6 +32,7 @@ TigerGraph has verified the full functionality of the operator on the Kubernetes * https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-gke.md[Google Kubernetes Engine (GKE)] * https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-openshift.md[Red Hat OpenShift] * https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-eks.md[AWS Elastic Kubernetes Service (EKS)] +* https://github.com/tigergraph/ecosys/blob/master/k8s/docs/03-deploy/tigergraph-on-aks.md[Azure Kubernetes Service (AKS)] Additionally, TigerGraph Kubernetes Operator can be installed and deployed without internet access diff --git a/modules/reference/pages/configuration-parameters.adoc b/modules/reference/pages/configuration-parameters.adoc index ede27b37..dcaffd99 100644 --- a/modules/reference/pages/configuration-parameters.adoc +++ b/modules/reference/pages/configuration-parameters.adoc @@ -89,6 +89,8 @@ heartbeats that can be missed before one service is considered dead by the controller |`5` |Controller.Port |The serving gRPC (Google Remote Procedure Call) port for Controller |`9188` + +|Controller.ServiceManager.AutoRestart |When set to `true`, all services start with the `--auto-restart` option during xref:system-management:management-commands.adoc#_gadmin_start[`gadmin start`] or xref:system-management:management-commands.adoc#_gadmin_restart[`gadmin restart`]. This enables services to automatically restart if they crash. The default value is `false`. |`false` |=== == Dict @@ -515,15 +517,25 @@ in seconds |`10800` |GSQL.UDF.Policy.Enable | Whether to enforce a policy on the contents of UDF files (see xref:security:index.adoc#_udf_file_scanning[UDF file scanning]). |`true` -|GSQL.UDF.Policy.HeaderAllowlist | A default set of C{plus}{plus} headers that are allowed to be included in a UDF file. -|`["stdlib.h", "string", "tuple", +|GSQL.UDF.Policy.HeaderAllowlist +a|A list of allowed C++ header files for UDF development. +This parameter accepts an array. +You can update it in two ways: + +* Using `gadmin config entry`: +This command is interactive and allows you to add, remove, or modify array items step-by-step. + +* Using `gadmin config set`: +When using this command, wrap the entire JSON array in *single quotes* so the shell passes it as one argument. 
+ +|`gadmin config set GSQL.UDF.Policy.HeaderAllowlist '["stdlib.h", "string", "tuple", "vector", "list", "deque", "arrays", "forward_list", "queue", "priority_queue", "stack", "set", "multiset", "map", "multimap", "unordered_set", "unordered_multiset", "unordered_map", "unordered_multimap", "iterator", "sstream", -"algorithm", "math.h"]` +"algorithm", "math.h"]'` |GSQL.UserInfoLimit.TokenSizeLimit |The max number of tokens allowed |`60000` @@ -676,7 +688,8 @@ Longer retention results in higher disk space usage and slower search for histor |=== |Name |Description |Example |Kafka.BasicConfig.Env | A list of `=` pairs, separated by `;` -|`nan` +|' ' + |Kafka.BasicConfig.LogConfig.LogFileMaxSizeMB |The maximum size in megabytes of the log file before it gets rotated |`100` @@ -735,7 +748,7 @@ eligible for deletion (gigabytes) |`40` |KafkaConnect.AllowedTaskPerCPU |[v3.9.2+] Maximum number of allowed connector tasks = (#CPUs) x AllowedTaskPerCPU. Range is [0.5,10]. It is recommended to stay below 2.0. |`1.5` (default) -|KafkaConnect.BasicConfig.Env |A list of `=` pairs, separated by `;` |`nan` +|KafkaConnect.BasicConfig.Env |A list of `=` pairs, separated by `;` |' ' |KafkaConnect.BasicConfig.LogConfig.LogFileMaxSizeMB |The maximum size in megabytes of the log file before it gets rotated |`100` @@ -783,7 +796,7 @@ attempting to retry a failed fetch request to a given topic partition [width="100%",cols="34%,33%,33%",options="header",] |=== |Name |Description |Example -|KafkaLoader.BasicConfig.Env | A list of `=` pairs, separated by `;` |`nan` +|KafkaLoader.BasicConfig.Env | A list of `=` pairs, separated by `;` |' ' |KafkaLoader.BasicConfig.LogConfig.LogFileMaxDurationDay |The maximum number of days to retain old log files based on the timestamp encoded in @@ -838,7 +851,9 @@ node |`1` [width="100%",cols="34%,33%,33%",options="header",] |=== |Name |Description |Example -|KafkaStreamLL.BasicConfig.Env | A list of `=` pairs, separated by `;` |`nan` +|KafkaStreamLL.BasicConfig.Env | A list of `=` pairs, separated by `;` | +' ' + |KafkaStreamLL.BasicConfig.LogConfig.LogFileMaxSizeMB |The maximum size in megabytes of the log file before it gets rotated |`100` @@ -1017,19 +1032,19 @@ concurrent queries in the delay queue |`20` |Name |Description |Example | Security.JWT.RSA.PublicKey -| Configure a RSA public key for xref:tigergraph-server:user-access:jwt-token.adoc[]. +| Configure a RSA public key for xref:user-access:jwt-token.adoc[]. | `gadmin config set Security.JWT.RSA.PublicKey ` | Security.JWT.HMAC.Secret -| Configure a HMAC Secret for xref:tigergraph-server:user-access:jwt-token.adoc[]. +| Configure a HMAC Secret for xref:user-access:jwt-token.adoc[]. | `gadmin config set Security.JWT.HMAC.Secret ` | Security.JWT.Issuer -| Configure the `iss` claim that will be verified against this configured value for xref:tigergraph-server:user-access:jwt-token.adoc[]. +| Configure the `iss` claim that will be verified against this configured value for xref:user-access:jwt-token.adoc[]. | `gadmin config set Security.JWT.Issuer ""` | Security.JWT.Audience -| Configure this JWT Token authentication to verify if the `aud` (recipient for which the JWT is intended) defined in the JWT Token matches the configured one or not for xref:tigergraph-server:user-access:jwt-token.adoc[]. +| Configure this JWT Token authentication to verify if the `aud` (recipient for which the JWT is intended) defined in the JWT Token matches the configured one or not for xref:user-access:jwt-token.adoc[]. 
| `gadmin config set Security.JWT.Audience ""` |Security.LDAP.AdminDN |Configure the DN of LDAP user who has read @@ -1276,7 +1291,7 @@ primary cluster’s IPs, separator by `,' |`nan` primary cluster’s KafkaPort |`30002` |System.CrossRegionReplication.TopicPrefix |The prefix of GPE/GUI/GSQL Kafka Topic, by default is empty. -For details of what it does and how it is used, please refer to xref:tigergraph-server:cluster-and-ha-management:set-up-crr.adoc[].|`nan` +For details of what it does and how it is used, please refer to xref:cluster-and-ha-management:set-up-crr.adoc[].|`nan` |System.DataRoot |The root directory for data @@ -1326,7 +1341,7 @@ interval (s) |`60` |System.Metrics.IncludeHostName | If set to true, the hostname/ip will be included in all metrics output, in OpenMetrics format, as part of the variable labels. -Otherwise, the default is `false` and the response will not include hostname/ip as part of the variable labels. As in the example xref:tigergraph-server:API:built-in-endpoints.adoc#_monitor_system_metrics_openmetrics_format[Monitor system metrics (OpenMetrics format)] +Otherwise, the default is `false` and the response will not include hostname/ip as part of the variable labels. As in the example xref:API:built-in-endpoints.adoc#_monitor_system_metrics_openmetrics_format[Monitor system metrics (OpenMetrics format)] | `System.Metrics.IncludeHostName true` @@ -1482,7 +1497,7 @@ If you use `gadmin config set GPE.BasicConfig.Env` or `gadmin config entry GPE.B |RESPP | `SSL_CA_CERT`, RESETPP.BasicConfig.Env -| Set the CA certificate `SSL_CA_CERT` to establish the connection with the URL being set with xref:tigergraph-server:user-access:jwt-token.adoc[]. +| Set the CA certificate `SSL_CA_CERT` to establish the connection with the URL being set with xref:user-access:jwt-token.adoc[]. | `SSL_CA_CERT=/home/tigergraph/cacertificate/example/;` |GPE diff --git a/modules/reference/pages/glossary.adoc b/modules/reference/pages/glossary.adoc index 27b8b4fd..21e33411 100644 --- a/modules/reference/pages/glossary.adoc +++ b/modules/reference/pages/glossary.adoc @@ -18,8 +18,10 @@ | *gbar* | Graph Backup and Restore. TigerGraph's utility program for backing up and restoring system data. + Since 3.10.0 the command `gbar` is removed and is no longer available. -However, if you are using a version of TigerGraph before 3.10.0 you can still use `gbar` to xref:tigergraph-server:backup-and-restore:gbar-legacy.adoc[create a backup with gbar] of the primary cluster. +It has been replaced with `gadmin backup`. + | *GPE* | Graph Processing Engine. The server component which accepts requests from the REST{pp} server for querying and updating the graph store and which returns data. diff --git a/modules/reference/pages/list-of-privileges.adoc b/modules/reference/pages/list-of-privileges-legacy.adoc similarity index 84% rename from modules/reference/pages/list-of-privileges.adoc rename to modules/reference/pages/list-of-privileges-legacy.adoc index 3010aab2..912b4b55 100644 --- a/modules/reference/pages/list-of-privileges.adoc +++ b/modules/reference/pages/list-of-privileges-legacy.adoc @@ -1,9 +1,20 @@ -= List of Legacy Privilege Syntax += Legacy RBAC Privileges +:description: List of the original RBAC privileges, which are now deprecated +:page-aliases: list-of-privileges.adoc -This page provides a complete list of privileges in TigerGraph's Role-based Access Control system. +This page lists the privileges in TigerGraph's Legacy Role-Based Access Control system. 
+In version 3.10, xref:user-access:rbac-row-policy/rbac-row-policy.adoc#_object_based_privileges[Object-Based Privileges] were introduced to eventually replace the original (legacy) RBAC system. +See xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[] for a comparison with the legacy syntax. +[WARNING] +==== +The legacy system of privileges is deprecated. +Users should move to the new object-based privileges. +==== + +[#_legacy_privilege_syntax_limitations] +== Legacy Privilege Limitations -== Legacy Privilege Syntax Limitations * Any privilege marked "`Global only`" can only be granted to a global role. It cannot be granted to a local role (See xref:user-access:access-control-model.adoc[Global role vs local role]). @@ -14,15 +25,9 @@ It cannot be granted to a local role (See xref:user-access:access-control-model. * Legacy privilege syntax for function privileges is only supported on the global scope. ** To add function privileges, it's best to use the Object-Based Privileges syntax. -[NOTE] -==== -It’s recommended to use the xref:tigergraph-server:user-access:rbac-row-policy/rbac-row-policy.adoc#_object_based_privileges[Object-Based Privileges] syntax. -See xref:tigergraph-server:user-access:rbac-row-policy/row-policy-privileges-table.adoc[] for a comparison with the legacy syntax. -==== - +== Table of Legacy RBAC Privileges -== Table of Privileges [width="100%",cols="22%,63%,15%",options="header",] |=== |*Privilege Name* |*Commands Associated* |*Global Only* diff --git a/modules/reference/pages/patents-and-third-party-software.adoc b/modules/reference/pages/patents-and-third-party-software.adoc index 88ded96a..6e8e56eb 100644 --- a/modules/reference/pages/patents-and-third-party-software.adoc +++ b/modules/reference/pages/patents-and-third-party-software.adoc @@ -1,5 +1,5 @@ = Patents and Third Party Software -//:page-aliases: tigergraph-server:legal:patents-and-third-party-software.adoc +//:page-aliases: legal:patents-and-third-party-software.adoc Patent and Third Party Notice for TigerGraph Platform diff --git a/modules/reference/pages/return-codes.adoc b/modules/reference/pages/return-codes.adoc index 4a23ffa8..afe0f64b 100644 --- a/modules/reference/pages/return-codes.adoc +++ b/modules/reference/pages/return-codes.adoc @@ -145,10 +145,10 @@ Other RESTPP errors. | The parameter is invalid (general error). | `REST-30200` -| The parameter for xref:tigergraph-server:API:upsert-rest.adoc[upserting data] is invalid. +| The parameter for xref:API:upsert-rest.adoc[upserting data] is invalid. | `REST-30400` -| The parameter for xref:tigergraph-server:API:built-in-endpoints.adoc#_show_query_performance[showing query performance] is invalid. +| The parameter for xref:API:built-in-endpoints.adoc#_show_query_performance[showing query performance] is invalid. |=== === GSQL diff --git a/modules/release-notes/pages/index.adoc b/modules/release-notes/pages/index.adoc index 88e4670a..6a11f7d7 100644 --- a/modules/release-notes/pages/index.adoc +++ b/modules/release-notes/pages/index.adoc @@ -6,6 +6,8 @@ :toc: :toclevels:2 +TigerGraph Server 3.11.1 was released on December 18, 2024. + TigerGraph Server 3.11.0 preview version was released on October 25, 2024. Features in the preview stage should not be used for production purposes. General Availability (GA) versions of the feature will be available in a later release. @@ -21,9 +23,11 @@ Features in the preview stage should not be used for production purposes. 
Genera * **xref:3.11@tigergraph-server:backup-and-restore:point-in-time-restore.adoc[Point in Time Restore]**: Users can roll back the database to a moment they select, not only the time of an available backup snapshots. -* **xref:3.11@tigergraph-server:backup-and-restore:configurations.adoc#_configure_backup_to_aws_s3_endpoint[Role ARN for Backup to AWS S3 buckets]**: +* **xref:3.11@tigergraph-server:backup-and-restore:configurations.adoc#_configuration_parameters[Role ARN for Backup to AWS S3 buckets]**: Users can use AWS Role ARNs (Amazon Resource Names) for convenient and secure management of backups. +* [3.11.1] **xref:3.11@tigergraph-server:backup-and-restore:configurations.adoc[Backup to Azure and Google clouds]**: Backup operations can be configured to store files on Azure and Google cloud storage, adding to existing support for AWS backup storage. + * The **xref:3.11@tigergraph-server:backup-and-restore:backup-cluster.adoc#_data_backup[--custom-tag flag]** gives users more control over the naming of backup files. * See also xref:_high_availability_ha_enhancements[] @@ -110,6 +114,10 @@ Fine tune TigerGraph operator's performance by customizing the maximum number of ==== +=== Language Enhancements + +* Added the ability to set xref:3.11@gsql-ref:querying:data-types.adoc#_set_s3_connection_credentials[AWS S3 access credentials] as GSQL session parameters, enabling users to configure S3 query output without requiring admin privileges. + == TigerGraph Suite === GraphStudio @@ -152,6 +160,33 @@ for more context-aware and interactive dashboard displays == Fixed issues + +=== Fixed and Improved in 3.11.1 +// +==== Functionality + +* Fixed issue where the loading job would hang when loading a blank file with only a header line (TP-6635). +* Fixed issue where the job status was not correctly reported when the loading job failed to start (TP-6131). +* Fixed issue with query installation failure for single-node queries which initialize vertex set variables in conditional branches (GLE-8846). +* Fixed issue with error codes in the log when a CDC message failed to deliver to external Kafka (CORE-4326). +* Fixed critical disk issue caused by the rebuilder getting stuck in a partitioned cluster after dropping vertex or edge attributes (CORE-4357). +* Fixed issue where local accumulators defined across multiple lines in a query were misinterpreted as a file in the GSQL client (GLE-8260). +* Fixed issue with loading job progress requiring a read graph lock, which could block schema change operations (GLE-8825). + +==== Improvements + +* Improved the performance of GSQL queries containing delete statements intended for deleting all vertices of a given type (GLE-8931). +* Added validation to prevent EXE from reading files with negative length and enforced gRPC message maximum size when the length was too large (TP-6764). +* Added the 'graph' field to CDC messages generated by the TigerGraph CDC service (CORE-4146). + +==== Security + +* Third-party Vulnerabilities NOT impacting TigerGraph: ++ +Fixed the following security vulnerabilities: CVE-2023-44981, CVE-2024-43382, CVE-2024-8184, CWE-311, CWE-400, and CWE-639. ++ +Third-party Vulnerability impacting TigerGraph: None + === Fixed and Improved in 3.11.0 // ==== Functionality @@ -162,8 +197,10 @@ for more context-aware and interactive dashboard displays * Fixed situation where a query containing a `BREAK` or `CONTINUE` statements could produce incorrect results (GLE-7874). 
* Fixed regression problem with installing queries which create lists containing mixed types of numeric data (GLE-7928). * Fixed int64 value underflow error by explicitly type casting uint64 (CORE-4108). +* Fixed an issue so IMPORT ALL will no longer fail due to the schema size being very large (GLE-6505). * Restored the ability to run the TigerGraph `gcollect` command on Kubernetes (TP-6351). + //==== Security ==== Crashes and Deadlocks @@ -176,23 +213,13 @@ for more context-aware and interactive dashboard displays * Added an error report if a schema check is requested but cannot be performed because the GPE is in warmup status (GLE-7898). * Allowed installation to continue on Oracle and RedHat Linux, 8 even if the TigerGraph user is not listed in AllowedUsers in /etc/ssh/sshd_config (TP-5105). -=== Fixed and Improved in 3.11.1 -// -==== Functionality - -* Fixed issue where the loading job would hang when loading a blank file with only a header line (TP-6635). -* Fixed issue where the job status was not correctly reported when the loading job failed to start (TP-6131). -* Fixed issue with query installation failure for single-node queries which initialize vertex set variables in conditional branches (GLE-8846). -* Fixed issue with error codes in the log when a CDC message failed to deliver to external Kafka (CORE-4326). -* Fixed critical disk issue caused by the rebuilder getting stuck in a partitioned cluster after dropping vertex or edge attributes (CORE-4357). -* Fixed issue where local accumulators defined across multiple lines in a query were misinterpreted as a file in the GSQL client (GLE-8260). -* Fixed issue with loading job progress requiring a read graph lock, which could block schema change operations (GLE-8825). +==== Security -==== Improvements - -* Improved the performance of GSQL queries containing delete statements intended for deleting all vertices of a given type (GLE-8931). -* Added validation to prevent EXE from reading files with negative length and enforced gRPC message maximum size when the length was too large (TP-6764). -* Added the 'graph' field to CDC messages generated by the TigerGraph CDC service (CORE-4146). +* Third-party Vulnerabilities NOT impacting TigerGraph: ++ +Fixed the following security vulnerabilities: CVE-2017-5645, CVE-2019-10202, CVE-2019-10172, CVE-2021-44228, CVE-2021-45046, CVE-2021-45105, CVE-2022-41723, CVE-2022-45685, CVE-2023-1346, CVE-2023-44487, CVE-2023-45288, CVE-2023-50387, CVE-2023-50868, CVE-2023-52428, CVE-2024-1597, CVE-2024-25638, CVE-2024-25710, CVE-2024-29131, CVE-2024-29133, CVE-2024-47561, and CVE-2024-7254. ++ +Third-party Vulnerability impacting TigerGraph: None //==== Performance @@ -202,6 +229,11 @@ for more context-aware and interactive dashboard displays |=== ¦ Description ¦ Found In ¦ Workaround ¦ Fixed In +¦ When xref:{page-component-version}@installation:upgrade.adoc#_upgrading_from_3_x[upgrading], possible permission error for destination folder. +¦ 3.10.1 +¦ Manually grant permission to `/var/tmp` +¦ TBD + ¦Running either `EXPORT GRAPH ALL` or `IMPORT GRAPH ALL` resets the TigerGraph superuser's password back to its default value. ¦3.9.1 ¦ After running either command, change the superuser's password to make it secure again. 
@@ -212,16 +244,6 @@ for more context-aware and interactive dashboard displays ¦ ¦TBD -¦When using xref:{page-component-version}@tigergraph-server:backup-and-restore:database-import-export.adoc[IMPORT ALL] if a users schema size in the `.zip` file is exceedingly large, the import may fail with an error messages like this: - -`Large catalog file key: /1/ReplicaList.json` - -¦ 3.2 -a¦ -* 3.9 and below users need to run the import process manually by executing the GSQL scripts in the `.zip`. -* 3.10.0 and above users should xref:{page-component-version}@tigergraph-server:backup-and-restore:single-graph-import-export.adoc[import single or smaller batches of multiple graphs]. -¦ TBD - a¦ If importing a role, policy, or function that has a different signature or content from the existing one, the one being imported will be skipped and not aborted. .For example: @@ -413,10 +435,16 @@ Then the global loading process will exit and fail the local job after timeout w === Compatibility Issues -[cols="2", separator=¦ ] +[cols="2,1", separator=¦ ] |=== ¦ Description ¦ Version Introduced +¦The 'graph' field is now included in xref:system-management:change-data-capture/cdc-message-example.adoc#_message_examples[CDC messages] generated by the TigerGraph CDC service. +¦v3.11.0 + +¦In xref:system-management:change-data-capture/cdc-message-example.adoc#_message_examples[CDC messages], the format of *map values* has changed. +¦v3.11.0 + ¦ Users could encounter file input/output policy violations when upgrading a TigerGraph version. See xref:{page-component-version}@tigergraph-server:security:gsql-file-input-policy.adoc#_backward_compatibility[Input policy backward compatibility.] ¦ v3.10.0 @@ -453,24 +481,24 @@ a¦ Some user-defined functions (UDFs) may no longer be accepted due to xref:{pa |=== ¦ Description ¦ Deprecated ¦ Removed -¦ The use of plaintext tokens in xref:{page-component-version}@tigergraph-server:API:authentication.adoc[authentication] is deprecated. -Use xref:{page-component-version}@tigergraph-server:user-access:jwt-token.adoc[] instead. +¦ The use of plaintext tokens in xref:3.11@tigergraph-server:API:authentication.adoc[authentication] is deprecated. +Use xref:3.11@tigergraph-server:user-access:jwt-token.adoc[] instead. ¦ 3.10.0 ¦ 4.1 ¦ The command `gbar` is removed and is no longer available. -However, if you are using a version of TigerGraph before 3.10.0 you can still use `gbar` to xref:{page-component-version}@tigergraph-server:backup-and-restore:gbar-legacy.adoc[create a backup with gbar] of the primary cluster. -See also xref:{page-component-version}@tigergraph-server:backup-and-restore:gbar-legacy.adoc[Backup and Restore with gbar] on how to create a backup. +It has been replaced by `gadmin backup`. +If you are using a version of TigerGraph before 3.10.0, you can use `gbar` to create a backup file and restore it in 3.11 with `gadmin backup restore`. ¦ 3.7 ¦ 3.10.0 -¦ xref:{page-component-version}@tigergraph-server:user-access:vlac.adoc[Vertex-level Access Control (VLAC)] and xref:{page-component-version}@gsql-ref:querying:func/vertex-methods.adoc#_vlac_vertex_alias_methods_deprecated[VLAC Methods] are now deprecated and will no longer be supported. +¦ xref:3.11@tigergraph-server:user-access:vlac.adoc[Vertex-level Access Control (VLAC)] and xref:3.11@gsql-ref:querying:func/vertex-methods.adoc#_vlac_vertex_alias_methods_deprecated[VLAC Methods] are now deprecated and will no longer be supported. 
¦ 3.10.0 ¦ 4.0 -¦ xref:{page-component-version}@tigergraph-server:data-loading:spark-connection-via-jdbc-driver.adoc[Spark Connection via JDBC Driver] is now deprecated and will no longer be supported. +¦ xref:3.11@tigergraph-server:data-loading:spark-connection-via-jdbc-driver.adoc[Spark Connection via JDBC Driver] is now deprecated and will no longer be supported. ¦ 3.10.0 ¦ TBD @@ -491,5 +519,3 @@ we are focusing on xref:{page-component-version}@insights:widgets:index.adoc[Ins == Release notes for previous versions * xref:3.10@tigergraph-server:release-notes:index.adoc[Release notes - TigerGraph 3.10] -* xref:3.9@tigergraph-server:release-notes:index.adoc[Release notes - TigerGraph 3.9] -* xref:3.6@tigergraph-server:release-notes:index.adoc[Release notes - TigerGraph 3.6] diff --git a/modules/release-notes/pages/v3.0-removal-of-previously-deprecated-features.adoc b/modules/release-notes/pages/v3.0-removal-of-previously-deprecated-features.adoc deleted file mode 100644 index 42d64ed7..00000000 --- a/modules/release-notes/pages/v3.0-removal-of-previously-deprecated-features.adoc +++ /dev/null @@ -1,176 +0,0 @@ -=== V3.0 Removal of Previously Deprecated Features - -TigerGraph 2.x contained some features which were labeled as deprecated. -These features are no longer necessary because they have been superseded already by improved approaches for using the TigerGraph platform. - -The new approaches were developed because they use more consistent grammar, are more extensible, or offer higher performance. -Therefore, TigerGraph 3.0 and above has streamlined the product by removing support for some of these deprecated features, listed below: - -==== Data Types - -|=== -| Deprecated type | Alternate approach - -| `REAL` -| Use `FLOAT` or `DOUBLE` - -| `INT_SET` -| Use `SET` - -| `INT_LIST` -| Use `LIST` - -| `STRING_SET_COMPRESS` -| Use `SET` - -| `STRING_LIST_CONPRESS` -| Use `LIST` - -| `UINT_SET` -| Use `SET` - -| `UINT32_UINT32_KV_LIST` -| Use `MAP` - -| `INT32_INT32_KV_LIST` -| Use `MAP` - -| `UINT32_UDT_KV_LIST` -| Use `MAP`, where `UDT_type` is a user-defined tuple type - -| `INT32_UDT_KV_LIST` -| Use `MAP`, where `UDT_type` is a user-defined tuple type -|=== - -==== Syntax for Control Flow Statements - - - -|=== -|Deprecated statement |Alternate statement - -|`FOREACH ... DO ... DONE` -|`FOREACH... DO... END` - -a| ----- -FOREACH (condition) { - body -} ----- -a| ----- -FOREACH condition DO - body -END ----- - -a| ----- -IF (condition) { - body1 -} -else { - body2 -} ----- -a| ----- -IF condition THEN - body1 -ELSE - body2 -END ----- -a| ----- -WHILE (condition) { - body -} ----- -a| ----- -WHILE condition DO - body -END ----- -|=== - - - -==== Vertex set variable declaration - -See xref:3.11@gsql-ref:querying:declaration-and-assignment-statements.adoc#_vertex_set_variables[Vertex Set Variable Declaration and Assignment] - -If a vertex type is specified, the vertex type must be within parentheses. 
- -|=== -| Deprecated Statement | Alternate Statement - -| `MySet Person = ...` -| `MySet (Person) = ...` -|=== - - -==== Query, Job, and Token Management - - - -|=== -|Deprecated operation |Header 2 - -|`CREATE JOB` -a|Job types need to be specified: - -* `CREATE LOADING JOB` -* `CREATE SCHEMA_CHANGE JOB` -* `CREATE GLOBAL SCHEMA_CHANGE JOB` - -|`RUN JOB` -a|Job types need to be specified: - -* `RUN LOADING JOB` -* `RUN SCHEMA_CHANGE JOB` -* `RUN GLOBAL SCHEMA_CHANGE JOB` - -|`CREATE / SHOW/ REFRESH TOKEN` -|To create a token, use the xref:3.11@tigergraph-server:API:built-in-endpoints.adoc#_request_a_token[REST endpoint GET /requesttoken]. - -|`offline2online` -|The offline loading job mode was discontinued in v2.0. -Do not write loading jobs using this syntax. -|=== - - - -==== Output - -See xref:3.11@gsql-ref:querying:output-statements-and-file-objects.adoc#_print_statement_api_v2[PRINT Statement] - -|=== -| Deprecated Syntax | Alternate Syntax - -| JSON API v1 -| v2 has been the default JSON format since TigerGraph 1.1. No alternate JSON version will be available. - -| `PRINT ... TO_CSV [filepath]` -| Define a file object, then `PRINT ... TO_CSV [file_object]` -|=== - - -==== Built-in Queries - -[Run Built-in Queries in 'GSQL 101'] - - -|=== -|Deprecated statement |Alternate statement - -|`SELECT count() FROM ...` -a| -* `SELECT approx_count(*) FROM ...` -** May not include all the latest data updates -* `SELECT count(*) FROM ...` -** exact, but slower than `approx_count(*)` -|=== - diff --git a/modules/security/pages/gsql-file-input-policy.adoc b/modules/security/pages/gsql-file-input-policy.adoc index 1f23a518..1d647cc8 100644 --- a/modules/security/pages/gsql-file-input-policy.adoc +++ b/modules/security/pages/gsql-file-input-policy.adoc @@ -18,7 +18,7 @@ gadmin config apply -y gadmin restart gsql restpp -y ---- -Similar to the xref:tigergraph-server:security:file-output-policy.adoc[] the value for the config is a list of strings that start with either `!/` for the `blocklist` or `/` for the `allowlist`. +Similar to the xref:security:file-output-policy.adoc[] the value for the config is a list of strings that start with either `!/` for the `blocklist` or `/` for the `allowlist`. 
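+
+For example, the following sketch allows input files under one directory while blocking a subdirectory (the paths are illustrative, and the `GSQL.FileInputPolicy` parameter name is an assumption modeled on the file output policy naming):
+
+[source, console]
+----
+# Allow reads under /data, but block the /data/secrets subdirectory
+gadmin config set GSQL.FileInputPolicy '["/data", "!/data/secrets"]'
+gadmin config apply -y
+gadmin restart gsql restpp -y
+----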
== Format diff --git a/modules/security/pages/index.adoc b/modules/security/pages/index.adoc index e34d03cc..866a16b2 100644 --- a/modules/security/pages/index.adoc +++ b/modules/security/pages/index.adoc @@ -12,11 +12,11 @@ TigerGraph provides a comprehensive set of security features, including authenti * xref:user-access:enabling-user-authentication.adoc#_enable_restpp_authentication[RESTPP authentication] * xref:user-access:enabling-user-authentication.adoc#_enable_gsql_authentication[GSQL authentication] -* xref:user-access:sso.adoc[SSO with SAML 2.0] -** xref:user-access:sso.adoc#_azure_ad[Azure AD] -** xref:user-access:sso.adoc#_okta[Okta] -** xref:user-access:sso.adoc#_auth0[Auth0] -** xref:user-access:sso.adoc#_pingfederate[PingFederate] +* xref:user-access:sso-with-saml.adoc[SSO with SAML 2.0] +** xref:user-access:sso-with-saml.adoc#_azure_ad[Azure AD] +** xref:user-access:sso-with-saml.adoc#_okta[Okta] +** xref:user-access:sso-with-saml.adoc#_auth0[Auth0] +** xref:user-access:sso-with-saml.adoc#_pingfederate[PingFederate] * xref:user-access:ldap.adoc[LDAP authentication] * xref:password-policy.adoc[Strong password policy enforcement](TigerGraph 3.7+) @@ -30,7 +30,6 @@ TigerGraph provides a comprehensive set of security features, including authenti * xref:encrypting-connections.adoc[Data in-transit encryption (TLS 1.2)] * xref:encrypting-data-at-rest.adoc[Data at-rest encryption] * <<_pgp_key,PGP key-signed download package>> -* xref:tigergraph-server:data-loading:kafka-ssl-security-guide.adoc[] == Operational compliance TigerGraph Server meets the following security compliance standards as certified by third-party audits: @@ -72,7 +71,7 @@ You can also find our key on link:https://pgp.mit.edu[]. Available in TigerGraph 3.9+ -TigerGraph users can upload xref:gsql-ref:querying:func/query-user-defined-functions.adoc[user-defined function (UDF) files] to the server and run them as part of a query or loading job. +TigerGraph users can upload xref:{page-component-version}@gsql-ref:querying:func/query-user-defined-functions.adoc[user-defined function (UDF) files] to the server and run them as part of a query or loading job. In order to prevent security issues with code execution, TigerGraph Server disables this ability by default and requires it to be enabled manually by an administrator. In addition, the UDF files are scanned to make sure they comply with the file policy. The scanning process, by default, consists of three parts. diff --git a/modules/system-management/pages/change-data-capture/cdc-message-example.adoc b/modules/system-management/pages/change-data-capture/cdc-message-example.adoc index 8997bc63..7d6f82ac 100644 --- a/modules/system-management/pages/change-data-capture/cdc-message-example.adoc +++ b/modules/system-management/pages/change-data-capture/cdc-message-example.adoc @@ -92,7 +92,7 @@ It can be one of: `"Overwrite"`, `"Add"`, `"Max"`, `"Min"`, `"And"`, `"Or"`, `"I [NOTE] .Content format change ==== -Beginning with version 3.11, the format for the value of the `content` message field has been revised to be consistent with the xref:tigergraph-server:API:index.adoc#_formatting_data_in_json[POST request data format]. +Beginning with version 3.11, the format for the value of the `content` message field has been revised to be consistent with the xref:API:index.adoc#_formatting_data_in_json[POST request data format]. 
For example:

* Updated format: `{ "keylist":["k1", "k2"], "valuelist": [10.5, 24.8] }`

@@ -287,11 +287,21 @@ In such cases, TigerGraph CDC will generate an additional CDC message for the co
|===
¦ Case ¦ Description

-¦ For directed edge without reverse edge type
-¦ For insertion or modification on directed edge type without a reverse edge type, TigerGraph CDC will generate an extra CDC message with a field "operator": "insert-only" for a target vertex, however, there is no CDC message for source vertex.
+¦ For directed edge without reverse edge type ("Simple Edge Type")
+¦ When a directed edge is inserted without a reverse edge type, TigerGraph automatically creates the source and/or target vertex if they do not exist.

-¦ For undirected edge, and directed edge with reverse edge type
-¦ For insertion/modification/deletion on an undirected edge, or directed edge with a reverse edge type, TigerGraph will update 2 edges simultaneously:
+CDC messages are generated:
+
+- For the source vertex: Only if it is new and has `primary_id_as_attribute="true"`.
+
+- For the target vertex: Always if it is new.
+
+¦ For undirected edge, and directed edge with reverse edge type ("Complex Edge Type")
+¦ For insertion/modification/deletion on an undirected edge, or directed edge with a reverse edge type, TigerGraph will update 2 edges simultaneously:
the “origin” edge and the “extra” edge with switched source and target vertex.
-|=== \ No newline at end of file
+When an undirected edge, or a directed edge with a reverse edge type, is inserted, TigerGraph automatically creates the source and/or target vertex if they don’t exist:
+
+- **CDC messages** are generated for both source and target vertices only if they are new and have `primary_id_as_attribute="true"`.
+
+|===
diff --git a/modules/system-management/pages/change-data-capture/cdc-overview.adoc b/modules/system-management/pages/change-data-capture/cdc-overview.adoc
index 37005fbe..0541ec7c 100644
--- a/modules/system-management/pages/change-data-capture/cdc-overview.adoc
+++ b/modules/system-management/pages/change-data-capture/cdc-overview.adoc
@@ -8,14 +8,14 @@ The Change Data Capture (CDC) equips TigerGraph users with the capability to aut
* Maintains sequence of changes to facilitate reproduction of data updates for debugging.
* Structured in JSON format, promoting readability and compatibility with third-party tools.

-== xref:tigergraph-server:system-management:change-data-capture/cdc-setup.adoc[]
-Learn about xref:tigergraph-server:system-management:change-data-capture/cdc-setup.adoc#_setup_configuration[setup configurations] and get started with the xref:tigergraph-server:system-management:change-data-capture/cdc-setup.adoc#_setup_tutorial[setup tutorial].
+== xref:change-data-capture/cdc-setup.adoc[]
+Learn about xref:change-data-capture/cdc-setup.adoc#_setup_configuration[setup configurations] and get started with the xref:change-data-capture/cdc-setup.adoc#_setup_tutorial[setup tutorial].

-== xref:tigergraph-server:system-management:change-data-capture/cdc-message-example.adoc[]
-Deep dive into the xref:tigergraph-server:system-management:change-data-capture/cdc-message-example.adoc#_message_format[CDC messages format] and showcased xref:tigergraph-server:system-management:change-data-capture/cdc-message-example.adoc#_message_examples[message examples].
+== xref:change-data-capture/cdc-message-example.adoc[]
+Deep dive into the xref:change-data-capture/cdc-message-example.adoc#_message_format_fields[CDC message format] and browse the showcased xref:change-data-capture/cdc-message-example.adoc#_message_examples[message examples].

-== xref:tigergraph-server:system-management:change-data-capture/cdc-state-monitoring.adoc[]
-Here users can delve into xref:tigergraph-server:system-management:change-data-capture/cdc-state-monitoring.adoc#_state_monitoring[state monitoring], including xref:tigergraph-server:system-management:change-data-capture/cdc-state-monitoring.adoc#_state_of_dim_service[DIM state monitoring] with the CDC service.
+== xref:change-data-capture/cdc-state-monitoring.adoc[]
+Here users can delve into xref:change-data-capture/cdc-state-monitoring.adoc#_state_monitoring[state monitoring], including xref:change-data-capture/cdc-state-monitoring.adoc#_state_of_dim_service[DIM state monitoring] with the CDC service.

== CDC Reaction to Other Features

@@ -27,7 +27,7 @@ When that happens, TigerGraph CDC will skip all historical data updates.
Some commands will call `gadmin reset gpe` implicitly, so the CDC will reset simultaneously with these commands:

* `gadmin backup` and `gadmin restore`
-* node xref:tigergraph-server:cluster-and-ha-management:expand-a-cluster.adoc[expansion] and xref:tigergraph-server:cluster-and-ha-management:shrink-a-cluster.adoc[shrink].
+* node xref:cluster-and-ha-management:expand-a-cluster.adoc[expansion] and xref:cluster-and-ha-management:shrink-a-cluster.adoc[shrink].
* gsql command: `clear graph store`
* gsql command: `drop all`
* gsql command: `import graph all`

@@ -51,6 +51,16 @@ If you wish to disable CDC High Availability (HA) in a multi-replica cluster and

== CDC Limitations

+[WARNING]
+====
+The current version of CDC is unstable when:
+
+* The schema has edges with discriminators AND
+* A DELETE operation is performed on edges without specifying a discriminator.
+
+If CDC is being used, avoid performing this type of unrestricted deletion.
+====
+
=== Limitation on CDC Setup

==== Not applicable on DR cluster

@@ -61,9 +71,10 @@ DR clusters are replicas, not sources.
When distinguishing between modification and insertion for vertex/edge attribute modification, the TigerGraph CDC message will have the `"operator": "insert"` key value pair, same as vertex/edge insertion.
However, the `"content"` will only contain the field for the modified attribute.

-==== No CDC message for implicit edge deletion
-When a vertex is deleted, any edge that uses the vertex as source or target will be implicitly deleted.
-However, TigerGraph CDC currently does not generate a CDC message for such “implicit edge deletion”.
+==== No CDC message for implicit vertex insertion without `primary_id_as_attribute="true"`
+For an insertion on an undirected edge, or a directed edge with a reverse edge type, the TigerGraph database implicitly inserts the source and target vertices if they do not exist (this behavior can be configured via `VERTEX_MUST_EXIST` in a loading job and the `POST` data API).
+
+In this scenario, TigerGraph CDC generates CDC messages with `"operator": "insert-only"` for the new source and target vertices, unless the corresponding vertex type does not have `primary_id_as_attribute="true"` (see the section “Extra CDC message for Edge Update”).
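+
+For example, a single upsert request that references vertices which do not yet exist triggers this implicit insertion. A minimal sketch, assuming a hypothetical graph `MyGraph` with vertex type `Person` and edge type `Knows`, and the default `vertex_must_exist=false` behavior:
+
+[console]
+----
+# Tom and Dan are created implicitly if they do not already exist
+curl -X POST "http://localhost:9000/graph/MyGraph" \
+  -d '{"edges": {"Person": {"Tom": {"Knows": {"Person": {"Dan": {}}}}}}}'
+----
+
+If `Person` is not defined with `primary_id_as_attribute="true"`, no `"insert-only"` CDC messages are emitted for the implicitly created vertices.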
==== No CDC message for implicit source vertex insertion For insertion/modification on undirected edge, or directed edge with reverse edge type, the TigerGraph database will implicitly insert source and target vertex if it does not exist (This behavior can be configured via `VERTEX_MUST_EXIST` in a loading job and `POST` data api). diff --git a/modules/system-management/pages/change-data-capture/cdc-restore-by-backup.txt b/modules/system-management/pages/change-data-capture/cdc-restore-by-backup.txt index 428885e2..4d80da98 100644 --- a/modules/system-management/pages/change-data-capture/cdc-restore-by-backup.txt +++ b/modules/system-management/pages/change-data-capture/cdc-restore-by-backup.txt @@ -9,7 +9,7 @@ The steps are: . Setup a backup. . Restore a backup. + -(See the documentation xref:tigergraph-server:backup-and-restore:index.adoc[] for more details.) +(See the documentation xref:backup-and-restore:index.adoc[] for more details.) . Users will need to write their own script to replay TigerGraph CDC messages. To determine whether a CDC message needs to be replayed or not. Please check for the TigerGraph backup (`gadmin backup create`), find a file metadata in the generated backup folder, such as `/tmp/backup/backup-2024-01-04T231921`: diff --git a/modules/system-management/pages/change-data-capture/cdc-setup.adoc b/modules/system-management/pages/change-data-capture/cdc-setup.adoc index 793f1d20..e81daacb 100644 --- a/modules/system-management/pages/change-data-capture/cdc-setup.adoc +++ b/modules/system-management/pages/change-data-capture/cdc-setup.adoc @@ -4,60 +4,58 @@ When using the CDC feature, a “CDC service” running in GPE nodes will proces [IMPORTANT] ==== -Users are required to establish and manage an external Kafka service independently from TigerGraph. -The TigerGraph CDC service will then generate CDC messages directed to the external Kafka service. -For guidance on setting up the external Kafka service, refer to the https://kafka.apache.org/quickstart[Official Apache Kafka documentation]. +* **External Kafka Cluster** : Users must set up and manage their own Kafka cluster. The TigerGraph CDC service will send CDC messages to this external Kafka cluster. +* For guidance on setting up the external Kafka service, refer to the https://kafka.apache.org/quickstart[Official Apache Kafka documentation]. ==== == Setup Configuration TigerGraph employs librdkafka 1.1.0 for the Kafka producer in the CDC service. Refer to the Global configuration properties and the Topic configuration properties sections of the librdkafka documentation for all other properties not mentioned in this guide, noting the applicable ones marked with “P”(Producer) or with “*” for both Producer and Consumer. -To configure CDC Producer and Topic settings in TigerGraph, utilize gadmin commands below. +=== Configuring CDC Producer and Topic Settings -[NOTE] -==== -If you use `gadmin config` to change any of these parameters, you need to run `gadmin config apply -y` for it to take effect. -You can change multiple parameters and then run `gadmin config apply` for all of them together +Use the following `gadmin` commands to configure the CDC producer and topic settings in TigerGraph. 
+
+=== Applying Configuration Changes

-.After modifying, run the following command to apply the changes:
+* If you modify any configuration using `gadmin config`, be sure to run the following commands to apply the changes:
[source, console]
----
gadmin config apply
gadmin restart gpe restpp
----
-====

== CDC Configuration Parameters

=== System.CDC.Enable
-.This is the CDC enable config entry:
-[source, console]
-----
-gadmin config entry
-----
+Controls whether CDC is enabled or disabled.
+
+Enable or disable CDC using:

-=== System.CDC.Enable
-.Set it to true or false, to enable CDC or not as in these examples:
[source, console]
----
-gadmin config set System.CDC.Enable true
-gadmin config set System.CDC.Enable false
+gadmin config set System.CDC.Enable true // To enable CDC
+gadmin config set System.CDC.Enable false // To disable CDC
----

-CDC messages will only generate after the CDC is enabled and services have been applied and restarted.
+NOTE: CDC messages are generated only after enabling the CDC service and restarting the system.

=== System.CDC.ProducerConfig
-.This is the CDC producer config entry:
-[console]
-----
-gadmin config entry System.CDC.ProducerConfig
-----
-This configuration entry is designated for the CDC producer.
-Properties are passed through a file, with each line adhering to the format `=` separated by “new line”.
+Specifies properties for the CDC producer.
+
+To update properties non-interactively, create a file (e.g., `cdc_producer_config`) where each line has the
+format `<property>=<value>`, with properties separated by new lines.
+Then use
+
+`gadmin config set System.CDC.ProducerConfig @<file_path>`
+
+to read in the settings.
+
+IMPORTANT: It is mandatory to include the property `bootstrap.servers`, which specifies the IP and port of the external Kafka cluster for CDC.
+
+Example:
-It is mandatory to include the property `bootstrap.servers`, specifying the IP and port for the broker(s) that the CDC producer connects to as in the example below:
[console]
----
mkdir -p /home/tigergraph/test_cdc
@@ -68,22 +66,83 @@ echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tiger
gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
----
-=== System.CDC.TopicConfig
-.This is the CDC topic config entry:
+
+
+If you prefer to enter the property values interactively, use
+
+`gadmin config entry System.CDC.ProducerConfig`
+
+This will walk you through the full set of properties for this component, with a description and the current value for each item.
+
+The full list of entries for the `System.CDC.ProducerConfig` configuration file can be viewed in the table "Global configuration properties" at https://github.com/confluentinc/librdkafka/blob/v1.1.0/CONFIGURATION.md#global-configuration-properties.
+
+NOTE: Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.
+
+==== Kafka Security
+
+To secure communication with the external Kafka cluster for CDC, configure authentication settings in `System.CDC.ProducerConfig`.
+
+[NOTE]
+====
+1. When using a local path in any entry (e.g., `sasl.kerberos.keytab` or `ssl.ca.location`), the local file must exist and be consistent across all nodes in the TigerGraph cluster.
+2. The entry `sasl.jaas.config` is not applicable, because it is specific to Java-based Kafka clients, while the librdkafka library in the TigerGraph engine is a C++ library.
+====
+
+Example 1: Authenticating with SASL/PLAIN
+
[console]
----
-gadmin config entry System.CDC.TopicConfig
+security.protocol=SASL_PLAINTEXT
+sasl.mechanisms=PLAIN
+sasl.username=<username>
+sasl.password=<password>
----
-Utilize a file to pass properties, separating them with a "new line."
-The prescribed format for each line is `=`.
-It is imperative to employ the property name to designate the CDC topic, such as `name=`, as in the following example:
+Example 2: Authenticating with SASL/GSSAPI
+[console]
+----
+security.protocol=SASL_PLAINTEXT
+sasl.mechanism=GSSAPI
+sasl.kerberos.service.name=kafka
+sasl.kerberos.principal=<kerberos_principal>
+sasl.kerberos.keytab=<path_to_keytab_file>
+----
+
+Example 3: Authenticating with SASL/PLAIN and Encrypted with SSL
+[console]
+----
+security.protocol=SASL_SSL
+sasl.mechanisms=PLAIN
+sasl.username=<username>
+sasl.password=<password>
+ssl.ca.location=<path_to_ca_certificate>
+ssl.certificate.location=<path_to_client_certificate>
+ssl.key.location=<path_to_client_key>
+----
+
+For more details on SASL with librdkafka, refer to https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka
+
+=== System.CDC.TopicConfig
+
+To update properties non-interactively, create a file (e.g., `cdc_topic_config`) where each line has the
+format `<property>=<value>`, with properties separated by new lines.
+Then use
+
+`gadmin config set System.CDC.TopicConfig @<file_path>`
+
+to read in the settings.
+
+IMPORTANT: It is mandatory to include the property `name` to designate the CDC topic, as in the following example:
+
[console]
----
echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
----
+The full list of entries for the `System.CDC.TopicConfig` configuration file can be viewed in the table "Topic configuration properties" at https://github.com/edenhill/librdkafka/blob/v1.1.0/CONFIGURATION.md#topic-configuration-properties.
+
+NOTE: Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.

For other configuration settings please see the xref:_other_configuration_settings_table[Other Configuration Settings Table].

@@ -91,37 +150,40 @@

This tutorial will walk you through how to set up a TigerGraph CDC service.

-. Setup external Kafka service for CDC messages
+. **Setting Up the External Kafka Cluster for CDC**
+
+If you already have a running external Kafka cluster for CDC, this step can be skipped.
+
-.Download external Kafka package
+.**Download external Kafka package**
++
-.First make the folder it will be downloaded to:
+.**Create the directory where Kafka will be downloaded**:
+
[console]
----
mkdir -p /home/tigergraph/test_cdc/download_kafka
----
+
-.Use this package:
+.**Use this package**:
[console]
----
`https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz`
----
+
-.And run this command to download Kafka to the folder that was just created:
+.**Download Kafka**:
[console]
----
curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
----
+
-.Check if it's successfully downloaded and extracted with this command:
+.**Verify the download**:
[console]
----
ls -l /home/tigergraph/test_cdc/download_kafka
----
+
-Next, start a Zookeeper server.
+**Starting the External Zookeeper and Kafka Cluster for CDC**
+
-.Open a new terminal to start the Zookeeper service. Use the default configuration `Zookeeper.properties`, where it is using default port `2181`:
+.Start the External Zookeeper Instance.
Use the default configuration `zookeeper.properties`, which uses the default port `2181`:
+
[console]
----
@@ -129,21 +191,9 @@ KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties
----

-Now, start a Kafka server.
-+
-.Use the configuration file `server.properties`, where it is using default port `9092`.
-+
-If you have a cluster environment or if the external Kafka server is not local to the GPE servers, you need to add the following line to `server.properties` file, to enable listening to messages from remote servers:
-+
-[console]
-----
-listeners=PLAINTEXT://:9092
-----
-+
-To determine the value for ``, use the command `ifconfig` or `ip addr show`, and find the ip after `inet`.
-+
-Command to start a Kafka server:
+.Start the External Kafka Cluster for CDC
+
+.Use the default configuration `server.properties`, which uses the default port `9092`:
[console]
----
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
@@ -151,23 +201,25 @@ KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties
----
+
-*(Optional)* clear Kafka topic
+NOTE: To listen to messages produced from remote servers, edit `server.properties` to add `listeners=PLAINTEXT://<internal_ip>:9092`.
+For the value of `<internal_ip>`, use the command `ifconfig` or `ip addr show` and find the IP after `inet`.
+
++
+**(Optional) Clear the Kafka topic**
+
.Run this command to clear existing old Kafka messages in the Kafka topic.
[console]
----
-MYIP=127.0.0.1
-
-KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
+KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1 MYIP=127.0.0.1
$KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
----

-. Setup TigerGraph CDC service
+. **Setting Up the TigerGraph CDC Service**
+
-Now, start the CDC service in TigerGraph.
+After configuring the external Kafka cluster for CDC, set up the TigerGraph CDC service.
+
-.Use the setup configuration commands as followed.
+.**Configure the CDC producer and topic settings:**
+
[console]
----
@@ -187,29 +239,27 @@
gadmin config apply
gadmin restart gpe restpp
----
+
-. Test TigerGraph CDC service
+
+. **Testing the TigerGraph CDC Service**
+
-Once the service is up and running, test it, by making an update to an existing graph with xref:gsql-ref:querying:data-modification-statements.adoc[].
+**Once the TigerGraph CDC service is running, test it by making updates to an existing graph with xref:{page-component-version}@gsql-ref:querying:data-modification-statements.adoc[].**
+
Statements like:
+
-* xref:gsql-ref:querying:data-modification-statements.adoc#_update_statement[Update]
-* Running a custom or xref:gsql-ref:tutorials:gsql-101/built-in-select-queries.adoc[built-in query]
-* Running a xref:tigergraph-server:API:built-in-endpoints.adoc#_loading_jobs[loading job].
+* xref:{page-component-version}@gsql-ref:querying:data-modification-statements.adoc#_update_statement[Update]
+* Running a custom or xref:{page-component-version}@gsql-ref:tutorials:gsql-101/built-in-select-queries.adoc[built-in query]
+* Running a xref:API:built-in-endpoints.adoc#_loading_jobs[loading job].
+
[NOTE]
====
-If an existing graph is not available, create a new graph by following TigerGraph’s xref:gsql-ref:tutorials:gsql-101/index.adoc[] tutorial documentation and using the provided xref:gsql-ref:appendix:example-graphs.adoc[] data.
+If an existing graph is not available, create a new graph by following TigerGraph’s xref:{page-component-version}@gsql-ref:tutorials:gsql-101/index.adoc[] tutorial documentation and using the provided xref:{page-component-version}@gsql-ref:appendix:example-graphs.adoc[] data. ==== + -. Lastly, check CDC messages. +. **Checking CDC Messages in the External Kafka Cluster.** + -.To consume and display CDC messages, run: +.**To consume and view CDC messages from the external Kafka cluster for CDC, run**: [console] ---- -MYIP=127.0.0.1 - -KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1 +KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1 MYIP=127.0.0.1 $KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092 ---- @@ -243,22 +293,4 @@ When set to -1, there is an infinite timeout, which may slow the GPE shutdown. ¦ Interval for purging outdated entries in deleted id map. ¦ minutes: 30. -|=== - - - - - - - - - - - - - - - - - - +|=== \ No newline at end of file diff --git a/modules/system-management/pages/change-data-capture/cdc-state-monitoring.adoc b/modules/system-management/pages/change-data-capture/cdc-state-monitoring.adoc index 6e3488f6..d8d594ef 100644 --- a/modules/system-management/pages/change-data-capture/cdc-state-monitoring.adoc +++ b/modules/system-management/pages/change-data-capture/cdc-state-monitoring.adoc @@ -1,13 +1,19 @@ -= CDC Monitoring and Reset += CDC State Monitoring == State Monitoring -.Check the state of CDC Service with this command: +Check the state of CDC Service with this command: + [console] ---- grun_p gpe "cat $(gadmin config get System.DataRoot)/gstore/0/part/cdc.yaml" ---- -.The response will look like this: +The cdc state file should only exist on GPE Replica 1 node of each partition: + +* `GPE_1#1` +* `GPE_2#1` + +.Example Response [console] ---- ------------------ Output example for cdc.yaml on 1x2 cluster ------------------- @@ -20,11 +26,7 @@ index: 0 cat: /home/tigergraph/tigergraph/data/gstore/0/part/cdc.yaml: No such file or directory ---- -If xref:tigergraph-server:system-management:change-data-capture/cdc-overview.adoc#_cdc_ha[CDC HA] is not enabled, the cdc state file should only exist on GPE Replica 1 node of each partition, such as `GPE_1#1` or `GPE_2#1`. - -If xref:tigergraph-server:system-management:change-data-capture/cdc-overview.adoc#_cdc_ha[CDC HA] is enabled (the default setting in multi-replica clusters), the CDC state file may reside on any node with GPE servers. For each partition, only the state file on the GPE leader is active; state files on other nodes may be missing or outdated. You can determine this by checking the tid in the file or the file's last update time. - -.For each CDC state field, see the table below: +.Information Fields for CDC Status Report [cols="2", separator=¦ ] |=== ¦ Field ¦ Meaning @@ -37,7 +39,7 @@ For a transaction, it’s the tid of the last delta batch for the transaction (m ¦ The `safe_persistent_tid` is the tid of the earliest delta batch among all open transactions that are not committed/rolled-back in CDC. If there’s no open transactions, this is the same as tid. -All delta batches with tid `` must have been written to external kafka. +All delta batches with `tid ` must have been written to external kafka. ¦ `split_index` ¦ For non-transaction, it’s always 0. 
For transactions, the split_index is the index of the delta batch that was most recently written to external kafka among all delta batches for the same transaction. @@ -48,15 +50,13 @@ All delta batches with tid `` must have been written to ext == State of DIM Service -DIM (Deleted Id Map) is an internal service designed to assist the CDC service in processing data updates that involve vertices already deleted from the database. - -.Check the state of DIM(Deleted Id Map) service with this command: +.Command to check the state of DIM(Deleted Id Map) service [console] ---- grun_p gpe "cat $(gadmin config get System.DataRoot)/gstore/0/part/deleted_idmap_state.yaml" ---- -.Check disk usage of DIM(Deleted Id Map) service with this command: +.Command to check disk usage of DIM(Deleted Id Map) service [console] ---- grun_p gpe "du -sh $(gadmin config get System.DataRoot)/gstore/0/part/deletedID_store" @@ -64,7 +64,7 @@ grun_p gpe "du -sh $(gadmin config get System.DataRoot)/gstore/0/part/deletedID_ The dim state file exists on all GPE nodes, unlike the cdc state file. -.The response will look like this: +.Example response for DIM state [console] ---- ------------ Output example for deleted_idmap_state.yaml on 1x2 cluster ------------ @@ -85,7 +85,7 @@ purged_deletedvid_curr_tid: 3752091 The purging task runs every 30 minutes by default. ==== -.For each DIM state field, see the table below: +.Information fields for DIM state report [cols="2", separator=¦ ] |=== ¦ Field ¦ Meaning @@ -108,16 +108,15 @@ All `deleted vid` deltas with an origin `tid` smaller than this value (excluding Note: The `idmap` is cached in memory, and if the map size is small (configured in `GPE.BasicConfig.Env` as `DIMCacheLimitInMB`, set at `30MB`), it is not flushed to disk. Consequently, this `tid` may lag behind in such cases. -¦`purged_deletedvid_curr_tid` -¦ Periodically, TigerGraph purges the entries in RocksDB based on the `safe_persistent_tid`. -This is because the CDC already used some of the `vid-uid` entries and wrote to external Kafka, hence we don’t need these entries anymore. +¦ `purged_deletedvid_curr_tid` +¦ Shows the transaction ID (`tid_origin`) of the latest purged "deleted vertex ID" record. +TigerGraph regularly removes old entries from RocksDB based on the `safe_persistent_tid`. + +These entries are no longer needed because the Change Data Capture (CDC) process has already used them and sent the data to external Kafka. +All "deleted vid–uid" entries with a smaller `tid_origin` value have already been purged from RocksDB. -¦ `purged_deletedvid_next_tid` -¦ Purges `deleted vid` delta’s `tid_origin`. -The `tid_orgin` represents the `tid` of origin delta that contains the vertex deletion message. -All `deleted vid-uid` entries with `tid_orgin` smaller than this already have been purged from RocksDB. +By default, this purging task runs every 30 minutes. You can change the interval by updating `GPE.BasicConfig.Env:DIMPurgeIntervalInMin`. -Configure this with `GPE.BasicConfig.Env: DIMPurgeIntervalInMin`), and update this `tid` if purging is actually performed. |=== == CDC Reset diff --git a/modules/system-management/pages/management-commands.adoc b/modules/system-management/pages/management-commands.adoc index ae60dccb..0c5cd09c 100644 --- a/modules/system-management/pages/management-commands.adoc +++ b/modules/system-management/pages/management-commands.adoc @@ -134,7 +134,7 @@ Flags: [NOTE] ==== -See xref:tigergraph-server:backup-and-restore:backup-cluster.adoc[] for more examples. 
+See xref:backup-and-restore:backup-cluster.adoc[] for more examples. ==== === gadmin backup list @@ -361,7 +361,11 @@ Global Flags: ==== gadmin config entry -Change a configuration entry. +Configure configuration entries *interactively*. + +This command is especially useful for parameters that accept *lists or arrays*, because you can add, remove, and modify entries step-by-step without needing to format JSON manually. + +You may also use patterns to filter the entries you want to configure. [source,console] ---- @@ -481,6 +485,22 @@ Global Flags: [#_gadmin_config_set] ==== gadmin config set +Configure a configuration entry in a *non-interactive* way. + +Use this command when you want to set a value directly by providing the name and value in one line. + +[IMPORTANT] +==== +For parameters that accept an *array*, wrap the entire JSON array in *single quotes* so the shell treats it as a single argument. + +Example: +[source,bash] +---- +gadmin config set GSQL.UDF.Policy.HeaderAllowlist \ +'["stdlib.h","string","tuple","vector","list","deque","arrays","forward_list","queue","priority_queue","stack","set","multiset","map","multimap","unordered_set","unordered_multiset","unordered_map","unordered_multimap","iterator","sstream","algorithm","math.h"]' +---- +==== + [source,console] ---- $ gadmin config set -h @@ -741,7 +761,7 @@ Flags: Global Flags: --debug enable debug log output to stdout ---- - +[#_gadmin_restart] === gadmin restart The `gadmin restart` command is used to restart one, many, or all TigerGraph services. You will need to confirm the restarting of services by either entering y (yes) or n (no). To bypass this prompt, you can use the -y flag to force confirmation. @@ -783,7 +803,7 @@ $ gadmin restart all -y [ Info] Starting CTRL [ Info] Starting ZK ETCD DICT KAFKA ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 IFM GUI ---- - +[#_gadmin_start] === gadmin start The `gadmin start` command can be used to start one, many, or all services. @@ -827,6 +847,9 @@ $ gadmin start all [ Info] Starting ZK ETCD DICT KAFKA ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 IFM GUI ---- +When xref:reference:configuration-parameters.adoc#_controller[`Controller.ServiceManager.AutoRestart`] is set to `true`, TigerGraph automatically restarts all services if they crash during `gadmin start` or `gadmin restart`. +The default value is `false`. It does not restart services that are manually stopped. + [#_gadmin_status] === gadmin status diff --git a/modules/system-management/pages/management-with-gadmin.adoc b/modules/system-management/pages/management-with-gadmin.adoc index dab877c2..eed3cf9f 100644 --- a/modules/system-management/pages/management-with-gadmin.adoc +++ b/modules/system-management/pages/management-with-gadmin.adoc @@ -6,8 +6,8 @@ TigerGraph Graph Administrator (`gadmin`) is a tool for managing TigerGraph serv To see a listing of all the options or commands available for gadmin, see the xref:management-commands.adoc[] page. -* xref:tigergraph-server:system-management:manage-services.adoc[TigerGraph service management] -* xref:tigergraph-server:system-management:memory-management.adoc[Memory Management] +* xref:manage-services.adoc[TigerGraph service management] +* xref:memory-management.adoc[Memory Management] == Manage licenses @@ -18,7 +18,7 @@ To add a new license key, use the command `gadmin license set`. You can either p To simply check the status of the license on the current solution, use the command `gadmin license status`. 
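+
+A minimal sketch of this workflow (the key value is a placeholder for a real license string, and running `gadmin config apply` afterward is an assumption about what your version requires):
+
+[source,console]
+----
+# Activate a new license key, then verify it was accepted
+gadmin license set <new_license_key>
+gadmin config apply -y
+gadmin license status
+----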
-See xref:tigergraph-server:installation:license.adoc[] for a complete guide to activating or renewing a TigerGraph license. +See xref:installation:license.adoc[] for a complete guide to activating or renewing a TigerGraph license. == Manage system configurations @@ -30,10 +30,6 @@ Examples of parameters that can be changed with these commands include: * System root directory * Timeout length * Port numbers -* xref:tigergraph-server:cluster-and-ha-management:crr-index.adoc[Replica numbers] +* xref:cluster-and-ha-management:crr-index.adoc[Replica numbers] -See xref:tigergraph-server:reference:configuration-parameters.adoc[] for a full list of these parameters. All of them can be accessed with the `gadmin config` command. - -== Nginx configuration - -Follow the steps documented in https://kb.tigergraph.com/knowledge_base/v3/how_to_articles/how_to_create_an_nginx_configuration_template[this support article] to update the Nginx configurations of your TigerGraph instance. +See xref:reference:configuration-parameters.adoc[] for a full list of these parameters. All of them can be accessed with the `gadmin config` command. diff --git a/modules/system-management/pages/memory-management.adoc b/modules/system-management/pages/memory-management.adoc index a1156581..80040424 100644 --- a/modules/system-management/pages/memory-management.adoc +++ b/modules/system-management/pages/memory-management.adoc @@ -16,7 +16,7 @@ Most database operations require more memory the more data you have. * Queries ** Queries that read or write a high volume of data use more memory. ** In a distributed cluster, a non-distributed query can be memory-intensive for the node where the query is run. -See xref:gsql-ref:querying:distributed-query-mode.adoc[]. +See xref:{page-component-version}@gsql-ref:querying:distributed-query-mode.adoc[]. == Monitor memory usage You can monitor memory usage by query and by machine. @@ -122,7 +122,7 @@ grun all 'grep -i "" $(gadmin config get System.LogRoot)/gpe/INFO.* === Monitor system free memory percentage ==== Through Admin Portal -If you have access to Admin Portal, you can monitor memory usage by node through the cluster monitoring tool in the xref:gui:admin-portal:dashboard.adoc[Dashboard]. +If you have access to Admin Portal, you can monitor memory usage by node through the cluster monitoring tool in the xref:{page-component-version}@gui:admin-portal:dashboard.adoc[Dashboard]. ==== Through Linux commands The following is a list of Linux commands to measure system memory and check for out-of-memory errors: @@ -230,7 +230,7 @@ You must xref:manage-services.adoc#_start_stop_or_restart_a_service[restart the === By HTTP header -Another way to limit the query memory usage is to specify the memory limit at the time of the request through the HTTP header `GSQL-QueryLocalMemLimitMB` when using the xref:tigergraph-server:API:built-in-endpoints.adoc#_run_an_installed_query_post[Run Query REST endpoint]. +Another way to limit the query memory usage is to specify the memory limit at the time of the request through the HTTP header `GSQL-QueryLocalMemLimitMB` when using the xref:API:built-in-endpoints.adoc#_run_an_installed_query_post[Run Query REST endpoint]. This applies to the specific request being run only, and overrides the system configuration. 
For example, to set the limit to 100 MB, make the following request: diff --git a/modules/system-management/pages/system-metrics.adoc b/modules/system-management/pages/system-metrics.adoc index 5630df38..5dcf609a 100644 --- a/modules/system-management/pages/system-metrics.adoc +++ b/modules/system-management/pages/system-metrics.adoc @@ -10,6 +10,8 @@ You can use `gadmin metric` commands to collect the following metrics related to For a full list of options available to `gadmin metric`, see xref:management-commands.adoc#_gadmin_metric[`gadmin metric`] or run `gadmin metric -h` in the command line to view the help text for the command. +TigerGraph also provides xref:troubleshooting:log-files.adoc[log files]. + == Report CPU usage Use the command `gadmin metric -t cpu` to report CPU usage by different TigerGraph services on a cluster or a node in the cluster. diff --git a/modules/system-management/pages/workload-management.adoc b/modules/system-management/pages/workload-management.adoc index 8dffc014..72121b63 100644 --- a/modules/system-management/pages/workload-management.adoc +++ b/modules/system-management/pages/workload-management.adoc @@ -1,190 +1,203 @@ = Workload Management :description: Overview of workload management in TigerGraph. -Certain TigerGraph operations, such as running online analytical processing (OLAP) queries that touch a lot of data, can be memory-intensive. -TigerGraph provides the following mechanisms for you to manage workload in your TigerGraph instances. +This page explains the different methods available to manage system workloads in TigerGraph. -[#_workload_queue] -== Workload Queue -You can configure workload queues so that queries are routed to the appropriate queues during runtime. -Each queue has a few properties, such as the maximum number of concurrent queries allowed and the maximum number of queries that can be queued so it can help prevent the system overload. -You can grant workload queues to users based on their roles so that the users can submit queries to the appropriate workload queues to be managed. +In distributed or replicated clusters, users have multiple processing units at their disposal. Depending on their needs, they can configure xref:#_query_routing_schemes[query routing schemes] such as round-robin or CPU load-aware routing. For replicated clusters, queries can also be xref:#_specify_replica_to_run_query_on[directed to specific replicas]. -=== What APIs are managed by the workload queue? -The following types of requests will be routed to either the default workload queue or the one specified by the user: -* Run installed queries. -* Interpret queries. -* Run heavy built-in queries, mostly used to "Explore Graph" in GraphStudio. +[#_workload_queues] +== Workload Queues +Workload Queues help manage queries by routing them to predefined queues with specific configurations. +Each queue can have properties like the maximum number of concurrent queries and delay queue size, preventing system overload. +Queries can be assigned to queues based on user roles and permissions. -=== Configurations -You can toggle the workload queue feature on and off, and add, update, or delete workload queues as you need. +In 3.10, *workload queues* were introduced. Multiple independent queues can be configured, allowing application designers to decide how specific operations are handled. -==== Put Workload Queue -[source.wrap] ----- -POST /gsqlserver/gsql/workload-manager/configs ----- -Upload the workload queue configs. 
+Workload queues manage the following types of operations:

-===== Request Body
-The request body expects a JSON object with the following schema:
-[source, json]
-----
-{
-  "isEnabled": true,
-  "queues": {
-    "OLTP": {
-      "description": "OLTP queries",
-      "isDefault": false,
-      "maxConcurrentQueries": 100,
-      "maxDelayQueueSize": 200
-    },
-    "scheduled_jobs": {
-      "description": "Scheduled jobs",
-      "maxConcurrentQueries": 10,
-      "maxDelayQueueSize": 20
-    },
-    "AdHoc": {
-      "description": "Ad-hoc queries",
-      "isDefault": true,
-      "maxConcurrentQueries": 1,
-      "maxDelayQueueSize": 2
-    }
-  }
-}
-----
-The request body must have the following fields at the top level:
-[cols="20h,~,20h"]
-|===
-|Field|Description|Data type
+
-|`isEnabled`|The feature flag to enable or disable the workload queue.|`BOOL`
-|`queues`|The map of the available workload queues.|`OBJECT`
-|===
-
-Objects under `queues` consist of queue ID (key) and properties (value).
-
-CAUTION: The queue ID must be a string of less than 64 characters including alphanumeric and underscore.
+* Running an installed query.
+* Interpreting ad-hoc queries.
+* Executing heavy built-in queries, commonly used for "Explore Graph" in GraphStudio.

+=== Configure Workload Queues
+Workload queues help control how queries are executed within the system by defining limits such as the maximum number of concurrent queries and the maximum size of the delay queue.
+Administrators can configure workload queues by creating, modifying, or deleting them as needed.

Each queue has the following properties:
[cols="20h,~,~"]
|===
|Field|Description|Data type
|`description`|The description of the queue.|`STRING` (< 256 characters)
|`isDefault` (optional)|The flag to indicate if the queue is the default queue. Must be set to `true` for exactly one queue.|`BOOL`
|`maxConcurrentQueries`|The maximum number of concurrent queries allowed in the queue.|`UINT` in the range of `(0, 131072)`
|`maxDelayQueueSize`|The maximum number of queries that can be queued in the delay queue.|`UINT` in the range of `[0, 131072)`
|===

==== maxConcurrentQueries and maxDelayQueueSize
`maxConcurrentQueries` and `maxDelayQueueSize` are enforced at the per-machine level.
More specifically, they put a limit on how many requests ONE `GPE` process can handle.
For example, in a TigerGraph cluster with `4` nodes (there will be `4` `GPE` processes), the total number of queries allowed for a `WorkloadQueue` is `4*maxConcurrentQueries`.
Similarly, the total number of queries that can be put into the corresponding delay queue is `4*maxDelayQueueSize`.
TigerGraph internally tries to distribute queries evenly among the nodes; hence, the `WorkloadQueue` on each `GPE` fills at a similar pace.

-[WARNING]
-====
-The query concurrency is also confined by the number of physical cores that the machine has.
-Therefore, `maxConcurrentQueries` of a `WorkloadQueue` is not recommended to be too large (i.e less than `3 * number of machine's physical cores`).
-Once the configurations change, GPE must be restarted to take effect.
+Before making changes, it’s a good practice to retrieve the current configuration using the *GET Workload Queue* command. After reviewing, you can update or apply new configurations using the *PUT Workload Queue* command.
+These commands are available both as GSQL commands and as REST endpoints.
+
+[NOTE]
+====
+Use a JWT token with `-H "Authorization: Bearer <token>"` for RESTPP APIs in workload management to ensure secure authentication.
`username:password` is supported only for GSQL APIs, not RESTPP.
====

*GSQL Commands*

-To modify the whole config:
-[source.warp, bash]
+* *Retrieve Current Configuration:*
+[source.wrap,gsql]
+----
+GET WORKLOAD QUEUE TO "/path/to/queue.json"
+----
+
+* *Update Configuration:*
+[source.wrap,gsql]
+----
+PUT WORKLOAD QUEUE FROM "/path/to/queue.json"
+----
+
+*REST Endpoints*
+
+* *Retrieve Current Configuration:*
+[source.wrap]
+----
+GET /gsqlserver/gsql/workload-manager/configs
----
-curl -X POST -u tigergraph:tigergraph \
-  :/gsqlserver/gsql/workload-manager/configs \
-  -d '{"isEnabled":true,"queues":{"OLTP":{"description":"OLTP queries","isDefault":false,"maxConcurrentQueries":100,"maxDelayQueueSize":200},"scheduled_jobs":{"description":"Scheduled jobs","maxConcurrentQueries":10,"maxDelayQueueSize":20},"AdHoc":{"description":"Ad-hoc queries","isDefault":true,"maxConcurrentQueries":1,"maxDelayQueueSize":2}}}'
+
+* *Configure Workload Queues*
+[source.wrap]
----
+POST /gsqlserver/gsql/workload-manager/configs
+----
+
+*Examples*

-To just toggle the feature flag, simply skip `queues`:
+1. *Retrieve Current Configuration:*
[source.wrap, bash]
----
-curl -X POST -u tigergraph:tigergraph \
-  :/gsqlserver/gsql/workload-manager/configs \
-  -d '{"isEnabled":true}'
+curl -X GET -H "Authorization: Bearer <token>" \
+  <host>:<port>/gsqlserver/gsql/workload-manager/configs
----

-To add, delete, or update the `queues` while keeping the feature flag untouched, simply skip `isEnabled`:
+2. *Modify Workload Queue Configuration:*
[source.wrap, bash]
----
-curl -X POST -u tigergraph:tigergraph \
-  :/gsqlserver/gsql/workload-manager/configs \
-  -d '{"queues":{"OLTP":{"description":"OLTP queries","isDefault":false,"maxConcurrentQueries":100,"maxDelayQueueSize":200},"scheduled_jobs":{"description":"Scheduled jobs","maxConcurrentQueries":10,"maxDelayQueueSize":20},"AdHoc":{"description":"Ad-hoc queries","isDefault":true,"maxConcurrentQueries":1,"maxDelayQueueSize":2}}}'
+curl -X POST -H "Authorization: Bearer <token>" \
+<host>:<port>/gsqlserver/gsql/workload-manager/configs \
+-d '{"isEnabled":true,"queues":{"OLTP":{"description":"OLTP queries","isDefault":false,"maxConcurrentQueries":100,"maxDelayQueueSize":200}}}'
----

-===== Response Status Codes
+*Response Status Codes*
+
+*For GET Workload Queue*
[cols="20h,~"]
|===
|Status Code|Description

-|200|The queue configs have been uploaded successfully.
+|200|The queue configurations were successfully retrieved.
+|403|The user does not have the `READ_WORKLOAD_QUEUE` privilege.
+|500|Server error occurred while processing the request.
+|===
+
+*For PUT Workload Queue*
+[cols="20h,~"]
+|===
+|Status Code|Description
+|200|The queue configurations were successfully uploaded.
|400|The payload is ill-formed.
-|403|The user doesn't have the privilege `WRITE_WORKLOAD_QUEUE`.
+|403|The user does not have the `WRITE_WORKLOAD_QUEUE` privilege.
+|409|Conflict in the configuration, such as multiple queues set as default.
+|500|Server error occurred while processing the request.
|===

-===== GSQL Command
-From a local file:
-[source.wrap,gsql]
-----
-PUT WORKLOAD QUEUE FROM "/path/to/queue.json"
-----
+*maxConcurrentQueries and maxDelayQueueSize*
+
+Each workload queue has two critical properties to manage system resources:
+
+1. *maxConcurrentQueries:* Defines the maximum number of queries that can run simultaneously in a queue. This helps prevent resource contention and ensures that the system does not become overloaded.
+
*maxDelayQueueSize:* Specifies the maximum number of queries that can wait in a queue when the current workload exceeds the concurrent query limit. If this limit is reached, new queries are rejected until the queue has capacity.
+
+[NOTE]
+====
+Both properties are enforced at the per-machine level in a cluster.
+Setting the values too high may degrade performance. It’s recommended to keep `maxConcurrentQueries` below three times the number of physical CPU cores in a machine.
+
+Any configuration changes require a GPE restart to take effect.
+====
+
+=== List Workload Queues
+Displays a list of all workload queues available to the current user, including their configurations and permissions.

-From a raw string:
+*GSQL Command*
[source.wrap,gsql]
----
-PUT WORKLOAD QUEUE FROM "{\"queues\":{\"OLTP\":{\"description\":\"OLTP queries\",\"isDefault\":false,\"maxConcurrentQueries\":100,\"maxDelayQueueSize\":200},\"scheduled_jobs\":{\"description\":\"Scheduled jobs\",\"maxConcurrentQueries\":10,\"maxDelayQueueSize\":20},\"AdHoc\":{\"description\":\"Ad-hoc queries\",\"isDefault\":true,\"maxConcurrentQueries\":1,\"maxDelayQueueSize\":2}}}"
+LIST WORKLOAD QUEUE
----

-==== Get Workload Queue
-
+*REST Endpoints*
[source.wrap]
----
-GET /gsqlserver/gsql/workload-manager/configs
+GET /restpp/workload-manager/queue
----
-Dump the queue configs so that the response would be the equivalent of the payload for `POST`.
-The purpose of this API is to retrieve the active configs and modify them on top of it.
-Other than the administrative purposes, one may use `SHOW WORKLOAD QUEUE` instead.

-===== Example Request
+*Example Request*
[source.warp, bash]
----
-curl -X GET -u tigergraph:tigergraph \
-    :/gsqlserver/gsql/workload-manager/configs
+curl -X GET -H "Authorization: Bearer <token>" \
+    <host>:<port>/restpp/workload-manager/queue
----

-===== Response Status Codes
+*Example Response*
+The response will include the information available to the general users.
+[source, json]
+----
+[
+  {
+    "id": "AdHoc",
+    "description": "Ad-hoc queries",
+    "isDefault": true
+  },
+  {
+    "id": "OLTP",
+    "description": "OLTP queries"
+  }
+]
+----
+
+*Response Status Codes*
[cols="20h,~"]
|===
|Status Code|Description
-|200|The queue configs have been retrieved successfully.
-|403|The user doesn't have the privilege `READ_WORKLOAD_QUEUE`.
+
+|200|The queue info has been retrieved successfully.
+|403|The user doesn't have the privilege `READ_DATA`.
|===

-===== GSQL Command
+=== Grant/Revoke Workload Queue Access
+You can grant workload queues to or revoke them from a user based on their user name, groups, and/or roles.
+
+*GSQL Command*
[source.wrap,gsql]
----
-GET WORKLOAD QUEUE
-----
-
-=== Permissions
-You can grant or revoke workload queues to a user based on its user name, groups, and/or roles.
+# GRANT
+GRANT WORKLOAD QUEUE <queue_name> TO USER <user1>, <user2>

-==== Grant/Revoke Workload Queue
+# REVOKE
+REVOKE WORKLOAD QUEUE <queue_name> FROM USER <user1>, <user2>
+----

+*REST Endpoint*
[source.wrap]
----
POST /gsqlserver/gsql/workload-manager/permission
----
-Grant a workload queue to users, groups, and/or roles.

-===== Request Body
+*Request Body*
The request body expects a JSON object with the following schema:
[source, json]
----
@@ -209,15 +222,18 @@ The request body must have the following fields at the top level:

|`role` (optional)|The list of the role names to be granted/revoked.|`STRING` or `STRING[]`
|===

-TIP:
-You can use the wildcard " * " to grant/revoke the queue to all users, groups, or roles.
-Note that " * " must be the only entry in the list when available.
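+For example, a request body that grants the queue `OLTP` to the users `u1` and `u2` (the same payload used in the first curl example below) looks like this:
+
+[source, json]
+----
+{
+  "action": "grant",
+  "queue": "OLTP",
+  "user": ["u1", "u2"]
+}
+----
+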
+[TIP] +==== +* You can use the wildcard " * " to grant/revoke the queue to all users, groups, or roles. +* Note that " * " must be the only entry in the list when available. +==== + +*Example Request* -===== Example Request Grant the queue `OLTP` to the user `u1` and `u2`: [source.warp, bash] ---- -curl -X GET -u tigergraph:tigergraph \ +curl -X POST -H "Authorization: Bearer " \ :/gsqlserver/gsql/workload-manager/permission \ -d '{"action": "grant", "queue": "OLTP", "user": ["u1", "u2"]}' ---- @@ -225,69 +241,59 @@ curl -X GET -u tigergraph:tigergraph \ Revoke the queue `scheduled_jobs` from all users and the role `r1`: [source.warp, bash] ---- -curl -X GET -u tigergraph:tigergraph \ +curl -X POST -H "Authorization: Bearer " \ :/gsqlserver/gsql/workload-manager/permission \ - -d '{"action": "REVOKE" "queue": "scheduled_jobs", "user": "*", role": ["r1"]}' + -d '{"action": "revoke", "queue": "scheduled_jobs", "user": "*", "role": ["r1"]}' ---- -===== Response Status Codes +*Response Status Codes* [cols="20h,~"] |=== |Status Code|Description |200|The queue has been granted/revoked successfully. |400|The payload is ill-formed so none of the given entities could be granted/revoked. -|403|The user doesn't have the privilege `WRITE_WORKLOAD_QUEUE`` +|403|The user doesn't have the privilege `WRITE_WORKLOAD_QUEUE` |=== -===== GSQL Command -[source.wrap,gsql] ----- -# GRANT -GRANT WORKLOAD QUEUE OLTP TO USER u1, u2 -GRANT WORKLOAD QUEUE OLTP TO GROUP g1, g2 -GRANT WORKLOAD QUEUE OLTP TO ROLE r1, r2 -GRANT WORKLOAD QUEUE OLTP TO ALL USERS -GRANT WORKLOAD QUEUE OLTP TO ALL GROUPS -GRANT WORKLOAD QUEUE OLTP TO ALL ROLES - -# REVOKE -REVOKE WORKLOAD QUEUE OLTP FROM USER u1, u2 -REVOKE WORKLOAD QUEUE OLTP FROM GROUP g1, g2 -REVOKE WORKLOAD QUEUE OLTP FROM ROLE r1, r2 -REVOKE WORKLOAD QUEUE OLTP FROM ALL USERS -REVOKE WORKLOAD QUEUE OLTP FROM ALL GROUPS -REVOKE WORKLOAD QUEUE OLTP FROM ALL ROLES ----- - -NOTE: Unlike REST API, the GSQL commands don't allow you to specify USER, GROUP, and ROLE in a command. +[NOTE] +==== +Unlike REST API, the GSQL commands don't allow you to specify USER, GROUP, and ROLE in a command. You must use separate commands for each entity type. +==== -==== Show Workload Queue +=== Show Workload Queue Permissions +The `SHOW WORKLOAD QUEUE` command lists detailed information about workload queues, including their permissions, descriptions, and limits. It is primarily used to inspect queue settings and permissions. + +*GSQL Command* +To show the permission info of all queues: +[source.wrap,gsql] ---- -GET gsqlserver/gsql/workload-manager/permission +SHOW WORKLOAD QUEUE ---- -Show info on a specific workload queue or all. -===== Query Parameters -[cols="20h,~,20h"] -|=== -|Parameter|Description|Data type +To show the permission info of a specific queue, for example `OLTP`: +[source.wrap,gsql] +---- +SHOW WORKLOAD QUEUE OLTP +---- -|`id` (optional)|The ID of the queue to be shown. -If not specified, all queues will be shown. -|`STRING` -|=== +*REST Endpoint* +[source.wrap] +---- +GET /gsqlserver/gsql/workload-manager/permission +---- -===== Example Request +*Example Request* To retrieve the permission info of the queue `OLTP`: [source.warp, bash] ---- -curl -X GET -u tigergraph:tigergraph \ - localhost:14240/gsql/workload-manager/permission?id=OLTP +curl -X GET -H "Authorization: Bearer " \ + localhost:14240/gsql/v1/workload-manager/permission?id=OLTP ---- -===== Example Response +*Example Response* + The response will be the combination of configs and permission, e.g. 
[source, json] ---- @@ -306,7 +312,7 @@ The response will be the combination of configs and permission, e.g. } ---- -===== Response Status Codes +*Response Status Codes* [cols="20h,~"] |=== |Status Code|Description @@ -315,130 +321,22 @@ The response will be the combination of configs and permission, e.g. |403|The user doesn't have the privilege `READ_WORKLOAD_QUEUE`. |=== -===== GSQL Command -To show the permission info of all queues: -[source.wrap,gsql] ----- -GET WORKLOAD QUEUE ----- - -To show the permission info of a specific queue, for example `OLTP`: -[source.wrap,gsql] ----- -GET WORKLOAD QUEUE OLTP ----- - -==== List Workload Queue - ----- -GET restpp/workload-manager/queue ----- -List all granted workload queues to the current user so the user can choose the appropriate queue from the list. - -===== Example Request -[source.warp, bash] ----- -curl -X GET -u tigergraph:tigergraph \ - :/restpp/workload-manager/queue ----- - -===== Example Response -The response will include the information available to the general users. -[source, json] ----- -[ - { - "id": "AdHoc", - "description": "Ad-hoc queries", - "isDefault": true - }, - { - "id": "OLTP", - "description": "OLTP queries" - } -] ----- - -===== Response Status Codes -[cols="20h,~"] -|=== -|Status Code|Description - -|200|The queue info has been retrieved successfully. -|403|The user doesn't have the privilege `READ_DATA`. -|=== - -==== Use Cases -Suppose we have configured the following workload queues that are the output of the `SHOW WORKLOAD QUEUE` command: -[source, json] ----- -{ - "OLTP": { - "description": "OLTP queries", - "isDefault": true, - "maxConcurrentQueries": 100, - "maxDelayQueueSize": 100, - "granted": { - "USER": [], - "GROUP": ["g1", "g2"], - "ROLE": [] - } - }, - "scheduled_jobs": { - "description": "Scheduled jobs", - "maxConcurrentQueries": 5, - "maxDelayQueueSize": 0, - "granted": { - "USER": ["u1"], - "GROUP": [], - "ROLE": ["r1"] - } - }, - "AdHoc": { - "description": "Ad-hoc queries", - "isDefault": false, - "maxConcurrentQueries": 10, - "maxDelayQueueSize": 10, - "granted": { - "USER": [], - "GROUP": ["g3"], - "ROLE": ["r2"] - } - } -} ----- -===== Running a Query -When running a query, you can specify the workload queue to run the query on. -If the queue is not specified, the query will be routed to the default queue. -To specify the queue in the GSQL shell, you can use the `-queue` option, e.g. ----- -RUN QUERY -queue AdHoc q1() ----- -or you can use the HTTP header `Workload-Queue`: ----- -curl -X POST -u tigergraph:tigergraph \ - -H "Workload-Queue: AdHoc" \ - :14240/restpp/query/ldbc_snb/q1" ----- - -If the given queue is not granted to the current user, the query will be rejected with the error code `REST-14000` and return `HTTP 422 Unprocessable Entity`. - -For example, if the user `tigergraph` who does not belong to the group `g3` or holds the role `r2` tries to run a query on the queue `AdHoc`, the query will be rejected. - - -NOTE: If the queue is full of capacity, the query will be rejected. - -==== Monitoring +[NOTE] +==== +* Use `SHOW WORKLOAD QUEUE` to inspect queue configurations and access permissions. +* This command focuses on *visibility of queue settings*, unlike `GET WORKLOAD QUEUE`, which exports configurations. +==== +=== Check Queue Status You can use the following API to check the status of the workload queues for monitoring purposes. 
-===== Check Running Queries
+[source.wrap]
----
POST /restpp/workload-manager/queuestatus
----
Return the status of the given workload queue on each GPE instance.

-===== Request Body
+*Request Body*
[cols="20h,~,20h"]
|===
|Field|Description|Data type
@@ -450,15 +348,15 @@ For `mode` field, if `stats` is specified, response only gives the numbers of qu

If Request Body is not provided, response is generated as if both fields are using the default values.

-===== Example Request
+*Example Request*
[source.warp, bash]
----
-curl -X POST -u tigergraph:tigergraph \
+curl -X POST -H "Authorization: Bearer <token>" \
    :/restpp/workload-manager/queuestatus \
-  -d '{"queuelist": ["AdHoc"], "mode": "verbose"}'
+  -d '{"queuelist": ["AdHoc"], "mode": "verbose"}'
----

-===== Example Response
+*Example Response*
[source, json]
----
{
@@ -523,9 +421,75 @@ curl -X POST -u tigergraph:tigergraph \
}
----

-== Other Query Concurrency Control Methods
+=== Use Cases
+Suppose we have configured the following workload queues that are the output of the `SHOW WORKLOAD QUEUE` command:
+[source, json]
+----
+{
+  "OLTP": {
+    "description": "OLTP queries",
+    "isDefault": true,
+    "maxConcurrentQueries": 100,
+    "maxDelayQueueSize": 100,
+    "granted": {
+      "USER": [],
+      "GROUP": ["g1", "g2"],
+      "ROLE": []
+    }
+  },
+  "scheduled_jobs": {
+    "description": "Scheduled jobs",
+    "maxConcurrentQueries": 5,
+    "maxDelayQueueSize": 0,
+    "granted": {
+      "USER": ["u1"],
+      "GROUP": [],
+      "ROLE": ["r1"]
+    }
+  },
+  "AdHoc": {
+    "description": "Ad-hoc queries",
+    "isDefault": false,
+    "maxConcurrentQueries": 10,
+    "maxDelayQueueSize": 10,
+    "granted": {
+      "USER": [],
+      "GROUP": ["g3"],
+      "ROLE": ["r2"]
+    }
+  }
+}
+----
+
+*Running a Query*
+
+When running a query, you can specify the workload queue to run the query on.
+If the queue is not specified, the query will be routed to the default queue.
+To specify the queue in the GSQL shell, you can use the `-queue` option, e.g.
+----
+RUN QUERY -queue AdHoc q1()
+----
+
+or you can use the HTTP header `Workload-Queue`:
+[source.warp, bash]
+----
+curl -X POST -H "Authorization: Bearer <token>" \
+  -H "Workload-Queue: AdHoc" \
+  <host>:14240/restpp/query/ldbc_snb/q1
+----
+
+If the given queue is not granted to the current user, the query will be rejected with the error code `REST-14000` and return `HTTP 422 Unprocessable Entity`.
+
+For example, if the user `tigergraph` neither belongs to the group `g3` nor holds the role `r2`, any query they submit to the queue `AdHoc` will be rejected.
+
+[NOTE]
+====
+If the queue is at full capacity, the query will be rejected.
+====
+
+== Other Query Workload Management Methods

-=== Limit the number of current built-in heavy queries
+=== Limit number of concurrent heavy queries

WARNING: This configuration is deprecated as of TG 3.10.0 and will be removed in a future release.
This is ignored once the xref:#_workload_queue[workload queue] feature is enabled.
@@ -571,19 +535,9 @@ $ gadmin config set RESTPP.WorkLoadManager.MaxDelayQueueSize 20

You must xref:manage-services.adoc#_start_stop_or_restart_a_service[restart the RESTPP service] for the change to take effect.

-=== Specify number of threads used by a query
-You can specify the limit of the number of threads that can be used by one query through the xref:tigergraph-server:API:built-in-endpoints.adoc#_run_an_installed_query_post[Run Query REST endpoint].
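+
+For example, the restart step could look like this (a sketch; `gadmin restart` with the `-y` option answers the confirmation prompt automatically, matching the other `gadmin restart` examples in this documentation):
+
+[source, console]
+----
+$ gadmin restart restpp -y
+----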
- -For example, to specify a limit of four threads that can be used by a query, use the `GSQL-THREAD-LIMIT` parameter and set its value to 4: - -.Specify that the query run with a limit of 4 threads -[source.wrap,bash] ----- -curl -X POST -H "GSQL-THREAD-LIMIT: 4" -d '{"p":{"id":"Tom","type":"person"}}' "http://localhost:9000/query/social/hello" ----- - === Specify replica to run query on -On a distributed cluster, you can specify on which replica you want a query to be run through the xref:tigergraph-server:API:built-in-endpoints.adoc#_run_an_installed_query_post[Run Query REST endpoint]. + +On a distributed cluster, you can specify on which replica you want a query to be run through the xref:API:built-in-endpoints.adoc#_run_an_installed_query_post[Run Query REST endpoint]. For example, to run the query on the primary cluster, use the `GSQL-REPLICA` header when running a query and set its value to 1: @@ -591,7 +545,7 @@ For example, to run the query on the primary cluster, use the `GSQL-REPLICA` hea [source.wrap,bash] ---- curl -X POST -H "GSQL-REPLICA: 1" -d '{"p":{"id":"Tom","type":"person"}}' -"http://localhost:9000/query/social/hello" +"http://localhost:14240/restpp/query/social/hello" ---- == Query Routing Schemes diff --git a/modules/troubleshooting/images/browser-aw-snap-error.png b/modules/troubleshooting/images/browser-aw-snap-error.png new file mode 100644 index 00000000..3a48f3fb Binary files /dev/null and b/modules/troubleshooting/images/browser-aw-snap-error.png differ diff --git a/modules/troubleshooting/images/config-enable-debug.png b/modules/troubleshooting/images/config-enable-debug.png new file mode 100644 index 00000000..1a91c20e Binary files /dev/null and b/modules/troubleshooting/images/config-enable-debug.png differ diff --git a/modules/troubleshooting/images/config-loglevel-debug.png b/modules/troubleshooting/images/config-loglevel-debug.png new file mode 100644 index 00000000..5dcfd409 Binary files /dev/null and b/modules/troubleshooting/images/config-loglevel-debug.png differ diff --git a/modules/troubleshooting/images/log-file-list.png b/modules/troubleshooting/images/log-file-list.png new file mode 100644 index 00000000..29881586 Binary files /dev/null and b/modules/troubleshooting/images/log-file-list.png differ diff --git a/modules/troubleshooting/images/terminal-nginx-log-grep.png b/modules/troubleshooting/images/terminal-nginx-log-grep.png new file mode 100644 index 00000000..8aa3f2ab Binary files /dev/null and b/modules/troubleshooting/images/terminal-nginx-log-grep.png differ diff --git a/modules/troubleshooting/images/terminal-restpp-log-grep.png b/modules/troubleshooting/images/terminal-restpp-log-grep.png new file mode 100644 index 00000000..c54142d9 Binary files /dev/null and b/modules/troubleshooting/images/terminal-restpp-log-grep.png differ diff --git a/modules/troubleshooting/nav.adoc b/modules/troubleshooting/nav.adoc index d8379388..5aae5444 100644 --- a/modules/troubleshooting/nav.adoc +++ b/modules/troubleshooting/nav.adoc @@ -1,7 +1,8 @@ -* Troubleshooting and FAQs -** link:https://kb.tigergraph.com/[Knowledge base and FAQs] -** xref:system-administration-faqs.adoc[] -** xref:log-files.adoc[] -*** xref:service-log-tracking.adoc[] -*** xref:elk-filebeat.adoc[] -** xref:troubleshooting-guide.adoc[] +// NOTE: /troubleshooting/nav.adoc is not used; its content is covered by /additional-resources/nav.adoc. 
+//* Troubleshooting and FAQs
+//** xref:system-administration-faqs.adoc[]
+//** xref:log-files.adoc[]
+//*** xref:audit-log.adoc[]
+//*** xref:gcollect.adoc[]
+//*** xref:elk-filebeat.adoc[]
+//** xref:troubleshooting-guide.adoc[]
diff --git a/modules/troubleshooting/pages/audit-log.adoc b/modules/troubleshooting/pages/audit-log.adoc
index a4d424f0..acc10459 100644
--- a/modules/troubleshooting/pages/audit-log.adoc
+++ b/modules/troubleshooting/pages/audit-log.adoc
@@ -1,21 +1,21 @@
= Audit Logs
:pp: {plus}{plus}
-:page-aliases: troubleshooting:audit-logs.adoc
+:page-aliases: troubleshooting:audit-logs.adoc, service-log-tracking.adoc

Audit logs maintain a historical record of activity events, noting the time, responsible user or service, and affected entity.
Audit logs enable organizations to meet certain compliance and business policy requirements.
Additionally, Administrators use audit logs to track user activity, and security teams use them to investigate breaches and ensure regulatory compliance.

-== Key Features
+The sections below outline the <<_key_features_and_considerations>>, <<_enabling_audit_logging>>, <<_log_file_management_policies>>, <<_audit_log_format>>, <<_data_masking>>, and <<_known_issues>>.

-* The audit logs are structured in JSON format, ensuring machine-readability.
-* This format facilitates easy integration with third-party tools.
-* It eliminates the need for users to navigate through numerous log files.
+== Key Features and Considerations

-== Considerations
+* The audit logs are structured in JSON format, ensuring machine-readability and easy integration with third-party tools.
+
+* The gathering of audit data into one log eliminates the need to navigate through numerous log files.

=== Security

@@ -23,13 +23,13 @@ Additionally, Administrators use audit logs to track user activity, and security

* Access to the audit log is restricted to those who have access to TigerGraph logs.

-* Sensitive data or PII in the audit log, such as credentials or sensitive query payload information is masked by default.
+* Sensitive data or PII in the audit log, such as credentials or sensitive query payload information, is masked by default. See the section xref:#_data_masking[] for more details.

=== Performance and Scalability

* Enabling audit logging records audit logs to separate files, adding a slight disk I/O workload.
However, compared to the debug log, the audit log contains significantly less data, ensuring negligible performance impact.

-== Configuration Commands
+== Configuration Parameters

.Users can enable, disable, and configure audit logging with the following `gadmin` configs:
[cols="3", separator=¦ ]
|===
¦ Config name ¦ Description ¦ Data type (default)

¦ `System.Audit.Enable` ¦ Setting to enable audit logs. ¦ boolean (false)

-¦ `System.Audit.DataBaseName` ¦ Modify the DataBaseName field in log file header. ¦ name (TigerGraph)
+¦ `System.Audit.DataBaseName` ¦ Modify the DataBaseName field in log file header. ¦ string (`TigerGraph`)

-¦ `System.Audit.LogDirRelativePath` ¦ Modify the relative audit log path. ¦ path string (/AuditLog/)
+¦ `System.Audit.LogDirRelativePath` ¦ Modify the relative audit log path. ¦ path string (`/AuditLog/`)

¦ `System.Audit.LogConfig.LogFileMaxDurationDay` ¦ Modify the modification date when a new audit log file is created.
¦ Integer (90) @@ -50,14 +50,11 @@ However, compared to the debug log, the audit log contains significantly less da ¦ `System.Audit.MaskPII` ¦ Mask Sensitive data or PII in the audit log. -¦ Default value is: `true` +¦ boolean (true) |=== -Additionally, all audit log files are stored in JSON format. -Each log entry is a JSON object, and the log file is one JSON array of these objects. - -For more information see section xref:audit-log.adoc#_consuming_audit_logs[] or for a complete list of TigerGraph `gadmin` configs see xref:tigergraph-server:reference:configuration-parameters.adoc[]. +For a complete list of TigerGraph `gadmin` configuration parameters, see the xref:reference:configuration-parameters.adoc[] page. === Apply Configuration Changes @@ -71,11 +68,11 @@ $ gadmin config apply -y $ gadmin restart gsql -y ---- -== Setup Environment +== Enabling Audit Logging -Audit logging is disabled by default in a TigerGraph system. +Audit logging is disabled by default. +To enable audit logging, use the following `gadmin` command: -.To enable audit logging, use `gadmin` to run the following command to enable the setting: [console] ---- $ gadmin config set System.Audit.Enable true @@ -84,13 +81,13 @@ $ gadmin config set System.Audit.Enable true Once enabled, the audit log will be found in the `/AuditLog` folder. The `` represents the default path for all TigerGraph logs, set to `tigergraph/log` by default. -.To customize the relative path for the audit log (in relation to ``), execute the following command: +To customize the *relative path* for the audit log (in relation to ``), run the following command: [console] ---- $ gadmin config set System.Audit.LogDirRelativePath ---- -.You can also modify the on your environment by: +You can also modify the *``* on your environment by: [console] ---- $ gadmin config set System.LogRoot @@ -98,74 +95,60 @@ $ gadmin config set System.LogRoot [CAUTION] ==== -Take note that altering `` impacts the root path for all TigerGraph logs. +Altering `` impacts the root path for ALL TigerGraph logs. ==== -== Log File Policy +There is a *"DatabaseName"* field appended to the header of each audit log file. +*The* value of this field is "TigerGraph" by default. + +You can modify this value by running the following command: +[source, console] +---- +$ gadmin config set System.Audit.DataBaseName +---- + +== Log File Management Policies + +=== File creation and size A new audit log file will be created when *any* of the following situations occur: * The GSQL service is restarted. * The latest audit log file size exceeds the `System.Audit.LogConfig.LogFileMaxSizeMB` value. -The value of this field is `100MB` by default. +The default value is 100 MB. ++ +You can modify this value by running the following command: + -.You can modify this value by running the following command: [console] ---- $ gadmin config set System.Audit.LogConfig.LogFileMaxSizeMB ---- -* Log files will be removed if its lifetime exceeds `System.Audit.LogConfig.LogFileMaxDurationDay`. -The value of this field is `90` days by default. -+ -.You can modify this value by running the following command: +=== File deletion +A log file will be removed if its lifetime exceeds `System.Audit.LogConfig.LogFileMaxDurationDay` days. +The default value is 90 days. 
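+
+To check the current retention value before changing it, you can read it back with `gadmin config get` (a sketch; `get` mirrors the `gadmin config set` commands used throughout this page):
+
+[console]
+----
+$ gadmin config get System.Audit.LogConfig.LogFileMaxDurationDay
+----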
+ +You can modify this value by running the following command: [console] ---- $ gadmin config set System.Audit.LogConfig.LogFileMaxDurationDay ---- -* Lastly, the oldest audit log file will be automatically deleted, if the amount of audit logs in the audit log folder exceeds the `System.Audit.LogConfig.LogRotationFileNumber` value. -The default of this value is `100` audit logs. -+ -.You can modify this value by running the following command: +The oldest audit log file will be automatically deleted if the number of audit logs in the audit log folder exceeds the `System.Audit.LogConfig.LogRotationFileNumber` value. +The default value is 100 log files. + +You can modify this value by running the following command: [console] ---- $ gadmin config set System.Audit.LogConfig.LogRotationFileNumber ---- -=== Other Customizable Configs -There is a “DatabaseName” field appended to the header of each audit log file. -The value of this field is "TigerGraph" by default. -.You can modify this value by running the following command: -[source, console] ----- -$ gadmin config set System.Audit.DataBaseName ----- - -==== `gadmin` Customizable Configs -You can configure `gadmin` command’s audit logging rotation rules with the following gadmin configs: -.You can modify the lifetime of a file with the following command: -[source, console] ----- -gadmin config set Gadmin.BasicConfig.LogConfig.LogFileMaxDurationDay: ----- - -.You can modify the file rotation size with the following command: -[source, console] ----- -gadmin config set Gadmin.BasicConfig.LogConfig.LogFileMaxSizeMB: ----- -.You can modify the rotation file number with the following command: -[source, console] ----- -gadmin config set Gadmin.BasicConfig.LogConfig.LogRotationFileNumber: ----- +== Audit Log Format -== Consuming Audit Logs -=== Log Format +=== General Format All audit log files are stored in JSON format, even when users are actively interacting with TigerGraph, so audit logs can be consumed at run time. @@ -179,7 +162,7 @@ Audit log files are separated by GSQL service and REST++ API calls. Both having ==== .Here is an example of a whole audit log file: -[console] +[source, json] ---- [ {"serverHostIP":"127.0.0.1","databaseName":"TigerGraph","version":"1.0","timestamp":"2023-12-20T14:42:50.243-07:00"}, @@ -196,7 +179,7 @@ Audit log files are separated by GSQL service and REST++ API calls. Both having === GSQL Service Audit Logs .The first JSON object is the header of this file, which consists of the following fields: -[console] +[source, json] ---- { "version": "1.0", @@ -219,7 +202,7 @@ The audit log will record any user-triggered activity, such as: Each activity will have its own audit log entry and fields. ==== .The `createQuery` activity will produce an audit log entry with the following fields: -[console] +[source, json] ---- { "timestamp":"2023-12-20T14:42:50.243-07:00", @@ -230,7 +213,7 @@ Each activity will have its own audit log entry and fields. "userAgent": "GSQL Shell", "endpoint": "/gsql/file", "actionName": "createQuery", - "status": “SUCCESS”, + "status": "SUCCESS", "message": "Successfully created query 'query_name'" } ---- @@ -239,7 +222,7 @@ For user `login/auth` related activities, one more field called `failedAttempts` This field indicates how many times this user failed to provide the correct credentials. .Here is an example for user login event: -[console] +[source, json] ---- { "timestamp": "2023-12-20T14:42:50.243-07:00", @@ -262,13 +245,13 @@ Audit logs for REST++ calls are found in the `log.Audit-RESTPP` file. 
They are similar to GSQL service audit logs, such as, the status for API calls will be either `SUCCESS` or `FAILURE`, but they are different in these ways:

* The duration is the runtime cost with seconds.
-* The code will be the return codes from the REST++ server, see xref:tigergraph-server:reference:return-codes.adoc#_rest[Return Codes].
+* The code will be the return codes from the REST++ server, see xref:reference:return-codes.adoc#_rest[Return Codes].
* A new field `requestId` is also added.

Values in `requestParams` and `requestBody` will be masked if they contain PII data.

.Here is an example of the REST++ call event:
-[source, console]
+[source, json]
----
{
  "timestamp": "2023-10-02T15:06:18.365Z",
@@ -295,7 +278,7 @@ Audit logs for gadmin command executions are found in the log.Audit-GADMIN fil
Besides the basic information such as `timestamp`, `OS`, `username`, they capture the `gadmin` commands executed on the node and the outcomes:

.Here is an example of the event after running command: `gadmin start all`
-[source, console]
+[source, json]
----
{
  "timestamp": "2024-05-13 23:45:34.940",
@@ -308,6 +291,57 @@ Besides the basic information such as `timestamp`, `OS`, `username`, they captur
----


+== Data Masking
+
+By default, logs will not contain potentially sensitive data such as query parameters and query contents.
+For audit logs, there are three data fields that are masked:
+
+* `"queryContent"`
+* `"queryParameters"`
+* `"fileNames"`
+
+These fields will have the value `""`. For example:
+
+[source, json]
+----
+{
+  "queryContent":"",
+  "endpoint":"/gsql/v1/statements",
+  "clientHost":"10.244.6.79",
+  "clientOSUsername":"tigergraph",
+  "queryParameters":"",
+  "userAgent":"GSQL Shell",
+  "userName":"tigergraph",
+  "authType":"USER_PASS",
+  "message":"Successfully ran query 'SumOfTwoInt'.",
+  "timestamp":"2025-04-25T08:51:20.159Z",
+  "actionName":"runQuery",
+  "status":"SUCCESS"
+}
+----
+
+However, if `System.Audit.MaskPII` is set to `false`, then the log will contain the unmasked data, such as:
+
+[source, json]
+----
+{
+  "queryContent":"CREATE OR REPLACE QUERY SumOfTwoInt(Int x, Int y) {\n  Print x+y;\n}",
+  "endpoint":"/gsql/v1/statements",
+  "clientHost":"10.244.6.79",
+  "clientOSUsername":"tigergraph",
+  "queryParameters":{"x":["3"],"y":["7"]},
+  "userAgent":"GSQL Shell",
+  "userName":"tigergraph",
+  "authType":"USER_PASS",
+  "message":"Successfully ran query 'SumOfTwoInt'.",
+  "timestamp":"2025-04-25T08:51:20.159Z",
+  "actionName":"runQuery",
+  "status":"SUCCESS"
+}
+----
+
+
+
== Known Issues

* The real client IP address could be removed or masked by a firewall or another intermediate redirect layer before arriving at the TigerGraph service.
diff --git a/modules/troubleshooting/pages/gcollect.adoc b/modules/troubleshooting/pages/gcollect.adoc
new file mode 100644
index 00000000..d113975e
--- /dev/null
+++ b/modules/troubleshooting/pages/gcollect.adoc
@@ -0,0 +1,60 @@
+= Gathering Log Information with gcollect
+
+Admin users can use the `gcollect` utility to search through and retrieve selected information from the log files. This is very handy because the log files can be very large and users may not know the exact pattern to look for.
+`gcollect` is included when the TigerGraph database is installed.
+
+[source]
+----
+Usage: gcollect [Options] COMMAND
+Options:
+-h, --help show this help message and exit
+-A num, --after-context num print num lines of trailing context after each match.
+-B num, --before-context num print num lines of leading context before each match.
+-c, --components gpe,gse,rest only collect information related to the specified component(s).
+ All by default. Supported components: gpe,gse,gsql,dict,
+ tsar,kafka,zk,rest,nginx,admin,fileLoader,kafkaLoader,
+ kafka-stream,kafka-connect,gui
+-n, --nodes m1,m2 only search patterns for specified nodes.
+ (only works together with the command "grep")
+-s, --start DateTime logs older than this DateTime will be ignored.
+ Format: 2006-01-02,15:04:05
+-e, --end DateTime logs newer than this DateTime will be ignored.
+ Format: 2006-01-02,15:04:05
+-t, --tminus num only search for logs that are generated in the past num seconds.
+-r, --request_id id only collect information related to the specified request id.
+ Lines matching "pattern" will also be printed.
+-b, --before num how long before the query should we start collecting.
+ (in seconds, can ONLY be used with [--request_id] option).
+-d, --duration num how long after the query should we stop collecting.
+ (in seconds, can ONLY be used with [--request_id] option).
+-o, --output_dir dir specify the output directory, "./output" by default. (ALERT: files in this folder will be DELETED.)
+-p, --pattern regex collect lines from logs which match the regular expression. (Can have more than one regex, lines that match any of
+ the regular expressions will be printed.)
+-i, --ignore-case ignore case distinctions in both the PATTERN and the input files.
+-D, --display print to screen.
+-g for GraphStudio to collect logs.
+-v verbose mode.
+
+COMMANDS:
+grep search patterns from log files that have been collected before.
+show show all the requests during the specified time window.
+collect collect all the debugging information which satisfies all the requirements specified by Options.
+----
+
+Examples:
+
+[source]
+----
+# show all requests during the last hour
+gcollect -t 3600 show
+
+# collect debug info for a specific request
+gcollect -r RESTPP_2_1.1559075028795 -b 60 -d 120 -p "error" collect
+
+# collect debug info for all components
+gcollect -i -p "error" -p "FAILED" -s "2019-05-22,18:00:00" -e "2019-05-22,19:00:00" collect
+
+# Search from log files that have been collected before
+gcollect -i -p "unknown" -c admin,gpe -D -A 1 -B 2 grep
+----
\ No newline at end of file
diff --git a/modules/troubleshooting/pages/log-files.adoc b/modules/troubleshooting/pages/log-files.adoc
index 252948b9..b6d9a5be 100644
--- a/modules/troubleshooting/pages/log-files.adoc
+++ b/modules/troubleshooting/pages/log-files.adoc
@@ -1,27 +1,39 @@
= Log Files

-The TigerGraph database captures key information on activities occurring across its different components through log functions that output to log files.
-These log files are not only helpful in xref:troubleshooting-guide.adoc[troubleshooting] but also serve as a resource for auditing.
+TigerGraph captures key information about activities across its components through log files. These logs are essential for xref:troubleshooting:troubleshooting-guide.adoc[troubleshooting] and auditing.
+Logs may contain sensitive information, so direct access is restricted.

-This page provides a general overview of the way log files are stored in TigerGraph.
+This page provides an overview of the log files available in TigerGraph, including where to find them, how they are stored, and what information they contain.
-For information and examples specific to logging of RESTPP requests, queries, and user management tasks, see xref:service-log-tracking.adoc[]. +xref:audit-log.adoc[Audit logs] are structured in JSON, ensuring machine-readability and facilitating easy integration with third-party tools. +TigerGraph Linux admin users also use the xref:gcollect.adoc[gcollect] utility to search for and gather selected information from the logs. +We also provide instructions on how to +xref:elk-filebeat.adoc[set up log viewing with Elasticsearch, Kibana, or Filebeat]. -== TigerGraph log structure +== Available Log Files + +TigerGraph generates a variety of log files for its different components. +Understanding what logs are available and what they contain is the first step in effective troubleshooting and system monitoring. + +=== Log File Locations + +Logs in TigerGraph are stored in the log root directory, which is configured at install time. You can find this location by running: -Logs in TigerGraph are stored in TigerGraph's log root directory, which is configured at install time. -You can find the location by running the console command `gadmin config get System.LogRoot`. +[source,console] +---- +gadmin config get System.LogRoot +---- -Within this directory are separate directories for the various TigerGraph services: +Within this directory, you will find subdirectories for each TigerGraph component (admin, gpe, gsql, gui, kafka, nginx, zk, etc.). [source,console] ---- $ ls /home/tigergraph/tigergraph/log -admin dict executor gpe gsql informant kafkaconn nginx ts3 zk -controller etcd fileLoader gse gui kafka kafkastrm-ll restpp ts3serv +admin dict executor gpe gsql informant kafkaconn nginx zk +controller etcd fileLoader gse gui kafka kafkastrm-ll restpp ---- -You can also use the `gadmin log` command to list log files: +Use the `gadmin log` command to list log files: [source, console] ---- @@ -31,14 +43,11 @@ ADMIN : /home/tigergraph/tigergraph/log/admin/ADMIN.INFO CTRL : /home/tigergraph/tigergraph/log/controller/CTRL#1.log CTRL : /home/tigergraph/tigergraph/log/controller/CTRL#1.out ... -TS3 : /home/tigergraph/tigergraph/log/ts3/TS3_1.log -TS3 : /home/tigergraph/tigergraph/log/ts3/TS3_1.out -TS3SERV: /home/tigergraph/tigergraph/log/ts3serv/TS3SERV#1.out ZK : /home/tigergraph/tigergraph/log/zk/ZK#1.out ZK : /home/tigergraph/tigergraph/log/zk/zookeeper.log ---- -Use the command `gadmin log ` to just get the logs for a specific service: +Use the command `gadmin log ` to get the logs for a specific service: [source, console] ---- @@ -47,43 +56,17 @@ GPE : /home/tigergraph/tigergraph/log/gpe/GPE_1#1.out GPE : /home/tigergraph/tigergraph/log/gpe/log.INFO ---- -The `log.INFO` file contains messages logged by the application code. -The `.out` log contains the redirection of the process output, and is used for debugging significantly less frequently than `log.INFO`. - -[CAUTION] -The log format differs between the `.out` and `INFO` logs. -It also differs between certain TigerGraph services. -An internal project to unify log formats is ongoing. - -Log formats also vary across the different components. -In folders where logs are checked often, such as `restpp`, `gsql`, and `admin`, there are symbolic links that help you quickly get to the most recent log file of that category: - -* `log.INFO` -** Contains regular output and errors -* `log.ERROR` -** Contains errors only -* `.out` -** Contains all output from the component process. Current `.out` logs have the form `.out`. 
-Historical logs have the form `-old-YYYY-MM-DDTHH-MM-SS.fff.out`
-
-* `log.WARNING` or `log.DEBUG`
-** `log.WARNING` contains warnings and all error level messages
-* `log.FATAL`
-** Contains outputs for any fatal level events
-
-[NOTE]
-All services do not create a `log.DEBUG` file by default.
-To change this, modify the parameter `.BasicConfig.LogConfig.LogLevel`.
-For example, `GSQL.BasicConfig.LogConfig.LogLevel`. See xref:reference:configuration-parameters.adoc[] for more information.
-
-== Log locations on a cluster
+Third-party components like Zookeeper and Kafka have logs that are not listed by `gadmin log`. You can find them at:

-In a TigerGraph cluster, each node only keeps logs of activities that took place on the node itself.
-For example, the GSQL logs on the m1 node only record events for m1 and are not replicated across the cluster.
+[source,console]
+----
+zookeeper : ~/tigergraph/zk/zookeeper.out.*
+kafka : ~/tigergraph/kafka/kafka.out
+----

-For GSQL specifically, the cluster elects a leader to which all GSQL requests will be forwarded.
-To check which node is the leader, start by checking the GSQL logs of the m1 node.
-Check the most recent lines of `log.INFO` and look for lines containing information about a leader switch.
+In a *TigerGraph cluster*, each node maintains logs only for the activities that occur on that node. Logs are not automatically replicated across nodes.
+For example, the GSQL logs on the m1 node reflect only the operations performed on m1.
+To determine which node is currently the GSQL leader, check the most recent `log.INFO` file on m1.

For example, the logs below recorded a GSQL leader switch from m2 to m1:

@@ -97,18 +80,83 @@ I@20210709 13:56:52.220 (SessionManager.java:204) All sessions aborted.
I@20210709 13:56:52.224 (GsqlHAHandler.java:283) switched to new leader m1
----

+=== TigerGraph Component Log Files
+
+* `.out` files capture *standard output (stdout)* and log runtime information, including error stack traces when services crash or unexpected errors occur.
+These logs are especially useful for errors that aren't logged by the service's internal logging mechanism.
+
+* `.ERROR` files are used to log errors captured by the system, typically from exceptions caught in try-catch blocks. If an error occurs before the logging system initializes or is uncaught, it is logged in the `.out` file instead.
+
+* `.INFO` files log regular operational information about the system's normal functioning.
+
+To diagnose an issue for a given component, check the `.out` log file for that component.
+
+image::log-file-list.png[]
+
+[NOTE]
+====
+The GUI component writes all log levels to a single log file and does not generate separate `.log`, `.error`, or `.info` files.
+
+* Each GUI log file (e.g., `gui_ADMIN.log`, `gui_INFO.log`) captures the standard output of the GUI process and includes all log levels (error, warning, info).
+* The log level for the GUI component is *configurable*. You can set it using:
+
+[source,console]
+----
+gadmin config set GUI.BasicConfig.LogConfig.LogLevel <log_level>
+----
+
+Replace `<log_level>` with one of: `DEBUG`, `INFO`, `WARN`, `ERROR`, `PANIC`, or `FATAL`. The default level is `INFO`.
+====
+
+==== Symbolic Links
+
+In directories with frequently checked logs, such as `restpp`, `gsql`, and `admin`, symbolic links make it easier to access the latest log file.
+These links are automatically updated to point to the newest log.
+
+For example, `log.INFO` is a symbolic link that points to the current `.INFO` log file.
To see what a symbolic link points to, use `ls -ll` followed by the symbolic link name:
+
+[source,console]
+----
+ls -ll log.INFO
+log.INFO -> log.INFO.2024-07-01-10-00-00
+----
+
+Here, `log.INFO` is a symbolic link pointing to the current `.INFO` log file.

-== Open source TigerGraph components
+=== Third-Party Component Log Files

-The open source components that TigerGraph includes (Kafka, Nginx, ZooKeeper, Kafkaconn, Kafkastream) follow their respective logging behavior instead of having an `INFO/WARNING/ERROR` log, in addition to having an `.out` file for process output redirection.
-For example, the Kafka logs have a `controller.log`, `kafka.log`, `kafka-request.log`, `state-change.log`, and `server.log`.
+TigerGraph uses several open-source components (such as Kafka, Nginx, ZooKeeper, Kafkaconn, Kafkastream) that maintain their own log conventions.
+
+* *NGINX Logs:* The NGINX log files (e.g., `nginx.out`, `nginx.error.log`, `nginx.access.log`) are generated directly by the NGINX web server itself and are not internal TigerGraph component logs.
+
+* *Kafka Logs:* Kafka logs include `controller.log`, `kafka.log`, `kafka-request.log`, `state-change.log`, and `server.log`.
+
+* *ZooKeeper Logs:* ZooKeeper logs are typically found as `zookeeper.out.*` in the ZooKeeper directory.
+
+== TigerGraph log structure
+
+[CAUTION]
+====
+Log formats may differ between `.out` and `.INFO` logs and between different TigerGraph services.
+====
+
+* `log.INFO`: Contains regular output and errors.
+* `log.ERROR`: Contains errors only.
+* `.out`: Contains all output from the component process. Current `.out` logs have the form `<process_name>.out`. Historical logs have the form `<process_name>-old-YYYY-MM-DDTHH-MM-SS.fff.out`.
+* `log.WARNING` or `log.DEBUG`
+** `log.WARNING` contains warnings and all error-level messages.
+** `log.DEBUG` contains debug-level messages (not created by default).
+* `log.FATAL`: Contains outputs for any fatal-level events.
+
+[NOTE]
+====
+Services do not create a `log.DEBUG` file by default.
+To change this, modify the parameter `<service_name>.BasicConfig.LogConfig.LogLevel`.
+For example, `GSQL.BasicConfig.LogConfig.LogLevel`. See xref:reference:configuration-parameters.adoc[Configuration Parameters] for more information.
+====

== Log rotation

TigerGraph also handles log rotation.
-When the log is rotated, the log.LEVEL symlink is updated to point to the newest log.
-The default configuration is to rotate under any of the following circumstances:
-
-* Log file max size exceeds 100mb
-* Log is older than 90 days
-* There are more than 100 files for that service
+When a log is rotated, the symlink (e.g., `log.INFO`) is updated to point to the newest log file.
+Logs are rotated when the file size exceeds *100 MB*, the log is older than *90 days*, or more than *100 files* exist for that service.
diff --git a/modules/troubleshooting/pages/service-log-tracking.adoc b/modules/troubleshooting/pages/service-log-tracking.adoc
deleted file mode 100644
index e429fa6e..00000000
--- a/modules/troubleshooting/pages/service-log-tracking.adoc
+++ /dev/null
@@ -1,162 +0,0 @@
-= Service Logs Tracking
-
-This guide describes how to obtain log information from the service logs.
-
-Service logs track information about user actions such as the action taken, the user permissions, the client used, and so on.
-
-Logging in the same files can also be found for login, and all other requests sent to any of these services.
- -== REST endpoint request history - -All requests made to TigerGraph's REST endpoints are recorded by the RESTPP logs and Nginx logs. Information available in the logs includes: - -* Timestamp of the request -* API request parameters -* Request Status -* User information (when RESTPP authentication is turned on) - -RESTPP is responsible for many tasks in the TigerGraph internal architecture and records many internal API calls, which can be hard to distinguish from manual requests. When xref:user-access:enabling-user-authentication.adoc#_enable_restpp_authentication[RESTPP authentication is on], the RESTPP log will record the user information and mark a call if it is made by an internal API. Therefore, you can use the command below to filter for manual requests: - - -[source, console] ----- -# In the restpp log directory -$ grep -i "requestinfo" log.INFO | grep -v "__INTERNAL_API__" - -# All requests exluding the ones made by internal API -I0315 21:11:59.666318 14535 handler.cpp:351] RequestInfo|,1.RESTPP_1_1.1615842719666.N,NNN,0,0,0|user:tigergraph|api:v2|function:NoSchema|graph_name:social|libudf: -I0315 21:41:36.462616 14541 handler.cpp:351] RequestInfo|,196622.RESTPP_1_1.1615844496462.N,NNN,0,0,0|user:tigergraph|api:v2|function:NoSchema|graph_name:social|libudf: ----- - -`RequestInfo` contains the ID of the request, which you can use to look up more information on the request : - -image::image (75).png[Screenshot showing the ID of the request highlighted in the console.] - -Here is an example of using a request ID to look up a request in the restpp log: - -[source, console] ----- -$ grep "1615842719666" log.INFO - -# Returns all information about the specific request -# RawRequest log is captured at the entry point of a query -I0315 21:11:59.666026 14535 handler.cpp:285] RawRequest|,1.RESTPP_1_1.1615842719666.N,NNN,0,0,0|GET|/echo?parameter1=parameter_value|async = 0|payload_data.size() = 0|api = v2 -# RequestInfo log is captured after the request has been parsed, -# and contains information such as username and the function or UDF to run -I0315 21:11:59.666318 14535 handler.cpp:351] RequestInfo|,1.RESTPP_1_1.1615842719666.N,NNN,0,0,0|user:tigergraph|api:v2|function:NoSchema|graph_name:social|libudf: -# ReturnResult is captured when the request has been processed -I0315 21:11:59.666509 14535 requestrecord.cpp:325] ReturnResult|0|0ms|RESTPP|1.RESTPP_1_1.1615842719666.N|user:tigergraph|/echo|graph_id=1&graph_name=social¶meter1=parameter_value|39 ----- - -== Query execution via RESTPP - -`NGINX#1.out` log contains the endpoint accessed, the IP, and the client. -RESTPP `log.INFO` contains the query run, user who ran it, and the graph it ran against. - -In a cluster, the log for the node that processed the request contains the details. -For example, if the request was routed to m3, the log would be in `NGINX#3.out` on m3 or RESTPP `log.INFO` on m3. 
- -Example query: -[source, console] ----- -curl -H "Authorization: Bearer 4m6nh1rakn60430rjf5asv" :9000/query/ldbc_snb/example_query ----- -Nginx logs: - -[source, console] ----- - - - [30/Jun/2022:01:37:52 +0000] "GET /query/ldbc_snb/example_query HTTP/1.1" 200 117 "-" "curl/7.29.0" ----- - -RESTPP logs: - -[source, console] ----- -I0630 01:41:29.779425 29409 handler.cpp:312] RawRequest|,131076.RESTPP_1_1.1656553289779.N,NNN,0,0,0,S|GET|/query/ldbc_snb/example_query|async = 0|payload_data.size() = 0|api = v2 -I0630 01:41:29.779491 29409 handler.cpp:434] RequestInfo|,131076.RESTPP_1_1.1656553289779.N,NNN,0,0,0,S|user:example_user|api:v2|function:queryDispatcher|graph_name:ldbc_snb|libudf:libudf_ldbc_snb -I0630 01:41:29.783354 29410 requestrecord.cpp:349] ReturnResult|0|4ms|GPE_1_1|131076.RESTPP_1_1.1656553289779.N|user:example_user|/query/ldbc_snb/example_query|graph_id=1|117 ----- - -== Query execution via GraphStudio - -Nginx `NGINX#1.out` log contains the endpoint accessed, the IP from where it was accessed, and the client. -GraphStudio (GUI) `GUI#1.out` contains the query run, user who ran it, and the graph it ran against. - -=== Nginx logs -[source, console] ----- -136.2.2.2 - - [30/Jun/2022:01:44:58 +0000] "GET /api/restpp/query/ldbc_snb/example_query HTTP/1.1" 202 67 "http://35.2.2.2:14240/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0" ----- - -=== GraphStudio logs -[source, console] ----- -2022-06-30 01:46:28.911 I | middleware/logger.go:63] <-- | | | 8d4b9219-0733-4413-8dd2-e4765a7e08da | example_user | 136.2.2.2 | GET /api/restpp/query/ldbc_snb/example_query -2022-06-30 01:46:28.922 I | middleware/logger.go:96] --> | 202 | 11.297188ms | 8d4b9219-0733-4413-8dd2-e4765a7e08da | example_user | 136.2.2.2 | GET /api/restpp/query/ldbc_snb/example_query ----- - -== Query execution via GSQL CLI - -The `NGINX#1.out` log contains the endpoint accessed, the IP from where it was accessed, and the client. -GSQL `log.INFO` contains all commands run during the client session, including queries run, and graph queries ran against. - -[NOTE] -In a cluster, the GSQL request will be served by the current GSQL leader which could be any node running the GSQL service. - - -=== Nginx logs -[source, console] ----- -10.2.2.2 - - [30/Jun/2022:01:51:07 +0000] "POST /query/ldbc_snb/example_query HTTP/1.1" 200 117 "-" "Apache-HttpClient/4.5.13 (Java/11.0.10)" ----- - -=== GSQL logs -Leader: -[source, console] ----- -I@20220630 01:58:37.050 example_user|localhost:50366|00000003140 (QueryRunner.java:87) RunQuery: http://10.2.2.3:9000/query/ldbc_snb/example_query -I@20220630 01:58:37.050 example_user|localhost:50366|00000003140 (GlobalCatalog.java:769) use 4m6****asv to authenticate graph ldbc_snb -I@20220630 01:58:37.066 example_user|localhost:50366|00000003140 (CommandHandler.java:125) (Succeeded) _GSQL_CMD_DDL: run query example_query(...) ----- - -Follower: -[source, console] ----- -I@20220630 01:52:02.256 (GsqlHAHandler.java:359) 2430|Forward request to http://10.2.2.3:14240/gsqlserver/gsql/command -I@20220630 01:52:02.269 (GsqlHAHandler.java:464) 2430|Forward request finish http://10.2.2.3:14240/gsqlserver/gsql/command ----- - - - -== Monitor user management tasks - -User management activities, such as logins, role and privilege changes are recorded in the GSQL logs in the folder `gsql`. - -To view recent activities, use the symlink `log.INFO`. 
-To filter for information that you need, you can use Linux commands such as https://linuxcommand.org/lc3_man_pages/grep1.html[`grep`] and http://linuxcommand.org/lc3_man_pages/tail1.html[`tail`]. - -For example, to view recent changes in roles, you can run the following command in the `gsql` log directory: - -[source, console] ----- -$ grep -i "role" log.INFO - -# Returns all lines containing the word "role" -# username source IP -I@20210312 22:41:16.167 tigergraph|127.0.0.1:45854|00000000077 (BaseHandler.java:133) Received|POST|/gsql/roles?action=grant&role=globaldesigner&name=lennessy|0 -I@20210312 22:41:16.863 tigergraph|127.0.0.1:45854|00000000077 (BaseHandler.java:167) Successful|POST|/gsql/roles?action=grant&role=globaldesigner&name=lennessy|application/json; charset=UTF-8|696ms ----- - -To view login activities, search `log.INFO` for `"login"` instead. - -[source, console] ----- -$ grep -i "login" log.INFO - -# Returns all lines containing the world "login" -I@20210315 21:08:42.047 tigergraph|127.0.0.1:53960|00000000001 (BaseHandler.java:133) Received|POST|/gsql/login|28 -I@20210315 21:08:42.061 tigergraph|127.0.0.1:53960|00000000001 (LoginHandler.java:52) The gsql client is started on the server, and the working directory -is /home/tigergraph/tigergraph/log/restpp -I@20210315 21:08:42.072 tigergraph|127.0.0.1:53960|00000000001 (LoginHandler.java:80) Successful|Login|tigergraph -I@20210315 21:08:42.080 tigergraph|127.0.0.1:53960|00000000001 (BaseHandler.java:167) Successful|POST|/gsql/login|application/json; charset=UTF-8|35ms ----- \ No newline at end of file diff --git a/modules/troubleshooting/pages/system-administration-faqs.adoc b/modules/troubleshooting/pages/system-administration-faqs.adoc index 3163c9c4..7ef44dc9 100644 --- a/modules/troubleshooting/pages/system-administration-faqs.adoc +++ b/modules/troubleshooting/pages/system-administration-faqs.adoc @@ -1,5 +1,6 @@ = System Administration FAQs +== General [discrete] === How do I apply or update my license key? @@ -121,18 +122,18 @@ gadmin config set ---- [discrete] -=== How do I backup my data? +=== How do I back up my data? -*GBAR* is the utility to do backup and restore of TigerGraph system. Before a backup, GBAR needs to be configured. Please see xref:backup-and-restore:index.adoc[GBAR - Graph Backup and Restore] for details. +*gadmin backup* is the utility to do backup and restore of TigerGraph system. Before a backup, the backup operations need to be configured. Please see xref:backup-and-restore:index.adoc[] for details. -To backup the current system: +To back up the current system: [source,text] ---- -gbar backup -t +gadmin backup create ---- -Please be advised that GBAR only backs up data and configuration. No logs or binaries will be backed up. +Please be advised that `backup` only backs up data and configuration. No logs or binaries will be backed up. [discrete] === How do I restore a backup? @@ -141,7 +142,7 @@ To restore an existing backup: [source,text] ---- -gbar restore +gadmin backup restore ---- Please be advised that running restore will STOP the service and ERASE existing data. diff --git a/modules/troubleshooting/pages/troubleshooting-guide.adoc b/modules/troubleshooting/pages/troubleshooting-guide.adoc index 0f3e83c8..e2bc13aa 100644 --- a/modules/troubleshooting/pages/troubleshooting-guide.adoc +++ b/modules/troubleshooting/pages/troubleshooting-guide.adoc @@ -7,9 +7,10 @@ This troubleshooting guide is only up to date for v2.6 and below. + Additional guidance for v3.0+ is in development. 
====

-== Introduction
+The Troubleshooting Guide teaches you how to monitor the status of your TigerGraph system, and when needed, find the log files in order to get a better understanding of why certain errors are occurring. This guide covers log file debugging for data loading and querying.
+
+For an overview of available log files, their locations, and what information they contain, please refer to the xref:troubleshooting:log-files.adoc#_available_log_files[Log Files] documentation page.

-The Troubleshooting Guide teaches you how to monitor the status of your TigerGraph system, and when needed, find the log files in order to get a better understanding of why certain errors are occurring. This section covers log file debugging for data loading and querying.

== General

@@ -31,36 +32,6 @@ $ grun all "date"

(Make sure the time across all nodes are synchronized with time difference under 2 seconds. )
----

-=== Location of Log Files
-
-The following command reveals the location of the log files:
-
-[source,console]
-----
-gadmin log
-----
-
-You will be presented with a list of log files. The left side of the resulting file paths is the component for which the respective log file is logging information.
-The majority of the time, these files will contain what you are looking for. You may notice that there are multiple files for each TigerGraph component.
-
-[NOTE]
-====
-The .out file extension is for errors. +
-The .INFO file extension is for normal behaviors.
-====
-
-In order to diagnose an issue for a given component, you'll want to check the .out log file extension for that component.
-
-image::https://lh5.googleusercontent.com/6MnNakec5fKh5faCoWdZwfzprqXyguDZXt15nz0QAG1M3vW1t0nmwo7oYr3DgwVsgJoIEjub-5tSA81UtOQ-Ot-9m30zZ9Zr5tRG077dgfZ7KaE3tMMafUK63oi6fILQeM-kQw6fKqc[]
-
-Other log files that are not listed by the *`gadmin log`* command are those for Zookeeper and Kafka, which can be found here:
-
-[source,console]
-----
-zookeeper : ~/tigergraph/zk/zookeeper.out.*
-kafka : ~/tigergraph/kafka/kafka.out
-----
-
=== Synchronize time across nodes in a cluster

TigerGraph will experience a variety of issues if clocks across different nodes in a cluster are out of sync. If running `grun all "date"` shows that the clocks are out of sync, it is highly recommended that you install NTP implementations such as https://chrony.tuxfamily.org/index.html[`chrony`] or http://manpages.ubuntu.com/manpages/xenial/man8/systemd-timesyncd.service.8.html[`timesyncd`] to keep them in sync.

@@ -121,8 +92,9 @@ From calling a query to returning the result, here is how the information flows:

grep <request_id> /home/tigergraph/tigergraph/logs/nginx/ngingx_1.access.log
----

-image::https://lh5.googleusercontent.com/n2ehgN21jrvzuXUj2JJmg-xxqwj4o7Dlc_f1oPAAJDAmxv-1KLwyblrfS4FkWq2vOwHsAVbxYYt2qI_EcG9e4sWEGOSvVjCXtb5yFObazfzBEnVh5juPICoUA2Rc0iseifiCHfDllrE[]
+image::terminal-nginx-log-grep.png[]

+[start=2]
. Nginx sends the request to Restpp.

[source,console]
----
@@ -150,6 +122,7 @@ grep /home/tigergraph/tigergraph/logs/GSE_1_1/log.INFO

image::screen-shot-2020-03-24-at-5.22.23-pm.png[]

+[start=3]
. Restpp sends the result back to Nginx.
[source,console] @@ -287,7 +261,7 @@ System_GSystem|GSystemWatcher|Health|ProcMaxGB|0|ProcAlertGB|0| CurrentGB|1|SysMinFreePct|10|SysAlertFreePct|30|FreePct|69 ---- -When free memory drops below 10 percent (`SysMinFreePct`), all queries are aborted. This threshold is adjustable through xref:tigergraph-server:system-management:management-with-gadmin.adoc[`gadmin config`]. +When free memory drops below 10 percent (`SysMinFreePct`), all queries are aborted. This threshold is adjustable through xref:system-management:management-with-gadmin.adoc[`gadmin config`]. ==== *How to retrieve information on queries aborted due to memory usage* @@ -520,13 +494,13 @@ If your system has unexpectedly high memory usage, here are possible causes: If your browser crashes or freezes (shown below), please refresh your browser. -image::https://lh6.googleusercontent.com/3vmIx6BF3S0YuwLQ9-PrKip5c-Bh15NymmAlGh83cILcMGu7v3wzc23cnMlKAlSuFDjz7ZOGmhg82wUZgeIlG7xb1F0OC6yhstBQEcmRN3rl95O_s1qoGbwiqnaczvg1Y63DTDbYtN4[] +image::browser-aw-snap-error.png[] === GraphStudio Crash If you suspect GraphStudio has crashed, first run `gadmin status` to verify all the components are in good shape. Two known causes of GraphStudio crashes are: -* *Huge JSON response* User-written queries can return very large JSON responses. If GraphStudio often crashes on large query responses, you can try reducing the size limit for JSON responses by changing the `GUI.RESTPPResponseMaxSizeBytes` configuration using xref:tigergraph-server:system-management:management-with-gadmin.adoc[`gadmin config`]. The default limit is 33554432 bytes. +* *Huge JSON response* User-written queries can return very large JSON responses. If GraphStudio often crashes on large query responses, you can try reducing the size limit for JSON responses by changing the `GUI.RESTPPResponseMaxSizeBytes` configuration using xref:system-management:management-with-gadmin.adoc[`gadmin config`]. The default limit is 33554432 bytes. [source, console] ---- @@ -552,9 +526,9 @@ VIS : /home/tigergraph/tigergraph/logs/gui/gui_INFO.log Allowing GraphStudio DEBUG mode will print out more information to the log files. To allow DEBUG mode, please edit the following file : `/home/tigergraph/tigergraph/visualization/server/src/config/local.json` -image::https://lh3.googleusercontent.com/pVTzOYUGWao0YuAjKYr_r1tQNQ9y1zknf8txPThPNJm0nyTaBDok3kBvJ8a3RS2Dr7GnGPcX3HrKu47fbKfPuPWOqjvy12CkXCdYYZLrNvNtjCczwqJayk-QxXTuC5vZ72OSx3KE6BE[] +image::config-enable-debug.png[] -image::https://lh5.googleusercontent.com/VQiOsJ1ez9s21h9QxtwqEAEbI28f6RNFlYt7UqCyVjKHfr2xgi9YbvksZYR1HETttrSLaFPr25FiP995ZRRSPdvb-UH8pjn2yp4w-8ODMpcvS52n1U3VoI70nFE5l0j1kelQRm6_hlI[] +image::config-loglevel-debug.png[] After editing the file, run `gadmin restart gui -y` to restart the GraphStudio service. Follow along the log file to see what is happening : `tail -f /home/tigergraph/tigergraph/logs/gui/gui_INFO.log` @@ -562,7 +536,7 @@ Repeat the error-inducing operations in GraphStudio and view the logs. ==== Known Issues -There is a list of known GraphStudio issues xref:gui:graphstudio:known-issues.adoc[here]. +There is a list of known GraphStudio issues xref:{page-component-version}@gui:graphstudio:known-issues.adoc[here]. 
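As a side note on the `GUI.RESTPPResponseMaxSizeBytes` setting discussed above under GraphStudio Crash: here is a minimal sketch of how such a change is typically applied with `gadmin`, following the set/apply/restart workflow used elsewhere in these docs. The 16 MB value is purely illustrative, not a recommendation.

[source,console]
----
# Halve the limit from the 33554432-byte (32 MB) default; pick a value that fits your workload
$ gadmin config set GUI.RESTPPResponseMaxSizeBytes 16777216
$ gadmin config apply -y
$ gadmin restart gui -y
----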
== Further Debugging @@ -662,4 +636,4 @@ else echo; echo "Support Collection has been saved as $supportdir.tar.xz" fi ---- -==== \ No newline at end of file +==== diff --git a/modules/user-access/pages/access-control-model.adoc b/modules/user-access/pages/access-control-model.adoc index 1ad4375b..ebeb814e 100644 --- a/modules/user-access/pages/access-control-model.adoc +++ b/modules/user-access/pages/access-control-model.adoc @@ -25,7 +25,7 @@ When a privilege is assigned to a role, it allows users with the role to perform For example, the privilege `READ_SCHEMA` on graph `social` gives a user read permission to the schema of the graph `social`. This allows the user to run commands such as `ls` and `SHOW VERTEX` on the graph. -To view a complete list of privileges available in TigerGraph and the commands they enable a user to run, see xref:reference:list-of-privileges.adoc[List of Privileges]. +To view a complete list of privileges available in TigerGraph and the commands they enable a user to run, see the xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[].

=== Privilege scopes

@@ -77,7 +77,8 @@ Even regarding a single REST endpoint, whether a request is authorized depends o a|* If granted on a vertex type attribute, it gives permission to create vertices, but only specify values for attributes where the user has `CREATE_DATA` privilege if the user also has `UPDATE_DATA` privilege on all attributes of that type. ** The user must have `CREATE_DATA` privilege on the primary ID of the vertex type to be able to create vertices. * If granted on an edge type attribute, it gives permission to create edges, but only specify values for attributes where the user has `CREATE_DATA` privilege if the user also has `UPDATE_DATA` privilege on all attributes of that type. -* For attributes where the user doesn't have privilege, they must use wildcards(`_`) to use the default value for vertices/edges created by the xref:gsql-ref:querying:data-modification-statements.adoc#_insert_into_statement[`INSERT INTO` statements]. +* For attributes where the user doesn't have privilege, they must use wildcards (`_`) to use the default value for vertices/edges created by the xref:{page-component-version}@gsql-ref:querying:data-modification-statements.adoc#_insert_into_statement[`INSERT INTO` statements]. +* To create an edge, the user must also have `CREATE_DATA` privilege on the primary ID attributes of its source and target vertex types.

|`READ_DATA` |Permission to access all data of the type where the privilege is granted.

@@ -85,19 +86,43 @@ a|* If granted on a vertex type attribute, it gives permission to create vertice To grant `READ_DATA` to a specific attribute of a type, you must grant `READ_DATA` to the primary key of the type first or in the same command.

-For edges, you must grant `READ_DATA` to the primary key of the `FROM` and `TO`
-vertex types before granting `READ_DATA` to other attributes of the edge type.
+To read an edge or its attributes, the user must also have `READ_DATA` privilege on the primary ID attributes of its `FROM` and `TO` vertex types.
+

|`UPDATE_DATA` |Permission to update all data of the type where the privilege is granted. |Permission to update the attribute value where the privilege is granted.

`UPDATE_DATA` on all attributes is also required for creating new vertices and edges. +To update an edge, the user must also have `UPDATE_DATA` privilege on the primary ID attributes of its source and target vertex types.
|`DELETE_DATA` |Permission to delete data of the type where the privilege is granted. |N/A. This privilege is not applicable on the attribute level. +To delete an edge, the user must also have `DELETE_DATA` privilege on the primary ID attributes of its source and target vertex types. |===

+==== Creating Edges
+
+When creating an edge, the user must have `CREATE_DATA` privilege on the primary ID attributes of both the source and target vertex types, in addition to `CREATE_DATA` on the edge type itself.
+
+The same principle applies to all edge operations: to create, read, update, or delete an edge, the user must also have the corresponding privilege (`CREATE_DATA`, `READ_DATA`, `UPDATE_DATA`, `DELETE_DATA`) on the primary ID attributes of the edge’s source and target vertex types.
+
+Creating an edge references specific vertices by their IDs, and the system requires permission on those IDs to validate them. Without it, the system returns a permission error, even if the user has `CREATE_DATA` on the edge type itself.
+
+Consider the following `INSERT INTO` statement:
+
+[.wrap,gsql]
+----
+INSERT INTO HAS_INTEREST1 VALUES (1 Person1, 2 Tag1);
+----
+
+To successfully run this command, the user must have:
+
+* `CREATE_DATA` on the edge type `HAS_INTEREST1`
+* `CREATE_DATA` on the ID attribute of vertex types `Person1` and `Tag1`
+
+Otherwise, the query fails with a missing privilege error on the vertex type.
+
==== Examples Suppose we have a graph with schema as below: @@ -273,7 +298,7 @@ The following table details the built-in roles and their corresponding set of pr [NOTE] ==== -For Row Policy related Built-in roles see xref:tigergraph-server:user-access:rbac-row-policy/rbac-row-policy.adoc#_built-in-roles[Row Policy built-in role changes] +For Row Policy related Built-in roles see xref:rbac-row-policy/rbac-row-policy.adoc#_built-in-roles[Row Policy built-in role changes] ==== === User-defined roles diff --git a/modules/user-access/pages/enabling-user-authentication.adoc b/modules/user-access/pages/enabling-user-authentication.adoc index 72dc3d8a..407fd559 100644 --- a/modules/user-access/pages/enabling-user-authentication.adoc +++ b/modules/user-access/pages/enabling-user-authentication.adoc @@ -75,5 +75,5 @@ $ gadmin config apply $ gadmin restart restpp nginx gui gsql -y ---- -After enabling user authentication, the xref:tigergraph-server:API:built-in-endpoints.adoc#_request_a_token[`/requesttoken` endpoint] becomes available for you to generate tokens used to authenticate your REST requests to the REST++ server. +After enabling user authentication, the xref:API:built-in-endpoints.adoc#_request_a_token[`/requesttoken` endpoint] becomes available for you to generate tokens used to authenticate your REST requests to the REST++ server. diff --git a/modules/user-access/pages/jwt-token.adoc b/modules/user-access/pages/jwt-token.adoc index 86a8f52d..ea200523 100644 --- a/modules/user-access/pages/jwt-token.adoc +++ b/modules/user-access/pages/jwt-token.adoc @@ -4,7 +4,16 @@ OpenID Connect (OIDC) token-based authentication in JSON web token (JWT) format, Helping to secure users and their data. == OIDC JWT Authentication in TigerGraph -Some basic understanding of TigerGraph’s user access through xref:tigergraph-server:user-access:enabling-user-authentication.adoc[] and xref:tigergraph-server:API:built-in-endpoints.adoc#_authentication[Built-in API Endpoints] is recommended before continuing.
+ +[NOTE] +==== +* In versions 3.10 and 3.11, JWT token support is limited to RESTPP endpoints and only for third-party JWT tokens. +* GSQL server endpoints do not support JWT tokens in these versions. +* Internally generated JWT tokens are also not supported in these versions. +* Version 4.1.2 adds support for internally-generated JWT tokens and use with GSQL server endpoints, removing these limitations. +==== + +Some basic understanding of TigerGraph’s user access through xref:enabling-user-authentication.adoc[] and xref:API:built-in-endpoints.adoc#_authentication[Built-in API Endpoints] is recommended before continuing. === What is OIDC? OpenID Connect (OIDC) serves as an identity layer integrated with the OAuth 2.0 framework. @@ -390,9 +399,14 @@ For testing purposes, we recommend using a tool such as https://dinochiesa.githu Here is an example of the generated token from the payload data. === Use JWT Token -Now that the JWT token is generated, its usage is the same as using a GSQL plain text token, allowing access to RESTPP endpoints. +Now that the JWT token is generated, its usage is the same as using a GSQL plain text token, allowing access to *RESTPP endpoints*. + +For versions 3.10 and 3.11, please note: + +* JWT tokens are *supported only on RESTPP endpoints*. +* *GSQL endpoints do not support JWT tokens*, regardless of whether they are third-party or internally generated. -For example, this is used to run the query {queryName} on the graph {graphName}: +For example, this is used to run the query `{queryName}` on the graph `{graphName}`: [console] ---- curl -s -H "Authorization: Bearer " -X GET http://127.0.0.1:9000/query/{graphName}/{queryName} @@ -424,7 +438,7 @@ When using a JWT token for authentication, please consider these scenarios and h === CA certificate -Users need to rely on a CA certificate (corresponding to the xref:tigergraph-server:reference:configuration-parameters.adoc#_environment_variables[environment variable] `SSL_CA_CERT`) to establish the connection with the URL being set. +Users need to rely on a CA certificate (corresponding to the xref:reference:configuration-parameters.adoc#_environment_variables[environment variable] `SSL_CA_CERT`) to establish the connection with the URL being set. This env config is only needed when the URL fails with the error log recorded in the RESTPP log file: [console] diff --git a/modules/user-access/pages/rbac-row-policy/rbac-row-policy.adoc b/modules/user-access/pages/rbac-row-policy/rbac-row-policy.adoc index 02faa5ce..27ab2630 100644 --- a/modules/user-access/pages/rbac-row-policy/rbac-row-policy.adoc +++ b/modules/user-access/pages/rbac-row-policy/rbac-row-policy.adoc @@ -3,7 +3,7 @@ == Prerequisites This guide assumes some basic understanding of TigerGraph's query language GSQL. -Refer to the xref:gsql-ref:tutorials:gsql-101/index.adoc[] and xref:gsql-ref:intro:index.adoc[] to get started using GSQL. +Refer to the xref:{page-component-version}@gsql-ref:tutorials:gsql-101/index.adoc[] and xref:{page-component-version}@gsql-ref:intro:index.adoc[] to get started using GSQL. == Key Concepts RBAC Row policy introduces a number of key concepts: @@ -111,8 +111,8 @@ Follow these rules to ensure it works seamlessly within row policies. * *Simplicity for Efficiency*: To keep things simple and optimized, GSQL Functions can only have one level of nested control flows. 
* *Calling Other Functions*: GSQL functions, can utilize some built-in functions: -** xref:gsql-ref:querying:func/string-functions.adoc[] -** xref:gsql-ref:querying:func/mathematical-functions.adoc[] +** xref:{page-component-version}@gsql-ref:querying:func/string-functions.adoc[] +** xref:{page-component-version}@gsql-ref:querying:func/mathematical-functions.adoc[] * *Context Function*: GSQL functions support new xref:#_context_functions[] like: ** xref:#_current_roles[] @@ -331,7 +331,7 @@ Here are some important terms and details for object-based privileges: They can be things like `GLOBAL`, `VERTEX`, `EDGE`, etc. * *Privilege Scopes*: Define where these privileges apply, like `GRAPH`, `PACKAGE`, or `GLOBAL`. -To see a complete list, as well as the xref:tigergraph-server:reference:list-of-privileges.adoc[legacy privilege syntax] that the object-base privilege relate to, go to the xref:rbac-row-policy/row-policy-privileges-table.adoc[]. +To see a complete list, as well as the xref:reference:list-of-privileges-legacy.adoc[legacy privilege keywords] that the object-based privileges correspond to, see the xref:rbac-row-policy/row-policy-privileges-table.adoc[]. === Privilege Commands @@ -417,21 +417,21 @@ WRITE_POLICY on graph level [NOTE] ==== -For a complete list of Built-in roles see xref:tigergraph-server:user-access:access-control-model.adoc#_built_in_roles[Built-in Roles] +For a complete list of Built-in roles see xref:access-control-model.adoc#_built_in_roles[Built-in Roles] ==== == Context Functions -xref:gsql-ref:querying:func/context-functions.adoc[] are a set of new built-in functions that provide insights into the user's information during their current session. +xref:{page-component-version}@gsql-ref:querying:func/context-functions.adoc[] are a set of new built-in functions that provide insights into the user's information during their current session. They offer valuable insights into user roles, making it easier to manage access and privileges within TigerGraph. They work in: `INSTALLED` queries, `INTERPRET` queries, and xref:#_gsql_functions[]. Before users can use Context Functions, they must enable REST++ authentication. If it's not enabled, users will see an error message. -To learn more about REST++ authentication see xref:tigergraph-server:API:authentication.adoc[REST API Authentication]. +To learn more about REST++ authentication see xref:API:authentication.adoc[REST API Authentication]. -Additionally, in order to use the context functions explicitly, ensure that the user holds the `READ_ROLE` privilege on the current graph, unless a xref:tigergraph-server:user-access:rbac-row-policy/setup-row-policy.adoc#_row_policy[Row Policy] already includes the context functions. +Additionally, in order to use the context functions explicitly, ensure that the user holds the `READ_ROLE` privilege on the current graph, unless a xref:rbac-row-policy/setup-row-policy.adoc#_row_policy[Row Policy] already includes the context functions. === current_roles() diff --git a/modules/user-access/pages/rbac-row-policy/row-policy-overview.adoc b/modules/user-access/pages/rbac-row-policy/row-policy-overview.adoc index e0da81c2..067e3442 100644 --- a/modules/user-access/pages/rbac-row-policy/row-policy-overview.adoc +++ b/modules/user-access/pages/rbac-row-policy/row-policy-overview.adoc @@ -13,13 +13,13 @@ Preview Features should not be used for production deployments. == User Guide The row policy user guide has two parts: -. 
xref:tigergraph-server:user-access:rbac-row-policy/rbac-row-policy.adoc[] - Learn the key concpets and features that make up row policy. -. xref:tigergraph-server:user-access:rbac-row-policy/setup-row-policy.adoc[] - Learn how to setup a basic row policy using an example dataset. +. xref:rbac-row-policy/rbac-row-policy.adoc[] - Learn the key concepts and features that make up row policy. +. xref:rbac-row-policy/setup-row-policy.adoc[] - Learn how to set up a basic row policy using an example dataset. -== xref:tigergraph-server:user-access:rbac-row-policy/row-policy-privileges-table.adoc[] +== xref:rbac-row-policy/row-policy-privileges-table.adoc[] Here you can find the Object-Based privilege tables for reference. -== xref:tigergraph-server:user-access:rbac-row-policy/row-policy-ebnf.adoc[Row Policy EBNF] +== xref:rbac-row-policy/row-policy-ebnf.adoc[Row Policy EBNF] Here you can find the row policy EBNF examples for reference. == Row Policy Limitations @@ -37,4 +37,4 @@ Here you can find the row policy EBNF examples for reference. ** Blueprint function `outdegree()` ** Commands that are related to built-in functions on graph, such as: *** `select count(*)` from `vertexType` -*** xref:tigergraph-server:API:built-in-endpoints.adoc[]. \ No newline at end of file +*** xref:API:built-in-endpoints.adoc[]. \ No newline at end of file diff --git a/modules/user-access/pages/rbac-row-policy/setup-row-policy.adoc b/modules/user-access/pages/rbac-row-policy/setup-row-policy.adoc index 71e507d8..b85f35c1 100644 --- a/modules/user-access/pages/rbac-row-policy/setup-row-policy.adoc +++ b/modules/user-access/pages/rbac-row-policy/setup-row-policy.adoc @@ -4,7 +4,7 @@ This guide assumes some basic understanding of xref:rbac-row-policy/rbac-row-policy.adoc[]. -Additionally, before going through the examples below it's a good idea to read through creating a user and assigning roles in our current documentation on xref:tigergraph-server:user-access:user-management.adoc[] and xref:tigergraph-server:user-access:role-management.adoc[]. +Additionally, before going through the examples below, it's a good idea to read through creating a user and assigning roles in our current documentation on xref:user-management.adoc[] and xref:role-management.adoc[]. Lastly, it helps to create users or have users ahead of time, one with the role of `superuser` and one with a role other than `superuser`, to switch between and see the row policies take effect. @@ -20,9 +20,9 @@ Here's what you need to know: === Creating Row Policies This guide will give an example of how to set a new row policy for graph `Social_Net`. -For the schema and data of graph `Social_Net`, refer to xref:gsql-ref:appendix:example-graphs.adoc[] and for more information on loading data and defining a schema in GSQL. +For the schema and data of graph `Social_Net`, refer to xref:{page-component-version}@gsql-ref:appendix:example-graphs.adoc[]. -See our documentation on xref:gsql-ref:tutorials:gsql-101/load-data-gsql-101.adoc[Load Data] and xref:gsql-ref:tutorials:gsql-101/define-a-schema.adoc[Define a Schema] in the xref:gsql-ref:tutorials:gsql-101/index.adoc[] tutorial. +For more information on loading data and defining a schema in GSQL, see our documentation on xref:{page-component-version}@gsql-ref:tutorials:gsql-101/load-data-gsql-101.adoc[Load Data] and xref:{page-component-version}@gsql-ref:tutorials:gsql-101/define-a-schema.adoc[Define a Schema] in the xref:{page-component-version}@gsql-ref:tutorials:gsql-101/index.adoc[] tutorial.
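Since the walkthrough that follows switches between a `superuser` account and a non-`superuser` account, here is a minimal, hypothetical GSQL sketch of that setup. The usernames `policy_admin` and `policy_tester` and the choice of the built-in `queryreader` role are illustrative only, not part of the official example.

[source,gsql]
----
GSQL > CREATE USER
// Follow the interactive prompts to set each username and password,
// creating e.g. policy_admin and policy_tester.
GSQL > GRANT ROLE superuser TO policy_admin
GSQL > GRANT ROLE queryreader ON GRAPH Social_Net TO policy_tester
----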
In this example, a row policy will be applied on the vertex `Person`, only the users who have the role, `superuser`, can see all vertices of type `Person`, other users can only access the vertices whose attribute `gender` is `Male`. @@ -222,7 +222,7 @@ Hence, when using "person2" as the parameter, this query will raise an exception === Applying Row Policies Beyond Queries Row policies also influence built-in functions and REST API endpoints. The behavior of these functions and endpoints depends on whether the user is authorized to access certain vertices. -For more information on REST API and REST API endpoints see the documentation on xref:tigergraph-server:API:index.adoc[] and xref:tigergraph-server:API:built-in-endpoints.adoc[]. +For more information on REST API and REST API endpoints see the documentation on xref:API:index.adoc[] and xref:API:built-in-endpoints.adoc[]. .Row polices can be applied to these endpoints: [cols="1", separator=¦ ] diff --git a/modules/user-access/pages/role-management.adoc b/modules/user-access/pages/role-management.adoc index a03fc0e9..a4690056 100644 --- a/modules/user-access/pages/role-management.adoc +++ b/modules/user-access/pages/role-management.adoc @@ -8,7 +8,7 @@ To see role management tasks under the Access Control List (ACL) model, see xref [NOTE] ==== -For Row Policy and Object-Based privilege EBNF examples see xref:tigergraph-server:user-access:rbac-row-policy/row-policy-ebnf.adoc[]. +For Row Policy and Object-Based privilege EBNF examples see xref:rbac-row-policy/row-policy-ebnf.adoc[]. ==== == Create a local role @@ -208,7 +208,7 @@ GSQL > GRANT PRIVILEGE WRITE_QUERY, WRITE_ROLE ON GRAPH example_graph TO role1 , role2 ---- -This will allow users with the roles `role1` and `role2` to edit and install queries, as well as modify roles on the graph `example_graph`. To see a full list of privileges and the command they allow users to run, see xref:reference:list-of-privileges.adoc[]. +This will allow users with the roles `role1` and `role2` to edit and install queries, as well as modify roles on the graph `example_graph`. To see a full list of privileges, see the xref:user-access:rbac-row-policy/row-policy-privileges-table.adoc[]. To grant xref:access-control-model.adoc#_access_control_lists[ACL privileges] to a role, see xref:acl-management.adoc#_grant_acl_privilege_to_a_role[Grant ACL privileges to a role]. diff --git a/modules/user-access/pages/sso-with-oidc.adoc b/modules/user-access/pages/sso-with-oidc.adoc new file mode 100644 index 00000000..9ab1253b --- /dev/null +++ b/modules/user-access/pages/sso-with-oidc.adoc @@ -0,0 +1,172 @@ += SSO with OIDC +:description: Instructions to set up single sign-on for TigerGraph using OpenID Connect (OIDC) with verified identity providers. + +== Overview +This guide demonstrates how to configure *PingFederate* as an *OIDC Identity Provider (IdP)* and *TigerGraph* as a *Service Provider (SP)*, enabling users to log in securely using *OpenID Connect (OIDC)* for *Single Sign-On (SSO)*. + +PingFederate is a federation server that supports identity management, SSO, and API security using various identity standards like *OAuth*, *SAML*, and *OpenID Connect (OIDC)*. We also have instructions for xref:user-access:sso-with-saml.adoc[SSO with SAML]. + +This guide will help you set up PingFederate for OIDC and integrate it with TigerGraph to provide a seamless login experience. 
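As a brief aside before the PingFederate prerequisites: the row-policy discussion above noted that built-in REST endpoints are filtered as well. Below is a hypothetical sketch, assuming the standard vertex-listing endpoint and the `Social_Net` example graph; `<token>` is a placeholder for a real bearer token. A non-`superuser` caller would only receive `Person` vertices whose `gender` is `Male`.

[source,console]
----
# Same endpoint, different callers: the row policy trims the result set per user
curl -s -H "Authorization: Bearer <token>" \
  -X GET "http://localhost:9000/graph/Social_Net/vertices/Person"
----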
+
+== Prerequisites
+
+Before proceeding, ensure you have the following:
+
+* PingFederate Version 12.1.0.4 or later is installed and running.
+* TigerGraph instance (v3.10 or later) accessible at `http://:14240`.
+* Administrative access to both PingFederate and TigerGraph servers.
+* A domain name to avoid SSL certificate issues when accessing PingFederate.
+
+== Install PingFederate
+
+Follow these steps to install PingFederate on your server (details on system requirements are available in the https://www.pingidentity.com/en/resources/downloads/pingfederate.html[PingFederate installation documentation]).
+
+. Download and install PingFederate by following the instructions on the PingFederate Download Page.
+. Set up Java and ensure ports 9999 (admin console), 9031 (runtime), and 443 (SSL) are open on PingFederate.
+. Access PingFederate using a web browser at `https://:9999/`.
+
+== Configure PingFederate for OIDC
+
+To configure PingFederate for OIDC, follow these steps:
+
+=== Add a User
+
+. Log in to the PingFederate Console.
+. Navigate to Administrative Accounts to add a new user.
+. Add a username, e.g., `test`. Add an email, e.g., `test1@.com`.
+
+=== Create Password Credential Validator (PCV)
+
+. In PingFederate, go to Password Credential Validator.
+. Create a SimplePCV instance and select “Simple Username Password Credential Validator” as the type.
+. Save the PCV.
+
+=== Create IDP Certificate
+
+. Navigate to “Signing & Decryption Keys & Certificates” in PingFederate.
+. Add a certificate for the *IDP Server* using the ``
+
+=== Add Access Token Manager
+
+. Navigate to *Applications > OAuth > Access Token Management*.
+. Click Create New Instance.
+. In the Type tab, input the name of the instance (e.g., `TigerGraphTokenManager`).
+. In the Session Validation section, enable the following options:
+* Select Include Session Identifier in Access Token.
+* Check Update Authentication Session Activity.
+. Under the Access Token Attribute Contract, use Extend the Contract to add the attribute `usernameFromATM`.
+. In the Resource URIs tab, add the following URI:
+* `http://:14240/api/auth/oidc/callback`
+. In the Access Control tab, add your client (e.g., the client you created earlier).
+. Click Save to apply the changes.
+
+=== Create OIDC Policy
+
+. Go to *Applications > OAuth > OpenID Connect Policy Management*.
+. Add a new policy and set the Access Token Manager to the one you created above (`TigerGraphTokenManager`).
+. In the Attribute Scopes tab, include the `email` scope.
+
+=== Configure Access Token Mappings
+
+. Navigate to *Applications > OAuth > Access Token Mappings*.
+. Select your context and access token manager, then click Add Mapping.
+. In the Contract Fulfillment tab, configure the following:
+* Select your contract.
+* Choose Adapter for the source and username for the value.
+. Click Save to finalize the mapping.
+
+=== Add IdP Adapter
+
+. Go to *Authentication > Integration > IdP Adapters* and click Create Adapter Instance.
+. Choose HTML Form IdP Adapter as the Instance Type.
+* Set Instance Name and Instance ID to `SSOTestIdPHTML`.
+. Click Next and add SimplePCV as the Password Credential Validator.
+* Set Session State to Per Adapter.
+. In the Core Contract section, select username.
+. Click Next, and in Adapter Attributes, check Pseudonym for username.
+. In Adapter Contract Mapping, map username to `$(username)`.
+. Click Done, review the summary, and click Finish.
+
+=== Create Client
+
+. Navigate to *Applications > OAuth*. Click Add Client.
+. Input the following:
+* Client ID: Choose a unique identifier for the client (e.g., `TigerGraphClient`).
+* Name: Provide a meaningful name (e.g., `TigerGraph OIDC Client`).
+. For Client Authentication, select Client Secret and click Generate Secret.
+* Important: Make sure to save the generated secret as you will need it in the TigerGraph configuration.
+. Under Redirect URIs, input the URI:
+* `http://:14240/api/auth/oidc/callback`
+. Under Restrict Common Scopes, select the appropriate scopes (e.g., `openid, email, profile`).
+. For Allowed Grant Types, select:
+* Authorization Code and Client Credentials.
+. In Access Token Validation, select the Default Access Token Manager you created earlier.
+* Choose Restrict to Default Access Token Manager.
+. For the ID Token Signing Algorithm, select `RSA using SHA-256`.
+. Click Save to apply the configuration.
+
+== Set Up SP Connection in PingFederate for OIDC
+
+In PingFederate, create an SP connection for OIDC by following these steps:
+
+. Navigate to Applications > Integration > SP Connections and click Create Connection.
+. Select Browser SSO Profiles connection template and click Next.
+. On the Connection Options page, check Browser SSO and click Next.
+. Skip the Metadata URL step and click Next.
+. Enter Partner’s Entity ID: `https://:9031` and Base URL:
+`http://:14240`, then click Next.
+. Click Configure Browser SSO on the Browser SSO tab.
+. Enable both IdP-Initiated SSO and SP-Initiated SSO on the SAML Profiles tab, then click Next.
+. Set the Assertion Lifetime and click Next.
+. Choose Standard Identity Mapping on the Assertion Creation tab, then click Next.
+. Change Subject Name Format to
+`urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified` or
+`urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress`, then click Next.
+. Select the Authentication Source and Adapter Instance, then click Next.
+. Set SAML_SUBJECT Source to your adapter and Value to username, then click Next.
+. Specify any authorization conditions (optional), then click Next and Done.
+. On the Protocol Settings tab, set Binding to POST and Endpoint URL to `http://:14240/api/auth/saml/acs`, then click Next.
+. On the Signature Policy tab, check SIGN RESPONSE AS REQUIRED and click Next.
+. Click Done on the Protocol Settings Summary, then click Next on the Credentials tab.
+. Select the signing certificate created earlier, click Next, and then click Done on the Digital Signature Settings Summary.
+. On the Activation & Summary tab, review the settings and click Save.
+
+== Configure and Test OIDC in TigerGraph
+
+=== Configure OIDC in TigerGraph using gadmin
+
+. On the TigerGraph server, use the following gadmin command to enter OIDC settings:
+* `gadmin config entry OIDC`
+. 
Fill in the following parameters using the metadata found at the provided https://34.231.78.134:9031/.well-known/openid-configuration[OpenID Configuration URL].
+
+[cols="2", options="header",]
+|===
+|Parameter |Value
+|Security.SSO.OIDC.Enable |`true`
+|Security.SSO.OIDC.CallBackUrl |`http://:14240`
+|Security.SSO.OIDC.ResponseType |`code`
+|Security.SSO.OIDC.Scope |`openid profile email`
+|Security.SSO.OIDC.OP.SSOUrl |`https://:9031/as/authorization.oauth2`
+|Security.SSO.OIDC.OP.Issuer | `https://:9031`
+|Security.SSO.OIDC.OP.ClientId | `TigerGraphTest`
+|Security.SSO.OIDC.OP.ClientSecret |``
+|===
+
+[start=3]
+. Apply the configuration:
+`gadmin config apply -y`
+
+=== Final Steps in Admin Portal
+
+For the final steps, see the xref:{page-component-version}@gui:admin-portal:security/sso-oidc-okta.adoc#_3_setup_oidc_button[Admin Portal documentation] for detailed instructions on configuring users and verifying OIDC login.
+
+== Troubleshooting
+
+[cols="2", options="header",]
+|===
+|Issue |Solution
+|SSL Certificate Warnings |Use a domain (e.g., `idp.example.com`) instead of an IP for PingFederate.
+|Invalid Client Secret |Ensure the secret matches the one configured in PingFederate.
+|User Not Authorized |Verify proxy group rules and role assignments in Admin Portal.
+|Login Redirect Failures |Confirm the Redirect URI in PingFederate matches
+`http://:14240/api/auth/oidc/callback`.
+|=== diff --git a/modules/user-access/pages/sso.adoc b/modules/user-access/pages/sso-with-saml.adoc similarity index 96% rename from modules/user-access/pages/sso.adoc rename to modules/user-access/pages/sso-with-saml.adoc index 2daf8e81..42008145 100644 --- a/modules/user-access/pages/sso.adoc +++ b/modules/user-access/pages/sso-with-saml.adoc @@ -1,17 +1,19 @@ -= Single Sign-On -:description: Instructions to set up single sign-on for TigerGraph with verified identity providers. += SSO with SAML +:description: Instructions to set up single sign-on for TigerGraph with verified identity providers. +:page-aliases: sso.adoc :experimental: :sectnums: Single sign-on (SSO) enables you to use your organization's identity provider (IDP) to authenticate users to access TigerGraph GraphStudio and Admin Portal UI. +We also have instructions for xref:user-access:sso-with-oidc.adoc[SSO with OIDC]. We have verified the following IDPs that support SAML 2.0 protocol: -* https://www.okta.com/[Okta] -* https://auth0.com/[Auth0] -* https://docs.microsoft.com/en-us/azure/active-directory/[Azure Active Directory (Azure AD)] -* https://docs.pingidentity.com/bundle/pingfederate-110/page/ikr1564002999528.html[PingFederate] -* https://learn.microsoft.com/en-us/windows-server/identity/active-directory-federation-services[Active Directory Federation Services (AD FS)] +* https://www.okta.com/[Okta.com] +* https://auth0.com/[Auth0.com] +* https://docs.microsoft.com/en-us/azure/active-directory/[Azure Active Directory (Azure AD at microsoft.com)] +* https://docs.pingidentity.com/bundle/pingfederate-110/page/ikr1564002999528.html[PingFederate.com] +* https://learn.microsoft.com/en-us/windows-server/identity/active-directory-federation-services[Active Directory Federation Services (AD FS at microsoft.com)] For supporting additional IDPs, please contact sales@tigergraph.com and submit a feature request. @@ -209,7 +211,7 @@ image::adfs-sso-step-6.png[] ==== Configure TigerGraph After configuring AD FS as described previously, you must now configure TigerGraph to accept the connection. -This is handled in Admin Portal on the SSO page. 
xref:gui:admin-portal:security/sso.adoc[] +This is handled in Admin Portal on the xref:{page-component-version}@gui:admin-portal:security/sso.adoc[SSO page].

* In the field btn:[Identity Provider's X509 certificate], use the certificate exported in Step #4 above.

@@ -295,7 +297,15 @@ Refer to your identity provider's documentation to determine which options to us Besides providing the SSO information in the UI, you also have the option of providing the information using `gadmin config` through the command-line. Below is the list of parameters you need to configure. -You can run xref:system-management:management-with-gadmin.adoc#_gadmin_config_set[`gadmin config set`] to configure their value non-interactively, or run `gadmin config entry Security.SSO.SAML` to configure their values interactively in the terminal. +You can run xref:system-management:management-commands.adoc#_gadmin_config_set[`gadmin config set`] to configure their values non-interactively, or run `gadmin config entry Security.SSO.SAML` to configure their values interactively in the terminal.
+
+After changing these configuration values, you must run the following `gadmin` commands:
+
+[,console]
+----
+gadmin config apply
+gadmin restart gsql
+----

|=== |Name | Description | Example diff --git a/modules/user-access/pages/user-credentials.adoc b/modules/user-access/pages/user-credentials.adoc index e26ce350..010efc24 100644 --- a/modules/user-access/pages/user-credentials.adoc +++ b/modules/user-access/pages/user-credentials.adoc @@ -8,7 +8,7 @@ The TigerGraph platform offers three options for credentials: * A username-password pair used to log in to GSQL and make HTTP requests. * An ACL password used to run commands to alter the ACL privileges of a query. -* A token - a unique 32-character string with an expiration date, used for REST{pp} requests. See the full xref:tigergraph-server:API:authentication.adoc[API Authentication] documentation for details. +* A token - a unique 32-character string with an expiration date, used for REST{pp} requests. See the full xref:API:authentication.adoc[API Authentication] documentation for details. The following set of commands are used to create and manage passwords and secrets. diff --git a/modules/user-access/pages/user-management.adoc b/modules/user-access/pages/user-management.adoc index 98e1cfdb..9abcd002 100644 --- a/modules/user-access/pages/user-management.adoc +++ b/modules/user-access/pages/user-management.adoc @@ -143,15 +143,16 @@ For more information, see xref:security:login-protection.adoc[] If the user running the command has the `READ_USER` privilege, information on all users is displayed. Otherwise, only the current user's information is displayed. -== View privileges of a user +== View privileges of a user or proxy group -Users with the `READ_USER` privilege in a scope can view the RBAC privileges of the users in that scope. +Users with the `READ_USER` privilege in a scope can view the RBAC privileges of users or proxy groups within that scope. === Syntax [source,gsql] ---- SHOW PRIVILEGE ON USER (, )* +SHOW PRIVILEGE ON USER (, )* ---- === Required privilege @@ -160,14 +161,19 @@ SHOW PRIVILEGE ON USER (, )* === Procedure -. 
From the GSQL shell, run the `SHOW PRIVILEGE ON USER` command with either a username or proxy group name: + [source,gsql] ---- GSQL > SHOW PRIVILEGE ON USER tigergraph ---- +or +[source,gsql] +---- +GSQL > SHOW PRIVILEGE ON USER proxy_group1 +---- -The above command will show the privileges of user `tigergraph`: +The command displays the privileges assigned to the specified user or proxy group. [source,text] ---- @@ -198,10 +204,22 @@ User: "tigergraph" ACCESS_TAG ---- +When you run the command for a proxy group, it displays the privileges granted to that proxy group. + +[source,text] +---- +ProxyGroup: "proxy_group1" + - Global Privileges: + READ_DATA + WRITE_DATA + READ_USER + READ_PROXYGROUP +---- + To view xref:access-control-model.adoc#_access_control_lists[ACL privileges] of a user, see xref:acl-management.adoc#_view_acl_privileges_of_a_user_[View ACL privileges of a user]. [#_grant_a_role_to_a_user] -== Grant a role to a user/proxy group +== Grant a role to a user or proxy group === Syntax diff --git a/modules/user-access/pages/vlac.adoc b/modules/user-access/pages/vlac.adoc index 575b245f..133fed93 100644 --- a/modules/user-access/pages/vlac.adoc +++ b/modules/user-access/pages/vlac.adoc @@ -64,7 +64,7 @@ Features not yet supported: In summary, all necessary operations to set up VLAC graphs and users are supported in GSQL. Due to a known bug, standard users (with `querywriter` and `queryreader` roles) can run some DDL operations which they should not be able to. ==== -We'll use the graph xref:gsql-ref:appendix:example-graphs.adoc#_social_net[socialNet] as an example in the following sections. +We'll use the graph xref:{page-component-version}@gsql-ref:appendix:example-graphs.adoc#_social_net[socialNet] as an example in the following sections. == Tag Management @@ -259,10 +259,10 @@ There are three main options for tagging vertices in the base graph. === Add tags on existing data -In GSQL, special vertex methods are provided to access and modify the tags of a vertex in a DML query (full list available on page xref:gsql-ref:querying:func/vertex-methods.adoc[Vertex Methods]). +In GSQL, special vertex methods are provided to access and modify the tags of a vertex in a DML query (full list available on page xref:{page-component-version}@gsql-ref:querying:func/vertex-methods.adoc[Vertex Methods]). These functions are only available for vertex aliases (defined in the `FROM` clause of a `SELECT` statement); they cannot be applied to vertex variables in other contexts. -There are xref:gsql-ref:querying:func/vertex-methods.adoc[8 DML-level tag-access functions] in the vertex-query block or edge-query block. Use the xref:gsql-ref:querying:func/vertex-methods.adoc#_addtags[addTags()] function to tag a vertex. +There are xref:{page-component-version}@gsql-ref:querying:func/vertex-methods.adoc[8 DML-level tag-access functions] in the vertex-query block or edge-query block. Use the xref:{page-component-version}@gsql-ref:querying:func/vertex-methods.adoc#_addtags[addTags()] function to tag a vertex. 
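Below is a minimal sketch of `addTags()` in use, assuming the socialNet example graph's `person` vertex type with its `gender` attribute; the query name and the tag `vip` are hypothetical. The document's own `addTagsToPerson()` example appears under the next heading.

[source,gsql]
----
CREATE QUERY tagMalePersons() FOR GRAPH socialNet {
  allPersons = { person.* };
  // addTags() can only be called on a vertex alias bound in a FROM clause
  tagged = SELECT p
           FROM allPersons:p
           WHERE p.gender == "Male"
           ACCUM p.addTags("vip");
}
----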
==== Required privilege @@ -301,7 +301,7 @@ CREATE QUERY addTagsToPerson() { } ---- -Use xref:gsql-ref:querying:func/vertex-methods.adoc#_removetags[Remove tags] and xref:gsql-ref:querying:func/vertex-methods.adoc#_removealltags[Remove all tags] to remove tags from vertices: +Use xref:{page-component-version}@gsql-ref:querying:func/vertex-methods.adoc#_removetags[Remove tags] and xref:{page-component-version}@gsql-ref:querying:func/vertex-methods.adoc#_removealltags[Remove all tags] to remove tags from vertices: [source,gsql] ---- @@ -532,7 +532,7 @@ The output of the query would be: Users with global `WRITE_SCHEMA` and `ACCESS_TAG` privileges can create, modify and drop tags, as well as create tag-based graphs for all graphs. [discrete] -==== On the base graph +=== On the base graph Users with roles on the base graph that have the `ACCESS_TAG` privilege (e.g.`admin` and `designer` roles) can create/drop tags, and tag vertices. Users that have both the `ACCESS_TAG` privilege and `WRITE_SCHEMA` privilege (e.g. `admin` and `designer` roles) can create/drop tag-based graphs of the base graph. @@ -550,7 +550,7 @@ Users who are given roles on a tag-based graph have the privileges on the tag-ba == Sample Use Cases [discrete] -==== *Scenario I* +=== *Scenario I* *Problem* @@ -567,7 +567,7 @@ The base graph admin can do the following security setup. . *Grant users permission to the tag-based graph*. On the tag-based graph B, grant roles that have the appropriate privileges for graph `B` to the target users. [discrete] -==== *Scenario II* +=== *Scenario II* *Problem* @@ -584,7 +584,7 @@ The base graph `admin` user can do the following setup. . *Grant roles on the tag-based graph*. On the tag-based graph `B`, grant roles that have the appropriate privileges for the graph `B` to target users. [discrete] -==== *Scenario III* +=== *Scenario III* *Problem*