Data Best Practices
The app field contains the app name, which should be a single word of up to 16 characters.
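A quick client-side check can catch names that break this rule before events are sent. The sketch below assumes that "single word" means ASCII letters, digits, and underscores with no spaces or punctuation. That reading is an assumption, and `is_valid_app_name` is a hypothetical helper, not part of Data Cortex.

```python
import re

# Assumption: "single word" means ASCII letters, digits, and underscores only,
# with no spaces or punctuation. Adjust the pattern to your own naming rules.
APP_NAME_PATTERN = re.compile(r"\w{1,16}", re.ASCII)


def is_valid_app_name(name: str) -> bool:
    """Return True if name is a single word of 1 to 16 characters."""
    return APP_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_app_name("spacequest")` passes, while `"space quest"` (two words) and any 17-character name fail.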
This field is used to capture the version of the client software (e.g., the iOS app, Android app, or PC client). Example values:
- 0.1, 1.2.0, 3.0.2a
For apps functioning as a client-server pair, this field is used to capture the version of the server software. Example values:
- 0.1, 1.2.0, 3.0.2a
The ingest_datetime field is a timestamp set by Data Cortex to record when the event was received and loaded into the Data Cortex system.
The event_datetime field is a timestamp that records when the event took place.
When events are sent directly from the client, this value captures the client system time, which the user may have altered. In that case, event_datetime values can be inconsistent with other events (for example, in-app actions recorded before an install event) or with ingest_datetime (for example, an event recorded as taking place after it was ingested).
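The two inconsistencies described above can be flagged with simple comparisons. This is a minimal sketch: `timestamp_anomalies` and its labels are hypothetical, and `install_dt` is assumed to be the event_datetime of the same user's install event, looked up separately.

```python
from datetime import datetime, timezone
from typing import List, Optional


def timestamp_anomalies(
    event_dt: datetime,
    ingest_dt: datetime,
    install_dt: Optional[datetime] = None,
) -> List[str]:
    """Flag event_datetime values that conflict with other timestamps.

    All arguments must be timezone-aware (or all naive) so they can be compared.
    """
    problems = []
    # An event cannot really have happened after Data Cortex received it.
    if event_dt > ingest_dt:
        problems.append("event_after_ingest")
    # In-app actions should not precede the user's install event.
    if install_dt is not None and event_dt < install_dt:
        problems.append("event_before_install")
    return problems


# Example: the client clock ran an hour ahead of the server.
utc = timezone.utc
ingested = datetime(2024, 5, 1, 12, 0, tzinfo=utc)
print(timestamp_anomalies(datetime(2024, 5, 1, 13, 0, tzinfo=utc), ingested))
```

Flagged rows can then be excluded from time-sensitive analysis or compared against ingest_datetime instead.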
(View only) This field contains the same data as stored in event_datetime, translated into your organization's reporting timezone.
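The translation this view performs amounts to re-expressing the same instant in another zone. The sketch below assumes event_datetime is stored in UTC; `to_reporting_tz` is a hypothetical helper, and the fixed UTC-5 offset is only a stand-in for your organization's actual reporting timezone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_reporting_tz(event_dt: datetime, reporting_tz: tzinfo) -> datetime:
    """Return the same instant expressed in the reporting timezone."""
    if event_dt.tzinfo is None:
        # Assumption: naive event_datetime values are stored in UTC.
        event_dt = event_dt.replace(tzinfo=timezone.utc)
    return event_dt.astimezone(reporting_tz)


# Illustration only: a fixed UTC-5 offset stands in for a real reporting zone.
reporting = timezone(timedelta(hours=-5))
local = to_reporting_tz(datetime(2024, 1, 15, 3, 30), reporting)
print(local.isoformat())  # 2024-01-14T22:30:00-05:00
```

A real reporting zone with daylight saving time would be passed as a `zoneinfo.ZoneInfo` object (Python 3.9+) rather than a fixed offset. Note that the local date can differ from the UTC date, which matters for daily reports.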
This field is used to capture the type of hardware the user is on.
- desktop: used for all standard computers (laptop or desktop)
All other devices should be represented by values captured from the system:
- iPad5,3, iPhone3,1, Nexus 9, SAMSUNG-SM-G900A, LG-D950G
This field is used to capture device information at a generalized level:
- ipad, iphone, android_tablet, android, desktop, etc.
No standard set of device_family values is specified. Rather, this field is dedicated to groupings determined by the app developer. The exact nature of the grouping is to be defined in the client logic and sent as part of the event bundle.
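Because the grouping is left to the app developer, here is one possible client-side scheme that maps the raw device_type onto the example families listed above. Everything in it is an assumption: `device_family` is hypothetical, and `is_tablet` is assumed to come from whatever tablet detection the client platform offers.

```python
def device_family(os_name: str, device_type: str, is_tablet: bool = False) -> str:
    """Group a raw device_type into a coarse family (one example scheme)."""
    if device_type == "desktop":
        return "desktop"
    if os_name == "ios":
        # Apple model identifiers begin with the product line, e.g. "iPad" or "iPhone".
        if device_type.startswith("iPad"):
            return "ipad"
        if device_type.startswith("iPhone"):
            return "iphone"
        return "ios_other"
    if os_name == "android":
        # Android model strings do not reliably reveal form factor, so rely on the client.
        return "android_tablet" if is_tablet else "android"
    return "other"
```

Whatever scheme is chosen, keeping it in one client function makes it easy to apply consistently across every event in the bundle.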
This field is used to capture an ID representing the user. Data Cortex user logic determines the exact value filled in here, based on identifying information sent as part of the event bundle.
This field is used to capture the operating system running on the client device. The following values provide complete coverage:
- windows, ios, android, mac, linux
This field is used to capture the version of the operating system running on the client. For example:
- 4.0, 4.0.1, 6.0.1, 8.0
For applications running in a browser, this field is used to capture the browser name. For example:
- chrome, firefox, safari
For applications running in a browser, this field is used to capture the version of the browser. For example:
- 43, 44, 10.0.9
This field is used to capture the distribution channel that originated the install of the app. For example:
- itunes, google, samsung, steam
This field is used to capture the ISO 3166-1 alpha-2 country code representing the user's location. All valid entries should be two uppercase letters. The list of valid country codes is maintained by ISO as part of the ISO 3166-1 standard.
- ZZ: default value when no country code is found
The preferred method is to send the country code captured from the client device with each bundle. For apps where that information is not available, Data Cortex supports IP-based geolocation (this optional feature must be turned on as part of your organization's configuration). IP-based geolocation is not as accurate as information gathered directly from client devices.
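Before sending the device's country code, the client can normalize it and fall back to ZZ. This sketch only checks the format (two letters, uppercase). It does not confirm the code is actually assigned in ISO 3166-1, and `normalize_country` is a hypothetical helper.

```python
import string
from typing import Optional

DEFAULT_COUNTRY = "ZZ"


def normalize_country(code: Optional[str]) -> str:
    """Return a two-letter uppercase code, or ZZ when none is usable.

    Checks the format only; it does not verify the code is an assigned
    ISO 3166-1 alpha-2 value.
    """
    if not code:
        return DEFAULT_COUNTRY
    code = code.strip().upper()
    if len(code) == 2 and all(ch in string.ascii_uppercase for ch in code):
        return code
    return DEFAULT_COUNTRY
```

For example, `"us"` becomes `"US"`, while a missing value or a three-letter code such as `"USA"` falls back to `"ZZ"`.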
This field is used to capture the language code from the client system. Example values:
- en-US, en-GB, es-ES, es-MX
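Some platforms report locales with an underscore (for example, Java's `Locale.toString()` yields `en_US`). A small normalizer can bring these into the hyphenated form shown above. This is a sketch under stated assumptions: `normalize_language` is hypothetical, and it keeps only the language and the first region-like subtag (two letters or three digits), dropping script and variant subtags.

```python
def normalize_language(tag: str) -> str:
    """Convert tags such as 'en_us' or 'EN-gb' to the 'en-US' form."""
    parts = [p for p in tag.strip().replace("_", "-").split("-") if p]
    if not parts:
        return ""
    language = parts[0].lower()
    # Keep the first region-like subtag (two letters or three digits), if any.
    for part in parts[1:]:
        if (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            return f"{language}-{part.upper()}"
    return language
```

With this scheme, `zh-Hant-TW` becomes `zh-TW` and `es-419` (Latin American Spanish) is kept as-is. Whether dropping the script subtag is acceptable depends on your analysis needs.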
This field is used to capture the user's network connection status at the time the event was generated. This field should only be populated with the following values:
- online-wifi, online-cell, online-wired, online, offline
The generic "online" should only be used when the specific nature of the user's connection cannot be determined.
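The rule above, using the specific value when known and the generic "online" otherwise, can be sketched as a small mapping. The `connected` and `transport` inputs are hypothetical stand-ins for whatever connectivity information the client platform provides.

```python
from typing import Optional

ALLOWED_STATUSES = {"online-wifi", "online-cell", "online-wired", "online", "offline"}


def network_status(connected: bool, transport: Optional[str] = None) -> str:
    """Map a (hypothetical) platform connectivity reading to an allowed value."""
    if not connected:
        return "offline"
    specific = {"wifi": "online-wifi", "cell": "online-cell", "wired": "online-wired"}
    # Fall back to the generic value only when the connection type is unknown.
    return specific.get(transport, "online")
```

Because every return path produces a member of `ALLOWED_STATUSES`, the field never receives an unexpected value.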
The group_tag field is used to identify rows that were created in the same context.
By default, most applications create a session ID and store it here to enable easy analysis of actions taking place in the same session.
Depending on the exact nature of the app, other contexts may be preferred. For example, group_tag may be used to track multiple rows of data generated from the same round of play or from the same social interaction.
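The default session-based approach can be sketched as a logger that generates an ID when a session starts and stamps it on every subsequent event. `EventLogger` and its methods are hypothetical, and a random UUID is just one reasonable choice of session ID. This section does not prescribe a format.

```python
import uuid
from typing import Any, Dict, List, Optional


class EventLogger:
    """Hypothetical client-side logger that tags events with the session ID."""

    def __init__(self) -> None:
        self.group_tag: Optional[str] = None
        self.events: List[Dict[str, Any]] = []

    def start_session(self) -> None:
        # A fresh random ID per session; every later event reuses it.
        self.group_tag = uuid.uuid4().hex
        self.log(kingdom="session", phylum="session_start")

    def log(self, **fields: Any) -> Dict[str, Any]:
        event = dict(fields, group_tag=self.group_tag)
        self.events.append(event)
        return event
```

To group by round of play instead, the same pattern applies with a new ID generated at the start of each round.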
The seven taxonomy fields are used to capture details about what action the user took to generate the event. The values stored in these fields should follow several principles to maximize query performance, slicer support, and ease of analysis:
- Increasing detail
  - Values stored in the taxonomy fields all describe the same event, but at increasing levels of detail.
  - As such, the kingdom level should establish the context of the event, for example: session, combat, page_view, inventory
  - phylum should then provide more detail. For kingdom=session, phylum values might be session_start and session_end. For kingdom=page_view, phylum={page_name}
  - class then provides yet more detail. For kingdom=session, phylum=session_start, class values might be cold_start or from_background
- Eliminate "wasted" characters
  - If all values in a field start with "character_", then that prefix conveys no information and can be eliminated without any loss in analysis capability.
  - Removing these characters makes the data smaller. This may not have an appreciable impact when data logging is first implemented, but once the database contains millions or billions of records, the query performance improvements will be significant.
- Avoid numeric results
  - Numeric results should be stored in one of the four float fields or in the spend_amount field. For example:
    - session duration, points scored, currency balance, damage done, price paid
- But do include numeric names
  - Numeric names used to describe an event should be stored in the taxonomy fields. For example:
    - step_01, level_05, discount_15, discount_20 (when tracking an offer shown)
  - When unsure whether a numeric value is a result or a name, consider whether aggregate metrics derived from the number would be meaningful. Would it make sense to compute the mean or median of the value? If so, the value is likely a numeric result. If not, it is likely a numeric name.
- Use readable names
  - Especially with numeric names, it's important to include a descriptive prefix or suffix.
  - Instead of capturing the user's level as 1, 2, 3, 4, it's better to store level_01, level_02, level_03, level_04.
  - An analyst may still need context to know which "level" is being referenced, but level_03 is much easier to understand than just "3".
- Zero-padding
  - When using numeric names, the numbers are stored as strings and will be sorted as such.
  - To avoid results being returned as 1, 11, 12, 13, 2, 20, 21, 3, add leading zeros: 01, 02, 03, and so on. Use as many zeros as needed for all possible values to sort properly.
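The zero-padding and wasted-character principles can both be checked mechanically. In this sketch, `numeric_name` and `shared_prefix` are hypothetical helpers. `shared_prefix` relies on Python's `os.path.commonprefix`, which compares strings character by character.

```python
import os
from typing import Iterable


def numeric_name(prefix: str, number: int, width: int) -> str:
    """Build a readable, zero-padded numeric name such as level_05."""
    return f"{prefix}_{number:0{width}d}"


def shared_prefix(values: Iterable[str]) -> str:
    """Return the characters every value starts with (a candidate to drop)."""
    return os.path.commonprefix(list(values))


levels = [1, 2, 3, 11, 12, 20]
# Here the largest listed value stands in for the largest possible level;
# padding to its digit count makes string order match numeric order.
width = len(str(max(levels)))
print(sorted(f"level_{n}" for n in levels))
print(sorted(numeric_name("level", n, width) for n in levels))
```

The unpadded names sort as level_1, level_11, level_12, level_2, level_20, level_3, while the padded ones sort as level_01, level_02, level_03, level_11, level_12, level_20. `shared_prefix(["character_warrior", "character_mage"])` returns `"character_"`, a true wasted prefix. On `["step_01", "step_02"]`, however, it returns `"step_0"`. That is why its output should be reviewed rather than stripped automatically, since the readable-name principle says to keep prefixes like `step_`.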
These 4 fields are used to capture numeric results related to the event, for example:
- session duration, points scored, currency balance, damage done, price paid
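To show the split between numeric names and numeric results in practice, the sketch below builds a round-end event. The level goes into the taxonomy, and the points and duration go into float fields. This section does not name the four float fields, so `float1` and `float2` are hypothetical placeholders, as are the `round` kingdom and both helper functions.

```python
from statistics import mean
from typing import Any, Dict, List


def round_end_event(level: int, points: float, duration_s: float) -> Dict[str, Any]:
    """Hypothetical round-end event: names in taxonomy, results in float fields."""
    return {
        "kingdom": "round",
        "phylum": "round_end",
        "class": f"level_{level:02d}",  # numeric name: goes in the taxonomy
        "float1": points,  # numeric result: averaging it is meaningful
        "float2": duration_s,  # numeric result: seconds played
    }


def mean_points(events: List[Dict[str, Any]], level_name: str) -> float:
    """Average points scored across rounds of one level."""
    return mean(e["float1"] for e in events if e["class"] == level_name)
```

Averaging `float1` across rounds of level_05 produces a useful metric. Averaging the level number itself would not, which is exactly the test for telling a result from a name.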
This field is used to track the name of the currency being spent or gained.