Kotahi data supply
Background[edit | edit source]
Kotahi provides drops of Kotahi data for us to use for reporting/analysis at or around 12am daily.
Detail[edit | edit source]
An S3 target has been created at s3://kpt-kotahi. Karo have permission to push data here.
Data arrives every day around midnight, following naming convention of prefix: karo_upload<timestamp>, with individual datasets underneath it.
30+ files are provided currently, and these - more or less - each relate to a database table. They are all 'complete' datasets per each load, with the exception of activities which is deltas only from prior export.
Files are provided in CSV format.
Data Transfer[edit | edit source]
The diagram below outlines the Kotahi data load process, showing how files are ingested, transformed, deduplicated, and written into the reporting schema for use in analytics and dashboards.
Datasets[edit | edit source]
Table below summarises supplied data:
| Name | Description | Notes |
|---|---|---|
| Activities | Logs of client interactions, including type, date, duration, and additional details stored in CustomJSONData.[1] Note | The CustomJSONData field requires proper escaping of quotes within strings. For example, ""\hello""\ inside a string should be escaped as \\\"hello\\\". Ensure data is validated and fixed before import to avoid parsing issues.
|
| ActivityTypes | Categorises activities/forms, e.g.,"Narrative Case Notes" "Requisition Approval" | |
| AssignedUsers | Tracks which users are assigned to specific clients. | |
| ClientCorrespondence | Records communications between clients and providers. | |
| ClientIwi | Links clients to their associated iwi. Not in use for now. | |
| ClientRelationshipParticipants | Defines roles in client relationships, like "Parent" or "Sibling" . | |
| ClientRelationshipTypes | Defines valid relationship types between participants, such as "Parent-Child" or "Siblings". | |
| ClientRelationships | Tracks relationships between clients (e.g., parent-child). | |
| ClientServiceStatus | Tracks the status of a client in a service, e.g., "Active" or "Inactive".[2] | |
| ClientTasks | Stores tasks assigned to clients, including due dates and statuses. | |
| Clients | Core client data, including personal details like name and DOB. | |
| CorrespondenceType | Defines types of communication | |
| CustomActivityFieldsData | Stores custom data fields linked to specific activities. | |
| Dhbs | Lists district health boards, their regions, and their respective ID. | |
| Ethnicities | Catalogues ethnicity options for demographic data. | |
| Iwi | Lists specific iwi (tribes) associated with clients or users. | |
| IwiGroupings | Groups iwi into broader categories. | |
| MessageTemplates | Predefined message templates for correspondence. | |
| Providers | Details about service providers, e.g., "Tui Medical" or "Dr Jane Doe" |
|
| ServiceStatus | Defines statuses for services, e.g., "New Referral" or "Completed." [2] | |
| Services | Information about available services, including their names and descriptions. | |
| UserLocalities | Tracks the geographic areas assigned to specific users. | |
| UserServices | Links users to the services they are involved with. | |
| Users | Stores user details, including names, roles, and contact information. | |
| Hauora data | Data about hauora goal plans | Hauora goal plans are currently not being utilised at time of writing. |
| Calendar data | Appointments data including schedules, availability, and events | Includes calendars, schedule entries, event details, and linked clients |
| Group activities data | Group activities data including details about participants and staff etc. |
- ↑ The CustomJSONData field in Activities stores flexible, structured data in JSON format. It is used for capturing additional details such as specific requests, health goals, or screening statuses, etc. enabling the retrieval of custom attributes.
- ↑ 2.0 2.1 Although the ClientServiceStatus table tracks a client's service status (e.g., "Active," "Inactive"), we are using the latest "Activity Date" on Power BI to dynamically determine activity status, addressing cases where statuses are stuck on "New Referral" for example
Materialized Views for Reporting[edit | edit source]
Technical Implementation[edit | edit source]
The following materialized views have been created in the kotahi schema to support reporting requirements:
| Materialized View | Purpose | Refresh Schedule |
|---|---|---|
mvw_activities_summary |
Details of kotahi activities | 00:45 daily |
mvw_client_services |
Shows active/inactive service relationships with status history | 00:50 daily |
mvw_client_summary |
Core client demographics + household/location metadata | 00:55 daily |
mvw_tupaia_report |
Kotahi data relating to tupaia service | 01:05 daily |
mvw_tupaia_stages |
Tracks client progression through service stages | 01:15 daily |
mvw_tupaia_stages_counts |
Count of tupaia clients in each stage of tupaia pathways | 01:25 daily |
Cron Job Configuration[edit | edit source]
Refresh schedules are managed via PostgreSQL's pg_cron extension:
-- Scheduled between 00:45-01:25 daily
SELECT cron.schedule('45 0 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_activities_summary$$);
SELECT cron.schedule('50 0 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_client_services$$);
SELECT cron.schedule('55 0 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_client_summary$$);
SELECT cron.schedule('5 1 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_tupaia_report$$);
SELECT cron.schedule('15 1 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_tupaia_stages$$);
SELECT cron.schedule('25 1 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_tupaia_stages_counts$$);
JSON data[edit | edit source]
The customfielddatajson field contains the contents of any activity. Obviously this varies between activities, and over time (if activities are modified). We will likely need some kind of mapping document to keep track of field changes and/or multiple fields which we need to perform analysis/calculations on.[1]
A fairly nice way to unnest JSON for specific tasks can be done via:
select distinct
a.clientid,
j."RequestDate",
j."RequestProduct1",
j."RequestAmountTotal"
from kotahi.activities a,
jsonb_to_record(customfielddatajson)
as j(
"RequestDate" date,
"RequestProduct1" text,
"RequestAmountTotal" float
)
where activitytypeid = 27 --Whanau ora request submission
Pinnacle Workplan[edit | edit source]
Current state[edit | edit source]
Database schema and target objects built (staging/rpt) Staging tables being manually loaded weekly
TO DO[edit | edit source]
- Automation of data pickup and load to staging
- StepFunction:
- S3 trigger -->
- Lambda invocation -->
- Write data to staging
- Truncate/reload to rpt
Activitiesneeds to be merged/upserted
- Considerations:
- Audit data (inc. diagnostics where applicable)
- Error logging
- Point in time reporting
- Exclude confidential narratives
Required analysis objects in rpt[edit | edit source]
Propose use of materialized views (?) to build objects showing us the following:
- Family views
- Location
- Household size
- Demographic/descriptive info
- Link to Indici[2]
- Cross-reference Indici<>Kotahi data.[3]
- Point in time
- How many cases on this date
- Family risk/need over time
- Rate of change in risk/need over time
- Goal plan analysis
- Goals/needs met
- Mapping of service-level goals/domains to KPT or other wellbeing domains
- Duration/effort
- Multiple staff/ multiple clients
- Activity duration
- Effort/duration and link to goal completion or change in risk/need
ADD MORE HERE!!