Kotahi data supply

From Kautepedia
Jump to navigation Jump to search

Background[edit | edit source]

Kotahi provides drops of Kotahi data for us to use for reporting/analysis at or around 12am daily.

Detail[edit | edit source]

An S3 target has been created at s3://kpt-kotahi. Karo have permission to push data here.

Data arrives every day around midnight, following naming convention of prefix: karo_upload<timestamp>, with individual datasets underneath it.

30+ files are provided currently, and these - more or less - each relate to a database table. They are all 'complete' datasets per each load, with the exception of activities which is deltas only from prior export.

Files are provided in CSV format.

Data Transfer[edit | edit source]

The diagram below outlines the Kotahi data load process, showing how files are ingested, transformed, deduplicated, and written into the reporting schema for use in analytics and dashboards.

Datasets[edit | edit source]

Table below summarises supplied data:

Name Description Notes
Activities Logs of client interactions, including type, date, duration, and additional details stored in CustomJSONData.[1] Note The CustomJSONData field requires proper escaping of quotes within strings. For example, ""\hello""\ inside a string should be escaped as \\\"hello\\\". Ensure data is validated and fixed before import to avoid parsing issues.
ActivityTypes Categorises activities/forms, e.g.,"Narrative Case Notes" "Requisition Approval"
AssignedUsers Tracks which users are assigned to specific clients.
ClientCorrespondence Records communications between clients and providers.
ClientIwi Links clients to their associated iwi. Not in use for now.
ClientRelationshipParticipants Defines roles in client relationships, like "Parent" or "Sibling" .
ClientRelationshipTypes Defines valid relationship types between participants, such as "Parent-Child" or "Siblings".
ClientRelationships Tracks relationships between clients (e.g., parent-child).
ClientServiceStatus Tracks the status of a client in a service, e.g., "Active" or "Inactive".[2]
ClientTasks Stores tasks assigned to clients, including due dates and statuses.
Clients Core client data, including personal details like name and DOB.
CorrespondenceType Defines types of communication
CustomActivityFieldsData Stores custom data fields linked to specific activities.
Dhbs Lists district health boards, their regions, and their respective ID.
Ethnicities Catalogues ethnicity options for demographic data.
Iwi Lists specific iwi (tribes) associated with clients or users.
IwiGroupings Groups iwi into broader categories.
MessageTemplates Predefined message templates for correspondence.
Providers Details about service providers, e.g., "Tui Medical" or "Dr Jane Doe"
  • Where the column Type = 0: General Practitioner.
  • Where the column Type = 1: Provider.
ServiceStatus Defines statuses for services, e.g., "New Referral" or "Completed." [2]
Services Information about available services, including their names and descriptions.
UserLocalities Tracks the geographic areas assigned to specific users.
UserServices Links users to the services they are involved with.
Users Stores user details, including names, roles, and contact information.
Hauora data Data about hauora goal plans Hauora goal plans are currently not being utilised at time of writing.
Calendar data Appointments data including schedules, availability, and events Includes calendars, schedule entries, event details, and linked clients
Group activities data Group activities data including details about participants and staff etc.


  1. The CustomJSONData field in Activities stores flexible, structured data in JSON format. It is used for capturing additional details such as specific requests, health goals, or screening statuses, etc. enabling the retrieval of custom attributes.
  2. 2.0 2.1 Although the ClientServiceStatus table tracks a client's service status (e.g., "Active," "Inactive"), we are using the latest "Activity Date" on Power BI to dynamically determine activity status, addressing cases where statuses are stuck on "New Referral" for example

Materialized Views for Reporting[edit | edit source]

Technical Implementation[edit | edit source]

The following materialized views have been created in the kotahi schema to support reporting requirements:

Materialized View Purpose Refresh Schedule
mvw_activities_summary Details of kotahi activities 00:45 daily
mvw_client_services Shows active/inactive service relationships with status history 00:50 daily
mvw_client_summary Core client demographics + household/location metadata 00:55 daily
mvw_tupaia_report Kotahi data relating to tupaia service 01:05 daily
mvw_tupaia_stages Tracks client progression through service stages 01:15 daily
mvw_tupaia_stages_counts Count of tupaia clients in each stage of tupaia pathways 01:25 daily

Cron Job Configuration[edit | edit source]

Refresh schedules are managed via PostgreSQL's pg_cron extension:

-- Scheduled between 00:45-01:25 daily
SELECT cron.schedule('45 0 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_activities_summary$$);
SELECT cron.schedule('50 0 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_client_services$$);
SELECT cron.schedule('55 0 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_client_summary$$);
SELECT cron.schedule('5 1 * * *',  $$REFRESH MATERIALIZED VIEW kotahi.mvw_tupaia_report$$);
SELECT cron.schedule('15 1 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_tupaia_stages$$);
SELECT cron.schedule('25 1 * * *', $$REFRESH MATERIALIZED VIEW kotahi.mvw_tupaia_stages_counts$$);

JSON data[edit | edit source]

The customfielddatajson field contains the contents of any activity. Obviously this varies between activities, and over time (if activities are modified). We will likely need some kind of mapping document to keep track of field changes and/or multiple fields which we need to perform analysis/calculations on.[1]

A fairly nice way to unnest JSON for specific tasks can be done via:

select distinct
a.clientid,
j."RequestDate",
j."RequestProduct1",
j."RequestAmountTotal"

from kotahi.activities a,
	jsonb_to_record(customfielddatajson) 
		as j(
			"RequestDate" date,
			"RequestProduct1" text,
			"RequestAmountTotal" float
			)

where activitytypeid  = 27 --Whanau ora request submission

Pinnacle Workplan[edit | edit source]

Current state[edit | edit source]

Database schema and target objects built (staging/rpt) Staging tables being manually loaded weekly

TO DO[edit | edit source]

  • Automation of data pickup and load to staging
    • StepFunction:
    • S3 trigger -->
    • Lambda invocation -->
    • Write data to staging
    • Truncate/reload to rpt
    • Activities needs to be merged/upserted
  • Considerations:
    • Audit data (inc. diagnostics where applicable)
    • Error logging
    • Point in time reporting
    • Exclude confidential narratives

Required analysis objects in rpt[edit | edit source]

Propose use of materialized views (?) to build objects showing us the following:

  • Family views
    • Location
    • Household size
    • Demographic/descriptive info
  • Link to Indici[2]
    • Cross-reference Indici<>Kotahi data.[3]
  • Point in time
    • How many cases on this date
    • Family risk/need over time
    • Rate of change in risk/need over time
  • Goal plan analysis
    • Goals/needs met
    • Mapping of service-level goals/domains to KPT or other wellbeing domains
  • Duration/effort
    • Multiple staff/ multiple clients
    • Activity duration
    • Effort/duration and link to goal completion or change in risk/need

ADD MORE HERE!!

References[edit | edit source]

  1. For example, financial summaries or client/aiga goal plans which span different services.
  2. Relies on NHI lkup in Indici.
  3. Where is this useful/valuable?