Indici data
==Background==
[[Indici]] data is received automatically every day from Valentia. Ultimately, it is envisaged that this will form an important part of a broader K'aute data platform. For now, the focus is on picking up the data and bringing it to a manageable state so that it can be used for analysis and contract reporting.

==Data transfer==
The Indici data transfer process is summarised in the diagram below:

[[File:Indici_load_updated.drawio.png|800x800px]]

Valentia sends daily delta files (Parquet format) to <code>arn:aws:s3:::kpa-valentia</code>. The IAM user <code>valentia</code> uses the policy <code>valentia-import</code>, which grants:
* <code>s3:ListBucket</code>
* <code>s3:PutObject</code>
* <code>s3:GetObject</code>

'''Trigger mechanism:''' An Amazon EventBridge rule detects file uploads and starts the Step Functions workflow.

Lifecycle policies:
# Move files to Intelligent-Tiering after a few days
# Move to Archive format after longer periods

==Partitioning incoming data==
When EventBridge detects a new file:
# It triggers the Step Functions workflow
# The workflow calls the <code>copyIndiciFiles</code> lambda
# The file is moved to a date-partitioned location in <code>kpa-indici-partitioned</code>

Files are partitioned using their embedded timestamp.
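
As an illustration of timestamp-based partitioning, the sketch below derives a date-partitioned object key from a filename. It is not the code of <code>copyIndiciFiles</code>: the filename pattern (<code>&lt;dataset&gt;_&lt;YYYYMMDDHHMMSS&gt;.parquet</code>), the <code>year=/month=/day=</code> prefix layout, and the function name <code>partitioned_key</code> are all assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical filename pattern: <dataset>_<YYYYMMDDHHMMSS>.parquet
# The real Valentia naming convention may differ.
_TIMESTAMP_RE = re.compile(r"^(?P<dataset>.+)_(?P<ts>\d{14})\.parquet$")


def partitioned_key(filename: str) -> str:
    """Build a date-partitioned object key from a timestamped filename."""
    match = _TIMESTAMP_RE.match(filename)
    if match is None:
        raise ValueError(f"no embedded timestamp found in {filename!r}")

    # Parse the embedded timestamp so invalid dates are rejected early.
    stamp = datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S")

    # Hive-style year/month/day prefixes are an assumption, not the
    # confirmed layout of the kpa-indici-partitioned bucket.
    return (
        f"{match.group('dataset')}/"
        f"year={stamp:%Y}/month={stamp:%m}/day={stamp:%d}/"
        f"{filename}"
    )
```

Under these assumptions, a file named <code>Appointments_20240115093000.parquet</code> would be placed under <code>Appointments/year=2024/month=01/day=15/</code>. Raising an error for filenames without a timestamp keeps unexpected uploads from being silently misfiled.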
All files are processed regardless of content.<ref>Empty files are handled in later steps</ref>

==Data processing workflow==
The Step Functions workflow coordinates:

=== Loading data ===
<code>indiciLoadAppointmentMedications</code> lambda:
# Reads Parquet → pandas DataFrame
# Processes datetimes (NaT → 'None')
# Identifies dataset from filename
# Pulls transformation rules from DynamoDB
# Adds metadata (filename + timestamp)
# Validates row count > 0
# Executes SQL to load into <code>indici_staging</code>
# Logs to <code>auditlog</code>

''DB role:'' <code>lambda_writer</code><br>
''Requires:'' [[Psycopg2 layer]]

=== Deduplication ===
<code>rptDeduplication</code> lambda:
# Performs UPSERTs to <code>rpt</code> tables
# Handles conflicts using primary keys
# Logs update/insert counts to <code>auditlog</code>

==Error handling==
The workflow handles:
* Empty files → Logs to CloudWatch and skips
* Missing SQL definitions → Triggers static data loading
* Other errors → Fails workflow and logs details

==Audit logging==
All outcomes are logged through <code>stateMachine_LogHandler</code> to CloudWatch:
* <code>SuccessLogs</code>: Processed files
* <code>EmptyFilesLogs</code>: Skipped empty files
* <code>ErrorLogs</code>: Processing failures

==Infrastructure==
'''Obsolete:'''
* <s><code>pg_cron</code> jobs</s>
* <s>Scheduled database functions</s>

'''Current:'''
* EventBridge rule
* Step Functions state machine
* CloudWatch log groups
* Processing Lambdas

==Version control==
{{See also|Source control}}
All code related to this work should be checked into the GitLab repo [https://gitlab.com/kaute-pasifika/malamalama malamalama].

==TODO==
* Update/insert counts should be logged in <code>indici_staging</code>.<code>auditlog</code>. It is not clear whether this has been implemented.

==References==
[[Category:Indici]]