Step Functions
== Indici-State-Machine ==
The '''Indici-State-Machine''' is an AWS Step Functions state machine that processes Indici files through a series of AWS Lambda functions. An [[EventBridge]] rule starts the state machine automatically whenever a file is uploaded to the '''kpa-valentia''' S3 bucket, so each new data file is processed as soon as it arrives.

The state machine consists of the following steps:

=== Transform Event ===
* '''Type:''' Pass
* '''Purpose:''' Reorganises the incoming file details (the bucket name and object key) so that the following steps can process them easily.
* '''Trigger:''' Starts when an EventBridge rule detects a file upload to the kpa-valentia S3 bucket.
* '''Output Location:''' Stores the transformed event details under `$.TransformedEvent`.
* '''Key Details Processed:''' Extracts the bucket name and file name from the upload event and formats them into a list.
* '''Next Step:''' '''''Copy Indici Files'''''

=== Copy Indici Files ===
* '''Type:''' Task
* '''Purpose:''' Copies the uploaded file to a specific location in a partitioned S3 bucket for further processing.
* '''Function Used:''' Calls the `copyIndiciFiles` Lambda function.
* '''Data Passed:''' Uses the `$.TransformedEvent` output of the '''''Transform Event''''' step.
* '''Output Location:''' Stores the result under `$.CopyIndiciFilesResult`.
* '''Retry Rules:''' Automatically retries when the function returns an error, with increasing wait times between attempts.
* '''Catch:''' Catches all errors (`States.ALL`) and transitions to '''''Check Error Type''''' with the error details in `$.ErrorInfo`.
* '''Next Step:''' '''''Data Load to indici_staging RDS'''''

=== Data Load to indici_staging RDS ===
* '''Type:''' Task
* '''Purpose:''' Loads the copied file data into the indici_staging database table for validation.
* '''Function Used:''' Calls the `indiciLoadAppointmentMedications` Lambda function.
* '''Data Passed:''' Combines the bucket name (now the partitioned bucket) with the key of the newly copied file, which is retrieved from `$.CopyIndiciFilesResult.Payload.body.new_key`.
* '''Output Location:''' Stores the result under `$.DataLoadResult`.
* '''Retry Rules:''' Automatically retries when the function returns an error, with increasing wait times between attempts.
* '''Catch:''' Catches all errors and transitions to '''''Check Error Type'''''.
* '''Next Step:''' '''''Deduplicate Indici Data'''''

=== Deduplicate Indici Data ===
* '''Type:''' Task
* '''Purpose:''' Removes duplicate records from the database to keep the data clean and accurate.
* '''Function Used:''' Calls the `rptDeduplication` Lambda function.
* '''Data Passed:''' Uses the key from `$.CopyIndiciFilesResult.Payload.body.new_key`.
* '''Output Location:''' Stores the result under `$.DeduplicationResult`.
* '''Retry Rules:''' Automatically retries when the function returns an error, with increasing wait times between attempts.
* '''Catch:''' Catches all errors and transitions to '''''Check Error Type'''''.
* '''Next Step:''' '''''Deduplicated'''''

=== Deduplicated ===
* '''Type:''' Pass
* '''Purpose:''' A simple pass state that can reorganise data from previous steps as needed.
* '''Parameters:''' Retains CopyIndiciFilesResult, DataLoadResult, and DeduplicationResult. Initialises NoSQLDDResult to null (in case a later step sets it).
* '''Output Location:''' Stores the transformed event details under `$.TransformedEvent`.
* '''Key Details Processed:''' Extracts the bucket name and file name from the upload event and formats them into a list.
* '''Next Step:''' '''''Write Success Log'''''

=== Write Success Log ===
* '''Type:''' Task
* '''Purpose:''' Logs the successful completion of the state machine and includes key file details for tracking.
* '''Function Used:''' Calls the `stateMachine_LogHandler` Lambda function.
* '''Log Details:'''
** Log Group: stateMachine
** Log Stream: SuccessLogs
** Status: Success
** Additional Information: Includes bucket name, file name, file size, and unique file identifier.
** Success Message: "Execution completed successfully."
* '''Next Step:''' '''''Success State'''''

<blockquote>
'''Error Handling''' - Errors are sent to '''''Check Error Type''''', where conditions route the execution to different error-handling states.

'''Remark:''' Currently, the State Machine CloudWatch logs cannot capture the filename when a Lambda function times out. If timeouts become more frequent, the filename can be captured by adding "Catch" blocks whose "ErrorEquals" matches the error strings raised by Lambda timeouts, and then writing a State Machine log entry that includes the filename. The only downside of this workaround is that the State Machine becomes more complex, because every Lambda function needs its own Catch for the timeout keyword. "ClientError" cannot be used because the Lambda job quits as soon as the timeout occurs.
</blockquote>

=== Check Error Type ===
* '''Type:''' Choice
* '''Purpose:''' Evaluates the error captured in `$.ErrorInfo.Cause` and decides which path to follow.
* '''Branches:'''
*# '''File is Empty''': If the cause matches (StringMatches) `*File is Empty!*`, transitions to '''''Write Empty File Error Log'''''.
*# '''Missing sql_dd attribute''': If the cause matches (StringMatches) `*No sql_dd attribute found*`, transitions to '''''No sql_dd Data Load'''''.
*# '''Default''': All other errors go to '''''Write General Error Log'''''.

=== Write Empty File Error Log ===
* '''Type:''' Task
* '''Purpose:''' Logs empty file errors to a dedicated CloudWatch Log Stream (EmptyFilesLogs).
* '''Function Used:''' Calls the `stateMachine_LogHandler` Lambda function.
* '''Log Details:'''
** '''LogGroup''': stateMachine
** '''LogStream''': EmptyFilesLogs
* '''Next Step:''' '''''Success State'''''

=== Write General Error Log ===
* '''Type:''' Task
* '''Purpose:''' Logs any error that is not recognised as "File is Empty!" or as a missing sql_dd attribute.
* '''Function Used:''' Calls the `stateMachine_LogHandler` Lambda function.
* '''Log Details:'''
** Log Group: stateMachine
** Log Stream: ErrorLogs or EmptyFilesLogs, depending on the error type.
** Error Message: Extracted from the error details found in `$.ErrorInfo.Cause`.
* '''Next Step:''' '''''Fail State'''''

'''Handling Missing sql_dd Data -''' Certain files may lack the sql_dd attribute. In this case, the state machine routes the error to a special path that runs an additional data load step.

=== No sql_dd Data Load ===
* '''Type:''' Task
* '''Purpose:''' Runs an alternate data load process for files that do not contain the sql_dd attribute.
* '''Function Used:''' Calls the `noSql_ddStaticDataLoadtoRpt` Lambda function.
* '''Data Passed:''' The entire execution state (`$`).
* '''Output Location:''' `$.NoSQLDDResult`
* '''Catch:''' If an error occurs here, transitions to '''''Write General Error Log''''', which then goes to '''''Fail State'''''.
* '''Next Step:''' '''''Check Last updated date'''''

=== Check Last updated date ===
* '''Type:''' Choice
* '''Purpose:''' Determines whether a refresh was needed, based on the last kptinsertedat value in the target rpt table.
* '''Function Used:''' None. As a Choice state it invokes no Lambda function; it evaluates the status returned by the `noSql_ddStaticDataLoadtoRpt` function in the previous step.
* '''Data Passed:''' Reads the `$.NoSQLDDResult.Payload.status` value.
* '''Branches:'''
*# '''Refreshed''': The data load now always proceeds, regardless of how recent the last insert was. The Lambda returns {"status": "Refreshed"} and the state transitions to '''''Refreshed'''''.
*# '''Skipped''': This path is currently unused.
*# '''Default''': Any other status or unexpected response transitions to the '''''Write General Error Log''''' step.
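
The routing performed by the two Choice states ('''''Check Error Type''''' and '''''Check Last updated date''''') can be sketched in Python as follows. This is an illustration only, not the state machine definition: the function names are hypothetical, and `string_matches` only approximates the Step Functions StringMatches comparison by treating `*` as a wildcard (escaped asterisks are not handled). A missing or null field falls through to the default branch here, which is a simplification.

```python
import re


def string_matches(value: str, pattern: str) -> bool:
    """Approximate StringMatches: '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def check_error_type(state: dict) -> str:
    """Return the next state name based on $.ErrorInfo.Cause."""
    cause = (state.get("ErrorInfo") or {}).get("Cause", "")
    if string_matches(cause, "*File is Empty!*"):
        return "Write Empty File Error Log"
    if string_matches(cause, "*No sql_dd attribute found*"):
        return "No sql_dd Data Load"
    # Default branch: every other error.
    return "Write General Error Log"


def check_last_updated_date(state: dict) -> str:
    """Return the next state name based on $.NoSQLDDResult.Payload.status."""
    # NoSQLDDResult may be null (the Deduplicated step initialises it to null).
    result = state.get("NoSQLDDResult") or {}
    status = (result.get("Payload") or {}).get("status")
    if status == "Refreshed":
        return "Refreshed"
    if status == "Skipped":
        return "Skipped"
    # Default branch: any other status or unexpected response.
    return "Write General Error Log"
```

For example, calling `check_error_type` with a cause string that contains "No sql_dd attribute found" returns "No sql_dd Data Load", and passing the resulting Lambda payload {"status": "Refreshed"} to `check_last_updated_date` returns "Refreshed".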

=== Refreshed ===
* '''Type:''' Pass
* '''Purpose:''' For cases when the data load was successfully refreshed.
* '''Next Step:''' '''''Write Success Log'''''

=== Skipped ===
* '''Type:''' Pass
* '''Purpose:''' For cases when no refresh was needed.
* '''Next Step:''' '''''Write Success Log'''''

=== Success State ===
* '''Type:''' Succeed
* '''Purpose:''' Confirms that the state machine finished successfully.

=== Fail State ===
* '''Type:''' Fail
* '''Purpose:''' Indicates that the state machine could not complete due to an error.
* '''Error Details:''' Includes a generic error name ("StateMachineError") and the cause of the failure.
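
To see how the steps above fit together, the "Next Step" and "Catch" entries can be transcribed into a small transition table and walked in Python. This is a sketch for tracing paths on paper, not the deployed definition: `trace_execution` and its parameters are hypothetical, Choice outcomes are supplied by the caller rather than evaluated, retries are ignored, and a state without a Catch is treated as if it never fails.

```python
from typing import Dict, List, Optional, Set

# Transcribed from the "Next Step" entries of each state.
NEXT_STEP = {
    "Transform Event": "Copy Indici Files",
    "Copy Indici Files": "Data Load to indici_staging RDS",
    "Data Load to indici_staging RDS": "Deduplicate Indici Data",
    "Deduplicate Indici Data": "Deduplicated",
    "Deduplicated": "Write Success Log",
    "Write Success Log": "Success State",
    "Write Empty File Error Log": "Success State",
    "Write General Error Log": "Fail State",
    "No sql_dd Data Load": "Check Last updated date",
    "Refreshed": "Write Success Log",
    "Skipped": "Write Success Log",
}

# Transcribed from the "Catch" entries of each Task state.
CATCH = {
    "Copy Indici Files": "Check Error Type",
    "Data Load to indici_staging RDS": "Check Error Type",
    "Deduplicate Indici Data": "Check Error Type",
    "No sql_dd Data Load": "Write General Error Log",
}

CHOICE_STATES = {"Check Error Type", "Check Last updated date"}
TERMINAL_STATES = {"Success State", "Fail State"}


def trace_execution(
    failing: Optional[Set[str]] = None,
    choices: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the list of states visited, given which Task states fail
    and which branch each Choice state takes."""
    failing = failing or set()
    choices = choices or {}
    state = "Transform Event"
    path = [state]
    while state not in TERMINAL_STATES:
        if state in CHOICE_STATES:
            # Both Choice states default to the general error log.
            state = choices.get(state, "Write General Error Log")
        elif state in failing and state in CATCH:
            state = CATCH[state]
        else:
            state = NEXT_STEP[state]
        path.append(state)
        if len(path) > 50:  # guard against a caller-supplied loop
            raise RuntimeError("transition loop detected")
    return path
```

Calling `trace_execution()` with no arguments walks the happy path and ends in Success State. Passing `failing={"Data Load to indici_staging RDS"}` with `choices={"Check Error Type": "No sql_dd Data Load", "Check Last updated date": "Refreshed"}` traces the missing sql_dd path, which also ends in Success State, while a failure with no matching choice ends in Fail State via Write General Error Log.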