Graph Database

From Kautepedia
Revision as of 00:29, 9 January 2025 by Solomon.pidoke (Graph Database findings)

Graph Database findings

Database Options

Summary of the options considered, with key features, pros, and cons for each:

Neo4j
  • Key features: Optimised for relationship-heavy queries and traversals. Uses the Cypher query language, which is relatively simple to learn.
  • Pros:
    • Excellent for multi-hop traversals.
    • Strong visualisation tools.
    • Relationship properties natively supported.
  • Cons:
    • Community Edition has limitations (for example, no clustering), but it is probably sufficient for contact tracing and similar use cases.
    • Enterprise Edition requires a paid licence.
    • Requires setup and hosting.

AWS Neptune
  • Key features: Fully managed service supporting Gremlin and SPARQL.
  • Pros:
    • AWS integration.
    • Scalable and highly available.
  • Cons:
    • Higher cost.
    • Steeper learning curve for SPARQL/Gremlin.

TigerGraph
  • Key features: Distributed graph database for large-scale analytics.
  • Pros:
    • Excellent performance for deep relationship traversals.
    • Handles large datasets well.
  • Cons:
    • Enterprise-focused pricing.
    • Requires learning GSQL, TigerGraph's query language.

PostgreSQL + Apache AGE
  • Key features: Extends PostgreSQL to support graph functionality.
  • Pros:
    • Cost-effective.
    • Integrates with existing relational workflows.
  • Cons:
    • Limited advanced graph features.
    • Not natively supported on Amazon RDS, so PostgreSQL must be self-hosted.

ArangoDB
  • Key features: Combines graph, document, and key-value models in one database.
  • Pros:
    • Flexible for mixed workloads.
    • Cost-effective.
  • Cons:
    • Less specialised for graph-specific tasks.


Neo4j

Neo4j would be suitable for what we're trying to achieve. The Community Edition will most likely suffice for our dataset and for use cases such as relationship queries and contact tracing.
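To illustrate the kind of multi-hop query that makes a graph database attractive here, the following is a minimal sketch of contact tracing as a breadth-first search over an in-memory contact graph. The person IDs, the sample data, and the `Person`/`CONTACTED` names in the Cypher comment are hypothetical; in Neo4j the same question would be answered by a variable-length pattern rather than hand-written traversal code.

```python
from collections import deque

# Roughly equivalent Cypher, assuming a hypothetical :Person label and
# :CONTACTED relationship type:
#   MATCH (p:Person {id: $id})-[:CONTACTED*1..2]-(c:Person)
#   WHERE c <> p
#   RETURN DISTINCT c.id


def trace_contacts(contacts, start, max_hops):
    """Return {person: hop_distance} for everyone within max_hops of start.

    contacts is an undirected adjacency mapping: person -> set of people
    they were in contact with.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in contacts.get(person, ()):
            if other not in distances:
                distances[other] = distances[person] + 1
                queue.append(other)
    del distances[start]  # the starting person is not their own contact
    return distances


if __name__ == "__main__":
    # Hypothetical sample data: A met B, B met C, C met D.
    contacts = {
        "A": {"B"},
        "B": {"A", "C"},
        "C": {"B", "D"},
        "D": {"C"},
    }
    print(trace_contacts(contacts, "A", 2))  # {'B': 1, 'C': 2}
```

Each extra hop in a relational database typically means another self-join on a contacts table, which is why traversal-heavy questions like this are the main argument for a graph store.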

For a Small Dataset (2GB)

  1. EC2 with EBS
    • Best For: Simple and low-cost setup.
    • Estimated Cost: ~$9/month.
  2. ECS with EFS
    • Best For: Flexibility (though it might be overkill unless scaling significantly).
    • Estimated Cost: ~$16/month.
  3. Neo4j Aura Free Tier
    • Best For: Simplicity and no upfront cost (if the workload fits within the free-tier limits).
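Whether the free tier is viable depends on node and relationship counts rather than size on disk. A minimal sketch of that check, assuming the limits listed under the 5GB dataset section (200K nodes / 400K relationships) still apply and that reaching a limit exactly is allowed; Aura's limits can change, so verify them before relying on this.

```python
# Free-tier limits as listed on this page; check Neo4j's current Aura
# documentation, as these may change.
AURA_FREE_MAX_NODES = 200_000
AURA_FREE_MAX_RELATIONSHIPS = 400_000

# Counts can be obtained from an existing database with Cypher:
#   MATCH (n) RETURN count(n)
#   MATCH ()-[r]->() RETURN count(r)


def fits_aura_free(node_count: int, relationship_count: int) -> bool:
    """Return True if both counts are at or below the assumed free-tier limits."""
    return (node_count <= AURA_FREE_MAX_NODES
            and relationship_count <= AURA_FREE_MAX_RELATIONSHIPS)


if __name__ == "__main__":
    print(fits_aura_free(150_000, 350_000))  # True
    print(fits_aura_free(150_000, 450_000))  # False: too many relationships
```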

For a 5GB Dataset

  1. EC2 with EBS
    • Best For: Simple, cost-efficient setup with moderate scaling needs.
    • Configuration:
      • Instance: t3.medium (2 vCPUs, 4GB RAM) recommended for moderate query loads.
      • Storage: 20GB EBS gp3 (to allow room for growth and logs).
    • Estimated Cost: ~$40/month (EC2: ~$38, EBS: ~$2).
  2. ECS with EFS
    • Best For: Flexibility, especially if you’re familiar with containerised environments.
    • Configuration:
      • 1 ECS Fargate task (1 vCPU, 2GB RAM) with 10GB EFS.
    • Estimated Cost: ~$25/month (ECS: ~$22, EFS: ~$3).
  3. Neo4j Aura Professional
    • Best For: Workloads exceeding free tier limits (200K nodes/400K relationships).
    • Estimated Cost: from ~$65/month (entry-level price; larger instances cost more).
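The EC2 with EBS estimate in this section is simple arithmetic: an hourly instance rate multiplied by roughly 730 hours in a month, plus a per-GB monthly storage rate. A minimal sketch follows; the two rates are assumptions picked to be consistent with the ~$38 (EC2) and ~$2 (EBS) figures above, not quoted AWS prices, so substitute current on-demand pricing for the region being used.

```python
HOURS_PER_MONTH = 730  # common approximation: 24 h x 365 d / 12


def monthly_cost(hourly_rate: float, storage_gb: float,
                 storage_rate_per_gb: float) -> float:
    """Estimate a monthly bill as always-on compute plus provisioned storage."""
    compute = hourly_rate * HOURS_PER_MONTH
    storage = storage_gb * storage_rate_per_gb
    return compute + storage


# Hypothetical rates, NOT official AWS prices: chosen to reproduce the
# ~$38 (EC2) and ~$2 (EBS) figures for the 5GB option.
T3_MEDIUM_HOURLY = 0.052   # USD per hour, assumed
GP3_PER_GB_MONTH = 0.096   # USD per GB-month, assumed

if __name__ == "__main__":
    total = monthly_cost(T3_MEDIUM_HOURLY, 20, GP3_PER_GB_MONTH)
    print(f"~${total:.0f}/month")  # ~$40/month
```

With 30GB of storage, the same assumed rates give about $41/month, in line with the 10GB estimate below.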

For a 10GB Dataset

  1. EC2 with EBS
    • Best For: Cost efficiency with predictable workloads.
    • Configuration:
      • Instance: t3.medium (2 vCPUs, 4GB RAM) or m5.large (2 vCPUs, 8GB RAM) for heavy workloads.
      • Storage: 30GB EBS gp3 for dataset and transaction logs.
    • Estimated Cost: ~$41/month with t3.medium (EC2: ~$38, EBS: ~$3).
  2. ECS with EFS
    • Best For: Scalability and AWS integration.
    • Configuration:
      • 1 ECS Fargate task (2 vCPUs, 4GB RAM) with 20GB EFS.
    • Estimated Cost: ~$35/month (ECS: ~$30, EFS: ~$5).
  3. Neo4j Aura Professional
    • Best For: Datasets larger than the free tier allows (requires a paid subscription).
    • Estimated Cost: from ~$65/month (entry-level price; larger instances cost more).
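For the ECS with EFS option, the Fargate task size and the EFS volume are declared in an ECS task definition. Below is a minimal sketch that builds one as a Python dict (which can be serialised to JSON for registration), assuming the official `neo4j` Docker image, which keeps its data under `/data` and listens on ports 7474 (HTTP) and 7687 (Bolt). The family name, image tag, file system ID, and password are placeholders. It also rejects CPU/memory pairs that Fargate does not accept, using a partial table of valid sizes.

```python
import json

# Partial table of valid Fargate CPU units -> memory (MiB) combinations.
# Larger sizes exist; see the AWS Fargate task size documentation.
FARGATE_MEMORY_OPTIONS = {
    256: {512, 1024, 2048},
    512: set(range(1024, 4096 + 1, 1024)),
    1024: set(range(2048, 8192 + 1, 1024)),
    2048: set(range(4096, 16384 + 1, 1024)),
    4096: set(range(8192, 30720 + 1, 1024)),
}


def neo4j_task_definition(cpu: int, memory: int, file_system_id: str) -> dict:
    """Build an ECS Fargate task definition for Neo4j backed by an EFS volume."""
    if memory not in FARGATE_MEMORY_OPTIONS.get(cpu, set()):
        raise ValueError(f"{cpu} CPU units with {memory} MiB is not a Fargate size")
    return {
        "family": "neo4j",                       # placeholder name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",                 # required for Fargate
        "cpu": str(cpu),                         # task-level values are strings
        "memory": str(memory),
        "containerDefinitions": [{
            "name": "neo4j",
            "image": "neo4j:5",                  # assumed image tag
            "essential": True,
            "portMappings": [
                {"containerPort": 7474},         # HTTP / Neo4j Browser
                {"containerPort": 7687},         # Bolt
            ],
            "environment": [
                # Placeholder credentials; use a secrets store in practice.
                {"name": "NEO4J_AUTH", "value": "neo4j/change-me-please"},
            ],
            "mountPoints": [
                {"sourceVolume": "neo4j-data", "containerPath": "/data"},
            ],
        }],
        "volumes": [{
            "name": "neo4j-data",
            "efsVolumeConfiguration": {
                "fileSystemId": file_system_id,
                "transitEncryption": "ENABLED",
            },
        }],
    }


if __name__ == "__main__":
    # 2 vCPUs (2048 units) and 4GB (4096 MiB), as in the 10GB option above.
    task = neo4j_task_definition(2048, 4096, "fs-12345678")  # placeholder ID
    print(json.dumps(task, indent=2))
```

The 5GB option's task size (1 vCPU, 2GB RAM) corresponds to `neo4j_task_definition(1024, 2048, ...)`.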