Full Stack Data Infrastructure Engineer
K2 Partnering Solutions
Job Description
Job Description & Requirements

As a Data Engineer on this project, you will architect and develop infrastructure to support modern data science and machine learning applications. You will also partner actively with policy and business divisions to support digital transformation efforts through the use of data and AI methods. You will be expected to:

- Take full ownership of end-to-end data systems and architecture development, integrating databases, backend services, DevOps and cloud infrastructure as required to achieve organisational objectives
- Develop data systems and supporting infrastructure, including database administration, ETL pipelining, and analytics tools for use cases across the education ecosystem
- Develop infrastructure and applications in accordance with modern cloud technology, ensuring reusability, scalability and security
- Work cross-functionally with Product Managers, Data Scientists and users, with a strong sense of ownership, to ensure that the products developed meet user needs
Requirements:

- Bachelor's Degree in Computer Science, Information Technology, Engineering, or a related discipline, or equivalent experience
- Experience with data systems and architecture development, encompassing database systems, backend, DevOps and infrastructure development, with attention to reusability, scalability and security
- Experience developing and administering production-grade databases and data pipelines
- Experience with frontend development is a plus
- Experience with data science and machine learning applications is a plus
- Strong communication and collaboration skills
- Passion for working for the public good, with a particular interest in the education domain
Tech Stack:

- Platform: Government on Commercial Cloud (GCC 2.0)
- Primary cloud: AWS, though Azure or GCP are possibilities
- Containerisation: Docker and Kubernetes
- IaC (Infrastructure as Code): Terraform is the standard for provisioning these cloud resources
- Languages: Python (non-negotiable for ETL/ML) and SQL; Kotlin or Go are common for backend services
- Orchestration: Apache Airflow for managing complex data workflows
- Data warehousing: likely Snowflake, Databricks, or AWS Redshift
- Processing: PySpark or Pandas for large-scale data manipulation
- Backend & API development:
  - Frameworks: FastAPI or Flask (Python), or Node.js/TypeScript
  - API standards: strong knowledge of RESTful APIs and APEX (the Singapore Government's API Exchange)
  - Databases: PostgreSQL (relational) and potentially MongoDB or Elasticsearch (non-relational/search)
- DevOps & security:
  - SHIP-HATS, the government CI/CD toolchain
  - CI/CD: GitLab CI, GitHub Actions, or Jenkins
  - Security: awareness of Vault for secrets management and automated security scanning tools (SonarQube, Checkmarx)
- Frontend: if you have the "plus" skill, it's likely React.js or Vue.js
- MLOps: tools like MLflow or Kubeflow for deploying data science models
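To make the ETL part of the stack concrete, here is a minimal sketch of the kind of extract-transform-load step such a pipeline might wrap (e.g. as a task inside an Airflow DAG). It uses only the Python standard library; the records, table name and column names are hypothetical, invented purely for illustration.

```python
import sqlite3

# Hypothetical raw records, as might be extracted from a source system.
RAW_ROWS = [
    {"school": "Alpha Primary", "year": "2023", "enrolment": "412"},
    {"school": "Beta Secondary", "year": "2023", "enrolment": "noisy"},
    {"school": "Gamma Primary", "year": "2023", "enrolment": "388"},
]

def transform(rows):
    """Coerce types and drop records that fail validation."""
    clean = []
    for row in rows:
        try:
            clean.append((row["school"], int(row["year"]), int(row["enrolment"])))
        except (KeyError, ValueError):
            continue  # in production, invalid rows would be logged or quarantined
    return clean

def load(rows, conn):
    """Load validated rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrolment (school TEXT, year INTEGER, pupils INTEGER)"
    )
    conn.executemany("INSERT INTO enrolment VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    load(transform(RAW_ROWS), conn)
    print(conn.execute("SELECT SUM(pupils) FROM enrolment").fetchone()[0])  # 800
```

In the actual role, each of these functions would typically become an Airflow task, with PySpark or Pandas replacing the pure-Python transform at scale and the warehouse (Snowflake, Databricks, or Redshift) replacing SQLite.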