• Development and maintenance of a fully open-source Data Lakehouse.
• Design and development of data pipelines for scalable and reliable data workflows that transform large volumes of both structured and unstructured data.
• Data integration from various sources, including databases, APIs, data streaming services and cloud data platforms.
• Optimization of queries and workflows for improved performance and efficiency.
• Writing modular, testable and production-grade code.
• Ensuring data quality through monitoring and validation checks, maintaining accuracy and consistency across the data platform.
• Development of test programs.
• Comprehensive documentation of processes to ensure seamless data pipeline management and troubleshooting.
• Assistance with deployment and configuration of the system.
• Participation in meetings with other project teams.
KNOWLEDGE AND SKILLS
• Extensive hands-on experience as a Data Engineer or Data Architect working with modern cloud-based, open-source data platform solutions and data analytics tools.
• Excellent knowledge of data warehouse and/or data lakehouse design and architecture.
• Excellent knowledge of open-source, code-based data transformation tools such as dbt, Spark and Trino.
• Excellent knowledge of SQL.
• Good knowledge of Python.
• Good knowledge of open-source orchestration tools such as Airflow, Dagster or Luigi.
• Experience with AI-powered assistants like Amazon Q that can streamline data engineering processes.
• Good knowledge of relational database systems.
• Good knowledge of event streaming platforms and message brokers like Kafka and RabbitMQ.
• Extensive experience in building end-to-end data pipelines and working with the ELT (extract, load, transform) framework.
• Understanding of the principles behind open table formats like Apache Iceberg or Delta Lake.
• Proficiency with Kubernetes and Docker/Podman.
• Good knowledge of data modelling tools.
• Good knowledge of online analytical processing (OLAP) and data mining tools.
• Ability to participate in multilingual meetings.
• Ability to work with a high degree of rigour and method and, more specifically, to follow naming conventions and coding standards.