Company culture:
Inetum operates within a predominantly collaborative culture, where people, trust and teamwork are at the core of the organization. The company promotes a supportive management approach focused on guidance, accountability and skills development. This collaborative foundation is strengthened by a performance-driven mindset, emphasizing ambition, self-improvement and customer focus. Innovation and organizational dimensions further complement this culture, encouraging initiative and agility while relying on structured processes to ensure efficiency and reliability.
Job:
As part of the Data team, you will be responsible for the design, industrialization, and optimization of data pipelines in a Big Data environment (Hadoop/HDFS, Hive, Spark). You will ensure the quality, traceability, and availability of the datasets that feed Power BI reporting and business analytics needs.
Key Responsibilities:
- Integrate data from multiple RDBMS (PostgreSQL, SQL Server, MySQL, IBM DB2) and flat files via Sqoop/ETL.
- Structure bronze/silver/gold zones and define Hive schemas.
- Develop and optimize Spark/PySpark jobs (partitioning, broadcast joins, caching, bucketing); see the PySpark sketch after this list.
- Write efficient, maintainable SQL/HiveQL transformations.
Orchestration & Production:
- Design and maintain Airflow DAGs (scheduling, retries, SLAs, alerting); a minimal DAG sketch follows this list.
- Industrialize via GitLab CI/CD, Shell scripts, and Data DevOps best practices.
- Implement data-quality checks (completeness, uniqueness, referential integrity), unit/data tests, and documentation (catalogue, data dictionaries); see the quality-check sketch below.
- Ensure traceability (lineage) and incident management (RCAs, runbooks).
- Publish "analytics-ready" datasets and optimize the Power BI data feeds (materialized views, aggregations).
- Contribute to KPI calculation and reliability.
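For illustration, here is a minimal PySpark sketch of the kind of job optimization mentioned above (broadcast join on a small dimension, caching a reused DataFrame, writing partitioned output). Table and job names such as bronze.orders are assumptions for the example, not part of the actual project.

```python
# Minimal sketch, assuming a Spark cluster with Hive support and
# hypothetical tables bronze.orders (large) and bronze.countries (small).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("orders-silver")          # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

orders = spark.table("bronze.orders")        # large fact table (assumed)
countries = spark.table("bronze.countries")  # small dimension (assumed)

# Broadcast the small dimension to avoid a shuffle join,
# and cache the enriched frame because it is reused twice below.
enriched = orders.join(broadcast(countries), "country_code").cache()

daily_kpi = (
    enriched.groupBy("order_date")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("nb_orders"))
)

# Partition by date so downstream Hive/Power BI queries can prune partitions.
(
    enriched.write.mode("overwrite")
    .partitionBy("order_date")
    .format("parquet")
    .saveAsTable("silver.orders_enriched")
)
daily_kpi.write.mode("overwrite").saveAsTable("gold.daily_order_kpi")
```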
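Likewise, a minimal Airflow 2 DAG sketch showing the scheduling, retry, SLA and alerting aspects; the DAG id, schedule, e-mail address and commands are hypothetical placeholders.

```python
# Minimal sketch of an Airflow DAG with retries, an SLA and failure alerting;
# all identifiers and commands below are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-team",
    "retries": 2,                          # retry failed tasks twice
    "retry_delay": timedelta(minutes=10),
    "sla": timedelta(hours=2),             # flag an SLA miss past 2 h
    "email": ["data-alerts@example.com"],  # hypothetical alerting address
    "email_on_failure": True,
}

with DAG(
    dag_id="orders_silver_daily",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 3 * * *",         # daily at 03:00
    catchup=False,
    default_args=default_args,
) as dag:

    ingest = BashOperator(
        task_id="sqoop_import_orders",
        # $JDBC_URL and the table are placeholders for the real connection.
        bash_command="sqoop import --connect $JDBC_URL --table orders "
                     "--target-dir /data/bronze/orders",
    )

    transform = BashOperator(
        task_id="spark_transform_orders",
        bash_command="spark-submit jobs/orders_silver.py",  # hypothetical script
    )

    ingest >> transform
```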
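Finally, a short sketch of the kind of completeness, uniqueness and referential checks expected on published datasets; table and column names are again assumptions.

```python
# Minimal data-quality sketch against a hypothetical published table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table("silver.orders_enriched")   # hypothetical dataset under test

total = df.count()

# Completeness: key business columns must not contain nulls.
for col in ["order_id", "order_date", "amount"]:
    non_null = df.filter(F.col(col).isNotNull()).count()
    assert non_null == total, f"{col}: {total - non_null} null value(s) found"

# Uniqueness: the primary key must not contain duplicates.
distinct_keys = df.select("order_id").distinct().count()
assert distinct_keys == total, f"order_id: {total - distinct_keys} duplicate(s)"

# Referential integrity: every country_code must exist in the dimension table.
orphans = df.join(
    spark.table("bronze.countries"), "country_code", "left_anti"
).count()
assert orphans == 0, f"{orphans} row(s) reference an unknown country_code"
```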
Required profile:
- 2 to 4 years of experience in Data Engineering/Big Data, with demonstrated hands-on delivery in PySpark/Hive and Airflow.
- Master's degree (Big Data & AI, Data Engineering, or equivalent).
- Proficiency with RDBMS (PostgreSQL, SQL Server, MySQL, IBM DB2) and query optimization.
- Familiarity with Linux environments and Shell scripting.
- Ability to document, test, and monitor production pipelines.
Technical Stack:
- Big Data Processing: Spark / PySpark, Hive, HDFS (MapReduce/Impala a plus).
- Languages & Data: Python, advanced SQL, Shell (bash).
- Orchestration: Apache Airflow.
- Dataviz/BI: Power BI (dashboards, datasets).
- OS & Tools: Linux (Ubuntu/CentOS), Git/GitLab, CI/CD.
- Bonus: pandas/NumPy for prototyping, MongoDB/HBase knowledge.
Behavioral Skills:
- Rigor and attention to quality (tests, code reviews, documentation).
- Team spirit and clear communication with business and BI teams.
- Autonomy in incident investigation and proactivity in continuous improvement.
- Results-oriented: adherence to SLAs and performance culture.