Requirements:
- 7+ years of experience in Amazon Web Services (AWS) cloud computing.
- 10+ years of experience in big data and distributed computing.
- Strong hands-on experience with PySpark, Apache Spark, and Python.
- Strong hands-on experience with SQL and NoSQL databases (DB2, PostgreSQL, Snowflake, etc.).
- Proficiency in data modeling and ETL workflows.
- Proficiency with workflow schedulers such as Airflow (see the example DAG sketch after this list).
- Hands-on experience with AWS cloud-based data platforms.
- Experience in DevOps, CI/CD pipelines, and containerization (Docker, Kubernetes) is a plus.
- Strong problem-solving skills and ability to lead a team.
- Experience with dbt and Astronomer (managed Airflow on AWS) is a plus.
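
As a rough illustration of the Airflow proficiency called for above, here is a minimal DAG sketch; the dag_id, schedule, and run_etl callable are hypothetical, and the schedule argument assumes Airflow 2.4+.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl(**context):
    # Placeholder ETL step; in a real pipeline this might submit a PySpark
    # job (e.g. via an EMR or Glue operator) rather than run inline Python.
    print(f"Running ETL for logical date {context['ds']}")


with DAG(
    dag_id="example_daily_etl",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",            # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    extract_transform_load = PythonOperator(
        task_id="extract_transform_load",
        python_callable=run_etl,
    )
```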

Responsibilities:
- Lead the design, development, and deployment of PySpark-based big data solutions.
- Architect and optimize ETL pipelines for structured and unstructured data.
- Collaborate with clients, data engineers, data scientists, and business teams to provide scalable solutions.
- Optimize Spark performance through partitioning, caching, and tuning (see the tuning sketch after this list).
- Implement best practices in data engineering (CI/CD, version control, unit testing; see the testing sketch after this list).
- Work with cloud platforms like AWS.
- Ensure data security, governance, and compliance.
- Mentor junior developers and review code for best practices and efficiency.
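
A rough sketch of the partitioning, caching, and tuning responsibility listed above; the paths, column names, and configuration values are hypothetical and would normally be tuned against real data volumes and cluster size.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical session configuration; shuffle partitions and AQE settings
# are examples only.
spark = (
    SparkSession.builder
    .appName("etl-tuning-sketch")
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

# Hypothetical input location and columns.
events = spark.read.parquet("s3://example-bucket/events/")

# Repartition by a join/aggregation key to reduce shuffle skew, then cache
# because the DataFrame is reused by two downstream aggregations.
events = events.repartition("customer_id").cache()

daily_counts = events.groupBy("customer_id", "event_date").count()
totals = events.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

# Write partitioned output so downstream readers can prune by date.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/output/daily_counts/"
)
totals.write.mode("overwrite").parquet("s3://example-bucket/output/totals/")
```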
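
And a minimal unit-testing sketch for the best-practices item, assuming pytest and a local SparkSession; the transformation under test and the test data are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_total_with_tax(df, tax_rate=0.1):
    # Hypothetical transformation under test.
    return df.withColumn("total_with_tax", F.col("amount") * (1 + tax_rate))


@pytest.fixture(scope="session")
def spark():
    # Local Spark session for fast, isolated tests.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def test_add_total_with_tax(spark):
    df = spark.createDataFrame([(1, 100.0)], ["id", "amount"])
    result = add_total_with_tax(df, tax_rate=0.2).collect()[0]
    assert result["total_with_tax"] == pytest.approx(120.0)
```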