Data Engineer

Company: Cliff Services Inc

Location: Chicago, IL

Posted on: December 10

Job Title: Data Engineer (Scala + AWS + Spark)

Locations: McLean, VA / Richmond, VA / Chicago, IL

Experience: 7+ Years

Visa: H-1B

Interview Process: In-Person / Face-to-Face (F2F)

Job Description

We are seeking a highly skilled Data Engineer with strong expertise in Scala, AWS, and Apache Spark. The ideal candidate will have 7+ years of hands-on experience building scalable data pipelines, distributed processing systems, and cloud-native data solutions.

Key Responsibilities

  • Design, build, and optimize large-scale data pipelines using Scala and Spark.
  • Develop and maintain ETL/ELT workflows across AWS services.
  • Work on distributed data processing using Spark, Hadoop, or similar frameworks.
  • Build data ingestion, transformation, cleansing, and validation routines.
  • Optimize pipeline performance and ensure reliability in production environments.
  • Collaborate with cross-functional teams to understand requirements and deliver robust solutions.
  • Implement CI/CD best practices, testing, and version control.
  • Troubleshoot and resolve issues in complex data flow systems.

Required Skills & Experience

  • 7+ years of Data Engineering experience.
  • Strong programming experience with Scala (must-have).
  • Hands-on experience with Apache Spark (Core, SQL, Streaming).
  • Solid experience with AWS cloud services (Glue, EMR, Lambda, S3, EC2, IAM, etc.).
  • High proficiency in SQL and in relational and NoSQL data stores.
  • Strong understanding of data modeling, data architecture, and distributed systems.
  • Experience with workflow orchestration tools (Airflow, Step Functions, etc.).
  • Strong communication and problem-solving skills.

Preferred Skills

  • Experience with Kafka, Kinesis, or other streaming platforms.
  • Knowledge of containerization tools like Docker or Kubernetes.
  • Background in data warehousing or modern data lake architectures.