About the Company
Lift Ventures, a remote-first startup studio whose portfolio of businesses has reached over 250 million consumers to date, is seeking a seasoned and talented Data Engineer for SuperSummary, our fast-growing EdTech business. SuperSummary is a subscription-based website and mobile app offering a library of professionally written study guides covering thousands of books, along with other educational tools and resources for students, teachers, and readers of all types.
About the Job
We are looking for a Data Engineer to join our fully remote team and play a key role in designing, building, and optimizing our data systems. Reporting to the Data Engineering Coordinator, you will collaborate with product managers, department leaders, data scientists, and analysts to deliver high-quality, actionable data insights that shape product development, drive innovation, and support strategic decision-making across the company.
This position is 100% remote, with a preference for candidates based in Latin America. Our distributed team spans the U.S., Brazil, the Philippines, and beyond — we value diverse perspectives and an inclusive, collaborative work environment.
Key Responsibilities
Design and Develop Scalable Data Pipelines
Build, maintain, and optimize robust ETL/ELT pipelines using tools such as Azure Data Factory, Azure Databricks, or Synapse Analytics.
Ensure pipelines meet business requirements and are scalable, efficient, and well-documented.
Data Integration and Management
Integrate data from diverse sources, including APIs, data warehouses, cloud services, and more.
Manage and optimize data storage solutions like Azure Data Lake, Azure SQL Database, and Azure Blob Storage.
Ensure Data Quality and Governance
Implement data quality monitoring systems and checks to ensure accuracy, consistency, and reliability.
Uphold data governance policies and ensure compliance with data security and privacy standards.
Cross-Functional Collaboration
Work closely with cross-functional stakeholders to understand evolving data needs and translate them into scalable solutions.
Provide technical support for data-related issues and contribute to the continuous improvement of the data infrastructure.
Performance Optimization
Monitor, troubleshoot, and enhance the performance and availability of data pipelines.
Optimize data workflows to improve speed, efficiency, and cost-effectiveness.
Documentation and Best Practices
Document data architectures, workflows, and processes to ensure transparency and knowledge sharing.
Advocate for and apply best practices in data engineering, staying current with emerging tools and technologies.
Sample Projects
Unified Traffic & Session Pipelines
Built scalable ETL/ELT workflows (Airflow/Databricks) that merged clickstream and session data from Amplitude, Google Analytics, AWR, and other sources into a central analytics layer, powering cross-platform marketing and product dashboards.
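For illustration, a stripped-down sketch of what one of these DAGs can look like, assuming Airflow 2.x; the DAG name, task names, and placeholder callables are hypothetical stand-ins, not our actual pipeline:

```python
# Minimal Airflow 2.x DAG sketch: extract sessions from two sources in
# parallel, then merge them into one analytics table. All names and the
# placeholder callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_amplitude(**context):
    """Placeholder: pull Amplitude session exports for the run date."""


def extract_google_analytics(**context):
    """Placeholder: pull Google Analytics sessions for the run date."""


def merge_sessions(**context):
    """Placeholder: conform both feeds to one schema and load the
    central analytics layer."""


with DAG(
    dag_id="unified_traffic_sessions",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    amplitude = PythonOperator(
        task_id="extract_amplitude", python_callable=extract_amplitude
    )
    google_analytics = PythonOperator(
        task_id="extract_google_analytics",
        python_callable=extract_google_analytics,
    )
    merge = PythonOperator(
        task_id="merge_sessions", python_callable=merge_sessions
    )

    # Both extracts must succeed before the merge task runs.
    [amplitude, google_analytics] >> merge
```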
External Data Ingestion & Enrichment
Developed Python-based scrapers and API orchestrations to pull competitive pricing, book metadata, and review data, feeding it into our recommendation engine for richer decision-making.
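A minimal sketch of one such ingestion-and-enrichment job; the endpoint, parameters, and response shape are hypothetical stand-ins for a real metadata provider:

```python
# Minimal API-ingestion sketch. The endpoint, parameters, and response
# shape are hypothetical stand-ins for a real book-metadata provider.
import requests

BASE_URL = "https://api.example.com/v1/books"  # hypothetical endpoint


def fetch_book_metadata(isbn: str) -> dict:
    """Fetch metadata for one book, failing loudly on HTTP errors."""
    resp = requests.get(BASE_URL, params={"isbn": isbn}, timeout=10)
    resp.raise_for_status()
    return resp.json()


def enrich(records: list[dict]) -> list[dict]:
    """Attach external metadata to internal records keyed by ISBN."""
    enriched = []
    for rec in records:
        meta = fetch_book_metadata(rec["isbn"])
        enriched.append({**rec, "external_metadata": meta})
    return enriched


if __name__ == "__main__":
    print(enrich([{"isbn": "9780141439518", "title": "Pride and Prejudice"}]))
```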
Data Lakehouse & Warehouse Architecture
Led the rollout of a cloud-native Lakehouse (Delta on Databricks/Azure Synapse) alongside a star-schema enterprise warehouse, standardizing schemas, partitioning strategies, and CI/CD for SQL artifacts (dbt or Synapse pipelines).
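As one plausible slice of that architecture, a minimal PySpark sketch of a partitioned Delta write; the paths and column names are hypothetical, and Delta Lake support is assumed to be available on the cluster (it is built into Databricks):

```python
# Sketch of a partitioned Delta write into a curated Lakehouse layer.
# Paths and column names are hypothetical; Delta Lake support is
# assumed (built into Databricks, or via the delta-spark package).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curated_events").getOrCreate()

# Read raw events from the landing zone.
events = spark.read.json("/mnt/raw/events/")

# Derive the partition column used for file pruning downstream.
curated = events.withColumn("event_date", F.to_date("event_timestamp"))

# Append as Delta, partitioned by date so date-filtered queries skip
# irrelevant files entirely.
(
    curated.write.format("delta")
    .mode("append")
    .partitionBy("event_date")
    .save("/mnt/curated/events/")
)
```

Partitioning on a low-cardinality date column is the standard choice here; partitioning on high-cardinality keys fragments the table into many small files.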
Data Governance & Quality Framework
Established data ownership models, automated lineage tracking, and built monitoring jobs (Great Expectations) to catch schema drift, null spikes, and stale datasets before they hit BI tools.
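To show the flavor of those monitoring jobs, here is a framework-free Python sketch of the same checks that a tool like Great Expectations automates; the expected schema, thresholds, and staleness window are hypothetical, and timestamps are assumed to be naive UTC:

```python
# Framework-free sketch of the checks a tool like Great Expectations
# automates: schema drift, null spikes, and stale data. The expected
# schema, thresholds, and window below are hypothetical.
from datetime import timedelta

import pandas as pd

EXPECTED_COLUMNS = {"user_id", "event_type", "event_timestamp"}
MAX_NULL_RATE = 0.01                 # fail a column above 1% nulls
MAX_STALENESS = timedelta(hours=24)  # fail if no events in 24 hours


def run_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of failure messages; empty means the batch passed."""
    failures = []

    # Schema drift: any added or dropped column fails the run.
    if set(df.columns) != EXPECTED_COLUMNS:
        drift = set(df.columns) ^ EXPECTED_COLUMNS
        failures.append(f"schema drift: {sorted(drift)}")

    # Null spikes: flag every column whose null rate exceeds the cap.
    for col, rate in df.isna().mean().items():
        if rate > MAX_NULL_RATE:
            failures.append(f"null spike in {col}: {rate:.1%}")

    # Staleness: fail if the newest event is older than the window
    # (assumes naive UTC timestamps).
    if "event_timestamp" in df.columns:
        newest = pd.to_datetime(df["event_timestamp"]).max()
        now = pd.Timestamp.utcnow().tz_localize(None)
        if now - newest > MAX_STALENESS:
            failures.append(f"stale data: newest event is {newest}")

    return failures
```

Wired into a pipeline, a non-empty failure list blocks the load, so bad batches never reach BI tools in the first place.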
Database Performance & Cost Optimization
Tuned Postgres, Azure SQL, and MySQL clusters (indexing, query refactoring, partitioning), slashing query runtimes and reducing database costs.
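A small before/after sketch of that kind of index-tuning work against Postgres; the DSN, table, and column names are hypothetical:

```python
# Before/after index-tuning check against Postgres via psycopg2.
# The DSN, table, and column names are hypothetical. Note that
# EXPLAIN ANALYZE actually executes the query.
import psycopg2

QUERY = "SELECT * FROM study_guides WHERE book_id = %s"

conn = psycopg2.connect("dbname=app user=etl")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Plan before: expect a sequential scan on the unindexed column.
    cur.execute("EXPLAIN ANALYZE " + QUERY, (42,))
    print("\n".join(row[0] for row in cur.fetchall()))

    # Plain CREATE INDEX blocks writes while it builds; on a busy
    # production table you would use CREATE INDEX CONCURRENTLY instead
    # (which must run outside a transaction).
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_study_guides_book_id "
        "ON study_guides (book_id)"
    )

    # Plan after: the same query should now use an index scan.
    cur.execute("EXPLAIN ANALYZE " + QUERY, (42,))
    print("\n".join(row[0] for row in cur.fetchall()))
conn.close()
```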