Offer summary

Qualifications:

5+ years of experience in data engineering or a related field.

Deep experience with Apache Spark, Hadoop, and Apache Hive.

Strong programming skills in Python and understanding of NoSQL databases.

Fluency in Polish and English, with strong problem-solving and communication skills.

Key responsibilities:

Design, develop, and maintain scalable batch and streaming data pipelines.

Transform, process, and integrate data using Python.

Optimize performance across big data workflows, including tuning Hive and Spark jobs.

Handle a mix of structured and unstructured data, including NoSQL and vector databases.

Job description

Hi there!

We are Tooploox, an AI software development company offering custom AI solutions and services. We help innovative companies and startups design and build digital products with generative AI, mobile, and web technologies.

Our team of nearly 200 experts, including an R&D team of over 40 engineers, many with PhDs, has pioneered AI solutions across industries like healthcare, fashion, and e-commerce. We’ve published over 15 research papers at top conferences like NeurIPS and ICML.

We're on the lookout for a Data Engineer to take on a pivotal role in our team. You'll be at the heart of working with data, focusing on scalable batch and streaming data pipelines. If you're someone who loves to merge traditional software development with innovative AI technologies, this role is tailor-made for you.

We'd love to hear from you!

What you will do:

Design, develop, and maintain scalable batch and streaming data pipelines.
Work with Python to transform, process, and integrate data.
Handle a mix of structured and unstructured data, including work with NoSQL and vector databases.
Optimize performance across big data workflows, including tuning Hive and Spark jobs.

Experience and skills you need to join us:

5+ years of experience in data engineering or a related field.
Deep experience with Apache Spark (especially PySpark), Hadoop, and Apache Hive.
Strong programming skills in Python.
Solid understanding of database concepts, including experience with NoSQL databases (e.g., MongoDB, Redis) and ideally vector databases.
Hands-on experience with stream processing, preferably using Apache Flink.
Familiarity with distributed computing, data warehousing, and performance optimization techniques.
Strong problem-solving and communication skills.
Fluency in Polish and English.

It would be great if you also have:

Experience with LLMs, prompt engineering, or machine learning workflows (we use this in conjunction with vector DBs).
Proficiency in Java or Scala, which is useful for deeper Spark optimization or for contributing to broader engineering projects.
Familiarity with Spring Boot for building and deploying data applications.

How we work:

At Tooploox, you have the flexibility to choose your working hours and location. While we value remote work, we also believe in building relationships and invite you to join us in our Warsaw and Wrocław offices. Enjoy a relaxed atmosphere and try some “home-made” pizza from our office pizza oven. We love having pets in the office, so feel free to bring yours along.

Join us and shape the future of AI while working the way you like!

Required profile