Opening / Selling Statement -
We are seeking a talented and experienced Generative AI Data Scientist to join our growing Data and Analytics service line. The position can be remote within the US.
This is an excellent opportunity to deepen your experience as an AI engineer using a wide variety of technologies to solve complex business problems. Join us in shaping the future of AI-powered applications and contribute to our mission of transforming business processes with AI.
Required Skills - BERT, GPT, GPT-2, GPT-3, LaMDA, Jasper, DCGAN, WGAN
Job Duties - We are looking for a strong GenAI Data Scientist who specializes in resolving customer challenges and works on use cases using Generative AI and other subdomains of data science. You must be able to understand user requirements and then propose and implement solutions using Transformer models and Generative Adversarial Networks (GANs). You will work on multiple use cases related to natural language processing and to image and video processing, using Large Language Models (including LangChain). You will process and augment data, images, and videos using Generative Adversarial Networks, perform exploratory data analysis and data pre-processing, and apply machine learning and neural network models wherever required. You will code in Python and PySpark and work on various cloud platforms.
Key Responsibilities:
Data pre-processing, data exploration, and feature engineering:
Apply a range of data cleaning and data wrangling techniques, and feature engineering techniques suited to different types of data. Apply descriptive statistics covering central tendency, variability, and distributions. Apply concepts of probability, including discrete and continuous distributions. Apply hypothesis testing, density estimation, chi-square tests, ANOVA, A/B testing, n-gram analysis of text, etc. Apply dimensionality reduction and feature extraction techniques such as principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA) wherever required. Apply feature engineering techniques such as normalization and scaling, label encoding and one-hot encoding, and converting text into numerical vectors. Clean text using the nltk, re, and spaCy libraries, with techniques such as stemming and lemmatization.
Applying Generative AI – Transformer models and Generative Adversarial Networks (GANs) for natural language processing and for image and video processing:
(a) Transformer Models – Use transformer models as well as LangChain to perform natural language processing tasks, including sentiment analysis, named entity recognition, machine translation, topic and theme analysis, text and document classification, spam filtering, document generation, question answering (e.g. SQuAD), zero-shot classification, speech recognition, etc. Perform text cleaning, tokenization, POS tagging, stemming and lemmatization, and text vectorization. Understand self-attention and the in-depth theory behind transformers. Implement transformers from scratch, and use transformers with both TensorFlow and PyTorch. Apply BERT, GPT, GPT-2, GPT-3, LaMDA, Jasper, ChatGPT, DALL-E, Stable Diffusion, Synthesia, Whisper, etc. Apply encoder, decoder, and seq2seq architectures, and master the Hugging Face Python library. Fine-tune transformers on your own datasets with transfer learning. LangChain – few-shot prompting, Chain of Thought, ReAct prompting, Chat Models, Prompts, PromptTemplates, Output Parsers, Chains (Sequential Chain, LLM Chain, Retrieval QA chain), Agents, Custom Agents, Python Agents, CSV Agents, Agent Routers, OpenAI Functions, Tools, Toolkits, Memory, Vector stores (Pinecone, FAISS), Document Loaders, Text Splitters, Streamlit (for UI).
(b) Generative Adversarial Networks (GANs) – Transform images, convert drawings into high-quality photos, transfer styles between images, increase the resolution of low-quality images (super resolution), generate high-quality deepfakes (fake faces), create images from textual descriptions, restore old photos, complete missing parts of images, and swap the faces of people who are in different environments. Deep understanding of several different GANs and related models, such as: DCGAN (Deep Convolutional Generative Adversarial Network), WGAN (Wasserstein GAN), WGAN-GP (Wasserstein GAN-Gradient Penalty), cGAN (conditional GAN), Pix2Pix (Image-to-Image), CycleGAN (Cycle-Consistent Adversarial Network), SRGAN (Super Resolution GAN), ESRGAN (Enhanced Super Resolution GAN), StyleGAN (Style-Based Generator Architecture for GANs), VQ-GAN (Vector Quantized Generative Adversarial Network), CLIP (Contrastive Language–Image Pre-training), BigGAN, GFP-GAN (Generative Facial Prior GAN), Unlimited GAN (Boundless) and SimSwap (Simple Swap).
Deployment knowledge in production environments and continuous monitoring:
Good knowledge of deploying models in a production environment, with experience on at least one such project. Continuous monitoring and training of models in a production environment using production input data streams.
Applying ML/neural network/generative AI models and measuring their performance:
Apply and fine-tune Machine Learning (supervised, unsupervised, semi-supervised, and reinforcement learning), Deep Learning (deep sequential and functional feedforward neural networks, recurrent neural networks, convolutional neural networks), Transformer models, and Generative Adversarial Networks. Strong skills in measuring and monitoring the performance of ML/neural network models.
Programming and testing:
Excellent coding skills in Python and PySpark. Test the code written for the machine learning, neural network, and generative AI techniques used. Hands-on experience across Python and PySpark libraries.
Hands-on coding on cloud platforms:
Work on cloud machine learning platforms such as AWS SageMaker, Databricks, Azure ML Platform, and Google ML platform/Vertex AI. Work with containers and virtual machines in the cloud.
Stakeholder engagement:
Engage and collaborate with business analysts, data analysts, data engineers, and AI/ML engineers. Understand the requirements of customers and business analysts and provide solutions that suit business needs. Guide the data engineers to provide data sets in the correct format, size, type, etc., to facilitate further processing and the application of data science techniques. Help AI/ML engineers deploy code into production environments. Lead interactions with customers and other stakeholders within the organization whenever required.
Team leadership and self-leadership:
Lead and mentor a team of Data Scientists and AI/ML engineers whenever required.
Continuously learn and grow. Be self-motivated.
Job Requirements -
1. Bachelor's or Master of Science in Statistics or Mathematics, or B.E./B.Tech or M.E./M.Tech in any engineering discipline.
2. More than 2 years of experience in the work described above.
3. In-depth hands-on skills in prompt engineering and natural language processing using Generative AI (transformer models, large language models, LangChain, and GANs).
4. In-depth hands-on skills in image and video processing using multiple Generative Adversarial Networks (GANs).
5. In-depth knowledge and skills in BERT, GPT, GPT-2, GPT-3, LaMDA, Jasper, ChatGPT, DALL-E, Stable Diffusion, Synthesia, Whisper, the Hugging Face Python library, etc.
6. LangChain – few-shot prompting, Chain of Thought, ReAct prompting, Chat Models, Prompts, PromptTemplates, Output Parsers, Chains (Sequential Chain, LLM Chain, Retrieval QA chain), Agents, Custom Agents, Python Agents, CSV Agents, Agent Routers, OpenAI Functions, Tools, Toolkits, Memory, Vector stores (Pinecone, FAISS), Document Loaders, Text Splitters, Streamlit (for UI).
7. GANs, such as: DCGAN (Deep Convolutional Generative Adversarial Network), WGAN (Wasserstein GAN), WGAN-GP (Wasserstein GAN-Gradient Penalty), cGAN (conditional GAN), Pix2Pix (Image-to-Image), CycleGAN (Cycle-Consistent Adversarial Network), SRGAN (Super Resolution GAN), ESRGAN (Enhanced Super Resolution GAN), StyleGAN (Style-Based Generator Architecture for GANs), VQ-GAN (Vector Quantized Generative Adversarial Network), CLIP (Contrastive Language–Image Pre-training), BigGAN, GFP-GAN (Generative Facial Prior GAN), Unlimited GAN (Boundless) and SimSwap (Simple Swap).
8. In-depth skills in all aspects of Statistics and Data Science, Machine Learning (supervised, semi-supervised, unsupervised, and reinforcement learning, regularization techniques, hyperparameter tuning), and neural networks (deep sequential, deep functional, recurrent, and convolutional neural networks). Skilled in cloud machine learning platforms (AWS SageMaker, Azure Databricks and Azure ML platform, Google ML platform/Vertex AI).
9. Additional qualifications or certifications in Data Science, Machine Learning, or Generative AI will be an added advantage.
10. Strong problem-solving, analytical, and decision-making skills.
11. Excellent communication and leadership skills.