Position Summary: We’re seeking a motivated, detail-oriented ECM Integration Analyst/Data Scientist to join our development team. In this role, you’ll work closely with senior team members to implement and support integrations between our Enterprise Content Management (ECM) platform and internal/external systems, and to support data-driven solutions in the healthcare space. You’ll be involved in data analysis, feature development, and data pipeline support, ensuring data is clean, consistent, and easy to use for analytics and machine learning. The role also focuses on designing prompt engineering workflows for LLM-based applications. The ideal candidate is passionate about data, eager to solve complex problems, and comfortable working in a fast-paced, collaborative environment.
Responsibilities:
- Generate clean, consistent training and testing datasets to support machine learning and analytical models; assist in preparing models for deployment and work with production teams to monitor performance.
- Design, test, and optimize prompts for LLMs to support structured data extraction, summarization, and decision support.
- Develop reusable prompt templates with system, user, and context instructions to ensure consistent and compliant AI outputs.
- Iteratively refine prompts using evaluation metrics, error analysis, and real-world feedback.
- Design, develop, and support end-to-end data pipelines (ETL/ELT) in collaboration with machine learning engineers, ingesting data from multiple sources including Excel, CSV, APIs, and databases, and contribute to pipeline optimization and scalability.
- Perform data transformation, validation, monitoring, and audit logging to ensure accurate, reliable, and scalable data workflows.
- Write and maintain SQL queries involving joins, subqueries, CTEs, and window functions to support data analysis and feature development.
- Work with cloud platforms such as AWS or Azure to support analytics and model-related workloads in cloud-based environments.
- Develop data workflows in Python for automation, data processing, and integration tasks.
- Help maintain data governance by ensuring standardized, traceable, and consistent data usage across analytics and machine learning solutions.
- Implement and monitor processes for document management system integration.
- Document data processes, schemas, feature definitions, transformation logic, and workflows to support future maintenance and governance.
- Communicate data insights and analytical findings effectively to both technical and non-technical stakeholders.
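As a sketch of the kind of reusable prompt template with system, user, and context instructions described in the responsibilities above (all instructions, field names, and the healthcare framing here are illustrative, not a specification of the actual role's workflows):

```python
# Hypothetical example of a reusable LLM prompt template with separate
# system, context, and user sections, assembled into one prompt string.
from string import Template

# System instructions shared by every extraction prompt (illustrative).
SYSTEM_INSTRUCTIONS = (
    "You are a clinical-document assistant. "
    "Extract only the fields requested; never invent values."
)

# Reusable template: system, context, and user sections kept distinct.
PROMPT_TEMPLATE = Template(
    "System: $system\n"
    "Context: $context\n"
    "User: Extract the fields $fields from the document below "
    "and return them as JSON.\n"
    "Document:\n$document"
)

def build_extraction_prompt(document: str, fields: list[str],
                            context: str = "") -> str:
    """Assemble a consistent structured-extraction prompt from reusable parts."""
    return PROMPT_TEMPLATE.substitute(
        system=SYSTEM_INSTRUCTIONS,
        context=context or "none",
        fields=", ".join(fields),
        document=document,
    )
```

Keeping the sections in a shared template makes outputs easier to audit and lets prompt refinements (from evaluation metrics or error analysis) apply uniformly across applications.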
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related quantitative field.
- 3+ years of experience in data science or data engineering, with a focus on data infrastructure and machine learning workflows.
Required Technical Skills:
- Experience with system integration tools and middleware involved with document management systems.
- Strong understanding of ECM system architecture and data mapping.
- Hands-on experience with Python libraries such as NumPy, pandas, Seaborn, SciPy, scikit-learn, PyTorch, Transformers, and FastAPI for data processing and automation.
- Experience building models such as logistic regression, random forests, and neural networks (including NLP models).
- Experience building vector search indexes using tools such as Elasticsearch for advanced document search.
- Experience building ETL pipelines using tools such as SSIS or Python-based workflows.
- Experience designing and consuming RESTful APIs using C#/.NET.
- Experience working with Excel, CSV, and JSON data sources.
- Strong understanding of relational databases (SQL Server or equivalent).
- Basic familiarity with ClickHouse as a high-performance analytical database for large-scale data analysis use cases.
- Familiarity with Generative AI and LLM-based applications, including AI agents and modern AI development tools, supported by recent professional certification.
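To illustrate the ETL-style extract/validate/transform work described above, a minimal Python-based sketch (the CSV data, column names, and validation rules are invented for illustration; a real pipeline would read from files, APIs, or databases):

```python
# Hypothetical extract -> validate -> transform step for a small CSV feed.
import csv
import io

# Illustrative raw input; the second row is missing a required field.
RAW_CSV = "patient_id,visit_date,charge\n1001,2024-01-05,250.0\n1002,,90.5\n"

def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def validate(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Validate: split rows into clean and rejected (missing required fields)."""
    clean, rejected = [], []
    for row in rows:
        if row["patient_id"] and row["visit_date"]:
            clean.append(row)
        else:
            rejected.append(row)  # kept for audit logging, not silently dropped
    return clean, rejected

def transform(rows: list[dict]) -> list[dict]:
    """Transform: cast the charge column to float for downstream analytics."""
    return [{**row, "charge": float(row["charge"])} for row in rows]

clean, rejected = validate(extract(RAW_CSV))
loaded = transform(clean)
```

Retaining rejected rows rather than discarding them supports the audit-logging and data-governance responsibilities listed for this role.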