Senior Data Scientist
- Develop, implement, and maintain the platform for generation, maintenance, assessment, and improvement of models. Devise and develop tools for efficient creation of new models.
- Develop and implement a closed-loop accuracy assessment process for validating models. Develop algorithms for detecting errors in existing and new models.
- Integrate crowdsourcing into the process of evaluating and improving models. Lead and supervise a team of data engineers, model curators, and model translators to apply the framework to improve existing models, generate new contextual models, and replicate models in other languages, including German, Chinese, Japanese, Spanish, French, Portuguese, and Russian.
- Supervise a junior data scientist in creating machine-learning classification methods and integrating them with the model framework to simplify the current process and make it more efficient. Propose a new framework for closed-loop interaction between human- and machine-generated classification models, and publish the results in peer-reviewed journals.
- Implement these algorithms utilizing knowledge of Hadoop, Pig, HBase, Java, Python, and R by: (a) writing scripts to retrieve and organize data from the Hadoop file system; (b) writing custom scripts to construct datasets from different data stores; (c) prototyping in Python; (d) statistical processing using R; and (e) data visualization.
- Develop and implement new modules in the Integral data generation stack. Retrieve unstructured data from massive data stores. Perform statistical analysis of results.
- Education and experience requirements: Ph.D. degree in Computer Science, Engineering, Business Administration (with a major in Information Systems), or a Related Field, and 3 months of experience in the job offered or 3 months of experience in the Related Occupation. Experience can be pre- or post-degree.
Related Occupation: Data Analyst or any other job title whose duties include: developing the generation, assessment, and improvement of models; devising and developing tools for creation of new models; developing and implementing an accuracy assessment process for validating models; developing algorithms for detecting errors in existing and new models; helping to develop a new online ad targeting algorithm and publishing the results in peer-reviewed conferences; implementing these algorithms utilizing knowledge of Hadoop, SQL, Java, Python, and R by (a) writing scripts to retrieve and organize data from the Hadoop file system, (b) writing custom scripts to construct datasets from different data stores, (c) prototyping in Python, (d) statistical processing using R, and (e) data visualization; developing new modules within the online advertising targeting framework; retrieving unstructured data from massive data stores; and performing statistical analysis of results.