Senior Backend Data Engineer and Operations (London)
Causaly Inc. is developing the world’s biggest data platform for Cause and Effect evidence in Biomedicine using AI Machine-Reading technology. The technology is self-developed and proprietary, powering a massive Biomedical Causal Knowledge Graph. It helps researchers and decision-makers discover insights from 30,000,000 academic publications in minutes. Causaly is used by Pharmaceutical companies and Academia, in Research and Commercial departments, for Drug Discovery and Drug Safety.
On the technology side, we are developing a machine-reading platform comprising a variety of Natural Language Processing and Computational Linguistics algorithms, including information and relationship extraction, entity linking, knowledge inference, relational learning, and abbreviation and coreference resolution. Having already machine-read tens of millions of academic documents, our platform processes hundreds of thousands of new documents every month. The platform turns free-form text into causal knowledge graphs and applies machine learning to surface new knowledge.
We are a VC-backed tech company with offices in London and Athens, currently looking for an experienced and dedicated Senior DevOps Engineer who shares our vision and passion and wants to be part of developing a cutting-edge, transformative knowledge product.
Responsibilities:
- Operation, maintenance and further development of a Natural Language Processing (NLP) data processing pipeline consisting of multiple conventional NLP stages, e.g. text preprocessing and normalization, relationship extraction and entity linking
- In particular, running periodic updates that process large volumes of data (hundreds of thousands to tens of millions of documents)
- Prompt problem solving and incident response
- Data consistency and integrity management across multiple datastores (relational and graph databases, and internal search engines)
- Design and operation of efficient data processing and retrieval architectures
- Helping to integrate new data types and to deploy new NLP/ML workflows and frameworks
- Management of the cloud environment, ensuring consistent security and usage practices
- Management and support of the NLP and Fullstack teams in version control and cloud environment operations
- Data pipeline and technology stack/architecture documentation
Required Skills and Qualifications:
- MSc in Computer Science/Informatics or related field with 3+ years of experience in industry
- Attention to detail: dealing with millions of files, processing them in different stages and tracking their states comes easily to you
- You can work in an environment that is still taking shape, where not everything is fully defined and you are asked to actively contribute to the solution
- Excellent knowledge of Unix/Linux systems and bash/shell scripting for data processing
- Proficiency in Python, knowledge of Java
- Proficiency in SQL, relational data store design and management
- Proficiency with version control systems (Git)
- Excellent working knowledge of either AWS or GCP
- Excellent verbal and written communication skills
Good to have:
- Experience with running natural language processing (NLP) pipelines
- Experience with any of the following databases
Benefits:
- Compensation of 50,000 - 70,000 GBP, based on candidate experience, plus an equity option package
- Be part of the early team that builds a transformative knowledge product with the potential to have real impact
- Individual training budget for professional development
- Flexible working environment