• Developed layout-parsing OCR models to automate historical newspaper processing at scale
  • Deployed Celery-based task queues with FastAPI and Redis to handle asynchronous bulk processing of 100GB+ PDF files, improving throughput
  • Integrated Elasticsearch with semantic search to support real-time search of historical media
  • Migrated 3 services to OpenStack Jetstream Cloud and set up Ceph object storage, saving $300 per month in AWS costs