Martin took the audience through Presto’s journey, from its birth at Facebook, through its growth and adoption there, to the present with the formation of the Presto Software Foundation for wider community involvement. He also highlighted some of their design choices and some missteps they took along the way.
Edwin Hui Hean Law, Data Engineering Lead at Grab, Singapore, gave a talk on Grab’s experience of using Presto on Amazon EMR and their subsequent migration to Presto on Qubole. He shared his insights on the relative pros and cons of these platforms. The final part of his talk covered his team’s recent experimentation with Presto on Kubernetes.
Shubham Tagra, Sr. Staff at Qubole, presented his work on providing read support for Hive ACID tables in Presto. This has become increasingly important with the arrival of data privacy regulations like GDPR and CCPA that grant users the “right to erasure” and/or the “right to rectification”. Shubham’s talk covered why he picked Hive ACID over other options available in open source, as well as details of the Hive ACID and Presto integration that he added.
Praveen Krishna from Zoho Corporation presented a summary of his team’s journey with Presto. He gave an overview of how his team optimized Presto’s planner and reduced planning time by 20-30% for queries involving multiple joins on wide tables. He also highlighted how they have integrated Apache Lucene to speed up full-text search operations.
Ashish Kumar Tadose, Principal Engineer at Walmart Labs, gave an overview of how his team is using Presto on Google Cloud Platform (GCP). He highlighted the challenges associated with querying diverse data sources at Walmart and how his team has tackled these challenges using Presto. His talk also described how his team has implemented monitoring, auto scaling, caching (via Alluxio), and security policies (via Ranger).
Garvit Gupta from Microsoft, along with Ankit Dixit from Qubole, presented their work on Presto scheduler changes for data locality and optimized scheduling for caching engines like RubiX. They presented a new scheduling model that prioritizes locality while ensuring a uniform distribution of workload across nodes, which improves the efficacy of any data caching framework. Their talk concluded with performance numbers that showed up to 9x improvement in cached/local reads in RubiX.
Rohit Srivastava, Engineering Manager at MiQ Digital, highlighted several challenges that his team had to overcome, such as dealing with data copies, duplicated data pre-processing, and meeting strict SLAs. He gave an overview of how using Presto on Qubole for all dashboarding needs, along with changes such as standardising most of their data on the Apache Parquet format on S3, has helped overcome these challenges.
September 05, 2019, 9:00 AM IST
Courtyard by Marriott, Bengaluru Outer Ring Road
Marathahalli - Sarjapur Outer Ring Road, Bellandur, Bengaluru, Karnataka 560103