Case Study | Data Analytics & Insights

A major retailer approached Spike for help with its data and analytics platforms following a large-scale Oracle Retail implementation. Our challenge was to “get data moving” and resolve a series of issues along a complex data journey: from the data lake through flattening, transformation and data warehousing, and out to a MicroStrategy reporting solution.

Benefits:

  • Built a repeatable data generation and loading framework.
  • Identified regression and performance issues, allowing the customer to change and tune the end-to-end load of sales data through to the EDW within the required four-hour overnight window, including:
    • Identified the optimum ingestion batch size and reduced data flattening time by 50% (a sketch of such a batch-size sweep follows this list).
    • Determined the ideal cost and performance configuration for ETL.
    • Tuned the Microsoft SQL Data Warehouse (Synapse) for the most efficient use of capacity.
    • Identified SQL Server configuration and maintenance improvements that reduced cube load time on the reporting server.
  • Found critical bottlenecks before go-live:
    • A hardware constraint with third-party reporting software.
    • Limitations in the in-house “unzip service”.
    • Inefficiencies in database maintenance tasks that significantly delayed data publish times at the MicroStrategy reporting layer.
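The client’s actual tooling is not reproduced in this case study, but the sketch below illustrates how an ingestion batch-size sweep of this kind might look: the same fixed volume is loaded at each candidate size and the end-to-end time, including flattening, is compared. The function names, stub stages and candidate sizes are illustrative assumptions, not the client’s framework.

```python
import time

# Illustrative stand-ins for the real pipeline stages; in practice these
# would call the ingestion service and the flattening job.
def ingest_batch(records: list) -> None:
    pass  # placeholder: write one batch of records to the data lake

def flatten(total_records: int) -> None:
    pass  # placeholder: run the flattening job over the ingested data

def sweep_batch_sizes(total_records: int, candidate_sizes: list) -> None:
    """Load the same fixed volume at each candidate batch size and time it."""
    records = [{"id": i} for i in range(total_records)]
    for size in candidate_sizes:
        start = time.perf_counter()
        for offset in range(0, total_records, size):
            ingest_batch(records[offset:offset + size])
        flatten(total_records)
        elapsed = time.perf_counter() - start
        print(f"batch={size:>7,}  elapsed={elapsed:.3f}s  "
              f"throughput={total_records / elapsed:,.0f} rec/s")

if __name__ == "__main__":
    sweep_batch_sizes(total_records=100_000,
                      candidate_sizes=[1_000, 10_000, 50_000])
```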

Background

Spike led the analysis, design and implementation of a number of data initiatives, working within the client’s data division and the wider Oracle Retail programme.

  • Some real data was available, but not in sufficient or reliable volume; previous data creation efforts had not achieved the volume or variance required for full testing.
  • The client’s teams were working with the central performance team to build execution capability and adopt a core performance framework. Spike were previously involved in this work and were therefore well placed to enhance these efforts.
  • Getting reliable volumes of data into the Data Lake was key, so that it could then flow into the other layers.
  • Waiting for real data was no longer practical as it would not allow the projects to deliver on time.
  • The key goal was to generate synthetic data in volume until it could be replaced by real data, starting with the Data Lake and iterating through each layer (a minimal sketch of such a generator follows below).
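As an illustration of the synthetic data approach, the sketch below generates sales-like messages as newline-delimited JSON files of the kind that could be landed in a data lake. The schema, field names and value ranges are assumptions for the sketch, not the retailer’s actual message format.

```python
import json
import random
import uuid
from datetime import datetime, timedelta

# Hypothetical sales-message generator: field names and value ranges are
# illustrative assumptions, not the retailer's actual schema.
def synthetic_sale(day: datetime) -> dict:
    return {
        "transaction_id": str(uuid.uuid4()),
        "store_id": random.randint(1, 500),
        "sku": f"SKU{random.randint(1, 50_000):06d}",
        "quantity": random.randint(1, 5),
        "unit_price": round(random.uniform(0.5, 200.0), 2),
        "timestamp": (day + timedelta(seconds=random.randint(0, 86_399))).isoformat(),
    }

def generate_day_file(day: datetime, volume: int, path: str) -> None:
    """Write one day's worth of synthetic sales as newline-delimited JSON,
    the kind of file that would then be landed in the data lake."""
    with open(path, "w") as f:
        for _ in range(volume):
            f.write(json.dumps(synthetic_sale(day)) + "\n")

if __name__ == "__main__":
    generate_day_file(datetime(2024, 1, 1), volume=10_000, path="sales_20240101.jsonl")
```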

Our Approach

Our performance experts drove the planning of key performance activities including:

  • An in-depth, independent evaluation of the current methods used for data creation, confirming the details within this proposal.
  • Confirming readiness to test across all areas.
  • Examining existing tools, including an in-house “data exploder”, and designing the technical approach to data creation.
  • Investigating data recovery techniques to allow tests to be repeated with confidence.
  • Determining volumetrics for flows from Oracle Retail into the data layers, including the required throughput and flow rate, the response time to process data, and capacity (a worked example follows this list).
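To make the volumetrics step concrete, the arithmetic below derives the sustained flow rate implied by a nightly volume and the four-hour window. The record count is an assumed figure for illustration only; the client’s real volumes are not stated in this case study.

```python
# Illustrative volumetrics arithmetic: the record count is an assumption,
# not the client's real figure. Given a nightly volume and the 4-hour
# window, derive the sustained flow rate each layer must support.
WINDOW_HOURS = 4
NIGHTLY_RECORDS = 50_000_000          # assumed nightly sales volume

window_seconds = WINDOW_HOURS * 3600  # 14,400 s
required_rate = NIGHTLY_RECORDS / window_seconds

print(f"Required sustained throughput: {required_rate:,.0f} records/s")
# -> Required sustained throughput: 3,472 records/s
# Each stage (lake ingest, flattening, EDW load) must sustain at least
# this rate, with headroom for reruns and contention.
```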

Performance test execution then focused on getting data moving:

  • Modelling the messages into Data Lake for key data flows.
  • Creating volumes of data and flowing into Data Lake.
  • Determining how to then flow data from Data Lake > flattening > EDW > reporting.
  • Conducting pattern-based tests for those flows, identifying critical bottlenecks (see the timing-harness sketch below).
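A pattern-based flow test of this kind can be sketched as a simple per-stage timing harness: each batch is pushed through every stage in order, and the accumulated time per stage is reported so that the slowest link in the chain stands out. The stage functions below are illustrative stubs, not the client’s implementation.

```python
import time
from typing import Callable

# Hypothetical per-stage timing harness: stage names and stand-in
# functions are illustrative, not the client's pipeline.
def stage_lake(batch): pass      # land messages in the data lake
def stage_flatten(batch): pass   # flatten the landed data
def stage_edw(batch): pass       # load flattened data into the EDW
def stage_report(batch): pass    # publish to the reporting layer

STAGES: list = [
    ("lake", stage_lake), ("flatten", stage_flatten),
    ("edw", stage_edw), ("report", stage_report),
]

def run_pattern(batches: list) -> dict:
    """Push each batch through every stage in order, accumulating the
    time spent per stage so the slowest stage stands out."""
    timings = {name: 0.0 for name, _ in STAGES}
    for batch in batches:
        for name, stage in STAGES:
            start = time.perf_counter()
            stage(batch)
            timings[name] += time.perf_counter() - start
    return timings

if __name__ == "__main__":
    batches = [[{"id": i} for i in range(1_000)] for _ in range(10)]
    for name, seconds in run_pattern(batches).items():
        print(f"{name:>8}: {seconds:.3f}s")
```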

Conclusion

  • Modelling data flows and creating synthetic data will get data moving and accelerate complex data programmes.
  • Numerous bottlenecks will exist, requiring a focused, iterative and forensic approach to testing.
  • Automated tests and scripts are required for repeatability: both to execute tests and to handle test environment creation, teardown, reset and data load (a sketch of such a reset-and-reload routine follows this list).
  • Performance testing is essential to ensure that load and transformation complete in a timely manner, so that valid data is delivered when end users need it.
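As a minimal sketch of that repeatability requirement, the routine below tears down a landing area, restores a known baseline, and re-runs an assumed bulk-load script so that every test starts from the same state. The paths and the loader command are assumptions for illustration, not the client’s environment.

```python
import shutil
import subprocess
from pathlib import Path

# Illustrative reset-and-reload routine for repeatable test runs.
# The paths and the load command are assumptions for the sketch.
LAKE_DIR = Path("lake/landing")
BASELINE_DIR = Path("lake/baseline")

def reset_environment() -> None:
    """Tear down the landing area and restore the known baseline so every
    test run starts from the same state."""
    if LAKE_DIR.exists():
        shutil.rmtree(LAKE_DIR)
    shutil.copytree(BASELINE_DIR, LAKE_DIR)

def reload_data() -> None:
    """Invoke the (assumed) bulk loader to repopulate downstream layers."""
    subprocess.run(["./load_edw.sh", str(LAKE_DIR)], check=True)

if __name__ == "__main__":
    reset_environment()
    reload_data()
```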
