Leveraging automation and machine learning for records management

Challenge

The Department of Finance identified records management as a target for a single whole-of-government solution. Commonwealth agencies register about 230 million new records per year and hold about 1 petabyte of digital records.

Research across 36 agencies found an 80% dissatisfaction rate with existing records management solutions. The root cause was that records management practices imposed a significant burden on users. As a result, business processes were inefficient and a large number of records were never entered.

Additionally, manual disposal processes could not cope with the volume of digital records, so unwanted data was accumulating fast.

Approach

GoSource proposed using cloud-based machine learning and big data platforms to automate records management activities. The team developed a records lifecycle microservice to create, version, index and auto-classify records in accordance with National Archives records management standards and government information security standards.
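
The lifecycle steps described above (create, version, index, auto-classify) can be illustrated with a minimal in-memory sketch. This is not GoSource's implementation: the `RecordStore` class, the `classify` function and the keyword rules are hypothetical names chosen for illustration, and a simple keyword match stands in for the machine-learning classifier.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical keyword rules standing in for a trained classifier
# and an authorised classification scheme.
CLASSIFICATION_RULES = {
    "finance": {"invoice", "budget", "payment"},
    "personnel": {"leave", "recruitment", "payroll"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(text):
    """Return the label whose keywords match the most tokens."""
    tokens = tokenize(text)
    best_label, best_hits = "unclassified", 0
    for label, keywords in CLASSIFICATION_RULES.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


@dataclass
class RecordVersion:
    number: int
    content: str
    classification: str


class RecordStore:
    """Keeps every version of a record and a keyword index of the latest one."""

    def __init__(self):
        self._versions = defaultdict(list)  # record_id -> [RecordVersion]
        self._index = defaultdict(set)      # token -> {record_id}

    def save(self, record_id, content):
        """Create the record, or add a new version if it already exists."""
        versions = self._versions[record_id]
        if versions:
            # Remove index entries that pointed at the superseded version.
            for token in tokenize(versions[-1].content):
                self._index[token].discard(record_id)
        version = RecordVersion(len(versions) + 1, content, classify(content))
        versions.append(version)
        for token in tokenize(content):
            self._index[token].add(record_id)
        return version

    def search(self, keyword):
        """Return the ids of records whose latest version contains the keyword."""
        return sorted(self._index.get(keyword.lower(), set()))

    def history(self, record_id):
        """Return all versions of a record, oldest first."""
        return list(self._versions.get(record_id, []))
```

Saving the same record id twice yields version 2 with a fresh classification, and the index then reflects only the latest content. In a deployed service the dictionaries would presumably be replaced by object storage and a search cluster.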

To validate the prototype, they ingested over 500 million sample records and verified sub-second keyword search across the full holdings.

To build the microservices and web apps, GoSource used Python/Django, Python/Flask and AWS cloud platform services including S3, ECS, Elasticsearch and API Gateway. To prove that any business system could integrate with the records microservices, GoSource built connectors for e-mail, the Windows file system, and cloud platforms such as Twitter.
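
One way to picture the connector idea is a small common interface that each source system implements, with a shared ingest routine that hands every item to the records service. The sketch below is an assumption about the shape of such a design, not the connectors GoSource built: `Connector`, `FileSystemConnector`, `ingest` and the payload fields are hypothetical, and the `submit` callable stands in for a call to the records API.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Dict, Iterator


class Connector(ABC):
    """Hypothetical interface: each source system yields record payloads."""

    @abstractmethod
    def fetch(self) -> Iterator[Dict[str, str]]:
        ...


class FileSystemConnector(Connector):
    """Walks a directory tree and yields one payload per file."""

    def __init__(self, root):
        self.root = Path(root)

    def fetch(self) -> Iterator[Dict[str, str]]:
        for path in sorted(self.root.rglob("*")):
            if path.is_file():
                yield {
                    "source": "filesystem",
                    "title": path.name,
                    # Undecodable bytes are replaced rather than raising.
                    "content": path.read_text(encoding="utf-8", errors="replace"),
                }


def ingest(connector: Connector, submit: Callable[[Dict[str, str]], None]) -> int:
    """Pass every payload from the connector to submit; return the count."""
    count = 0
    for payload in connector.fetch():
        submit(payload)
        count += 1
    return count
```

An e-mail or social media connector would implement the same `fetch` method, so the ingest routine would not need to change when a new source is added.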

Solution

In the solution architecture and operating model for Records Management as a Service (RMaaS), GoSource included the discovery and re-use of information assets through big-data search indexes and linked-data ontologies.

System enhancements include unified search across multiple record sets, AI-driven record clustering and classification, and the merging and reallocation of records between organisational units.

Collectively, the microservices suite provides all the essential functions of a system of record (per the Records Management Act). To prove that the solution could scale, the team set up worst-case scenarios of sustained workloads.

In the future, users will not need to think like record managers. They will do their job using normal systems while records management happens in the background.

Client
Department of Finance
Timeframe
Three months
Scope
Solution architecture, software engineering and operating model of Records Management as a Service (RMaaS)
Summary
GoSource proved the viability of a whole-of-government records management platform capable of ingesting multiple petabytes of records each year while remaining highly available and secure.
In brief

Benefits

  • Available and secure: This prototype proved the viability of a whole-of-government records management platform capable of ingesting multiple petabytes of new records each year while remaining highly available and secure.
  • Improved communication: As a true microservice architecture, RMaaS demonstrates that value-adding services can be integrated with core records management functionality.
"GoSource’s rapid prototyping helped us to prove the technical feasibility of our records management transformation program."
Robyn McLeod, Director, Whole of Government Solutions
Department of Finance