
About the Customer
The customers of this project are organizations that need backend APIs that are scalable, extensible, and capable of scheduling jobs. The product supports multiple tenants, each of which provides configuration information to the system through a tenant admin component.
The cloud aggregator component then schedules jobs according to that configuration and validates the data through the Avro mechanism before sending it to the cloud edge-catalog, which serves as the data warehouse.


Project Overview
The project mainly focused on writing high-quality code with scalability and extensibility in mind. The backend API is divided into three components: tenant admin, cloud aggregator, and cloud edge-catalog. The cloud aggregator is the central component of the product; its role is to schedule jobs according to the configuration received from the tenant admin component. The cloud edge-catalog is the data warehouse of the product.
When the server starts, the cloud aggregator schedules jobs according to the configuration received from the tenant admin component. For example, it fetches the GitHub and Azure data source configuration from the tenant config API. The GitHub job then runs and scans data from the organization, after which the cloud aggregator validates the data through the Avro mechanism and sends it to the cloud edge-catalog via a queue mechanism.
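The overall flow can be pictured with a minimal Python sketch; the function names, configuration fields, and queue name below are illustrative placeholders, not the project's actual code.

```python
# Minimal sketch of the startup flow described above (all names are illustrative).
from dataclasses import dataclass, field


@dataclass
class DataSourceConfig:
    source: str                       # e.g. "github" or "azure"
    schedule: str                     # cron expression from the tenant admin API
    credentials: dict = field(default_factory=dict)


def fetch_tenant_configs() -> list[DataSourceConfig]:
    """Stand-in for the tenant admin config API call."""
    return [DataSourceConfig("github", "0 * * * *"),
            DataSourceConfig("azure", "30 * * * *")]


def scan_data_source(cfg: DataSourceConfig) -> list[dict]:
    """Stand-in for the per-source scan (PyGithub, Microsoft APIs, ...)."""
    return [{"source": cfg.source, "item": "example-record"}]


def validate_with_avro(records: list[dict]) -> list[dict]:
    """Stand-in for Avro schema validation before hand-off."""
    return records


def publish_to_queue(queue_name: str, records: list[dict]) -> None:
    """Stand-in for pushing validated records to the edge-catalog queue."""
    print(f"published {len(records)} records to {queue_name}")


def on_server_start() -> None:
    """Run one scan cycle per configured data source when the server starts."""
    for cfg in fetch_tenant_configs():
        records = scan_data_source(cfg)
        publish_to_queue("edge-catalog", validate_with_avro(records))


if __name__ == "__main__":
    on_server_start()
```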
Purpose
The purpose of this project is to provide organizations with a powerful tool for managing and processing large amounts of data. By creating a software solution that is scalable and extensible, the project aims to help businesses and other organizations handle increasing amounts of data as their needs grow.
Business Challenge
- Fetching data from different data sources such as GitHub, Azure, and Bitbucket.
- Automated response workflow.
- Transferring bulk data in desired patterns.
- Companies that truly embrace the data privacy rights of their customers and employees, and make it easy for them to understand and exercise those rights, will gain a distinct competitive advantage now and in the future.
- Enhanced reporting for auditing and performance monitoring.
- Automatic detection of personal data stored in any of the configured data sources.
- A proper logging mechanism for the cloud aggregator.
Our Solution
To fetch data from GitHub, we integrated the PyGithub third-party library, and for Azure we used the Microsoft Azure GraphQL API.
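A small PyGithub sketch of an organization scan is shown below; the token handling and the metadata fields collected are assumptions for illustration, not the product's actual scan logic.

```python
# Illustrative PyGithub scan of an organization's repositories.
from github import Github


def scan_github_org(token: str, org_name: str) -> list[dict]:
    """Return basic metadata for every repository in the organization."""
    client = Github(token)
    org = client.get_organization(org_name)
    return [
        {"name": repo.full_name, "private": repo.private, "language": repo.language}
        for repo in org.get_repos()
    ]
```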
For the automated response workflow, we initially implemented a Celery-based job scheduling flow in which the user can schedule different data source jobs (GitHub, Bitbucket, Jira, Azure, etc.) according to the scheduler configuration set by the consumer. We have since implemented the same flow on a multithreading-based architecture.
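Below is a minimal sketch of what such a Celery-based schedule could look like; the broker URL, task names, and schedule values are illustrative assumptions rather than the product's actual configuration.

```python
# Illustrative Celery app with a beat schedule for periodic data source scans.
from celery import Celery
from celery.schedules import crontab

app = Celery("cloud_aggregator", broker="redis://localhost:6379/0")  # broker is an assumption


@app.task(name="scan_github")
def scan_github(org_name: str) -> None:
    """Scan a GitHub organization; the PyGithub call would go here."""
    print(f"scanning GitHub org {org_name}")


@app.task(name="scan_azure")
def scan_azure(tenant_id: str) -> None:
    """Scan an Azure tenant; the Microsoft API call would go here."""
    print(f"scanning Azure tenant {tenant_id}")


# Periodic schedule; in the product this would be built from the tenant configuration.
app.conf.beat_schedule = {
    "github-hourly": {"task": "scan_github", "schedule": crontab(minute=0),
                      "args": ("example-org",)},
    "azure-hourly": {"task": "scan_azure", "schedule": crontab(minute=30),
                     "args": ("example-tenant",)},
}
```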
We used the Avro mechanism to transfer the bulk data fetched from the different data sources by the job scheduling flow. For this, we created a separate Avro schema for each data source component according to the client's requirements.
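The sketch below shows the general idea using the fastavro library; the schema fields are assumptions based on the GitHub example above, not the real project schemas.

```python
# Illustrative Avro validation and serialization with fastavro.
import io

from fastavro import parse_schema, reader, writer

GITHUB_REPO_SCHEMA = parse_schema({
    "type": "record",
    "name": "GithubRepository",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "private", "type": "boolean"},
        {"name": "language", "type": ["null", "string"], "default": None},
    ],
})


def encode_records(records: list[dict]) -> bytes:
    """Validate records against the schema and serialize them for the queue."""
    buf = io.BytesIO()
    writer(buf, GITHUB_REPO_SCHEMA, records)  # raises if a record violates the schema
    return buf.getvalue()


def decode_records(payload: bytes) -> list[dict]:
    """Deserialize records on the edge-catalog side."""
    return list(reader(io.BytesIO(payload)))


encoded = encode_records([{"name": "demo-repo", "private": True, "language": "Python"}])
print(decode_records(encoded))
```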
To construct APIs quickly and easily with good performance, we used the FastAPI web development framework in this product. FastAPI also provides built-in concurrency, excellent performance, dependency injection support, built-in documentation, built-in validation, and more.
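A minimal FastAPI example of the built-in validation and dependency injection is sketched below; the endpoint, request model, and settings dependency are illustrative and not part of the actual product.

```python
# Illustrative FastAPI endpoint with request validation and dependency injection.
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI(title="cloud-aggregator")


class ScanRequest(BaseModel):
    source: str            # e.g. "github", "azure", "bitbucket"
    organization: str


def get_settings() -> dict:
    """Dependency providing configuration (stubbed here)."""
    return {"queue": "edge-catalog"}


@app.post("/scans")
async def trigger_scan(request: ScanRequest, settings: dict = Depends(get_settings)):
    """The request body is validated automatically against ScanRequest."""
    return {"scheduled": request.source, "queue": settings["queue"]}
```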
We implemented a file-based logging mechanism to monitor the cloud aggregator's data scanning functionality. We also integrated Sentry to monitor the error logs in the system.
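A minimal sketch of combining file-based logging with Sentry error monitoring is shown below; the DSN, log file path, and logger name are placeholders.

```python
# Illustrative file-based logging plus Sentry error capture.
import logging

import sentry_sdk

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")  # placeholder DSN

logging.basicConfig(
    filename="cloud_aggregator.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("cloud_aggregator.scanner")


def run_scan(source: str) -> None:
    """Log scan progress to file; error-level logs are also captured by Sentry."""
    logger.info("starting scan for %s", source)
    try:
        pass  # the data source scan would go here
    except Exception:
        logger.exception("scan failed for %s", source)
        raise
```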
Your Business Could Be the Next Success Story
We turn complex challenges into scalable digital solutions.
Let’s talk about how we can solve yours.
Key Challenges
- Passing a full cycle of code quality and deployment tooling before every production push.
- Continuously running data source jobs on the development and production servers.
- Writing test cases for asynchronous application features (see the sketch after this list).
- Handling normal vs. regular mode while scanning the data.
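As an illustration of the asynchronous testing challenge, the sketch below shows one common approach using pytest-asyncio and httpx against a FastAPI app; the application import path and the /scans endpoint are hypothetical and borrowed from the FastAPI sketch above.

```python
# Illustrative async test for a FastAPI endpoint using pytest-asyncio and httpx.
import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app  # hypothetical application module


@pytest.mark.asyncio
async def test_trigger_scan() -> None:
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post(
            "/scans", json={"source": "github", "organization": "example-org"}
        )
    assert response.status_code == 200
    assert response.json()["scheduled"] == "github"
```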
Project Name
DataSentry
Category
Cloud-Based Backend API Development
Technology Stack
- Python
- FastAPI
- Threading
- Celery
- PostgreSQL
- Sentry
- Azure portal
- GitHub
- Azure DevOps
- BitBucket
- Jenkins
- SonarQube
Industry
Information Technology (IT) & Software Development