Overview
The entire AI space is moving very fast, and we are of course monitoring developments very closely.
We recognise that chat-based systems such as ChatGPT, Anthropic's Claude, Bing and Bard do not meet the regulatory or deployment requirements of SMB organisations or larger. We therefore propose an open source stack of interlocking technologies that can be used to understand, design and deploy AI solutions inside a business framework.
Our Reference Platform is an open source software stack combining the best, most reliable products available to deliver business value today. We have built it as a modular platform, allowing us to swap parts of the stack in and out as new technology becomes available:
Our open source platform allows the development, testing, costing and deployment of LLM and AI agent solutions that comply with ISO, GDPR and FSC regulations, delivering reliable, transparent and regulation-compliant systems to small and medium-sized businesses and larger.
The solutions are scalable from development systems capable of 50-60 tokens per second to large scale supercomputer instances running on Dell Helix and IBM Power9 systems.
Platform Design Strategy
Our platform, AMP(H), is based on Ubuntu Linux 22.04 LTS, a leading Linux-based server platform. This allows us to deliver the following:
1) Security
Designed from the ground up as a secure compute platform capable of running Python workloads in a scalable environment, with Docker and venv providing internal virtualisation and isolation. Ubuntu has built-in compatibility with Active Directory and enterprise networking, allowing easy integration into a customer environment.
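As an illustration of the isolation model, the minimal sketch below creates a self-contained virtual environment for a single Python workload using only the standard library; the environment path and package names are assumptions, not fixed platform values.

    # Minimal sketch: isolate one AI workload in its own virtual environment.
    # ENV_DIR and the package list are illustrative assumptions.
    import subprocess
    import venv

    ENV_DIR = "/opt/amp/envs/agent-workload"

    venv.create(ENV_DIR, with_pip=True)   # stdlib venv, isolated from system packages
    subprocess.run(
        [f"{ENV_DIR}/bin/pip", "install", "langchain", "chromadb"],
        check=True,
    )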
2) Scalability and compatibility
Ubuntu is also compatible with other enterprise-level Linux distributions such as Red Hat, which can be used for large-scale deployments that require Power9 or NVIDIA/Dell Helix. Ubuntu backup modules are available for all popular network backup solutions.
3) Load balancing, firewall and Reverse Proxy
The platform is designed to sit on your company's premises, or as a virtual machine in your cloud if you do not require GPU acceleration for local delivery (i.e. if you are able to use the OpenAI or Anthropic APIs for delivery of the LLM interfaces).
4) Local Logging and connection mediation
In a multi-user environment it is important to mediate incoming connections and local processes with regard to network security, to prevent leakage of confidential internal information to third parties, to comply with GDPR, and to provide guaranteed and reliable responses. Where a local LLM is deployed (LLAMA2, Falcon, etc.) under the built-in inference engine (LLAMA.CPP), queries remain on premises and are processed and logged locally. Where the platform relies on an external LLM API (OpenAI/Anthropic), all user queries and the responses returned by the external API are logged and moderated to prevent leakage of confidential data.
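A minimal sketch of this mediation layer is shown below; the log path, the redaction patterns and the injected call_llm function are illustrative assumptions rather than the shipped implementation.

    # Sketch: log every query and response, and block prompts that match
    # confidentiality rules before they reach any model (local or external).
    import logging
    import re

    logging.basicConfig(filename="/var/log/amp/llm_queries.log", level=logging.INFO)

    CONFIDENTIAL = [
        re.compile(r"\b\d{16}\b"),             # e.g. payment card numbers
        re.compile(r"(?i)internal use only"),  # assumed document classification marker
    ]

    def mediate_query(user: str, prompt: str, call_llm) -> str:
        if any(p.search(prompt) for p in CONFIDENTIAL):
            logging.warning("blocked query from %s", user)
            return "Query blocked: contains confidential content."
        logging.info("query from %s: %s", user, prompt)
        response = call_llm(prompt)            # local LLAMA.CPP or external API wrapper
        logging.info("response to %s: %s", user, response)
        return response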
5) Locally administered user interface
Any chat UI or autonomous agent process is initiated and moderated on the local network with regard to user access rights derived from Active Directory or any LDAP-compatible authentication scheme. All automated processes are under the control of the company's administration team and can be monitored by existing enterprise monitoring systems by querying the local services running as Unix daemons on the AMP(H) server.
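As a sketch of the access check, the fragment below performs a simple LDAP bind against Active Directory using the ldap3 library; the domain controller address and account format are assumptions specific to each customer.

    # Sketch: verify a user against Active Directory before starting a chat or agent session.
    from ldap3 import Server, Connection

    def user_is_authorised(username: str, password: str) -> bool:
        server = Server("ldaps://dc.customer.local")   # assumed domain controller
        conn = Connection(server, user=f"CUSTOMER\\{username}", password=password)
        allowed = conn.bind()                          # simple bind as the user
        conn.unbind()
        return allowed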
6) Local Version control
In the deployment scenario where the LLM is hosted locally, once a workflow or process has been tested and deployed, the LLM will deliver predictable results. Local seed and temperature values can be set for each process so that autonomous processes are reliable. Where an external LLM is used, we can again set seed and temperature values to limit the system deviating from its tested performance, along with explicit pinning of the LLM sub-version to be used, although some exposure to feature withdrawal by the third-party API provider remains a managed business risk.
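A minimal sketch of pinning these values for a locally hosted model is shown below, using llama-cpp-python; the model path and parameter values are illustrative assumptions.

    # Sketch: fix seed and temperature so a tested process keeps producing the same output.
    from llama_cpp import Llama

    llm = Llama(
        model_path="/opt/amp/models/llama-2-13b.Q4_K_M.gguf",  # assumed local model file
        seed=42,                                               # pinned seed for this process
    )

    result = llm(
        "Summarise the attached policy in three bullet points.",
        temperature=0.0,   # deterministic sampling for the tested workflow
        max_tokens=256,
    )
    print(result["choices"][0]["text"])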
7) Governance
With server-defined workflows and moderated chat and email functionality, working practice is designed at development time and remains compliant with existing governance at runtime.
8) Cost and token metering
Throughout the development process the organisation can monitor token usage, search usage and any other third-party API costs to calculate the month-on-month spend and confirm that the system is providing value for money. Limits can be applied locally to prevent runaway recursive calls to third-party APIs that would otherwise cause excessive billing due to error.
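A minimal sketch of such a local limit is shown below; the call and token ceilings are assumptions that would be set per workflow during development. Each third-party call reports its token count through record(), so the workflow halts before the bill does.

    # Sketch: per-workflow budget guard that stops runaway recursive API calls.
    class BudgetExceeded(RuntimeError):
        pass

    class TokenBudget:
        def __init__(self, max_calls: int = 50, max_tokens: int = 100_000):
            self.max_calls, self.max_tokens = max_calls, max_tokens
            self.calls = 0
            self.tokens = 0

        def record(self, tokens_used: int) -> None:
            self.calls += 1
            self.tokens += tokens_used
            if self.calls > self.max_calls or self.tokens > self.max_tokens:
                raise BudgetExceeded("third-party API budget exceeded; stopping workflow")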
9) Secure, authenticated Knowledge base Grounding
Where confidential information is used for reference during inference, the platform's local knowledge base is held on the AMP(H) platform, with local tokenisation. Documents are never sent to a third party for tokenisation or inference; only a local redirect takes place when an external LLM is used for natural language processing. All vector databases and document stores are created and hosted locally.
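A minimal sketch of the locally hosted store using ChromaDB is shown below; the storage path and documents are illustrative, and embedding takes place on the AMP(H) host.

    # Sketch: persist confidential documents in a local vector store and query them locally.
    import chromadb

    client = chromadb.PersistentClient(path="/opt/amp/vectorstore")   # assumed local path
    docs = client.get_or_create_collection("confidential_docs")

    docs.add(
        documents=["Board minutes, January ...", "HR policy on remote work ..."],
        ids=["minutes-jan", "hr-remote"],
    )

    hits = docs.query(query_texts=["What is the remote working policy?"], n_results=2)
    print(hits["documents"])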
10) LLM generalisation and redirect
Where multiple LLMs are used, for example Wizard for local code-generation inference and LLAMA2 for natural language, we can support different LLMs as plugins to deliver optimum performance for each business scenario. If in future the organisation performs LoRA fine-tuning of a foundational model (LLAMA) on its internal documentation to improve local inference, this can also be supported.
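A minimal sketch of the plugin-style redirect is shown below; the back-end functions are stubs standing in for the locally hosted models.

    # Sketch: route each request to the model back end registered for its task type.
    from typing import Callable, Dict

    def run_wizard_coder(prompt: str) -> str:
        raise NotImplementedError("stub: call the local code-generation model here")

    def run_llama2_chat(prompt: str) -> str:
        raise NotImplementedError("stub: call the local natural-language model here")

    BACKENDS: Dict[str, Callable[[str], str]] = {
        "code": run_wizard_coder,   # e.g. Wizard for code generation
        "chat": run_llama2_chat,    # e.g. LLAMA2 for natural language
    }

    def route(task_type: str, prompt: str) -> str:
        return BACKENDS[task_type](prompt)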
11) CUDA and CUBLAS support
Where local inference is required, for example by regulation, the platform can be accelerated by CUDA-compatible NVIDIA GPUs such as the consumer-level GTX range or the commercial-grade A100/H100 GPUs. For test systems and workstations running GTX-level cards, CUBLAS-accelerated builds allow certain layers of the neural net to be offloaded to the NVIDIA GPU, while the remaining layers run on the Intel Xeon processor and system RAM to conserve resources.
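A minimal sketch of partial GPU offload through llama-cpp-python built with CUBLAS support is shown below; the layer count and model path are assumptions tuned per card.

    # Sketch: offload part of the model to the NVIDIA GPU, keeping the rest on CPU and RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="/opt/amp/models/llama-2-13b.Q4_K_M.gguf",  # assumed local model file
        n_gpu_layers=32,   # layers offloaded to the GPU; remaining layers stay on the Xeon CPU
        n_ctx=4096,        # context window size
    )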
Software Stack
1) Python 3.7 runtime with Docker and Conda virtual environment support.
2) LangChain, an automation workflow engine sitting on top of Python that provides static agentic workflows surfaced as Python code, which can call all subsystems and be run as timed Unix jobs (see the sketch after this list).
3) ChromaDB vector databases to create both static and runtime tokenisation stores of confidential documentation.
4) LLAMA.CPP and LlamaIndex to provide local inference over locally hosted LLMs such as Wizard, Falcon or LLAMA2.
5) Document loaders, to allow PDF, TXT, Word and Excel files to be tokenised into vector databases for local grounding of user queries.
6) LDAP for local user identification and authorisation against Active Directory.
7) Chainlit and the Uvicorn web server to provide custom user interfaces for interactive processes.
8) Langflow, a graphical workflow designer that allows rapid prototyping and testing of specific agentic workflows and operations. It produces a compiled JSON file that can be deployed to a specific API endpoint for process initiation, or attached to a GUI for interactive client/server applications via Python scripting. The API can be secured with LDAP authentication.
9) Email integration: Sendmail functionality allows automated reporting from agents into specific mailboxes and existing email workflows through MX relay.
10) Mail agents allow natural language processing of authenticated mailboxes.
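As an illustration of how these components interlock, the sketch below loads a PDF into a local Chroma store and answers a grounded query through a locally hosted model. The file paths and model choices are assumptions, and the calls follow the classic LangChain API referenced above.

    # Sketch: document loader -> local vector store -> local LLM -> grounded answer.
    from langchain.document_loaders import PyPDFLoader
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import Chroma
    from langchain.llms import LlamaCpp
    from langchain.chains import RetrievalQA

    pages = PyPDFLoader("/opt/amp/docs/policy.pdf").load()                     # document loader
    store = Chroma.from_documents(pages, HuggingFaceEmbeddings(),
                                  persist_directory="/opt/amp/vectorstore")    # local ChromaDB
    llm = LlamaCpp(model_path="/opt/amp/models/llama-2-13b.Q4_K_M.gguf")       # local inference
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())  # LangChain workflow

    print(qa.run("What does the policy say about data retention?"))

A script of this shape can be scheduled as a timed Unix job or surfaced interactively through the Chainlit interface described above.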