Inferix Decentralized GPU
Meta LLaMA

In its GPU hardware segment, Inferix focuses on devices optimized for graphics rendering, with the NVIDIA RTX 3090 and RTX 4090 serving as its flagship devices.

The TensorOpera®️ team has released public data on deploying pre-trained models such as LLaMA-2 13B and LLaMA-3 8B on the RTX 4090. Notably, LLaMA-2 13B inference on a single RTX 4090 using TensorOpera’s ScaleLLM achieves 1.88 times lower latency than the same model running on a single A100 GPU using vLLM. LLaMA-3 8B can run with a token batch size of 256 on a single RTX 4090, without additional memory optimization.
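Why a consumer 24 GiB card can host such a batch comes down to simple memory arithmetic. The sketch below is a back-of-the-envelope estimate, not TensorOpera's methodology: the layer/head numbers are the published LLaMA-3 8B configuration (grouped-query attention), the sequence length of 128 and fp16 precision are illustrative assumptions, and serving-engine overheads (activations, fragmentation) are ignored.

```python
# Rough GPU memory estimate for LLM inference: weights + KV cache.
# Architecture numbers are the published LLaMA-3 8B config; sequence
# length and precision are illustrative assumptions, and engine
# overheads (activations, fragmentation) are deliberately ignored.

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # Each layer caches one key and one value vector per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def inference_memory_gib(n_params_b, batch, seq_len,
                         n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    weights = n_params_b * 1e9 * dtype_bytes          # fp16 weights
    kv = batch * seq_len * kv_cache_bytes_per_token(
        n_layers, n_kv_heads, head_dim, dtype_bytes)  # full KV cache
    return (weights + kv) / 2**30

# LLaMA-3 8B: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_cache_bytes_per_token(32, 8, 128)
print(per_token)        # 131072 bytes (128 KiB) of KV cache per token

total = inference_memory_gib(8, batch=256, seq_len=128,
                             n_layers=32, n_kv_heads=8, head_dim=128)
print(round(total, 1))  # ~18.9 GiB, within the RTX 4090's 24 GiB
```

Under these assumptions a batch of 256 short sequences fits on a single RTX 4090; longer contexts would require the kind of KV-cache optimization that serving engines such as ScaleLLM and vLLM provide.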

In their introduction to ScaleLLM, the TensorOpera®️ team claims that running LLMs on the RTX 4090 with this engine uses one-third the memory, runs 1.8 times faster, and is 20 times more cost-effective than using A100 GPUs in traditional data centers.

Research and experimental benchmarks have shown that, by combining federated learning with Gradient Low-Rank Projection (GaLore), larger LLMs can be trained across a larger pool of distributed GPUs than data-center training typically allows.
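The core idea of GaLore can be sketched in a few lines: the full-rank gradient of a weight matrix is projected into a low-rank subspace found via SVD, so that optimizer state only needs to be kept at the reduced rank. This is a minimal NumPy illustration of that projection, not GaLore's actual implementation; the function name and hyperparameters are ours, and details such as periodic subspace refresh and Adam moments are omitted.

```python
import numpy as np

# Minimal sketch of gradient low-rank projection in the style of
# GaLore. Real GaLore keeps Adam state in the rank-r subspace and
# refreshes the projection matrix P only every few hundred steps.

def galore_step(weight, grad, rank, lr=0.01):
    # 1. SVD of the full gradient to find a rank-r subspace.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                 # projection matrix, shape (m, r)
    # 2. Project the gradient into the subspace; optimizer state
    #    living here costs O(r * n) instead of O(m * n) memory.
    low_rank_grad = P.T @ grad      # shape (r, n)
    # 3. Project back up and apply the update to the full weight.
    return weight - lr * (P @ low_rank_grad)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
G = rng.standard_normal((64, 32))
W_new = galore_step(W, G, rank=4)
print(W_new.shape)                  # (64, 32)
```

Because the applied update is `P @ low_rank_grad`, its rank is at most `rank`, which is what lets each federated worker hold far smaller optimizer state than full-rank training would need.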

