The State of EVM Indexing – A Comprehensive Survey of Tools, Trade‑offs, and Emerging Trends
By The Dune Research Desk – March 3, 2026
Why Indexing Matters for Modern Web3 Apps
When developers build on Ethereum‑compatible chains, the raw blockchain data is an append‑only log of low‑level state changes. Pulling a wallet’s balance, a token’s price, or a protocol’s latest trade directly from a node via RPC would require scanning millions of blocks, decoding events, stitching together traces, and performing on‑chain calculations—work that can take seconds or even minutes per request. That latency is unacceptable for user‑facing products, which need sub‑second responses.
Indexers solve this problem by continuously consuming on‑chain data, transforming it into structured tables, and exposing fast query interfaces. In practice, every high‑profile DeFi dashboard, NFT marketplace, or on‑chain analytics platform relies on an indexing layer to stay responsive and cost‑effective.
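To see why querying raw nodes directly is slow, consider a back-of-the-envelope model of an `eth_getLogs` back-fill. The numbers below (block-range cap per request, per-call latency) are hypothetical provider limits chosen for illustration, not quotes from any specific RPC vendor:

```typescript
// Rough model of a raw-RPC back-fill: how many eth_getLogs calls does
// scanning a chain's full history take, and how long does that run
// serially? The inputs are illustrative assumptions, not real limits.

interface BackfillEstimate {
  calls: number;
  hours: number;
}

function estimateBackfill(
  totalBlocks: number,
  maxBlockRange: number, // hypothetical per-request block-range cap
  msPerCall: number      // hypothetical network + decode latency per call
): BackfillEstimate {
  const calls = Math.ceil(totalBlocks / maxBlockRange);
  return { calls, hours: (calls * msPerCall) / 3_600_000 };
}

// ~20M blocks, 100-block windows (busy contracts force small ranges),
// 200 ms per call:
const est = estimateBackfill(20_000_000, 100, 200);
console.log(`${est.calls} calls, ~${est.hours.toFixed(1)} h`); // 200000 calls, ~11.1 h
```

And that is a single contract's logs on a single chain, before decoding, trace lookups, or rate limits; dedicated indexers exist precisely to pay this cost once and amortize it across every query.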
Core Design Decisions Behind an Indexing Pipeline
- Data source – Whether the solution works from raw RPC nodes, pre‑processed archives, or proprietary event streams determines the granularity of available information (logs only vs. full traces and contract storage).
- Latency vs. re‑org safety – Real‑time pipelines can deliver updates within a block, but may need extra safeguards to handle chain reorganisations.
- Back‑fill speed – The time required to ingest historic data heavily influences developer velocity; a two‑day back‑fill versus a two‑week back‑fill can be the difference between rapid experimentation and a stalled product.
- Chain coverage – Some tools are tightly focused on a few EVM networks, while others support dozens of EVM and non‑EVM chains.
- Transformation model – Indexing logic may live in a dedicated mapping language, TypeScript, Solidity listeners, or external ETL scripts. The location of aggregation (inline vs. post‑processing) affects performance and cost.
- Query layer – Options range from GraphQL endpoints and REST APIs to raw SQL against a user‑managed database.
- Hosting & cost – Fully managed services relieve DevOps burden but add recurring fees; self‑hosted pipelines give control but require engineering resources and infrastructure spend.
Understanding where a project sits on each of these axes is essential before picking a solution.
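One common way to balance the latency vs. re‑org‑safety axis is confirmation-depth finality: treat a block as final only once N further blocks have been mined on top of it, and keep everything newer revertible. A minimal TypeScript sketch of the idea, not tied to any particular indexer:

```typescript
// Confirmation-depth finality: a block is treated as final once
// `confirmations` blocks have been mined on top of it. Everything
// newer is indexed optimistically and may be rolled back on a re-org.

interface IndexingWindow {
  finalizedThrough: number; // safe to aggregate and cache permanently
  optimisticFrom: number;   // must stay revertible
}

function splitWindow(chainHead: number, confirmations: number): IndexingWindow {
  const finalizedThrough = Math.max(0, chainHead - confirmations);
  return { finalizedThrough, optimisticFrom: finalizedThrough + 1 };
}

// A re-org is detected when the hash we stored for a height no longer
// matches the parent hash reported by the chain for the next block.
function isReorg(storedHashAtHeight: string, chainParentHash: string): boolean {
  return storedHashAtHeight !== chainParentHash;
}

const w = splitWindow(19_000_000, 64);
console.log(w.finalizedThrough); // 18999936
```

Lowering `confirmations` shrinks the optimistic window and improves freshness, at the price of more frequent rollbacks; raising it does the opposite. Managed indexers make this same trade, just behind the scenes.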
The Landscape in Mid‑2025
| Solution | Architecture & Data Access | Performance | Chain Support | Query/API | Hosting Model | Typical Use‑Case |
|---|---|---|---|---|---|---|
| The Graph | Event‑driven subgraphs written in AssemblyScript; increasingly supports Substreams and Firehose for higher throughput. | Near‑real‑time on supported chains; historic back‑fills can be slow without Substreams. | 60+ networks in Subgraph Studio; >90 overall. | GraphQL, schema‑driven. | Hosted, decentralized network, or self‑hosted. | General‑purpose dapps needing a stable GraphQL API. |
| Ponder | Self‑hosted TypeScript framework; reads logs and can call contracts via viem. Transformations performed in the indexer code. | 10–15× faster than vanilla subgraphs when paired with low‑latency RPCs. | All EVM chains via RPC; configuration per chain. | No built‑in API; developers expose data through their own backend (REST/SQL/ORM). | Self‑managed (often on cloud VMs). | Performance‑sensitive DeFi tools, real‑time dashboards, teams already using a TypeScript stack. |
| Envio (HyperIndex) | Proprietary pre‑indexed event layer; indexer logic runs on top of Envio’s internal data format. | Back‑fill >5 000 events/sec; real‑time via HyperRPC. | Broad EVM coverage; no cross‑chain joins. | GraphQL out‑of‑the‑box. | Managed service; developer‑hosted logic. | Rapid event‑monitoring, wildcard indexing across many contracts, low‑latency front‑ends. |
| Subsquid | Distributed archive network feeds large batches to the Squid SDK, which writes to PostgreSQL, BigQuery, etc. | Extremely fast bulk ingestion (tens of thousands of blocks/sec); not optimized for sub‑second UI latency. | 200+ networks, including Substrate and other non‑EVM chains. | No default API; developers query their own database. | Self‑hosted indexers with hosted archive nodes. | Large‑scale analytics, cross‑chain dashboards, data‑team‑centric pipelines. |
| Goldsky | Managed version of The Graph’s subgraph model with performance‑focused infra. | Faster than community‑run subgraphs, especially on high‑volume contracts. | 90+ EVM networks, parity with The Graph. | GraphQL, compatible with subgraph schemas. | Fully managed SaaS, optional “Mirror” to user databases. | Teams that love GraphQL but want a turnkey, production‑grade service. |
| Home‑grown pipelines | Custom node/archival setup; raw data extraction, bespoke schema, custom APIs. | Unlimited potential, limited only by engineering effort. | Unlimited – any chain a team can run. | Fully custom (SQL, REST, GraphQL, etc.). | Entirely self‑hosted, high capex. | High‑frequency trading firms, protocols with atypical data needs, long‑term strategic data assets. |
| Sim IDX (Dune) | Indexing logic compiled into Solidity listeners that run inside Dune’s instrumented EVM (iEVM); real‑time filtering and parallel back‑fills. | Real‑time as blocks execute; highly parallel historic ingestion. | 10+ EVM chains (growing monthly); native cross‑chain queries. | TypeScript + SQL‑based REST endpoints. | Fully managed SaaS (self‑hosted DB option coming soon). | Complex DeFi protocols, applications requiring intra‑transaction state, teams that want zero‑devops indexing. |
Deep‑Dive Highlights
- The Graph remains the de facto standard for GraphQL‑based data access. Its recent Substreams and Firehose extensions have narrowed the performance gap with newer tools, but developers still face limits when they need full‑trace visibility or very low latency.
- Ponder distinguishes itself with a pure TypeScript stack and the ability to call contracts during indexing. For teams already invested in the JavaScript ecosystem, it provides a rapid development loop and cost‑effective scaling, though it requires a reliable RPC provider.
- Envio’s HyperIndex abstracts away raw node access by offering a pre‑indexed event store. This dramatically accelerates bulk imports and makes “wildcard” indexing (e.g., “all ERC‑721 transfers”) trivial, at the cost of losing direct trace or storage reads.
- Subsquid excels in large‑scale, analytics‑first scenarios. Its batch‑oriented architecture can ingest historic data at massive throughput, making it ideal for dashboards that query years of activity across many networks. The trade‑off is higher latency for real‑time UI use cases.
- Goldsky provides a managed, Graph‑compatible layer that removes the operational overhead of running one’s own graph nodes. Its “Mirror” feature lets teams pipe subgraph results into traditional data warehouses, blending the best of both worlds.
- Home‑grown solutions grant unrestricted flexibility, but the engineering and operational cost quickly eclipses most startups’ budgets. They are usually justified only when the business model hinges on proprietary data pipelines.
- Sim IDX is Dune’s answer to the shortcomings of post‑execution indexing. By embedding listeners into the execution environment, Sim captures intra‑transaction state changes and can filter blocks before they are fully processed. This model reduces both latency and the volume of data that must be stored, offering a fresh approach for protocols with complex internal logic (e.g., intent‑based swaps, multi‑step DeFi routes).
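The “wildcard” pattern mentioned above reduces, at the log level, to filtering on `topics[0]` (the event signature hash) with no address filter at all. A self-contained TypeScript sketch over mock logs; the topic hash is the well-known keccak-256 of `Transfer(address,address,uint256)`, which ERC‑20 and ERC‑721 happen to share:

```typescript
// Wildcard indexing: match an event signature across ALL contracts by
// filtering on topics[0] only, leaving the address filter empty.

interface Log {
  address: string;   // emitting contract
  topics: string[];  // topics[0] is the event signature hash
  data: string;
}

// keccak-256("Transfer(address,address,uint256)"), shared by ERC-20 and ERC-721.
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

function wildcardFilter(logs: Log[], topic0: string): Log[] {
  return logs.filter((log) => log.topics[0] === topic0);
}

// Mock logs from three different contracts:
const logs: Log[] = [
  { address: "0xtoken", topics: [TRANSFER_TOPIC, "0xfrom", "0xto"], data: "0x64" },
  { address: "0xnft", topics: [TRANSFER_TOPIC, "0xfrom", "0xto", "0xid"], data: "0x" },
  { address: "0xpool", topics: ["0xdeadbeef"], data: "0x" },
];

console.log(wildcardFilter(logs, TRANSFER_TOPIC).length); // 2
```

A pre-indexed event store like Envio’s makes this filter cheap because logs are already organized by topic; against a raw node, the same query means scanning every block’s logs.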
Contributor Insights
- Reliability vs. Speed – Many builders stress that while ultra‑fast indexing opens arbitrage opportunities, it also raises the stakes for chain re‑org handling. Solutions like Sim IDX and Envio provide built‑in safeguards, but teams must still design for eventual consistency.
- Cost Considerations – Managed services such as Goldsky and Sim IDX can lower the total cost of ownership by removing the need for dedicated node farms and DBA staff. However, fees scale with query volume and indexing hours; careful budgeting is required for high‑traffic apps.
- Developer Velocity – The ability to preview changes via Git‑driven workflows (available in Sim IDX and The Graph’s Studio) is repeatedly cited as a catalyst for rapid iteration. Tools that integrate with familiar IDEs and CI pipelines (Ponder, Sim IDX) are gaining traction among early‑stage teams.
- Multi‑Chain Complexity – As protocols expand to multiple rollups and L2s, a single‑chain indexer becomes a bottleneck. Subsquid’s broad network support and Sim IDX’s emerging cross‑chain query capabilities are viewed as strategic advantages for future‑proofing.
Key Takeaways
- No “one‑size‑fits‑all” solution – The optimal indexer depends on a project’s specific latency requirements, data depth (events vs. traces), chain footprint, and budget.
- Real‑time indexing is becoming mainstream – Both Envio’s HyperRPC and Dune’s Sim IDX demonstrate that sub‑second data delivery is no longer an experimental feature.
- Managed services are tightening the gap with self‑hosted pipelines – Goldsky, Envio, and Sim IDX offer SaaS experiences that rival the flexibility of custom setups while significantly reducing operational overhead.
- Transformation location matters – Moving heavy aggregations early in the pipeline (as Sim IDX does) can cut downstream storage costs and improve query performance, especially for high‑frequency DeFi use cases.
- Multi‑chain support is a differentiator – Tools that natively handle dozens of networks (Subsquid, Goldsky) or provide easy cross‑chain querying (Sim IDX) will be preferred as the ecosystem continues to fragment across L2s and sidechains.
- Developer ergonomics drive adoption – Hot‑reloading, Git‑based deployment, and clear observability dashboards are decisive factors for teams looking to iterate quickly.
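The “transformation location” point can be made concrete with a hypothetical TypeScript sketch of inline aggregation: each Transfer event is folded into a running balance at index time, so downstream storage holds one row per address rather than one row per event. The shapes and names here are illustrative, not any particular indexer’s API:

```typescript
// Inline aggregation: fold each Transfer event into running balances
// as it is indexed, instead of persisting every raw event and
// aggregating at query time.

interface Transfer {
  from: string;
  to: string;
  value: bigint;
}

function applyTransfer(balances: Map<string, bigint>, t: Transfer): void {
  balances.set(t.from, (balances.get(t.from) ?? 0n) - t.value);
  balances.set(t.to, (balances.get(t.to) ?? 0n) + t.value);
}

const balances = new Map<string, bigint>();
const events: Transfer[] = [
  { from: "0x0", to: "0xa", value: 100n }, // mint-like transfer
  { from: "0xa", to: "0xb", value: 40n },
];
events.forEach((t) => applyTransfer(balances, t));

// Storage holds one balance row per address instead of one row per
// event, and "what is 0xa's balance?" becomes a key lookup, not a scan.
console.log(balances.get("0xa")); // 60n
```

The trade-off is flexibility: pre-aggregated tables answer the questions they were designed for extremely fast, but re-answering a new question means re-indexing, which is why back-fill speed and aggregation location interact.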
Looking Ahead
The indexing ecosystem is in a rapid growth phase, driven by new protocol primitives (account abstraction, intent‑based swaps) and the scaling of rollups. Expect further convergence between event‑driven and execution‑layer approaches, more hybrid services that combine managed real‑time streams with user‑controlled warehouses, and tighter integration with analytics platforms (e.g., Hex, Airflow‑backed ClickHouse stacks).
For developers and product teams, the strategic move is to map their data requirements first, then select the indexer whose trade‑offs align with those priorities. The right choice today will not only power a responsive dApp but also provide a resilient foundation as the blockchain data landscape evolves.
The realtime data must flow.
Source: https://dune.com/blog/the-state-of-evm-indexing
