
How GenAI is Reshaping the Modern Data Architecture


The Data Architecture Was Working, Until GenAI Arrived

In today’s world, most enterprises are building LLM-based GenAI solutions on top of document- and database-backed knowledge and high-dimensional vector embeddings. If you have explored or built something similar, you have probably watched an LLM query drag itself across multiple networks just to fetch an embedding from a remote vector database, look up a database table, or load a document from an object store, and you have felt the latency firsthand.

  • The model waits
  • Our users wait
  • But neither our cloud costs nor our users’ frustration will wait.

This is the moment almost every enterprise reaches: the GenAI works, but the data does not. And don’t forget the semantics, which are inconsistent across domains and data stores. RAG systems time out because documents, tables, and embeddings live in separate universes. The flashy dashboards built in BI tools have no idea how to answer semantic questions such as "Which customers are showing signs of churn based on recent sentiment shifts?". Eventually, there is friction and frustration between customers and the business that can’t be ignored.

This is not a tooling problem; it’s a data architecture problem. In this article I tell the story of how enterprises are being forced to rethink their data architecture strategy from the ground up.

For nearly two decades, data platforms evolved linearly: warehouses, then data lakes, then lakehouses.

This made sense when the world was about BI and traditional ML/AI workloads, but GenAI broke this linearity overnight. Now, enterprises need:

  • Relational facts and dimensions
  • Massive corpora of documents for grounding
  • Most importantly, vector and graph representations for semantic reasoning

These needs are not occasional; they are simultaneous, and they must be served at low latency, because natural language interfaces have replaced traditional dashboards as the primary interface.

Every natural language query became a mini-workload explosion: SQL + vector search + graph traversal + policy enforcement.

Traditional architectures simply weren’t built for this world.

Let me share a story I’ve seen repeatedly across enterprises. Two teams ask the same simple question of a natural-language, LLM-powered chat interface: "What is active revenue?" They get two different answers.

The LLM is doing its best, picking a definition from whichever table it finds first. Nobody knows if the answer is correct, and then we start blaming "GenAI hallucination". But think about it: is it hallucination, or is it simply semantics that drifted years ago? GenAI only amplified the inconsistency at scale.

Let’s take another example. A support document is updated, but its embedding isn’t. Now your RAG pipeline uses context that is stale for hours or days, and you have a chatbot confidently answering questions with outdated advice. Is it the model’s fault, or a data architecture that was never built for freshness?

Meanwhile, the pressure on enterprises keeps increasing to provide a best-in-class natural-language, LLM-powered chat interface with the freshest data and semantically correct answers. At this point, enterprises realise that their data architecture, optimised for BI and traditional ML/AI, is cracking, and the more they scale GenAI, the more cracks appear:

  • Latency from pulling data across hops
  • Mounting cloud costs from duplicated data across document, graph and vector stores
  • Lost trust because data lineage is not tied to the interface
  • Knowledge gaps because domain expertise lives in people’s heads, not in machine-readable form

Enterprises don’t just need new or better tools; they need a new and better architecture in which data and AI can coexist.

The breakthrough moment comes when we ask ourselves: "Why are we dragging data across the network to feed GenAI? Why not bring GenAI to the data instead?" At this point, everything about the data architecture flips.

  • We start generating embeddings inside the Lakehouse instead of calling external embedding services (a minimal sketch follows this list)
  • We start using in-platform vector indexes instead of remote vector databases
  • Instead of building separate pipelines for semantics, metrics and entities, we start building a semantic layer with a knowledge graph as a shared meaning engine
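
Here is a minimal sketch of the first flip above: generating embeddings inside the Lakehouse with PySpark. It assumes Delta Lake is configured and the sentence-transformers library is available on the cluster; the table names and the model are placeholders, not a prescribed stack.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType

spark = SparkSession.builder.appName("in-platform-embeddings").getOrCreate()

@pandas_udf(ArrayType(FloatType()))
def embed(texts: pd.Series) -> pd.Series:
    # The model is loaded inside the UDF for simplicity; no document leaves the platform.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
    vectors = model.encode(texts.tolist(), normalize_embeddings=True)
    return pd.Series([v.tolist() for v in vectors])

docs = spark.read.table("lakehouse.support_documents")        # hypothetical table
(docs.withColumn("embedding", embed("body"))
     .write.format("delta").mode("overwrite")
     .saveAsTable("lakehouse.support_document_embeddings"))   # stays in the lakehouse
```

Because the embeddings land next to the source tables, the in-platform vector index in the next point can be built without another copy of the data leaving the platform.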

At this point the data architecture stops fighting GenAI and becomes an enabler for it.

A Semantics-First, GenAI-Native Architecture

This evolving data architecture is not a nice-to-have; it is the inevitable consequence of natural-language, LLM-powered chat interfaces becoming the dominant consumer of enterprise data. Its core philosophy is as follows:

  1. Bring GenAI compute to the data: All computation, including embedding generation, retrieval and inference, must run in your lakehouse, where your source of truth is stored.
  2. Semantics become the source of coherence: Unify the semantic layer and knowledge graph so that every system, user and LLM interprets the data in the same way.
  3. Hybrid retrieval must be treated as table stakes: The architecture must support SQL + vector + graph queries as a single retrieval workflow, not as a stitched pipeline (see the sketch after this list).
  4. Trust must be embedded in the architecture itself: Data lineage, provenance and access policies must enforce correctness at query time, not after the fact.
  5. There is no option but to optimise for natural-language scale: Natural language query interfaces multiply the demand on data, knowledge, relationships and orchestration by 10x-100x, so the architecture must cache, accelerate and cost-optimise automatically.

This is not about adding components; it is about reshaping the enterprise data architecture foundation around GenAI as a first-class workload.
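
To make points 3 and 4 concrete, here is a minimal sketch of hybrid retrieval as one workflow, with the policy check happening before any data is touched. The `sql`, `vector_index`, `graph` and `policy` clients are hypothetical; the point is the shape of the flow, not a specific product API.

```python
from dataclasses import dataclass

@dataclass
class RetrievalContext:
    rows: list        # structured facts from SQL
    passages: list    # semantically similar chunks from the vector index
    relations: list   # connected entities from the knowledge graph

def entities_in(question: str) -> list:
    # Placeholder entity extraction; a real system would use the semantic layer.
    return [word for word in question.split() if word.istitle()]

def retrieve(question: str, user: str, sql, vector_index, graph, policy) -> RetrievalContext:
    # 1. Policy is enforced up front, not after the fact.
    scope = policy.allowed_scope(user=user, purpose="genai_chat")

    # 2. One workflow fans out to the three retrieval modes over shared, governed data.
    rows = sql.query("SELECT * FROM metrics.active_revenue", scope=scope)
    passages = vector_index.search(question, top_k=5, scope=scope)
    relations = graph.neighbors(entities_in(question), depth=2, scope=scope)

    return RetrievalContext(rows=rows, passages=passages, relations=relations)
```

The design choice worth noticing is that SQL, vector and graph retrieval share the same scope object, so the stitched-pipeline problem of each store enforcing its own rules (or none) disappears.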

Based on this philosophy, the resulting data architecture feels natural, like walking through a city where each district serves a purpose.

==Control Plane: The City’s Rules, Memory and Meaning==

This is where trust is formed. Every dataset, feature and policy gets an identity here, and meaning is standardised. This plane ensures everything behaves as intended: continuously, seamlessly and predictably.

  • Catalog & Lineage: This is like our city’s registry office

Every dataset, table and feature is catalogued here with a unique identity and full provenance. Anyone can ask "where did this number come from?", and the lineage instantly tells the story: it originates at these sources, is transformed in this way, and this is its current state. The catalog also manages schema evolution through contracts and a schema registry, so that you and your downstream consumers are not surprised when upstreams silently change their structure. Without this, GenAI systems lose trust and start accumulating semantic debt.
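
As a small illustration of the contract idea, here is a sketch of a schema check against a hypothetical catalog client; the expected columns are invented for the example.

```python
# The contract a downstream GenAI pipeline relies on; a silently changed upstream
# schema is caught at registration time, not discovered later by a broken answer.
EXPECTED_CONTRACT = {
    "customer_id": "string",
    "active_revenue": "decimal(18,2)",
    "as_of_date": "date",
}

def validate_contract(catalog, table: str) -> list[str]:
    actual = catalog.get_schema(table)   # hypothetical call returning {column: type}
    issues = []
    for column, expected_type in EXPECTED_CONTRACT.items():
        if column not in actual:
            issues.append(f"{table}: missing column '{column}'")
        elif actual[column] != expected_type:
            issues.append(f"{table}: '{column}' is {actual[column]}, expected {expected_type}")
    return issues
```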

  • Semantic Layer

Raw schemas don’t explain how the business thinks; entities, metrics and definitions do. This is the layer that turns a user’s natural-language question into the right SQL, SPARQL or Cypher. It is meaning made for machines.
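
A minimal sketch of what such an entry might look like, returning to the "active revenue" story from earlier; the metric definition and tables are illustrative, not a specific semantic-layer product.

```python
# One governed definition of "active revenue" that BI tools, SQL users and the LLM
# all resolve to, instead of each picking whichever table they find first.
ACTIVE_REVENUE = {
    "metric": "active_revenue",
    "description": "Recognised revenue from customers with an active subscription",
    "grain": ["customer_id", "month"],
    "sql": """
        SELECT c.customer_id,
               date_trunc('month', i.invoice_date) AS month,
               SUM(i.amount) AS active_revenue
        FROM invoices i
        JOIN customers c ON c.customer_id = i.customer_id
        WHERE c.subscription_status = 'active'
        GROUP BY 1, 2
    """,
}

METRICS = {"active_revenue": ACTIVE_REVENUE}

def resolve(question: str) -> str:
    # A real semantic layer would map the question via entity linking; the point here
    # is that the LLM receives governed SQL, not a guess.
    if "active revenue" in question.lower():
        return METRICS["active_revenue"]["sql"]
    raise KeyError("No governed metric found for this question")
```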

  • Policy Manager

Every query must be checked against policies before it is executed, not afterwards and not only in an audit; it must be checked right at the point of use. RBAC, ABAC, regional residency, sensitivity rules, masking, row/column-level security, everything applies automatically. That way, purpose-based access is enforced rather than merely aspirational. GenAI can’t be trusted unless the data architecture is trustworthy by design.
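
Here is a minimal sketch of enforcement at the point of use; the roles, rules and masking logic are illustrative rather than a particular policy engine’s API.

```python
POLICIES = {
    "support_agent": {"deny_columns": ["ssn", "card_number"], "row_filter": "region = 'EU'"},
    "analyst":       {"deny_columns": ["ssn"],                "row_filter": None},
}

def enforce(role: str, requested_columns: list[str], base_query: str) -> str:
    policy = POLICIES[role]
    # Column-level security: masked columns never reach the LLM's context window.
    allowed = [c for c in requested_columns if c not in policy["deny_columns"]]
    query = f"SELECT {', '.join(allowed)} FROM ({base_query}) t"
    # Row-level security: residency and sensitivity filters are appended automatically.
    if policy["row_filter"]:
        query += f" WHERE {policy['row_filter']}"
    return query

# The same request yields different, policy-compliant SQL per role.
print(enforce("support_agent", ["customer_id", "ssn", "sentiment"], "SELECT * FROM customers"))
```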

  • Governance & Quality

PII/PCI/PHI detection, data quality monitoring, freshness checks, completeness checks and SLA monitoring must all run continuously. Quality signals must flow into retrieval workflows, embeddings and AI pipelines. If the data is not healthy, the system must know it, and therefore GenAI must know it too.
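
A small sketch of how a freshness signal can flow into retrieval, assuming a hypothetical catalog call that returns a dataset’s last update time; the SLA thresholds are invented for the example.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = {"support_documents": timedelta(hours=1), "invoices": timedelta(days=1)}

def health_signal(catalog, dataset: str) -> dict:
    last_updated = catalog.last_updated(dataset)   # hypothetical call, timezone-aware
    age = datetime.now(timezone.utc) - last_updated
    return {"dataset": dataset, "fresh": age <= FRESHNESS_SLA[dataset], "age": age}

def grounded_sources(catalog, candidates: list[str]) -> list[str]:
    # The RAG pipeline only grounds on sources the governance layer says are healthy.
    return [d for d in candidates if health_signal(catalog, d)["fresh"]]
```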

The control plane is the force that guarantees the rest of the architecture works as expected and intended.

==Data Plane: The City’s Foundation and Memory==

If the control plane governs the city, the data plane is the land on which everything is built. This is where the enterprise’s data lives, whether it’s structured, semi-structured or unstructured. Data is stored in open, interoperable formats so that any compute engine can operate on it without copying or duplicating it.

  • Object Storage (OneLake, S3, ADLS)

The Lakehouse sits on top of the object storage using open formats like Apache Iceberg, Delta or Parquet. This gives enterprises ACID reliability and the flexibility for any compute engine to operate on the same data without making copies.
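
A minimal sketch of the "one copy, many engines" idea with PySpark and Delta, assuming a Spark session already configured with the Delta connector; the paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("open-format-lakehouse").getOrCreate()

# Raw drop in object storage.
orders = spark.read.json("s3://landing/orders/2025-12-12/")

# ACID write into an open-format table; other engines (Trino, warehouses, AI services)
# can query the same files without copies.
orders.write.format("delta").mode("append").save("s3://lakehouse/silver/orders")

# The same table read back for analytics: no duplication, one source of truth.
recent = (spark.read.format("delta").load("s3://lakehouse/silver/orders")
          .where("order_date >= '2025-12-01'"))
```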

  • Warehouse Tables

These tables are optimised for BI and metric-driven workloads. They are materialised on top of the Lakehouse data, which enables BI dashboards with governance and consistency.

  • Document & Media Stores

PDFs, HTML pages, design docs, images and audio all live here. These are the sources the GenAI RAG pipeline depends on to ground its answers in what’s actually true inside the enterprise. Only when the data architecture treats text, metrics and media as first-class citizens does GenAI have the full context.

==Index Plane: The Intelligence Layer==

If the data plane holds memories, the Index plane is what makes those memories semantically connected and searchable. We can say that this is the part of the city where information becomes intelligent.

  • Vector Indexes: The City’s semantic map

Every document, row and even graph node is embedded and stored in an approximate nearest neighbour (ANN) index. This enables semantic similarity search for RAG and natural-language-driven retrieval. Architecturally, when a support document changes, the updated embedding appears here within seconds; fresh context becomes the default behaviour of the architecture rather than a luxury.
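
Here is a minimal sketch of that refresh path, assuming a change event arrives whenever a document is updated and that the in-platform index exposes an upsert operation; the `vector_index` client and the event shape are hypothetical.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def on_document_changed(event: dict, vector_index) -> None:
    """Handle a change event of the form {"doc_id": ..., "body": ..., "updated_at": ...}."""
    embedding = model.encode(event["body"], normalize_embeddings=True).tolist()
    # Upsert so the stale vector is replaced within seconds of the change,
    # instead of waiting for the next batch re-embedding job.
    vector_index.upsert(id=event["doc_id"], vector=embedding,
                        metadata={"updated_at": event.get("updated_at")})
```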

  • Knowledge Graph: The City’s relationship network

This is the most interesting part of the city, where entities, relationships, lineage, policies and domain rules come to life. Vector search finds "things that look similar"; the knowledge graph explains "how things are connected", for example:

  • Which customer owns which asset
  • Which policies apply to which regions
  • Which transaction feeds which metric
  • Which dataset produced which report

It supports provenance, entity resolution and symbolic reasoning.
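
As a toy illustration of these relationship questions, here is a sketch using an in-memory graph (networkx) as a stand-in for the enterprise knowledge graph; the node names are invented.

```python
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("raw.invoices", "metrics.active_revenue", relation="feeds")
kg.add_edge("metrics.active_revenue", "report.q4_board_deck", relation="produces")
kg.add_edge("policy.eu_residency", "raw.invoices", relation="applies_to")

# "Which dataset produced which report?": follow lineage edges backwards.
upstream = nx.ancestors(kg, "report.q4_board_deck")
print(upstream)  # {'raw.invoices', 'metrics.active_revenue', 'policy.eu_residency'}

# "Which policies apply to the sources behind this report?"
policies = [node for node in upstream if node.startswith("policy.")]
```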

The index plane transforms the Lakehouse from storage into cognition.

==Compute Plane: The City’s Workforce==

The index plane enables thinking, but the compute plane is the one that acts. This is the plane where every workload, from analytics to streaming to AI, runs seamlessly against shared data.

  • SQL Warehouse / Lakehouse Engine: The Analysts

These power the classic workloads: structured queries, metrics, dashboards and other operational analytics, running against ACID tables with the highest level of reliability.

  • Spark, Trino, Presto: The Builders

These are the distributed engines that share the same data, whether you are doing ETL or ELT, batch jobs, other transformations or ad-hoc analytics. They are the ones that turn raw material into curated forms.

  • Flink / Kafka Streams: The Traffic Controllers

These handle real-time data ingestion, CDC feeds and all types of low-latency stream processing, ensuring that fresh data flows freely across the city.
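
A minimal sketch of that flow with Spark Structured Streaming, assuming the Kafka and Delta connectors are available; the topic, broker and paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("cdc-to-lakehouse").getOrCreate()

# A CDC topic streamed straight into the lakehouse, so downstream embeddings,
# indexes and answers see fresh data by default.
changes = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "crm.customers.cdc")
           .load()
           .select(col("key").cast("string"), col("value").cast("string"), "timestamp"))

(changes.writeStream.format("delta")
        .option("checkpointLocation", "s3://lakehouse/_checkpoints/customers_cdc")
        .start("s3://lakehouse/bronze/customers_cdc"))
```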

  • AI Services: The Specialists

We now reach the busiest part of the city, where GenAI fits directly into the architecture:

  • Embedding services generate embeddings in-platform, reducing latency.
  • RAG orchestrators combine vector + graph + lexical retrieval into unified context.
  • LLM inference runtimes perform prompting, fine-tuning, or adapter training.
  • Guardrails enforce policy, safety, and factual consistency.

It’s the compute plane which keeps the entire AI-native environment alive and evolving.

==The Experience Plane==

Finally, the experience plane is where people meet intelligence, where everything this architecture does becomes visible, usable and valuable.

  • Natural Language UX and Co-pilots

When users ask a question like "Why did churn increase last quarter?", the system routes it through the semantic layer, runs SQL, SPARQL and vector/graph retrieval, enforces policies, and returns a sourced, explainable answer. This is the interface that makes enterprise GenAI feel like a conversation instead of just a query.
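
A minimal sketch of the answer shape this implies: not just text, but the metrics, documents, lineage and policies behind it. The structure and example values are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class SourcedAnswer:
    text: str
    metrics_used: list = field(default_factory=list)      # governed semantic-layer metrics
    documents: list = field(default_factory=list)         # grounding passages with document ids
    lineage: list = field(default_factory=list)           # upstream datasets from the catalog
    policies_applied: list = field(default_factory=list)  # proof of governed access

answer = SourcedAnswer(
    text="Churn rose last quarter mainly in the EU SMB segment after the pricing change.",
    metrics_used=["churn_rate"],
    documents=["support_doc:4812", "exit_survey:2025-Q3"],
    lineage=["raw.subscriptions", "metrics.churn_rate"],
    policies_applied=["row_filter:region", "mask:pii"],
)
```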

  • BI Tools and Dashboards

BI tools such as Power BI, Tableau, and Fabric with Direct Lake live here with their visual analytics, powered by the shared semantic and Lakehouse layers beneath them.

  • Applications and APIs

Finally, the traditional experiences, applications and APIs, also live in this plane. Decision intelligence, recommendations, RPA, analytics apps and everything else consume data and AI through consistent APIs and governed access.

Once enterprises adopt this AI-native, semantics-first data architecture, several things become possible:

  1. Latency drops dramatically
  2. Costs stabilise
  3. Answers become trustworthy
  4. Domain expertise becomes digital
  5. AI becomes model-agnostic infrastructure

This is what the future of enterprise GenAI looks like: meaning driven, trust-embedded, and built for hybrid retrieval at scale.

As we have seen, data architectures were built for decades to serve BI dashboards and traditional AI workloads; now they must serve natural language as the interface and GenAI as the primary consumer. With this evolution, the center of gravity has moved. Enterprises need to:

  • Bring AI to the data
  • Elevate semantics to the level of infrastructure
  • Unify structured, unstructured and vector/graph data
  • Treat trust as a design principle rather than a compliance checkbox
  • Build an architecture where meaning, not format, becomes the connective tissue

This is not an incremental shift; it is the foundational reshaping of data systems needed to support intelligence, and the enterprises that embrace it early will define the next decade of enterprise GenAI.

