Reimagining Pharma Content Atomization and Tagging with GenAI


Updated on: 29 Jul 2025

The sheer volume of customer-focused and healthcare professional (HCP)-focused content within pharma organizations makes that content hard to manage and use effectively. These organizations continuously generate content ranging from scientific research and clinical trial data to marketing materials and educational resources, so streamlining, categorizing, and tailoring it to the specific needs and preferences of diverse audiences becomes increasingly difficult.

Elevating Content Intelligence in Pharma

A modern content approval workflow is essential for pharma organizations to ensure fast, compliant, and auditable delivery of scientific and promotional materials. AI-driven solutions can streamline the multi-stage review process, automate compliance checks, and facilitate real-time collaboration among stakeholders, reducing bottlenecks and minimizing the risk of regulatory delays. This efficiency not only accelerates content deployment but also enhances overall content quality and governance.
AI content tagging equips every digital asset with rich, precise metadata, making it far easier to search, retrieve, and personalize content at scale. In pharma, automated tagging powered by advanced artificial intelligence and machine learning models ensures scientific accuracy and regulatory alignment, while speeding up the organization and categorization of high volumes of medical and commercial information. The result is dramatically improved discoverability and agility in multichannel engagement.
Content atomization, driven by Generative AI, or GenAI, enables organizations to break down complex assets into bite-sized, reusable components or “atoms.” This approach unlocks true omnichannel engagement in pharma by allowing teams to quickly assemble and adapt approved messages for diverse audiences across digital platforms. Automated content atomization drives both consistency and compliance, while reducing content production times.
Content intelligence in pharma brings together advanced analytics, AI-driven insights, and workflow automation to help organizations maximize the impact of their scientific and commercial content. By monitoring tag usage, approval cycles, and audience interactions, teams can rapidly identify trends, address gaps, and optimize performance. This data-driven approach empowers smarter, more agile decision-making, resulting in measurable gains in engagement, compliance, and commercial effectiveness.
Indegene’s NEXT Commercial Content Intelligence (NCCI) is a proprietary artificial intelligence (AI)-based solution for content atomization and tagging, specifically engineered for pharma organizations. By utilizing robust artificial intelligence and machine learning (AI/ML) models, NCCI deconstructs an array of assets, including iDetails, representative triggered emails (RTEs), banners, webpages, videos, and more, into atomized versions and then helps generate relevant tags.
NCCI has served more than 10 major pharmaceutical companies for over half a decade, proving itself as an automated, consistent, and scalable solution across use cases such as content discoverability, analytics, and personalization.
With the integration of generative AI, NCCI has further enhanced its capabilities. This blog post details how NCCI became a comprehensive AI-powered content intelligence tool for pharma, including the opportunities explored, challenges faced, and outcomes achieved.

Areas where GenAI can help improve content management and optimization

While NEXT Commercial Content Intelligence’s current supervised learning–based models were adequate, generative AI offered the opportunity to achieve fully automated content atomization and tagging, making the entire journey more efficient and effective. The steps involved in this journey and the potential impact of generative AI are summarized in the table below:
| Stage | Activities | Impact of generative AI | Impact level |
| --- | --- | --- | --- |
| Taxonomy Design | Use case comprehension, tag requirements, category/definition crafting, overlap analysis, manual tagging, and client revisions | Expedites taxonomy creation, seamlessly handles overlaps, and streamlines revisions | High |
| Model Development and Deployment | Dataset preparation, model creation, validation, and deployment | Bypasses conventional model generation steps by using intelligent prompts and optimizing model development | High |
| Machine Tagging | Content upload, running the model, and output generation | - | - |
| Subject matter expert (SME) Review | Asset audit, content comprehension, and tag validation | Generative AI enables content summarization, key area highlighting, and SME workload reduction | High |
| Output Generation and Transfer | Reports and API tag transfers | - | - |
| Model Retraining | Data selection, pipeline run, validation, and new version deployment | Generative AI–driven accuracy reduces retraining needs, allowing the effective use of intelligent prompts for efficient model enhancement | High |

Boosting content atomization and content tagging with GenAI

While powering NEXT Commercial Content Intelligence with GenAI is critical to staying future ready, the path to production involved several challenges. This section walks through the transition to GenAI for one core capability: atom-level key message generation.
At its core, NCCI had an atom-level key message generation model built through supervised learning on 250,000+ content pieces specific to the pharmaceutical domain. This model covered 17 categories (efficacy, safety, brand information, study design, etc.) and was continuously retrained over the past 3 years to attain an accuracy of 83%. The aim now was to improve on this model with GenAI.

Pilot experiment

The NCCI team first ran a small-sample experiment, using GPT-3.5 Turbo to generate key messages for 100 atoms. Several prompting techniques were tried before the team settled on the prompt with the highest prediction accuracy.
| Trial number | Prompt parameters | Accuracy with GPT-3.5 Turbo |
| --- | --- | --- |
| 1 | Category names and OCR text of the atom | 33.2% |
| 2 | Category definition and OCR text of the atom | 51.8% |
| 3 | Category definition, OCR text of the atom, and role definition | 62.3% |
| 4 | Categories redefined (overlaps were identified and removed), OCR text of the atom, and role definition | 89.8% |
Because the best pilot prompt (89.8%) was more accurate than the existing model (83%), the team decided to productionize the feature with GPT-3.5 Turbo.
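The prompt ingredients listed in the trials (role definition, category definitions, and the OCR text of the atom) can be pictured with a small sketch. This is only an illustration: the role text, the category definitions, and the `build_prompt` helper are hypothetical placeholders, not NCCI's actual prompt or taxonomy.

```python
# Illustrative sketch only. The role text, category definitions, and
# build_prompt helper are hypothetical, not NCCI's actual prompt.
CATEGORIES = {
    "Efficacy": "Claims about how well the product achieves its clinical effect.",
    "Safety": "Adverse events, warnings, contraindications, and tolerability.",
    "Study design": "Trial population, endpoints, methodology, and duration.",
}

ROLE = "You are a pharmaceutical content analyst who labels content atoms."


def build_prompt(ocr_text: str, categories: dict[str, str], role: str = ROLE) -> str:
    """Combine the role definition, category definitions, and atom OCR text."""
    definitions = "\n".join(f"- {name}: {desc}" for name, desc in categories.items())
    return (
        f"{role}\n\n"
        "Assign exactly one of the following categories to the text.\n"
        f"{definitions}\n\n"
        f"Text:\n{ocr_text.strip()}\n\n"
        "Answer with the category name only."
    )


prompt = build_prompt("  Headache was reported in 4% of patients. ", CATEGORIES)
```

Keeping the definitions in one dictionary means that removing an overlap between categories, as in trial 4, is a change to data rather than to the prompt-building code.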

Scaled experimentation

The experiment was then extended to 1,000 atoms, and 3 runs were conducted. The following observations were noted:
Consistency between the runs was only 30%; that is, the same prompt given 3 times produced the same output in only 30% of the cases.
Unintended responses were received in 20% of the cases. These included responses outside the given/defined categories and responses with no category at all.
To improve the responses and make the model more deterministic, the model's sampling parameters were tuned. After several batch tests, the temperature and top-p values were optimized, and consistency improved to 87%.
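The consistency figure, as defined above, is the share of atoms for which every run returns the same output. A minimal sketch of that calculation follows; the function name and the sample labels are made up for illustration.

```python
def run_consistency(runs: list[list[str]]) -> float:
    """Fraction of atoms whose label is identical across all runs."""
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one atom")
    if any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("all runs must cover the same atoms")
    # zip(*runs) groups the labels each atom received across the runs.
    agreeing = sum(1 for labels in zip(*runs) if len(set(labels)) == 1)
    return agreeing / len(runs[0])


# Made-up labels for 4 atoms over 3 runs (not real experiment data).
runs = [
    ["Efficacy", "Safety", "Study design", "Safety"],
    ["Efficacy", "Safety", "Efficacy", "Safety"],
    ["Efficacy", "Brand information", "Study design", "Safety"],
]
consistency = run_consistency(runs)  # atoms 1 and 4 agree in all runs: 0.5
```

Running the same check after each change to temperature or top-p gives a simple way to compare settings on a fixed batch of atoms.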
To avoid unintended responses, a condition was set to verify the response before acceptance and a prompt was defined to resend and revise the output. This reduced the unintended responses to <2%. For the cases where the output was still unintended, the platform was modified to provide default values.
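The verify, resend, and default-value logic described above might look roughly like the following sketch. It is an assumption-laden illustration: `ask_model` stands in for whatever function calls the language model, and the category set, retry limit, revision wording, and `DEFAULT_CATEGORY` are hypothetical.

```python
from typing import Callable

# Hypothetical values for illustration; not NCCI's actual configuration.
ALLOWED = {"Efficacy", "Safety", "Brand information", "Study design"}
DEFAULT_CATEGORY = "Unclassified"


def classify_with_validation(
    prompt: str,
    ask_model: Callable[[str], str],
    allowed: set[str] = ALLOWED,
    max_retries: int = 2,
) -> str:
    """Accept a response only if it is an allowed category; otherwise re-ask."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        answer = ask_model(current_prompt).strip()
        if answer in allowed:
            return answer
        # Revision prompt: restate the constraint and quote the bad answer.
        current_prompt = (
            f"{prompt}\n\nYour previous answer {answer!r} is not valid. "
            f"Reply with exactly one of: {', '.join(sorted(allowed))}."
        )
    # Still unintended after all retries: fall back to the default value.
    return DEFAULT_CATEGORY


# Stub model for demonstration: fails once, then answers correctly.
replies = iter(["I think this is about dosing", "Safety"])
result = classify_with_validation("Classify this atom.", lambda p: next(replies))
```

Because the model call is passed in as a function, the same wrapper can be exercised with a stub, as here, before being connected to a real model.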

Final run and deployment

After the temperature and top-p values were tuned, a run on 1,000 atoms achieved an accuracy of 84.8%. This model was cleared for deployment because its accuracy exceeded the 83% of the earlier supervised model.
The deployment architecture was designed to integrate NEXT Commercial Content Intelligence’s content atomization workflow with the GPT-3.5 Turbo–based atom-level key message identification. This deployment also took into consideration the data privacy and security concerns of the pharmaceutical majors, making the information exchange completely secure and protected.

Custom taxonomies and use cases

Every NEXT Commercial Content Intelligence customer requires custom categories in addition to the default set already present in the core model. Experiments were run on adding new categories and modifying existing ones to confirm that prompt tuning alone could achieve similar accuracies for custom categories.

Benefits of using GenAI to boost content atomization and content tagging

Reduced time to deployment: A 70% decrease in time to deployment was achieved because the model could be tuned with just prompts and did not require custom dataset preparation and model customization.
Reduced total costs: The total cost of deployment and operations for 5,000 pages of content is summarized in the graph below. While model training, deployment, and retraining costs were lower for GPT-3.5 Turbo, the operational charges were slightly higher. Overall, however, the total cost of tagging fell by 76%.

Next steps and planned refinements

The NEXT Commercial Content Intelligence team noticed that GPT-3.5 Turbo produced lower accuracy in one key area, that is, key message identification for graphical components with unstructured optical character recognition (OCR) text such as logos, graphs, illustrations, diagrams, and so on. Experiments are now being conducted to improve results in this area.

Additional GenAI–powered content management capabilities

With similar replacements to the core models, NEXT Commercial Content Intelligence is now powered by GenAI for most of its capabilities. A few of these key capabilities and their performance are summarized in the table below:
| Stage | Activity | Performance* |
| --- | --- | --- |
| Taxonomy Design | Taxonomy category definition; taxonomy definition validation; taxonomy overlap analysis and redefinition | 90% reduction in efforts required for a taxonomy design with generative AI support |
| Model Development and Deployment | Content atomization | New models (e.g., Segment Anything) were able to maintain the contours of graphics and identify atoms with a coverage of 93%, compared with the earlier model's coverage of 78% |
| Model Development and Deployment | Brand identification | The GPT-3.5 Turbo–based model improved accuracy from 83% to 93.33% |
| Model Development and Deployment | Therapy area identification | The GPT-3.5 Turbo–based model improved accuracy from 79% to 83.33% |
| Model Development and Deployment | Keyword identification – English | The GPT-3.5 Turbo–based model improved coverage from 76% to 93.33% |
| SME Review | Tag validation | The GPT-3.5 Turbo–based summary generation and attention optimization reduced SME review efforts by 54% |
*Results are based on limited experimentation.

GenAI-driven Content Intelligence in Pharma: Insights That Drive Action

NEXT Commercial Content Intelligence was built to transform pharmaceutical content tagging and make it more automated, consistent, scalable, and affordable. Today, GenAI presents a transformative opportunity to further enhance these core goals.
However, using GenAI to deliver consistent and reliable output for business use cases comes with a set of factors that must be considered. These include:
Model reliability: A generalized large language model (LLM) offers far less control over, and insight into, its characteristics than a supervised learning model. Scaled experiments must therefore be run regularly to verify and sustain reliable output.
Version updates: LLMs are continuously updated and upgraded, and as observed during our experimentation, each version produces different output. Continuous monitoring is needed to ensure consistency and continuous improvement.
Consistency-accuracy balance: When GenAI is used for classification, low randomness affects the accuracy of the prediction, while high randomness affects consistency. A balance between accuracy and consistency should therefore be maintained for reliable output.
With newer and more powerful generative AI releases on the way, the NEXT Commercial Content Intelligence team will continue to identify and experiment with more transformative changes that deliver performance gains to life sciences businesses. Watch this space for more updates from us in the coming months.
