Tag: Tech

Redefining Product Attribute Tagging With AI-Powered Retail Domain Language Models
In online retail, success hinges on more than just offering quality products at competitive prices. As eCommerce catalogs expand and consumer expectations soar, businesses face an increasingly complex challenge: How do you effectively organize, categorize, and present your vast product assortments in a way that enhances discoverability and drives sales?

Having complete and correct product catalog data is key. Effective product attribute tagging—a crucial yet frequently undervalued capability—helps in achieving this accuracy and completeness in product catalog data. While traditional methods of tagging product attributes have long struggled with issues of scalability, consistency, accuracy, and speed, a new thinking and fundamentally new ways of addressing these challenges are getting established. These follow from the revolution brought in Large Language Models but they fashion themselves as Small Language Models (SLM) or more precisely as Domain Specific Language Models. These can be potentially considered foundational models as they solve a wide variety of downstream tasks albeit within specific domains. They are a lot more efficient and do a much better job in those tasks compared to an LLM. .

Retail Domain Language Models (RLMs) have the potential to transform the eCommerce customer journey. As always, it’s never a binary choice. In fact, LLMs can be a great starting point since they provide an enhanced semantic understanding of the world at large: they can be used to mine structured information (e.g., product attributes and values) out of unstructured data (e.g., product descriptions), create baseline domain knowledge (e.g, manufacturer-brand mappings), augment information (e.g., image to prompt), and create first cut training datasets.

Powered by cutting-edge Generative AI and RLMs, next-generation attribute tagging solutions are transforming how online retailers manage their product catalog data, optimize their assortment, and deliver superior shopping experiences. As a new paradigm in search emerges – based more on intent and outcome, powered by natural language queries and GenAI based Search Agents – the capability to create complete catalog information and rich semantics becomes increasingly critical.

In this post, we’ll explore the crucial role of attribute tagging in eCommerce, delve into the limitations of conventional tagging methods, and unveil how DataWeave’s innovative AI-driven approach is helping businesses stay ahead in the competitive digital marketplace.

Why Product Attribute Tagging is Important in eCommerce

As the eCommerce landscape continues to evolve, the importance of attribute tagging will only grow, making it a pertinent focus for forward-thinking online retailers. By investing in robust attribute tagging systems, businesses can gain a competitive edge through improved product comparisons, more accurate matching, understanding intent, and enhanced customer search experiences.

Taxonomy Comparison and Assortment Gap Analysis

Products are categorized and organized differently on different retail websites. Comparing taxonomies helps in understanding focus categories and potential gaps in assortment breadth in relation to one’s competitors: missing product categories, sizes, variants or brands. It also gives insights into the navigation patterns and information architecture of one’s competitors. This can help in making search and navigation experience more efficient by fine tuning product descriptions to include more attributes and/or adding additional relevant filters to category listing pages.

For instance, check out the different Backpack categories on Amazon and Staples in the images below.

Or look at the nomenclature of categories for “Pens” on Amazon (left side of the image) and Staples (right side of the image) in the image below.

Assortment Depth Analysis

Another big challenge in eCommerce is the lack of standardization in retailer taxonomy. This inconsistency makes it difficult to compare the depth of product assortments across different platforms effectively. For instance, to categorize smartphones,
- Retailer A might organize it under “Electronics > Mobile Phones > Smartphones”
- Retailer B could use “Technology > Phones & Accessories > Cell Phones”
- Retailer C might opt for “Consumer Electronics > Smartphones & Tablets”
Inconsistent nomenclature and grouping create a significant hurdle for businesses trying to gain a competitive edge through assortment analysis. The challenge is exacerbated if you want to do an in-depth assortment depth analysis for one or more product attributes. For instance, look at the image below to get an idea of the several attribute variations for “Desks” on Amazon and Staples.

Custom categorization through attribute tagging is essential for conducting granular assortment comparisons, allowing companies to accurately assess their product offerings against those of competitors.

Enhancing Product Matching Capabilities

Accurate product matching across different websites is fundamental for competitive pricing intelligence, especially when matching similar and substitute products. Attribute tagging and extraction play a crucial role in this process by narrowing down potential matches more effectively, enabling matching for both exact and similar products, and tagging attributes such as brand, model, color, size, and technical specifications.

For instance, when choosing to match similar products in the Sofa category for 2-3 seater sofas from Wayfair and Overstock, tagging attributes like brand, color, size, and more is a must for accurate comparisons.

Taking a granular approach not only improves pricing strategies but also helps identify gaps in product offerings and opportunities for expansion.

Fix Content Gaps and improve Product Detail Page (PDP) Content

Attribute tagging plays a vital role in enhancing PDP content by ensuring adherence to brand integrity standards and content compliance guidelines across retail platforms. Tagging attributes allows for benchmarking against competitor content, identifying catalog gaps, and enriching listings with precise details.

This strategic tagging process can highlight missing or incomplete information, enabling targeted optimizations or even complete rewrites of PDP content to improve discoverability and drive conversions. With accurate attribute tagging, businesses can ensure each product page is fully optimized to capture consumer attention and meet retail standards.

Elevating the Search Experience

In today’s online retail marketplace, a superior search experience can be the difference between a sale and a lost customer. Through in-depth attribute tagging, vendors can enable more accurate filtering to improve search result relevance and facilitate easier product discovery for consumers.

By integrating rich product attributes extracted by AI into an in-house search platform, retailers can empower customers with refined and user-friendly search functionality. Enhanced search capabilities not only boost customer satisfaction but also increase the likelihood of conversions by helping shoppers find exactly what they’re looking for more quickly and with minimal effort.

Pitfalls of Conventional Product Tagging Methods

Traditional methods of attribute tagging, such as manual and rule-based systems, have been significantly enhanced by the advent of machine learning. While these approaches may have sufficed in the past, they are increasingly proving inadequate in the face of today’s dynamic and expansive online marketplaces.

Scalability

As eCommerce catalogs expand to include thousands or even millions of products, the limitations of machine learning and rule-based tagging become glaringly apparent. As new product categories emerge, these systems struggle to keep pace, often requiring extensive revisions to existing tagging structures.

Inconsistencies and Errors

Not only is reliance on an entirely human-driven tagging process expensive, but it also introduces a significant margin for error. While machine learning can automate the tagging process, it’s not without its limitations. Errors can occur, particularly when dealing with large and diverse product catalogs.

As inventories grow more complex to handle diverse product ranges, the likelihood of conflicting or erroneous rules increases. These inconsistencies can result in poor search functionality, inaccurate product matching, and ultimately, a frustrating experience for customers, drawing away the benefits of tagging in the first place.

Speed

When product information changes or new attributes need to be added, manually updating tags across a large catalog is a time-consuming process. Slow tagging processes make it difficult for businesses to quickly adapt to emerging market trends causing significant delays in listing new products, potentially missing crucial market opportunities.

How DataWeave’s Advanced AI Capabilities Revolutionize Product Tagging

Advanced solutions leveraging RLMs and Generative AI offer promising alternatives capable of overcoming these challenges and unlocking new levels of efficiency and accuracy in product tagging.

DataWeave automates product tagging to address many of the pitfalls of other conventional methods. We offer a powerful suite of capabilities that empower businesses to take their product tagging to new heights of accuracy and scalability with our unparalleled expertise.

Our sophisticated AI system brings an advanced level of intelligence to the tagging process.

RLMs for Enhanced Semantic Understanding

Semantic Understanding of Product Descriptions

RLMs analyze the meaning and context of product descriptions rather than relying on keyword matching.
Example: “Smartphone with a 6.5-inch display” and “Phone with a 6.5-inch screen” are semantically similar, though phrased differently.

Attribute Extraction

RLMs can identify important product attributes (e.g., brand, size, color, model) even from noisy or unstructured data.
Example: Extracting “Apple” as a brand, “128GB” as storage, and “Pink” as the color from a mixed description.

Identifying Implicit Relationships

RLMs find implicit relationships between products that traditional rule-based systems miss.
Example: Recognizing that “iPhone 12 Pro” and “Apple iPhone 12” are part of the same product family.

Synonym Recognition in Product Descriptions

Synonym Matching with Context

RLMs identify when different words or phrases describe the same product.
Examples: “Sneakers” = “Running Shoes”, “Memory” = “RAM” (in electronics)
Even subtle differences in wording, like “rose gold” vs “pink” are interpreted correctly.

Overcoming Brand-Specific Terminology

Some brands use their own terminologies (e.g., “Retina Display” for Apple).
RLMs can map proprietary terms to more generic ones (e.g., Retina Display = High-Resolution Display).

Dealing with Ambiguities

RLMs analyze surrounding text to resolve ambiguities in product descriptions.
Example: Resolving “charger” to mean a “phone charger” when matched with mobile phones.

Contextual Understanding for Improved Accuracy and Precision

By leveraging advanced natural language processing (NLP), DataWeave’s AI can process and understand the context of lengthy product descriptions and customer reviews, minimizing errors that often arise at human touch points. The solution processes and interprets information to extract key information to dramatically improve the overall accuracy of product tags.

It excels at grasping the subtle differences between similar products, sizes, colors and identifying and tagging minute differences between items, ensuring that each product is uniquely and accurately represented in a retailer’s catalog.

This has a major impact on product and similarity-based matching that can even help optimize similar and substitute product matching to enhance consumer search. At the same time, our AI can understand that the same term might have different meanings in various product categories, adapting its tagging approach based on the specific context of each item.

This deep comprehension ensures that even nuanced product attributes are accurately captured and tagged for easy discoverability by consumers.

Case Study: Niche Jewelry Attributes

DataWeave’s advanced AI can assist in labeling the subtle attributes of jewelry by analyzing product images and generating prompts to describe the image. In this example, our AI identifies the unique shapes and materials of each item in the prompts.

The RLM can then extract key attributes from the prompt to generate tags. This assists in accurate product matching for searches as well as enhanced product recommendations based on similarities.

This multi-model approach provides the flexibility to adapt as product catalogs expand while remaining consistent with tagging to yield more robust results for consumers.

Unparalleled Scalability

DataWeave can rapidly scale tagging for new categories. The solution is built to handle the demands of even the largest eCommerce catalogs enabling:
- Effortless management of extensive product catalogs: We can process and tag millions of products without compromising on speed or accuracy, allowing businesses to scale without limitations.
- Automated bulk tagging: New product lines or entire categories can be tagged automatically, significantly reducing the time and resources required for catalog expansion.
Normalizing Size and Color in Fashion

Style, color, and size are the core attributes in the fashion and apparel categories. Style attributes, which include design, appearance, and overall aesthetics, can be highly specific to individual product categories.

Our product matching engine can easily handle color and sizing complexity via our AI-driven approach combined with human verification. By leveraging advanced technology to identify and normalize identical and similar products from competitors, you can optimize your pricing strategy and product assortment to remain competitive. Using Generative AI in normalizing color and size in fashion is key to powering competitive pricing intelligence at DataWeave.

Continuous Adaptation and Learning

Our solution evolves with your business, improving continuously through feedback and customization for retailers’ specific product categories. The system can be fine-tuned to understand and apply specialized tagging for niche or industry-specific product categories. This ensures that tags remain relevant and accurate across diverse catalogs and as trends emerge.

The AI in our platform also continuously learns from user interactions and feedback, refining its tagging algorithms to improve accuracy over time.

Stay Ahead of the Competition With Accurate Attribute Tagging

In the current landscape, the ability to accurately and consistently tag product attributes is no longer a luxury—it’s essential for staying competitive. With advancements in Generative AI, companies like DataWeave are revolutionizing the way product tagging is handled, ensuring that every item in a retailer’s catalog is presented with precision and depth. As shoppers demand a more intuitive, seamless experience, next-generation tagging solutions are empowering businesses to meet these expectations head-on.

DataWeave’s innovative approach to attribute tagging is more than just a technical improvement; it’s a strategic advantage in an increasingly competitive market. By leveraging AI to scale and automate tagging processes, online retailers can keep pace with expansive product assortments, manage content more effectively, and adapt swiftly to changes in consumer behavior. In doing so, they can maintain a competitive edge.

To learn more, talk to us today!
November 14, 2024
Using Siamese Networks to Power Accurate Product Matching in eCommerce
Retailers often compete on price to gain market share in high performance product categories. Brands too must ensure that their in-demand assortment is competitively priced across retailers. Commerce and digital shelf analytics solutions offer competitive pricing insights at both granular and SKU levels. Central to this intelligence gathering is a vital process: product matching.

Product matching or product mapping involves associating identical or similar products across diverse online platforms or marketplaces. The matching process leverages the capabilities of Artificial Intelligence (AI) to automatically create connections between various representations of identical or similar products. AI models create groups or clusters of products that are exactly the same or “similar” (based on some objectively defined similarity criteria) to solve different use cases for retailers and consumer brands.

Accurate product matching offers several key benefits for brands and retailers:
- Competitive Pricing: By identifying identical products across platforms, businesses can compare prices and adjust their strategies to remain competitive.
- Market Intelligence: Product matching enables brands to track their products’ performance across various retailers, providing valuable insights into market trends and consumer preferences.
- Assortment Planning: Retailers can analyze their product range against competitors, identifying gaps or opportunities in their offerings.
Why Product Matching is Incredibly Hard

But product matching stands out as one of the most demanding technical processes for commerce intelligence tools. Here’s why:

Data Complexity

Product information comes in various (multimodal) formats – text, images, and sometimes video. Each format presents its own set of challenges, from inconsistent naming conventions to varying image quality.

Data Variance

The considerable fluctuations in both data quality and quantity across diverse product categories, geographical regions, and websites introduce an additional layer of complexity to the product matching process.

Industry Specific Nuances

Industry specific nuances introduce unique challenges to product matching. Exact matching may make sense in certain verticals, such as matching part numbers in industrial equipment or identifying substitute products in pharmaceuticals. But for other industries, exactly matched products may not offer accurate comparisons.
- In the Fashion and Apparel industry, style-to-style matching, accommodating variants and distinguishing between core sizes and non-core sizes and age groups become essential for accurate results.
- In Home Improvement, the presence of unbranded products, private labels, and the preference for matching sets rather than individual items complicates the process.
- On the other hand, for grocery, product matching becomes intricate due to the distinction between item pricing and unit pricing. Managing the diverse landscape of different pack sizes, quantities, and packaging adds further layers of complexity.
Diverse Downstream Use Cases

The diverse downstream business applications give rise to various flavors of product matching tailored to meet specific needs and objectives.

In essence, while product matching is a critical component in eCommerce, its intricacies demand sophisticated solutions that address the above challenges.

To solve these challenges, at DataWeave, we’ve developed an advanced product matching system using Siamese Networks, a type of machine learning model particularly suited for comparison tasks.

Siamese Networks for Product Matching

Our methodology involves the use of ensemble deep learning architectures. In such cases, multiple AI models are trained and used simultaneously to ensure highly accurate matches. These models tackle NLP (natural language processing) and Computer Vision challenges specific to eCommerce. This technology helps us efficiently narrow down millions of product candidates to just 5-15 highly relevant matches.

The Tech Powering Siamese Networks

The key to our approach is creating what we call “embeddings” – think of these as unique digital fingerprints for each product. These embeddings are designed to capture the essence of a product in a way that makes similar products easy to identify, even when they look slightly different or have different names.

Our system learns to create these embeddings by looking at millions of product pairs. It learns to make the embeddings for similar products very close to each other while keeping the embeddings for different products far apart. This process, known as metric learning, allows our system to recognize product similarities without needing to put every product into a rigid category.

This approach is particularly powerful for eCommerce, where we often need to match products across different websites that might use different names or images for the same item. By focusing on the key features that make each product unique, our system can accurately match products even in challenging situations.

How Siamese Networks Work?

Imagine having a pair of identical twins who are experts at spotting similarities and differences. That’s essentially what a Siamese network is – a pair of identical AI systems working together to compare things.

How it works:
- Twin AI systems: Two identical AI systems look at two different products.
- Creating ‘fingerprints’ or ‘embedding’: Each system creates a unique ‘fingerprint’ of the product it’s looking at.
- Comparison: These ‘fingerprints’ are then compared to see how similar the products are.
Architecture

The architecture of a Siamese network typically consists of three main components: the shared network, the similarity metric, and the contrastive loss function.
- Shared Network: This is the ‘brain’ that creates the product ‘fingerprints’ or ‘embeddings.’ It is responsible for extracting meaningful feature representations from the input samples. This network is composed of layers of neural units that work together. Weight sharing between the twin networks ensures that the model learns to extract comparable features for similar inputs, providing a basis for comparison.
- Similarity Metric: After the shared network processes the inputs, a similarity metric is employed. This decides how alike two ‘fingerprints’ or ‘embeddings’ are. The selection of a similarity metric depends on the specific task and characteristics of the input data. Frequently used similarity metrics include the Euclidean distance, cosine similarity, or correlation coefficient, each chosen based on its suitability for the given context and desired outcomes.
- Loss Function: For training the Siamese network, a specialized loss function is used. This helps the system improve its comparison skills over time. It guides and trains the network to generate akin embeddings for similar inputs and disparate embeddings for dissimilar inputs.
  
  This is achieved by imposing penalties on the model when the distance or dissimilarity between similar pairs surpasses a designated threshold, or when the distance between dissimilar pairs falls below another predefined threshold. This training strategy ensures that the network becomes adept at discerning and encoding the desired level of similarity or dissimilarity in its learned embeddings.
How DataWeave Uses Siamese Networks for Product Matching

At DataWeave, we use Siamese Networks to match products across different retailer websites. Here’s how it works:

Pre-processing (Image Preparation)
- We collect product images from various websites.
- We clean these images up to make them easier for our AI to understand.
- We use techniques like cropping, flipping, and adjusting colors to help our AI recognize products even if the images are slightly different.
Training The AI
- We show our AI system millions of product images, teaching it to recognize similarities and differences.
- We use a special learning method called “Triplet Loss” to help our AI understand which products are the same and which are different.
- We’ve tested different AI structures to find the one that works best for product matching, including ResNet, EfficientNet, NFNet, and ViT.
Image Retrieval
- Once trained, our AI creates a unique “fingerprint” for each product image.
- We store these fingerprints in a smart database.
- When we need to find a match for a product, we:
  - Create a fingerprint for the new product.
  - Quickly search our database for the most similar fingerprints.
  - Return the top matching products.
Matches are then assigned a high or a low similarity score and segregated into “Exact Matches” or “Similar Matches.” For example, check out the image of this white shoe on the left. It has a low similarity score with the pink shoe (below) and so these SKUs are categorized as a “Similar Match.” Meanwhile, the shoe on the right is categorized as an “Exact Match.”

Similarly, in the following image of the dress for a young girl, the matched SKU has a high similarity score and so this pair is categorized as an “Exact Match.”

Siamese Networks play a pivotal role in DataWeave’s Product Matching Engine. Amid the millions of images and product descriptions online, our Siamese Networks act as an equalizing force, efficiently narrowing down millions of candidates to a curated selection of 10-15 potential matches.

In addition, these networks also find application in several other contexts at DataWeave. They are used to train our system to understand text-only data from product titles and joint multimodal content from product descriptions.

Leverage Our AI-Driven Product Matching To Get Insightful Data

In summary, accurate and efficient product matching is no longer a luxury – it’s a necessity. DataWeave’s advanced product matching solution provides brands and retailers with the tools they need to navigate this complex landscape, turning the challenge of product matching into a competitive advantage.

By leveraging cutting-edge technology and simplifying it for practical use, we empower businesses to make informed decisions, optimize their operations, and stay ahead in the ever-evolving eCommerce market. To learn more, reach out to us today!
June 26, 2024