The DataWeave Blog

Tag: Artificial Intelligence

The retail industry is one of the pioneers in the adoption of Artificial Intelligence. Check out how DataWeave’s proprietary Artificial Intelligence technology helps retailers make smarter decisions.

DataWeave’s AI Evolution: Delivering Greater Value Faster in the Age of AI and LLMs
In retail, competition is fierce, and in its ever-evolving landscape, consumer expectations are higher than ever.

For years, our AI-driven solutions have been the foundation that empowers businesses to sharpen their competitive pricing and optimize digital shelf performance. But in today’s world, evolution is constant—so is innovation. We now find ourselves at the frontier of a new era in AI. With the dawn of Generative AI and the rise of Large Language Models (LLMs), the possibilities for eCommerce companies are expanding at an unprecedented pace.

These technologies aren’t just a step forward; they’re a leap—propelling our capabilities to new heights. The insights are deeper, the recommendations more precise, and the competitive and market intelligence we provide is sharper than ever. This synergy between our legacy of AI expertise and the advancements of today positions DataWeave to deliver even greater value, thus helping businesses thrive in a fast-paced, data-driven world.

This article marks the beginning of a series where we will take you through these transformative AI capabilities, each designed to give retailers and brands a competitive edge.

In this first piece, we’ll offer a snapshot of how DataWeave aggregates and analyzes billions of publicly available data points to help businesses stay agile, informed, and ahead of the curve. These capabilities fall into five broad categories:
- Product Matching
- Attribute Tagging
- Content Analysis
- Promo Banner Analysis
- Other Specialized Use Cases
Product Matching

Dynamic pricing is an indispensable tool for eCommerce stores to remain competitive. A blessing—and a curse—of online shopping is that users can compare prices of similar products in a few clicks, with most shoppers gravitating toward the lowest price. Consequently, retailers can lose sales over minor discrepancies of $1–2 or even less.

All major eCommerce platforms compare product prices—especially their top selling products—across competing players and adjust prices to match or undercut competitors. A typical product undergoes 20.4 price changes annually, or roughly once every 18 days. Amazon takes it to the extreme, changing prices approximately every 10 minutes. It helps them maintain a healthy price perception among their consumers.

However, accurate product matching at scale is a prerequisite for the above, and that poses significant challenges. There is no standardized approach to product cataloging, so even identical products bear different product titles, descriptions, and attributes. Information is often incomplete, noisy, or ambiguous. Image data contains even more variability—the same product can be styled using different backgrounds, lighting, orientations, and quality; images can have multiple overlapping objects of interest or extraneous objects, and at times the images and the text on a single page might belong to completely different products!

DataWeave leverages advanced technologies, including computer vision, natural language processing (NLP), and deep learning, to achieve highly accurate product matching. Our pricing intelligence solution accurately matches products across hundreds of websites and automatically tracks competitor pricing data.

Here’s how it works:

Text Preprocessing

Text preprocessing identifies the text features that are essential for accurate comparison.
- Metadata Parsing: Extracts product titles, descriptions, attributes (e.g., color, size), and other structured data elements from Product Description Pages (PDP) that can help in accurately identifying and classifying products.
- Attribute-Value Normalization: Normalizes attribute names (e.g., RAM vs Memory) and their values (e.g., 16 giga bytes vs 16 gigs vs 16 GB) as well as brand names (e.g., Benetton vs UCB vs United Colors of Benetton), and maps category hierarchies to a standard taxonomy.
- Noise Removal: Removes stop words and other elements with no descriptive value; this focuses keyword extraction on meaningful terms that contribute to product identification.
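
The normalization and noise-removal steps above can be sketched in a few lines of Python. This is a minimal illustration, not DataWeave's implementation: the lookup tables, the stop-word list, and all function names are hypothetical, and a production system would rely on far larger, curated dictionaries.

```python
import re

# Hypothetical lookup tables, invented for illustration only.
ATTRIBUTE_ALIASES = {"memory": "ram", "ram": "ram"}
BRAND_ALIASES = {
    "ucb": "United Colors of Benetton",
    "benetton": "United Colors of Benetton",
    "united colors of benetton": "United Colors of Benetton",
}
STOP_WORDS = {"the", "a", "an", "with", "for", "and", "of", "new"}

# Matches "16 GB", "16 gigs", "16 giga bytes", "16gigabytes", and so on.
_GB_PATTERN = re.compile(
    r"^\s*(\d+)\s*(gb|gigs?|giga\s*bytes?|gigabytes?)\s*$", re.IGNORECASE
)


def normalize_attribute_name(name):
    """Map attribute-name variants (e.g. 'Memory') to one canonical key."""
    key = name.strip().lower()
    return ATTRIBUTE_ALIASES.get(key, key)


def normalize_storage_value(value):
    """Rewrite gigabyte spellings as '<n> GB'; leave other values untouched."""
    match = _GB_PATTERN.match(value)
    return f"{int(match.group(1))} GB" if match else value.strip()


def normalize_brand(brand):
    """Map brand aliases to a single canonical brand name."""
    cleaned = brand.strip()
    return BRAND_ALIASES.get(cleaned.lower(), cleaned)


def remove_noise(title):
    """Lowercase, tokenize, and drop stop words from a product title."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [token for token in tokens if token not in STOP_WORDS]
```

With these tables, "16 giga bytes", "16 gigs", and "16 GB" all normalize to the same value, so two listings that describe the same memory size compare as equal.
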
Image Preprocessing

Image processing algorithms use feature extraction to define visual attributes. For example, when comparing images of a red T-shirt, the algorithm might extract features such as “crew neck,” “red,” or “striped.”

Image hashing techniques create a compact representation (or “hash”) of an image, allowing for efficient comparison and matching of product images. This process transforms an image into a concise string or sequence of numbers that captures its essential features, so that visually similar images produce similar hashes even after resizing or minor edits. Robustness to rotation depends on the hashing technique used.
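
As a rough illustration of the idea, the sketch below implements an average hash, one of the simplest perceptual hashing schemes. It is an assumption that a hash of this family is involved; the article does not say which technique DataWeave uses. Images are represented as plain 2D lists of grayscale values so the example needs no imaging library, and the function names are illustrative.

```python
def downscale(pixels, size):
    """Shrink a grayscale image (2D list) to size x size by block averaging.

    Assumes the image height and width are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [
                pixels[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels):
    """Set one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count differing bits; a small distance suggests similar images."""
    return bin(hash_a ^ hash_b).count("1")
```

Because the image is downscaled before hashing, a resized copy of the same picture yields the same (or a nearby) hash, and two hashes can be compared with a single XOR and bit count.
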

Before hashing or feature extraction, images need to be preprocessed to prepare them for downstream operations. Preprocessing includes object detection to identify objects of interest, background removal, face/skin detection and removal, pose estimation and correction, and so forth.

Embeddings

We have built a hybrid, multimodal product-matching engine that uses image features, text features, and domain heuristics. For every product we process, we create and store multiple text and image embeddings in a vector database. These range from basic feature vectors (e.g., tf-idf based, colour histograms, shape vectors) to more advanced deep learning-based embeddings (e.g., BERT, CLIP) to the latest LLM-based embeddings.
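
To make the multimodal idea concrete, here is a minimal sketch that blends a text similarity and an image similarity into one score. It makes several simplifying assumptions: plain term-frequency vectors stand in for tf-idf or learned embeddings, a colour histogram stands in for image embeddings, and the 0.6 text weight is an arbitrary illustrative value rather than a tuned one.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Bag-of-words term counts; a stand-in for tf-idf or learned embeddings."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(title_a, title_b, hist_a, hist_b, text_weight=0.6):
    """Weighted blend of title similarity and colour-histogram similarity."""
    text_sim = cosine(term_frequencies(title_a), term_frequencies(title_b))
    image_sim = cosine(dict(enumerate(hist_a)), dict(enumerate(hist_b)))
    return text_weight * text_sim + (1 - text_weight) * image_sim
```

In a real pipeline the vectors would be fetched from the vector database and compared with an approximate nearest-neighbour search, but the scoring principle is the same.
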

Classification

Classification algorithms enhance product attribute tagging by designating match types. For example, the product might be identified as an “exact match”, “variant”, “similar”, or “substitute.” The algorithm can also identify identical product combinations or “baskets” of items typically purchased together.

What is the Business Impact of Product Matching?
- Pricing Intelligence: Businesses can strategically adjust pricing to remain competitive while maintaining profitability. High-accuracy price comparisons help businesses analyze their competitive price position, identify opportunities to improve pricing, and reclaim market share from competitors.
- Similarity-Based Matching: Products are matched based on a range of similarity features, such as product type, color, price range, specific features, etc., leading to more accurate matches.
- Counterfeit Detection: Businesses can identify counterfeit or unauthorized versions of branded products by comparing them against authentic product listings. This helps safeguard brand identity and enables brands to take legal action against counterfeiters.
Attribute Tagging

Attribute tagging involves assigning standardized tags for product attributes, such as brand, model, size, color, or material. These naming conventions form the basis for accurate product matching. Tagging detailed attributes, such as specifications, features, and dimensions, helps match products that meet similar criteria. For example, tags like “collar” or “pockets” for apparel ensure high-fidelity product matches for hard-to-distinguish items with minor stylistic variations.

Including tags for synonyms, variants, and long-tail keywords (e.g., “denim” and “jeans”) improves the matching process by recognizing different terms used for similar products. Metadata tags categorize similar items according to SKU numbers, manufacturer details, and other identifiers.

Altogether, these capabilities provide high-quality product matches and valuable metadata for retailers to classify their products and compare their product assortment to competitors.

User-Generated Content (UGC) Analysis

Customer reviews and ratings are rich sources of information, enabling brands to gauge consumer sentiment and identify shortcomings regarding product quality or service delivery. However, while informative, reviews constitute unstructured “noisy” data that is actionable only if parsed correctly.

Here’s where DataWeave’s UGC analysis capability steps in.
- Feature Extractor: Automatically pulls specific product attributes mentioned in the review (e.g., “battery life,” “design” and “comfort”)
- Feature Opinion Pair: Pairs each product attribute with a corresponding sentiment from the review (e.g., “battery life” is “excellent,” “design” is “modern,” and “comfort” is “poor”)
- Calculate Sentiment: Calculates an overall sentiment score for each product attribute
The final output combines the information extracted in each of these steps and looks something like this:
- Battery life is excellent
- Design is modern
- Not satisfied with the comfort
The algorithm also recognizes spammy reviews and distinguishes subjective reviews (i.e., those fueled by emotion) from objective ones.
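
The three steps above (feature extraction, feature-opinion pairing, and sentiment calculation) can be sketched with a simple keyword lexicon. This is a toy stand-in for the NLP models a production system would use: the feature list, the opinion scores, and the function names are all hypothetical.

```python
import re

# Hypothetical lexicons, invented for illustration only.
FEATURES = ("battery life", "design", "comfort")
OPINION_SCORES = {"excellent": 1.0, "modern": 0.5, "poor": -1.0}

# Split reviews into clauses so each opinion attaches to the nearest feature.
_CLAUSE_SPLIT = re.compile(r"[.!?,;]+|\bbut\b")


def extract_pairs(review):
    """Return (feature, opinion) pairs found within the same clause."""
    pairs = []
    for clause in _CLAUSE_SPLIT.split(review.lower()):
        for feature in FEATURES:
            if feature in clause:
                for word in re.findall(r"[a-z]+", clause):
                    if word in OPINION_SCORES:
                        pairs.append((feature, word))
    return pairs


def feature_sentiment(reviews):
    """Average the opinion scores per feature across all reviews."""
    scores = {}
    for review in reviews:
        for feature, opinion in extract_pairs(review):
            scores.setdefault(feature, []).append(OPINION_SCORES[opinion])
    return {feature: sum(vals) / len(vals) for feature, vals in scores.items()}
```

For the review "Battery life is excellent. The design is modern, but comfort is poor.", `extract_pairs` returns the three feature-opinion pairs listed above.
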

Promo Banner Analysis

Our image processing tool can interpret promotional banners and extract information regarding product highlights, discounts, and special offers. This provides insights into pricing strategies and promotional tactics used by other online stores.

For example, if a competitor offers a 20% discount on a popular product, you can match or exceed this discount to attract more customers.

The banner reader identifies successful promotional trends and patterns from competitors, such as the timing of discounts, frequently promoted product categories or brands, and the duration of sales events. Ecommerce stores can use this information to optimize their promotion strategies, ensuring they launch compelling and timely offers.
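
Once the text on a banner has been read (for example, by an OCR step, which is assumed here and not shown), extracting the offer details is largely a pattern-matching problem. The sketch below is a hypothetical illustration: the `PromoOffer` fields and the regular expressions are assumptions and cover only simple "X% off" banners.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromoOffer:
    discount_percent: Optional[int]  # None if no "X% off" phrase was found
    is_upper_bound: bool             # True for "up to X% off"
    promo_code: Optional[str]        # None if no "code ABC" phrase was found


_PERCENT_OFF = re.compile(r"(up\s+to\s+)?(\d{1,3})\s*%\s*off", re.IGNORECASE)
_PROMO_CODE = re.compile(r"\bcode[:\s]+([A-Z0-9]+)", re.IGNORECASE)


def parse_banner_text(text):
    """Extract the discount and promo code from already-recognized banner text."""
    percent = _PERCENT_OFF.search(text)
    code = _PROMO_CODE.search(text)
    return PromoOffer(
        discount_percent=int(percent.group(2)) if percent else None,
        is_upper_bound=bool(percent and percent.group(1)),
        promo_code=code.group(1).upper() if code else None,
    )
```

Structured records like these can then be aggregated over time to study discount depth, timing, and the categories competitors promote most often.
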

Other Specialized Use Cases

While these generalized AI tools are highly useful in various industries, we’ve created other category- and attribute-specific capabilities for specialty goods (e.g., those requiring certifications or approval by federal agencies) and food items. These use cases help our customers adhere to compliance requirements.

Certification Mark Detector

This detector lets retailers match items based on official certification marks. These marks represent compliance with industry standards, safety regulations, and quality benchmarks.

Example:
- USDA Organic: Certification for organic food production and handling
- ISO 9001: Quality Management System Certification
By detecting these certification marks, the system can accurately match products with their certified counterparts. By identifying which competitor products are certified, retailers can identify products that may benefit from certification.

Nutrition Fact Table Reader

Product attributes alone are insufficient for comparing food items. Differences in nutrition content can influence product category (e.g., “health food” versus regular food items), price point, and consumer choice. DataWeave’s nutrition fact table reader scans nutrition information on packaging, capturing details such as calorie count, macronutrient distribution (proteins, fats, carbohydrates), vitamins, and minerals.

The solution ensures items with similar nutritional profiles are correctly identified and grouped based on specific dietary requirements or preferences. This helps with price comparisons and enables eCommerce stores to maintain a reliable database of product information and build trust among health-conscious consumers.
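
As a simplified sketch of the grouping step, the code below parses nutrition rows from already-recognized label text and checks whether two products have similar profiles. The row format, the 15% tolerance, and the function names are assumptions for illustration, and the comparison assumes both labels report each nutrient in the same unit and serving size.

```python
import re

# Matches rows such as "Calories 120", "Total Fat 2.5g", or "Protein 3 g".
_ROW = re.compile(r"^\s*([A-Za-z ]+?)\s+(\d+(?:\.\d+)?)\s*(kcal|mg|g)?\s*$")


def parse_nutrition_table(text):
    """Turn label text into a {nutrient: amount} mapping (units are dropped)."""
    facts = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            facts[match.group(1).strip().lower()] = float(match.group(2))
    return facts


def similar_profiles(a, b, tolerance=0.15):
    """True if every shared nutrient differs by at most `tolerance` (relative)."""
    shared = set(a) & set(b)
    if not shared:
        return False
    for nutrient in shared:
        larger = max(a[nutrient], b[nutrient])
        if larger and abs(a[nutrient] - b[nutrient]) / larger > tolerance:
            return False
    return True
```
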

Building Next-Generation Competitive and Market Intelligence

Moving forward, breakthroughs in generative AI and LLMs have fueled substantial innovation, which has enabled us to introduce powerful new capabilities for our customers.

These include:
- Building Enhanced Products, Solutions, and Capabilities: Generative AI and LLMs can significantly elevate the performance of existing solutions by improving the accuracy, relevance, and depth of insights. By leveraging these advanced AI technologies, DataWeave can enhance its product offerings, such as pricing intelligence, product matching, and sentiment analysis. These tools will become more intuitive, allowing for real-time updates and deeper contextual understanding. Additionally, AI can help create entirely new solutions tailored to specific use cases, such as automating competitive analysis or identifying emerging market trends. This positions DataWeave to remain at the forefront of innovation, offering cutting-edge solutions that meet the evolving needs of retailers and brands.
- Reducing Turnaround Time (TAT) to Go-to-Market Faster: Generative AI and LLMs streamline data processing and analysis workflows, enabling faster decision-making. By automating tasks like data aggregation, sentiment analysis, and report generation, AI dramatically reduces the time required to derive actionable insights. This efficiency means that businesses can respond to market changes more swiftly, adjusting pricing or promotional strategies in near real-time. Faster insights translate into reduced turnaround times for product development, testing, and launch cycles, allowing DataWeave to bring new solutions to market quickly and give clients a competitive advantage.
- Improving Data Quality to Achieve Higher Performance Metrics: AI-driven technologies are exceptionally skilled at cleaning, organizing, and structuring large datasets. Generative AI and LLMs can refine the data input process, reducing errors and ensuring more accurate, high-quality data across all touchpoints. Improved data quality enhances the precision of insights drawn from it, leading to higher performance metrics like better product matching, more accurate price comparisons, and more effective consumer sentiment analysis. With higher-quality data, businesses can make smarter, more informed decisions, resulting in improved revenue, market share, and customer satisfaction.
- Augmenting Human Bandwidth with AI to Enhance Productivity: Generative AI and LLMs serve as powerful tools that augment human capabilities by automating routine, time-consuming tasks such as data entry, classification, and preliminary analysis. This allows human teams to focus on more strategic, high-value activities like interpreting insights, building relationships with clients, and developing new business strategies. By offloading these repetitive tasks to AI, human productivity is significantly enhanced. Employees can achieve more in less time, increasing overall efficiency and enabling teams to scale their operations without needing a proportional increase in human resources.
In our ongoing series, we will dive deep into each of these capabilities, exploring how DataWeave leverages cutting-edge AI technologies like Generative AI and LLMs to solve complex challenges for retailers and brands.

In the meantime, talk to us to learn more!
September 18, 2024
How AI-Powered Visual Highlighting Helps Brands Achieve Product Consistency Across eCommerce
As eCommerce increasingly becomes a prolific channel of sales for consumer brands, they find that maintaining a consistent and trustworthy brand image is a constant struggle. In an ecosystem filled with dozens of marketplaces and hundreds of third-party merchants, ensuring that customers see what aligns with a brand’s intended image is quite tricky. With many fakes and counterfeit products doing the rounds, brands may further struggle to get the right representation.

One way brands can track and identify inconsistencies in their brand representation across marketplaces is to use Digital Shelf Analytics solutions like DataWeave’s – specifically the Content Audit module.

This solution uses advanced AI models to identify image similarities and dissimilarities compared with the original brand image. Brands could then use their PIM platform or work with the retailer to replace inaccurate images.

But here’s the catch – AI can’t always accurately predict all the differences. Relying solely on scores given by these models poses a challenge in tracking the subtle differences between images. Often, image pairs with seemingly high match scores fail to catch important distinctions. Fake or counterfeit products and variations that slip past the AI’s scrutiny can lead to significant inaccuracies. Ultimately, it puts the reliability of the insights that brands depend on for crucial decisions at risk, impacting both top and bottom lines.

Dealing with this challenge means finding a balance between the number-based assessments of AI models and the human touch needed for accurate decision-making. However, giving auditors the ability to pinpoint variations precisely goes beyond simply sharing numerical values of the match scores with them. Visualizing model-generated scores is important as it provides human auditors with a tangible and intuitive understanding of the differences between two images. While numerical scores are comparable in the relative sense, they lack specificity. Visual interpretation empowers auditors to identify precisely where variations occur, aiding in efficient decision-making.

How AI-Powered Image Scoring Works

At DataWeave, our approach involves employing sophisticated computer vision models to conduct extensive image comparisons. Convolutional Neural Network (CNN) models such as ResNet-50 or YOLO, in conjunction with feature extraction models, analyze images quantitatively. This AI-powered image scoring process yields scores that indicate the level of similarity between images.

However, interpreting these scores and understanding the specific areas of difference can be challenging for human auditors. While computer vision models excel at processing vast amounts of data quickly, translating their output into actionable insights can be a stumbling block. A numerical score may not immediately convey the nature or extent of the differences between images.

The image pairs assessed here all score in the 70 to 80 range (out of a maximum of 100). However, discerning the nature of the differences, whether apparent or subtle, poses a challenge for both the AI models and human auditors. For example, there are differences in the placement or type of images on the packaging, as well as in packaging text that is often in an extremely small font size. It is, of course, possible for human auditors to identify the differences in these images, but it’s a slow, error-prone, and tiring process, especially when auditors often have to check hundreds of image pairs each day.

So how do we ensure that we identify differences in images accurately? The answer lies in the process of visual highlighting.

How Visual Highlighting Works

Visual highlighting is a method that enhances our ability to comprehend differences in images by combining sophisticated algorithms with human understanding. Instead of relying solely on numerical scores, this approach introduces a visual layer, resembling a heatmap, guiding human auditors to specific areas where discrepancies are present.

Consider the scenario depicted in the images above: a computer vision model assigns a score of 70-85 for these images. While this score suggests relatively high similarity, it fails to uncover major differences between the images. Visual highlighting comes into play to overcome this limitation, precisely indicating regions where even subtle differences are seen.

Visual highlighting entails overlaying compared images and emphasizing areas of difference, achieved through techniques like color coding, outlining, or shading specific regions. The significance of the difference in a particular area determines the intensity of the visual highlight.

For instance, if there’s a change in the product’s color or a discrepancy in the packaging, these variations will be visually emphasized. This not only streamlines the auditing process but also enables human evaluators to make well-informed decisions quickly.
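
A bare-bones version of this idea is a per-pixel difference map whose values drive the highlight intensity, plus a bounding box around the pixels that differ most. The sketch below assumes two aligned grayscale images of equal size, represented as 2D lists of 0-255 values. The 0.2 threshold and the function names are illustrative, and a production system would more likely compare model features than raw pixels.

```python
def difference_heatmap(image_a, image_b):
    """Per-pixel absolute difference scaled to 0.0-1.0 (highlight intensity)."""
    return [
        [abs(a - b) / 255 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def highlight_box(heatmap, threshold=0.2):
    """Bounding box (top, left, bottom, right) of pixels at or above threshold.

    Returns None when no pixel differs enough to be highlighted.
    """
    hot = [
        (y, x)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
        if value >= threshold
    ]
    if not hot:
        return None
    ys = [y for y, _ in hot]
    xs = [x for _, x in hot]
    return (min(ys), min(xs), max(ys), max(xs))
```

An auditor-facing tool could draw the box returned by `highlight_box` on top of the product image and shade each pixel by its heatmap value, so that larger differences appear more intense.
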

Benefits of Visual Highlighting
- Intuitive Understanding: Visual highlighting offers an intuitive method for interpreting and acting upon the outcomes of computer vision models. Instead of delving into numerical scores, auditors can concentrate on the highlighted areas, enhancing the efficiency and accuracy of the decision-making process.
- Accelerated Auditing: By bringing attention to specific regions of concern, visual highlighting speeds up the auditing process. Human evaluators can swiftly identify and address discrepancies without the need for exhaustive image analysis.
- Seamless Communication: Visual highlighting promotes clearer communication between automated systems and human auditors. Serving as a visual guide, it enhances collaboration, ensuring that the subtleties captured by computer vision models are effectively conveyed.
The Way Forward

As technology continues to evolve, the integration of visual highlighting methodologies is likely to become more sophisticated. Artificial intelligence and machine learning algorithms may play an even more prominent role in not only detecting differences but also in refining the visual highlighting process.

The collaboration between human auditors and AI ensures a comprehensive approach to maintaining brand integrity in the ever-expanding digital marketplace. By visually highlighting differences in images, brands can safeguard their visual identity, foster consumer trust, and deliver a consistent and reliable online shopping experience. In the intricate dance between technology and human intuition, visual highlighting emerges as a powerful tool, paving the way for brands to uphold their image with precision and efficiency.

To learn more, reach out to us today!

(This article was co-authored by Apurva Naik)
March 4, 2024
AI-powered Product Matching: The Key to Competitive Pricing Intelligence in eCommerce
With thousands of products and hundreds of online retailers to choose from, the average modern-day shopper effortlessly compares prices across several e-commerce sites and often settles for the lowest-priced option. As a result, retailers today are forced to execute millions of price changes per day in a never-ending race to be the lowest priced, without losing out on any potential margin.

Identifying, classifying, and matching products is the first step to comparing prices across websites. However, there is no standardization in the way products are represented across e-commerce websites, causing this process to be fairly complex.

Here’s an example:

What’s needed is a pricing intelligence solution that first matches products across several websites swiftly and accurately, and then enables automated tracking of competitor pricing data on an ongoing basis.

Pricing intelligence solutions already exist. What’s wrong with using them?

There are several challenges with the incumbent solutions in the market – the biggest one being that they don’t work in a timely manner. In essence, it’s like deferring the process of finding actionable information that helps retailers acquire a competitive advantage, and instead doing it in hindsight. Like an autopsy of sorts.

Here are the various solution types we have in the market today:
- Internally developed systems – Solutions developed by retailers themselves often rely on heavy manual data aggregation and have poor product matching capabilities. Since these solutions have been developed by professionals not attuned to building data crunching machines, they pose significant operational challenges in the form of maintenance, updates, etc.
- Web scraping solutions – These solutions have no data normalization or product matching capabilities, and lack the power to deliver relevant actionable insights. What’s more, it’s a struggle to scale them up to accommodate massive volumes of data during peak times such as promotional campaigns.
- DIY solutions – These solutions require manual research and entry of data. It goes without saying that due to the level of human intervention and effort required, they’re expensive, difficult to scale, slow, and of questionable accuracy.
As common as it is nowadays, AI has the answer

DataWeave’s competitive pricing intelligence solution is designed to help retailers achieve precisely the competitive advantage they need by providing them with accurate, timely, and actionable pricing insights enabled by matching products at scale. We provide retailers with access to detailed pricing information on millions of products across competitors, as frequently as they need it.

Our technology stack broadly consists of the following.

1. Data Aggregation

At DataWeave, we can aggregate data from diverse web sources across complex web environments – consistently and at a very high accuracy. Having been in the industry for close to a decade, we’re sitting on a lot of data that we can use to train our product matching platform.

Our datasets include data points from tens of millions of products and have been collected from numerous geographies and verticals in retail. The datasets contain hierarchically arranged information based on retail taxonomy. At the root level, there’s information such as category and subcategory, and at the leaf level, we have product details such as title, description, and other <attribute, value> relationships. Our machine learning architectures and semi-automated training data building systems, augmented by the skills of a strong QA team, help us annotate the necessary information and create labeled datasets using proprietary tools.

2. AI for Product Matching

Product matching at DataWeave is done via a unified platform that uses both text and image recognition capabilities to accurately identify similar SKUs across thousands of e-commerce stores and millions of products. We use an ensemble of deep learning architectures tailored to the NLP and Computer Vision problems specific to us, along with heuristics pertinent to the Retail domain. Products are also classified based on their features, and a normalization layer is designed based on various text/image-based attributes.

Our semantics layer, while technically an integral part of the product matching process, deserves particular mention due to its powerful capabilities.

The text data processing relies on internal, deep pre-trained word embeddings. We use state-of-the-art, customized word representation techniques such as ELMo, BERT, and other Transformer-based models to capture deeply contextualized text with improved accuracy. A self-attention/intra-attention mechanism learns the correlation between the word in question and a previous part of the description.
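
For readers unfamiliar with self-attention, the following is a minimal sketch of scaled dot-product self-attention over a sequence of word vectors. It is deliberately simplified: real Transformer layers apply learned query, key, and value projections and multiple heads, all of which are omitted here, and this version lets every word attend to every other word rather than only to earlier ones.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections.

    Returns (outputs, weights): weights[i][j] is how strongly word i
    attends to word j, and outputs[i] is the weighted mix of all vectors.
    """
    dim = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * vec[i] for w, vec in zip(row, embeddings)) for i in range(dim)]
        )
    return outputs, weights
```

Words with similar vectors receive higher attention weights for each other, which is how a term in a product description can be interpreted in the context of related terms elsewhere in the text.
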

Image data processing starts with object detection to identify the region of interest of a given product (for example, the upper body of a fashion model displaying a shirt). We then leverage deep learning architectures such as VGGNet, Inception-V3, and ResNet, which we have trained using millions of labeled images. Next, we apply multiple pre-processing techniques such as variable background removal, face removal, skin removal, and image quality enhancement, and extract image signatures via deep learning and machine learning-based algorithms to uniquely identify products across billions of indexed products.

Finally, we efficiently distribute billions of images across multiple stores for fast access, and to facilitate searches at a massive scale (in a matter of milliseconds, without the slightest compromise on accuracy) using our image matching engine.

3. Human Intelligence in the Loop

In scenarios where the confidence scores of the machine-driven matches are low, we have a team of Quality Assurance (QA) specialists who verify the output.

This team does three things:
- Find out why the confidence score is low
- Confirm the right product matches
- Figure out a way to encode this knowledge into a rule and feed it back to the algorithm
In this way, we’ve built a self-improving feedback loop which, by its very nature, performs better over time. This system has accumulated knowledge over the 8 years of our operations, which is going to be hard for anyone to replicate. Essentially, this process enables us to match products at massive scale quickly and at very high levels of accuracy (usually over 95%).
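
The loop described above can be sketched as confidence-based routing plus a list of QA-authored rules that are consulted before anything is sent for manual review. Everything here is a hypothetical illustration: the `CandidateMatch` fields, the 0.9 threshold, and the representation of a rule as a function returning True, False, or None are assumptions, not a description of DataWeave's system.

```python
from dataclasses import dataclass


@dataclass
class CandidateMatch:
    source_id: str
    target_id: str
    confidence: float  # model confidence in the range 0.0-1.0


def apply_rules(match, rules):
    """Return the first verdict (True/False) given by a QA rule, else None."""
    for rule in rules:
        verdict = rule(match)
        if verdict is not None:
            return verdict
    return None


def route_matches(matches, rules, threshold=0.9):
    """Split matches into accepted, rejected, and needs-human-review lists."""
    accepted, rejected, review_queue = [], [], []
    for match in matches:
        if match.confidence >= threshold:
            accepted.append(match)
            continue
        verdict = apply_rules(match, rules)
        if verdict is True:
            accepted.append(match)
        elif verdict is False:
            rejected.append(match)
        else:
            review_queue.append(match)
    return accepted, rejected, review_queue
```

Each time the QA team resolves an item from the review queue and encodes the reason as a new rule, fewer similar cases need manual attention on the next run.
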

4. Actionable Insights Via Data Visualization

Once the matching process is completed, the prices are aggregated at any frequency, enabling retailers to optimize their prices on an ongoing basis. Pricing insights are typically consumed via our SaaS-based web-portal, which consists of dashboards, reports, and visualizations.

Alternatively, we can integrate with internal analytics platforms through APIs or generate and deliver spreadsheet reports on a regular basis, depending on the preferences of our customers.

To summarize

The benefits of our solution are many. Detailed insights into price improvement opportunities, generated in a timely manner, empower retailers to significantly enhance their competitive positioning across categories, product types, and brands, as well as their ability to influence price perception among consumers. These insights, when leveraged at a higher granularity over the long term, can help maximize revenue through price optimization at a large scale.

Our solution also helps drive process-based as well as operational optimizations for retailers. Such modifications help them better align themselves to effectively adopt a data-driven approach to pricing, in turn helping them achieve much smarter retail operations across the board.

All of this wouldn’t be possible if the product matching process, inherent to this system, was unreliable, expensive, or time-consuming.

If you would like to learn more about DataWeave’s proprietary product matching platform and the benefits it offers to eCommerce businesses and brands, talk to us now!
December 29, 2023
How Artificial Intelligence is giving the Indian Beauty Industry a Facelift

With the help of artificial intelligence and machine learning, beauty and cosmetics companies are exploring new possibilities. According to a report by Avendus, the global beauty and personal care market is expected to touch US$725 billion by 2025, and the young Indian market is expected to grow to $28 billion by then. This segment is a space of opportunity, and today there are more than 80 Indian brands in this domain.

D2C beauty brand logos

While technology in this space plays a very important role, Artificial Intelligence (AI) in particular is giving the beauty industry a makeover. This is because AI can create an impact at all stages of the beauty value chain, from research & development to supply chain management to product selection, marketing, and more! Echoing this thought, Chaitanya Nallan, CEO & Co-Founder, SkinKraft Laboratories, says, “As a digital-first brand, we sell across multiple e-commerce platforms as well as through our own website. Thus, it is very important that we track and maintain inventory across all channels in real-time to avoid stock-outs and loss of sales. We use AI for this. We have built an in-house data tracking dashboard that pulls in inventory information from all warehouses and maps them against sales to give us an accurate estimate of days of inventory across all SKUs and across all platforms. This information directly feeds into our procurement dashboard and also helps the marketing team to create the right sales strategy.”

Stock availability is crucial to driving sales. If you need help tracking your online inventory – DataWeave can help give you a near real-time view of your product stock status across marketplaces.

With AI being a powerful technology wand, here is how it can drive the future of beauty brands within the D2C segment in India.

Making Virtual Product trials a reality

Virtual Product Trial

Augmented Reality (AR) is a prevalent term, and many companies already use it every day. Most commonly, the Snapchat and Instagram filters we use are all powered by AR. In a similar vein, virtual images can be laid over actual images in real time using AI. Building on this concept, beauty brands are bringing to the fore AR-powered ‘virtual mirrors’ that let consumers try on cosmetic products in real time. ModiFace by L’Oréal, which pioneered AR-powered makeup try-ons in the market, is a good example of such virtual mirrors. These virtual mirrors use AI algorithms to detect the user’s face through a camera and map it using focal points. Then, using AR, images of makeup are adjusted according to the mapping obtained and overlaid on the facial features, giving consumers a virtual feel of what they’d look like wearing the product.

Virtual Try-on

More recently, Indian brand Lakme has made ‘virtual try-on’ possible with a smart mirror on its official website that lets customers see their reflection, try on different shades, and customize those shades to their preferences. Until a few years ago, shade matching was an entirely in-store experience: customers visiting a local cosmetics store would choose and match the shade of a compact, eye shadow, or lipstick against their true skin tone. Today, AI lets you narrow down products using a virtual shade card and see them against your skin in real time.

Make it Truly Personalised

Every customer is unique, and one size does not fit all. Everyone has a personalized beauty regime they follow & understanding this could be the key to success for beauty brands. For this reason, the future of beauty lies in harnessing AI and AR solutions to tailor the beauty shopping experience to match the needs of the individual consumer. This not only enhances digital engagement but also increases purchasing confidence which in turn helps brands drive conversion and brand loyalty.

Pre-pandemic, offline beauty advisors played a consultative role when customers were making purchase decisions. A lot of this has moved online – take for instance Olay. It launched an online “Skin Advisor” app based on a deep-learning algorithm that analyses a consumer’s skin using a simple selfie! Armed with information on their skin type, customers can make an informed, personalized purchase that’s right for their specific skin type.

Skin Advisor App

Understanding customer preferences and using data from their past purchases also help with personalized marketing in a big way. “Data-driven personalization gives brands insight into what their customers are interested in. We integrate this data into our marketing campaigns and deliver specific, personalized, and relevant content. This way, we make sure to target the right audience with the right messaging. This, in turn, helps us increase engagement and retain customers. Moreover, this combined data, allows us to get repeat sales through upselling and cross-selling. Further, knowing customers beyond just simple demographics helps us improve our targeting and helps us predict future behaviour. We’d like to know, for instance, if a customer clicked on our advertisement, liked, or commented on our social media product displays, signed up to our email list, etc. These analytics reveal a customer’s interest. Combine it with demographics – and you get a sense of what the customer is interested in,” Dhruv Madhok, Co-Founder, ARATA highlights.

Boost Product Development

Social listening

AI algorithms can be used to study and analyse customer feedback. The algorithm works towards interpreting customer comments, reviews, and feedback on a brand’s website, social media channels, and other online platforms. Artificial Intelligence can also decode and analyse questionnaires and feedback forms that the customers may have responded to online or offline.

The beauty and personal care industry is largely driven by usage and customer preferences, so gauging how customers feel about key products can help businesses create & develop products that customers will most likely prefer to buy. For instance, reputed beauty brand Avon recently mentioned that it developed the True 5-in-1 Lash Genius Mascara based on actual consumer feedback! They used machine learning & artificial intelligence to read, filter, process & rank thousands of online consumer comments to determine the top features consumers crave in a mascara. Using this customer-gathered intelligence, they developed a unique product that consumers were “asking for”!

True 5-in-1 Lash Genius Mascara by beauty brand Avon

Need help listening to what your consumers are saying about your brand online? Read more about DataWeave’s AI Powered Sentiment Analysis solution.

More and more brands are listening to customer responses closely to give way to new products, bring in tweaks to their existing basket, and innovate further. “Our ORM team is leading the knowledge accumulation as far as social listening is concerned. They are not just responsible for responding to customer queries, they are also instrumental in highlighting key insights based on user behaviour being observed,” Chaitanya of SkinKraft Laboratories further asserts.

Bombay Shaving Company too with its data-centric culture leverages customer responses for decision making & product development. “In-home personal care and hygiene exploded during the pandemic. We used data analytics to explore different dimensions of in-home experience-driven needs (new usage occasions, need for convenience and DIY, etc.). We listened to our customers & were able to introduce our women’s brand, with innovative hair removal products in a big way during this period. Which today contributes to a significant percentage of our business,” Shantanu Deshpande, Founder & CEO, Bombay Shaving Company mentions.

Given the scope and scale of the beauty and personal care industry, which is largely usage-driven, Artificial Intelligence with its diverse potential can bring about a paradigm shift. AI can help not only with virtual trials, personalization, and listening to customers’ feedback, but also with monitoring a brand’s Digital Shelf. Brands can amplify their online sales by tracking Digital Shelf KPIs like share of search & product visibility, pricing & discounting, product content, availability & assortment. Reach out to our Digital Shelf experts to learn more.

November 17, 2021
AI-Driven Mapping of Retail Taxonomies - Part 2
Mapping product taxonomies using Deep Learning

In Part 1 we discussed the importance of Retail taxonomy and the applications of mapping retail taxonomies in Assortment Analytics, building Knowledge Graph, etc. Here, we will discuss how we approached the problem of mapping retail taxonomies across sources.

We solved this problem by classifying every retail product into a standard DataWeave-defined taxonomy so that products from different websites could be brought to the same level. Once these products are at the same level, mapping taxonomies becomes straightforward.

We’ve built an AI-based solution that uses state-of-the-art algorithms to predict the correct DataWeave Taxonomy for a product from its textual information like Title, Taxonomy and Description. Our model predicts a standard 4 level (L1-L2-L3-L4) taxonomy for any given product. These Levels denote Category, Sub Category, Parent Product Type and Product Type respectively.

Approach

Conventional methods for taxonomy prediction are typically based on machine learning classification algorithms. Here, we need to provide textual data and the classifier will predict the entire taxonomy as a class.

We used the classification approach as a baseline, but found a few inherent flaws in this:
- A classification model cannot understand the semantic relation between the input text and the output hierarchy. That is, it cannot tell whether the textual input is related to the text present in the taxonomy. For a classifier, the output class is just a label-encoded value
- Since the taxonomy is a tree and each leaf node uniquely defines a path from the root to that leaf, a classification algorithm effectively outputs an existing root-to-leaf path. It cannot predict new relationships in the tree structure. Let’s say our training set has records only for “Clothing, Shoes & Jewelry > Men > Clothing > Shorts” and “Clothing, Shoes & Jewelry > Baby > Shoes > Boots”. Example:
{‘title’: “Russell Athletic Men’s Cotton Baseline Short with Pockets – Black – XXX-Large”,
‘dw_taxonomy’: “Clothing, Shoes & Jewelry > Men > Clothing > Shorts”},
{‘title’: “Surprise by Stride Rite Baby Boys Branly Faux-Leather Ankle Boots(Infant/Toddler) – Brown -”,
‘dw_taxonomy’: “Clothing, Shoes & Jewelry > Baby > Shoes > Boots”}

Now, if a product with the title “Burt’s Bees Baby Baby Boys’ Terry Short” comes in for prediction, the classifier will never be able to predict the correct taxonomy, even though it has seen data points for both Shorts and Baby.
- E-commerce product taxonomy has a very long tail, i.e. there’s a huge imbalance in the number of data points per taxonomy. Classification algorithms do not perform well on very long-tail problems.
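The limitation around unseen paths can be made concrete with a few lines of Python. This is only an illustrative sketch: the two training paths are the ones from the example above, while the variable names and the idea of treating each taxonomy node as one output token are assumptions made for the illustration.

```python
# Illustrative only: the two training paths come from the example above;
# the variable names and node-level tokenization are assumptions.
SEP = " > "

train_paths = [
    "Clothing, Shoes & Jewelry > Men > Clothing > Shorts",
    "Clothing, Shoes & Jewelry > Baby > Shoes > Boots",
]
target_path = "Clothing, Shoes & Jewelry > Baby > Clothing > Shorts"

# A flat classifier can only ever emit one of the labels it was trained on.
classifier_labels = set(train_paths)

# A model that emits one taxonomy node at a time has, as its output space,
# every combination of nodes it has seen, not just the seen paths.
node_vocab = {node for path in train_paths for node in path.split(SEP)}

classifier_can_predict = target_path in classifier_labels
decoder_can_compose = all(node in node_vocab for node in target_path.split(SEP))

print(classifier_can_predict)  # False
print(decoder_can_compose)     # True
```

The check only shows that the correct path is reachable for a node-by-node generator; whether a trained model actually produces it depends on what it has learned.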

Encoder-Decoder with Attention for Taxonomy Classification

What is Encoder-Decoder?

Encoder-Decoder is a classical Deep Learning architecture where there are two Deep Neural Nets, an Encoder and a Decoder linked with each other to generate desired outputs.

The objective of an Encoder is to encode the required information from the input data and store it in a feature vector. In case of text input, the encoder is mostly an RNN or Transformer based architecture and for image input, it is mostly a CNN-based architecture. Once the encoded feature vector is created, the Decoder uses it to produce the required output. The Encoder and Decoder can be interfaced by another layer which is called Attention. The Role of Attention mechanism is to train the model to selectively focus on useful parts of the input data and hence, learn the alignment between them. This helps the model to cope effectively with long input sentences (when dealing with text) or complex portions of images (when input is an image).
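To give a feel for what the Attention layer computes, here is a minimal sketch in plain Python. It assumes simple dot-product scoring, which may differ from the scoring function used in our production model, and the tiny two-dimensional vectors are made up purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, then build a weighted average (the context vector)."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Made-up encoder outputs, one vector per input token (assumption).
tokens = ["lavender", "fields", "bar", "soap"]
encoder_states = [[0.1, 0.0], [0.0, 0.1], [0.5, 0.5], [1.0, 1.0]]
decoder_state = [1.0, 1.0]

weights, context = attend(decoder_state, encoder_states)
most_attended = tokens[weights.index(max(weights))]
print(most_attended)  # soap
```

The decoder receives a fresh context vector like this at every output step, which is what lets it focus on different input words for different parts of the output.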

Instead of classification-based approaches, we use an Encoder-Decoder architecture and map the problem of taxonomy classification to the task of machine translation (MT), also known as Seq2Seq. An MT system takes text in one language as input and outputs its translation as a sequence of words in another language. In our case, the input maps to the textual description of a product, and the output maps to the sequence of categories and sub-categories in our taxonomy (e.g., Clothing, Shoes & Jewelry > Baby > Shoes > Boots). By framing taxonomy classification as an MT problem, we overcome many of the limitations of classical classification approaches.
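As a rough sketch of this framing, a training record can be turned into a source sequence and a target sequence as shown below. The function name, the lowercase whitespace tokenization, the `<sos>`/`<eos>` markers, and the choice of one target token per taxonomy level are all assumptions for illustration, not a description of our exact preprocessing.

```python
def to_seq2seq_pair(title, taxonomy, sep=">"):
    """Turn one product record into a (source, target) pair of token lists.

    Hypothetical preprocessing: the title is split on whitespace, and each
    taxonomy level becomes a single target token between start/end markers.
    """
    source = title.lower().split()
    levels = [level.strip() for level in taxonomy.split(sep)]
    target = ["<sos>"] + levels + ["<eos>"]
    return source, target


source, target = to_seq2seq_pair(
    "Russell Athletic Men's Cotton Baseline Short with Pockets",
    "Clothing, Shoes & Jewelry > Men > Clothing > Shorts",
)
print(source[:3])  # ['russell', 'athletic', "men's"]
print(target)
```

With data in this shape, the decoder learns to emit the taxonomy one level at a time, conditioned on the title and on the levels it has already produced.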
- This architecture has the capability to predict a taxonomy that is not even present in the training data.
  - Returning to the earlier scenario, in which a traditional classification model could not predict the taxonomy for a baby shorts title, the Encoder-Decoder model easily predicts the correct taxonomy for “Baby Boys knit terry shorts – cat & jack gray 12 m” as “Clothing, Shoes & Jewelry > Baby > Clothing > Shorts”
- We achieved a much higher accuracy because the model understands the semantic relationship between the input and output text, and attends to the most relevant parts of the input when generating the output
Fig. Attention visualization for product title “South of France lavender fields Bar Soap”. The image shows that the attention weight of the word “soap” is very high when predicting the output at different time-steps.

We used pre-trained fastText word embeddings to vectorize the textual input and passed the vectors to a GRU-RNN-based encoder, which processes the input sequentially and generates the final encoded vector. The Decoder, which is also a GRU-RNN, takes this encoded input and generates the output sequentially. Along with the encoded vector, an attention vector is passed to the Decoder for the output at every time-step.

We trained both the Classification model (Baseline) and the Encoder-Decoder model for the Fashion category and the Beauty & Personal Care category.

For Fashion, we trained the model with 170,000 data points and validated it on a 30k set. For Beauty Category, we trained the model on 88k data points and validated it on a 20k set. We were able to achieve 92% Seq2Seq accuracy in 1,240 classes for the Fashion category and 96% Seq2Seq accuracy in 343 classes for the Beauty Category, using the Encoder-Decoder approach.
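One plausible way to compute a sequence-level accuracy like the one quoted above is to count a prediction as correct only when the entire predicted path matches the expected path. The sketch below makes that assumption explicit. The function name and the four sample paths are hypothetical and are not taken from our validation sets.

```python
def exact_match_accuracy(predicted, expected):
    """Share of records whose full predicted path equals the expected path.

    Assumption: a prediction with even one wrong level counts as a miss.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical sample, for illustration only.
expected = [
    "Beauty > Skin Care > Body > Soap",
    "Beauty > Makeup > Eyes > Mascara",
    "Beauty > Makeup > Lips > Lipstick",
    "Beauty > Hair Care > Styling > Gel",
]
predicted = [
    "Beauty > Skin Care > Body > Soap",
    "Beauty > Makeup > Eyes > Mascara",
    "Beauty > Makeup > Lips > Lip Gloss",  # last level wrong: counts as a miss
    "Beauty > Hair Care > Styling > Gel",
]

accuracy = exact_match_accuracy(predicted, expected)
print(accuracy)  # 0.75
```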

Summary and the Way Forward

Since we moved to this approach, we have seen drastic improvements in the accuracy of our Assortment Intelligence accounts. But the road doesn’t end here. There are several challenges still to be tackled. We’re planning to make this process language-agnostic by using cross-lingual embeddings, to merge models from different categories, and to complement the text-based model with product images via a Multi-Modal approach.

References

Don’t Classify, Translate: Multi-Level E-Commerce Product Categorization Via Machine Translation by Maggie Yundi Li, Stanley Kok and Liling Tan

SIGIR eCom’18 Data Challenge is organized by Rakuten Institute of Technology Boston (RIT-Boston)

Massive Exploration of Neural Machine Translation Architectures by Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc Le
January 13, 2021
Mapping eCommerce Product Taxonomy with AI Pt. 1
Product Taxonomy and its importance in retail

Every product on a retail website is categorized to denote where it belongs in the overall catalog. Generally, these categorizations follow a hierarchy that places the product under a Category, Subcategory and Product Type (e.g., Clothing, Shoes & Jewelry > Men > Clothing > Shirts). We call this hierarchical eCommerce product categorization the Product Taxonomy. Categorizing products logically, in a way a shopper would find intuitive, helps shoppers navigate when browsing an e-commerce website.

In addition, with good category organization, a product lends itself to better searchability (for search engines) on e-commerce websites. Search engines work by looking up query terms in an index that points to the products containing those terms. Matches in different fields are weighted differently when ranking by relevance.

For instance, a term that matches a word in the title indicates greater relevance than one that matches the description. Additionally, terms that are exclusive to certain products signal greater selectivity and hence contribute more to ranking. In light of this, the choice of words in the fields indicating a product’s category affects the relevance of search results for a user query. Well-chosen category terms improve discoverability, and as relevant results show up, the user experience improves in turn. A good product taxonomy contributes to increased sales by helping shoppers find relevant products while browsing or searching.
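The ranking behaviour described above can be sketched with a toy inverted index. Everything here is illustrative: the three products, the field weights, and the logarithmic selectivity formula are assumptions, and real search engines use considerably more elaborate scoring.

```python
import math
from collections import defaultdict

# Hypothetical field weights: a title match counts more than a description match.
FIELD_WEIGHTS = {"title": 3.0, "category": 2.0, "description": 1.0}

# Made-up catalog for illustration.
products = {
    "p1": {"title": "Lavender Scented Candle",
           "category": "Home Decor > Candles",
           "description": "Soy wax candle for the home"},
    "p2": {"title": "Ceramic Candle Holder",
           "category": "Home Decor > Holders",
           "description": "Holder for one candle"},
    "p3": {"title": "Lavender Bar Soap",
           "category": "Beauty > Soap",
           "description": "Gentle soap for daily use"},
}


def tokenize(text):
    return text.lower().replace(">", " ").split()


# Inverted index: term -> {product id -> weight of the best matching field}.
index = defaultdict(dict)
for pid, fields in products.items():
    for field, text in fields.items():
        for term in tokenize(text):
            best = index[term].get(pid, 0.0)
            index[term][pid] = max(best, FIELD_WEIGHTS[field])


def search(query):
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms found in fewer products are more selective and count more.
        selectivity = math.log(1 + len(products) / len(postings))
        for pid, weight in postings.items():
            scores[pid] += weight * selectivity
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


results = search("lavender candle")
print(results[0])  # p1
```

Only `p1` matches both query terms in its title, so it ranks first; the words chosen for the `category` field feed the same index and influence ranking in the same way.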

Retail websites organize products into a taxonomy which they deem intuitive for their users, and fits the organization of their business units. Different retail websites could thus have taxonomies varying significantly from each other. Since we deal with millions of products across hundreds of websites on a daily basis, we often have to work with various taxonomies for the same product coming from different websites.

We are required to align these to a common standard taxonomy for our analyses. Standard taxonomies like Global Product Classification (GPC) taxonomies and Google Product taxonomies offer a standard way of representing a product. However, none of these taxonomies are complete and generic. Hence, we at DataWeave have come up with our own Standard Taxonomies for each category in e-commerce, which are generic enough to represent products on websites across different geographies.

Having a standard taxonomy for each retail product is important for our Data Orchestration pipeline. A Standard Taxonomy helps in enriching the DataWeave Retail Knowledge Graph at scale.

DataWeave’s Retail Knowledge Graph

The information about products on most of the retail websites is unstructured and broken. We process this unstructured data, derive structured information from it and store it in a connected format in our Knowledge Graph. The Knowledge Graph is used in downstream applications like Attribute Tagging, Content Analysis, etc. The Knowledge Graph follows a standard hierarchy of 4 levels (L1 > L2 > L3 > L4) for all the retail products.

Mapping eCommerce retail taxonomies is not only a requirement for the Knowledge Graph, but has some direct business applications as well:

Assortment Analytics
- Mapping competitors’ products to their own taxonomies helps retailers understand the exact gaps in their assortment, regardless of how competitors categorize their products
- Let’s say a retailer wants to know the assortment of a product type, Scented Candles, in a competitor’s catalog. The retailer might have categorized it as Home & Kitchen > Home Decor > Scented Candle, while the same product type could be categorized as Fragrance > Home > Candles on a competitor’s website. An efficient and scalable mechanism for mapping product taxonomies gives retailers the accurate assortment analytics they look for. Example of candles categorized under two different source taxonomies:
Health & Household > Health Care > Alternative Medicine > Aromatherapy > Candles

Fragrance > Candles & Home Scents > Candles
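Once each source path is mapped to a common standard path, an assortment comparison reduces to simple counting. The sketch below uses the two source paths from the Scented Candles scenario above. The standard path, the SKU identifiers, and the product counts are hypothetical.

```python
from collections import Counter

# Hypothetical standard path; the two source paths come from the text above.
STANDARD = "Home > Decor > Candles > Scented Candles"

TO_STANDARD = {
    ("retailer", "Home & Kitchen > Home Decor > Scented Candle"): STANDARD,
    ("competitor", "Fragrance > Home > Candles"): STANDARD,
}

# Made-up catalog rows: (source, source taxonomy path, SKU).
catalog = [
    ("retailer", "Home & Kitchen > Home Decor > Scented Candle", "sku-1"),
    ("competitor", "Fragrance > Home > Candles", "sku-a"),
    ("competitor", "Fragrance > Home > Candles", "sku-b"),
    ("competitor", "Fragrance > Home > Candles", "sku-c"),
]

# Count products per (standard path, source) after mapping.
counts = Counter()
for source, path, _sku in catalog:
    counts[(TO_STANDARD[(source, path)], source)] += 1

gap = counts[(STANDARD, "competitor")] - counts[(STANDARD, "retailer")]
print(gap)  # 2
```

Without the mapping step, the two catalogs would appear to have no product type in common, and the gap could not be measured at all.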

Automated Catalog Suggestion

It is also used in Catalog Suggestion as a Service, where for any product we suggest the appropriate taxonomy it should follow on the website for a better browsing experience.

Stay tuned for Part 2 to learn how we are solving the problem of mapping various retail taxonomies.

January 6, 2021
Compete Profitably in Retail: Leveraging AI-Powered Competitive Intelligence at Massive Scale

AI is everywhere. Any retailer worth their salt knows that in today’s hyper-competitive environment, you can’t win just by fighting hard – you have to do it by fighting smart. The solution? Retailers are turning to AI in droves.

The problem is that many organizations regard AI as a black box of sorts – where you can throw all your data (the digital era’s blessing that feels like a curse) in at one end and have miraculously meaningful output appearing out the other. The reality of how AI works, however, is a lot more complex. It takes a lot of work to make AI work for you – and then to derive value out of it.

Image Source: https://xkcd.com/1838

Following the advent of the digital era, businesses across industries, particularly retail, were left grappling with massive amounts of internal data. To make things worse, this data was unstructured and siloed, making it difficult to process effectively. Yet, businesses learned to leverage simple analytics to extract relevant data and insights and make smarter decisions.

But just as that happened, the e-commerce revolution stirred things up again. As businesses of all shapes, sizes, and types moved online, they suddenly became a whole lot more vulnerable to other players’ movements than they were just about a decade ago, when buyers rarely visited more than one store before they made a purchase. In other words, retailers are now operating in entire ecosystems – with consumers evaluating a number of retailers before making a purchase, and a disproportionate number of players vying for the same consumer mindshare and share of wallet.

Thus, external data from the web – the largest source of data known to man at present – is becoming critical to businesses’ ability to compete profitably in the market.

Competing profitably in the digital era: Can AI help?

As organizations across industries and geographies increasingly realized that their business decisions were affected by what’s happening around them (such as competitors’ pricing and merchandise decisions), they started shifting away from their obsession with internal data, and began to look for ways to gather external data, integrate it with their internal data, and process it all in its entirety to derive wholesome, meaningful insights.

Simply put, harnessing external data consistently and on a large scale is the only way for businesses to gain a sustainable competitive advantage in the retail market. And the only way to practically accomplish that is with the help of AI. Many global giants are already doing this – they’re analyzing loads of external data every minute to make smarter decisions.

That said, what you need to know is that all this data, while publicly available and therefore accessible, is massive, unstructured, noisy, scattered, dynamic, and incomplete. There’s no algorithm in the world that can start working on it overnight to churn out valuable insights. AI can only be effective if enormous amounts of training data are constantly fed back into it, coaxing it to get better and more astute each time. However, given the scarcity of readily available training datasets, limited and unreliable access to domain-specific data, and the inconsistent nature of the data itself, a majority of AI initiatives have ended up in a “garbage in, garbage out” loop that they can’t break out of.

What you need is the perfect storm

At DataWeave, we understand the challenge of blindly dealing with data at such a daunting scale. We get that what you need is a practical way to apply AI to the abundant web data out there and generate specific, relevant, and actionable insights that enable you to make the right decisions at the right time. That’s why we’ve developed a system that runs on a human-aided-machine-intelligence driven virtuous loop, ensuring better, sharper outcomes each time.

Our technology platform includes four modules:

1. Data aggregation: Here, we capture public web data at scale – whatever format, size, or shape it’s in – by deploying a variety of techniques.

2. AI-driven analytics: Since the gathered data is extremely raw, it’s cleaned, curated, and normalized to remove the noise and prepare it for the AI layer, which then analyzes the data and generates insights.

3. Human-supervised feedback: Though AI is getting smarter with time, we see that it’s still far from human cognitive capabilities – so we’ve introduced a human in the loop to validate the AI-generated insights, and use this as training data that gets fed back to the AI layer. Essentially, we use human intelligence to make AI smarter.

4. Data-driven decision-making: Once the data has been analyzed and the insights generated, they can either be used as is to drive decision-making, or be integrated with internal data for decision-making at a higher level.
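The human-supervised feedback module can be pictured as a small routing loop. The sketch below is a simplified illustration rather than our production logic: the confidence threshold, the record fields, and the function names are all assumptions.

```python
def route(predictions, threshold=0.8):
    """Split model outputs into auto-accepted items and items for human review.

    The 0.8 threshold is a hypothetical value chosen for the example.
    """
    accepted, needs_review = [], []
    for item in predictions:
        if item["confidence"] >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


def apply_feedback(reviewed_items, human_labels, training_set):
    """Add human-validated labels to the training data for the next model run."""
    for item in reviewed_items:
        training_set.append(
            {"text": item["text"], "label": human_labels[item["id"]]}
        )
    return training_set


# Made-up model outputs for illustration.
predictions = [
    {"id": 1, "text": "Lavender Bar Soap", "label": "Soap", "confidence": 0.97},
    {"id": 2, "text": "Terry Short", "label": "Towels", "confidence": 0.41},
]

accepted, needs_review = route(predictions)
human_labels = {2: "Shorts"}  # the reviewer corrects the low-confidence item
training_set = apply_feedback(needs_review, human_labels, [])
print(training_set)
```

Each pass through the loop grows the training set with exactly the cases the model found hardest, which is where additional labels tend to help most.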

With intelligent, data-backed decision-making capabilities, you can outperform your competitors

Understandably, pricing is one of the most popular applications of data analytics in retail. For instance, a leading, US-based online furniture retailer approached us with the mission-critical challenge of pricing products just right to maximize sell-through rates as well as gross margin in a cost-effective and sustainable manner. We matched about 2.5 million SKUs across 75 competitor websites using AI and captured pricing, discounts, and stock status data every day. As a result, we were able to drive an average increase of up to 30% in the sales of the products tracked, and up to a 3x increase in their gross margin.

DataWeave’s powerful AI-driven platform is essentially an engine that can help you aggregate and process external data at scale and in near-real time to manage unavoidably high competition and margin pressures by enabling much sharper business decisions than before. The potential applications for the resulting insights are diverse, ranging from pricing and merchandise optimization to determination of customer perception, brand governance, and business performance analysis.

If you’d like to learn more about our unique approach to AI-driven competitive intelligence in retail, reach out to us for a demo today!

June 13, 2019
Evaluating the Influence of Learning Models

Natt Fry, a renowned thought leader in the world of retail and analytics, recently published an article expounding the value and potential of learning models in influencing business decision-making across industries over the next few years.

He quotes a Wall Street Journal article (paywall) published by Steven A. Cohen and Matthew W. Granade who claim that, “while software ate the world the past 7 years, learning models will ‘eat the world’ in the next 7 years.”

The article defines a learning model as a “decision framework in which the logic is derived by algorithm from data. Once created, a model can learn from its successes and failures with speed and sophistication that humans usually cannot match.”

Narrowing this down to the world of retail, Natt states, “if we believe that learning models are the future, then retailers will need to rapidly transform from human-learning models to automated-learning models.”

This, of course, comes with several challenges, one of which is the scarcity of easily consumable data for supervised learning algorithms to get trained on. This scarcity often results in a garbage-in-garbage-out situation and limits the ability of AI systems to improve in accuracy over time, or to generate meaningful output on a consistent basis.

Enabling Retailers to Become More Model-Driven
As a provider of Competitive Intelligence as a Service to retailers and consumer brands, DataWeave uses highly trained AI models to harness and analyze massive volumes of Web data consistently.

Far too often, we’ve seen traditional retailers rely disproportionately on internal data (such as POS data, inventory data, traffic data, etc.) to inform their decision-making process. This isn’t a surprise, as internal data is readily accessible and likely to be well structured.

However, if retailers can harness external data at scale (from the Web — the largest and richest source of information, ever), and use it to generate model-driven insights, they can achieve a uniquely holistic perspective to business decision-making. Also, due simply to the sheer vastness of Web data, it serves as a never-ending source of training data for existing models.

DataWeave’s AI-based model to leverage Web data

Web data is typically massive, noisy, unstructured, and constantly changing. Therefore, at DataWeave, we’ve designed a proprietary data aggregation platform that is capable of capturing millions of data points from complex Web and mobile app environments each day.

We then apply AI/ML techniques to process the data into a form that can be easily interpreted and acted on. The human-in-the-loop is an additional layer to this stack which ensures a minimum threshold of output accuracy. Simultaneously, this approach feeds information on human-driven decisions back to the algorithm, thereby rendering it more and more accurate with time.

Businesses derive the greatest value when external model-based competitive and market insights are blended with internal data and systems to generate optimized recommendations. For example, our retail customers combine competitor pricing insights provided by our platform with their internal sales and inventory data to develop algorithmic price optimization systems that maximize revenue and margin for millions of products.
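To illustrate how external and internal data can be blended, here is a deliberately simple pricing rule. It is a hypothetical sketch, not a description of any customer’s system: the margin floor, the low-stock threshold, and the rule of matching the lowest competitor price are all assumptions.

```python
def recommend_price(cost, current_price, competitor_prices, units_in_stock,
                    min_margin=0.10, low_stock=5):
    """Suggest a price from competitor prices (external) plus cost and stock (internal).

    Hypothetical rule: match the lowest competitor price, but never go below
    cost plus a minimum margin, and never cut the price when stock is nearly out.
    """
    floor = round(cost * (1 + min_margin), 2)
    if not competitor_prices or units_in_stock <= low_stock:
        return max(current_price, floor)
    return max(min(competitor_prices), floor)


# Plenty of stock: match the lowest competitor price.
print(recommend_price(40.0, 60.0, [55.0, 52.5, 58.0], units_in_stock=120))  # 52.5
# Nearly out of stock: hold the current price.
print(recommend_price(40.0, 60.0, [55.0, 52.5, 58.0], units_in_stock=3))    # 60.0
# Competitor is below our margin floor: stop at the floor.
print(recommend_price(40.0, 60.0, [41.0], units_in_stock=120))              # 44.0
```

Real price optimization systems typically also model demand elasticity and run across millions of products, but the inputs are the same two families of data.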

This way, DataWeave enables retailers and consumer brands to utilize a unique model-based decision framework, something that will soon be fundamental (if not already) to business decision-making across industry verticals and global regions.

As AI-based technologies become more pervasive in retail, it’s only a matter of time before they’re considered merely table stakes. As summarized by Natt, “going forward, retailers will be valued on the completeness of the data they create and have access to.”

If you would like to learn more about how we use AI to empower retailers and consumer brands to compete profitably, check out our website!

Read Natt’s article in full below:

Steven A. Cohen and Matthew W. Granade published a very interesting article in the Wall Street Journal on August 19, 2018 — https://www.wsj.com/articles/models-will-run-the-world-1534716720

Their premise is that while software ate the world (Marc Andreessen essay in 2011, “Why Software is Eating the World”) the past 7 years, learning models will “eat the world” in the next 7 years.

A learning model is a decision framework in which the logic is derived by algorithm from data. Once created, a model can learn from its successes and failures with speed and sophistication that humans usually cannot match.

The authors believe a new, more powerful, business opportunity has evolved from software. It is where companies structure their business processes to put continuously learning models at their center.

Amazon, Alibaba, and Tencent are great examples of companies that widely use learning models to outperform their competitors.

The implications of a model-driven world are significant for retailers.

Incumbents can have an advantage in a model-driven world as they already have troves of data.

Going forward retailers will be valued on the completeness of the data they create and have access to.

Retailers currently rely on the experience and expertise of their people to make good decisions (what to buy, how much to buy, where to put it, etc.).

If we believe that learning models are the future then retailers will need to rapidly transform from human-learning models to automated-learning models, creating two significant challenges.

First, retailers have difficulty in finding and retaining top learning-model talent (data scientists).

Second, migrating from human-based learning models to machine-based learning models will create significant cultural and change management issues.

Overcoming these issues is possible, just as many retailers have overcome the issues presented by the digital age. The difference is that while the digital age has developed over a 20-year period, the learning-model age will develop over the next 7 years. The effort and pace of change will need to be much greater.

October 11, 2018
Recognize Product Attributes with AI-Powered Image Analytics
Anna is a fashionista and a merchandise manager at a large fast-fashion retailer. As part of her job, she regularly browses through the Web for the most popular designs and trends in contemporary fashion, so she can augment her product assortment with fresh and fast-moving products.

She spots a picture on social media of a fashion blogger sporting a mustard colored, full-sleeved, woolen coat, a yellow sweatshirt, purple polyester leggings, and a pair of pink sneakers with laces. She finds that the picture has garnered several thousand “likes” and several hundred “shares”. She also sees that a few other online fashion influencers have blogged about similar styles in coats and shoes being in vogue.

Anna thinks it’s a good idea to house a selection of similar clothing and accessories for the next few weeks, before the trend dies down.

But, she is in a bit of a pickle.

Different brands represent their catalog differently. Some have only minimalistic text-based product categorization, while others are more detailed. The ones that are detailed don’t categorize products in a way that helps her narrow down her consideration set. Product images, too, lack standardization as each brand has its own visual merchandising norms and practices.

Poring through thousands of products across hundreds of brands, looking for similar products is time-consuming and debilitating for Anna, restricting her ability to spend time on higher-value activities. Luckily, at DataWeave, we’ve come across several merchandise managers facing challenges like hers, and we can help.

AI-powered product attribute tagging in fashion

DataWeave’s AI-powered, purpose-built Fashion Tagger automatically assigns labels to attributes of fashion products at great granularity. For example, on processing the image of the blogger described earlier, our algorithm generated the following output.

Original Image Source: Rockpaperdresses.dk

Vision beyond the obvious

Training machines is hard. While modern computers can “see” as well as any human, the difference lies in their lack of ability to perceive or interpret what they see.

This can be compared to a philistine at a modern art gallery. While he or she could quite easily identify the colors and shapes in the paintings, additional instructions would be needed on how the painting can be interpreted, evaluated, and appreciated.

While machines haven’t gotten that far yet, our image analytics platform is highly advanced, capable of identifying and interpreting complex patterns and attributes in images of clothing and fashion accessories. Our machines recognize various fashion attributes by processing both image- and associated text-based information available for a product.

Here’s how it’s done:
- With a single glance at its surroundings, the human eye can identify and localize each object within its field of view. We train our machines to mimic this capability using neural-network-based object detection and segmentation. As a result, our system accounts for varied backgrounds, human poses, skin exposure levels, and more, which are quite common for images in fashion retail.
- The image is then converted to 0s and 1s, and fed into our home-brewed convolutional neural network trained on millions of images with several variations. These images were acquired from diverse sources on the Web, such as user-generated content (UGC), social media, fashion shows, and hundreds of eCommerce websites around the world.
- If present, text-based information associated with images, such as the product title, metadata, and product description, is used to enhance the accuracy of the output and to leverage non-visual cues for the product, like the type of fabric. Natural-language processing, normalization, and several other text-processing techniques are applied here. In these scenarios, the text and image pipelines are merged based on assigned weightages and priorities to generate the final list of product attributes.
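The last step, merging the text and image pipelines by weightage and priority, can be pictured with a minimal sketch. The weights, the priority table, the function name `merge_attributes`, and the sample confidences are all assumptions made for illustration; the post does not describe the actual values or logic used.

```python
from collections import defaultdict

# Hypothetical pipeline weights; the real weightages are not published.
PIPELINE_WEIGHTS = {"image": 0.6, "text": 0.4}
# Hypothetical priorities: attributes for which one pipeline is trusted outright.
PRIORITY = {"fabric": "text"}


def merge_attributes(image_preds, text_preds):
    """Merge {attribute: {label: confidence}} dicts into {attribute: label}."""
    sources = {"image": image_preds, "text": text_preds}
    merged = {}
    for attr in set(image_preds) | set(text_preds):
        preferred = PRIORITY.get(attr)
        if preferred and attr in sources[preferred]:
            # A prioritized pipeline overrides the other one for this attribute.
            scores = dict(sources[preferred][attr])
        else:
            # Otherwise, sum the weighted confidences from both pipelines.
            scores = defaultdict(float)
            for name, preds in sources.items():
                for label, conf in preds.get(attr, {}).items():
                    scores[label] += PIPELINE_WEIGHTS[name] * conf
        merged[attr] = max(scores, key=scores.get)
    return merged


image_preds = {
    "color": {"mustard": 0.7, "yellow": 0.3},
    "fabric": {"cotton": 0.55, "wool": 0.45},
    "sleeve": {"full": 0.8},
}
text_preds = {
    "color": {"yellow": 0.6, "mustard": 0.4},
    "fabric": {"wool": 0.9},
}
result = merge_attributes(image_preds, text_preds)
```

In this toy input the image pipeline leans toward cotton, but the text cue wins for fabric because fabric is a non-visual attribute in the assumed priority table.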
The Technology Pipeline

Our Fashion Tagger can process most clothing types in fashion retail, including casual wear, sportswear, footwear, bags, sunglasses and other accessories. The complete catalog of clothing types we support is indicated in the image below.

Product Types Processed and Classified by DataWeave

One product, several solutions

Across the globe, our customers in fast-fashion wield our technology every day to compare their product assortment against their competitors. Our SaaS-based portal provides highly granular product-attribute-wise comparisons and tracking of competitors’ products, enabling our customers to spot assortment gaps of in-demand and trending products, as well as to better capitalize on the strengths in their assortment.

Some other popular use cases include:
- Similar product recommendations: This intelligent product recommendation engine can help retailers identify and recommend to their shoppers, products with similar attributes to the one they’re looking at, which can potentially help drive higher sales. For example, they can recommend alternatives to out-of-stock products, so customers don’t bounce off their website easily.
- Ensemble recommendations: Our proprietary machine-learning based algorithms analyze images on credible fashion blogs and websites to learn the trendiest combinations of products worn by online influencers, helping retailers recommend complementary products and drive more value. Combining this with insights on customer behavior can generate personalized ensemble recommendations. It’s almost like providing a personal stylist for shoppers!
- Diverse styling options: The same outfit can often be worn in several different ways, and shoppers typically like to experiment with unconventional modes of styling. Our technology helps retailers create “lookbooks” that provide real world examples of multiple ways a particular piece of clothing can be worn, adding another layer to the customer’s shopping experience.
- Search by image: Shoppers can search for products similar to ones worn by celebrities and other influencers through an option to “Search by Image”, which is possible due to our technology’s ability to automatically identify product attributes and find similar matches.
- Fast-fashion trend analysis: Retailers can study emerging trends in fashion and host them in their product assortment before anyone else.
The devil is in the details

DataWeave’s Fashion Tagger guarantees very high levels of accuracy. Our unique human-in-the-loop approach combines the power of machine-learning-based algorithms with human intelligence to accurately differentiate between similar product attributes, such as between boat, scoop and round necks in T-shirts.

This system is a closed feedback loop, in which a large amount of ground-truth (manually verified) data is generated by in-house teams and used to train the algorithms. In this way, the machine-generated output gets more accurate with time, which goes a long way toward our ability to swiftly deliver insights at massive scale.

In summary, DataWeave’s Image Analytics platform is driven by: an enormous amount of training data + algorithms + infrastructure + humans-in-the-loop.

If you’re intrigued by DataWeave’s technology and wish to know more about how we help fashion retailers compete more effectively, check us out on our website!
April 16, 2018
Alibaba’s Singles Day Sale: Decoding the World’s Biggest Shopping Festival

$17.5 million every 60 seconds.

That’s the volume of sales Alibaba generated on 11.11, or Singles Day. This mammoth event, decisively the world’s biggest shopping day, dwarfed last year’s Black Friday and Cyber Monday combined.

This year, the anticipation around Singles Day was all-pervasive, and the sale was widely expected to break all records, as more than 60,000 global brands queued up to participate. By the end of the day, sales topped $25.3 billion, having shattered last year’s record by lunchtime.

It’s an astonishing feat of retailing, eight years in the making. When Alibaba first started 11.11 in 2009, they set out strategically to try and convert shopping into a sport, infusing it with a strong element of entertainment. “Retail as entertainment” is a unique central theme for 11.11 and this year Alibaba leveraged its media and eCommerce platforms in concert to create an entirely immersive experience for viewers and consumers alike.

From a technology perspective, the “See Now, Buy Now” fashion show and the pre-sale gala seamlessly merged offline and online shopping so viewers tuning in to both shows can watch them while simultaneously shopping via their phones or saving the items for a later date.

The eCommerce giant also collaborated with roughly 50 shopping malls in China to set up pop-up shops, eventually extending its shopper reach to span 12 cities.

Of course, attractive discounts on its eCommerce platforms were on offer as well.

Deciphering Taobao.com

At DataWeave, we have been analyzing the major sale events of several eCommerce companies from around the world. During Singles Day, when we trained our data aggregation and analysis platform on Taobao.com (Alibaba’s B2C eCommerce arm), and its competitors JD.com and Amazon.ch, our technology platform and analysts had to overcome two primary challenges:

1. All text on these websites was in Chinese

All information (names of products, brands, and categories) was displayed in Chinese. However, our technology platform is truly language agnostic, capable of processing data drawn from websites featuring all international languages. Several of our customers have benefited strategically from this unique capability.

2. Discounted prices were embedded in images on Taobao.com

While it’s normal for sale prices to be represented in text on a website (relatively easy to capture by our advanced data aggregation system), Taobao chose to display these prices as part of its product images — like the one shown in the adjacent image.

However, our technology stack comprises an AI-powered, state-of-the-art image processing and analytics platform, which quickly extracted the selling prices embedded in the images at very high accuracy.

We analyzed the Top 150 ranked products of over 20 product types, spread across Electronics, Men’s Fashion, and Women’s Fashion, representing over 25,000 products in total, each day, between 8.11 and 12.11.

In the following infographic, we analyze the absolute discounts offered by Taobao on 11.11, compared to 8.11 (based on pricing information extracted from the product images using our image analytics platform), together with an insight into the level of premium products included in their mix for each product type, between the two days of comparison.

Unexpectedly, we noticed that each day, ALL the products in the Top 150 ranks differed from the previous day, a rare insight into Taobao’s distinctive assortment strategy.

Counter-intuitively, absolute discounts across all categories were considerably higher on 8.11 than on 11.11, even though marginally fewer products were discounted. The number of discounted Electronics products on sale rose on 11.11 compared to 8.11 (124 versus 102 respectively), while there was little movement in the number of discounted Men’s Fashion (55 versus 57) and Women’s Fashion (35 versus 27) products.

Taobao targeted the mobile phone and tablets segment with aggressive discounts (21.0 percent and 18.2 percent respectively), compared to the average Electronics discount level of 7.7 percent.

Interestingly, the average selling price drifted up for Electronics on 11.11 compared to 8.11 (¥4040 versus ¥3330). Men’s Fashion dropped to ¥584 from ¥604, while prices for Women’s Fashion were stable.

It’s clear that even with all the fanfare, Singles Day didn’t produce the level of discounts that one might have expected, indicating that purchases were driven as much by the hype surrounding the event as anything else.

How did Alibaba’s Competitors Fare?

While Taobao was widely expected to offer discounts during Alibaba’s major sale event, we looked at how its competitors JD.com and Amazon.ch reacted to Taobao’s strategy.

As over 80 percent of top-ranked products were consistently present in the Top 150 ranks of each product type on these websites, we analyzed the additional discounts offered during 11.11, compared to prices on 8.11.

Broadly speaking, both Amazon.ch and JD.com appear to have elected not to go head to head with Taobao on specific segments. JD.com’s discount strategy was spearheaded by Sports Shoes (22.1 percent) and Refrigerators (14.8 percent) while Amazon.ch featured TVs (15.3 percent) and Mobile Phones (10.2 percent).

The average additional discount offered by Amazon.ch and JD.com in Electronics (8.4 percent) was slightly above Taobao’s overall absolute discount (7.7 percent). TCL was aggressive with its pricing on both websites, offering discounts of over 20 percent on almost its entire assortment.

Surprisingly, JD.com far outnumbered Amazon.ch in additionally discounted products across all three featured categories, although this may be partially explained by Amazon.ch adopting a significantly more premium price position in both Men’s and Women’s Fashion compared to JD.com, while remaining roughly on par in Electronics.

Jack Ma’s “New Retail”

Interestingly, JD.com wasn’t far behind Taobao in terms of sales, clocking up $20 billion in revenue, and sparking an interesting public debate between the two eCommerce giants extolling their respective performances.

Singles Day is one of the pillars of Jack Ma’s vision of a “New Retail” represented by the merging of entertainment and consumption. Ma’s vision sees the boundary between offline and online commerce disappearing as the focus shifts dramatically to fulfilling the personalized needs of individual customers.

Hence, Alibaba’s Global Shopping Festival should be understood as not just a one-day event that produces massive revenue, but as a demonstrable tour de force of Alibaba’s vision for the future of retail. One thing is certain — as competition heats up between Chinese retailers, we can be prepared for another Singles Day shoot-out sale next year that one-ups the staggering sales volumes this year.

If you’re intrigued by DataWeave’s technology, check out our website to learn more about how we provide Competitive Intelligence as a Service to retailers and consumer brands globally.

November 24, 2017
Video: Using Product Images to Achieve Over 90% Accuracy in Matching E-Commerce Products
Matching images is hard!

Images, intrinsically, are complex forms of information, with varying backgrounds, orientations, and noise. Developing a reliable system that achieves human-like accuracy in identifying, interpreting, and comparing images, without investing in expensive resources, is no mean task.

For DataWeave, however, the ability to accurately match images is fundamental to the value we provide to retailers and consumer brands.

Why Match Images?

Our customers rely on us for timely and actionable insights on their competitors’ pricing, assortment, promotions, etc. compared to their own. To enable this, we need to identify and match products across multiple websites, at very large scale.

One might hope to easily match products using just the product titles and descriptions on websites. However, therein lies the rub. Text-based fields are typically unstructured, and lack consistency or standardization across websites (especially for fashion products). In the following example, the same Adidas jacket is listed as “Tiro Warm-Up Jacket, Big Boys (8–20)” on Macy’s and “Youth Soccer Tiro 15 Training Jacket” on Amazon.

Hence, instead of using text-based information, we considered using deep-learning techniques to match the images of products listed on e-commerce websites. This, though, requires massive GPU resources and training data fed into the deep-learning model — an expensive proposition.

The solution we arrived at was to complement our image-matching system with the text-based information available in product titles and descriptions. Analyzing this combination of both text- and image-based information enabled us to efficiently match products at greater than 90% accuracy.

How We Did It

A couple of weeks ago, I gave a talk at Fifth Elephant, one of India’s renowned data science conferences. In the talk, I demonstrated DataWeave’s innovation of augmenting the NLP capabilities of Solr (a popular text search engine) with deep-learning features to match images with high accuracy.

Check out the video of the presentation for a detailed account of the system we built:

Human-Aided Machine Intelligence

All products matched with the seed product are tagged with a corresponding confidence score. When this score crosses a certain threshold, it’s presumed to be a direct match. The ones that are part of a lower range of confidence scores are quickly examined manually for possible direct matches.
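As a rough illustration of this triage, the sketch below splits candidates into automatic matches and a manual review queue. The two threshold values, the function name `triage_matches`, and the product IDs are hypothetical; the post only says that a threshold exists and that a lower range of scores is checked by people.

```python
def triage_matches(candidates, auto_threshold=0.9, review_threshold=0.6):
    """Split (product_id, confidence) pairs into auto-accepted and review lists.

    Both thresholds are illustrative assumptions, not DataWeave's settings.
    """
    auto_matches, review_queue = [], []
    for product_id, score in candidates:
        if score >= auto_threshold:
            auto_matches.append(product_id)   # presumed direct match
        elif score >= review_threshold:
            review_queue.append(product_id)   # sent for manual examination
        # Anything below review_threshold is dropped from consideration.
    return auto_matches, review_queue


candidates = [("macys-123", 0.97), ("amazon-456", 0.74), ("other-789", 0.31)]
auto_matches, review_queue = triage_matches(candidates)
```

Manually confirmed items from the review queue are the kind of labeled data that could later be fed back into training.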

The outcome, therefore, is that our technology narrows down the consideration set of possible product matches from a theoretical upper limit of millions of products, to only a few tens of products, which are then manually checked. This unique approach has two distinct advantages:
- The human-in-the-loop enables us to achieve greater than 90% accuracy in matching millions of products — a key differentiator.
- Information on all manually matched products is continually fed to the deep-learning model, which is used as training data, further enhancing the accuracy of the product matching mechanism. As a result, both our accuracy and delivery time keep improving with time.
As the world of online commerce continues to evolve and becomes more competitive, retailers and consumer brands need the ability to make quick proactive and reactive decisions, if they are to stay competitive. By building an automated self-improving system that matches products quickly and accurately, DataWeave enables just that.

Find out more about how retailers and consumer brands use DataWeave to better understand their competitive environment, optimize customer experience, and drive profitable growth.
August 9, 2017
Baahubali 2: Dissecting 75,000 Tweets to Uncover Audience Sentiments

Why did Katappa kill Baahubali?

Two years ago, not many would have foreseen this sentence capturing the imagination of the country like it has. Demolishing all regional barriers, the movie has grossed over INR 500 crores across the world in only its first three days.

While the first movie received lavish praise for its ambition, technical values, and story, the sequel, bogged down by bloated expectations, has polarized critics. Some critics compare the movie’s computer graphics favorably to Hollywood productions like Lord of the Rings. Others find the movie lacking in pacing and plot.

The masses, however, have reportedly lapped the movie up. Social media channels are brimming with opinions, and if one is to attempt finding out the aggregate views of audiences, Twitter is a good place to start.

At DataWeave, we ran our proprietary, AI-powered ‘Sentiment Analysis’ algorithm over all tweets about Baahubali 2 from the first three days of its release, and uncovered some interesting insights.

Twitterati Reactions to Baahubali 2

Overall, the Twitterati’s views on the movie were overwhelmingly positive. We analysed over 75,000 tweets and identified the sentiments expressed on several facets of the movie, such as, Visuals, Acting, Prabhas, etc. The following graphic indicates how the movie fared in some of these categories.

The Baahubali team, Anushka (actor), Rajamouli (director), and Prabhas (actor), are all perceived as huge positive influences on the movie. Rajamouli, specifically, met with almost universal approval for his dedication and execution. Several viewers cheered the movie on as a triumph of Indian cinema, one which has redefined the cinema landscape of the country. There was considerable praise for the story, Rana (actor), and acting performances, as well.

The not-so-positive sentiments were reserved for the reason behind Katappa killing Baahubali (no spoilers!), the visuals, and the second half of the movie. Many viewers found the second half to be slow, with unrealistic visuals and action sequences. For example, one of the tweets read:

“First half was good, but the second half is beyond Rajnikanth movies: humans uprooting trees!”

While these insights seem simple enough to understand, the technology to filter inevitably chaotic online content and extract meaningful information is incredibly complex.

Unearthing Meaning from Chaos

At DataWeave, we provide enterprises with Competitive Intelligence as a Service by aggregating and analyzing millions of unstructured data points on the web, across multiple sources. This enables businesses to better understand their competitive environment and make data-driven decisions to grow their business.

One of our solutions — Sentiment Analysis — helps brands study customer preferences at a product attribute level by analyzing customer reviews. We used the same technology to analyze the reaction of audiences globally to Baahubali 2. After data acquisition, this process consists of three steps –

Step 1: Features Extraction

To identify the “features” that reviewers are talking about, we first understand the syntactical structure of the tweets and separate words into nouns, verbs, adjectives, etc. This needs to account for complexities like synonyms, spelling errors, paraphrases, noise, etc. Our AI-based technology platform then uses various advanced techniques to generate a list of “uni-features” and “compound features” (more than one word for a feature).

Step 2: Identifying Feature-Opinion Pairs

Next, we identify the relationship between the feature and the opinion. One of the reasons this is challenging with Twitter is that, most of the time, Twitter users treat grammar with utter disdain. Case in point:

“I saw the movie visuals awesome bad climax felt director unnecessarily dragged the second half”

In this case, the feature-opinion pairs are visuals: awesome, climax: bad, second half: unnecessarily dragged. Clearly, something as simple as attributing the nearest opinion-word to the feature is not good enough. Here again, we use advanced AI-based techniques to accurately classify feature-opinion pairs.

We classified close to 1000 opinion words and matched them to each feature. The infographic below shows groups of similar words that the AI algorithm clustered into a single feature, and the top positive and negative sentiments expressed by the Twitterati for each feature.

While our technology can associate words with similar meaning, such as ‘part after interval’ and ‘second half’, it can also handle spelling errors by grouping ‘Rajamouli’ and ‘Raajamouli’ as a single feature.
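The spelling-variant part of this grouping can be approximated with plain string similarity. The sketch below uses Python's standard `difflib.SequenceMatcher`; the greedy grouping strategy, the `min_ratio` cut-off, and the function name `group_variants` are assumptions for illustration, and this is a stand-in rather than the technique DataWeave uses. It does not capture semantic equivalence such as ‘part after interval’ and ‘second half’, which would need a different approach.

```python
from difflib import SequenceMatcher


def group_variants(terms, min_ratio=0.85):
    """Greedily group terms whose spelling is close to a group's first member."""
    groups = []
    for term in terms:
        for group in groups:
            ratio = SequenceMatcher(None, term.lower(), group[0].lower()).ratio()
            if ratio >= min_ratio:
                group.append(term)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([term])
    return groups


groups = group_variants(["Rajamouli", "Raajamouli", "Prabhas", "second half"])
```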

Adjectives like ‘magnificent’ and ‘creative’ were used to describe the Baahubali team positively, while words like ‘boring’, ‘disappointed’, and ‘tiring’ were used to describe the second half of the movie negatively.

Step 3: Sentiment Calculation

Lastly, we calculate the sentiment score, which is determined by the strength of the opinion word, the number of retweets, and the time of the tweet. The weighted average is then normalized to produce a score on a scale of 0% to 100%.
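One way such a score could be computed is sketched below. The post names the three inputs but not the formula, so everything else here is an assumption: opinion strength is taken to lie in [-1, 1], retweets are log-damped, older tweets are given less weight through a hypothetical half-life, and the function names are invented.

```python
import math


def tweet_weight(retweets, age_hours, half_life_hours=48.0):
    """Assumed weighting: log-damped retweet count times an age decay."""
    return (1.0 + math.log1p(retweets)) * 0.5 ** (age_hours / half_life_hours)


def sentiment_score(tweets):
    """tweets: iterable of (opinion_strength in [-1, 1], retweets, age_hours).

    Returns the weighted mean opinion strength rescaled to 0-100 (percent).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for strength, retweets, age_hours in tweets:
        weight = tweet_weight(retweets, age_hours)
        weighted_sum += weight * strength
        total_weight += weight
    if total_weight == 0:
        return 50.0  # no evidence either way: report a neutral score
    mean_strength = weighted_sum / total_weight  # still in [-1, 1]
    return round((mean_strength + 1.0) / 2.0 * 100.0, 1)


# A heavily retweeted positive tweet outweighs a lone negative one.
visuals_score = sentiment_score([(0.8, 120, 6.0), (-0.6, 0, 6.0)])
```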

A Peephole into the Consumer’s Mind

As more and more people express their views and opinions in the online world, there is more of an opportunity to use these data points to drive business strategies.

Consumer-focused brands use DataWeave’s Sentiment Analysis solution as a key element of their product strategy, by reinforcing attributes with positive sentiments in reviews, and improving or eliminating attributes with negative sentiments in reviews.

Click here to find out more about the benefits of using DataWeave’s Sentiment Analysis!

May 5, 2017
Why is Product Matching Difficult? | DataWeave
Product Matching is a combination of algorithmic and manual techniques to recognize and match identical products from different sources. Product matching is at the core of competitive intelligence for retail. A competitive intelligence product is most useful when it can accurately match products of a wide range of categories in a timely manner, and at scale.

Shown below is PriceWeave’s Products Tracking Interface, one of the features where product matching is in action. The Products Tracking Interface lets a brand or a retailer track their products and monitor prices, availability, offers, discounts, variants, and SLAs on a daily (or a more frequent) basis.

A snapshot of products tracked for a large online mass merchant

Expanded view for a product shows the prices related data points from competing stores

Product Matching helps a retailer or a brand in several ways:
- Tracking competitor prices and stock availability
- Organizing seller listings on a marketplace platform
- Discovering gaps in product catalog
- Filling the missing attributes in product catalog information
- Comparing product life cycles across competitors
Given its criticality, every competitive intelligence product strives hard to make its product matching accurate and comprehensive. It is a hard problem, and one that cannot be completely addressed in an automated fashion. In the rest of this post, we will talk about why product matching is hard.

Product Matching Guidelines

Amazon provides guidelines to sellers on how they should write product catalog information in order to achieve good product matching for their seller listings. These guidelines apply to any retail store or marketplace platform. The trouble is, more often than not these guidelines are not followed, or cannot be followed by retailers because they don’t have access to all the product-related information. Some of the challenges are:
- Products either don’t have a UPC code or it is not available. There are also non-standard products, unbranded products, and private label products.
- There are products with slight variations in technical specifications, but the complete specs are not available.
- Retailers manage a huge catalog of accessories, for instance Electronics Accessories (screen guards, flip covers, fancy USB drives, etc.).
- Apparels and Lifestyle products often have very little by way of unique identifiers. There is no standard nomenclature for colors, material and style.
- Products are often bundled with accessories or other related products. There are no standard ways of doing product bundling.
In the absence of standard ways of representing products, every retailer uses their own internal product IDs, product descriptions, and attribute names.

Algorithmic Product Matching using “Document Clustering”

Algorithmic product matching is done using Machine Learning, typically with techniques from Document Clustering. A document is a text document, a web page, or a set of terms that usually occur within a “context”. Document clustering is the process of bringing together (forming clusters of) similar documents and separating out dissimilar ones. There are many ways of defining similarity of documents that we will not delve into in this post. Documents have “features” that act as “identifiers” that help an algorithm cluster them.

A document in our case is a product description: essentially a set of data points or attributes we have extracted from a product page. These attributes include title, brand, category, price, and other specs, and they are what help us cluster similar products together and match them. The quality of clustering (that is, how accurate and how complete the clusters are) depends on how good the features are. In our case, most of the time the features are not good, and that is what makes clustering, and in turn product matching, a hard problem.
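To make the idea concrete, here is a minimal sketch that clusters product titles by token overlap. Jaccard similarity, the greedy one-pass clustering, the 0.5 threshold, and the function names are all choices made for illustration; the post deliberately leaves the similarity measure unspecified, and a production system would use more attributes than the title.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_titles(titles, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed title is similar."""
    clusters = []
    for title in titles:
        for cluster in clusters:
            if jaccard(tokens(title), tokens(cluster[0])) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters


titles = [
    "Samsung Galaxy Note 2 16 GB",
    "Samsung Galaxy Note 2 (16GB)",
    "Apple iPad mini 16 GB wifi",
]
clusters = cluster_titles(titles)

# Weak features mislead the measure: two different phones look 75% alike.
note_vs_note2 = jaccard(tokens("Samsung Galaxy Note"), tokens("Samsung Galaxy Note 2"))
```

The last line shows why feature quality matters: with only titles to go on, the Galaxy Note and the Galaxy Note 2 would land in the same cluster at this threshold.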

Noisy Small Factually Weak (NSFW) Documents

The documents that we deal with, the product descriptions, are not well formed and so not readily usable for product matching. We at PriceWeave endearingly characterize them as Noisy, Small, and Factually Weak (NSFW) documents. Let us see some examples to understand these terms.

Noisy
- Spelling errors, non-standard and/or incomplete representations of product features.
- Brands written as “UCB” and “WD” instead of “United Colors of Benetton” and “Western Digital”.
- Model numbers might or might not be present. A camera’s model number might be written as any of the following variants: DSC-WX650 vs DSCWX650 vs DSC WX 650 vs WX 650.
- Noisy/meaningless terms might be present (“brand new”, “manufacturer’s warranty”, “with purchase receipt”)
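Some of this noise can be reduced with simple normalization before matching. The sketch below handles the brand abbreviations and model-number variants listed above; the lookup table, the suffix rule for dropped series prefixes, and the function names are assumptions for illustration, not PriceWeave's implementation.

```python
import re

# Hypothetical abbreviation table built from the examples above.
BRAND_ALIASES = {"UCB": "United Colors of Benetton", "WD": "Western Digital"}


def expand_brand(brand):
    """Replace a known abbreviation with the full brand name."""
    return BRAND_ALIASES.get(brand.strip().upper(), brand.strip())


def normalize_model(raw):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def same_model(a, b):
    """Treat two model strings as equal if one normalized form ends with the other.

    The suffix rule is an assumption that covers a dropped series prefix
    ("WX 650" for "DSC-WX650"); it can also produce false positives.
    """
    norm_a, norm_b = normalize_model(a), normalize_model(b)
    if not norm_a or not norm_b:
        return False
    return norm_a.endswith(norm_b) or norm_b.endswith(norm_a)


variants = ["DSC-WX650", "DSCWX650", "DSC WX 650", "WX 650"]
all_same = all(same_model(variants[0], v) for v in variants)
```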
Small
- Not much description. A product simply written as “Apple iPhone” without any mention of its generation, or other features.
- Not many distinguishable features. Example, “Samsung Galaxy Note vs Samsung Galaxy Note 2”, “Apple ipad 3 16 GB wifi+cellular vs Apple ipad mini 16 GB wifi-cellular”
Factually Weak
- Products represented with generic and subjective descriptions.
- Colours and their combinations might be represented differently. Examples, “Puma Red Striped Bag”, “Adidas Black/Red/Blue Polo Tshirt”.
In the absence of clean, sufficient, and specific product information, the quality of algorithmic matching suffers. Product matching includes many knobs and switches to adjust the weights given to different product attributes. For example, we might include a rule that says, “if two products are identical, then they fall in the same price range.” While such rules work well generally, they vary widely from category to category and across geographies. Further, adding more and more specific rules starts to throw off the algorithms in unexpected ways, rendering them less effective.

In this post, we discussed the challenges posed by product matching that make it a hard problem to crack. In the next post, we will discuss how we address these challenges to make PriceWeave’s product matching robust.

PriceWeave is an all-around Competitive Intelligence product for retailers, brands, and manufacturers. We’re built on top of huge amounts of product data to provide real-time actionable insights. PriceWeave’s offerings include: pricing intelligence, assortment intelligence, gaps in catalogs, and promotion analysis. Please visit PriceWeave to view all our offerings. If you’d like to try us out, request a demo.

Originally published at blog.priceweave.com.
August 4, 2015