The DataWeave Blog

Category: API

Own Your Product Matches: Gain The Power of Accuracy and Control at Your Fingertips
AI-powered product matching is the backbone of competitive pricing intelligence. Accurate matches help you compare prices correctly, identify meaningful assortment gaps, and optimize product content. Inaccurate matches distort every one of these insights. In some categories, a single mismatch can cause millions of dollars of lost revenue.

Retailers and brands know this problem well. Product catalogs are vast. Competitor assortments shift daily. Titles are inconsistent. Product codes are missing. Images vary by region or packaging. Basically, context matters, and AI alone often misses that context.

This is why a human-in-the-loop approach is essential. It allows product matches to be verified consistently, at scale, and with the context that only people can provide. Many retailers have also told us they want to take this a step further. They want the ability to control and define their own product matches.

Sometimes that is because they need to fix inevitable errors quickly. Other times, it is because their teams have deeper category knowledge and can make the right judgment calls when AI falls short.

To make that possible, DataWeave introduced User-Led Match Management. It combines the scale of AI with the judgment of experts within retail organizations. The platform does not just suggest matches. It gives your teams the tools to approve, reject, or refine them. This ensures your competitive intelligence reflects both machine precision and your unique business logic.

Why AI Matching Alone Falls Short

AI has changed the speed and scale of product matching. Algorithms can process millions of SKUs quickly. They can detect similarities in text, images, and metadata. But in retail, the stakes are too high to rely on AI alone.

Here is where AI sometimes falls short:
- Category complexity: Matching rules that work in electronics may fail in fashion or grocery. An electronics SKU may depend on a model number. A fashion SKU may depend on seasonality. A grocery SKU may depend on pack size or whether it is a private label.
- Data inconsistency: Titles vary. Images differ across regions. These gaps, when large, trip up algorithms.
- Business context: Should a premium product ever be compared against a budget line? Should seasonal products match year-round items? AI may not know these boundaries.
- Scale vs. accuracy: Automated systems optimize for coverage. That speed often limits accuracy for a small set of SKUs. Even a 1% error rate across millions of SKUs creates tens of thousands of bad comparisons.
AI is critical for scale. But accuracy requires human input. DataWeave’s human-in-the-loop framework addresses this by allowing expert reviewers to validate and improve AI outputs. Our user-led match management takes this further by putting control directly into the hands of your business teams.

What DataWeave’s User-Led Match Management Delivers

With User-Led Match Management, your team is not a passive reviewer. They become active participants in shaping the accuracy of your competitive intelligence.

Your teams can:
- Approve, reject, or flag AI-suggested matches. Every suggestion comes with full visibility into why it was made. Your team can validate matches quickly, fix errors, and improve the dataset in real time.
- Define what “similar” means for your business. A retailer may want to compare multipacks against single packs. A brand may only care about comparing premium products to other premium products. With User-Led Match Management, your team sets tolerance levels that match your strategy.
- Manually add or refine matches. When AI misses edge cases, your team can add them. This ensures coverage is complete and reflects the true competitive landscape.
This approach creates a loop where AI, complemented by DataWeave’s human-in-the-loop framework, does the heavy lifting, and your teams can fine-tune the results. The outcome is both scale and accuracy.

Key Features

DataWeave designed User-Led Match Management to be simple, intuitive, and scalable:
- Expert-Led Decision Making forms the heart of the system. Rather than trusting AI suggestions blindly, teams gain full visibility into matching logic and can leverage their contextual knowledge of products, categories, and retailers. When the system suggests matching a premium product against a basic alternative, human experts can reject the match and flag it for different criteria. This expertise is particularly valuable for new product launches, seasonal items, or products with complex positioning strategies.
- Business Logic Integration: Teams can define matching parameters that reflect their specific strategic needs. A premium brand might establish rules that prevent matches against budget alternatives, while a value retailer might specifically seek those comparisons. Category managers can create different matching criteria for different product lines, ensuring that seasonal items, limited editions, and promotional products are handled appropriately.
- Transparent Decision Making: Every match decision creates an audit trail capturing who made the decision, when it occurred, and the reasoning behind it. This transparency is crucial for enterprise environments where pricing decisions need to be defensible and strategies need to be consistent across teams and time periods.
- Scalable Validation: User-Led systems provide bulk operations for efficiency while maintaining oversight. Teams can upload thousands of matches for validation, use filtered views to focus on high-priority items, and leverage automated alerts for matches that fall outside established tolerance levels.
Each of these features reduces the friction between AI outputs and business-ready insights.

Technical Foundation

The AI foundation behind User-Led Match Management is built for precision and scale.
1. It uses multimodal AI that combines text, image, and metadata analysis to identify matches even when products are described or displayed differently across retailers.
2. Domain heuristics apply retail-specific logic, recognizing that “Large” means something different in apparel than in beverages, and that seasonal items require unique treatment.
3. Knowledge graphs link products across brands, categories, and regions to reveal true relationships even when surface attributes vary.
4. Through continuous learning, every human correction improves future AI suggestions, making the system smarter and more accurate over time.
For more information, download our whitepaper here!

Why This Matters

Pricing Intelligence

With DataWeave, accurate and reliable product matching is the standard. Advanced algorithms and built-in quality checks deliver consistently high accuracy, reducing the risk of mismatched products and unreliable insights.

In the few cases where a match needs review, User-Led Match Management gives your team the ability to validate it quickly and easily. You get full visibility and control, while DataWeave ensures the integrity of the overall matching framework.

The outcome is true apples-to-apples price comparisons that protect margins, strengthen pricing strategies, and build trust in every decision.

Assortment Analytics

Gaps and overlaps only matter when matches are accurate. To understand your true competitive landscape, you need to eliminate false gaps and phantom overlaps that distort assortment insights.

DataWeave’s advanced Match Management ensures precise product alignment across retailers, categories, and regions, giving you a clear view of your position in the market. At the same time, user-led oversight adds transparent validation, allowing your teams to confirm or refine matches based on their category knowledge.

The result is a complete and trustworthy view of category coverage that reflects reality, not noise. It helps you identify real opportunities to expand assortments, close gaps, and respond quickly to market changes.

Content Optimization

Digital shelf audits only deliver value when the comparisons are accurate. DataWeave ensures that every product is benchmarked against its true competitors so that your insights reflect the real dynamics of your category. For example, a luxury serum is never compared to a basic moisturizer, and a premium electronic device is never matched with an entry-level model.

With user-led control, your teams have transparent oversight of every match. They can review, validate, or adjust comparisons to make sure each audit aligns with your business standards. The result is a more reliable and actionable view of your digital shelf performance, helping you fine-tune content, optimize visibility, and strengthen conversion across channels.

Trust and Accountability

Leadership teams need complete confidence in the data they use to make decisions. User-Led Match Management delivers that confidence by combining the scale of AI with the assurance of human validation. Every match decision is transparent and traceable, giving teams clear visibility into how and why a product was matched.

This approach builds trust across departments, from analysts to executives. It ensures that every pricing, assortment, and content decision is backed by data that is both accurate and accountable.

Your Market, Your Rules, Your Insights

Retailers and brands today need more than fast data. They need data they can trust, shape, and act on with confidence. User-Led Match Management gives them that control. It turns product matching from a static, automated process into a dynamic, collaborative workflow that adapts to how real teams operate.

Category managers can fine-tune match rules instead of waiting on system updates. Pricing teams can validate critical SKUs in minutes, not days. Digital shelf teams can ensure their audits reflect real competitors, not algorithmic guesses. Executives gain visibility into decisions they can stand behind, supported by transparent data trails and measurable accuracy.

In short, User-Led Match Management puts control back where it belongs – in your hands. It helps every team move faster, compete smarter, and make decisions powered by data they can truly believe in.

Reach out to us to learn more!
October 21, 2025
How DataWeave Enhances Transparency in Competitive Pricing Intelligence for Retailers
Retailers heavily depend on pricing intelligence solutions to consistently achieve and uphold their desired competitive pricing positions in the market. The effectiveness of these solutions, however, hinges on the quality of the underlying data, along with the coverage of product matches across websites.

As a retailer, gaining complete confidence in your pricing intelligence system requires a focus on the trinity of data quality:
- Accuracy: Accurate product matching ensures that the right set of competitor product(s) are correctly grouped together along with yours. It ensures that decisions taken by pricing managers to drive competitive pricing and the desired price image are based on reliable apples-to-apples product comparisons.
- Freshness: Timely data is paramount in navigating the dynamic market landscape. Up-to-date SKU data from competitors enables retailers to promptly adjust pricing strategies in response to market shifts, competitor promotions, or changes in customer demand.
- Product matching coverage: Comprehensive product matching coverage ensures that products are thoroughly matched with similar or identical competitor products. This involves accurately matching variations in size, weight, color, and other attributes. A higher coverage ensures that retailers seize all available opportunities for price improvement at any given time, directly impacting revenues and margins.
However, the reality is that untimely data and incomplete product matches have been persistent challenges for pricing teams, compromising their pricing actions. Inaccurate or incomplete data can lead to suboptimal decisions, missed opportunities, and reduced competitiveness in the market.

What’s worse than poor-quality data? Poor-quality data masquerading as accurate data.

In many instances, retailers face a significant challenge in obtaining comprehensive visibility into crucial data quality parameters. If they suspect the data quality of their provider is not up to the mark, they are often compelled to manually request reports from their provider to investigate further. This lack of transparency not only hampers their pricing operations but also impedes the troubleshooting process and decision-making, slowing down crucial aspects of their business.

We’ve heard about this problem from dozens of our retail customers for a while. Now, we’ve solved it.

DataWeave’s Data Statistics and SKU Management Capability Enhances Data Transparency

DataWeave’s Data Statistics Dashboard, offered as part of our Pricing Intelligence solution, enables pricing teams to gain unparalleled visibility into their product matches, SKU data freshness, and accuracy.

It enables retailers to assess and manage SKU data quality and product matches on their own, a crucial aspect of ensuring the best outcomes in the dynamic landscape of eCommerce.

Beyond providing transparency and visibility into data quality and product matches, the dashboard lets users manage data quality proactively: they can flag incorrect matches and address other data quality issues as they find them.

Retailers can benefit in several ways with this dashboard, as listed below.

View Product Match Rates Across Websites

The dashboard helps retailers track match rates to gauge their health. High product match rates signify that pricing teams can move forward in their pricing actions with confidence. Low match rates would be a cause for further investigation, to better understand the underlying challenges, perhaps within a specific category or competitor website.

Our dashboard presents both summary statistics on matches and data crawls as well as detailed snapshots and trend charts, providing users with a holistic and detailed perspective of their product matches.

Additionally, the dashboard provides category-wise snapshots of reference products and their matching counterparts across various retailers, allowing users to focus on areas with lower match rates, investigate underlying reasons, and develop strategies for speedy resolution.
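To make the idea of a category-wise match rate concrete, here is a minimal sketch in Python. The record layout (`category`, `matched`) and the 80% threshold are assumptions made up for illustration, not the dashboard's actual schema or defaults.

```python
from collections import defaultdict

def match_rates(products):
    """Return {category: fraction of reference products with at least one match}."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for p in products:
        totals[p["category"]] += 1
        if p["matched"]:
            matched[p["category"]] += 1
    return {c: matched[c] / totals[c] for c in totals}

# Hypothetical reference products; "matched" means a competitor match exists.
sample = [
    {"sku": "A1", "category": "grocery", "matched": True},
    {"sku": "A2", "category": "grocery", "matched": False},
    {"sku": "B1", "category": "electronics", "matched": True},
    {"sku": "B2", "category": "electronics", "matched": True},
]

rates = match_rates(sample)
# Categories below an (assumed) 80% threshold are the ones worth investigating.
low = [c for c, r in rates.items() if r < 0.8]
```

With this sample data, only the grocery category falls below the threshold, which is the kind of signal that would prompt a closer look at a specific category or competitor website.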

Track Data Freshness Easily

The dashboard enables pricing teams to monitor the timeliness of pricing data and assess its recency. In the dynamic realm of eCommerce, having up-to-date data is essential for making impactful pricing decisions. The dashboard’s presentation of freshness rates ensures that pricing teams are armed with the latest product details and pricing information across competitors.

Within the dashboard, users can readily observe the count of products updated with the most recent pricing data. This feature provides insights into any temporary data capture failures that may have led to a decrease in data freshness. Armed with this information, users can adapt their pricing decisions accordingly, taking into consideration these temporary gaps in fresh data. This proactive approach ensures that pricing strategies remain agile and responsive to fluctuations in data quality.
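A freshness rate of this kind can be thought of as the share of products whose price was captured within some recent window. The sketch below assumes a 24-hour window and a simple mapping of SKU to last-capture timestamp; both are illustrative choices, not details of the dashboard itself.

```python
from datetime import datetime, timedelta

def freshness_rate(last_updated, now, max_age=timedelta(hours=24)):
    """Fraction of products whose price was captured within max_age of now."""
    if not last_updated:
        return 0.0
    fresh = sum(1 for ts in last_updated.values() if now - ts <= max_age)
    return fresh / len(last_updated)

# Hypothetical capture timestamps for four SKUs.
now = datetime(2024, 1, 22, 12, 0)
last_updated = {
    "sku-1": now - timedelta(hours=2),
    "sku-2": now - timedelta(hours=30),
    "sku-3": now - timedelta(hours=23),
    "sku-4": now - timedelta(days=3),
}

rate = freshness_rate(last_updated, now)
# SKUs whose data is older than the window, e.g. after a failed crawl.
stale = sorted(s for s, ts in last_updated.items() if now - ts > timedelta(hours=24))
```

Listing the stale SKUs alongside the rate is what lets a pricing team hold off on decisions for exactly those products until fresh data arrives.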

Proactively Manage Product Matches

The dashboard provides users with proactive control over managing product matches within their current bundles via the ‘Data Management’ panel. This functionality empowers users to verify, add, flag, or delete product matches, offering a hands-on approach to refining the matching process. Despite the deployment of robust matching algorithms that achieve industry-leading match rates, occasional instances may arise where specific matches are overlooked or misclassified. In such cases, users play a pivotal role in fine-tuning the matching process to ensure accuracy.

The interface’s flexibility extends to accommodating product variants and enables users to manage product matches based on store location. Additionally, the platform facilitates bulk match uploads, streamlining the process for users to efficiently handle large volumes of matching data. This versatility ensures that users have the tools they need to navigate and customize the matching process according to the nuances of their specific product landscape.

Gain Unparalleled Visibility into your Data Quality

With DataWeave’s Pricing Intelligence, users gain the capability to delve deep into their product data, scrutinize match rates, assess data freshness, and independently manage their product matches. This approach is instrumental in fostering informed and effective decisions, optimizing inventory management, and securing a competitive edge in the dynamic world of online retail.

To learn more, reach out to us today!
January 22, 2024
API of Telecom Recharge Plans in India
Several months ago we released our Telecom recharge plans API. It soon turned out to be one of our more popular APIs, with some of the leading online recharge portals using it extensively. (So, the next time you recharge your phone, remember us :))

In this post, we’ll talk in detail about the genesis of this API and the problem it is solving.

Before that, and since we are in the business of building data products, some data points.

As you can see, most mobile phones in India are prepaid. That is to say, there is a huge prepaid mobile recharge market. Just how big is this market?

The above infographic is based on a recent report by Avendus [pdf]. Let’s focus on the online prepaid recharge market. Some facts:
1. There are around 11 companies that provide an online prepaid recharge service. Here’s the list: mobikwik, rechargeitnow, paytm, freecharge, justrechargeit, easymobilerecharge, indiamobilerecharge, rechargeguru, onestoprecharge, ezrecharge, anytimerecharge
2. RechargeItNow seems to be the biggest player. As of August 2013, they claimed annual transactions worth INR 6 billion, with over 100,000 recharges per day pan-India.
3. PayTM, Freecharge, and Mobikwik seem to be the other big players. Freecharge claimed recharge volumes of 40000/day in June 2012 (~ INR 2 billion worth of transactions), and they have been growing steadily.
4. Telcos offer a commission of approximately 3% to third party recharge portals. So, it means there is an opportunity worth about 4 bn as of today.
5. Despite the Internet penetration in India being around 11%, only about 1% of mobile prepaid recharges happen online. This goes to show the huge opportunity that lies untapped!
6. It also goes to show why there are so many players entering this space. It’s only going to get more crowded.
What does all this have to do with DataWeave? Let’s talk about the scale of the “data problem” that we are dealing with here. Some numbers that give an estimate on this.

There are 13 cellular service providers in India. Here’s the list: Aircel Cellular Ltd, Aircel Limited, Bharti Airtel, BSNL, Dishnet Wireless, IDEA (operates as Idea ABTL & Spice in different states), Loop Mobile, MTNL, Reliable Internet, Reliance Telecom, Uninor, Videocon, and Vodafone. There are 22 circles in India. (Not every service provider has operations in every circle.)

Find below the number of telecom recharge plans we have in our database for various operators.

In fact, you can see that between the last week and today, we have added about 300 new plans (including plans for a new operator).

The number of plans varies across operators. Vodafone, for instance, gives its users a huge number of options.

The plans vary based on factors such as: denomination, recharge value, recharge talktime, recharge validity, plan type (voice/data), and of course, circle as well as the operator.

For a third-party recharge service provider, the following are daily pain points:
- plans become invalid on a regular basis
- new plans are added on a regular basis
- the features associated with a plan change (e.g., a ‘xx mins free talk time’ plan becomes ‘unlimited validity’ or something else)
We see that tens of plans become invalid (and new ones are introduced) every day. All third-party recharge portals lose a significant amount of money on a daily basis because they might not have information about all the plans and they might be displaying invalid plans.
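To illustrate the three pain points, here is a small sketch of how two daily snapshots of plans could be compared to spot removed, added, and changed plans. The key layout (operator, circle, denomination) and the sample values are invented for the example; they are not taken from our database.

```python
def diff_plans(old, new):
    """Compare two plan snapshots keyed by (operator, circle, denomination)."""
    removed = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return {"removed": removed, "added": added, "changed": changed}

# Hypothetical snapshots from two consecutive days.
yesterday = {
    ("Airtel", "Karnataka", 50): {"talktime": 42, "validity": "28 days"},
    ("Airtel", "Karnataka", 100): {"talktime": 100, "validity": "unlimited"},
    ("Vodafone", "Karnataka", 30): {"talktime": 25, "validity": "7 days"},
}
today = {
    ("Airtel", "Karnataka", 50): {"talktime": 42, "validity": "unlimited"},
    ("Airtel", "Karnataka", 100): {"talktime": 100, "validity": "unlimited"},
    ("Vodafone", "Karnataka", 60): {"talktime": 55, "validity": "14 days"},
}

report = diff_plans(yesterday, today)
```

A portal that runs a comparison like this every day knows which plans to stop showing, which to add, and which descriptions to update.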

DataWeave’s Telecom Recharge Plans API solves this problem. This is how you use the API.

Sample API Request

http://api.dataweave.in/v1/telecom_data/listByCircle/?api_key=b20a79e582ee4953ceccf41ac28aa08d&operator=Airtel&circle=Karnataka&page=1&per_page=10
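If you are calling the API from Python, the request URL can be assembled from its parameters rather than typed by hand. This is only a sketch: the helper name `build_request_url` and the `YOUR_API_KEY` placeholder are ours, while the endpoint and parameter names come from the sample request above.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.dataweave.in/v1/telecom_data/listByCircle/"

def build_request_url(api_key, operator, circle, page=1, per_page=10):
    """Build the listByCircle request URL from its query parameters."""
    params = {
        "api_key": api_key,
        "operator": operator,
        "circle": circle,
        "page": page,
        "per_page": per_page,
    }
    # urlencode also escapes spaces and special characters in the values.
    return BASE_URL + "?" + urlencode(params)

url = build_request_url("YOUR_API_KEY", "Airtel", "Karnataka")
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`. We have not sketched response parsing here, since the exact output fields are not reproduced in this post.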

Sample API Output

We aggregate plans from the various cellular service providers across all circles in India on a daily basis. One of our customers once mentioned that earlier they used to aggregate this data manually, and it used to take them about a month to do this. With our API, we have reduced the refresh cycle to one day.

In addition, now that this process is automated, they can be confident that the data they present to their customers is almost always complete as well as accurate.

Want to try it out for your business? Talk to us! If you are a developer who wants to use this or any of our other APIs, we let you use them for free. Just sign up and get your API key.

DataWeave helps businesses make data-driven decisions by providing relevant actionable data. The company aggregates and organizes data from the web, such that businesses can access millions of data points through APIs, dashboards, and visualizations.
August 4, 2015
Implementing API for Social Data Analysis
In today’s world, the analysis of any social media stream can reap invaluable information about, well, pretty much everything. If you are a business catering to a large number of consumers, it is a very important tool for understanding and analyzing the market’s perception about you, and how your audience reacts to whatever you present before them.

At DataWeave, we sat down to create a setup that would do this for some e-commerce stores and retail brands. And the first social network we decided to track was the micro-blogging giant, Twitter. Twitter is a great medium for engaging with your audience. It’s also a very efficient marketing channel to reach out to a large number of people.

Data Collection

The very first issue that needs to be tackled is collecting the data itself. Now quite understandably, Twitter protects its data vigorously. However, it does have a pretty solid REST API for data distribution purposes too. The API is simple, nothing too complex, and returns data in the easy to use JSON format. Take a look at the timeline API, for example. That’s quite straightforward and has a lot of detailed information.

The issue with the Twitter API, however, is that it is seriously rate limited. Depending on the endpoint, a function can be called only 15 to 180 times in a 15-minute window. While this is good enough for small projects that don’t need much data, these rate limits can be really frustrating for any real-world application. To avoid this, we used the Streaming API, which creates a long-lived HTTP GET request that continuously streams tweets from the public timeline.
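For contrast with the streaming approach, here is a rough sketch of what staying inside a "15 calls per 15-minute window" quota looks like on the client side. This is not code we ran in production; the class name and the injectable clock are illustrative choices that make the behaviour easy to check without waiting.

```python
import time
from collections import deque

class WindowRateLimiter:
    """Allow at most max_calls within any window_seconds-long window."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Note that a call was just made."""
        self.calls.append(self.clock())

# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
limiter = WindowRateLimiter(15, 15 * 60, clock=lambda: fake_now[0])
for _ in range(15):
    limiter.record()
blocked_for = limiter.wait_time()   # quota used up at t = 0
fake_now[0] = 15 * 60
after_window = limiter.wait_time()  # window has passed
```

Once the quota is exhausted the client simply has to sit idle until the window rolls over, which is exactly why polling did not work for us.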

Also, Twitter seems to suddenly return null values in the middle of the stream, which can make the streamer crash if we don’t take care. As for us, we simply threw away all null data before it reached the analysis phase, and as an added precaution, designed a simple e-mail alert for when the streamer crashed.
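The null-filtering step can be sketched as a small generator that sits between the raw stream and the analysis code. The function name and the choice to also drop blank lines and malformed JSON are our assumptions for this illustration; the sample lines are made up.

```python
import json

def clean_stream(lines):
    """Yield parsed tweets, skipping blanks, nulls and malformed JSON."""
    for raw in lines:
        if raw is None:
            continue
        raw = raw.strip()
        if not raw:
            continue
        try:
            tweet = json.loads(raw)
        except ValueError:
            # Malformed payload: skip it instead of crashing the streamer.
            continue
        if not isinstance(tweet, dict) or tweet.get("text") is None:
            continue
        yield tweet

# Hypothetical raw lines as they might arrive from a stream.
sample_lines = [
    '{"id": 1, "text": "Big sale today"}',
    "",
    None,
    "null",
    '{"id": 2, "text": null}',
    "{broken",
    '{"id": 3, "text": "New arrivals"}',
]
kept = [t["id"] for t in clean_stream(sample_lines)]
```

Only the two well-formed tweets with text survive, so nothing null ever reaches the analysis phase.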

Data Storage

Next is data storage. Data is traditionally stored in tables, using an RDBMS. But for this, we decided to use MongoDB, as a document store seemed quite suitable for our needs. While I didn’t have much of a clue at first about MongoDB or what purpose it would serve, I realized that it is a seriously good alternative to MySQL, PostgreSQL and other relational schema-based data stores for a lot of applications.

Some of the advantages I soon discovered were a document-based data model that is very easy to handle (documents are analogous to Python dictionaries) and support for expressive queries. I recommend using it for some of your DB projects. You can play about with it here.

Data Processing

Next comes data processing. While data processing in MongoDB is simple, it can also be a hard thing to learn, especially for someone like me, who had no experience anywhere outside SQL. But MongoDB queries are simple to learn once the basics are clear.

For example, in a DB DWSocial with a collection tweets, the syntax for getting all tweets would be something like this in a Python environment:
```
rt = list(db.tweets.find())
```
The list type-cast here is necessary because find() returns a lazy cursor rather than the documents themselves; wrapping it in list() fetches them all. Now, to find all tweets where user_id is 1234, we have
```
rt = list(db.tweets.find({'user_id': 1234}))
```
Apart from this, we used regexes to detect specific types of tweets, for example “offers”, “discounts”, and “deals”. For this, we used Python’s re library, which deals with regexes. Suffice it to say, regexes and I did not get along for the first two days.

Once again, it’s just initial stumbles. After some (okay, quite some) help from Thothadri, Murthy and Jyotiska, I finally managed a basic parser that could detect which tweets were offers, discounts and deals. A small code snippet is here for this purpose.
```
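import re

# Assumes `db`, `tweets` and `track` (pymongo handles) and
# `fourteen_days_ago` (a datetime) are defined elsewhere.
def deal(id):
    # Verbose-mode pattern for offer-like terms, percentages and dollar amounts.
    # Groups are non-capturing so that findall() returns the matched text itself.
    re_offers = re.compile(r'''
        \b
        (?:
            deals?
          | offers?
          | discount
          | promotion
          | sale
          | rs?
          | rs\?
          | inr\s*[\d.,]+
          | [\d.,]+\s*inr
        )
        \b
      | \b\d+%
      | \$\d+\b
        ''', re.I | re.X)

    # Tweets from this account over the last fourteen days
    x = list(tweets.find({'user_id': id, 'created_at': {'$gte': fourteen_days_ago}}))
    mylist = []
    for a in x:
        b = re_offers.findall(a.get('text'))
        if b:
            print(a.get('id'))
            mylist.append(a.get('id'))
            # Look up the retweet count, defaulting to 0 if none is recorded
            w = list(db.retweets.find({'id': a.get('id')}))
            if w:
                mydict = {'id': a.get('id'), 'rt_count': w[0].get('rt_count'), 'text': a.get('text'), 'terms': b}
            else:
                mydict = {'id': a.get('id'), 'rt_count': 0, 'text': a.get('text'), 'terms': b}
            # insert_one() is the current pymongo call; older versions used insert()
            track.insert_one(mydict)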
```
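If you want to see what the pattern picks up without setting up a database, here is a trimmed-down sketch that runs it over a few made-up tweets. It leaves out the `rs` alternatives for brevity, and the sample texts are invented for the illustration.

```python
import re

# Trimmed version of the offers pattern, using non-capturing groups.
re_offers = re.compile(r'''
    \b(?: deals? | offers? | discount | promotion | sale
        | inr\s*[\d.,]+ | [\d.,]+\s*inr )\b
    | \b\d+%
    | \$\d+\b
    ''', re.I | re.X)

samples = [
    "Flat 40% off on shoes this weekend",
    "Mega SALE: best deals on phones",
    "Our new store is now open in Pune",
    "Headphones for INR 1,499 only",
]

# Map each matching tweet to the terms that triggered the match.
hits = {s: re_offers.findall(s) for s in samples if re_offers.search(s)}
```

The third tweet contains none of the terms, so it is left out of `hits`, while the others are kept along with the words or amounts that matched.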
This is much less complicated than it seems. And it also brings us to our final step–integrating all our queries into a REST-ful API.

Data Serving

For this, multiple web frameworks are available. The ones we considered were Flask, Django and Bottle.

Weighing the pros and cons of every framework can be tedious. I did find this awesome presentation on slideshare though, that succinctly summarizes each framework. You can go through it here.

We finally settled on Bottle as our choice of framework. The reasons are simple. Bottle is monolithic, i.e., it uses the one-file approach. For small applications, this makes for code that is easier to read and maintain.

Some sample web address routes are shown here:

```
import json
from bottle import route

# show all tracked accounts
id_legend = {57947109: 'Flipkart', 183093247: 'HomeShop18', 89443197: 'Myntra', 431336956: 'Jabong'}

@route('/ids')
def get_ids():
    result = json.dumps(id_legend)
    return result
```
```
import json
from bottle import request, route

# show all user mentions for a particular account
# (assumes `tweets` and `fourteen_days_ago` are defined elsewhere)
@route('/user_mentions')
def user_mention():
    # account ID passed as ?id=... in the query string
    m = request.query.id
    ac_id = int(m)
    t = list(tweets.find({'created_at': {'$gte': fourteen_days_ago}, 'retweeted': 'no', 'user_id': {'$ne': ac_id}}))
    a = len(t)
    mylist = []
    for i in t:
        mylist.append({i.get('user_id'): i.get('id')})
    x = {'num_of_mentions': a, 'mentions_details': mylist}
    result = json.dumps(x)
    return result
```
This is how the DataWeave Social API came into being. I had a great time doing this, with special credits to Sanket, Mandar and Murthy for all the help that they gave me for this. That’s all for now, folks!
August 4, 2015