Category: API

  • How DataWeave Enhances Transparency in Competitive Pricing Intelligence for Retailers


    Retailers heavily depend on pricing intelligence solutions to consistently achieve and uphold their desired competitive pricing positions in the market. The effectiveness of these solutions, however, hinges on the quality of the underlying data, along with the coverage of product matches across websites.

    As a retailer, gaining complete confidence in your pricing intelligence system requires a focus on the trinity of data quality:

    • Accuracy: Accurate product matching ensures that the right set of competitor products is correctly grouped together with yours. It ensures that the decisions pricing managers take to drive competitive pricing and the desired price image are based on reliable apples-to-apples product comparisons.
    • Freshness: Timely data is paramount in navigating the dynamic market landscape. Up-to-date SKU data from competitors enables retailers to promptly adjust pricing strategies in response to market shifts, competitor promotions, or changes in customer demand.
    • Product matching coverage: Comprehensive product matching coverage ensures that products are thoroughly matched with similar or identical competitor products. This involves accurately matching variations in size, weight, color, and other attributes. A higher coverage ensures that retailers seize all available opportunities for price improvement at any given time, directly impacting revenues and margins.

    However, the reality is that untimely data and incomplete product matches have been persistent challenges for pricing teams, compromising their pricing actions. Inaccurate or incomplete data can lead to suboptimal decisions, missed opportunities, and reduced competitiveness in the market.

    What’s worse than poor-quality data? Poor-quality data masquerading as accurate data.

    In many instances, retailers face a significant challenge in obtaining comprehensive visibility into crucial data quality parameters. If they suspect the data quality of their provider is not up to the mark, they are often compelled to manually request reports from their provider to investigate further. This lack of transparency not only hampers their pricing operations but also impedes the troubleshooting process and decision-making, slowing down crucial aspects of their business.

    We’ve heard about this problem from dozens of our retail customers for a while. Now, we’ve solved it.

    DataWeave’s Data Statistics and SKU Management Capability Enhances Data Transparency

    DataWeave’s Data Statistics Dashboard, offered as part of our Pricing Intelligence solution, enables pricing teams to gain unparalleled visibility into their product matches, SKU data freshness, and accuracy.

    It enables retailers to autonomously assess and manage SKU data quality and product matches independently—a crucial aspect of ensuring the best outcomes in the dynamic landscape of eCommerce.

    Beyond providing transparency and visibility into data quality and product matches, the dashboard facilitates proactive data quality management. Users can flag incorrect matches and address various data quality issues, ensuring a proactive approach to maintaining the highest standards.

    Retailers can benefit in several ways with this dashboard, as listed below.

    View Product Match Rates Across Websites

    The dashboard helps retailers track match rates to gauge the health of their product matching. High product match rates signify that pricing teams can move forward with their pricing actions in confidence. Low match rates are a cause for further investigation, to better understand the underlying challenges, perhaps within a specific category or competitor website.

    Our dashboard presents both summary statistics on matches and data crawls as well as detailed snapshots and trend charts, providing users with a holistic and detailed perspective of their product matches.

    Additionally, the dashboard provides category-wise snapshots of reference products and their matching counterparts across various retailers, allowing users to focus on areas with lower match rates, investigate underlying reasons, and develop strategies for speedy resolution.
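    The match rate surfaced this way is, at its core, a simple ratio. Here is an illustrative sketch of the per-category computation (the data, field names, and categories below are made up for the example, not the dashboard's actual schema):

```python
# Per-category match rate: matched reference products / total reference
# products. All field names here are illustrative assumptions.
def match_rate(products):
    """products: list of dicts with a boolean 'matched' flag."""
    if not products:
        return 0.0
    matched = sum(1 for p in products if p["matched"])
    return matched / len(products)

catalog = {
    "electronics": [{"sku": "TV-100", "matched": True},
                    {"sku": "TV-200", "matched": False}],
    "grocery":     [{"sku": "RICE-1KG", "matched": True}],
}

# Category-wise snapshot, as in the dashboard's category view
rates = {cat: match_rate(skus) for cat, skus in catalog.items()}
# -> {'electronics': 0.5, 'grocery': 1.0}
```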

    Track Data Freshness Easily

    The dashboard enables pricing teams to monitor the timeliness of pricing data and assess its recency. In the dynamic realm of eCommerce, having up-to-date data is essential for making impactful pricing decisions. The dashboard’s presentation of freshness rates ensures that pricing teams are armed with the latest product details and pricing information across competitors.

    Within the dashboard, users can readily observe the count of products updated with the most recent pricing data. This feature provides insights into any temporary data capture failures that may have led to a decrease in data freshness. Armed with this information, users can adapt their pricing decisions accordingly, taking into consideration these temporary gaps in fresh data. This proactive approach ensures that pricing strategies remain agile and responsive to fluctuations in data quality.
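    The freshness rate described above can be sketched as the share of tracked SKUs whose last successful data capture falls within a chosen window. The 24-hour window and field names below are assumptions for illustration, not the product's actual definition:

```python
from datetime import datetime, timedelta

def freshness_rate(skus, now, window=timedelta(hours=24)):
    """Fraction of SKUs whose last crawl is within the freshness window."""
    if not skus:
        return 0.0
    fresh = sum(1 for s in skus if now - s["last_crawled"] <= window)
    return fresh / len(skus)

now = datetime(2024, 1, 2, 12, 0)
skus = [
    {"sku": "A", "last_crawled": datetime(2024, 1, 2, 9, 0)},    # fresh
    {"sku": "B", "last_crawled": datetime(2023, 12, 30, 9, 0)},  # stale
]
rate = freshness_rate(skus, now)   # -> 0.5
```

A dip in this ratio is exactly the signal of a temporary data capture failure that the dashboard lets users act on.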

    Proactively Manage Product Matches

    The dashboard provides users with proactive control over managing product matches within their current bundles via the ‘Data Management’ panel. This functionality empowers users to verify, add, flag, or delete product matches, offering a hands-on approach to refining the matching process. Despite the deployment of robust matching algorithms that achieve industry-leading match rates, occasional instances may arise where specific matches are overlooked or misclassified. In such cases, users play a pivotal role in fine-tuning the matching process to ensure accuracy.

    The interface’s flexibility extends to accommodating product variants and enables users to manage product matches based on store location. Additionally, the platform facilitates bulk match uploads, streamlining the process for users to efficiently handle large volumes of matching data. This versatility ensures that users have the tools they need to navigate and customize the matching process according to the nuances of their specific product landscape.

    Gain Unparalleled Visibility into your Data Quality

    With DataWeave’s Pricing Intelligence, users gain the capability to delve deep into their product data, scrutinize match rates, assess data freshness, and independently manage their product matches. This approach is instrumental in fostering informed and effective decisions, optimizing inventory management, and securing a competitive edge in the dynamic world of online retail.

    To learn more, reach out to us today!

  • API of Telecom Recharge Plans in India


    Several months ago we released our Telecom recharge plans API. It soon turned out to be one of our more popular APIs, with some of the leading online recharge portals using it extensively. (So, the next time you recharge your phone, remember us :))

    In this post, we’ll talk in detail about the genesis of this API and the problem it is solving.

    Before that — and since we are in the business of building data products — some data points.

    As you can see, most mobile phones in India are prepaid. That is to say, there is a huge prepaid mobile recharge market. Just how big is this market?

    The above infographic is based on a recent report by Avendus [pdf]. Let’s focus on the online prepaid recharge market. Some facts:

    1. There are around 11 companies that provide an online prepaid recharge service. Here’s the list: mobikwik, rechargeitnow, paytm, freecharge, justrechargeit, easymobilerecharge, indiamobilerecharge, rechargeguru, onestoprecharge, ezrecharge, anytimerecharge
    2. RechargeItNow seems to be the biggest player. As of August 2013, they claimed annual transactions worth INR 6 billion, with over 100,000 recharges per day pan-India.
    3. PayTM, Freecharge, and Mobikwik seem to be the other big players. Freecharge claimed recharge volumes of 40,000/day in June 2012 (~INR 2 billion worth of transactions), and they have been growing steadily.
    4. Telcos offer a commission of approximately 3% to third-party recharge portals, which means there is an opportunity worth about INR 4 billion as of today.
    5. Despite Internet penetration in India being only around 11%, just about 1% of mobile prepaid recharges happen online. This goes to show the huge opportunity that lies untapped!
    6. It also goes to show why there are so many players entering this space. It's only going to get more crowded.

    What does all this have to do with DataWeave? Let’s talk about the scale of the “data problem” that we are dealing with here. Some numbers that give an estimate on this.

    There are 13 cellular service providers in India. Here’s the list: Aircel Cellular Ltd, Aircel Limited, Bharti Airtel, BSNL, Dishnet Wireless, IDEA (operates as Idea ABTL & Spice in different states), Loop Mobile, MTNL, Reliable Internet, Reliance Telecom, Uninor, Videocon, and Vodafone. There are 22 circles in India. (Not every service provider has operations in every circle.)

    Find below the number of telecom recharge plans we have in our database for various operators.

    In fact, you can see that between last week and today, we have added about 300 new plans (including plans for a new operator).

    The number of plans varies across operators. Vodafone, for instance, gives its users a huge number of options.

    The plans vary based on factors such as: denomination, recharge value, recharge talktime, recharge validity, plan type (voice/data), and of course, circle as well as the operator.
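    As an illustration, a single plan record could be represented along exactly those factors. The field names and values below are assumptions for the sketch, not the API's actual schema:

```python
# Hypothetical shape of one recharge-plan record, using the factors
# listed above as fields (all names and values are illustrative).
plan = {
    "operator": "Vodafone",
    "circle": "Karnataka",
    "denomination": 155,           # recharge value in INR
    "talktime": 0,                 # talktime credited, in INR
    "validity_days": 28,
    "plan_type": "data",           # 'voice' or 'data'
    "description": "1 GB 3G data",
}
```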

    For a third-party recharge service provider, the following are daily pain points:

    • plans become invalid on a regular basis
    • new plans are added on a regular basis
    • the features associated with a plan change (e.g., a ‘xx mins free talk time’ plan becomes an ‘unlimited validity’ plan, or something else)

    We see tens of plans becoming invalid (and new ones being introduced) every day. All third-party recharge portals lose a significant amount of money daily because they might not have information about all the plans, and they might be displaying invalid plans.
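    In essence, the daily churn boils down to a diff between consecutive plan snapshots. A minimal sketch, where the key structure and plan descriptions are made-up assumptions:

```python
# Compare yesterday's and today's plan snapshots, keyed by
# (operator, circle, denomination), and report additions, removals,
# and feature changes. Keys and values here are illustrative.
def diff_plans(yesterday, today):
    added = [k for k in today if k not in yesterday]
    removed = [k for k in yesterday if k not in today]
    changed = [k for k in today
               if k in yesterday and today[k] != yesterday[k]]
    return added, removed, changed

y = {("Airtel", "Karnataka", 155): "60 mins free talk time"}
t = {("Airtel", "Karnataka", 155): "unlimited validity",
     ("Airtel", "Karnataka", 255): "1 GB 3G data"}

added, removed, changed = diff_plans(y, t)
# added   -> [("Airtel", "Karnataka", 255)]
# removed -> []
# changed -> [("Airtel", "Karnataka", 155)]
```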

    DataWeave’s Telecom Recharge Plans API solves this problem. This is how you use the API.

    Sample API Request

    http://api.dataweave.in/v1/telecom_data/listByCircle/?api_key=b20a79e582ee4953ceccf41ac28aa08d&operator=Airtel&circle=Karnataka&page=1&per_page=10
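    For illustration, a request like the one above can be assembled programmatically. A sketch using only the Python standard library; the api_key value is a placeholder:

```python
# Build the listByCircle request URL from its query parameters
# (parameters taken from the sample request; api_key is a placeholder).
from urllib.parse import urlencode

BASE = "http://api.dataweave.in/v1/telecom_data/listByCircle/"

def build_plans_url(api_key, operator, circle, page=1, per_page=10):
    params = {
        "api_key": api_key,
        "operator": operator,
        "circle": circle,
        "page": page,
        "per_page": per_page,
    }
    return BASE + "?" + urlencode(params)

url = build_plans_url("YOUR_API_KEY", "Airtel", "Karnataka")
# The JSON response can then be fetched with any HTTP client,
# e.g. requests.get(url).json()
```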

    Sample API Output

    We aggregate plans from the various cellular service providers across all circles in India on a daily basis. One of our customers once mentioned that earlier they used to aggregate this data manually, and it used to take them about a month to do this. With our API, we have reduced the refresh cycle to one day.

    In addition, now that this process is automated, they can be confident that the data they present to their customers is almost always complete and accurate.

    Want to try it out for your business? Talk to us! If you are a developer who wants to use this or any of our other APIs, we let you use them for free. Just sign up and get your API key.

    DataWeave helps businesses make data-driven decisions by providing relevant actionable data. The company aggregates and organizes data from the web, such that businesses can access millions of data points through APIs, dashboards, and visualizations.

  • Implementing API for Social Data Analysis


    In today’s world, the analysis of any social media stream can reap invaluable information about, well, pretty much everything. If you are a business catering to a large number of consumers, it is a very important tool for understanding and analyzing the market’s perception about you, and how your audience reacts to whatever you present before them.

    At DataWeave, we sat down to create a setup that would do this for some e-commerce stores and retail brands. And the first social network we decided to track was the micro-blogging giant, Twitter. Twitter is a great medium for engaging with your audience. It’s also a very efficient marketing channel to reach out to a large number of people.

    Data Collection

    The very first issue that needs to be tackled is collecting the data itself. Now quite understandably, Twitter protects its data vigorously. However, it does have a pretty solid REST API for data distribution purposes too. The API is simple, nothing too complex, and returns data in the easy to use JSON format. Take a look at the timeline API, for example. That’s quite straightforward and has a lot of detailed information.

    The issue with the Twitter REST API, however, is that it is heavily rate limited: each endpoint can be called only 15–180 times in a 15-minute window. While this is good enough for small projects that don't need much data, for any real-world application these rate limits can be really frustrating. To get around this, we used the Streaming API, which creates a long-lived HTTP GET request that continuously streams tweets from the public timeline.

    Also, Twitter sometimes returns null values in the middle of the stream, which can crash the streamer if left unhandled. We simply threw away all null data before it reached the analysis phase and, as an added precaution, set up a simple e-mail alert for when the streamer crashed.
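    A minimal sketch of that streaming consumer, assuming the stream delivers newline-delimited JSON. The null handling mirrors the precaution just described; the requests usage in the trailing comment is illustrative, and the endpoint/auth details are placeholders:

```python
import json

def parse_tweet_stream(lines):
    """Yield parsed tweet dicts, silently skipping blank keep-alive
    lines, garbled chunks, and literal JSON nulls in the stream."""
    for raw in lines:
        if not raw or not raw.strip():   # keep-alive newlines
            continue
        try:
            tweet = json.loads(raw)
        except ValueError:
            continue                     # partial/garbled chunk
        if tweet is None:                # literal null mid-stream
            continue
        yield tweet

# With requests, the long-lived GET would be consumed roughly as:
#   r = requests.get(STREAM_URL, auth=..., stream=True)
#   for tweet in parse_tweet_stream(r.iter_lines(decode_unicode=True)):
#       store(tweet)
```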

    Data Storage

    Next is data storage. Data is traditionally stored in tables, using an RDBMS. But for this project, we decided to use MongoDB, as a document store seemed quite suitable for our needs. While I initially didn't know much about MongoDB or what purpose it would serve, I soon realized that it is a seriously good alternative to MySQL, PostgreSQL, and other relational, schema-based data stores for a lot of applications.

    Two advantages quickly stood out: a document-based data model that is very easy to handle (documents are analogous to Python dictionaries), and support for expressive queries. I recommend trying it for some of your DB projects. You can play around with it here.
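    To make the dictionary analogy concrete, here is a small illustration. The pymongo calls in the comment are a sketch (connection details assumed), and the tweet fields are made up:

```python
# A tweet is stored as-is as a nested dict; nesting needs no schema
# change. With pymongo this would look roughly like:
#
#   from pymongo import MongoClient
#   db = MongoClient()["DWSocial"]
#   db.tweets.insert_one(tweet)
#   list(db.tweets.find({"user_id": 1234}))
#
tweet = {
    "id": 99,
    "user_id": 1234,
    "text": "Big sale today!",
    "entities": {"hashtags": ["sale"]},
}

def matches(doc, query):
    """Tiny stand-in for a Mongo equality query on top-level fields."""
    return all(doc.get(k) == v for k, v in query.items())

hit = matches(tweet, {"user_id": 1234})   # -> True
```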

    Data Processing

    Next comes data processing. While data processing in MongoDB is simple, it can also be a hard thing to learn, especially for someone like me, who had no experience anywhere outside SQL. But MongoDB queries are simple to learn once the basics are clear.

    For example, in a DB DWSocial with a collection tweets, the syntax for getting all tweets would be something like this in a Python environment:

    rt = list(db.tweets.find())

    The list type-cast here is necessary; without it, the output is simply a MongoDB cursor object, not the values themselves. Now, to find all tweets where user_id is 1234, we have

    rt = list(db.tweets.find({ 'user_id': 1234 }))

    Apart from this, we used regexes to detect specific types of tweets — for example, whether they were “offers”, “discounts”, or “deals”. For this, we used the Python re library. Suffice it to say, regexes baffled me for the first two days.

    Once again, it's just initial stumbles. After some (okay, quite some) help from Thothadri, Murthy, and Jyotiska, I finally managed a basic parser that could detect which tweets were offers, discounts, and deals. Here is a small code snippet for this purpose.

    import re
    from datetime import datetime, timedelta

    fourteen_days_ago = datetime.utcnow() - timedelta(days=14)

    def deal(id):
        # Matches "deal(s)", "offer(s)", "discount", "promotion", "sale",
        # rupee amounts ("rs", "inr 1,000", "1000 inr"), percentages,
        # and dollar amounts.
        re_offers = re.compile(r'''
            \b
            (?:
                deals?
                |
                offers?
                |
                discount
                |
                promotion
                |
                sale
                |
                rs\.?
                |
                inr\s*([\d\.,])+
                |
                ([\d\.,])+\s*inr
            )
            \b
            |
            \b\d+%
            |
            \$\d+\b
            ''', re.I | re.X)

        # 'tweets', 'db', and 'track' are pymongo collections defined elsewhere
        x = list(tweets.find({'user_id': id,
                              'created_at': {'$gte': fourteen_days_ago}}))
        mylist = []
        for a in x:
            b = re_offers.findall(a.get('text'))
            if b:
                print(a.get('id'))
                mylist.append(a.get('id'))
                w = list(db.retweets.find({'id': a.get('id')}))
                if w:
                    mydict = {'id': a.get('id'),
                              'rt_count': w[0].get('rt_count'),
                              'text': a.get('text'), 'terms': b}
                else:
                    mydict = {'id': a.get('id'), 'rt_count': 0,
                              'text': a.get('text'), 'terms': b}
                track.insert(mydict)

    This is much less complicated than it seems. And it brings us to our final step: integrating all our queries into a RESTful API.

    Data Serving

    For this, multiple web frameworks are available. The ones we considered were Flask, Django, and Bottle.

    Weighing the pros and cons of every framework can be tedious. I did find this awesome presentation on slideshare though, that succinctly summarizes each framework. You can go through it here.

    We finally settled on Bottle as our framework of choice. The reasons are simple. Bottle is monolithic, i.e., it uses the one-file approach. For small applications, this makes for code that is easier to read and maintain.

    Some sample web address routes are shown here:

    # show all tracked accounts
    id_legend = {57947109: 'Flipkart', 183093247: 'HomeShop18',
                 89443197: 'Myntra', 431336956: 'Jabong'}

    @route('/ids')
    def get_ids():
        result = json.dumps(id_legend)
        return result

    # show all user mentions for a particular account
    @route('/user_mentions')
    def user_mention():
        m = request.query.id
        ac_id = int(m)
        t = list(tweets.find({'created_at': {'$gte': fourteen_days_ago},
                              'retweeted': 'no',
                              'user_id': {'$ne': ac_id}}))
        a = len(t)
        mylist = []
        for i in t:
            mylist.append({i.get('user_id'): i.get('id')})
        x = {'num_of_mentions': a, 'mentions_details': mylist}
        result = json.dumps(x)
        return result

    This is how the DataWeave Social API came into being. I had a great time doing this, with special credits to Sanket, Mandar and Murthy for all the help that they gave me for this. That’s all for now, folks!