Posts Tagged Real Time Data

What media companies can learn from Walmart

Posted by on Wednesday, 14 September, 2011

As reported in a number of places, Walmart has acquired OneRiot — a startup that originally tried to do social search before pivoting to focus on social advertising. OneRiot joins a unit called Walmart Labs, which the giant retailer created earlier this year with the acquisition of a company called Kosmix. Why should media companies (or anyone else, for that matter) find this interesting? Because what drove Walmart to make these acquisitions and create Walmart Labs is the same thing that plenty of other companies, and particularly media entities, should be interested in: namely, making sense of all the data that is coming in from users on social networks and their sharing activity.

Anand Rajaraman — the co-founder of Kosmix and now the guy in charge of Walmart Labs — knows a thing or two about large amounts of data and how to analyze it: he has taught data-mining at Stanford University, and Kosmix was designed to take his knowledge of data analysis techniques and apply them to the massive amounts of data on the web (Rajaraman was also a co-founder of Junglee, which was sold to Amazon in 1998). Then Twitter came along and Kosmix took all of the semantic analysis and other research it had been doing and applied it to the firehose of data coming from the real-time information network to create something called Tweetbeat.

Making sense of the social-network firehose

As Rajaraman told me when I interviewed him at the Disrupt conference last year, where Tweetbeat was launched (a video clip from our interview is embedded below): “It was like we were waiting for this real-time data flow to come along so we could apply our semantic filter to it.” An understanding of how to filter those billions of tweets using semantic tools and a “taxonomy” or structured view of online data allowed Tweetbeat to generate customized views of the content being posted to Twitter in real time — so in one of its first offerings, Tweetbeat let users follow not just information about the World Cup, but tweets and links about individual players, teams and countries.
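To make the idea of a taxonomy-driven filter concrete, here is a minimal sketch in Python. It is not Kosmix's or Tweetbeat's actual method, which the interview does not describe. The `PARENT` table and the `topics_for` function are hypothetical, and the naive substring matching stands in for real semantic analysis.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each node points at its parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "world cup": None,
    "spain": "world cup",
    "netherlands": "world cup",
    "iniesta": "spain",
    "robben": "netherlands",
}


def topics_for(tweet: str) -> List[str]:
    """Return every taxonomy node the tweet mentions, plus each node's ancestors."""
    text = tweet.lower()
    found: List[str] = []
    for node in PARENT:
        if node in text:  # naive substring match, an assumption for illustration
            current: Optional[str] = node
            while current is not None:
                if current not in found:
                    found.append(current)
                current = PARENT[current]
    return found


player_view = topics_for("Iniesta scores in extra time!")
unrelated = topics_for("Great weather today")
```

Because a tweet about a player is also filed under that player's team and the tournament, the same stream can feed a view for any level of the hierarchy.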

Obviously, that kind of real-time filtering and analysis of activity can be applied to far more than just showing which soccer team is the most popular, and Walmart’s purchase of Kosmix showed it is clearly interested in the potential of using these techniques to understand its customers and its market. And the addition of OneRiot adds an advertising-related aspect to Walmart’s approach, which could help the retailer understand more about what drives users to click or interact with ads and ad-related content on social networks. As Rajaraman said in his blog post about the purchase:

The technology at the core of what we do is the Social Genome, which enables us to connect millions of consumers with the best products based on their interests at any given moment. The OneRiot technology will enrich the Social Genome, and the OneRiot team adds to the already deep expertise we have around social data analysis.

It may seem odd to think of a company like Walmart as being interested in data, or having anything in common with a media company, but the giant retailer has been passionate about making sense of the data being generated by its business since long before the web and social networks came along. Over a decade ago, the company was already legendary for having a satellite-information network that rivalled that of the U.S. military, which it used to track the movements of every single truck and package throughout its massive empire. In many ways, understanding the movement and intentions of users online is just an extension of that.

Understanding the intentions of users

Media companies and content creators may not see themselves as having anything in common with a giant retailing entity, but the reality is that they need to understand the behavior and interests of their users or customers (which they call readers or listeners or viewers) just like Walmart does. Why do people click on certain stories and not others? How long do they spend on a page and where do they go after they leave? That’s the kind of information that tools like Omniture and comScore can provide — but real-time tools like Chartbeat and the new analytical offering from Twitter can add another element that provides even more data about activity and intent.

As social activity on networks like Twitter and Facebook has become a larger and larger part of what people do online, understanding those “social signals” becomes even more important for anyone whose business depends on attracting online users (which is just about everyone by now). That’s why Google launched Google+ and is adding social features to its search engine, so that it can understand social intent and behavior and how it influences search relevance. And that’s presumably why Walmart has a research lab that is focused on making sense of social-behavior data.

More companies, both media and otherwise, need to start thinking about doing the same thing with the data that flows into and out of their organizations as well. Somewhere in that data is an understanding of why your customers do what they do online, and how to give them more of what they want and when.




Disclosure: Cambrian Ventures, a venture firm in which Anand Rajaraman is a partner, was an early investor in Giga Omni Media, the parent company of GigaOM.

Post and thumbnail photos courtesy of Flickr users Luc Legay and Ryan Lackey

Related research and analysis from GigaOM Pro:

  • Players and Strategies for Real-Time In-Stream Advertising
  • Flash analysis: prospects for Google+
  • Finding the Value in Social Media Data





GigaOM


Tweet the heat: Twitter teams up with The Weather Channel

Posted by on Thursday, 11 August, 2011

Your tweeted complaints about this year’s summer heat wave could soon find their way to the TV screen: The Weather Channel and Twitter are launching a deep integration of tweets in the network’s on-air programming, its website and its mobile platform on Thursday.

The Weather Channel Social, as the collaboration is officially called, brings weather-related tweets to the airwaves as well as to Weather.com and the Weather Channel iPhone app. The Weather Channel is also launching 220 custom local Twitter feeds to update Twitter users about their city’s weather forecast.

Tweets displayed on the Weather Channel properties will be curated to filter out any swearing not suitable for broadcasting. The Weather Channel’s properties will also only display content relevant to the location of a particular weather forecast — it just doesn’t help to know that it’s cold elsewhere if you’re braving the high temperatures in Austin, TX.

But even with those caveats, the network still has a lot of material to use. On an average day, Twitter sees about 200 tweets per minute just about the weather. If it gets a little hotter or colder than usual, that rate rises to about 300 to 500 tweets per minute. And when it rains really hard, it also pours tweets: “Significant weather events” can provoke up to two million tweets per day, according to Twitter.

One of the biggest challenges of the integration was apparently to separate weather-related tweets from observations about all the other things that can be hot, cool and foggy in this world. The Weather Channel is relying on technology provided by the New York-based real-time data specialists at Wiredset, which also runs Trendrr.com, to curate the Twitter firehose. Wiredset built an AI engine based on the Maximum Entropy Method of data analysis to make sense of all these tweets. Another challenge was that only three percent of tweets come with location information, which is why The Weather Channel is relying on Twitter profiles and location information within the actual text of each tweet, rather than geotagged data.
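Wiredset's engine is not public, so the following is only a toy sketch of what a maximum-entropy text classifier can look like. For two classes, a maximum-entropy model reduces to logistic regression over word features. The training sentences, the function names and the omission of a bias term are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy training data: 1 = about the weather, 0 = about something else.
TRAINING: List[Tuple[str, int]] = [
    ("so hot outside today", 1),
    ("cold rain this morning", 1),
    ("foggy and cold outside", 1),
    ("hot and humid today", 1),
    ("that band is so hot", 0),
    ("cool new phone", 0),
    ("this song is cool", 0),
    ("hot new movie trailer", 0),
]


def features(text: str) -> Set[str]:
    """Binary bag-of-words features: the set of lowercased words."""
    return set(text.lower().split())


def sigmoid(score: float) -> float:
    return 1.0 / (1.0 + math.exp(-score))


def train(data: List[Tuple[str, int]], epochs: int = 200, rate: float = 0.5) -> Dict[str, float]:
    """Fit one weight per word by stochastic gradient ascent on the log-likelihood."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for text, label in data:
            words = features(text)
            predicted = sigmoid(sum(weights.get(w, 0.0) for w in words))
            for w in words:
                weights[w] = weights.get(w, 0.0) + rate * (label - predicted)
    return weights


def p_weather(weights: Dict[str, float], text: str) -> float:
    """Estimated probability that the text is about the weather; unseen words are ignored."""
    return sigmoid(sum(weights.get(w, 0.0) for w in features(text)))


model = train(TRAINING)
```

A word like "hot" appears in both classes, so the model has to lean on the surrounding words ("outside", "today", "band", "movie") to decide, which is the disambiguation problem described above in miniature.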

Other networks have occasionally experimented with the integration of tweets into on-air programming, but those experiments have so far mostly been based on single events like the MTV Music Awards. Making tweets a constant part of your programming is definitely a much bolder step, and The Weather Channel is placing an interesting bet on the power of citizen reporting with this integration.

Tweets about your average sunny day will be part of regular forecasts on Weather.com and featured during select on-air programming, but the social integration will play a much bigger role for the network when things go awry. In other words: The Weather Channel essentially just turned millions of Twitter users into field reporters, ready to be put into the spotlight and featured on air whenever severe or even catastrophic weather strikes parts of the U.S.

Photo courtesy of Flickr user Dennis Jernberg.

Related research and analysis from GigaOM Pro:

  • Millennials in the enterprise, part 1: strategies for supporting the new digital workforce
  • Infrastructure Q2: Big data and PaaS gain more momentum
  • Players and Strategies for Real-Time In-Stream Advertising





GigaOM — Tech News, Analysis and Trends


Twitter to open source Hadoop-like tool

Posted by on Friday, 5 August, 2011

Attention webscale aficionados, Twitter says it is planning to open source Storm, its Hadoop-like real-time data processing tool. In a blog post Thursday, the microblogging network said it plans to release the Storm code on Sept. 19 at the Strange Loop event in St. Louis, Mo.

The question is: does the world need another real-time data processing tool? After all, there are many tools out there, such as HStreaming (using Hadoop), the open source S4 and StreamBase, and the overall analytics market (if you can call it a market) is already fragmented. The Storm code comes from Twitter’s acquisition of BackType last month and seems to be an effort to get folks comfortable parsing data on Twitter.

The post does an excellent job laying out use cases for Storm and hints at more to come. While the code can deal with distributed nodes and huge amounts of data a la Hadoop or MapReduce, Storm handles jobs that are “infinite.” It’s not meant for a data processing job with an end point; it’s built for streams of data and continual processing. From the post by Nathan Marz:

Here’s a recap of the three broad use cases for Storm:

  • Stream processing: Storm can be used to process a stream of new data and update databases in realtime. Unlike the standard approach of doing stream processing with a network of queues and workers, Storm is fault-tolerant and scalable.
  • Continuous computation: Storm can do a continuous query and stream the results to clients in realtime. An example is streaming trending topics on Twitter into browsers. The browsers will have a realtime view on what the trending topics are as they happen.
  • Distributed RPC: Storm can be used to parallelize an intense query on the fly. The idea is that your Storm topology is a distributed function that waits for invocation messages. When it receives an invocation, it computes the query and sends back the results. Examples of Distributed RPC are parallelizing search queries or doing set operations on large numbers of large sets.
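The continuous-computation case can be illustrated with a toy, single-process sketch in Python. This is not Storm's API: the names `tweet_spout`, `hashtag_bolt` and `trending_bolt` are hypothetical and borrow only the spout and bolt vocabulary from the post. A real topology would be distributed and would never terminate, so a short list stands in for the stream here.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def tweet_spout(tweets: Iterable[str]) -> Iterator[str]:
    """Source of the stream. In a real system this would never end."""
    for tweet in tweets:
        yield tweet


def hashtag_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Transform step: emit every hashtag found in every tweet, lowercased."""
    for tweet in stream:
        for word in tweet.split():
            if word.startswith("#") and len(word) > 1:
                yield word.lower()


def trending_bolt(stream: Iterable[str], top_n: int = 2) -> Iterator[List[Tuple[str, int]]]:
    """Continuous query: emit the current top hashtags after each new hashtag arrives."""
    counts: Counter = Counter()
    for tag in stream:
        counts[tag] += 1
        yield counts.most_common(top_n)


sample = ["Go #storm", "#Storm is out", "I like #hadoop", "#storm again"]
snapshots = list(trending_bolt(hashtag_bolt(tweet_spout(sample))))
latest = snapshots[-1]
```

Each stage consumes items one at a time and emits results immediately, so a client reading `trending_bolt` sees an updated ranking after every hashtag rather than waiting for a batch job to finish.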

But wait! There’s more! At the end of the post we are assured that there’s more to Storm than the blog post has even defined, which we can learn more about next month at the Strange Loop event. From the post:

I’ve only scratched the surface on Storm. The “stream” concept at the core of Storm can be taken so much further than what I’ve shown here — I didn’t talk about things like multi-streams, implicit streams, or direct groupings. I showed two of Storm’s main abstractions, spouts and bolts, but I didn’t talk about Storm’s third, and possibly most powerful abstraction, the “state spout”. I didn’t show how you do distributed RPC over Storm, and I didn’t discuss Storm’s awesome automated deploy that lets you create a Storm cluster on EC2 with just the click of a button.

So for those anxious to test out a new method of crunching terabytes of real-time data on the fly, get thee to GitHub! And wait.

Related research and analysis from GigaOM Pro:

  • Defining Hadoop: the Players, Technologies and Challenges of 2011
  • Infrastructure Overview, Q2 2010
  • Big Data Marketplaces Put a Price on Finding Patterns





GigaOM — Tech News, Analysis and Trends


Gojee shows that big data and food is a delicious combo

Posted by on Saturday, 23 July, 2011

Tonight, I’m making a sweet potato, black bean and avocado dinner thanks to Gojee, a recipe-search site that knows what I have in my pantry and uses that knowledge about my ingredients and past recipes to deliver fresh ideas for dinner. The site, which launched earlier this month, reached 50,000 users in 10 days. But it may be most useful as a lesson on bringing big-data applications to the masses.

The search engine is simple: You sign up with your email, and tell it what you want, what you have on hand, and anything you dislike. It then shows you beautiful photos of meals that you can make with your ingredients. It’s a recommendation engine, but for a select portion of its customers it’s also a real-time data repository that can use their trips to the grocery store as a means of telling them what to cook. Gojee has a relationship with D’Agostino, a 15-store chain in New York that has connected its loyalty card to the service. So within a minute or two of buying their groceries, Gojee users who shop at D’Agostino (and give Gojee their loyalty card number) can see their “I have” section populate with their freshly bought groceries.
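Gojee has not published how it ranks recipes, so the following Python sketch shows just one plausible approach: score each recipe by the share of its ingredients already on hand, and drop anything containing a disliked ingredient. The `rank_recipes` function and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_recipes(recipes: Dict[str, Set[str]], have: Set[str], dislike: Set[str]) -> List[str]:
    """Return recipe names ordered by the share of ingredients already on hand."""
    scored: List[Tuple[float, str]] = []
    for name, ingredients in recipes.items():
        if not ingredients or ingredients & dislike:
            continue  # skip empty recipes and anything with a disliked ingredient
        coverage = len(ingredients & have) / len(ingredients)
        scored.append((coverage, name))
    # Highest coverage first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


recipes = {
    "sweet potato tacos": {"sweet potato", "black beans", "avocado", "tortillas"},
    "black bean soup": {"black beans", "onion", "cumin"},
    "cilantro salad": {"cilantro", "avocado"},
}
ranked = rank_recipes(
    recipes,
    have={"sweet potato", "black beans", "avocado"},
    dislike={"cilantro"},
)
```

In this picture, a loyalty-card feed would simply add items to the `have` set after each shopping trip, and the ranking would shift accordingly.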

Finding the recipe for success

I’ve been really interested in how we can make food fit for the web, so we can track what we eat and help connect our food with our digital to-do lists, social networks and meal-planning services. Mike LaValle, CEO and co-founder of Gojee, is just as interested. The current version of Gojee is actually the third iteration of the site, as LaValle and his co-founder struggled to bring food data to the mass consumer audience. LaValle, who used to work at Morgan Stanley, built his first attempt at Gojee to offer what he calls “a Mint.com-like experience around food.”

The first version failed, however, in part because consumers found it too overwhelming, so LaValle tried again with what he explained was a more qualitative than quantitative analysis. He compared it to both a Twitter for food and something like a Farecaster for groceries that would estimate the price of the things one buys. But that site didn’t even make it past the initial testing. When someone said they really liked the recipe-recommendation feature, LaValle, desperate to find something that worked, seized upon it and built Gojee. He has apparently hit enough of a winner to draw an enthusiastic group of initial users, and now he’s seeking an undisclosed amount in seed funding to help expand the site.

While LaValle seems to have discovered what CEO Andy Smith from the DailyBurn discovered about consumers and health and food data when he released the MealSnap app — namely that simple is better for mass adoption — it’s the depths of the data he can access that make the service intriguing. LaValle may not be taking advantage of it, but food is a huge opportunity for startups and web companies because everyone eats. And while today most people may not want calorie information on a site, if they could link it to their pedometer or to a computer-created grocery list comprised of recipes they want to serve in the coming week, there could be a big opportunity.

The coming food revolution

LaValle thinks we’re about 24 months away from mass adoption of such an opportunity, and that Gojee is merely a small wedge trying to pry that opportunity open by enticing grocers to share their data and by getting consumers comfortable with the idea. “I think right now we want to be the No. 1 go-to spot for inspiration and that’s our focus for short to medium term. Then we’ll be broadening the service and reaching out to a lot of chains that have been watching us and starting that avalanche of turning that information back to users,” LaValle said.

LaValle said that translating food consumption into applications and actionable data for consumers is a huge source of innovation for startups and for the packaged goods and grocery industries, much as open financial data standards and searchable travel data were a few years back. “This could spawn a decade in food data that mirrors what happened in financial data and the travel industry as well,” LaValle said. I for one can’t wait.

Related research and analysis from GigaOM Pro:

  • Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight
  • Finding the Value in Social Media Data
  • Defining Hadoop: the Players, Technologies and Challenges of 2011





GigaOM — Tech News, Analysis and Trends


Twitter to Client Developers: We’ll Take it From Here

Posted by on Saturday, 12 March, 2011

After increasing speculation about Twitter’s growing hostility toward Twitter client developers, the micro-messaging service today removed all doubt about what kind of relationship it wants with developers. In a blunt message on its developer forum, platform lead Ryan Sarver basically said new Twitter apps are not welcome and existing ones are going to be on a very tight leash.

The latest pronouncement about Twitter’s stance makes explicit that Twitter is not interested in developers working on rival client apps. While existing apps are OK for now, new apps need not apply, said Sarver:

Developers have told us that they’d like more guidance from us about the best opportunities to build on Twitter.  More specifically, developers ask us if they should build client apps that mimic or reproduce the mainstream Twitter consumer client experience.  The answer is no.

Instead new developers should consider different opportunities such as building publisher tools, curation applications, real-time data analysis, social CRM services for enterprise clients and the kind of integration that Foursquare and Instagram have included in their apps.

For those developers like Ubermedia who continue to put out Twitter clients, Sarver said they will be held to a very high standard as laid out in an updated terms of service agreement. Ubermedia last month had several of its apps including UberTwitter and Twidroyd temporarily shut down for policy violations that stemmed from privacy, monetization and trademark issues. The episode illustrated the hard line Twitter intends to take with client developers. And now with its more explicit roadmap for developers, Twitter is underscoring the risk associated with building a business atop another company’s platform.

So why is Twitter doing this, basically discouraging a community of developers that helped propel it to success? Sarver said the multitude of apps can sow confusion with consumers when the user experiences and functionality differ between them and the official Twitter application and website. He said this fragmentation and inconsistency, along with potential for privacy and other policy violations by developers, requires Twitter to push for a more uniform experience and discourage new Twitter apps.

This is, of course, Twitter’s right. They own the platform. And they have increasingly shown they want to be the primary way people consume and interact with tweets. This is important as the company looks to squeeze more money out of its burgeoning network and figure out its business model. The company already has bought up clients like Tweetie, and now its apps and website are the top five ways people access the service, said Sarver. He said 90 percent of active Twitter users use official Twitter apps on a monthly basis. Still, Twitter hosts an ecosystem of some 750,000 apps, which it will continue to support.

Now whether developers will continue to turn to Twitter is another question. RSS pioneer Dave Winer said the new roadmap for developers underscores the need for developers to look at building off the Internet instead of platforms in which the owner is too active. “The Internet remains the best place to develop because it is the Platform With No Platform Vendor. Every generation of developers learns this for themselves.”

For now, it looks like Twitter is trying to push developers into helping business customers figure out how to leverage the service. But developers who choose that route have also got to wonder how long it will be before Twitter wants to go after that business too. With Twitter being so active and now throwing its weight around, it’s hard for developers to know how far Twitter’s ambitions go and how long before they too are looked at as unwelcome competition.

Related content from GigaOM Pro (subscription req’d):

  • The Near-Term Evolution of Social Commerce
  • A 2011 Connected Consumer Forecast
  • A 2011 NewNet Forecast





GigaOM


Facebook Unveils the Secrets Behind the Like Button

Posted by on Tuesday, 8 March, 2011

If there’s one thing that websites and publishers can’t get enough of, it’s analytics — data-mining tools like Google Analytics and real-time snapshots of activity like Chartbeat, which show who comes to a site and when, where they come from, and what they do when they get there. Now websites can get that kind of info from Facebook too, thanks to some new analytical tools that the social network launched today, which give publishers insights via Facebook’s plugins — including the ubiquitous “like” button. As social media starts to drive more and more traffic to websites, such tools are becoming even more important.

Facebook has had analytics for its own pages for some time, which show “fan” page administrators how users are interacting with the pages, whether they are sharing content, etc. — along with particulars about their age, sex and any other demographic info they have chosen to share through the network. And since it launched its social plugins last year, the network has provided some data about how users are responding to “like” buttons, etc. But the new features it launched today provide a lot more information, and real-time data, about that activity. The analytics include:

  • Like button analytics: Facebook provides anonymized data to show sites the number of times people saw “like” buttons on their pages (known as “impressions”), how many times they clicked on them, as well as how many times people saw those buttons on Facebook and clicked through to the site.
  • Comment analytics: Sites can see the number of times people saw the comment plugins that Facebook recently launched, how many times they actually posted a comment, and how many times they clicked through from a comment that was cross-posted from the site to Facebook.
  • Demographic analytics: Just as it does with Facebook pages, the social network can show websites aggregated demographic data about the visitors to their pages who logged in with their Facebook profile.
  • Organic sharing analytics: Even if a site doesn’t use the Facebook open-graph social plugins, Facebook’s new analytics offer data on how often content from that site is shared on the network, either by someone pasting a URL or sharing in some other way.
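As a rough illustration of what a publisher might do with the like-button numbers, the Python sketch below computes a click rate from impressions and clicks. The `LikeButtonStats` class and its field names are hypothetical and do not reflect the format in which Facebook actually delivers this data.

```python
from dataclasses import dataclass


@dataclass
class LikeButtonStats:
    impressions: int  # times the "like" button was seen on a page
    clicks: int       # times it was clicked

    def click_rate(self) -> float:
        """Share of impressions that led to a click; 0.0 when there were no impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0


story = LikeButtonStats(impressions=2000, clicks=50)
rate = story.click_rate()
```

Comparing this rate across stories is one simple way to see which content is getting engagement, independent of how much raw traffic each page receives.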

Although many websites and publishers have concerns about integrating themselves so tightly with Facebook, in part because of the control that gives the giant social network (and in some cases, concern about the impact on users’ privacy), there is no question that this kind of data analysis is going to be very appealing to a lot of sites — particularly the ones that are using its social tools to expand their reach, and looking for evidence that this strategy is working. They can see exactly which content is getting engagement and when.

Already, some sites such as Talking Points Memo have started to notice that Facebook is generating a growing amount of their traffic (the Nieman Journalism Lab is asking other sites to submit data about where their traffic comes from, so it can track those patterns). And the implementation of Facebook comments is likely to drive those numbers higher for many, although there are concerns about that as well.

One risk for publishers, however, is that they start to focus only on users who log in via Facebook and spend less time paying attention to visitors who don’t. And the ultimate extension of that kind of thinking, of course, is to give up on your website altogether and just use a Facebook page, as the hyper-local community site Rockville Central recently did, something the social network is no doubt happy to facilitate.

Related GigaOM Pro content (sub req’d):

  • Are Comments Facebook’s Next Big Service?
  • Social Advertising Models Go Back to the Future
  • What Facebook Messages Is Really After

Post and thumbnail courtesy of Flickr user Retinafunk





GigaOM