Solve These Tough Data Problems and Watch Job Offers Roll In

Late in 2015, Gilberto Titericz, an electrical engineer at Brazil’s state oil company Petrobras, told his boss he planned to resign, after seven years maintaining sensors and other hardware in oil plants. By devoting hundreds of hours of leisure time to the obscure world of competitive data analysis, Titericz had recently become the world’s top-ranked data scientist, by one reckoning. Silicon Valley was calling. “Only when I wanted to quit did they realize they had the number-one data scientist,” he says.

Petrobras held on to its champ for a time by moving Titericz into a position that used his data skills. But since topping the rankings that October, he had received a stream of emails from recruiters around the globe, including representatives of Tesla and Google. This past February, another well-known tech company hired him, and this summer it moved his family to the Bay Area. Titericz described his unlikely journey recently over colorful plates of Nigerian food at the headquarters of his new employer, Airbnb.

Titericz earned, and holds, his number-one rank on a website called Kaggle that has turned data analysis into a kind of sport, and transformed the lives of some competitors. Companies, government agencies, and researchers post datasets on the platform and invite Kaggle’s more than one million members to discern patterns and solve problems. Winners get glory, points toward Kaggle’s rankings of its top 66,000 data scientists, and sometimes cash prizes.


Alone and in small teams with fellow Kagglers, Titericz estimates he has won around $100,000 in contests that included predicting seizures from brainwaves for the National Institutes of Health, the price of metal tubes for Caterpillar, and rental property values for Deloitte. The TSA and real-estate site Zillow are each running competitions offering prize money in excess of $1 million.

Veteran Kagglers say the opportunities that flow from a good ranking are generally more bankable than the prizes. Participants say they learn new data-analysis and machine-learning skills. Plus, the best performers, like the 95 “grandmasters” who top Kaggle’s rankings, are highly sought talent in an occupation crucial to today’s data-centric economy. Glassdoor has declared data scientist the best job in America for the past two years, based on the thousands of vacancies, good salaries, and high job satisfaction. Companies large and small recruit from Kaggle’s fertile field of problem solvers.

In March, Google came calling and acquired Kaggle itself. Kaggle has been integrated into the company’s cloud-computing division and has begun to emphasize features that let people and companies share and test data and code outside of competitions, too. Google hopes other companies will come to Kaggle for the people, code, and data they need for new projects involving machine learning—and run them in Google’s cloud.

Kaggle grandmasters say they’re driven as much by a compulsion to learn as to win. The best go to extreme lengths to do both. Marios Michailidis, a previous number one now ranked third, got the data-science bug after hearing a talk on entrepreneurship from a man who got rich analyzing trends in horse races. To Michailidis, the money was not the most interesting part. “This ability to explore and predict the future seemed like a superpower to me,” he says. Michailidis taught himself to code, joined Kaggle, and before long was spending what he estimates was 60 hours a week on contests—in addition to a day job. “It was very enjoyable because I was learning a lot,” he says.

Michailidis has since cut back to roughly 30 hours a week, in part due to the toll on his body. Titericz says his own push to top the Kaggle rankings, made not long after the birth of his second daughter, caused some friction with his wife. “She’d get mad with me every time I touched the computer,” he says.

Entrepreneur SriSatish Ambati has made recruiting Kagglers a core strategy of his startup, H2O, which makes data-science tools for customers including eBay and Capital One. Ambati hired Michailidis and three other grandmasters after he noticed a surge in downloads when H2O’s software was used to win a Kaggle contest. Victors typically share their methods in the site’s busy forums to help others improve their technique.


H2O’s data celebrities work on the company’s products, providing both expertise and a marketing boost akin to a sports star endorsing a sneaker. “When we send a grandmaster to a customer call, their entire data-science team wants to be there,” Ambati says. “Steve Jobs had a gut feel for products; grandmasters have that for data.” Jeremy Achin, cofounder of startup DataRobot, which competes with H2O and also has hired grandmasters, says high Kaggle rankings also help weed out poseurs trying to exploit the data-skills shortage. “There are many people calling themselves data scientists who are not capable of delivering actual work,” he says.

Competition between people like Ambati and Achin helps make it lucrative to earn the rank of grandmaster. Michailidis, who works for Mountain View, California-based H2O from his home in London, says his salary has tripled in three years. Before joining H2O, he worked for customer analytics company Dunnhumby, a subsidiary of supermarket Tesco.

Large companies like Kaggle champs, too. An Intel job ad posted this month seeking a machine-learning researcher lists experience winning Kaggle contests as a requirement. Yelp and Facebook have run Kaggle contests that dangle a chance to interview for a job as a prize for a good finish. The winner of Facebook’s most recent contest last summer was Tom Van de Wiele, an engineer for Eastman Chemical in Ghent, Belgium, who was seeking a career change. Six months later, he started a job at Alphabet’s artificial-intelligence research group DeepMind.

H2O is trying to bottle some of the lightning that sparks from Kaggle grandmasters. Select customers are testing a service called Driverless AI that automates some of a data scientist’s work, probing a dataset and developing models to predict trends. More than 6,000 companies and people are on the waitlist to try Driverless. Ambati says that reflects the demand for data-science skills, as information piles up faster than companies can analyze it. But no one at H2O expects Driverless to challenge Titericz or other Kaggle leaders anytime soon. For all the data-crunching power of computers, they lack the creative spark that makes a true grandmaster.
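For a rough sense of what “probing a dataset and developing models” can mean when automated, the toy sketch below tries a handful of candidate models on a sample dataset and keeps the best one by cross-validation. It is a minimal illustration of the general idea, using scikit-learn and a bundled dataset; it is not how H2O’s Driverless AI actually works.

    # Toy sketch of automated model selection: score several candidate
    # models by cross-validation and keep the best. Illustrative only;
    # not H2O's Driverless AI.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    candidates = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
    }

    # Mean 5-fold cross-validation accuracy for each candidate.
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print(f"best model: {best} (mean CV accuracy {scores[best]:.3f})")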

“If you work on a data problem in a company you need to talk with managers and clients,” says Stanislav Semenov, a grandmaster in Moscow and former number one, now ranked second. He likes to celebrate Kaggle wins with a good steak. “Competitions are only about building the best models. It’s pure, and I love it.” On Kaggle, data analysis is not just a sport, but an art.


Consumer goods firms harness online data to tap Southeast Asia e-commerce boom

SINGAPORE/BANGKOK (Reuters) – When diaper maker DSG International (Thailand) wants to know what its customers are thinking, it often turns to Lazada, an e-commerce firm majority-owned by Alibaba Group Holding (BABA.N).


“From (their) data, we know mothers sometimes browse at night, so we can offer flash sales when we know customers are browsing,” says Ambrose Chan, the Thai company’s CEO.

Southeast Asia is the world’s fastest-growing internet market, home to 600 million consumers from Vietnam to Indonesia via Singapore, many of them tech- and social media-savvy. They are rapidly spending more time and money online. A Nielsen study in 2015 estimated Southeast Asia’s middle class will hit 400 million by 2020, doubling from 2012.

Gross merchandise value of e-commerce in Southeast Asia will balloon to $65.5 billion by 2021, from $14.3 billion last year, predicts consultancy Frost & Sullivan.

Research firm Euromonitor forecasts internet retailing in Indonesia, for example, will more than double to $6.2 billion by 2021, and Thailand will increase 85 percent to $2.8 billion.

(For a graphic on Southeast Asia internet sales click reut.rs/2l3qULe)

Consumer goods firms, such as Unilever (UNc.AS) and Japanese cosmetics firm Shiseido (4911.T), say the e-commerce boom allows them to push deeper into markets that can otherwise be difficult to understand and tough to penetrate due to poor retail networks and infrastructure.

“Data from Lazada has been used to position certain products where consumer preferences are different. For example, Thai customers like to buy diapers in special cartons, while Malaysians prefer multiple packs,” says Chan.

To reach more customers and get a better handle on their online behavior, consumer goods companies are forging partnerships with e-commerce firms like Lazada and fashion website Zalora.

POWERFUL, INSIGHTFUL

A customer who clicked on a 50 ml product may instead buy a smaller 30 ml product, said Pranay Mehra, vice president, digital and e-commerce at Shiseido Asia Pacific, noting that data and online selling experience can help firms bundle offers, decide on packaging and distribution, and influence where to set up a physical presence.

“This data is very powerful and very insightful, if used properly,” Mehra added.

Unilever, whose products range from Hellmann’s mayonnaise to Dove soap, said it is seeing more demand from rural consumers in developing markets like Indonesia and Vietnam.


“With all our e-commerce partners, we’re using data to help us find innovative solutions to unlock key barriers of high cost delivery and poor credit card penetration in remote areas,” said Anusha Babbar, e-commerce director at Unilever Southeast Asia and Australasia.

The conglomerate, which works with the likes of Singapore online grocer RedMart, Indonesia’s Blibli and Vietnam’s Tiki, said it introduced its St Ives skincare brand on Lazada after shopper search data pointed to a trend towards natural products.

DATA AND LOGISTICS

“Traditional retailers will struggle to see customer behavior,” said Lazada Thailand’s CEO, Alessandro Piscini. “We can tell if a customer is pregnant from their search behavior.”


Lazada, he said, plans to use data science to help its merchants customize offers for specific customer groups based on age, gender and other preferences.

Zalora, which sells clothing and accessories online in markets including Singapore, Malaysia and Indonesia, said it was working on ad-hoc projects with some brands to help them understand their customers based on data.

Lazada and Zalora are among the few e-commerce platforms that operate in multiple Southeast Asian countries. But the region is becoming a new battleground as Amazon (AMZN.O) and JD.com (JD.O) establish beachheads in Singapore and Thailand.

Lazada Thailand will focus on partnering with fast-moving consumer goods companies to maintain its lead, Piscini said, and is expanding its logistics footprint across a region that has poor roads, clogged cities and thousands of often remote islands.

To be sure, online still contributes only a tiny portion of consumer goods companies’ sales, but some local firms are going beyond partnerships and investing in their own e-commerce capabilities.

Thailand’s top consumer goods manufacturer Saha Group (SPI.BK) (SPC.BK) has seen online sales of some of its brands rise tenfold since it began a partnership with Lazada in June, but online still represents just 1-2 percent of total sales.

Saha is using e-commerce data to customize offerings.

“We now make real-time offerings to customers. Before, promotions would be seasonal,” Chairman Boonsithi Chokwatana told Reuters.

The company, whose products include instant noodles, toothpaste and laundry detergent, is investing 2 billion baht ($60 million) in logistics to support its e-commerce ambitions, including a 21-storey warehouse and a big data team, he said.

Reporting by Aradhana Aravindan in SINGAPORE and Chayut Setboonsarng in BANGKOK; Editing by Ian Geoghegan



Data center construction increases thanks to the cloud

A new report from a real estate firm that specializes in data center construction and leasing says data center construction in North America is up 43 percent over the same period in 2016, and industry consolidation has driven $10 billion in mergers and acquisitions (M&A) so far.

Jones Lang LaSalle just published its report on the North America data center market, highlighting trends such as consolidation, enterprise hybrid cloud, security, and high-performance computing.


While construction continues at a record clip, the report also found that absorption of data center space available for lease has returned to normal levels after record leasing in 2016, suggesting many of the cloud providers are still digesting the capacity they picked up last year.



IDG Contributor Network: Has big data reached a tipping point in the cloud?

There is no doubt that big data analytics is fast becoming integral to business intelligence. Despite many early failed projects, primarily due to the massive infrastructure needed to store, process and analyze big data in-house, there is a growing number of success stories, which gives pause to anyone ready to discount the paradigm entirely.

Moving big data analytics to the cloud seems to accompany these successes. It is impossible to ignore the competitive edge gained by organizations leveraging big data analysis. From real-time data analytics facilitating industrial processes to financial trading algorithms, big data is a definitive part of the corporate future.



Amazon, VMware rumored to be developing data center software

Amazon Web Services (AWS) and VMware are reportedly in talks about possibly teaming up to develop data center software products, according to The Information, which cited anonymous sources.

Unfortunately, the article doesn’t have much, if any, detail on what that product would be. The speculation is that it might be a stack-like product, since VMware already provides what would be the base software for such a product, and stacks are becoming the in thing.

Already there is OpenStack, the open-source product that runs cloud services in a data center, and Microsoft just shipped Azure Stack, its answer to OpenStack, which allows the same features of its Azure public cloud to run within a company’s private data center.



IDG Contributor Network: Data at the edge: the promise and challenge

What happens when cloud computing goes away? A bold—and perhaps surprising—question. But one that Peter Levine, general partner at Andreessen Horowitz, didn’t shy away from asking during a presentation at the VC firm’s a16z Summit in 2016.

Just as the distributed client server model that took off in the 1980s replaced the centralized mainframes of the 1960s and 1970s, distributed edge intelligence will replace today’s centralized cloud computing, Levine predicts. And he believes this change is already underway but will really take off beginning in 2020.

“Everything that’s ever popular in technology always gets replaced by something else,” Levine said. “It always goes away. That’s either the opportunity or the beauty of the business.”



IDG Contributor Network: Got big data? Go public cloud or go home.

The impact the cloud is having on big data and advanced analytics is shocking. We’ve hit a go public or go home situation – and while many enterprises I’ve spoken to about this migration are still on the fence, they understand they need to invest in more public cloud to engage with empowered customers. The problem is many are struggling with organizational momentum and regulatory issues that often manifest in technical objections that don’t hold water.

Public cloud was the number one priority for big data in 2016. Why? Because firms are running into a cost wall as they scale out their on-premises infrastructures. They want to go bigger and faster, and on-premises configurations, including the on-premises portion of hybrid, can’t keep pace. The consensus in the industry is that hybrid is the best most can do – I disagree. Firms should have a public-first policy and rely on hybrid or on-premises deployments as interim measures only when necessary.



IDG Contributor Network: Accelerating Organizational Velocity through a Data Center Autopilot

Understanding the impact of the data center autopilot

Current state of the art and my disappointment with traditional databases aside, I mentioned in my comments last week that the data center autopilot will have big consequences. It seems to me that there is not enough recognition of the likely impact. The tactical observations are that automation will reduce people costs, at least on a per-workload basis, and that it will:

  • Minimize over-provisioning,
  • Help reduce downtime,
  • Help to manage SLAs, and
  • Improve transparency, governance, auditing and accounting.

That is all true, but it’s not the big story: The overall strategic impact is to significantly accelerate organizational velocity. The acceleration comes partly from the efficiencies above, but far more from automated decisions being made and implemented orders of magnitude faster than manual decisions can be. Aviation autopilots do things that human pilots are not fast enough to do. They are used to stabilize deliberately unstable aircraft such as the Lockheed F-117 Nighthawk at millisecond timescales, and deliver shorter flight times by constantly monitoring hundreds of sensors in real time and optimally exploiting jetstreams.
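To make the timescale argument concrete, here is a minimal sketch of one small piece of such an autopilot: a capacity control loop that re-evaluates utilization every few seconds and resizes a workload immediately, with no ticket queue or change review in the path. The get_cpu_utilization and scale_to hooks are hypothetical stand-ins for a real metrics API and orchestrator; the sketch illustrates the idea of machine-speed decisions, not any particular product.

    # Minimal sketch of an automated capacity "autopilot" loop.
    # get_cpu_utilization() and scale_to() are hypothetical hooks into a
    # real metrics API and orchestrator; all values here are illustrative.
    import time

    TARGET_UTILIZATION = 0.60          # keep average CPU near 60 percent
    MIN_REPLICAS, MAX_REPLICAS = 2, 50

    def desired_replicas(current: int, utilization: float) -> int:
        """Replica count that brings utilization back toward the target."""
        wanted = round(current * utilization / TARGET_UTILIZATION)
        return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))

    def run(get_cpu_utilization, scale_to, replicas: int = MIN_REPLICAS):
        while True:
            utilization = get_cpu_utilization()        # e.g. a value from 0.0 to 1.0
            wanted = desired_replicas(replicas, utilization)
            if wanted != replicas:                     # act within seconds, not days
                scale_to(wanted)
                replicas = wanted
            time.sleep(5)                              # decide again in five seconds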



IDG Contributor Network: Data migrations without the migraines

Whether or not you work in the IT department, you have likely experienced the pain of migrating from one system to another. When you buy a new laptop or a new phone, you’re faced with having to back up and replicate your old data to your new system, or start from scratch with none of the files you might need on your new device.

Imagine this problem at enterprise scale. Moving terabytes of data is a daunting task that also requires planning and downtime when IT has to add a new storage system, an upgrade, or a replacement. Just like with our smartphones, the old system likely still has some value, but since data can’t move easily from one system to the other, the equipment we’re leaving behind often remains as a backup to the backup copy.



IDG Contributor Network: Data Center Automation and the Software-Defined Database

If you take a step back for a moment and think about airplane flight, it turns out that something rather extraordinary is happening. Most of the time the plane is being flown by an autopilot and the pilot is actually kind of a “meta pilot” – a minder that watches to ensure that the autopilot is not doing anything dumb. And every year, millions of us entrust our lives to this system – we’re not only okay with it, we’re in fact impressed that an autopilot can do that stuff so effectively. Against that backdrop, now consider how extraordinary it is that we don’t have computer software that can “fly” a data center. Don’t bet that it will stay that way. It is changing, and the changes are going to have big consequences.



OneLogin hack exposed sensitive US customer data and ability to decrypt data

OneLogin, an identity management company that provides a single sign-on platform for logging into multiple apps and sites, was hacked. US customer data was potentially compromised, “including the ability to decrypt encrypted data.”

The company, which claims “over 2000+ enterprise customers in 44 countries across the globe trust OneLogin,” announced the security incident on May 31. It was short on details, primarily saying the unauthorized access it detected had been blocked and law enforcement was notified.



How Cosmos DB ensures data consistency in the global cloud

Cloud computing isn’t like working on-premises. Instead of limiting code to one or maybe two datacenters, we’re designing systems that span not just continents but the entire world.

And that’s where we start to get issues. Even using fiber connections, the latency of crossing the Atlantic Ocean is around 60ms, though in practice delays are around 75ms. The Pacific is wider, so latency through trans-Pacific fiber is around 100ms.

Delays add up, and they make it hard to ensure that distributed databases are in sync. That makes it harder still to be sure that a query in the U.K. will return the same result as one in the U.S. Yes, most replication strategies mean that eventually the two will have the same content, but there’s a big question over just when that will happen. If the connections are busy, or there are a lot of database writes, data can easily get delayed.
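The toy simulation below shows the effect in miniature: a write lands on one replica immediately but only reaches a second replica after a fixed replication delay, so a read against the lagging replica can return stale data. The 75 ms figure and the replica names are illustrative assumptions, not Cosmos DB’s actual behavior or API.

    # Toy simulation of eventual consistency across two regions. The 75 ms
    # replication delay is an illustrative assumption, not Cosmos DB's API.
    import threading
    import time

    REPLICATION_DELAY_S = 0.075   # roughly a trans-Atlantic hop

    us_replica = {}   # the region that takes the write
    uk_replica = {}   # a remote replica that receives it later

    def write(key, value):
        """Write locally, then replicate asynchronously after the delay."""
        us_replica[key] = value
        def replicate():
            time.sleep(REPLICATION_DELAY_S)
            uk_replica[key] = value
        threading.Thread(target=replicate, daemon=True).start()

    write("order:42", "shipped")
    print("US read:", us_replica.get("order:42"))   # 'shipped' immediately
    print("UK read:", uk_replica.get("order:42"))   # None: replica still stale
    time.sleep(0.1)                                 # wait out the replication lag
    print("UK read, later:", uk_replica.get("order:42"))  # now 'shipped'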



Cloud can ease burden of data protection compliance, Google execs say

The EU’s General Data Protection Regulation (GDPR) is fast approaching but with significant resource investment required, many organisations are struggling to meet the May 2018 deadline. According to Google executives, moving data to the cloud will help take some of the pain out of upgrading security practices and data protection standards in line with the regulations.

GDPR is the biggest change to data protection regulations in two decades, and is a major challenge for many businesses. A survey from analyst firm Gartner released yesterday showed that around half of those affected by the legislation – whether in the EU or outside – will not be in full compliance when the regulations take effect.



IDG Contributor Network: Modern monitoring is a big data problem

Why did VMware acquire Wavefront? The start of the answer to this question comes with an understanding of what Wavefront is (or was). Wavefront was started by former Google engineers who set out to build a monitoring system for the commercial market that had the same features and benefits as the monitoring system that Google had built for itself.

Due to the massive scale of Google, such a system would have to have two key attributes:

  1. The ability to consume and process massive amounts of data very quickly. In fact, the Wavefront website makes the claim, “Enterprise-grade cloud monitoring and analytics at over 1 million data points per second.”
  2. The ability to quickly find what you want in this massive ocean of data.

So, it is clear that the folks at Wavefront viewed modern monitoring as a big data problem, and it is clear that some people at VMware were willing to pay a fair amount of money for a monitoring system that took a real-time and highly scalable approach to monitoring.
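As a rough illustration of what those two attributes mean in terms of data, the sketch below is a toy in-memory time-series store: it ingests (metric, tags, timestamp, value) points and answers a simple “this metric, these tags, since this time” query. It is only a model of the data shape involved, not Wavefront’s design; a real system shards and indexes this across a cluster to reach rates like a million points per second.

    # Toy in-memory time-series store: ingest tagged metric points and
    # query them by metric, tags, and time range. Illustrative only; a
    # real monitoring backend distributes and indexes this at far larger scale.
    import time
    from collections import defaultdict

    # (metric name, frozen tag set) -> list of (timestamp, value)
    series = defaultdict(list)

    def ingest(metric, value, tags, ts=None):
        series[(metric, frozenset(tags.items()))].append((ts or time.time(), value))

    def query(metric, tags, since):
        key = (metric, frozenset(tags.items()))
        return [(ts, v) for ts, v in series[key] if ts >= since]

    ingest("cpu.load", 0.42, {"host": "web-1"})
    ingest("cpu.load", 0.87, {"host": "web-1"})
    print(query("cpu.load", {"host": "web-1"}, since=time.time() - 60))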



IDG Contributor Network: Why CEOs believe problem solving through data is the future of business

A recent statistic caught my eye: according to a study by IDG Enterprise, 53 percent of IT decision-makers report that their company is planning to implement a data-driven project, specifically a project with the goal of generating greater value from existing company data within the next year. [1] In the same survey, 78 percent of respondents feel strongly that the analysis of big data could fundamentally change the way their company does business, and 71 percent feel strongly that data will create new revenue streams and lines of business within three years. [1]

What makes this interesting to me is that, at the same time, 90 percent of these same IT decision-makers report having directly experienced challenges in the use of such data – the same data that they predict will revolutionize the way their company looks at everything from business performance and revenue streams to supply chain and HR. The major pain points cited are data access and analysis (38 percent), followed by data transformation (17 percent), data creation/collection (13 percent), and migration issues (13 percent). [1] Even as they advocate spending on data projects, they have almost across the board experienced challenges in making those projects a successful reality.

