Tutorials

1. Large-scale Knowledge Graphs in Information Fusion Applications

Monu Kedia and Srikanta Bedathur Jagannat

IBM Research Lab, New Delhi.

Abstract

Information mash-up, or information integration, is changing at a fundamental level. Fusing information from structured (possibly intra-organization) sources with unstructured and external sources such as social media has become ever more crucial. It is neither practical nor efficient to build applications and run analytics with data residing in many different sources in isolation. The search for a canonical representation that can glue together these disparate sources has led to the use of knowledge graph representations, which model relationships between real-world entities, for large-scale, flexible information fusion. Knowledge graphs are property graphs with types and attributes associated with nodes and edges. The representation expresses real-world entities and their relationships in a strikingly simple yet powerful way. Figure 1 shows a simple knowledge graph in an enterprise setting, modeling the company, employees, customers and products. This tutorial aims to give the audience a broad overview of knowledge-graph-based computing and to enable them to apply it in real-world scenarios. The subsequent sub-sections discuss various details of the proposal.
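
As an illustration of the property-graph idea described in this abstract, the following is a minimal sketch in Python, not the presenters' system. The class names (`Node`, `Edge`, `PropertyGraph`), the helper `build_example`, and all sample entities, edge types, and attribute values are hypothetical and chosen only to mirror the enterprise setting (company, employees, customers, products) mentioned above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """A typed node carrying arbitrary key/value attributes."""
    node_id: str
    node_type: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A typed, directed edge carrying arbitrary key/value attributes."""
    source: str
    target: str
    edge_type: str
    attrs: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph (illustrative only, not scalable)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, node_id: str, node_type: str, **attrs: Any) -> None:
        self.nodes[node_id] = Node(node_id, node_type, attrs)

    def add_edge(self, source: str, target: str, edge_type: str, **attrs: Any) -> None:
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, edge_type, attrs))

    def neighbors(self, node_id: str, edge_type: Optional[str] = None) -> List[Node]:
        """Return targets of outgoing edges, optionally filtered by edge type."""
        return [
            self.nodes[e.target]
            for e in self.edges
            if e.source == node_id and (edge_type is None or e.edge_type == edge_type)
        ]


def build_example() -> PropertyGraph:
    """Build a hypothetical enterprise graph with made-up sample data."""
    g = PropertyGraph()
    g.add_node("acme", "Company", name="Acme Corp")
    g.add_node("alice", "Employee", name="Alice", role="Engineer")
    g.add_node("bob", "Customer", name="Bob")
    g.add_node("widget", "Product", name="Widget")
    g.add_edge("alice", "acme", "WORKS_FOR", since=2012)
    g.add_edge("acme", "widget", "SELLS")
    g.add_edge("bob", "widget", "BOUGHT", quantity=3)
    return g
```

Calling `build_example().neighbors("bob", "BOUGHT")` returns the product nodes that the customer bought. Because types and attributes live on both nodes and edges, new sources can be fused by adding nodes and edges without changing a fixed schema.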

2. Applications of Data Mining to detection of frauds in various domains

Girish Keshav Palshikar

Systems Research Lab., Tata Consultancy Services.

Abstract

Frauds are financial crimes that are widespread in various domains, such as credit cards, telecom, banking, insurance, stock markets and, more generally, financial accounting. Money laundering is another kind of financial crime, aimed at converting “dirty money” into “clean” money. Frauds cause heavy financial losses to organizations and erode their credibility among customers. Frauds are usually white-collar crimes that exploit limitations and weaknesses in business processes to obtain money, goods or services. This tutorial will begin by conveying the nature, motivation and scale of frauds in various domains. We will then describe example typologies of frauds in some particular domains. We will outline features of IT systems and business processes for detecting, preventing and controlling frauds. We will then identify challenges in developing and deploying effective fraud detection algorithms and cover some specific analytics techniques reported for the detection of particular types of frauds. Finally, we will present our conclusions and outline opportunities for research in fraud detection. The tutorial will focus primarily on frauds in stock markets and on money laundering.

3. Processing Spatio-Temporal Data On Map-Reduce

Himanshu Gupta and Sriram Lakshminarsi

IBM Research Lab India.

Abstract

The amount of spatio-temporal data generated in numerous scientific and industrial settings has exploded in recent years. Without a distributed platform, supporting efficient analytics on such voluminous data becomes prohibitively expensive. As a result, there has been an increasing interest in using map-reduce to parallelize the processing of large-scale spatio-temporal data. While Hadoop, currently the de facto map-reduce implementation, has been shown to be effective in processing large volumes of unstructured data, several key issues need to be addressed to exploit its power for processing spatio-temporal data. In this tutorial, we explore design techniques for processing spatio-temporal data on map-reduce, based on recent work in this area. We outline strategies for devising map-reduce algorithms that perform fundamental spatial analytics involving computational geometry operations, as well as two-way and multi-way spatial join operations. We discuss the storage aspects involved in efficient spatio-temporal data organization on HDFS, such as partitioning and colocation, along with indexing for fast spatial data access.
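
To make the idea of a two-way spatial join on map-reduce more concrete, here is a minimal single-process sketch in Python. It is an assumption about one common approach (uniform grid partitioning with reference-point duplicate avoidance), not necessarily the technique the presenters cover, and it simulates the map, shuffle, and reduce phases in memory rather than calling any Hadoop API. The cell size `CELL`, the function names, and the rectangle format `(xmin, ymin, xmax, ymax)` are all hypothetical.

```python
from collections import defaultdict
from itertools import product

CELL = 10.0  # hypothetical grid cell size; a real job would tune this


def cells_for(rect):
    """Yield every grid cell (cx, cy) that the rectangle overlaps."""
    xmin, ymin, xmax, ymax = rect
    xs = range(int(xmin // CELL), int(xmax // CELL) + 1)
    ys = range(int(ymin // CELL), int(ymax // CELL) + 1)
    return product(xs, ys)


def map_phase(tag, records):
    """Map: replicate each rectangle to every cell it overlaps."""
    for rid, rect in records:
        for cell in cells_for(rect):
            yield cell, (tag, rid, rect)


def intersects(a, b):
    """True if two axis-aligned rectangles overlap (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def reduce_phase(cell, values):
    """Reduce: join the R and S rectangles that landed in the same cell."""
    left = [v for v in values if v[0] == "R"]
    right = [v for v in values if v[0] == "S"]
    for (_, rid, r), (_, sid, s) in product(left, right):
        if not intersects(r, s):
            continue
        # Reference point: lower-left corner of the intersection. Only the
        # cell containing it reports the pair, so each pair is emitted once
        # even when both rectangles were replicated to several cells.
        ref_cell = (int(max(r[0], s[0]) // CELL), int(max(r[1], s[1]) // CELL))
        if ref_cell == cell:
            yield rid, sid


def spatial_join(r_records, s_records):
    """Simulate map, shuffle (group by cell), and reduce in one process."""
    groups = defaultdict(list)
    for tag, records in (("R", r_records), ("S", s_records)):
        for cell, value in map_phase(tag, records):
            groups[cell].append(value)
    results = []
    for cell, values in groups.items():
        results.extend(reduce_phase(cell, values))
    return sorted(results)
```

In this sketch the grid cell plays the role of the shuffle key, so rectangles that cannot intersect are never compared unless they share a cell. Replicating a rectangle to every overlapping cell trades extra shuffle volume for correctness at cell boundaries.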

4. Location-aware Review Analytics: A Big Data Perspective

Dhaval Patel and Sahisnu Mazumder

IIT Roorkee.

Abstract

A large number of location-aware reviews have accumulated on the web in recent years, owing to the growing popularity among users of location-based social networking platforms such as Yelp, Foursquare and Brightkite. By analyzing location-aware review data, we can discover various kinds of location-specific knowledge and develop a myriad of location-aware recommendation applications. However, processing large-scale review data using NLP techniques is not an easy task, because the knowledge discovery process is time-consuming. We can leverage big data analytics platforms like Hadoop and the map-reduce programming model to process these data in real time.

5. Text Analytics in Big Data Environment

K. M. Annervaz, Anutosh Maitra and Shubhashis Sengupta

Accenture Technology Labs.

Abstract

It is widely reported that about 80% of enterprise and social data from which actionable intelligence can potentially be extracted is in unstructured formats, mainly images, video, voice and text. While Big Data environments have traditionally focused on working with large and often structured data sets for analytics applications, the processing of unstructured textual data to extract analytical insights is gaining importance. In this tutorial, we will focus on text analytics with Big Data, touching upon issues related to textual data collection and aggregation, data preprocessing, data consolidation, analytics and visualization. The tutorial will discuss the core technologies of Natural Language Processing, Semantic Data Aggregation and Processing, Common Text Processing Pipeline Architecture, Text Processing with the Hadoop technology ecosystem, Common Text Analytics Algorithms, Inference and Reasoning from Text, and Visualization. The tutorial will have a balanced mix of theory and practice, with examples and demonstrations. Various industry domains and applications will be discussed, especially in the area of medical text processing.