Publications | Tuhin Sharma

A Generalized Framework for Quantifying Trust of Social Media Text Documents

Sat, 16 Jul 2016 00:00:00 +0000

Abstract

Social media has become a very popular place for users seeking knowledge about a wide variety of topics. While it contains many helpful documents, it also contains many useless and malicious documents, or spam. For a casual observer, it is very hard to identify high-quality or trustworthy documents. As the volume of such data increases, the task of identifying trustworthy documents becomes more and more difficult. A large number of research works have focused on quantifying trust in specific social network domains. Some have quantified trust based on a social graph. In this work, we use one such social graph, the Reduced node Social Graph with Relationships (RSGR), and develop a three-step syntax- and semantics-based trust mining framework. Here we generalize the concept of trust mining to all structured as well as unstructured unsupervised text documents from all social network domains. We calculate trust based on metadata, then trust based on relationships with other documents, and finally we propagate the trust calculated so far along various relationship edges to obtain the final trust. We show that our method calculates the trust of social media text documents with more than 80% accuracy.
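
The propagation step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so the damped neighbour-averaging scheme below, the function name `propagate_trust`, and the `damping` and `iterations` parameters are all assumptions made for illustration, not the paper's method. Each document starts with an initial trust score (standing in for the combined metadata and relationship trust) and repeatedly blends it with the weighted mean trust of the documents it is linked to.

```python
def propagate_trust(initial, edges, damping=0.5, iterations=20):
    """Blend each document's own trust with the weighted mean trust of its neighbours.

    initial: dict doc_id -> trust in [0, 1] (stand-in for metadata + relationship trust)
    edges:   dict doc_id -> list of (neighbour_id, weight) pairs
    The damping scheme is an illustrative assumption, not the published method.
    """
    trust = dict(initial)
    for _ in range(iterations):
        updated = {}
        for doc, own in initial.items():
            neighbours = edges.get(doc, [])
            total = sum(w for _, w in neighbours)
            if total == 0:
                # Isolated documents keep their initial trust.
                updated[doc] = own
                continue
            neighbour_mean = sum(trust[n] * w for n, w in neighbours) / total
            updated[doc] = (1 - damping) * own + damping * neighbour_mean
        trust = updated
    return trust


# Hypothetical documents: "a" and "b" are linked, "c" has no relationships.
initial = {"a": 0.9, "b": 0.2, "c": 0.5}
edges = {"a": [("b", 1.0)], "b": [("a", 1.0)]}
final = propagate_trust(initial, edges)
```

In this toy run the low-trust document "b" is pulled up by its link to "a", "a" is pulled down somewhat, and the unlinked document "c" is left unchanged.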

A Generalized Relationship Mining Method for Social Media Text Data

Mon, 21 Jul 2014 00:00:00 +0000

Abstract

The increasing popularity of social media has resulted in the creation of a huge amount of user-generated documents. A large number of research works have focused on inferring relationships in specific social network domains. Few have considered structured data to establish syntax-based relationships. In this work, we develop a two-step syntax-based and semantics-based relationship mining approach. Here we generalize the concept of relationship mining to all structured as well as unstructured unsupervised text documents from all social network domains. First, we choose suitable features from each individual document and store them in a graph structure. Then we establish relationships in the generated graph to obtain the Reduced node Social Graph with Relationships (RSGR). Our empirical study on various social media documents validates the effectiveness of our approach and suggests its generality in finding relationships irrespective of the type of text document and the social network domain.
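
As a rough illustration of the two steps named in this abstract (store per-document features, then establish relationships between documents), the sketch below links documents whose feature sets overlap. The abstract does not say which features or similarity measure are used, so keyword sets, Jaccard similarity, the `threshold` value, and the function name `build_relationship_graph` are assumptions. This is a syntax-only stand-in and does not reproduce the RSGR construction or its semantic step.

```python
from itertools import combinations


def build_relationship_graph(features, threshold=0.3):
    """Link documents whose feature sets have Jaccard similarity >= threshold.

    features: dict doc_id -> set of feature strings (hypothetical keywords)
    Returns dict doc_id -> dict neighbour_id -> similarity.
    """
    graph = {doc: {} for doc in features}
    for a, b in combinations(features, 2):
        union = features[a] | features[b]
        if not union:
            continue  # two empty feature sets carry no evidence of a relationship
        similarity = len(features[a] & features[b]) / len(union)
        if similarity >= threshold:
            # Relationships are stored symmetrically.
            graph[a][b] = similarity
            graph[b][a] = similarity
    return graph


# Hypothetical documents and features.
docs = {
    "post1": {"python", "pandas", "dataframe"},
    "post2": {"python", "pandas", "numpy"},
    "post3": {"football", "league"},
}
graph = build_relationship_graph(docs)
```

Here "post1" and "post2" share two of four distinct features and are linked, while "post3" shares nothing with the others and stays unconnected.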

Dynamic Network Traffic Data Classification for Intrusion Detection Using Genetic Algorithm

Thu, 20 Dec 2012 00:00:00 +0000

Abstract

An Intrusion Detection System (IDS) classifies network traffic data as either ‘anomaly’ or ‘normal’ to protect computer systems from different types of attacks. In this paper, data mining concepts and a genetic algorithm are applied to classify online traffic data efficiently by developing a rule-based lazy classifier. The proposed method updates the rule set dynamically to accommodate changing patterns in the traffic data, in order to attain the highest classification accuracy while maintaining consistency. Based on the knowledge gained from its existing training data set, the classifier is able to detect variants of common network traffic data patterns, or modified versions of existing security attacks, with significant classification accuracy.

Generation of Sufficient Cut Points to Discretize Network Traffic Data Sets

Thu, 20 Dec 2012 00:00:00 +0000

Abstract

The classification accuracy and efficiency of an intrusion detection system (IDS) are largely affected by the discretization methods applied to continuous attributes. Cut generation is one such discretization method, and applying a variable number of cuts (in a partition) to the continuous attributes yields different classification accuracies. In this paper, to maximize the accuracy of classifying network traffic data as either ‘normal’ or ‘anomaly’, the proposed algorithm determines the set of cut points for each continuous attribute. Once the appropriate and necessary cut points have been generated, they are mapped into corresponding intervals following the centre-spread encoding technique. The learnt cut points are then applied to the test data set for discretization to achieve maximum classification accuracy.
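
To make the idea of discretizing with cut points concrete, here is a minimal sketch of applying an already-learnt set of cut points to a continuous attribute. It does not show how the paper selects the cut points. The helper names, the sample values, and the reading of centre-spread encoding as (interval midpoint, interval half-width) are assumptions for illustration.

```python
from bisect import bisect_right


def discretize(value, cut_points):
    """Return the index of the interval that `value` falls into.

    `cut_points` must be sorted ascending; k cut points give k + 1 intervals.
    A value equal to a cut point is placed in the interval above it.
    """
    return bisect_right(cut_points, value)


def centre_spread(index, cut_points, lower, upper):
    """Describe interval `index` as (centre, spread).

    Assumption: centre is the interval midpoint and spread is its half-width,
    with `lower` and `upper` bounding the attribute's range.
    """
    bounds = [lower, *cut_points, upper]
    lo, hi = bounds[index], bounds[index + 1]
    return ((lo + hi) / 2, (hi - lo) / 2)


cuts = [10.0, 50.0]                   # hypothetical cut points for one attribute
durations = [3.0, 10.0, 42.5, 99.0]   # hypothetical continuous values
codes = [discretize(v, cuts) for v in durations]
middle = centre_spread(1, cuts, lower=0.0, upper=100.0)
```

With two cut points the attribute is split into three intervals, so each value in `durations` is replaced by an interval index from 0 to 2, and `middle` describes the interval between the two cuts.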