Bidisha Mondal - Developer Portfolio

Case Study - Fake News Detection

A machine learning model that can accurately identify and categorize news articles from various news and social media platforms as either fake or legitimate.

Industry
Computer Software
Year
Service
Predictive Analysis

Overview

In today's digital era, the rapid spread of information across various online platforms has made fake news a major problem. Our project utilizes machine learning algorithms to automatically assess the authenticity of news articles, offering a crucial resource for fighting misinformation.

Fake News Analyser and Predictor is an academic project focused on building a machine learning model that classifies news articles from news and social media platforms as either fake or legitimate. The spread of fake news can have significant negative impacts on individuals and society. To address this, we train the model on a labeled dataset of fake and real news articles.

We trained four different classifiers and compared their performance.

  1. Logistic Regression
  2. Decision Tree Classifier
  3. Random Forest Classifier
  4. Naive Bayes Classifier

Implementation

We are utilizing a labeled dataset that includes news articles and their respective classifications (true or false).

This dataset is organized into two categories:

True: Authentic news articles
False: Fake or fabricated news articles

The implementation of the fake news detection system begins with importing essential libraries such as Pandas, Matplotlib, NumPy, Seaborn, and several modules from Scikit-Learn. These libraries handle data manipulation, visualization, and the building of machine learning models. Following this, the two datasets, one containing fake news and one containing real news, are loaded using Pandas' read_csv function. Both are then previewed to inspect their structure and content. Subsequently, a new column labeled 'class' is added to each dataset, with fake news assigned a class of 0 and real news a class of 1.
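The loading and labeling step described above can be sketched as follows. This is a minimal illustration rather than the project's actual notebook: the file names in the comment, the 'text' column name, and the sample headlines are assumptions, and small in-memory frames stand in for the CSV files so the snippet runs on its own.

```python
import pandas as pd

# In the project the two frames would come from CSV files, for example:
#   fake_df = pd.read_csv("Fake.csv")   # hypothetical file name
#   true_df = pd.read_csv("True.csv")   # hypothetical file name
# Tiny in-memory stand-ins are used here so the sketch is self-contained.
fake_df = pd.DataFrame({"text": ["Aliens endorse local candidate",
                                 "Miracle pill cures every disease"]})
true_df = pd.DataFrame({"text": ["Parliament passes the annual budget bill",
                                 "Central bank keeps interest rates unchanged"]})

# Preview the structure and content of each dataset.
print(fake_df.head())
print(true_df.head())

# Add the 'class' column: 0 for fake news, 1 for real news.
fake_df["class"] = 0
true_df["class"] = 1
```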

Once the data is prepared, it is combined into a single dataframe and shuffled to ensure a random distribution of fake and real news. The text data undergoes preprocessing, which includes cleaning and transforming the text to a format suitable for machine learning algorithms. This involves removing punctuation, converting text to lowercase, and applying tokenization. The processed text data is then vectorized using TF-IDF vectorization, converting the text into numerical features that can be used to train the models. The dataset is split into training and testing sets to evaluate the performance of the models.
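A rough sketch of this preparation stage is shown below, under several assumptions: the 'text' column name, the sample rows, the clean_text helper, the 25% test size, and the random seeds are all illustrative and not taken from the project. Tokenization is left to TfidfVectorizer's default tokenizer. The sketch splits the data before fitting the vectorizer so that the TF-IDF vocabulary is learned from the training portion only; that ordering is a choice made here, not something the description above prescribes.

```python
import re
import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split


def clean_text(text: str) -> str:
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    return " ".join(text.split())


# Illustrative stand-ins for the labeled fake (0) and real (1) datasets.
fake_df = pd.DataFrame({"text": ["SHOCKING: aliens endorse candidate!",
                                 "Miracle pill cures everything, doctors stunned",
                                 "Moon landing was filmed in a basement",
                                 "Celebrity secretly replaced by a robot"],
                        "class": 0})
true_df = pd.DataFrame({"text": ["Parliament passes the annual budget bill.",
                                 "Central bank keeps interest rates unchanged.",
                                 "City council approves new bus routes.",
                                 "University publishes yearly enrollment figures."],
                        "class": 1})

# Combine into one dataframe and shuffle the rows.
news = pd.concat([fake_df, true_df], ignore_index=True)
news = news.sample(frac=1, random_state=42).reset_index(drop=True)

# Clean the text column.
news["text"] = news["text"].apply(clean_text)

# Split into training and testing sets.
x_train, x_test, y_train, y_test = train_test_split(
    news["text"], news["class"], test_size=0.25, random_state=42
)

# Convert text to TF-IDF features; fit on the training split only.
vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)
```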

Several machine learning models are then trained on the vectorized data: Logistic Regression, Decision Tree, Random Forest, and Naive Bayes classifiers. Each model is evaluated on the test set using accuracy and other performance metrics. The implementation also includes functions that predict the class of manually entered news articles, using the trained models to classify new inputs as fake or real. Training several classifiers on the same preprocessed features allows their results to be compared directly.
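The training, evaluation, and manual prediction steps might look roughly like the sketch below. The sample texts, the predict_news helper, the hyperparameters, and the choice of MultinomialNB as the Naive Bayes variant are assumptions made for illustration; the project may differ on any of them. With a toy dataset this small, the printed accuracies say nothing about real performance.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: 0 = fake, 1 = real.
train_texts = ["aliens endorse candidate in secret meeting",
               "miracle pill cures everything doctors stunned",
               "celebrity secretly replaced by a robot",
               "parliament passes the annual budget bill",
               "central bank keeps interest rates unchanged",
               "city council approves new bus routes"]
train_labels = [0, 0, 0, 1, 1, 1]
test_texts = ["secret robot aliens stunned doctors",
              "council passes budget for new bus routes"]
test_labels = [0, 1]

# TfidfVectorizer lowercases and tokenizes by default.
vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(train_texts)
xv_test = vectorizer.transform(test_texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": MultinomialNB(),  # assumed variant
}

# Train each model and record its accuracy on the held-out texts.
accuracies = {}
for name, model in models.items():
    model.fit(xv_train, train_labels)
    accuracies[name] = accuracy_score(test_labels, model.predict(xv_test))
    print(f"{name}: accuracy = {accuracies[name]:.2f}")


def predict_news(text: str) -> dict:
    """Classify one manually entered article with every trained model."""
    features = vectorizer.transform([text])
    return {
        name: "Real" if model.predict(features)[0] == 1 else "Fake"
        for name, model in models.items()
    }


print(predict_news("Central bank approves the budget"))
```

Keeping the models in a dictionary means the same loop trains and scores all four, and the prediction helper reports each model's verdict side by side.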

Technologies

NumPy

Jupyter

Pandas

Matplotlib

Scikit-Learn

Git CLI

GitHub
