What is NLP?

Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics. The goal is for computers to process or “understand” natural language in order to perform tasks such as language translation and question answering.

With the rise of voice interfaces and chatbots, NLP is one of the most important technologies of the information age and a crucial part of artificial intelligence. Fully understanding and representing the meaning of language is an extremely difficult goal. Why? Because human language is quite special.

The field of artificial intelligence has always envisioned machines being able to mimic the functioning and abilities…

Understanding Encoders-Decoders, Sequence to Sequence Architecture in Deep Learning.

Translate from one language to another.

In deep learning, many complex problems can be solved by constructing a better neural network architecture. The RNN (Recurrent Neural Network) and its variants are very useful in sequence-to-sequence learning. The RNN variant LSTM (Long Short-Term Memory) is one of the most widely used cells in sequence-to-sequence (seq2seq) learning tasks.

The encoder-decoder architecture for recurrent neural networks is the standard neural machine translation method that rivals and in some cases outperforms classical statistical machine translation methods.

This architecture is very new, having only been pioneered in 2014, although it has been adopted as the core technology…


XGBoost stands for “Extreme Gradient Boosting”. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. It provides parallel tree boosting to solve many data science problems quickly and accurately.

XGBoost is a software library that you can download and install on your machine, then access from a variety of interfaces. Specifically, XGBoost supports the following main interfaces:

  • Command Line Interface (CLI).
  • C++ (the language in which the library is written).
  • Python interface as well as a model in scikit-learn.
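To make the gradient boosting idea concrete, here is a minimal pure-Python sketch. It is not XGBoost's implementation or API. It assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps); the function names (`fit_stump`, `fit_boosting`, `predict`), the toy data, and the `n_rounds` and `learning_rate` values are all hypothetical choices made for illustration.

```python
# Toy gradient boosting for squared-error regression with stumps.
# Illustration only: this is NOT how XGBoost is implemented.

def fit_stump(xs, residuals):
    """Pick the split threshold that minimizes squared error on the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Return the leaf value of the stump for input x."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical toy data with a roughly linear relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
model = fit_boosting(xs, ys)
```

Each round adds one small tree that corrects part of the remaining error, which is the additive structure that boosting libraries build on. The real library does considerably more; this sketch only shows the residual-fitting loop.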

A time series forecasting series.

Holt-Winters forecasting is a way to model and predict the behavior of a sequence of values over time, known as a time series. Holt-Winters is one of the most popular forecasting techniques for time series.

It’s decades old, but it’s still ubiquitous in many applications, including monitoring, where it’s used for purposes such as anomaly detection and capacity planning.

Holt-Winters is a model of time series behavior. Forecasting always requires a model, and Holt-Winters is a way to model three aspects of the time series: a typical value (average), a slope (trend) over time, and a…
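As a rough illustration of the first two aspects named above, the typical value (level) and the slope (trend), the sketch below implements level-and-trend exponential smoothing, often called Holt's linear method. It is a simplification and not the full Holt-Winters model. The function name `holt_forecast`, the smoothing weights `alpha` and `beta`, the initialization from the first two points, and the sample series are all assumptions chosen for the example.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Forecast `horizon` steps ahead using smoothed level and trend.

    alpha weights the newest observation when updating the level;
    beta weights the newest level change when updating the trend.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Simple initialization (an assumption): first value and first difference.
    level = series[0]
    trend = series[1] - series[0]

    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # Extrapolate the final level along the final trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical, perfectly linear series: the forecast continues the line.
forecast = holt_forecast([10.0, 12.0, 14.0, 16.0, 18.0], horizon=3)
print(forecast)  # close to [20.0, 22.0, 24.0]
```

Larger values of `alpha` and `beta` make the model react faster to recent changes, at the cost of following noise more closely.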

A time series is a sequence of observations taken sequentially in time.

Time series forecasting uses information regarding historical values and associated patterns to predict future activity. Most often, this relates to trend analysis, cyclical fluctuation analysis, and issues of seasonality.

Observation of trend, seasonality, and random

Why Time Series Forecasting?

Time series forecasting is useful when the independent variables are:

  • Unknown
  • Not available
  • A poor fit for the data
  • Difficult to forecast

Typical Time Series

Essential Skills You Need to Know to Start Doing Data Science.

Data science is ever-evolving, so mastering its foundational technical and soft skills will help us succeed in a career as a data scientist and pursue advanced concepts such as deep learning and artificial intelligence.

Data science is a broad field that includes several subdivisions:

  • Data Preparation and Exploration
  • Data Representation and Transformation
  • Data Visualization and Presentation
  • Predictive Analytics
  • Machine Learning, etc.

Lie #1

Money is the root cause of all evil.

From an early age, thinking about money was not encouraged!

It was an unspoken rule that money is important, but thinking about it is evil. After all, money was the reason behind the fights, the wars, the disagreements.

Now I know it is not money that is the cause. It is the importance we attach to it in our lives. Money is simply a medium of transaction. When it becomes an emotion, that is when it consumes us. And that is true for all things in life!

Lie #2

Be wary of those who are rich.

It was always assumed that getting rich was possible only…

A decision tree has many analogies in real life and, as it turns out, has influenced a wide area of machine learning, covering both classification and regression. Decision trees are sometimes also referred to as CART, short for Classification and Regression Trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making.

In This Blog, We’ll Cover the Following:

  • What Are Decision Trees?
  • Types of Decision Trees
  • Key Terminology
  • How To Create a Decision Tree
  • Gini Impurity
  • Chi-Square
  • Information Gain
  • Applications of Decision Trees
  • Decoding the Hyperparameters
  • Coding the Algorithm
  • Advantages…
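As a small illustration of the Gini impurity item in the list above, the sketch below computes the impurity of a set of class labels and the size-weighted impurity of a candidate split. The function names `gini_impurity` and `weighted_gini` and the "yes"/"no" labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def weighted_gini(left, right):
    """Average the impurity of two child nodes, weighted by their sizes."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical labels: an evenly mixed node and a split that separates it.
mixed = ["yes", "yes", "no", "no"]
print(gini_impurity(mixed))                        # 0.5
print(weighted_gini(["yes", "yes"], ["no", "no"]))  # 0.0
```

A tree builder that uses this criterion would prefer the split with the lowest weighted impurity, since lower values mean purer child nodes.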

Association rules can be thought of as an If-Then Relationship.

ARM (Association Rule Mining) is one of the important techniques in data science. In ARM, frequent patterns and associations among the item sets in a dataset are identified and then used to predict the next relevant item. The technique is mostly used to inform business decisions based on customer purchases.

Suppose a customer buys item A. ARM then estimates the chance that the customer also picks item B under the same transaction ID.

For example, people who buy diapers are likely…
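The "if A then B" idea can be sketched with two common measures: support (how often an item set appears) and confidence (how often B appears among the transactions that contain A). The snippet below is a minimal illustration under stated assumptions; the function names and the toy transactions over items A, B, and C are hypothetical.

```python
def count_containing(transactions, items):
    """Count the transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for transaction in transactions if wanted <= set(transaction))


def support(transactions, items):
    """Fraction of all transactions that contain the item set."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the fraction that also has the consequent."""
    antecedent_count = count_containing(transactions, antecedent)
    if antecedent_count == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / antecedent_count


# Hypothetical transactions, one set of items per transaction ID.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]

print(support(transactions, ["A", "B"]))      # 0.6  (3 of 5 transactions)
print(confidence(transactions, ["A"], ["B"]))  # 0.75 (3 of the 4 with A)
```

A rule such as "if A then B" is usually kept only when both numbers clear thresholds chosen by the analyst.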

  1. Spending less time on understanding the data and EDA.
  2. Not communicating well.
  3. Spending more time on theory without practical application.
  4. Focusing on accuracy over understanding how the model works.
  5. Giving preference to tools over business problems.
  6. Ignoring outliers.
  7. Using L1, L2 regularization without standardization.
  8. No proper understanding of how to transform categorical variables.
  9. Not picking the right loss function.
  10. Not focusing on the distribution of data.
  11. Assuming that correlation implies causation.
  12. Assuming the algorithms are more important than domain knowledge.
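Item 7 deserves a brief illustration. L1 and L2 penalties act on the size of the coefficients, and coefficient size depends on the units of each feature, so features on very different scales are penalized unevenly. A minimal sketch of z-score standardization follows; the `standardize` helper and the income and age columns are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


# Hypothetical features on very different scales.
incomes = [30000.0, 45000.0, 60000.0, 75000.0]
ages = [25.0, 35.0, 45.0, 55.0]

scaled_incomes = standardize(incomes)
scaled_ages = standardize(ages)

# After scaling, both columns are expressed in standard deviations,
# so a penalty on coefficient size treats them comparably.
print(scaled_incomes)
print(scaled_ages)
```

In practice the mean and standard deviation would be computed on the training data only and then reused for validation and test data.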


Jr Data Scientist | AI researcher
