Binary variables

Figure: an example of a binary variable (red dice / white dice).

A binary variable has only two possible values.

For example:

  • Yes / No
  • Success / Failure
  • Male / Female
  • The drug works / the drug doesn’t work
  • The light is on / the light is off
  • The email is spam / the email is not spam.

Use of binary variables

Figure: graph of two Bernoulli-distributed binary random variables.

Binary variables are often found in statistical analysis and in probability distributions, such as the Bernoulli distribution or the binomial distribution, but the term “binary variable” isn’t often used. You’ll probably see specific names such as “binomial random variable” or “Bernoulli random variable” instead. A random variable assigns a number to the outcome of an experiment. A Bernoulli random variable describes a single trial with just two possible outcomes, and a binomial random variable counts the successes across a fixed number of such trials.
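As a rough illustration of how the two distributions relate, the sketch below simulates a Bernoulli random variable with Python’s standard random module and sums repeated trials to get a binomial count. The function names bernoulli and binomial, the success probability of 0.3, and the seed are assumptions chosen for this example only.

```python
import random

def bernoulli(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial(n, p, rng):
    """Count the successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Each trial is a binary variable: it can only be 0 or 1.
trials = [bernoulli(0.3, rng) for _ in range(10_000)]
share_of_ones = sum(trials) / len(trials)  # should land close to p = 0.3

# A binomial variable is a count, so it ranges from 0 to n.
successes = binomial(20, 0.3, rng)
print(share_of_ones, successes)
```

Each individual trial is binary, while the binomial count built from the trials can take any whole value from 0 to 20.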

In computer science, a binary variable is equivalent to a “bit”; in mathematical logic, it’s called a “truth value.” These are different terms for the same concept, similar to how statisticians refer to a bell curve as a “normal distribution,” while physicists call the same thing a “Gaussian distribution.”

Opposite and conjunct binary variables

Binary variables can be categorized into two types: opposite and conjunct.

  • Opposite binary variables are complete opposites, like “Win” and “Lose.” Something either works, or it doesn’t. There’s no in-between.
  • Conjunct binary variables are more of a grey area. For example, in U.S. politics, you can affiliate with either the Democrats or Republicans. However, most people aren’t strictly one party or the other. It’s common for individuals to switch parties, agree with some ideas from the opposing party, or even identify with an entirely different party, like the Green Party.

Opposite and conjunct binary variables aren’t that common, but you might come across them in a few narrow situations, such as:

  • Opposite binary variables:
    • In medical research, these variables often represent the effectiveness of treatments. For instance, a researcher might use an opposite binary variable to indicate whether a new drug successfully treats cancer.
    • In marketing, they often represent customer behavior. A marketer might use an opposite binary variable to predict if a customer is likely to buy a product.
    • In finance, they often represent investment risks. A financial analyst might use an opposite binary variable to predict whether a stock’s value will rise or fall.
  • Conjunct binary variables:
    • In political science, these variables often represent voter behavior. A political scientist might use a conjunct binary variable to predict if a voter is likely to vote for a specific candidate based on their political affiliation, age, and gender.
    • In education, they often represent student achievement. An educator might use a conjunct binary variable to predict if a student is likely to pass a test based on their grade level, previous test scores, and attendance record.
    • In business, they often represent customer churn. A business analyst might use a conjunct binary variable to predict if a customer is likely to cancel their subscription based on their length of service, number of purchases, and age.

Examples of binary dependent variables

A binary dependent variable can only take on the values 0 or 1 at each observation; it usually codes qualitative (categorical) data. Qualitative data is non-numerical information that describes or characterizes attributes, properties, or phenomena. A few examples: married versus not married, or approved for a credit card versus not approved [1].
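The 0/1 coding can be sketched in a few lines of Python. The survey responses below are made-up data for illustration, and the variable names are hypothetical.

```python
# Hypothetical survey responses (made-up data for illustration only).
responses = ["married", "not married", "married", "married", "not married"]

# Code the qualitative answers as 0/1: 1 = married, 0 = not married.
married = [1 if r == "married" else 0 for r in responses]

# The mean of a 0/1 variable is the proportion of observations coded 1.
proportion_married = sum(married) / len(married)

print(married)             # [1, 0, 1, 1, 0]
print(proportion_married)  # 0.6
```

One practical benefit of the 0/1 coding is that the average of the variable is simply the share of observations in the group coded 1.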

Both independent and dependent binary variables can only take on two values, typically 0 and 1. However, there are some important distinctions between the two:

  • Independent binary variables are not affected by other model variables. In other words, the model takes their value as given rather than trying to explain it.
  • Dependent binary variables are influenced by other variables in the model. The model treats their value as an outcome to be explained or predicted from the values of the independent variables.

Independent binary variables are sometimes used to control for the impact of other variables. For example, if you’re studying the relationship between income and political affiliation, you might include a binary variable for “gender” in your model to account for its effects. This allows you to focus on the influence of income on political affiliation, without the interference of gender. A dependent binary variable, on the other hand, is the outcome itself: the purpose of the model is to predict its value from the values of the independent variables.

Binary vs. dummy variables

A dummy variable takes on two values, such as 0 and 1; dummy variables are often used to code qualitative (categorical) variables during analysis. The terms dummy variable and binary variable are sometimes used interchangeably (e.g., [2]). However, they are not exactly the same thing.

You can turn a binary variable into a dummy variable. For example, the “comparison” group of a binary variable could be coded 0 and the “target” group coded 1. Thus, we can say that dummy variables are a type of binary variable. However, the opposite isn’t true because not all binary variables are dummy variables. For example, the variable “Number of children” could be represented with values of 0 (no children) and 1 (has children), but it is not a dummy variable because it doesn’t represent a categorical variable (it represents a numerical one).

Another example: when classifying race, you might code a single variable as 1 = Caucasian, 2 = African American, 3 = Asian. This is not a binary variable as it has more than two options; to use it as dummy variables, you would typically split it into several 0/1 variables, one per category. But if your coded variable has only two options, like 1 = Male and 2 = Female, then it is also a binary variable.
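A variable with more than two categories is commonly handled by splitting it into several 0/1 dummy columns, one for each category except a chosen reference category. The sketch below shows one way to do this in plain Python; the helper name dummy_code, the category labels “A”, “B”, “C”, and the data are all hypothetical.

```python
def dummy_code(values, reference):
    """Turn a categorical list into 0/1 dummy columns,
    one column per category other than the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Made-up observations of a three-category variable.
group = ["A", "B", "C", "B", "A"]

# "A" is the reference (comparison) group, so it gets no column of its own:
# an observation in group "A" is coded 0 in every dummy column.
dummies = dummy_code(group, reference="A")
print(dummies)  # {'B': [0, 1, 0, 1, 0], 'C': [0, 0, 1, 0, 0]}
```

Each resulting column is itself a binary variable, even though the original three-category variable is not.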

References

[1] Models for a binary dependent variable

[2] 8.2 – The Basics of Indicator Variables
