Factors in R With Examples

In this tutorial, we’ll move on to understanding factors in R programming. One operation we perform frequently in data science is the estimation of a variable based upon the model we built. We are sometimes required to estimate the price of a share or a house, and sometimes we need to estimate what color car is likely to be sold the fastest.

Variables in data science fall under two categories – continuous and categorical. Continuous variables are those that can take numerical values including floating points. Prices of houses or shares, quantifiable variables like age, weight or height of a person are all continuous variables.

On the other hand, categorical variables take a set of fixed values that can be represented using a set of labels. Examples for this category as marital status, gender, the color of the vehicle, the highest educational degree of a person and so on.

Categorical variables are represented using the factors in R.

Creating Factors in R

Factors can be created using a factor() function.

The first argument to factor function is the vector x of values that you wish to factorize. Note that you cannot create a factor using a matrix. X should always be a single-dimensional vector of character strings or integer values.

Secondly, you need to supply the list of levels you need in the factor. Levels is a vector of unique values used in the factor. This is an optional argument.

The third argument is labels. Sometimes when you encode the variables as a vector of integers, you need to specify what integer represents what label. You could use 0 and 1 to represent male and female, but you need to specify that using these labels. So basically this is the key for looking up the factors.

Finally, you have a Boolean valued argument is.ordered. Sometimes you may wish to retain the order amongst the factors used. For example, you may encode the month of joining using integers 1 to 12, to represent months from January to Decemeber. In these cases, you need to specify ordered to TRUE.

Let us look at examples of factors now.

Notice how the levels are automatically obtained from the vector’s unique values here. Let us try another example where we define male and female as 0 and 1 using labels.

Observe that the labels you have defined are displayed instead of 0 and 1 defined in the factor.

Ordering in Factors in R Programming

Let us work another example using the ordering of factor levels. Let us first define a vector representing the month of joining for 8 employees.

Now, there is no way for the compiler to know that May comes before Jun in the order of months. So the following code throws FALSE.

To impose ordering, we need to define a vector with all the months in order first.

Now create a factor for our data using our moj vector, set the levels to ordermonths and set the argument ordered to TRUE.

Now factormoj displays as follows.

The compiler now knows the ordering among the months. Let us check if it knows that May comes before June.

Modifying Factors

Each element of factor can be assigned a value individually using indexing, just like we index vectors. Let us modify a value from the genfactor we created earlier in the tutorial.

We’ll continue with the same variable from before, genfact to make things easier for you.

Adding New Levels to Factors

To add a new level to a factor, which hasn’t been defined earlier, you just need to modify the levels vector in the following manner. Let’s try this on our existing genfact variable.

You can now modify the factors to the newly defined level “Other” as well.

By admin

Leave a Reply

%d bloggers like this: