Data Management and Manipulation

Brittany E. Hanson

3 Data Management and Manipulation

3.1 Setup/Edit Variables

In Variables view, select the row of the variable you would like to setup/edit and select Edit from the top ribbon or double-click on the row of the variable you would like to edit.

In Data view, select the column of the variable you would like to setup/edit and select Setup from the top ribbon or double-click the column/name of the variable you would like to edit.

3.1.1 Name

On the first line, enter the name of the variable. Do not use any spaces in your variable name. When you have a scale with multiple items, it is convenient to name them scale1, scale2, etc. For example, if your scale measures extraversion, you can name your items extravert1, extravert2, extravert3, extravert4.

Note. In jamovi you can technically use a variable name with a space. However, I strongly recommend using variable names without spaces to simplify other functions within jamovi. If you do have a variable name with a space, you will need to add single quotations to each variable when transforming or computing new variables, e.g., ‘Life Satisfaction’. For similar reasons, you should also avoid using single or double quotations in variable names.

3.1.2 Description

Include a variable description here. If the variable corresponds with an individual item/question, it is a good idea for the description to be the actual question you asked participants.

Jamovi variable editing window. The variable name is Happy_Import and the description reads: How accurate or inaccurate is the following statement: To have a meaningful life, I need to feel happy most of the time.

3.1.3 Measure Type

Indicate via the dropdown box what measurement type the variable is.

Nominal – a variable with groups that have no order (also called categorical).
Ordinal – a variable with groups that do have an order, but the distance between ranks is not equal
Continuous – a variable is continuous (interval or ratio), there is an order to the response options and the distance between each one is equal.
ID – variable is a participant ID number. The number itself is not meaningful other than as a way to identify the participants.

It is important that your variables have the correct measure type selected because jamovi will often limit how you can use a variable based on its measure type. Measure type can also change how an analysis is run (e.g., nominal versus continuous predictors in multiple regression).

Jamovi variable editing window. The variable name is Happy_Import. Under Measure type, a green arrow is pointing to Continuous.

Note. In psychology, we often use Likert-type scales as response options (e.g., a very much agree to very much disagree scale). Psychologists usually treat these variables as continuous. However, when you open data in jamovi, Likert scales are often identified as nominal. Therefore, it is important to check each variable has the correct measure type after opening a new dataset.

3.1.4 Data Type

Indicates whether the data are integers (whole numbers), decimals, or text. This option will auto update based on the type of data you input.

3.1.5 Missing Values

If you are using a specific value to indicate missing values (example: -99) you can specify that value here

3.1.6 Levels

In this box you can specify the different levels of the variable.

Nominal Variables: for nominal variables we often use values like 1 and 2 to indicate different groups, such as people living in the northeast vs. southwest. However, these numbers are arbitrary (i.e., 1 could be used to indicate people in the northeast or 1 could be used to indicate people in the southwest, it’s up to the researcher to decide how they want to code the variable). Therefore, it is important to add labels to the levels of your variable so that it is clear how you coded the variable.
Continuous Variables: Jamovi does not allow you to add labels to the levels for continuous variables

To add a label to a level, click on the number and type in a descriptive label.

Jamovi variable editing window. The variable name is Region. The cursor is next to the 1 under Levels.

Jamovi variable editing window. The variable name is Region. Under levels the response options are now labeled: Northeast (1), South (2), Midwest (3), West Coast (4)

Note. Although it is a best practice to avoid spaces in variable names, you can use spaces in level names without the same negative effects as variable names with spaces. For example, the fourth level of the Region variable is labeled West Coast.

3.2 Transforming Variables

Why? Sometimes we “reverse code” an item. That is, we ask a question where the answer indicates the opposite of the construct we are trying to measure. For example, the four items we used to measure subjective well-being are:

To what extent do you agree or disagree with the following statements?

I like how my life is going.
I am content with my life.
If I could live my life over, I would change many things.
I am satisfied with where I am in life right now.

If a participant selects strongly agree for the first item, this response would indicate that the participant has a high level of well-being (the construct we are interested in). The same is true for the second and fourth items. However, for the third item, “If I could live my life over, I would change many things”, if a participant selects strongly agree, that response would indicate a low level of well-being. When using multi-item scales such as the one above, psychologists usually average together the participants’ responses to have one number indicating the participants’ level of the construct (in this case, well-being). However, we cannot average together the responses to these items as is because higher numbers mean more well-being for three of the items but the opposite is true for one of the items. Therefore, before we can average the items together, we need to switch the direction of participants’ answers for the third item so the direction matches the rest of the items in the scale.

Other times, we want to recode responses into another meaningful variable. For example, suppose the item is a factual multiple-choice question where the second answer is the correct answer. In that case, we might want to recode that variable into a new variable, indicating the participant answered the question correctly (coded 1) or incorrectly (coded 0). We could then sum a set of variables to indicate how many questions the participant got correct (see section 3.4 Computing Variables).

There are a few ways you can create a transformed variable

In either Data or Analyses view, double-click on an empty column. Then select New Transformed Variable. You can then select the variable you want to transform in the dropdown menu next to Source variable.
Select an existing variable that you want to transform in either the variable view or data view. On the top ribbon, select Transform.

For this example, we will reverse-transform the third well-being item [Wellbeing_3].

Jamovi file in variable view with Wellbing_3 selected and the transform button highlighted.

Jamovi will automatically change the name to Wellbeing_3 (2). However, it’s good practice to give the variable a more meaningful name and a description. You can rename your variable first, or you can add a renaming convention to your transformation rule (which is what we will do here).

Next, you will need to tell jamovi how you want to transform the variable. From the using transform menu, select Create new transform.

Transform variable menu. Variable is named Wellbeing_3 (2). Under the using transform menu, create new transform is selected

A new transform menu will appear. You can name and provide a description of the transformation. This is helpful if you have multiple items that need to be transformed in the same way. For example, if there are multiple items that need to be reverse transformed and they use the same scale (e.g., a 7-point scale). Once you create a transform rule, you can reuse it on other variables.

You can also add a suffix to the rule. If you add a suffix, the current variable and any future transformed variable that you apply this rule to will automatically add the suffix to the original variable’s name to create the name for the transformed variable. Here we are using “_R”. I recommend using the “_” because it will keep the formatting neater.

Transform rule menu. The name of transformation rule is Reverse7. The description is Reverses 7 point scales. The suffix box has _R.

Next, you will need to click + Add recoded condition.

Two windows will appear: if $source and else use. These use “if/then else” logic.

Transform rule sub-menu

You will need one if $source window for each level in your response scale. For the Wellbeing_3 item, participants responded on a 7-point scale, so we will need to add 7 recode conditions and therefore need 7 if $source windows.

In each recode condition, you will tell jamovi how to recode each point the 7-point scale. See example below

Seven if $source statements: 1) if $source == 1 use 7 2) if $source == 2 use 6 3) if $source == 3 use 5 4) if $source == 4 use 4 5) if $source == 5 use 3 6) if $source == 6 use 2 7) if $source == 7 use 1. One else use statement that says else use NA

Once you have set up your transformation, click the ↓ button to return to the transformed variable menu and then the ↑ to close the transformed variable menu. You will see that your new variable’s values automatically update.

Important notes

In jamovi, you need to use two = signs (i.e., ==) to specify that if the source variable equals [X], then recode it to [Y]. You can also use other logical operators like > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), or != (not equal to), depending on the type of transformation you would like to use.
Technically, the last recode condition (if $source == 7 use 1) is not necessary, we could have used “else use 1”. However, there are two good reasons to add “if $source == 7 use 1” and then add “NA” (i.e., mark it as missing data) to the else use window: 1) If you have any missing data (which we often do) this coding technique will make sure they remain missing and are not coded 1, 2) If there are any unexpected values in the data set due to error, using this technique means those error values will also be coded as NA.

3.3 Reliability Analysis

Before calculating a new variable that represents a set of questions, you must make sure the individual items are measuring the same construct. One way of doing this is to test whether the items are all inter-correlated with each other. The statistic Cronbach’s Alpha tells us how inter-correlated a set of items are.

Analyses tab → Factor → Reliability Analysis

Jamovi in the analysis view. The "factor" menu is selected and the "Reliability Analysis" option is highlighted

Select the continuously measured items from your scale and move them to the Items window. If any items are reverse-coded, make sure to use the transformed version of the variable.

Reliability analysis menu on the left and results on the right. In the reliability analysis menu is list of variables in the dataset. Next to the list is a window with the label Items. In the items window are four variables: Wellbeing_1, Wellbeing_2, Wellbeing_3_R, Wellbeing_4. Under scale statistics, Cronbach's alpha is selected. In the results window, Cronbach's alpha = 0.93.

Report Cronbach’s Alpha as α = .93
Cronbach’s Alpha above .70 is considered acceptable. Ideally, you want the alpha to be above .80.
If your alpha is lower, select Cronbach’s α (if item is dropped) to determine if excluding any of the items improves the alpha.

Same image as previously, but now Cronbach’s α (if item is dropped) is selected. In the results there is a new table called item reliability statistics. Next to each item is the "if item is dropped Cronbach’s α". Wellbeing_1 0.87. Wellbeing_2 0.91. Wellbeing_3_R 0.91. Wellbeing_4 0.92.

Our alpha was acceptable, so we want to use all four items when calculating participants’ well-being scores. Looking at the item reliability statistics, we can also see that dropping any of the items will not improve the reliability of the scale (the alpha will not increase).

3.4 Computing Variables

Common example: Multi-item scale meant to assess participants’ level on a single construct
- Scale items can be summed or averaged to create a single index representing each participant’s score on the construct.
Less common examples:
- Transforming an existing variable to make its distribution more normal
- “Centering” predictor variables in multiple regression

In the Well-being Dataset, we used four items to measure subjective well-being. Participants responded to four individual items, but they all tried to measure the same psychological construct: participants’ level of well-being. Therefore, we need to combine participants’ scores on the four items so that we have one variable that quantifies each participant’s level of well-being. In psychology, it is common to average the individual items that are meant to measure one variable (as opposed to summing them, for example). Averaging the items is generally preferred because the resulting score is on the same scale as the original items and therefore more interpretable. For example, let’s say that Participant A responded:

slightly agree to the first item (coded 5)
moderately agree to the second item (coded 6)
moderately disagree to the third item (recall this was the reverse coded item, so originally it would be coded 2, but the reverse transformation variable that we will use in the computation would have a score of 6)
strongly agree to the fourth item (coded 7)

If we average the scores together, we get a mean score of 6. Interpreting this on the original response option scale used, we would say that the participant “moderately agreed” with the well-being statements, on average.

Before computing a new variable, you should first check the reliability of the scale, see above.

There are a few ways you can create a computed variable

In either Data or Analyses view, double-click on an empty column. Then select New Computed Variable
In Variable or Data view, click Compute on the top menu ribbon

Computed variable menu

Once you select one of the options to compute a variable, you will see the above menu with a letter as the variable name. Just like earlier, you will want to give your new variable a meaningful name and description. In the description, it’s a good idea to say how your new variable is calculated.

Computed variable menu. The variable name is Wellbeing. The description says: Subjective well-being scale, mean of Wellbeing_1, Wellbeing_2, Wellbeing_3_R, Wellbeing_4.

Here, I have named the variable “Wellbeing” and gave it a description that includes how the variable will be calculated. Note, I’m using our transformed Wellbeing_3_R variable instead of the original Wellbeing_3 variable. Recall that for the Wellbeing_3 item, “If I could live my life over, I would change many things”, higher numbers indicate less well-being, whereas for the other items, higher numbers indicate greater well-being. Therefore, we want to use the reverse coded item Wellbeing_3_R so that for all items we average, higher numbers indicate greater well-being.

Next, we will specify how we want the variable computed. Clicking on will display a menu with all of the available mathematical functions you can use. In this example, we want to take the average of the four items (i.e., the mean) for each participant, so we will double-click on MEAN.

Computed variable menu. The function menu is expanded and mean is highlighted under functions.

Note. Single-clicking on any function will give you a short description of the function.

Next, we will click again and double-click on the variables we would like to use: Wellbeing_1, Wellbeing_2, Wellbeing_3_R, Wellbeing_4.

Computed variable menu. The function menu is expanded and Wellbeing_1 is highlighted under variables.

We will need to add a comma between each variable.

Computed variable menu. In the formula box it says: MEAN(Wellbeing_1,Wellbeing_2,Wellbeing_3_R,Wellbeing_4)

Jamovi will then automatically calculate the mean well-being score for each participant. Click ↑ and you’ll be able to see the calculated scores in Data or Analysis view.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License