Let’s now look at a good case study, involving four factors and two outcome variables. We’re stepping up the complexity here a little bit. This is a good question from the textbook by Box, Hunter and Hunter. It’s a case study where we are using solar panels with a storage tank. The outcome values were from a computer simulation.

Now just a quick piece of
advice when using simulations. Running simulations is often really easy, but there’s a temptation to do this inefficiently. I often see people just playing around with the software, trying out different values until they get an answer they like. You shouldn’t treat simulations any differently from real life. Always use a systematic method. In this case we’re going to use a set of experiments as our systematic method.

There are two key advantages, though, to using simulations. You can run the simulations in parallel at very little cost and, secondly, you don’t have to randomize the order of experiments. The reason for that is quite simple. When you repeat the simulation, you get the same answer, so the purpose of randomization, which was to minimize the impact of disturbances, no longer applies. Be careful though: certain computer experiments, when repeated, don’t give identical results, so then you should randomize. In fact, I always recommend you randomize. The cost of doing so is minimal, and
it guards against all sorts of problems. More on that in the next module though.

Let’s go back to the solar panel system. There are four factors. A: the total amount of insolation, or sunlight, received; B: the capacity of the storage tank; C: the water flow rate through the absorber; and D: the intermittency of the sunlight. You can read more about these types of systems by following this link. The two outcome variables were
“y_1”, the collection efficiency, and “y_2”, the energy delivery efficiency.

You should be able to quickly tell how many experiments will be done if each factor is operated at the low level and the high level. You should have two to the power of four (2^4), which is 16. So 16 experiments were run, and I’ve put the results and the R code here on the screen. They’re available on the course website. Copy and paste that code and follow along with me for the rest of the video.

So here we define the four factors, A, B, C and D, and I’ve manually typed in the two outcome variables, “y_1” and “y_2”. This is what you would do in practice, but to make things a bit simpler, and to avoid typing errors, you can also use the PID package in R. In a prior video I showed how you can download and install that package to extend R’s capability. That package includes the numeric results for this case study, and you can get that dataset by typing the following command: data(solar).

So since we ran 16 experiments, we
are able to estimate 16 parameters: there are 4 main effects (one each for A, B, C and D), 6 two-factor interactions, 4 three-factor interactions, and then the single four-factor interaction. That’s a total of 15 parameters, and it adds up to 16 if you count the intercept. The software can create all of this for you, very compactly, with the “lm(…)” command, as shown here.

The reason why this A*B*C*D concept works is
because of the principle of model hierarchy. Let’s take a simple example: if you write just A*B, R will expand that to include factor A and factor B in the model, along with the AB interaction. After all, you can’t have the two-factor interaction AB if you don’t also have factor A and factor B. Similarly, when R encounters A*B*C, it ensures that the AB interaction is present, as well as factor C. But we’ve already mentioned that AB will be expanded into factors A and B. R will also ensure the BC interaction is present and, in a similar line of thinking, the AC interaction, together with the three-factor interaction ABC.

So now you can understand why, when we write A*B*C*D here in the lm(…) command, R will recursively expand this into all the main effects, all the two-factor interactions, all the three-factor interactions, as well as the four-factor interaction. It is as if we had written it all out by hand, as shown here. But obviously that is tedious and error-prone, so let R do the work for you.

Now let’s build those two separate linear
models: for the collection efficiency, “y_1”, and for the energy delivery efficiency, “y_2”. If you use the summary(…) command, as we’ve done before, it might be fairly difficult to quickly locate the important factors that influence y_1. Rather, let’s use the Pareto plot to show us what the important parameters are. Here it is: the grey bars represent the terms with a negative sign, and the black bars represent the terms with a positive sign. The most important terms are B, A, the AB interaction, and factor C. The other terms have a diminishing effect on the outcome.

The collection efficiency will decrease when factor B is increased. In other words, as the storage tank capacity is increased, the collection efficiency drops. This is the most influential variable in the system. Next is factor A, the amount of insolation, which has a positive effect on the collection efficiency.

Now try answering this question
here on the screen: pause the video, and think about the AB interaction. The correct answer is the one that uses a high level for factor A and a low level for factor B. We can see this in the equation, and from the Pareto plot. In this case, setting factor B to its low level, the negative sign, helps boost our objective, but it also makes the two-factor interaction work in our favour. So A, B and the AB interaction are the three most influential terms in the model.

But you’ll also notice that factor D has little impact on the outcome. That’s a useful result, as it indicates we are relatively insensitive to variation in the solar intermittency. If we were to run more experiments in the future, we might leave factor D out of consideration. Similarly, when trying to optimize the process for collection efficiency, y_1, we can be confident that solar intermittency won’t play a major role, at least according to this simulation system.

Now let’s take a look at our second outcome
variable, y_2, the energy delivery efficiency. If you rebuild the model and look at the Pareto plot, we see extremely strong effects from factor A and from the two-factor interaction AB. The other factors, C and D, are small. What you’ll also notice here, and this is a very common result, is that many of the higher-level interactions, such as the three- and four-factor interactions, are small or zero.

I would like to point out an important issue at this moment using this example. Take a look at factor B: it is small, and based on what we’ve done you might be tempted to conclude that factor B is not important. That’s not entirely correct. We cannot exclude factor B from consideration, because the AB interaction is very important. Remember how an interaction was defined. In this example the AB interaction means that the effect of factor A is dependent on the level of factor B. Alternatively, the effect of factor B is dependent on the level of factor A. So because the AB interaction is strong, we cannot ignore factor B. The level that factor B is set at is also important, and so we cannot remove factor B from the model either.

So let’s end off today’s class with
this question for you to think about. Can you maximize both y_1 and y_2 simultaneously? What would be the best combination of settings of the factors to get that maximum? This is a question that we will discuss in the course forums. Please go ahead and participate in the forums, and discuss that issue.

So that’s a wrap. In this module, and in the prior one, you’ve seen how we can use pen and paper, or computer software, to analyze experiments to make improvements. Now in the coming module we start to get a little bit lazy. We want to do fewer experiments, but still extract the most information we can from the system. Well, we are not actually being lazy; we really just want to save money and time, because experiments are costly. So we run as few experiments as possible, but extract the most information we possibly can. I’m looking forward to one way we might do that. See you over there.
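
As a recap of the counting and the formula expansion from this video, here is a minimal sketch in base R. It is not the code from the course website. It assumes the four factors are coded as -1 (low) and +1 (high), and the variable names (design, full_terms, X and so on) are just illustrative. It deliberately leaves out the outcome values, so it only inspects the structure of the model rather than fitting it.

```r
# Full factorial in coded units: -1 = low level, +1 = high level.
# expand.grid() varies the first factor fastest, giving standard order.
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
n_runs <- nrow(design)  # one row per experiment: 2^4 runs

# Model hierarchy on a small example: A*B expands to A, B and A:B.
small_labels <- attr(terms(~ A * B), "term.labels")

# Expand the compact four-factor formula. A one-sided formula is enough
# here because we only want to inspect the terms, not fit a model.
full_terms  <- terms(~ A * B * C * D)
term_labels <- attr(full_terms, "term.labels")  # e.g. "A", "A:B", "A:B:C"
term_order  <- attr(full_terms, "order")        # 1 = main effect, 2 = two-factor, ...

# Number of terms of each order (main effects, two-, three-, four-factor).
terms_by_order <- table(term_order)

# The model matrix has one column per parameter, including the intercept.
X <- model.matrix(~ A * B * C * D, data = design)
n_params <- ncol(X)

print(small_labels)
print(term_labels)
print(terms_by_order)
cat("runs:", n_runs, " parameters (with intercept):", n_params, "\n")
```

With the outcome values added as columns of the same data frame, the same right-hand side, A * B * C * D, is what you would pass to lm(…), once for each outcome variable.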