This is a successful example of a commissioned R project assignment for NUSW.
Stats Central – Online Short Course: Intro to R, May 2-3 2022
Course Overview
R is widely used and extremely powerful statistical software. This course assumes that you have never used R before. You will learn how to obtain and install R, which is open-source software, and RStudio, which is a versatile, user-friendly interface for using R. It is very useful to do this course before our introductory statistics course, Introductory Statistics for Researchers.
This course will be held over two half-days and will cover some basic features of R and lay the groundwork for you to improve your R skills independently. The course is self-paced and focused on developing practical skills.
Course Outline
This course will cover topics including:
- Basics of interacting with R – calculations, saving variables so you can reuse them, data types and structures, organising R code in scripts
- Tidyverse – a basic introduction to tidy R code
- Data – reading in and organising data (from spreadsheets) with dplyr
- Plotting – make beautiful figures with ggplot
Course Requirement: You will need a computer with administrator access (to install R and RStudio software before attending the course).
*This is a popular course and tickets are limited!*
Date: Monday 2 and Tuesday 3 May, 9.30am - 1.00pm each day
Location: Online
You will receive a certificate of completion for the course.
- Compute sample size
- SiO2 analysis
- Automatic identification with ALD
a) Create a data frame with ALD
b) Boxplot for the data
c) Histogram for the data
d) Density and comparison
e) Hypothesis test
f) Two-sided CI for average ALD
- Time needed to repair a rail break
a) Probability plot
b) Hypothesis test
c) Type 2 error
- Nickel plates for test cells
a) Hypothesis test
b) Calculation of sample sizes
c) Sample size
- Null hypothesis:
- Percentage of $\mathrm{SiO}_2$ in a certain type of aluminous cement is 5.5
- Alternative hypothesis:
- Percentage of $\mathrm{SiO}_2$ in a certain type of aluminous cement is larger than 5.5
sio2_sample = sample_size(alpha = 0.01, beta = 0.01, delta = 5.6 - 5.5, sigma = 0.3)
sio2_sample[1]
## [1] 195
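The `sample_size` helper used above is not shown in this write-up. A minimal sketch of what such a helper might look like follows, assuming a z-test on a mean with known sigma and the usual normal-approximation formula n = ((z_alpha + z_beta) * sigma / delta)^2. The name `sample_size_sketch` and the choice to return both a one-tailed and a two-tailed size are assumptions made for illustration.

```r
# Hypothetical stand-in for the unshown sample_size() helper.
# Assumes a z-test on a mean with known standard deviation sigma.
sample_size_sketch <- function(alpha, beta, delta, sigma) {
  z_alpha  <- qnorm(alpha, lower.tail = FALSE)      # one-tailed critical value
  z_alpha2 <- qnorm(alpha / 2, lower.tail = FALSE)  # two-tailed critical value
  z_beta   <- qnorm(beta, lower.tail = FALSE)       # quantile for the target power

  # Round up so the requested power is at least met
  n_one <- ceiling(((z_alpha  + z_beta) * sigma / delta)^2)
  n_two <- ceiling(((z_alpha2 + z_beta) * sigma / delta)^2)
  c(one_tailed = n_one, two_tailed = n_two)
}

n_sio2 <- sample_size_sketch(alpha = 0.01, beta = 0.01,
                             delta = 5.6 - 5.5, sigma = 0.3)
n_sio2["one_tailed"]  # 195, the value printed above
```

Under these assumptions the one-tailed size agrees with the reported 195, which suggests the original helper used the same formula for its first element.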
Automatic identification with ALD
The boxplot reveals only limited information, but based on it we can assume that ALD is normally distributed.
The histogram alone does not confirm the assumption of a normal distribution.
d) Density and comparison
ggplot(data = as.vector(data), aes(sample = X.ALD.)) + geom_qq(col = "blue") + geom_qq_line(col = "red") + labs(title = "Normal probability plot for ALD")
The Q-Q plot makes it plausible that ALD is normally distributed. However, the normality assumption is not essential for calculating the CI and testing hypotheses about the true average ALD, because the sample is large (n = 49). By the central limit theorem, large-sample methods can be used for both problems.
- Null hypothesis
- The true average ALD is equal to 1.0.
- Alternative hypothesis
- The true average ALD is less than 1.0.
t.test(data, mu = 1.0, alternative = "less")
##
## One Sample t-test
##
## data: data
## t = -5.7905, df = 48, p-value = 2.615e-07
## alternative hypothesis: true mean is less than 1
## 95 percent confidence interval:
## -Inf 0.8222677
## sample estimates:
## mean of x
## 0.7497959
The p-value is 2.615e-07, far smaller than 0.01, so we reject the null hypothesis: the data provide strong evidence that the true average ALD is less than 1.0.
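The reported p-value can be checked from the printed t statistic and degrees of freedom alone. This is a small sketch rather than a re-run of the original analysis, since the ALD data are not reproduced here. The variable names are illustrative, and the result matches the output only up to the rounding of the printed t value.

```r
t_stat <- -5.7905  # t statistic reported in the output above
df     <- 48       # n - 1 with n = 49 observations

# Lower-tail probability, matching alternative = "less"
p_less <- pt(t_stat, df)

p_less          # approximately 2.615e-07
p_less < 0.01   # TRUE, so the null hypothesis is rejected at the 1% level
```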
f) Two-sided CI for average ALD
t.test(data, mu = 1.0, alternative = "less", conf.level = 0.95)
##
## One Sample t-test
##
## data: data
## t = -5.7905, df = 48, p-value = 2.615e-07
## alternative hypothesis: true mean is less than 1
## 95 percent confidence interval:
## -Inf 0.8222677
## sample estimates:
## mean of x
## 0.7497959
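Because the call above uses alternative = "less", R reports a one-sided interval from -Inf to 0.8222677. A two-sided interval would normally come from `t.test(data, mu = 1.0, alternative = "two.sided", conf.level = 0.95)`. Since the raw ALD data are not reproduced here, the sketch below instead recovers the standard error from the printed summary values and builds both intervals by hand. The results are approximate because the printed t statistic is rounded.

```r
x_bar  <- 0.7497959  # sample mean reported above
t_stat <- -5.7905    # reported t statistic for mu0 = 1.0
df     <- 48         # n - 1 with n = 49
mu0    <- 1.0

# Standard error recovered from t = (x_bar - mu0) / se
se <- (x_bar - mu0) / t_stat

# One-sided 95% upper bound, as printed for alternative = "less"
upper_one_sided <- x_bar + qt(0.95, df) * se

# Two-sided 95% interval, as alternative = "two.sided" would report
ci_two_sided <- x_bar + c(-1, 1) * qt(0.975, df) * se

upper_one_sided
ci_two_sided
```

The two-sided interval is wider on the upper side than the one-sided bound because it uses the 0.975 quantile rather than the 0.95 quantile.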
Time needed to repair a rail break
a) Probability plot
data4 = data.frame(x = c(159, 120, 480, 149, 270, 547, 340, 43, 228, 202, 240, 218))
ggplot(data = data4, aes(sample = x)) + geom_qq(col = "blue") + geom_qq_line(col = "red") + labs(title = "Normal probability plot for time for repair")
[Figure: Normal probability plot for time for repair. x-axis: theoretical quantiles; y-axis ticks from 100 to 500.]
The Q-Q plot suggests that the repair time is plausibly normally distributed, apart from a few outliers.
- Null hypothesis
- The true average repair time is equal to 200 min.
- Alternative hypothesis
- The true average repair time is more than 200 min.
t.test(data4, mu = 200, alternative = "greater", conf.level = 0.95)
##
## One Sample t-test
##
## data: data4
## t = 1.1853, df = 11, p-value = 0.1304
## alternative hypothesis: true mean is greater than 200
## 95 percent confidence interval:
## 174.4174 Inf
## sample estimates:
## mean of x
## 249.6667
The p-value is 0.13, larger than 0.05, so we fail to reject the null hypothesis: there is no compelling evidence that the true average repair time exceeds 200 min.
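To make the test above more transparent, the same quantities can be computed step by step from the twelve repair times. This is a sketch for checking the output, not a replacement for `t.test`, and the variable names are chosen here for illustration.

```r
# Repair times (minutes) from the data frame defined above
x   <- c(159, 120, 480, 149, 270, 547, 340, 43, 228, 202, 240, 218)
mu0 <- 200
n   <- length(x)

se     <- sd(x) / sqrt(n)        # standard error of the mean
t_stat <- (mean(x) - mu0) / se   # one-sample t statistic

# Upper-tail p-value for H1: mean > 200
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# One-sided 95% lower confidence bound
lower_bound <- mean(x) - qt(0.95, df = n - 1) * se

# Cross-check against the built-in test on the numeric vector
check <- t.test(x, mu = mu0, alternative = "greater", conf.level = 0.95)

c(t = t_stat, p = p_greater, lower = lower_bound)
```

The hand-computed values agree with the printed output (t = 1.1853, p-value = 0.1304, lower bound 174.4174) up to rounding.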
c) Type 2 error
power.t.test(n = nrow(data4), delta = 100, sd = 150, sig.level = 0.05, type = "one.sample", alternative = "one.sided")
##
## One-sample t test power calculation
##
## n = 12
## delta = 100
## sd = 150
## sig.level = 0.05
## power = 0.6981908
## alternative = one.sided
From this result, the type 2 error probability of the test used in part a. is 1 - 0.698 = 0.302.
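The power figure can be reproduced, and the type 2 error probability derived from it, with a few lines of R. The sketch below also recomputes the power from the noncentral t distribution, which is the calculation `power.t.test` performs for a one-sided, one-sample test with its default settings. The variable names are illustrative.

```r
n          <- 12    # number of repair times
delta      <- 100   # true shift above 200 min used in the power calculation
sd_assumed <- 150   # assumed standard deviation
sig        <- 0.05  # significance level

pw <- power.t.test(n = n, delta = delta, sd = sd_assumed, sig.level = sig,
                   type = "one.sample", alternative = "one.sided")

# Same power via the noncentral t distribution
t_crit       <- qt(sig, df = n - 1, lower.tail = FALSE)
power_manual <- pt(t_crit, df = n - 1, ncp = sqrt(n) * delta / sd_assumed,
                   lower.tail = FALSE)

# Type 2 error probability is the complement of the power
beta_error <- 1 - pw$power
beta_error  # approximately 0.302
```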
Nickel plates for test cells
The number of blistered plates can be treated as binomially distributed. Because the sample size times p0, that is 100 * 0.1, is equal to 10, we can use the large-sample test for a proportion.
- Null hypothesis
- The blister probability is equal to 0.1.
- Alternative hypothesis
- The blister probability is more than 0.1.
prop.test(x = 14, n = 100, p = 0.1, alternative = "greater", conf.level = 0.95, correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: 14 out of 100, null probability 0.1
## X-squared = 1.7778, df = 1, p-value = 0.09121
## alternative hypothesis: true p is greater than 0.1
## 95 percent confidence interval:
## 0.09237298 1.00000000
## sample estimates:
## p
## 0.14
At the 0.05 significance level, the p-value of 0.091 is not small enough to reject the null hypothesis. Since I do not reject it, I may be committing a type 2 error.
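The large-sample test can also be written out by hand as a one-sided z-test, which makes the link to the X-squared statistic explicit (X-squared is z squared when no continuity correction is applied). This is a sketch with illustrative variable names.

```r
x  <- 14    # blistered plates observed
n  <- 100   # plates tested
p0 <- 0.1   # blister probability under the null hypothesis

p_hat <- x / n
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # large-sample z statistic

# Upper-tail p-value for H1: p > 0.1
p_value <- pnorm(z, lower.tail = FALSE)

z^2             # approximately 1.7778, the X-squared value above
p_value         # approximately 0.0912
p_value < 0.05  # FALSE, so the null hypothesis is not rejected
```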
b) Calculation of sample sizes
sample_size2 <- function(alpha, beta, p0, p) {
  z.alpha = qnorm(alpha, lower.tail = FALSE)
  z.alpha2 = qnorm(alpha / 2, lower.tail = FALSE)
  z.beta = qnorm(beta, lower.tail = FALSE)
  n1 = round(((z.alpha * sqrt(p0 * (1 - p0)) + z.beta * sqrt(p * (1 - p))) / (p0 - p))^2, digits = 0) # for a one-tailed test
  n2 = round(((z.alpha2 * sqrt(p0 * (1 - p0)) + z.beta * sqrt(p * (1 - p))) / (p0 - p))^2, digits = 0) # for a two-tailed test
  return(c(n1, n2))
}
Here n1 is the appropriate answer for the one-sided hypothesis test described above.
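Written out, the two quantities that the function computes before rounding are the standard large-sample sizes for a test on a proportion, where $z_{\alpha}$, $z_{\alpha/2}$ and $z_{\beta}$ denote upper-tail standard normal quantiles:

$$
n_1 = \left( \frac{z_{\alpha}\sqrt{p_0(1-p_0)} + z_{\beta}\sqrt{p(1-p)}}{p_0 - p} \right)^2,
\qquad
n_2 = \left( \frac{z_{\alpha/2}\sqrt{p_0(1-p_0)} + z_{\beta}\sqrt{p(1-p)}}{p_0 - p} \right)^2
$$

The only difference between the two is the critical value: $z_{\alpha}$ for the one-tailed test and $z_{\alpha/2}$ for the two-tailed test.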
c) Sample size
paste("Sample size of the plates for the test is", sample_size2(alpha = 0.05, beta = 0.10, p0 = 0.1, p = 0.1…
## [1] "Sample size of the plates for the test is 362"
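The call above is cut off after `p = 0.1`, so the alternative proportion actually used is not recoverable from the text. As a plausibility check only, the sketch below assumes an alternative of p = 0.15, a value chosen here because it reproduces the printed 362. The function is restated under the hypothetical name `sample_size2_sketch` so that the block runs on its own.

```r
# Restatement of the sample-size function above, under a hypothetical name
sample_size2_sketch <- function(alpha, beta, p0, p) {
  z_alpha  <- qnorm(alpha, lower.tail = FALSE)
  z_alpha2 <- qnorm(alpha / 2, lower.tail = FALSE)
  z_beta   <- qnorm(beta, lower.tail = FALSE)
  spread0  <- sqrt(p0 * (1 - p0))  # standard deviation term under H0
  spread1  <- sqrt(p * (1 - p))    # standard deviation term under H1
  n1 <- round(((z_alpha  * spread0 + z_beta * spread1) / (p0 - p))^2)  # one-tailed
  n2 <- round(((z_alpha2 * spread0 + z_beta * spread1) / (p0 - p))^2)  # two-tailed
  c(n1, n2)
}

# ASSUMPTION: p = 0.15 is inferred, not stated in the source
n_assumed <- sample_size2_sketch(alpha = 0.05, beta = 0.10, p0 = 0.1, p = 0.15)
paste("Sample size of the plates for the test is", n_assumed[1])
```

Under this assumption the one-tailed size is 362, in agreement with the output above.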