8.2: Describing...


First, look at the basic characteristics of each character (variable):

Code \(\PageIndex{1}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
summary(data)

Since SEX and COLOR are categorical, the summary output for these columns is meaningless, so you may want to convert them into "true" categorical data. There are several ways to do this, but the simplest is conversion to a factor:

Code \(\PageIndex{2}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
data1 <- data
data1$SEX <- factor(data1$SEX, labels=c("female", "male"))
data1$COLOR <- factor(data1$COLOR, labels=c("red", "blue", "green"))

(To preserve the original data, we first copy it into a new object, data1. Please check the result with summary() yourself.)
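As a minimal sketch of what that check might show, the snippet below builds a small made-up data frame (the name bugs and all its values are hypothetical stand-ins for data/bugs.txt, and the coding 0 = female, 1 = male is an assumption) and compares summary() before and after the conversion:

```r
# Hypothetical stand-in for data/bugs.txt; values are invented for illustration
bugs <- data.frame(
  SEX    = c(0, 1, 0, 1, 0, 1),
  COLOR  = c(1, 1, 2, 2, 3, 3),
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0, 14.0),
  LENGTH = c(20, 23, 19, 25, 21, 26)
)

# Work on a copy so the original codes stay available
bugs1 <- bugs
bugs1$SEX <- factor(bugs1$SEX, labels = c("female", "male"))

summary(bugs$SEX)    # numeric codes: quartiles and mean, not very informative
sex_counts <- summary(bugs1$SEX)  # factor: number of rows per level
sex_counts
```

For a factor, summary() reports counts per level instead of quartiles, which is usually what one wants for a categorical column.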

summary() applies not only to the whole data frame but also to individual characters (variables, or columns):

Code \(\PageIndex{3}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
summary(data$WEIGHT)

The characteristics reported by summary() can also be calculated one by one. Maximum and minimum:

Code \(\PageIndex{4}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
min(data$WEIGHT)
max(data$WEIGHT)

... median:

Code \(\PageIndex{5}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
median(data$WEIGHT)

... mean for WEIGHT and for each character:

Code \(\PageIndex{6}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
mean(data$WEIGHT)

Code \(\PageIndex{7}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
colMeans(data)

... and also round the result to one decimal place:

Code \(\PageIndex{8}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
round(colMeans(data), 1)

(Again, the output of colMeans() is meaningless for SEX and COLOR.)
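One way to avoid the meaningless means is to pass only the measured columns to colMeans(). The sketch below assumes a made-up data frame called bugs with the same four column names as the file; the values are invented for illustration.

```r
# Hypothetical stand-in for data/bugs.txt; values are invented for illustration
bugs <- data.frame(
  SEX    = c(0, 1, 0, 1, 0, 1),
  COLOR  = c(1, 1, 2, 2, 3, 3),
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0, 14.0),
  LENGTH = c(20, 23, 19, 25, 21, 26)
)

# Keep only the columns that hold real measurements
measured <- bugs[, c("WEIGHT", "LENGTH")]

col_means <- colMeans(measured)
round(col_means, 1)
```

Selecting columns by name rather than by position keeps the call readable and still works if the column order in the file changes.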

Unfortunately, the commands above (but not summary()) return NA instead of a number if the data contain missing values (NA):

Code \(\PageIndex{9}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
data2 <- data
data2[3, 3] <- NA  # make the third WEIGHT value missing
mean(data2$WEIGHT)

To calculate the mean while ignoring missing data, enter

Code \(\PageIndex{10}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
data2 <- data
data2[3, 3] <- NA  # make the third WEIGHT value missing
mean(data2$WEIGHT, na.rm=TRUE)
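The effect of na.rm can be seen on a short vector. This is only an illustrative sketch: the vector w and its values are made up and do not come from the bugs data.

```r
# Hypothetical weights with one missing value
w <- c(10.5, 12.0, NA, 13.0)

plain_mean <- mean(w)                # NA: one missing value poisons the result
clean_mean <- mean(w, na.rm = TRUE)  # mean of the three known values
n_missing  <- sum(is.na(w))          # how many values were skipped

plain_mean
clean_mean
n_missing
```

Counting the missing values alongside the mean is a useful habit, because na.rm=TRUE silently changes the sample size.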

Another way is to remove the rows containing NA from the data:

Code \(\PageIndex{11}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
data2 <- data
data2[3, 3] <- NA  # make the third WEIGHT value missing
data2.o <- na.omit(data2)

Then data2.o will be free of missing values.
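Note that na.omit() drops the whole row, so the other measurements in that row are lost as well. A small sketch with a made-up data frame (bugs2 and its values are hypothetical) shows this, together with complete.cases() as an equivalent way to pick the same rows:

```r
# Hypothetical stand-in for the bugs data, with one missing WEIGHT
bugs2 <- data.frame(
  SEX    = c(0, 1, 0, 1, 0, 1),
  COLOR  = c(1, 1, 2, 2, 3, 3),
  WEIGHT = c(10.5, 12.0, NA, 13.0, 11.0, 14.0),
  LENGTH = c(20, 23, 19, 25, 21, 26)
)

bugs2.o <- na.omit(bugs2)                  # rows without any NA
bugs2.c <- bugs2[complete.cases(bugs2), ]  # same rows, selected explicitly

nrow(bugs2)    # all rows
nrow(bugs2.o)  # one row fewer: row 3 is gone, including its LENGTH value
```

If only one column has gaps, using na.rm=TRUE for that column keeps more data than removing rows from the whole data frame.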

Sometimes you need to calculate the sum of all values of a character:

Code \(\PageIndex{12}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
sum(data$WEIGHT)

... or the sum of all values in one row (we will try the second row):

Code \(\PageIndex{13}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
sum(data[2, ])

... or the sum of all values for each row:

Code \(\PageIndex{14}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
apply(data, 1, sum)

(These summation exercises are here for training purposes only.)
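For row sums specifically, base R also provides rowSums(), which gives the same numbers as apply(..., 1, sum). The sketch below uses a made-up data frame named bugs (a hypothetical stand-in for the file) to compare the two:

```r
# Hypothetical stand-in for data/bugs.txt; values are invented for illustration
bugs <- data.frame(
  SEX    = c(0, 1, 0, 1, 0, 1),
  COLOR  = c(1, 1, 2, 2, 3, 3),
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0, 14.0),
  LENGTH = c(20, 23, 19, 25, 21, 26)
)

by_apply   <- apply(bugs, 1, sum)  # general tool: any function over rows
by_rowsums <- rowSums(bugs)        # specialised tool for sums

# Compare the values only, ignoring any names attached to the results
all.equal(unname(by_apply), unname(by_rowsums))
```

apply() remains the more flexible choice, since sum can be replaced by any function that accepts a vector.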

For categorical data, it is sensible to look at how many times each value appears in the data file (this also helps to learn all the values of the character):

Code \(\PageIndex{15}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
table(data$SEX)
table(data$COLOR)

Now transform the frequencies into percentages (100% is the total number of bugs):

Code \(\PageIndex{16}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
100*prop.table(table(data$SEX))
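The same steps can be followed one at a time to see what each function contributes. This sketch uses a hypothetical vector of color codes rather than the real file:

```r
# Hypothetical color codes for six bugs (1, 2, 3 stand for three colors)
color <- c(1, 1, 2, 2, 3, 3)

color_counts <- table(color)                     # absolute frequencies
color_pct    <- 100 * prop.table(color_counts)   # share of the total, in percent

round(color_pct, 1)
sum(color_pct)  # percentages over all categories add up to 100
```

prop.table() divides each count by the total, so multiplying by 100 is the only extra step needed to obtain percentages.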

One of the most important characteristics of data variability is the standard deviation:

Code \(\PageIndex{17}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
sd(data$WEIGHT)

Calculate the standard deviation for each numeric column (columns 3 and 4):

Code \(\PageIndex{18}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
sapply(data[, 3:4], sd)
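Instead of hard-coding column positions, the numeric columns can be detected automatically once the categorical ones have been converted to factors. The following is a sketch under that assumption; the data frame bugs1, its values, and the factor labels are made up for illustration.

```r
# Hypothetical stand-in for the bugs data with categorical columns as factors
bugs1 <- data.frame(
  SEX    = factor(c(0, 1, 0, 1, 0, 1), labels = c("female", "male")),
  COLOR  = factor(c(1, 1, 2, 2, 3, 3), labels = c("red", "blue", "green")),
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0, 14.0),
  LENGTH = c(20, 23, 19, 25, 21, 26)
)

# TRUE for columns that hold numbers, FALSE for factors
num_cols <- sapply(bugs1, is.numeric)

sds <- sapply(bugs1[, num_cols], sd)
sds
```

This only works as intended after the conversion to factors, because the raw 0/1 and 1/2/3 codes would otherwise count as numeric too.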

If you want to do the same for data with a missing value, you need something like:

Code \(\PageIndex{19}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
data2 <- data
data2[3, 3] <- NA  # make the third WEIGHT value missing
sapply(data2[, 3:4], sd, na.rm=TRUE)

Calculate also the coefficient of variation (CV), the standard deviation expressed as a percentage of the mean:

Code \(\PageIndex{20}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
100*sd(data$WEIGHT)/mean(data$WEIGHT)
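If the CV is needed for several columns, the formula can be wrapped in a small helper. The function cv() below is a hypothetical convenience written for this sketch, not part of base R, and the data frame is made up:

```r
# Hypothetical helper: coefficient of variation in percent
cv <- function(x, na.rm = FALSE) {
  100 * sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Hypothetical measurements standing in for the WEIGHT and LENGTH columns
bugs <- data.frame(
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0, 14.0),
  LENGTH = c(20, 23, 19, 25, 21, 26)
)

cvs <- sapply(bugs, cv)
round(cvs, 1)
```

Because the CV is unitless, it allows the variability of WEIGHT and LENGTH to be compared directly even though they are measured in different units.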

We can calculate any characteristic separately for males and females. Mean insect weights:

Code \(\PageIndex{21}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
tapply(data$WEIGHT, data$SEX, mean)
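The same group means can also be obtained with aggregate(), which returns a data frame rather than a named array. The sketch below compares the two on a made-up data frame (bugs and its values are hypothetical):

```r
# Hypothetical stand-in for data/bugs.txt; values are invented for illustration
bugs <- data.frame(
  SEX    = c(0, 1, 0, 1, 0, 1),
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0, 14.0)
)

by_sex <- tapply(bugs$WEIGHT, bugs$SEX, mean)            # named array of means
agg    <- aggregate(WEIGHT ~ SEX, data = bugs, FUN = mean)  # data frame of means

by_sex
agg
```

tapply() is convenient for a quick look, while the data frame returned by aggregate() is easier to pass on to further processing.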

How many individuals of each color are there among males and females?

Code \(\PageIndex{22}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
table(data$COLOR, data$SEX)

(Rows are colors, columns are sexes.)

Now the same in percentages of the total number of bugs:

Code \(\PageIndex{23}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
100*prop.table(table(data$COLOR, data$SEX))
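By default prop.table() divides by the grand total, so all cells together add up to 100%. Its margin argument changes the denominator to row or column totals. A sketch with made-up codes (the vectors color and sex are hypothetical):

```r
# Hypothetical color and sex codes for six bugs
color <- c(1, 1, 2, 2, 3, 3)
sex   <- c(0, 1, 0, 0, 1, 1)

tab <- table(color, sex)

pct_total      <- 100 * prop.table(tab)              # share of all bugs
pct_within_sex <- 100 * prop.table(tab, margin = 2)  # share within each sex (column)

round(pct_within_sex, 1)
colSums(pct_within_sex)  # each column adds up to 100
```

Which denominator is appropriate depends on the question: margin = 2 answers "what fraction of males are red?", while the default answers "what fraction of all bugs are red males?".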

Finally, calculate the mean weight separately for every combination of color and sex (i.e., for red males, red females, green males, green females, and so on):

Code \(\PageIndex{24}\) (R):

data <- read.table("data/bugs.txt", header=TRUE)
tapply(data$WEIGHT, list(data$SEX, data$COLOR), mean)
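When some combination of the two factors does not occur in the data, tapply() fills that cell with NA. The sketch below illustrates this with a made-up data frame (bugs and its values are hypothetical) that deliberately contains no green males:

```r
# Hypothetical data with no "male" + "green" combination
bugs <- data.frame(
  SEX    = factor(c("female", "male", "female", "male", "female")),
  COLOR  = factor(c("red", "red", "blue", "blue", "green")),
  WEIGHT = c(10.5, 12.0, 9.5, 13.0, 11.0)
)

# Rows are sexes, columns are colors
m <- tapply(bugs$WEIGHT, list(bugs$SEX, bugs$COLOR), mean)
m

m["female", "red"]  # mean of the single red female
m["male", "green"]  # NA: no such bugs in the data
```

An NA in the result therefore means "no observations", not a calculation error, which is worth keeping apart from NA values caused by missing measurements.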