Channel: Trending Questions - Cross Validated Meta

Is there a preferred way to present R code?


I often find myself wondering about the best way to present R code and its output.

The issue is that in the R console itself, we see this kind of thing:

```
> library(magrittr)  # provides the %>% pipe
> X <- rnorm(100)
> Y <- X + rnorm(100)
> lm(Y ~ X) %>% summary()

Residuals:
    Min      1Q  Median      3Q     Max
-3.0024 -0.6662  0.0044  0.7071  2.2079

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.0247     0.1073  -0.230    0.818
X             0.9865     0.1091   9.041 1.46e-14 ***
---
> lm(Y ~ -1 + X) %>% summary()

Residuals:
     Min       1Q   Median       3Q      Max
-3.02849 -0.68970 -0.02017  0.68257  2.18443

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
X   0.9855     0.1085   9.082  1.1e-14 ***
```

So it seems perfectly natural to present the code and output exactly as above. This has the advantage of clearly distinguishing the code (prefixed with `>`, as in the R console) from its output.

The problem, however, is that a user who wants to try the code for themselves has to remove the `>` from each line manually. A number of my posts have been edited by other users to remove them. So one alternative is to present it like this:

```
library(magrittr)  # provides the %>% pipe
X <- rnorm(100)
Y <- X + rnorm(100)
lm(Y ~ X) %>% summary()

Residuals:
    Min      1Q  Median      3Q     Max
-3.0024 -0.6662  0.0044  0.7071  2.2079

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.0247     0.1073  -0.230    0.818
X             0.9865     0.1091   9.041 1.46e-14 ***
---
lm(Y ~ -1 + X) %>% summary()

Residuals:
     Min       1Q   Median       3Q      Max
-3.02849 -0.68970 -0.02017  0.68257  2.18443

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
X   0.9855     0.1085   9.082  1.1e-14 ***
```

This might be fine for experienced R users, but it blurs the distinction between code and output. So we might break it up like this:

```r
library(magrittr)  # provides the %>% pipe
X <- rnorm(100)
Y <- X + rnorm(100)
lm(Y ~ X) %>% summary()
```

which produces:

```
Residuals:
    Min      1Q  Median      3Q     Max
-3.0024 -0.6662  0.0044  0.7071  2.2079

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.0247     0.1073  -0.230    0.818
X             0.9865     0.1091   9.041 1.46e-14 ***
---
```

and then we fit the model without an intercept:

```r
lm(Y ~ -1 + X) %>% summary()
```

which produces:

```
Residuals:
     Min       1Q   Median       3Q      Max
-3.02849 -0.68970 -0.02017  0.68257  2.18443

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
X   0.9855     0.1085   9.082  1.1e-14 ***
```

This takes more time to write and makes the post longer.

Maybe I am being a little too pedantic, but have others had similar thoughts? Is there an alternative, or is one of the above approaches generally considered better?
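The manual step of removing the `>` prompts from a console transcript can, in principle, be scripted. Below is a minimal sketch in R, assuming a hypothetical helper `strip_prompts()` (not part of base R or any package). It treats every line that starts with the main prompt `> ` or the continuation prompt `+ ` as code and discards everything else as output.

```r
# Hypothetical helper (not from base R or any package): keep only the
# lines of a console transcript that were typed as code, minus prompts.
strip_prompts <- function(transcript) {
  # Split the pasted transcript into individual lines
  lines <- unlist(strsplit(transcript, "\n", fixed = TRUE))
  # Lines starting with "> " (main prompt) or "+ " (continuation prompt)
  # are assumed to be code; all other lines are assumed to be output
  is_code <- grepl("^[>+] ", lines)
  # Drop the two-character prompt from each code line
  sub("^[>+] ", "", lines[is_code])
}

transcript <- "> X <- rnorm(100)
> Y <- X + rnorm(100)
> lm(Y ~ X) %>% summary()

Residuals:
    Min      1Q  Median      3Q     Max
-3.0024 -0.6662  0.0044  0.7071  2.2079"

code <- strip_prompts(transcript)
cat(code, sep = "\n")
# X <- rnorm(100)
# Y <- X + rnorm(100)
# lm(Y ~ X) %>% summary()
```

This is only a heuristic: any printed output line that happens to begin with `> ` or `+ ` would be misclassified as code.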

