The packages dplyr uses the SQL concepts of left, right, full, inner, semi, anti and cross joins. The following table gives the correspondence between the Stata commands merge and joinby with the dplyr functions:
| Stata | merge v1, keep(master matched using) |
| dplyr | full_join(df1, df2, by = "v1") |
| Stata | merge v1, keep(master matched) |
| dplyr | left_join(df1, df2, by = "v1") |
| Stata | merge v1, keep(matched using) |
| dplyr | right_join(df1, df2, by = "v1") |
| Stata | merge v1, keep(matched) |
| dplyr | inner_join(df1, df2, by = "v1") |
| Stata | merge v1, keep(matched) keepusing(v1) |
| dplyr | semi_join(df1, df2, by = "v1") |
| Stata | merge v1, keep(master) keepusing(v1) |
| dplyr | anti_join(df1, df2) |
| Stata | crossby |
| dplyr | cross_join(df1, df2) |
Wen there are multiple matches both in the master and using datasets, these functions create all combinations of rows, similarly to Stata joinby (and contrary to Stata merge m:m).
To append one dataset to another:
| Stata | append using "using.dta" |
| dplyr | row_binds(df1, df2) |
When string columns are append too numeric columns, Stata returns an error (and with the option force, it replaces the using dataset to missing values).
In contrast, in R, columns with different types are coerced to their higher type (logical < strings < int < real < character).
The package tidyr includes the functions pivot_longer and pivot_wider that alternate between wide and long datasets.
From wide to long
| dfwide | –> | dflong |
|---|---|---|
| id stub1 stub2 1 4.1 4.5 2 4.5 3.0 |
–> | id variable value 1 stub1 4.1 1 stub2 4.5 2 stub1 3.3 2 stub2 3.0 |
| Stata | reshape long stub, i(id) j(variable) string |
| tidyr | pivot_longer(dfwide, starts_with("stub"), names_to = "variable", values_to = "value") |
The names in red specify the name of new variables. While reshape long requires to specify both the variable to reshape and the key variables, picot_longer only requires to specify the variables to reshape - the key variable are inferred to be the remaining ones.
From long to wide
| dflong | –> | dfwide |
|---|---|---|
| id variable value 1 stub1 4.1 1 stub2 4.5 2 stub1 3.3 2 stub2 3.0 |
–> | id stub1 stub2 1 4.1 4.5 2 4.5 3.0 |
| Stata | reshape wide value, i(i) j(variable) string |
| tidyr | pivot_wider(dflong, names_from = variable, values_from = value) |