R 's data sorting functions

**sort**and

**order**, the data filtering function

**which**, vector accessing operators

**[]**, vector and matrix manipulation functions

**cbind**and

**rbind**, and other functions and keywords make data analysis easy in much situations. SQL (Structered Querying Language) is used for storing, adding, removing, sorting and filtering the data in which saved on a disk permenantly or memory.

The R package

**sqldf**builds a

**SQLite**database using an R

**data.frame**object. A

**data.frame**is a matrix with richer properties in R. In this blog post, we present a basic introduction of

**sqldf**package and its use in R.

First of all, the package can be installed by typing:

> install.packages("dftable")

After installing the package, it can be got ready to use by typing:

> require("dftable")

Loading required package: sqldf

Loading required package: gsubfn

Loading required package: proto

Loading required package: RSQLite

Loading required package: DBI

Now lets create two vectors with length of 100:

> assign("x", rnorm(100))

> assign("y", rnorm(100))

> assign("mydata", as.data.frame(cbind(x,y)))

We can see first 6 rows:

> head(mydata)

x y

1 -1.9357660 0.2784369

2 -0.6976428 1.4646022

3 0.1913628 0.1578977

4 0.3049607 0.6055087

5 2.3773249 1.1800434

6 0.4641791 1.7143130

Let's perform some

**SQL**statements on this data frame using**sqldf**:**Averages of x and y**

> sqldf("select avg(x), avg(y) from mydata")

avg(x) avg(y)

1 0.0790934 0.220756

**Number of cases**

> sqldf("select count(x), count(y) from mydata")

count(x) count(y)

1 100 100

**First Three Cases**

> sqldf("select x,y from mydata limit 3")

x y

1 -1.9357660 0.2784369

2 -0.6976428 1.4646022

3 0.1913628 0.1578977

**Minimum and Maximum Values**

> sqldf("select min(x),max(x),min(y),max(y) from mydata")

min(x) max(x) min(y) max(y)

1 -2.155768 2.377325 -1.75477 2.531869

**First 3 Cases of Ordered Data**

> sqldf("select x,y from mydata order by x limit 3")

x y

1 -2.155768 0.6614813

2 -1.935766 0.2784369

3 -1.837502 0.1073177

> sqldf("select x,y from mydata order by y limit 3")

x y

1 0.7665811 -1.754770

2 0.3373319 -1.736727

3 0.6199159 -1.335649

**Insert into**

dftable does not alter the data frame. After inserting a new case, a new data.frame is created and returned. In the example below,

**sqldf**takes a vector of two**sql**statements as parameters and the result is in accessable with the name**main.mydata**rather than**mydata**.
> tail (sqldf(

+ c(

+ "insert into mydata values (6,7)"

+ ,

+ "select * from main.mydata"

+ )

+ )

+ )

x y

96 1.58024523 1.3937920

97 -1.79352203 0.2105787

98 0.02632872 -1.0567890

99 -0.60934162 -0.1359667

100 1.43393159 -0.9396326

101 6.00000000 7.0000000

**Delete**

> sqldf(

+ c(

+ "delete from mydata where x < 0 or y < 0"

+ ,

+ "select * from main.mydata"

+ )

+ )

x y

1 0.19136277 0.15789771

2 0.30496074 0.60550873

3 2.37732485 1.18004342

4 0.46417906 1.71431305

5 1.16290585 1.17154756

6 0.49335335 0.19904607

7 1.45769371 0.08291387

8 0.78473338 1.07769098

9 0.69043300 1.35040512

10 1.47893118 1.01057351

.....

Have a nice read!