sd -- a drop-in replacement for `cd'

Check-in [cf34cbb665]

Overview
Comment:tidy up continued.
SHA1: cf34cbb665216690d4d47417d7c694f331972f63
User & Date: vdh 2019-12-22 17:10:54
Context
2019-12-22 17:13
take over the tidy up from mksh branch. check-in: a28d3150c6 user: vdh tags: ksh
2019-12-22 17:11
tidy up continued. check-in: 652cbec2de user: vdh tags: mksh
2019-12-22 17:10
tidy up continued. check-in: cf34cbb665 user: vdh tags: mksh
2019-12-22 17:08
tidy up: start to remove stuff from branches that should (or need) only be present on trunk. check-in: 6468d8b10d user: vdh tags: mksh
Changes

Deleted www/man.md.


man page
========
Below follows a transcript of the **sd** man page.
See also the [quickstart guide](quick.md).

NAME
====

sd - switch between directories using a dynamic directory stack

SYNOPSIS
========

**sd** \[*pattern*|*pathname*|*-*\]

**sdirs** \[ **-hmpfsicw** \] \[ **-e** num \] \[ **-l** num \] \[
**-d** pattern \] \[*pattern*\]

DESCRIPTION
===========

The **sd** utility consists of a set of **ksh** functions enabling rapid
navigation between recently visited directories. **sd** keeps track of
all recent **cd** activities with the help of a logfile. This logfile is
analyzed to generate a "frecency" distribution (combining frequency and
recency) of the visits to different directories. The directory stack can
be queried by further **sd** commands using pattern matching. A sliding
window (containing a limited number of trailing entries from the
logfile) is used. This leads to a dynamically changing stack which is
updated after each **sd** action. While the logfile is shared by all
running shells, the directory stack is not. Therefore, modifications of
the logfile (and stack) by **sd** actions in one shell do not
immediately affect the stack in another already running shell. **sdirs
-f** can be used to enforce an update (rescan of the logfile). **sdirs
-w** can be used to enforce an update of the logfile prior to
termination of the presently running shell.
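
The "frecency" ranking described above can be illustrated with a small sketch. It is not the actual **sd** code: the function name `frecency_stack` and the weight formula are assumptions made for illustration. Each of the *n* log lines inside the window is given the weight (i/n)^power, where *i* counts from the oldest line, so that a power of 0 yields plain visit counts and a large power favours the most recent visit.

```shell
# Hypothetical sketch, not the sd implementation.
# Assumption: line i of n (oldest first) contributes the weight (i/n)^power.
frecency_stack() {
    # $1 = logfile (one directory per line), $2 = window size, $3 = power
    tail -n "$2" "$1" |
    awk -v p="$3" '
        { line[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++) score[line[i]] += (i / NR) ^ p
            for (d in score) printf "%.4f %s\n", score[d], d
        }' |
    sort -k1,1nr -k2          # highest score first
}

demo_log=$(mktemp)
printf '%s\n' /home/u/a /home/u/a /home/u/a /home/u/b > "$demo_log"
frecency_stack "$demo_log" 4 0     # power 0: /home/u/a (3 visits) ranks first
frecency_stack "$demo_log" 4 50    # large power: /home/u/b (last visit) ranks first
```

With a power of 0 the score of a directory equals its number of visits inside the window, while a large power makes the last visited directory win regardless of how often the others were visited.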

Some care is taken to maintain compatibility with **bash** and **zsh**.
**sd** can be used in combination with **tcsh**, too, through the
auxiliary ksh scripts **sdcmd** and **sdirs**. However, functionality is
somewhat restricted in this case.

In all shells, interaction with **sd** proceeds solely via **sd** and
**sdirs**.

USAGE
=====

After correct setup **sd** can be used as a replacement for **cd**.
**sd** takes one or more arguments that together define a string
*pattern* (possibly containing blanks) which is first tried as a
pathname. The special cases **'sd'** and **'sd -'** work as expected. If
pathname interpretation fails, *pattern* is used for a lookup in the
last **sdlines** directories visited (not counting your home directory).
In this case *pattern* can be any valid regular expression. Characters
special to the shell might need quoting and so do characters with
special meaning to regex matching. Thus a lookup of a verbatim **a.b**
requires the pattern **a\\\\.b** in order to get the quoting through.
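
Why the dot needs escaping at the regex level can be seen in the following sketch. It assumes that matching behaves like POSIX extended regular expressions as implemented by `grep -E`; the helper `match_dirs` and the directory names are made up for illustration. It only shows the regex layer, not the additional shell quoting needed on the **sd** command line.

```shell
# Assumption: pattern matching comparable to "grep -E" (not confirmed for sd).
match_dirs() {
    # $1 = pattern; candidate directory names are read from stdin
    grep -E -- "$1"
}

printf '%s\n' /data/a.b /data/axb | match_dirs 'a.b'     # prints both names
printf '%s\n' /data/a.b /data/axb | match_dirs 'a\.b'    # prints only /data/a.b
```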

The search is performed top-down starting at the most relevant
directory. The working directory is changed to the first match found. If
this match is not the correct one (and presuming there actually are
multiple matches), repeatedly executing the same **sd** command again
(typically by recalling it from the shell history) will cycle through
all matches, thus allowing you to quickly reach the second (or third, etc.)
match. Alternatively, the search pattern might be refined. Frequently,
trailing substrings from the full pathname, notably the basename, work
fine. It is also possible to switch between directories with **'sd
=rank'** where *rank* is the rank index displayed by **sdirs**.
**'sd ='** is equivalent to **'sd =**1**'**. Quoting of the **=** might
be necessary with older versions of **tcsh** and with **zsh**.

If **sdselect=0**, **sdirs** lists the directory stack in the four
column format *frecency frequency rank directory\_name* sorted top-down
according to "frecency", i.e. taking into account frequency and recency
of directory visits according to the current value of **sdpower**.
Specifying a pattern as argument restricts the display to matching
entries. If **sdselect=1**, **sdirs** uses a three column format *rank
index directory\_name*, queries for the index value of the desired
directory, and then switches to that directory.

**sdirs** also serves as user interface for performing other tasks
according to the chosen option, as detailed below.

OPTIONS
=======

**sdirs** accepts these options:

**-e**  
Set the exponent **sdpower** of the power law used for time weighting
the logfile entries when computing the stack. Fractional values are
allowed. The current value is **3**. A value of 0 eliminates time
weighting (stack sorted by frequency of visits). A sufficiently large
value (e.g. 1000) enforces stack sorting by most recent visits.

**-h**  
Show short usage note.

**-m**  
Show manpage.

**-p**  
If specified before **-m**, convert manpage to postscript and send to
stdout.

**-f**  
Forces a refresh of the directory stack. This might be helpful if
**sdlines** or **sdpower** are modified interactively or if several
shell incarnations are running in parallel.

**-s**  
Displays the elements in the directory stack alphabetically instead of
according to search order. Might be helpful for locating some
ill-remembered pathname to find out its rank for a **sd =***rank*
command.

**-i**  
Show status info for logfile and stack.

**-c**  
Clean up logfile: remove stale entries no longer pointing to an existing
directory and update the directory stack.

**-w**  
Write the updated list of visited directories to the logfile. This happens
automatically when the shell terminates.

**-l num**  
Change the number of recently visited directories considered when
constructing the directory stack. As a special case, when the provided
argument is zero or non-numeric, the complete currently available
logfile content is used (so the easiest call to achieve this would be
**sdirs -ll**).

**-d pattern**  
Delete all entries matching the pattern from the logfile (deletion is
performed only after a final confirmation by the user) and update the
directory stack.

Options **-e** and **-l** are not available when used with *tcsh*. You
have to modify the corresponding environment variables directly (see
section *Setup for tcsh* below).

HANDLING OF NON-MATCHING PATTERNS
=================================

Whether a **cd pattern** command fails, i.e. whether *pattern* does not
match any entry in the current directory stack, partly depends on the
chosen value of **sdlines** (which might be modified in the
corresponding resource file or interactively via **dirs -l num**).

As an attempt to improve handling of such initially failing
**cd** actions, **sd** implements the following strategy (also
covering the case of "stale" entries, i.e. entries pointing to a
directory that has been deleted in the meantime):

> 1. If **cd pattern** fails, first try to find another matching entry
> further down on the stack. If this fails too, temporarily increase
> **sdlines** to the total number of entries in the logfile (usually
> several 1000).
>
> 2. Recreate the stack (which then contains *all* directories found in
> the logfile) and try again to find a matching entry (skipping over
> stale entries).
>
> 3. Reset **sdlines** and recreate the stack.

In this way the chance of complete failure is distinctly reduced but it
is of course not guaranteed that the top-most "hit" on the extended
stack is the desired one. For this reason, by default a list of all hits
on the extended stack is displayed, too, in order to enable the user to
refine the search pattern if need be. To avoid this output define this
shell variable: **sdsilent=1**.
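
The strategy above can be sketched as follows. This is a simplified illustration, not the **sd** source: the function names `first_live_match` and `lookup` are hypothetical, the trailing lines of the logfile stand in for the frecency-sorted stack, and matching is assumed to behave like `grep -E`.

```shell
# Hypothetical sketch of the fallback lookup; not the actual sd code.
first_live_match() {
    # $1 = pattern; directory names are read from stdin in search order.
    # Prints the first matching name that still exists (skips stale entries).
    grep -E -- "$1" | while IFS= read -r dir; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            break
        fi
    done
}

lookup() {
    # $1 = pattern, $2 = logfile (one directory per line), $3 = window size
    hit=$(tail -n "$3" "$2" | first_live_match "$1")
    if [ -z "$hit" ]; then
        # Nothing usable in the window: search the complete logfile instead.
        hit=$(first_live_match "$1" < "$2")
    fi
    [ -n "$hit" ] && printf '%s\n' "$hit"
}

demo=$(mktemp -d)
mkdir "$demo/old_proj" "$demo/new_proj"
printf '%s\n' "$demo/old_proj" "$demo/gone_proj" "$demo/new_proj" > "$demo/log"
lookup old_proj "$demo/log" 1      # outside the 1-line window, found in the full log
lookup gone_proj "$demo/log" 3 || echo "no live match"
```

Because the window size is passed as an argument, nothing has to be reset afterwards; the real implementation instead changes **sdlines** temporarily and restores it in step 3.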

SHELL VARIABLES
===============

There are a few user-settable shell or environment variables recognized
by **sd**. For the meaning of **sdmax** and **sdlines**, see the *INITIAL
SETUP* section. **sdlines** can be changed with **dirs -l num**.
**sdpower** can be changed with **dirs -e num**.

INITIAL SETUP
=============

Setup for ksh, bash, and zsh
----------------------------

\1. Put the file **sd.ksh** in a directory on the search path.

\2. Insert a **source sd.ksh** command at or near the top of your startup
file. The source command can be preceded by assignments to the variables
**sdmax** and **sdlines**. **sdlines** defines the number of previous
**cd** actions which are analyzed by **sd**. **sdmax** is the maximum
number of **cd** actions logged in the file **~/.sd/dirv**. If this
limit is reached, the file is pruned to the **sdmax\*9/10** or
**sdlines** most recent **cd** actions, whichever is larger. The current
values are **sdmax=8192** and **sdlines=512**. **sdlines** can be
modified interactively at any time to alter the "time window" accessible
via the stack.
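
The pruning rule can be sketched as shown below. The function `prune_log` is a hypothetical stand-in written for this illustration; it assumes integer arithmetic for **sdmax\*9/10** and a logfile with one directory per line.

```shell
# Hypothetical sketch of the pruning rule; not the actual sd code.
prune_log() {
    # $1 = logfile, $2 = sdmax, $3 = sdlines
    count=$(( $(wc -l < "$1") ))
    [ "$count" -ge "$2" ] || return 0       # limit not reached: nothing to do
    keep=$(( $2 * 9 / 10 ))
    [ "$3" -gt "$keep" ] && keep=$3         # keep whichever figure is larger
    tail -n "$keep" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

demo_log=$(mktemp)
seq 1 20 > "$demo_log"
prune_log "$demo_log" 20 5      # limit of 20 reached: the last 18 lines are kept
wc -l < "$demo_log"
```

Under these assumptions the default values **sdmax=8192** and **sdlines=512** leave the 7372 most recent entries after pruning.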

If no **cd** function exists at the time **sd.ksh** is
sourced, **sd** defines it as

    function cd {
       sd "$@"
    }

which effectively aliases **cd** to **sd**. Using "$@" (rather than
"$\*" or $\*) is the right thing to do here considering the possibility
of multiple blanks in (quoted) patterns or an IFS=$'\\n'. If you have
your own **cd** function you might want to include the above **sd** call
in that function. **sd** furthermore defines

    alias dirs=sdirs
    alias ds=sdirs

The former alias overrides the **dirs** builtin in **bash** and **zsh**.
You probably don't want to use the builtin in parallel to **sd** anyway,
but if you need it simply issue

    unalias dirs

after sourcing **sd.ksh**.
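
The earlier remark on "$@" can be checked with a few lines of shell. The helper functions below are made up for this illustration and simply report how many arguments arrive at the wrapped command for each way of forwarding them.

```shell
# Illustration only: count the arguments that reach the wrapped command.
count_args()       { echo "$#"; }
with_at()          { count_args "$@"; }    # each argument is passed on unchanged
with_star()        { count_args $*; }      # arguments are split again at blanks
with_quoted_star() { count_args "$*"; }    # all arguments are joined into one

with_at          "my project" src    # prints 2
with_star        "my project" src    # prints 3
with_quoted_star "my project" src    # prints 1
```

Only the "$@" form hands a pattern containing blanks to the wrapped command exactly as it was typed.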

Bash users
----------

Note that **sd** sets the non-standard option *shopt -s extglob* which
activates **ksh** like extended shell glob patterns. If this interferes
with your setup, don't use **sd**.

zsh users
---------

Note that **sd** sets the non-standard options *set -o KSH\_GLOB*
(which activates **ksh** like extended shell glob patterns) and *set -o
POSIX\_BUILTINS* (which enables the **command** builtin to execute shell
builtins). If this interferes with your setup, don't use **sd**.

Setup for tcsh
--------------

\1. Put files **sd.ksh**, **sdcmd**, and **sdirs** in a directory on the
search path. Make the last two of these executable.

\2. Add the following lines verbatim (including the quotes) to your
startup file:

    alias cd   'eval "`sdcmd \\!*`"'
    alias dirs 'sdirs'
    alias ds   'sdirs'

Note that the second alias overwrites the **dirs** builtin of tcsh. If
you don't like this, omit this alias definition.

To modify variables relevant to **sd**, you have to export them to the
environment of your shell with **setenv**.

Deleted www/mkman.sh.

# NOTE: this script needs to be sourced in order to ensure correct
# shell variable expansion.

# this is a q&d attempt at automating the translation of the sd
# manpage source to markdown for inclusion of the manpage as embedded
# documentation into the project and its web ui.

# rerun now and then to check whether `man.md' has changed due to
# changes to the `nroff -man' source integrated into `sd.ksh'.

cat ../sd.ksh |\
awk '
   /^\.TH/, /^HERE$/ {
      if ($0 !~ /^HERE$/) print
   }
' |\
awk '
   {
      gsub(/\$sdpower/, "'$sdpower'", $0)
      gsub(/\$sdmax/, "'$sdmax'", $0)
      gsub(/\$sdlines/, "'$sdlines'", $0)
      gsub(/\\\$/, "$", $0)          # undo shell-related quoting 
      gsub(/\\\\\\n/, "\\\\n", $0)   # undo shell-related quoting 
      print
   }
' |\
pandoc -f man -t markdown_strict |\
(
# add a preamble
print "
man page
========
Below follows a transcript of the **sd** man page.
See also the [quickstart guide](quick.md).
"
# post process the result to remove residual formatting errors
sed -E '
   s/(^[1-9]\. )/\\\1/g
   s/\*\*\*([^*]+)\*\*\*/\1/g
   s/"\*\*([^*]+)\*\*/ \1 /g
   s/\*"/*/g
   s/"([^*]+[^!]\*)/ \1/g
   # special case not caught by the general patterns above:
   s/\*\* sdcmd\*\*/ **sdcmd**/g
'
) >| man.md

Deleted www/quick.md.

# Quick Start Guide #

---
## Preliminary Remarks ##

We presume you are using either *ksh* or a mostly *ksh* compatible shell
(i.e. *bash* or *zsh*). **sd** can be used with *tcsh*, too, but there are
some minor restrictions and somewhat reduced performance.

After successful installation (see [Initial Setup](man.md#setup)),
the file `sd.ksh` has to be sourced from within a running shell. 
For routine use, the standard way to achieve this is to include the line

    . sd.ksh

verbatim (including the leading ". ") in your resource file (e.g. `.kshrc`
or `.bashrc`). `sd.ksh` defines some shell functions whose names all start
with the string "sd". Relevant for interactive usage are only two of them,
namely

    sd
    sdirs (aliased to "dirs")

where **sd** is responsible for executing the desired **cd** command while
**(s)dirs** acts as interface to the other functions. Presuming there is no
shell function **cd** defined at the time `sd.ksh` is sourced, such a function
is automatically defined as

    function cd {
       sd "$@"
    }

so that **sd** can be used transparently instead of **cd**. Altogether,
usually you only use the commands **cd** and **dirs** to interact with **sd**.

---
## Usage ##

If a valid path to a directory is specified as argument to **sd** (or to the
`cd` function as defined above) it acts just as the builtin **cd** command
(including the cases where the argument is omitted or equal to `-`).

If the argument is not a valid path, it is interpreted as a regular expression
pattern and matched against a stack of directory names of previously visited
directories ordered by "frecencies" of visit. The first match found (if any)
is then used for the **cd** action. By definition this is most probably
correct (simply because that directory was recently visited most frequently 
of all the directories matching the pattern) but is obviously not always what
you want. In this case the most direct strategy is to use a more specific
pattern.

Full regular expressions can be used but usually they are not necessary. *If*
they are used, take care to use adequate quoting. The most useful patterns
are simply substrings of the desired directory name (e.g. its basename
or a trailing part of the full path name). In order to enforce a match at the
end of the stack entries use something like

    cd basename$

Instead of making the pattern directly match the directory you are
interested in, it's sometimes easier to issue

    dirs pattern

where `pattern` is a usually non-unique match for the respective directory.
This command will display a list of all "hits" of the specified pattern
together with a rank index. This rank in turn can then be used to go to the
desired directory:

    cd =rank_index

With `zsh` you have to escape the equality sign: `cd \=rank_index`.
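
The two-step lookup (list the hits, then select by rank) can be sketched as follows. The stack content and the functions `list_hits` and `dir_by_rank` are hypothetical; the sketch assumes that the rank is simply the position of an entry in the full stack.

```shell
# Hypothetical sketch: list matches with their rank, then pick one by rank.
stack='/home/u/proj/src
/home/u/proj/doc
/home/u/other/src'

list_hits() {
    # Print "rank name" for every stack entry matching the pattern $1.
    printf '%s\n' "$stack" | awk -v pat="$1" '$0 ~ pat { print NR, $0 }'
}

dir_by_rank() {
    # Print the stack entry with rank $1 (1 = top of the stack).
    printf '%s\n' "$stack" | sed -n "${1}p"
}

list_hits 'src$'     # prints "1 /home/u/proj/src" and "3 /home/u/other/src"
dir_by_rank 3        # prints /home/u/other/src
```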

Issuing `dirs` without an argument displays the complete content of the
currently available directory stack. Detailed usage information can be
obtained by issuing

    dirs -m

---
## Technicalities ##

### The directory stack ###

The directory stack used by **sd** for lookup of directories is changing
dynamically and is generated/updated as follows:

0. At startup the stack is initialized from the trailing `$sdlines` (default:
\512) lines/directory names in the logfile of previously visited directories
(default logfile: `~/.sd/dirv`) by computing a "frecency" distribution of all
unique names. 

0. The resulting list constitutes the directory stack which is queried top-down
in order of decreasing "frecency" when looking for an entry matching the
pattern specified for a **cd** action.

0. After each successful **cd** action, the name of the new working
directory is appended to the list of the **cd** history loaded initially
from the logfile and the stack is regenerated from the (now slightly
changed) `$sdlines` trailing entries. Naturally, if a certain directory
is visited sufficiently often, over time it will move up the directory
stack.
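
The effect of the sliding window in the steps above can be sketched like this. The names `history_file`, `window` and `record_visit` are assumptions made for illustration, and the sketch ranks by plain visit counts (no time weighting) to keep it short. It also assumes directory names without blanks.

```shell
# Hypothetical sketch of the update cycle; not the actual sd code.
history_file=$(mktemp)     # stand-in for the cd history kept by the shell
window=3                   # stand-in for $sdlines

record_visit() {
    # Append the new working directory, then rebuild the stack from the
    # trailing $window entries, most frequently visited directory first.
    printf '%s\n' "$1" >> "$history_file"
    stack=$(tail -n "$window" "$history_file" |
            sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $2 }')
}

for d in /a /a /b /c /c; do record_visit "$d"; done
printf '%s\n' "$stack"     # /c first, then /b; /a has left the window
```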

Obviously, the stack content and order is influenced by the chosen
values of `$sdpower` (see [sdirs -e](man.md)), and `$sdlines`. The
latter value indirectly defines the effective time window inspected by
**sd**: by default the last 512 *cd's* are tracked which, at moderately
heavy use, corresponds to about 1-2 weeks of work.

Increasing the value of `$sdlines` extends the effective time window and thus
includes more distinct directory names in the stack. However, the stack
ordering then no longer changes as rapidly. Rather, for large values of
`$sdlines` it approaches a nearly static sorting order. The bottom
line: a choice has to be made between including a sufficiently large
number of directories in the stack and a stack that adjusts its sorting order
rapidly to a change of focus of the ongoing work (accompanied by frequent
visits to a different group of directories).

A further point to realize is the following: the directory stack is maintained
in a shell variable (`$sdlist`). The stack is thus internal to the
respective shell process. In particular, it does not change if another
shell instance modifies its own incarnation of the stack and/or the logfile.
In order to "synchronize" the stack across different shells (usually in different
terminals), a logfile update can be enforced via `dirs -w` and stack
regeneration can be enforced via `dirs -f` (it might be easier to just
open another terminal, though).
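
Why a running shell does not notice changes made elsewhere can be illustrated with a plain shell variable. In this sketch the temporary file and the function `load_stack` are hypothetical stand-ins for the logfile and for the refresh triggered by `dirs -f`.

```shell
# Illustration only: a shell variable is a snapshot of the file it was read from.
logfile=$(mktemp)
printf '%s\n' /home/u/a > "$logfile"

load_stack() { stack=$(cat "$logfile"); }    # stands in for "dirs -f"

load_stack                                   # shell start: read the logfile once
printf '%s\n' /home/u/b >> "$logfile"        # another shell writes its history
printf '%s\n' "$stack"                       # still only /home/u/a
load_stack                                   # explicit refresh
printf '%s\n' "$stack"                       # now /home/u/a and /home/u/b
```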

### The logfile ###

The logfile size is limited to `$sdmax` entries/lines (default: 8192).
If this limit is reached, the file is pruned to the `9/10*$sdmax`
trailing/most recent entries. Thus, the accessible history fluctuates
between these two figures. This approach ensures that pruning occurs
only very rarely (every few months, probably) and that the minimum time
window is that of the last `9/10*$sdmax` **cd** actions (many months,
maybe over a year). `dirs -i` reports the time of the last pruning.


### Handling of non-matching patterns ###

Whether a 

    cd pattern

command fails, i.e. whether `pattern` does not match any entry in the current directory
stack, partly depends on the chosen value of `$sdlines` (which might be modified in
the corresponding resource file or increased interactively via "`dirs -l num`").

As an attempt to improve handling of such initially failing **cd** actions,
**sd** implements the following strategy
(also covering the case of "stale" entries, i.e.
entries pointing to a directory that has been deleted in the meantime):

0. If `cd pattern` fails, 
first try to find another matching entry further down
on the stack. 

0. If this fails too,
temporarily increase `$sdlines` to the total number
of entries in the logfile (usually several 1000).

0. Recreate the stack (which then contains _all_ directories found in the
logfile) and try again to find a matching entry
(again, skipping over stale entries).

0. Reset `$sdlines` and recreate the stack.

In this way the chance of complete failure is distinctly reduced but it is of
course not guaranteed that the top-most "hit" on the extended stack is the
desired one. For this reason, a list of all hits on the extended stack
is displayed, too, in order to enable the user to refine the search pattern 
if need be. To avoid this output, define this shell variable:
`sdsilent=1`.