Commit 0c7ecf1

committed
Initial draft of encoding rfc
1 parent d8d4d76 commit 0c7ecf1

1 file changed

+91
-0
lines changed

1-Draft/DefaultFileEncoding.md

@@ -0,0 +1,91 @@
---
RFC: Default File Encoding
Author: James Truher
Status: Draft
Area: FileSystem
Comments Due: 4/16/2017
---

# Default file encoding which optionally includes Byte Order Mark (BOM)

This RFC aims to ensure that files are created in a way that suits the platform, including whether a BOM is written.

## Motivation

By default, PowerShell currently writes a BOM when it creates a file in any encoding that uses one.
This is a problem on Linux systems, where the default encoding is UTF-8 and files are normally created without a BOM.
Creating files with a BOM on Linux makes it difficult to interact with native tools, as the following example illustrates.

```PowerShell
PS> "ĝoo" > file.txt
PS> get-content file.txt
ĝoo
PS> exit
james@jimtru-ops2:~$ cat file.txt
▒▒oo

```
This is because a BOM (the leading `FF FE` bytes below) is written into the file:
```powershell
PS /home/james> format-hex file.txt

           Path: /home/james/file.txt
           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000   FF FE 1D 01 6F 00 6F 00 0A 00                    .þ..o.o...
```
Native tools on Linux treat the BOM as actual content, which corrupts the output.
If the BOM were written only when the platform expects it, interaction with native tools would be less problematic.
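
To make the role of the BOM concrete, the following sketch lists the preamble bytes that a few .NET encodings emit. It assumes PowerShell 5 or later (for the `::new()` syntax), and `Get-BomHex` is a hypothetical helper name introduced only for illustration; it is not part of PowerShell.

```powershell
# Hypothetical helper (not a built-in): format an encoding's BOM as hex.
function Get-BomHex {
    param([System.Text.Encoding] $Encoding)
    # GetPreamble() returns the BOM bytes, or an empty array if there is none.
    $bytes = $Encoding.GetPreamble()
    ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}

# A few encodings to compare; the labels are for display only.
$encodings = [ordered]@{
    'UTF-16LE (Unicode)' = [System.Text.UnicodeEncoding]::new($false, $true)
    'UTF-8 with BOM'     = [System.Text.UTF8Encoding]::new($true)
    'UTF-8 without BOM'  = [System.Text.UTF8Encoding]::new($false)
}

foreach ($name in $encodings.Keys) {
    '{0,-20} {1}' -f $name, (Get-BomHex $encodings[$name])
}
```

The first entry corresponds to the `FF FE` bytes in the hex dump above. The last entry has no preamble at all, which is the behavior a Linux tool such as `cat` expects.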

## Specification

A new global variable `$PSDefaultFileEncoding` shall be available, which allows the user to define the default file encoding for their system.
The allowed values for this variable shall be defined by

We should take this opportunity to rationalize our use of the `Encoding` parameter by changing the cmdlets that declare `Encoding` as `System.String` or `System.Text.Encoding` to use `Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding`.
The following cmdlets use various types for the `Encoding` parameter:

```PowerShell
PS> Get-Command -type cmdlet | ?{$_.parameters} |?{$_.source -match "microsoft"}|ft name,{$_.parameters['encoding'].ParameterType}

Name             $_.parameters['encoding'].ParameterType
----             ---------------------------------------
Add-Content      Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding
Export-Clixml    System.String
Export-Csv       System.String
Export-PSSession System.String
Get-Content      Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding
Import-Csv       System.String
Out-File         System.String
Select-String    System.String
Send-MailMessage System.Text.Encoding
Set-Content      Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding
```

`Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding` shall be extended to include:

* UTF8NoBOM

which results in the member names of this enum being:
* Ascii
* BigEndianUnicode
* BigEndianUTF32
* Byte
* Default
* Oem
* String
* Unicode
* Unknown
* UTF32
* UTF7
* UTF8
* UTF8NoBOM

The default on Windows systems shall remain unchanged; non-Windows platforms shall default to `UTF8NoBOM` via the `$PSDefaultFileEncoding` variable.
If `$PSDefaultFileEncoding` is not set, `UTF8NoBOM` shall be the default on non-Windows systems, and the current behavior () shall apply on Windows.
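
The fallback rule above could be sketched roughly as follows. `Resolve-DefaultFileEncoding` and its parameters are hypothetical names used only for illustration. The sketch returns `$null` on Windows to stand for "keep the current behavior", because this RFC leaves that behavior unchanged.

```powershell
# Hypothetical sketch of the proposed fallback; not an existing PowerShell API.
function Resolve-DefaultFileEncoding {
    param(
        [string] $UserSetting,   # value of $PSDefaultFileEncoding, if any
        [bool]   $OnWindows      # whether the current platform is Windows
    )
    # An explicit user setting always wins.
    if ($UserSetting) { return $UserSetting }
    # Otherwise non-Windows platforms default to UTF-8 without a BOM.
    if (-not $OnWindows) { return 'UTF8NoBOM' }
    # On Windows, $null here means: leave the existing default untouched.
    return $null
}

$onWindows = [System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT
Resolve-DefaultFileEncoding -UserSetting $PSDefaultFileEncoding -OnWindows $onWindows
```

Passing the platform in as a parameter, rather than detecting it inside the function, keeps the rule easy to exercise for both platforms from a single machine.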
### Examples
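
As a hedged illustration of the intended result, the sketch below writes the string from the Motivation section as UTF-8 without a BOM and dumps the resulting bytes. It uses .NET APIs that exist today rather than the proposed `UTF8NoBOM` value, and the temporary file path is an assumption made for the example.

```powershell
# Assumed location for the demo file; any writable path would do.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'file.txt'

# "ĝoo" plus a newline; [char]0x011D is 'ĝ', spelled out to avoid
# depending on the encoding of the script file itself.
$text = "$([char]0x011D)oo`n"

# UTF8Encoding($false) means: do not emit a BOM.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
[System.IO.File]::WriteAllText($path, $text, $utf8NoBom)

# Dump the bytes that actually landed on disk.
$hex = ([System.IO.File]::ReadAllBytes($path) | ForEach-Object { $_.ToString('X2') }) -join ' '
$hex   # C4 9D 6F 6F 0A

Remove-Item $path
```

The file starts directly with `C4 9D`, the UTF-8 encoding of `ĝ`, with no BOM in front of it.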
### Commentary

UTF8NoBOM is, of course, not an encoding, but neither are a number of the other values of `FileSystemCmdletProviderEncoding`.
However, it _is_ descriptive of what we are doing.
